feat: Add SECURITY and META agents to multi-agent discovery

Expand the 4-agent system to 6 agents (5 analysis + 1 meta) with enhanced security analysis and self-improving prompt optimization.

New Agents:
- SECURITY: Identifies sensitive data (PII, credentials, financial), assesses access patterns, identifies vulnerabilities, and provides compliance assessment (GDPR, PCI-DSS)
- META: Analyzes report quality by section, identifies gaps, suggests specific prompt improvements for future runs

Protocol Changes:
- Expanded from 4 rounds to 5 rounds
- Round 5 is Meta Analysis (META agent only)
- META agent does not participate in rounds 1-4

New Report Sections:
- 5. SECURITY ANALYSIS with data classification (PUBLIC/INTERNAL/CONFIDENTIAL/RESTRICTED)
- E. Security data classification appendix

New Output:
- Separate META ANALYSIS document with:
  - Section quality ratings (depth, completeness)
  - Specific prompt improvement suggestions
  - Gap identification
  - Evolution history tracking

This enables continuous prompt optimization through multiple discovery iterations, with each run informing improvements for the next.
4 months ago · 130981d1be
parent 82d7f0c87f
commit 130981d1be
3 changed files with 223 additions and 39 deletions
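The round rules in this commit (the five analysis agents act in Rounds 1-4, META acts alone in Round 5, and each round writes under its own catalog `kind`) fit in a small lookup table. This is a hypothetical Python sketch, not code from the repository: the `Round` dataclass and the helpers `participants`, `rounds_for` and `catalog_kind` are illustrative names; only the agent names and `kind` strings come from the diff below.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Agent names and catalog kinds as they appear in this commit.
ANALYSIS_AGENTS = ("STRUCTURAL", "STATISTICAL", "SEMANTIC", "QUERY", "SECURITY")
META_AGENT = "META"


@dataclass(frozen=True)
class Round:
    number: int
    name: str
    participants: tuple[str, ...]
    # None means "each agent writes under its own lowercase name" (Round 1).
    catalog_kind: Optional[str]


PROTOCOL = (
    Round(1, "Blind Exploration", ANALYSIS_AGENTS, None),
    Round(2, "Pattern Recognition", ANALYSIS_AGENTS, "collaborative_round2"),
    Round(3, "Hypothesis Testing", ANALYSIS_AGENTS, "validation_round3"),
    Round(4, "Final Synthesis", ANALYSIS_AGENTS, "final_report"),
    Round(5, "Meta Analysis", (META_AGENT,), "meta_analysis"),
)


def _round(number: int) -> Round:
    """Look up a round by number or fail loudly."""
    for rnd in PROTOCOL:
        if rnd.number == number:
            return rnd
    raise ValueError(f"unknown round {number}")


def participants(number: int) -> tuple[str, ...]:
    """Agents allowed to act in the given round."""
    return _round(number).participants


def rounds_for(agent: str) -> list[int]:
    """Round numbers in which an agent participates."""
    return [rnd.number for rnd in PROTOCOL if agent in rnd.participants]


def catalog_kind(number: int, agent: str) -> str:
    """The catalog `kind` an agent writes under in a given round."""
    rnd = _round(number)
    if agent not in rnd.participants:
        raise ValueError(f"{agent} does not participate in round {number}")
    return rnd.catalog_kind or agent.lower()
```

For example, `catalog_kind(1, "SECURITY")` yields `"security"`, matching the Round 1 `kind` list, while `catalog_kind(2, "META")` raises because META only acts in Round 5.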
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md
@@ -4,14 +4,15 @@ Multi-agent database discovery system for comprehensive analysis through MCP (Mo

 ## Overview

-This directory contains scripts for running **4-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.
+This directory contains scripts for running **6-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.

 **Key Features:**
-- **4 Collaborating Agents:** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY
-- **4-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis
+- **6 Agents (5 Analysis + 1 Meta):** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY, META
+- **5-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis → Meta analysis
 - **MCP Catalog Collaboration:** Agents share findings via catalog
 - **Comprehensive Reports:** Structured markdown with health scores and prioritized recommendations
-- **Evidence-Based:** 15+ hypothesis validations with direct database evidence
+- **Evidence-Based:** 15-20 hypothesis validations with direct database evidence
+- **Self-Improving:** META agent analyzes report quality and suggests prompt improvements

 ## Quick Start

@@ -46,36 +47,44 @@ python ./headless_db_discovery.py --verbose

 ## Multi-Agent Discovery Architecture

-### The 4 Agents
+### The 6 Agents

-| Agent | Focus | Key MCP Tools |
-|-------|-------|---------------|
-| **STRUCTURAL** | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
-| **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
-| **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
-| **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| Agent | Type | Focus | Key MCP Tools |
+|-------|------|-------|---------------|
+| **STRUCTURAL** | Analysis | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
+| **STATISTICAL** | Analysis | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
+| **SEMANTIC** | Analysis | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
+| **QUERY** | Analysis | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| **SECURITY** | Analysis | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
+| **META** | Meta | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads findings) |

-### 4-Round Protocol
+### 5-Round Protocol

 1. **Round 1: Blind Exploration** (Parallel)
-   - All 4 agents explore independently
+   - All 5 analysis agents explore independently
   - Each discovers patterns without seeing others' findings
   - Findings written to MCP catalog

 2. **Round 2: Pattern Recognition** (Collaborative)
-   - All agents read each other's findings via `catalog_search`
+   - All 5 analysis agents read each other's findings via `catalog_search`
   - Identify cross-cutting patterns and anomalies
   - Collaborative analysis documented

 3. **Round 3: Hypothesis Testing** (Validation)
-   - Each agent validates 3-4 specific hypotheses
+   - Each analysis agent validates 3-4 specific hypotheses
   - Results documented with PASS/FAIL/MIXED and evidence
-   - 15+ hypothesis validations total
+   - 15-20 hypothesis validations total (3-4 from each of the 5 analysis agents)

 4. **Round 4: Final Synthesis**
-   - All findings synthesized into comprehensive report
+   - All 5 analysis agents synthesize findings into a comprehensive report
   - Written to MCP catalog and local file

+5. **Round 5: Meta Analysis** (META agent only)
+   - META agent reads the complete final report
+   - Analyzes each section for depth, completeness, and quality
+   - Identifies gaps and suggests prompt improvements
+   - Writes a separate meta-analysis document to the MCP catalog
+
 ## What Gets Discovered

 ### 1. Structural Analysis
@@ -108,6 +117,32 @@ python ./headless_db_discovery.py --verbose
 - Query pattern identification
 - Optimization recommendations with expected improvements

+### 5. Security Analysis
+- **Sensitive Data Identification:**
+  - PII: names, emails, phone numbers, SSN, addresses
+  - Credentials: passwords, API keys, tokens
+  - Financial data: credit cards, bank accounts
+  - Health data: medical records
+- **Access Pattern Analysis:**
+  - Overly permissive schemas
+  - Missing row-level security
+- **Vulnerability Assessment:**
+  - SQL injection vectors
+  - Weak authentication patterns
+  - Missing encryption indicators
+- **Compliance Assessment:**
+  - GDPR indicators (personal data)
+  - PCI-DSS indicators (payment data)
+  - Data retention patterns
+- **Data Classification:**
+  - PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED
+
+### 6. Meta Analysis
+- Report quality assessment by section (depth, completeness)
+- Gap identification (what was missed)
+- Prompt improvement suggestions for future runs
+- Evolution history tracking
+
 ## Output Format

 The generated report includes:
@@ -117,9 +152,9 @@ The generated report includes:

 ## Executive Summary
 - Database identity (system type, purpose, scale)
-- Critical findings (top 3)
+- Critical findings (top 5, one from each analysis agent)
 - Health score: current X/10 → potential Y/10
-- Top 3 recommendations (prioritized)
+- Top 5 recommendations (prioritized)

 ## 1. STRUCTURAL ANALYSIS
 - Schema inventory
@@ -145,10 +180,17 @@ The generated report includes:
 - Optimization opportunities
 - Expected improvements

-## 5. CRITICAL FINDINGS
+## 5. SECURITY ANALYSIS
+- Sensitive data identification
+- Access pattern analysis
+- Vulnerability assessment
+- Compliance indicators
+- Security recommendations
+
+## 6. CRITICAL FINDINGS
 - Each with: description, impact quantification, root cause, remediation

-## 6. RECOMMENDATIONS ROADMAP
+## 7. RECOMMENDATIONS ROADMAP
 - URGENT: [actions with impact/effort]
 - HIGH: [actions]
 - MODERATE: [actions]
@@ -159,8 +201,15 @@ The generated report includes:
 - B. Query examples with EXPLAIN
 - C. Statistical distributions
 - D. Business glossary
+- E. Security data classification
 ```

+Additionally, a separate **META ANALYSIS** document is generated with:
+- Section quality ratings (depth, completeness)
+- Specific prompt improvement suggestions
+- Gap identification
+- Evolution history
+
 ## Command-Line Options

 | Option | Short | Description | Default |
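The README's Security Analysis section lists four sensitivity levels (PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED) but not how a column ends up in one. A first pass could look only at column names. The sketch below is hypothetical: the regex patterns, the level each one maps to, and the function names are assumptions for illustration, not rules taken from the prompt.

```python
from __future__ import annotations

import re
from enum import IntEnum
from typing import Iterable


class Sensitivity(IntEnum):
    """The four levels named in the prompt, ordered least to most sensitive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


# Hypothetical column-name patterns, checked from most to least sensitive.
# The level assigned to each pattern is an assumption for illustration.
_RULES = (
    (Sensitivity.RESTRICTED, re.compile(
        r"passw|secret|api_?key|token|(?:^|_)ssn(?:_|$)|credit_?card|"
        r"card_?number|iban|diagnos|medical")),
    (Sensitivity.CONFIDENTIAL, re.compile(
        r"e_?mail|phone|address|birth|first_?name|last_?name|full_?name")),
    (Sensitivity.INTERNAL, re.compile(r"internal|note|cost|margin")),
)


def classify_column(name: str) -> Sensitivity:
    """Return the first (most sensitive) level whose pattern matches the name."""
    lowered = name.lower()
    for level, pattern in _RULES:
        if pattern.search(lowered):
            return level
    return Sensitivity.PUBLIC


def classify_table(columns: Iterable[str]) -> Sensitivity:
    """A table is as sensitive as its most sensitive column."""
    return max((classify_column(c) for c in columns), default=Sensitivity.PUBLIC)
```

A name-only pass like this can at most shortlist candidates. The SECURITY agent as described still has to confirm them against actual values with `sample_rows` or `column_profile`, since a column called `notes` may hold card numbers and one called `email` may be empty.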
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py
@@ -258,19 +258,23 @@ Examples:
 Environment Variables:
  CLAUDE_PATH    Path to claude executable

-The discovery uses a 4-agent collaborative approach:
+The discovery uses a 6-agent collaborative approach:
  - STRUCTURAL: Schemas, tables, relationships, indexes, constraints
  - STATISTICAL: Data distributions, quality, anomalies
  - SEMANTIC: Business domain, entities, rules, terminology
  - QUERY: Index efficiency, query patterns, optimization
+  - SECURITY: Sensitive data, access patterns, vulnerabilities
+  - META: Report quality analysis, prompt improvement suggestions

-Agents collaborate through 4 rounds:
-  1. Blind Exploration (independent discovery)
+Agents collaborate through 5 rounds:
+  1. Blind Exploration (5 analysis agents, independent discovery)
  2. Pattern Recognition (cross-agent collaboration)
  3. Hypothesis Testing (validation with evidence)
  4. Final Synthesis (comprehensive report)
+  5. Meta Analysis (META agent analyzes report quality)

 Findings are shared via MCP catalog and output as a structured markdown report.
+The META agent also generates a separate meta-analysis document with prompt improvement suggestions.
        """
    )

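Round 3 ("Hypothesis Testing (validation with evidence)") asks each analysis agent for 3-4 hypotheses, each documented with the hypothesis, test method, result and evidence. With five analysis agents that arithmetic gives 15 to 20 validations (the previous four agents gave 12 to 16). The hypothetical sketch below models one record and checks those totals; the `Validation` class and helper names are illustrative assumptions, and the result values follow the README's PASS/FAIL/MIXED.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Result(Enum):
    PASS = "PASS"
    FAIL = "FAIL"
    MIXED = "MIXED"


@dataclass(frozen=True)
class Validation:
    """One Round 3 record: hypothesis, test method, result, evidence."""
    agent: str
    hypothesis: str
    test_method: str
    result: Result
    evidence: str


def expected_total(num_agents: int, low: int = 3, high: int = 4) -> tuple[int, int]:
    """Range of validations implied by `low`-`high` hypotheses per agent."""
    return num_agents * low, num_agents * high


def tally(validations: Iterable[Validation]) -> dict[str, int]:
    """Count results, reporting zero for outcomes that never occurred."""
    counts = Counter(v.result for v in validations)
    return {r.value: counts[r] for r in Result}


def shortfall(validations: Iterable[Validation], agents: Iterable[str],
              minimum: int = 3) -> dict[str, int]:
    """How many more hypotheses each agent still owes to reach `minimum`."""
    counts = Counter(v.agent for v in validations)
    return {a: minimum - counts[a] for a in agents if counts[a] < minimum}
```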
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md
@@ -1,7 +1,7 @@
 # Database Discovery - Concise System Prompt

 ## Mission
-Perform comprehensive database discovery through 4 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: Single comprehensive markdown report.
+Perform comprehensive database discovery through 6 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: a comprehensive markdown report, plus a separate META analysis document.

 ## Agent Roles

@@ -11,28 +11,41 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 | **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
 | **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
 | **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| **SECURITY** | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
+| **META** | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads all findings) |

-## 4-Round Protocol
+## 5-Round Protocol

 ### Round 1: Blind Exploration (Parallel)
-- Launch all 4 agents simultaneously
+- Launch all 5 analysis agents simultaneously (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY)
 - Each explores independently using their tools
-- Write findings to catalog: `kind="structural|statistical|semantic|query"`, `key="round1_*"`
+- Write findings to catalog: `kind="structural|statistical|semantic|query|security"`, `key="round1_*"`
+- META agent does NOT participate in this round

 ### Round 2: Collaborative Analysis
-- All agents read each other's findings via `catalog_search`
+- All 5 analysis agents read each other's findings via `catalog_search`
 - Identify cross-cutting patterns and anomalies
 - Write collaborative findings: `kind="collaborative_round2"`
+- META agent does NOT participate in this round

 ### Round 3: Hypothesis Testing
-- Each agent validates 3-4 specific hypotheses
+- Each of the 5 analysis agents validates 3-4 specific hypotheses
 - Document: hypothesis, test method, result (PASS/FAIL), evidence
 - Write: `kind="validation_round3"`
+- META agent does NOT participate in this round

 ### Round 4: Final Synthesis
-- Synthesize ALL findings into comprehensive report
+- All 5 analysis agents collaborate to synthesize findings into a comprehensive report
 - Write: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
 - Also create local file: `database_discovery_report.md`
+- META agent does NOT participate in this round
+
+### Round 5: Meta Analysis (META Agent Only)
+- META agent reads the complete final report from catalog
+- Analyzes each section for depth, completeness, and quality
+- Identifies gaps, missed opportunities, or areas for improvement
+- Suggests specific prompt improvements for future discovery runs
+- Write: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`

 ## Report Structure (Required)

@@ -41,9 +54,9 @@ Perform comprehensive database discovery through 4 collaborating subagents using

 ## Executive Summary
 - Database identity (system type, purpose, scale)
-- Critical findings (top 3)
+- Critical findings (top 5, one from each analysis agent)
 - Health score: current X/10 → potential Y/10
-- Top 3 recommendations (prioritized)
+- Top 5 recommendations (prioritized, one from each analysis agent)

 ## 1. STRUCTURAL ANALYSIS
 - Schema inventory (tables, columns, indexes)
@@ -69,10 +82,18 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 - Optimization opportunities (prioritized)
 - Expected improvements

-## 5. CRITICAL FINDINGS
+## 5. SECURITY ANALYSIS
+- Sensitive data identification (PII, credentials, financial data)
+- Access pattern analysis (overly permissive schemas)
+- Vulnerability assessment (SQL injection vectors, weak auth)
+- Data encryption needs
+- Compliance considerations (GDPR, PCI-DSS, etc.)
+- Security recommendations (prioritized)
+
+## 6. CRITICAL FINDINGS
 - Each with: description, impact quantification, root cause, remediation

-## 6. RECOMMENDATIONS ROADMAP
+## 7. RECOMMENDATIONS ROADMAP
 - URGENT: [actions with impact/effort]
 - HIGH: [actions]
 - MODERATE: [actions]
@@ -83,8 +104,113 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 - B. Query examples with EXPLAIN
 - C. Statistical distributions
 - D. Business glossary
+- E. Security data classification
 ```

+## META Agent Output Format
+
+The META agent should produce a separate meta-analysis document:
+
+```markdown
+# META ANALYSIS: Prompt Improvement Suggestions
+
+## Section Quality Assessment
+
+| Section | Depth (1-10) | Completeness (1-10) | Gaps Identified |
+|---------|--------------|---------------------|-----------------|
+| Executive Summary | ?/10 | ?/10 | ... |
+| Structural | ?/10 | ?/10 | ... |
+| Statistical | ?/10 | ?/10 | ... |
+| Semantic | ?/10 | ?/10 | ... |
+| Query | ?/10 | ?/10 | ... |
+| Security | ?/10 | ?/10 | ... |
+| Critical Findings | ?/10 | ?/10 | ... |
+| Recommendations | ?/10 | ?/10 | ... |
+
+## Specific Improvement Suggestions
+
+### For Next Discovery Run
+1. **[Agent]**: Add analysis of [specific area]
+   - Reason: [why this would improve discovery]
+   - Suggested prompt addition: [exact text]
+
+2. **[Agent]**: Enhance [existing analysis] with [additional detail]
+   - Reason: [why this is needed]
+   - Suggested prompt addition: [exact text]
+
+### Missing Analysis Areas
+- [Area not covered by any agent]
+- [Another missing area]
+
+### Over-Analysis Areas
+- [Area that received excessive attention relative to value]
+
+## Prompt Evolution History
+- v1.0: Initial 4-agent system (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY)
+- v1.1: Added SECURITY agent (5 analysis agents) and META agent for prompt optimization
+  (6 agents total, 5 rounds)
+
+## Overall Quality Score: X/10
+
+[Brief summary of overall discovery quality and main improvement areas]
+```
+
+## Agent-Specific Instructions
+
+### SECURITY Agent Instructions
+The SECURITY agent must:
+1. Identify sensitive data columns:
+   - Personally Identifiable Information (PII): names, emails, phone numbers, SSN, addresses
+   - Credentials: passwords, API keys, tokens, certificates
+   - Financial data: credit cards, bank accounts, transaction amounts
+   - Health data: medical records, diagnoses, treatments
+   - Other sensitive: internal notes, confidential business data
+
+2. Assess access patterns:
+   - Tables without proper access controls
+   - Overly permissive schema designs
+   - Missing row-level security patterns
+
+3. Identify vulnerabilities:
+   - SQL injection vectors (text columns concatenated in queries)
+   - Weak authentication patterns (plaintext passwords)
+   - Missing encryption indicators
+   - Exposed sensitive data in column names
+
+4. Compliance assessment:
+   - GDPR indicators (personal data presence)
+   - PCI-DSS indicators (payment data presence)
+   - Data retention patterns
+   - Audit trail completeness
+
+5. Classify data by sensitivity level:
+   - PUBLIC: Non-sensitive data
+   - INTERNAL: Business data not for public
+   - CONFIDENTIAL: Sensitive business data
+   - RESTRICTED: Highly sensitive (legal, financial, health)
+
+### META Agent Instructions
+The META agent must:
+1. Read the complete final report from `catalog_get(kind="final_report", key="comprehensive_database_discovery_report")`
+2. Read all agent findings from all rounds using `catalog_search`
+3. For each report section, assess:
+   - Depth: How deep was the analysis? (1=superficial, 10=exhaustive)
+   - Completeness: Did they cover all relevant aspects? (1=missed a lot, 10=comprehensive)
+   - Actionability: Are recommendations specific and implementable? (1=vague, 10=very specific)
+   - Evidence: Are claims backed by data? (1=assertions only, 10=full evidence)
+
+4. Identify gaps:
+   - What was NOT analyzed that should have been?
+   - What analysis was superficial that could be deeper?
+   - What recommendations are missing or vague?
+
+5. Suggest prompt improvements:
+   - Be specific about what to ADD to the prompt
+   - Provide exact text that could be added
+   - Explain WHY each improvement would help
+
+6. Rate overall quality and provide summary
+
 ## Quality Standards

 | Dimension | Score (0-10) |
@@ -94,6 +220,8 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 | Index Coverage | Primary keys, FKs, functional indexes |
 | Query Performance | Join efficiency, aggregation speed |
 | Data Integrity | FK constraints, unique constraints, checks |
+| Security Posture | Sensitive data protection, access controls |
+| Overall Discovery | Synthesis of all dimensions |

 ## Catalog Usage

@@ -113,10 +241,11 @@ catalog_get(kind="agent_type", key="specific_id")
 Use `TodoWrite` to track rounds:
 ```python
 TodoWrite([
-    {"content": "Round 1: Blind exploration", "status": "in_progress"},
+    {"content": "Round 1: Blind exploration (5 agents)", "status": "in_progress"},
    {"content": "Round 2: Pattern recognition", "status": "pending"},
    {"content": "Round 3: Hypothesis testing", "status": "pending"},
-    {"content": "Round 4: Final synthesis", "status": "pending"}
+    {"content": "Round 4: Final synthesis", "status": "pending"},
+    {"content": "Round 5: Meta analysis", "status": "pending"}
 ])
 ```

@@ -127,12 +256,14 @@ TodoWrite([
 3. **SPECIFIC RECOMMENDATIONS**: Provide exact SQL for all changes
 4. **QUANTIFIED IMPACT**: Include expected improvements with numbers
 5. **PRIORITIZED**: Always prioritize (URGENT → HIGH → MODERATE → LOW)
+6. **CONSTRUCTIVE META**: META agent provides actionable, specific improvements

 ## Output Locations

 1. MCP Catalog: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
-2. Local file: `database_discovery_report.md` (use Write tool)
+2. MCP Catalog: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`
+3. Local file: `database_discovery_report.md` (use Write tool)

 ---

-**Begin discovery now. Launch all 4 agents for Round 1.**
+**Begin discovery now. Launch all 5 analysis agents for Round 1.**
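The META agent instructions rate every report section on depth, completeness, actionability and evidence (1-10), but do not say how those roll up into the section table or the overall X/10. The sketch below assumes a plain unweighted mean, per section and then across sections. The averaging rule and all names (`SectionRating`, `overall_score`, `quality_table`) are hypothetical, not part of the prompt.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import mean
from typing import Sequence

DIMENSIONS = ("depth", "completeness", "actionability", "evidence")


@dataclass(frozen=True)
class SectionRating:
    section: str
    depth: int
    completeness: int
    actionability: int
    evidence: int
    gaps: str = "..."

    def __post_init__(self) -> None:
        # Reject scores outside the 1-10 scale used in the META instructions.
        for dim in DIMENSIONS:
            value = getattr(self, dim)
            if not 1 <= value <= 10:
                raise ValueError(f"{self.section}: {dim}={value} is outside 1-10")

    def score(self) -> float:
        """Unweighted mean of the four dimensions (an assumed rule)."""
        return float(mean(getattr(self, dim) for dim in DIMENSIONS))


def overall_score(ratings: Sequence[SectionRating]) -> float:
    """Unweighted mean of section scores (an assumed rule)."""
    return float(mean(r.score() for r in ratings))


def quality_table(ratings: Sequence[SectionRating]) -> str:
    """Render the 'Section Quality Assessment' table from the output template."""
    rows = [
        "| Section | Depth (1-10) | Completeness (1-10) | Gaps Identified |",
        "|---------|--------------|---------------------|-----------------|",
    ]
    for r in ratings:
        rows.append(f"| {r.section} | {r.depth}/10 | {r.completeness}/10 | {r.gaps} |")
    return "\n".join(rows)
```

An unweighted mean keeps the overall score easy to audit from the table. If some sections (for example Security) should count more, a weighted mean is a small change, but the weights would be a new design decision rather than something the prompt specifies.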