From 130981d1be1664cd26650b29bc5dfd72658b4467 Mon Sep 17 00:00:00 2001
From: Rene Cannao
Date: Sat, 17 Jan 2026 13:28:13 +0000
Subject: [PATCH] feat: Add SECURITY and META agents to multi-agent discovery

Expand the 4-agent system to 6 agents (5 analysis + 1 meta) with
enhanced security analysis and self-improving prompt optimization.

New Agents:
- SECURITY: Identifies sensitive data (PII, credentials, financial),
  assesses access patterns, identifies vulnerabilities, and provides
  compliance assessment (GDPR, PCI-DSS)
- META: Analyzes report quality by section, identifies gaps, suggests
  specific prompt improvements for future runs

Protocol Changes:
- Expanded from 4 rounds to 5 rounds
- Round 5 is Meta Analysis (META agent only)
- META agent does not participate in rounds 1-4

New Report Sections:
- 5. SECURITY ANALYSIS with data classification (PUBLIC/INTERNAL/
  CONFIDENTIAL/RESTRICTED)
- E. Security data classification appendix

New Output:
- Separate META ANALYSIS document with:
  - Section quality ratings (depth, completeness)
  - Specific prompt improvement suggestions
  - Gap identification
  - Evolution history tracking

This enables continuous prompt optimization through multiple discovery
iterations, with each run informing improvements for the next.
---
 .../ClaudeCode_Headless/README.md             |  91 +++++++---
 .../headless_db_discovery.py                  |  10 +-
 .../prompts/multi_agent_discovery_prompt.md   | 161 ++++++++++++++++--
 3 files changed, 223 insertions(+), 39 deletions(-)

diff --git a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md
index 248c37307..7112d778d 100644
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/README.md
@@ -4,14 +4,15 @@ Multi-agent database discovery system for comprehensive analysis through MCP (Mo

 ## Overview

-This directory contains scripts for running **4-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.
+This directory contains scripts for running **6-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.

 **Key Features:**
-- **4 Collaborating Agents:** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY
-- **4-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis
+- **6 Agents (5 Analysis + 1 Meta):** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY, META
+- **5-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis → Meta analysis
 - **MCP Catalog Collaboration:** Agents share findings via catalog
 - **Comprehensive Reports:** Structured markdown with health scores and prioritized recommendations
-- **Evidence-Based:** 15+ hypothesis validations with direct database evidence
+- **Evidence-Based:** 20+ hypothesis validations with direct database evidence
+- **Self-Improving:** META agent analyzes report quality and suggests prompt improvements

 ## Quick Start

@@ -46,36 +47,44 @@ python ./headless_db_discovery.py --verbose

 ## Multi-Agent Discovery Architecture

-### The 4 Agents
+### The 6 Agents

-| Agent | Focus | Key MCP Tools |
-|-------|-------|---------------|
-| **STRUCTURAL** | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
-| **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
-| **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
-| **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| Agent | Type | Focus | Key MCP Tools |
+|-------|------|-------|---------------|
+| **STRUCTURAL** | Analysis | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
+| **STATISTICAL** | Analysis | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
+| **SEMANTIC** | Analysis | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
+| **QUERY** | Analysis | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| **SECURITY** | Analysis | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
+| **META** | Meta | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads findings) |

-### 4-Round Protocol
+### 5-Round Protocol

 1. **Round 1: Blind Exploration** (Parallel)
-   - All 4 agents explore independently
+   - All 5 analysis agents explore independently
    - Each discovers patterns without seeing others' findings
    - Findings written to MCP catalog

 2. **Round 2: Pattern Recognition** (Collaborative)
-   - All agents read each other's findings via `catalog_search`
+   - All 5 analysis agents read each other's findings via `catalog_search`
    - Identify cross-cutting patterns and anomalies
    - Collaborative analysis documented

 3. **Round 3: Hypothesis Testing** (Validation)
-   - Each agent validates 3-4 specific hypotheses
+   - Each analysis agent validates 3-4 specific hypotheses
    - Results documented with PASS/FAIL/MIXED and evidence
-   - 15+ hypothesis validations total
+   - 20+ hypothesis validations total

 4. **Round 4: Final Synthesis**
-   - All findings synthesized into comprehensive report
+   - All 5 analysis agents synthesize findings into comprehensive report
    - Written to MCP catalog and local file

+5. **Round 5: Meta Analysis** (META agent only)
+   - META agent reads the complete final report
+   - Analyzes each section for depth, completeness, quality
+   - Identifies gaps and suggests prompt improvements
+   - Writes separate meta-analysis document to MCP catalog
+
 ## What Gets Discovered

 ### 1. Structural Analysis
@@ -108,6 +117,32 @@ python ./headless_db_discovery.py --verbose
 - Query pattern identification
 - Optimization recommendations with expected improvements

+### 5. Security Analysis
+- **Sensitive Data Identification:**
+  - PII: names, emails, phone numbers, SSN, addresses
+  - Credentials: passwords, API keys, tokens
+  - Financial data: credit cards, bank accounts
+  - Health data: medical records
+- **Access Pattern Analysis:**
+  - Overly permissive schemas
+  - Missing row-level security
+- **Vulnerability Assessment:**
+  - SQL injection vectors
+  - Weak authentication patterns
+  - Missing encryption indicators
+- **Compliance Assessment:**
+  - GDPR indicators (personal data)
+  - PCI-DSS indicators (payment data)
+  - Data retention patterns
+- **Data Classification:**
+  - PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED
+
+### 6. Meta Analysis
+- Report quality assessment by section (depth, completeness)
+- Gap identification (what was missed)
+- Prompt improvement suggestions for future runs
+- Evolution history tracking
+
 ## Output Format

 The generated report includes:
@@ -117,9 +152,9 @@ The generated report includes:

 ## Executive Summary
 - Database identity (system type, purpose, scale)
-- Critical findings (top 3)
+- Critical findings (top 5 - one from each agent)
 - Health score: current X/10 → potential Y/10
-- Top 3 recommendations (prioritized)
+- Top 5 recommendations (prioritized)

 ## 1. STRUCTURAL ANALYSIS
 - Schema inventory
@@ -145,10 +180,17 @@ The generated report includes:
 - Optimization opportunities
 - Expected improvements

-## 5. CRITICAL FINDINGS
+## 5. SECURITY ANALYSIS
+- Sensitive data identification
+- Access pattern analysis
+- Vulnerability assessment
+- Compliance indicators
+- Security recommendations
+
+## 6. CRITICAL FINDINGS
 - Each with: description, impact quantification, root cause, remediation

-## 6. RECOMMENDATIONS ROADMAP
+## 7. RECOMMENDATIONS ROADMAP
 - URGENT: [actions with impact/effort]
 - HIGH: [actions]
 - MODERATE: [actions]
@@ -159,8 +201,15 @@ The generated report includes:
 - B. Query examples with EXPLAIN
 - C. Statistical distributions
 - D. Business glossary
+- E. Security data classification
 ```

+Additionally, a separate **META ANALYSIS** document is generated with:
+- Section quality ratings (depth, completeness)
+- Specific prompt improvement suggestions
+- Gap identification
+- Evolution history
+
 ## Command-Line Options

 | Option | Short | Description | Default |
diff --git a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py
index 21393f213..2a9fecff9 100755
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/headless_db_discovery.py
@@ -258,19 +258,23 @@ Examples:
 Environment Variables:
   CLAUDE_PATH    Path to claude executable

-The discovery uses a 4-agent collaborative approach:
+The discovery uses a 6-agent collaborative approach:
   - STRUCTURAL: Schemas, tables, relationships, indexes, constraints
   - STATISTICAL: Data distributions, quality, anomalies
   - SEMANTIC: Business domain, entities, rules, terminology
   - QUERY: Index efficiency, query patterns, optimization
+  - SECURITY: Sensitive data, access patterns, vulnerabilities
+  - META: Report quality analysis, prompt improvement suggestions

-Agents collaborate through 4 rounds:
-  1. Blind Exploration (independent discovery)
+Agents collaborate through 5 rounds:
+  1. Blind Exploration (5 analysis agents, independent discovery)
   2. Pattern Recognition (cross-agent collaboration)
   3. Hypothesis Testing (validation with evidence)
   4. Final Synthesis (comprehensive report)
+  5. Meta Analysis (META agent analyzes report quality)

 Findings are shared via MCP catalog and output as a structured markdown report.
+The META agent also generates a separate meta-analysis document with prompt improvement suggestions.
 """
     )

diff --git a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md
index 1f52f804b..38d87ae7d 100644
--- a/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md
+++ b/scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md
@@ -1,7 +1,7 @@
 # Database Discovery - Concise System Prompt

 ## Mission
-Perform comprehensive database discovery through 4 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: Single comprehensive markdown report.
+Perform comprehensive database discovery through 6 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: Single comprehensive markdown report.

 ## Agent Roles

@@ -11,28 +11,41 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 | **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
 | **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
 | **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
+| **SECURITY** | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
+| **META** | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads all findings) |

-## 4-Round Protocol
+## 5-Round Protocol

 ### Round 1: Blind Exploration (Parallel)
-- Launch all 4 agents simultaneously
+- Launch all 5 analysis agents simultaneously (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY)
 - Each explores independently using their tools
-- Write findings to catalog: `kind="structural|statistical|semantic|query"`, `key="round1_*"`
+- Write findings to catalog: `kind="structural|statistical|semantic|query|security"`, `key="round1_*"`
+- META agent does NOT participate in this round

 ### Round 2: Collaborative Analysis
-- All agents read each other's findings via `catalog_search`
+- All 5 analysis agents read each other's findings via `catalog_search`
 - Identify cross-cutting patterns and anomalies
 - Write collaborative findings: `kind="collaborative_round2"`
+- META agent does NOT participate in this round

 ### Round 3: Hypothesis Testing
-- Each agent validates 3-4 specific hypotheses
+- Each of the 5 analysis agents validates 3-4 specific hypotheses
 - Document: hypothesis, test method, result (PASS/FAIL), evidence
 - Write: `kind="validation_round3"`
+- META agent does NOT participate in this round

 ### Round 4: Final Synthesis
-- Synthesize ALL findings into comprehensive report
+- All 5 analysis agents collaborate to synthesize findings into comprehensive report
 - Write: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
 - Also create local file: `database_discovery_report.md`
+- META agent does NOT participate in this round
+
+### Round 5: Meta Analysis (META Agent Only)
+- META agent reads the complete final report from catalog
+- Analyzes each section for depth, completeness, and quality
+- Identifies gaps, missed opportunities, or areas for improvement
+- Suggests specific prompt improvements for future discovery runs
+- Write: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`

 ## Report Structure (Required)

@@ -41,9 +54,9 @@ Perform comprehensive database discovery through 4 collaborating subagents using

 ## Executive Summary
 - Database identity (system type, purpose, scale)
-- Critical findings (top 3)
+- Critical findings (top 5 - one from each agent)
 - Health score: current X/10 → potential Y/10
-- Top 3 recommendations (prioritized)
+- Top 5 recommendations (prioritized, one from each agent)

 ## 1. STRUCTURAL ANALYSIS
 - Schema inventory (tables, columns, indexes)
@@ -69,10 +82,18 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 - Optimization opportunities (prioritized)
 - Expected improvements

-## 5. CRITICAL FINDINGS
+## 5. SECURITY ANALYSIS
+- Sensitive data identification (PII, credentials, financial data)
+- Access pattern analysis (overly permissive schemas)
+- Vulnerability assessment (SQL injection vectors, weak auth)
+- Data encryption needs
+- Compliance considerations (GDPR, PCI-DSS, etc.)
+- Security recommendations (prioritized)
+
+## 6. CRITICAL FINDINGS
 - Each with: description, impact quantification, root cause, remediation

-## 6. RECOMMENDATIONS ROADMAP
+## 7. RECOMMENDATIONS ROADMAP
 - URGENT: [actions with impact/effort]
 - HIGH: [actions]
 - MODERATE: [actions]
@@ -83,8 +104,113 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 - B. Query examples with EXPLAIN
 - C. Statistical distributions
 - D. Business glossary
+- E. Security data classification
 ```

+## META Agent Output Format
+
+The META agent should produce a separate meta-analysis document:
+
+```markdown
+# META ANALYSIS: Prompt Improvement Suggestions
+
+## Section Quality Assessment
+
+| Section | Depth (1-10) | Completeness (1-10) | Gaps Identified |
+|---------|--------------|---------------------|-----------------|
+| Executive Summary | ?/10 | ?/10 | ... |
+| Structural | ?/10 | ?/10 | ... |
+| Statistical | ?/10 | ?/10 | ... |
+| Semantic | ?/10 | ?/10 | ... |
+| Query | ?/10 | ?/10 | ... |
+| Security | ?/10 | ?/10 | ... |
+| Critical Findings | ?/10 | ?/10 | ... |
+| Recommendations | ?/10 | ?/10 | ... |
+
+## Specific Improvement Suggestions
+
+### For Next Discovery Run
+1. **[Agent]**: Add analysis of [specific area]
+   - Reason: [why this would improve discovery]
+   - Suggested prompt addition: [exact text]
+
+2. **[Agent]**: Enhance [existing analysis] with [additional detail]
+   - Reason: [why this is needed]
+   - Suggested prompt addition: [exact text]
+
+### Missing Analysis Areas
+- [Area not covered by any agent]
+- [Another missing area]
+
+### Over-Analysis Areas
+- [Area that received excessive attention relative to value]
+
+## Prompt Evolution History
+- v1.0: Initial 4-agent system (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY)
+- v1.1: Added SECURITY agent (5 analysis agents)
+- v1.1: Added META agent for prompt optimization (6 agents total, 5 rounds)
+
+## Overall Quality Score: X/10
+
+[Brief summary of overall discovery quality and main improvement areas]
+```
+
+## Agent-Specific Instructions
+
+### SECURITY Agent Instructions
+The SECURITY agent must:
+1. Identify sensitive data columns:
+   - Personal Identifiable Information (PII): names, emails, phone numbers, SSN, addresses
+   - Credentials: passwords, API keys, tokens, certificates
+   - Financial data: credit cards, bank accounts, transaction amounts
+   - Health data: medical records, diagnoses, treatments
+   - Other sensitive: internal notes, confidential business data
+
+2. Assess access patterns:
+   - Tables without proper access controls
+   - Overly permissive schema designs
+   - Missing row-level security patterns
+
+3. Identify vulnerabilities:
+   - SQL injection vectors (text columns concatenated in queries)
+   - Weak authentication patterns (plaintext passwords)
+   - Missing encryption indicators
+   - Exposed sensitive data in column names
+
+4. Compliance assessment:
+   - GDPR indicators (personal data presence)
+   - PCI-DSS indicators (payment data presence)
+   - Data retention patterns
+   - Audit trail completeness
+
+5. Classify data by sensitivity level:
+   - PUBLIC: Non-sensitive data
+   - INTERNAL: Business data not for public
+   - CONFIDENTIAL: Sensitive business data
+   - RESTRICTED: Highly sensitive (legal, financial, health)
+
+### META Agent Instructions
+The META agent must:
+1. Read the complete final report from `catalog_get(kind="final_report", key="comprehensive_database_discovery_report")`
+2. Read all agent findings from all rounds using `catalog_search`
+3. For each report section, assess:
+   - Depth: How deep was the analysis? (1=superficial, 10=exhaustive)
+   - Completeness: Did they cover all relevant aspects? (1=missed a lot, 10=comprehensive)
+   - Actionability: Are recommendations specific and implementable? (1=vague, 10=very specific)
+   - Evidence: Are claims backed by data? (1=assertions only, 10=full evidence)
+
+4. Identify gaps:
+   - What was NOT analyzed that should have been?
+   - What analysis was superficial that could be deeper?
+   - What recommendations are missing or vague?
+
+5. Suggest prompt improvements:
+   - Be specific about what to ADD to the prompt
+   - Provide exact text that could be added
+   - Explain WHY each improvement would help
+
+6. Rate overall quality and provide summary
+
 ## Quality Standards

 | Dimension | Score (0-10) |
@@ -94,6 +220,8 @@ Perform comprehensive database discovery through 4 collaborating subagents using
 | Index Coverage | Primary keys, FKs, functional indexes |
 | Query Performance | Join efficiency, aggregation speed |
 | Data Integrity | FK constraints, unique constraints, checks |
+| Security Posture | Sensitive data protection, access controls |
+| Overall Discovery | Synthesis of all dimensions |

 ## Catalog Usage

@@ -113,10 +241,11 @@ catalog_get(kind="agent_type", key="specific_id")
 Use `TodoWrite` to track rounds:
 ```python
 TodoWrite([
-  {"content": "Round 1: Blind exploration", "status": "in_progress"},
+  {"content": "Round 1: Blind exploration (5 agents)", "status": "in_progress"},
   {"content": "Round 2: Pattern recognition", "status": "pending"},
   {"content": "Round 3: Hypothesis testing", "status": "pending"},
-  {"content": "Round 4: Final synthesis", "status": "pending"}
+  {"content": "Round 4: Final synthesis", "status": "pending"},
+  {"content": "Round 5: Meta analysis", "status": "pending"}
 ])
 ```

@@ -127,12 +256,14 @@ TodoWrite([
 3. **SPECIFIC RECOMMENDATIONS**: Provide exact SQL for all changes
 4. **QUANTIFIED IMPACT**: Include expected improvements with numbers
 5. **PRIORITIZED**: Always prioritize (URGENT → HIGH → MODERATE → LOW)
+6. **CONSTRUCTIVE META**: META agent provides actionable, specific improvements

 ## Output Locations

 1. MCP Catalog: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
-2. Local file: `database_discovery_report.md` (use Write tool)
+2. MCP Catalog: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`
+3. Local file: `database_discovery_report.md` (use Write tool)

 ---

-**Begin discovery now. Launch all 4 agents for Round 1.**
+**Begin discovery now. Launch all 5 analysis agents for Round 1.**