feat: Add SECURITY and META agents to multi-agent discovery

Expand the 4-agent system to 6 agents (5 analysis + 1 meta) with
enhanced security analysis and self-improving prompt optimization.

New Agents:
- SECURITY: Identifies sensitive data (PII, credentials, financial),
  assesses access patterns, identifies vulnerabilities, and provides
  compliance assessment (GDPR, PCI-DSS)
- META: Analyzes report quality by section, identifies gaps, and
  suggests specific prompt improvements for future runs

Protocol Changes:
- Expanded from 4 rounds to 5 rounds
- Round 5 is Meta Analysis (META agent only)
- META agent does not participate in rounds 1-4

New Report Sections:
- 5. SECURITY ANALYSIS with data classification (PUBLIC/INTERNAL/
  CONFIDENTIAL/RESTRICTED)
- E. Security data classification appendix

New Output:
- Separate META ANALYSIS document with:
  - Section quality ratings (depth, completeness)
  - Specific prompt improvement suggestions
  - Gap identification
  - Evolution history tracking

This enables continuous prompt optimization through multiple discovery
iterations, with each run informing improvements for the next.
pull/5318/head
Rene Cannao 3 months ago
parent 82d7f0c87f
commit 130981d1be

@ -4,14 +4,15 @@ Multi-agent database discovery system for comprehensive analysis through MCP (Mo
## Overview
This directory contains scripts for running **4-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.
This directory contains scripts for running **6-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.
**Key Features:**
- **4 Collaborating Agents:** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY
- **4-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis
- **6 Agents (5 Analysis + 1 Meta):** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY, META
- **5-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis → Meta analysis
- **MCP Catalog Collaboration:** Agents share findings via catalog
- **Comprehensive Reports:** Structured markdown with health scores and prioritized recommendations
- **Evidence-Based:** 15+ hypothesis validations with direct database evidence
- **Evidence-Based:** 20+ hypothesis validations with direct database evidence
- **Self-Improving:** META agent analyzes report quality and suggests prompt improvements
## Quick Start
@ -46,36 +47,44 @@ python ./headless_db_discovery.py --verbose
## Multi-Agent Discovery Architecture
### The 4 Agents
### The 6 Agents
| Agent | Focus | Key MCP Tools |
|-------|-------|---------------|
| **STRUCTURAL** | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
| **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
| **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
| **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
| Agent | Type | Focus | Key MCP Tools |
|-------|------|-------|---------------|
| **STRUCTURAL** | Analysis | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
| **STATISTICAL** | Analysis | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
| **SEMANTIC** | Analysis | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
| **QUERY** | Analysis | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
| **SECURITY** | Analysis | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
| **META** | Meta | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads findings) |
### 4-Round Protocol
### 5-Round Protocol
1. **Round 1: Blind Exploration** (Parallel)
- All 4 agents explore independently
- All 5 analysis agents explore independently
- Each discovers patterns without seeing others' findings
- Findings written to MCP catalog
2. **Round 2: Pattern Recognition** (Collaborative)
- All agents read each other's findings via `catalog_search`
- All 5 analysis agents read each other's findings via `catalog_search`
- Identify cross-cutting patterns and anomalies
- Collaborative analysis documented
3. **Round 3: Hypothesis Testing** (Validation)
- Each agent validates 3-4 specific hypotheses
- Each analysis agent validates 3-4 specific hypotheses
- Results documented with PASS/FAIL/MIXED and evidence
- 15+ hypothesis validations total
- 20+ hypothesis validations total
4. **Round 4: Final Synthesis**
- All findings synthesized into comprehensive report
- All 5 analysis agents synthesize findings into comprehensive report
- Written to MCP catalog and local file
5. **Round 5: Meta Analysis** (META agent only)
- META agent reads the complete final report
- Analyzes each section for depth, completeness, quality
- Identifies gaps and suggests prompt improvements
- Writes separate meta-analysis document to MCP catalog
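
The round/agent participation described above can be sketched as a small lookup table. This is only an illustration of the protocol, not code from `headless_db_discovery.py`; the names `Round`, `ROUNDS`, and `rounds_for` are hypothetical.

```python
from dataclasses import dataclass

# Agent names come from the README; the grouping constants are illustrative.
ANALYSIS_AGENTS = ("STRUCTURAL", "STATISTICAL", "SEMANTIC", "QUERY", "SECURITY")
META_AGENT = "META"


@dataclass(frozen=True)
class Round:
    """One protocol round and the agents that take part in it (hypothetical type)."""
    number: int
    name: str
    agents: tuple


# Rounds 1-4 involve only the 5 analysis agents; round 5 is META only.
ROUNDS = (
    Round(1, "Blind Exploration", ANALYSIS_AGENTS),
    Round(2, "Pattern Recognition", ANALYSIS_AGENTS),
    Round(3, "Hypothesis Testing", ANALYSIS_AGENTS),
    Round(4, "Final Synthesis", ANALYSIS_AGENTS),
    Round(5, "Meta Analysis", (META_AGENT,)),
)


def rounds_for(agent: str) -> list:
    """Return the round numbers in which the given agent participates."""
    return [r.number for r in ROUNDS if agent in r.agents]


if __name__ == "__main__":
    for r in ROUNDS:
        print(f"Round {r.number}: {r.name} -> {', '.join(r.agents)}")
```

Calling `rounds_for("META")` yields only round 5, which mirrors the rule that the META agent does not participate in rounds 1-4.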
## What Gets Discovered
### 1. Structural Analysis
@ -108,6 +117,32 @@ python ./headless_db_discovery.py --verbose
- Query pattern identification
- Optimization recommendations with expected improvements
### 5. Security Analysis
- **Sensitive Data Identification:**
- PII: names, emails, phone numbers, SSN, addresses
- Credentials: passwords, API keys, tokens
- Financial data: credit cards, bank accounts
- Health data: medical records
- **Access Pattern Analysis:**
- Overly permissive schemas
- Missing row-level security
- **Vulnerability Assessment:**
- SQL injection vectors
- Weak authentication patterns
- Missing encryption indicators
- **Compliance Assessment:**
- GDPR indicators (personal data)
- PCI-DSS indicators (payment data)
- Data retention patterns
- **Data Classification:**
- PUBLIC, INTERNAL, CONFIDENTIAL, RESTRICTED
### 6. Meta Analysis
- Report quality assessment by section (depth, completeness)
- Gap identification (what was missed)
- Prompt improvement suggestions for future runs
- Evolution history tracking
## Output Format
The generated report includes:
@ -117,9 +152,9 @@ The generated report includes:
## Executive Summary
- Database identity (system type, purpose, scale)
- Critical findings (top 3)
- Critical findings (top 5 - one from each agent)
- Health score: current X/10 → potential Y/10
- Top 3 recommendations (prioritized)
- Top 5 recommendations (prioritized)
## 1. STRUCTURAL ANALYSIS
- Schema inventory
@ -145,10 +180,17 @@ The generated report includes:
- Optimization opportunities
- Expected improvements
## 5. CRITICAL FINDINGS
## 5. SECURITY ANALYSIS
- Sensitive data identification
- Access pattern analysis
- Vulnerability assessment
- Compliance indicators
- Security recommendations
## 6. CRITICAL FINDINGS
- Each with: description, impact quantification, root cause, remediation
## 6. RECOMMENDATIONS ROADMAP
## 7. RECOMMENDATIONS ROADMAP
- URGENT: [actions with impact/effort]
- HIGH: [actions]
- MODERATE: [actions]
@ -159,8 +201,15 @@ The generated report includes:
- B. Query examples with EXPLAIN
- C. Statistical distributions
- D. Business glossary
- E. Security data classification
```
Additionally, a separate **META ANALYSIS** document is generated with:
- Section quality ratings (depth, completeness)
- Specific prompt improvement suggestions
- Gap identification
- Evolution history
## Command-Line Options
| Option | Short | Description | Default |

@ -258,19 +258,23 @@ Examples:
Environment Variables:
CLAUDE_PATH Path to claude executable
The discovery uses a 4-agent collaborative approach:
The discovery uses a 6-agent collaborative approach:
- STRUCTURAL: Schemas, tables, relationships, indexes, constraints
- STATISTICAL: Data distributions, quality, anomalies
- SEMANTIC: Business domain, entities, rules, terminology
- QUERY: Index efficiency, query patterns, optimization
- SECURITY: Sensitive data, access patterns, vulnerabilities
- META: Report quality analysis, prompt improvement suggestions
Agents collaborate through 4 rounds:
1. Blind Exploration (independent discovery)
Agents collaborate through 5 rounds:
1. Blind Exploration (5 analysis agents, independent discovery)
2. Pattern Recognition (cross-agent collaboration)
3. Hypothesis Testing (validation with evidence)
4. Final Synthesis (comprehensive report)
5. Meta Analysis (META agent analyzes report quality)
Findings are shared via MCP catalog and output as a structured markdown report.
The META agent also generates a separate meta-analysis document with prompt improvement suggestions.
"""
)

@ -1,7 +1,7 @@
# Database Discovery - Concise System Prompt
## Mission
Perform comprehensive database discovery through 4 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: Single comprehensive markdown report.
Perform comprehensive database discovery through 6 collaborating subagents using ONLY MCP server tools (`mcp__proxysql-stdio__*`). Output: Single comprehensive markdown report.
## Agent Roles
@ -11,28 +11,41 @@ Perform comprehensive database discovery through 4 collaborating subagents using
| **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
| **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
| **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |
| **SECURITY** | Sensitive data, access patterns, vulnerabilities | `sample_rows`, `sample_distinct`, `column_profile`, `run_sql_readonly` |
| **META** | Report quality analysis, prompt improvement suggestions | `catalog_search`, `catalog_get` (reads all findings) |
## 4-Round Protocol
## 5-Round Protocol
### Round 1: Blind Exploration (Parallel)
- Launch all 4 agents simultaneously
- Launch all 5 analysis agents simultaneously (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY, SECURITY)
- Each explores independently using their tools
- Write findings to catalog: `kind="structural|statistical|semantic|query"`, `key="round1_*"`
- Write findings to catalog: `kind="structural|statistical|semantic|query|security"`, `key="round1_*"`
- META agent does NOT participate in this round
### Round 2: Collaborative Analysis
- All agents read each other's findings via `catalog_search`
- All 5 analysis agents read each other's findings via `catalog_search`
- Identify cross-cutting patterns and anomalies
- Write collaborative findings: `kind="collaborative_round2"`
- META agent does NOT participate in this round
### Round 3: Hypothesis Testing
- Each agent validates 3-4 specific hypotheses
- Each of the 5 analysis agents validates 3-4 specific hypotheses
- Document: hypothesis, test method, result (PASS/FAIL), evidence
- Write: `kind="validation_round3"`
- META agent does NOT participate in this round
### Round 4: Final Synthesis
- Synthesize ALL findings into comprehensive report
- All 5 analysis agents collaborate to synthesize findings into comprehensive report
- Write: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
- Also create local file: `database_discovery_report.md`
- META agent does NOT participate in this round
### Round 5: Meta Analysis (META Agent Only)
- META agent reads the complete final report from catalog
- Analyzes each section for depth, completeness, and quality
- Identifies gaps, missed opportunities, or areas for improvement
- Suggests specific prompt improvements for future discovery runs
- Write: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`
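
As a rough sketch, the catalog naming used across the five rounds could be captured in one helper. The `kind` and `key` strings are taken from the round descriptions above; the function `catalog_kind` and the constants around it are hypothetical and do not call any MCP tool. The prompt does not specify keys for rounds 2 and 3, so none are assumed here.

```python
# Kinds allowed in Round 1, one per analysis agent (from the prompt text).
ROUND1_KINDS = {"structural", "statistical", "semantic", "query", "security"}
ROUND1_KEY_PREFIX = "round1_"

# Rounds whose kind does not depend on the agent.
SHARED_KINDS = {
    2: "collaborative_round2",
    3: "validation_round3",
    4: "final_report",
    5: "meta_analysis",
}

# Rounds with a fixed catalog key (rounds 2 and 3 have no key in the prompt).
FIXED_KEYS = {
    4: "comprehensive_database_discovery_report",
    5: "prompt_improvement_suggestions",
}


def catalog_kind(round_number: int, agent: str = "") -> str:
    """Return the catalog `kind` for a round (hypothetical helper)."""
    if round_number == 1:
        kind = agent.lower()
        if kind not in ROUND1_KINDS:
            # META, or any unknown agent, does not write Round 1 findings.
            raise ValueError(f"{agent!r} does not participate in Round 1")
        return kind
    if round_number in SHARED_KINDS:
        return SHARED_KINDS[round_number]
    raise ValueError(f"unknown round: {round_number}")


if __name__ == "__main__":
    print(catalog_kind(1, "SECURITY"), ROUND1_KEY_PREFIX + "*")
    print(catalog_kind(5), FIXED_KEYS[5])
```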
## Report Structure (Required)
@ -41,9 +54,9 @@ Perform comprehensive database discovery through 4 collaborating subagents using
## Executive Summary
- Database identity (system type, purpose, scale)
- Critical findings (top 3)
- Critical findings (top 5 - one from each agent)
- Health score: current X/10 → potential Y/10
- Top 3 recommendations (prioritized)
- Top 5 recommendations (prioritized, one from each agent)
## 1. STRUCTURAL ANALYSIS
- Schema inventory (tables, columns, indexes)
@ -69,10 +82,18 @@ Perform comprehensive database discovery through 4 collaborating subagents using
- Optimization opportunities (prioritized)
- Expected improvements
## 5. CRITICAL FINDINGS
## 5. SECURITY ANALYSIS
- Sensitive data identification (PII, credentials, financial data)
- Access pattern analysis (overly permissive schemas)
- Vulnerability assessment (SQL injection vectors, weak auth)
- Data encryption needs
- Compliance considerations (GDPR, PCI-DSS, etc.)
- Security recommendations (prioritized)
## 6. CRITICAL FINDINGS
- Each with: description, impact quantification, root cause, remediation
## 6. RECOMMENDATIONS ROADMAP
## 7. RECOMMENDATIONS ROADMAP
- URGENT: [actions with impact/effort]
- HIGH: [actions]
- MODERATE: [actions]
@ -83,8 +104,113 @@ Perform comprehensive database discovery through 4 collaborating subagents using
- B. Query examples with EXPLAIN
- C. Statistical distributions
- D. Business glossary
- E. Security data classification
```
## META Agent Output Format
The META agent should produce a separate meta-analysis document:
```markdown
# META ANALYSIS: Prompt Improvement Suggestions
## Section Quality Assessment
| Section | Depth (1-10) | Completeness (1-10) | Gaps Identified |
|---------|--------------|---------------------|-----------------|
| Executive Summary | ?/10 | ?/10 | ... |
| Structural | ?/10 | ?/10 | ... |
| Statistical | ?/10 | ?/10 | ... |
| Semantic | ?/10 | ?/10 | ... |
| Query | ?/10 | ?/10 | ... |
| Security | ?/10 | ?/10 | ... |
| Critical Findings | ?/10 | ?/10 | ... |
| Recommendations | ?/10 | ?/10 | ... |
## Specific Improvement Suggestions
### For Next Discovery Run
1. **[Agent]**: Add analysis of [specific area]
- Reason: [why this would improve discovery]
- Suggested prompt addition: [exact text]
2. **[Agent]**: Enhance [existing analysis] with [additional detail]
- Reason: [why this is needed]
- Suggested prompt addition: [exact text]
### Missing Analysis Areas
- [Area not covered by any agent]
- [Another missing area]
### Over-Analysis Areas
- [Area that received excessive attention relative to value]
## Prompt Evolution History
- v1.0: Initial 4-agent system (STRUCTURAL, STATISTICAL, SEMANTIC, QUERY)
- v1.1: Added SECURITY agent (5 analysis agents)
- v1.1: Added META agent for prompt optimization (6 agents total, 5 rounds)
## Overall Quality Score: X/10
[Brief summary of overall discovery quality and main improvement areas]
```
## Agent-Specific Instructions
### SECURITY Agent Instructions
The SECURITY agent must:
1. Identify sensitive data columns:
- Personally Identifiable Information (PII): names, emails, phone numbers, SSN, addresses
- Credentials: passwords, API keys, tokens, certificates
- Financial data: credit cards, bank accounts, transaction amounts
- Health data: medical records, diagnoses, treatments
- Other sensitive: internal notes, confidential business data
2. Assess access patterns:
- Tables without proper access controls
- Overly permissive schema designs
- Missing row-level security patterns
3. Identify vulnerabilities:
- SQL injection vectors (text columns concatenated in queries)
- Weak authentication patterns (plaintext passwords)
- Missing encryption indicators
- Exposed sensitive data in column names
4. Compliance assessment:
- GDPR indicators (personal data presence)
- PCI-DSS indicators (payment data presence)
- Data retention patterns
- Audit trail completeness
5. Classify data by sensitivity level:
- PUBLIC: Non-sensitive data
- INTERNAL: Business data not for public
- CONFIDENTIAL: Sensitive business data
- RESTRICTED: Highly sensitive (legal, financial, health)
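
To make the classification step concrete, here is a minimal sketch of a name-based first pass over column names. It is an assumption-heavy illustration, not the SECURITY agent's actual method: the regular expressions, the mapping of PII to CONFIDENTIAL and of credentials, financial, and health data to RESTRICTED, and the INTERNAL default are all choices made for this example. A real run would also inspect sampled values, since names alone produce false positives and cannot establish that a column is PUBLIC.

```python
import re

# Ordered from most to least sensitive; the first matching rule wins.
# Patterns and level assignments are illustrative assumptions.
RULES = (
    ("RESTRICTED", re.compile(
        r"passw(or)?d|api_?key|token|secret|card|iban|account_?number|diagnos|medical",
        re.IGNORECASE,
    )),
    ("CONFIDENTIAL", re.compile(
        r"first_?name|last_?name|full_?name|email|phone|ssn|address",
        re.IGNORECASE,
    )),
)

# Assumed default when no pattern matches: treat as INTERNAL until reviewed.
DEFAULT_LEVEL = "INTERNAL"


def classify_column(name: str) -> str:
    """Guess a sensitivity level from a column name alone (heuristic)."""
    for level, pattern in RULES:
        if pattern.search(name):
            return level
    return DEFAULT_LEVEL


if __name__ == "__main__":
    for column in ("password_hash", "customer_email", "created_at"):
        print(f"{column}: {classify_column(column)}")
```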
### META Agent Instructions
The META agent must:
1. Read the complete final report from `catalog_get(kind="final_report", key="comprehensive_database_discovery_report")`
2. Read all agent findings from all rounds using `catalog_search`
3. For each report section, assess:
- Depth: How deep was the analysis? (1=superficial, 10=exhaustive)
- Completeness: Did they cover all relevant aspects? (1=missed a lot, 10=comprehensive)
- Actionability: Are recommendations specific and implementable? (1=vague, 10=very specific)
- Evidence: Are claims backed by data? (1=assertions only, 10=full evidence)
4. Identify gaps:
- What was NOT analyzed that should have been?
- What analysis was superficial that could be deeper?
- What recommendations are missing or vague?
5. Suggest prompt improvements:
- Be specific about what to ADD to the prompt
- Provide exact text that could be added
- Explain WHY each improvement would help
6. Rate overall quality and provide summary
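
The four assessment dimensions in step 3 can be represented as a small record per report section. The sketch below is one possible shape, with hypothetical names (`SectionScore`, `overall_quality`). The prompt does not say how the overall score is derived, so the unweighted mean used here is an assumption.

```python
from dataclasses import dataclass
from statistics import mean

DIMENSIONS = ("depth", "completeness", "actionability", "evidence")


@dataclass(frozen=True)
class SectionScore:
    """Scores for one report section, each on the 1-10 scale from the prompt."""
    section: str
    depth: int
    completeness: int
    actionability: int
    evidence: int

    def __post_init__(self):
        # Reject values outside the 1-10 range described in the instructions.
        for dimension in DIMENSIONS:
            value = getattr(self, dimension)
            if not 1 <= value <= 10:
                raise ValueError(f"{dimension} must be in 1..10, got {value}")

    def average(self) -> float:
        """Unweighted mean of the four dimensions (assumed aggregation)."""
        return mean(getattr(self, dimension) for dimension in DIMENSIONS)


def overall_quality(scores) -> float:
    """Mean of per-section averages; the weighting is an assumption."""
    return mean(score.average() for score in scores)


if __name__ == "__main__":
    scores = [
        SectionScore("Structural", depth=8, completeness=6, actionability=7, evidence=9),
        SectionScore("Security", depth=4, completeness=5, actionability=6, evidence=5),
    ]
    print(overall_quality(scores))
```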
## Quality Standards
| Dimension | Score (0-10) |
@ -94,6 +220,8 @@ Perform comprehensive database discovery through 4 collaborating subagents using
| Index Coverage | Primary keys, FKs, functional indexes |
| Query Performance | Join efficiency, aggregation speed |
| Data Integrity | FK constraints, unique constraints, checks |
| Security Posture | Sensitive data protection, access controls |
| Overall Discovery | Synthesis of all dimensions |
## Catalog Usage
@ -113,10 +241,11 @@ catalog_get(kind="agent_type", key="specific_id")
Use `TodoWrite` to track rounds:
```python
TodoWrite([
{"content": "Round 1: Blind exploration", "status": "in_progress"},
{"content": "Round 1: Blind exploration (5 agents)", "status": "in_progress"},
{"content": "Round 2: Pattern recognition", "status": "pending"},
{"content": "Round 3: Hypothesis testing", "status": "pending"},
{"content": "Round 4: Final synthesis", "status": "pending"}
{"content": "Round 4: Final synthesis", "status": "pending"},
{"content": "Round 5: Meta analysis", "status": "pending"}
])
```
@ -127,12 +256,14 @@ TodoWrite([
3. **SPECIFIC RECOMMENDATIONS**: Provide exact SQL for all changes
4. **QUANTIFIED IMPACT**: Include expected improvements with numbers
5. **PRIORITIZED**: Always prioritize (URGENT → HIGH → MODERATE → LOW)
6. **CONSTRUCTIVE META**: META agent provides actionable, specific improvements
## Output Locations
1. MCP Catalog: `kind="final_report"`, `key="comprehensive_database_discovery_report"`
2. Local file: `database_discovery_report.md` (use Write tool)
2. MCP Catalog: `kind="meta_analysis"`, `key="prompt_improvement_suggestions"`
3. Local file: `database_discovery_report.md` (use Write tool)
---
**Begin discovery now. Launch all 4 agents for Round 1.**
**Begin discovery now. Launch all 5 analysis agents for Round 1.**
