# Multi-Agent Database Discovery System

## Overview

This document describes a multi-agent database discovery system implemented using Claude Code's autonomous agent capabilities. The system uses 4 specialized subagents that collaborate via the MCP (Model Context Protocol) catalog to perform comprehensive database analysis.

## Architecture

```
┌─────────────────────────────────────────────────────────────────────┐
│                      Main Agent (Orchestrator)                      │
│  - Launches 4 specialized subagents in parallel                     │
│  - Coordinates via MCP catalog                                      │
│  - Synthesizes final report                                         │
└────────────────┬────────────────────────────────────────────────────┘
                 │
    ┌────────────┼────────────┬────────────┬────────────┐
    │            │            │            │            │
    ▼            ▼            ▼            ▼            ▼
┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐   ┌────────┐
│Struct. │   │Statist.│   │Semantic│   │Query   │   │  MCP   │
│ Agent  │   │ Agent  │   │ Agent  │   │ Agent  │   │Catalog │
└────────┘   └────────┘   └────────┘   └────────┘   └────────┘
    │            │            │            │            │
    └────────────┴────────────┴────────────┴────────────┘
                              │
                              ▼               ▼
                         ┌─────────┐   ┌─────────────┐
                         │ Database│   │   Catalog   │
                         │ (testdb)│   │ (Shared Mem)│
                         └─────────┘   └─────────────┘
```

## The Four Discovery Agents

### 1. Structural Agent
**Mission**: Map tables, relationships, indexes, and constraints

**Responsibilities**:
- Complete ERD documentation
- Table schema analysis (columns, types, constraints)
- Foreign key relationship mapping
- Index inventory and assessment
- Architectural pattern identification

**Catalog Entries**: `structural_discovery`

**Key Deliverables**:
- Entity Relationship Diagram
- Complete table definitions
- Index inventory with recommendations
- Relationship cardinality mapping

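As a rough illustration of the structural inventory described above, the sketch below collects columns, foreign keys, and indexes per table. It uses Python's built-in `sqlite3` module as a stand-in database so that it runs anywhere; that is an assumption made for illustration, since the real agents work through MCP tools such as `list_tables` and `describe_table`. The function name `structural_snapshot` and the two-table demo schema are hypothetical.

```python
import sqlite3


def structural_snapshot(conn):
    """Collect columns, foreign keys, and indexes for every user table."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )]
    snapshot = {}
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        columns = [(r[1], r[2])
                   for r in conn.execute(f"PRAGMA table_info('{table}')")]
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        foreign_keys = [(r[3], r[2], r[4])
                        for r in conn.execute(f"PRAGMA foreign_key_list('{table}')")]
        # PRAGMA index_list rows: (seq, name, unique, origin, partial)
        indexes = [r[1]
                   for r in conn.execute(f"PRAGMA index_list('{table}')")]
        snapshot[table] = {
            "columns": columns,
            "foreign_keys": foreign_keys,  # (from_column, ref_table, ref_column)
            "indexes": indexes,
        }
    return snapshot


# Hypothetical schema, used only to exercise the function.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        order_date TEXT
    );
    CREATE INDEX idx_orders_customer ON orders(customer_id);
""")
snapshot = structural_snapshot(conn)
```

Each entry in `snapshot` could then be serialized and stored under the `structural_discovery` kind, one catalog key per table.
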
### 2. Statistical Agent
**Mission**: Profile data distributions, patterns, and anomalies

**Responsibilities**:
- Table row counts and cardinality analysis
- Data distribution profiling
- Anomaly detection (duplicates, outliers)
- Statistical summaries (min/max/avg/stddev)
- Business metrics calculation

**Catalog Entries**: `statistical_discovery`

**Key Deliverables**:
- Data quality score
- Duplicate detection reports
- Statistical distributions
- True vs inflated metrics

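Duplicate detection of the kind listed above usually reduces to a `GROUP BY ... HAVING COUNT(*) > 1` query. The sketch below shows one way to express it, again with `sqlite3` as a stand-in database; the helper names `duplicate_groups` and `redundancy_ratio` and the sample rows are hypothetical, and a real agent would more likely send equivalent SQL through `run_sql_readonly`.

```python
import sqlite3


def duplicate_groups(conn, table, columns):
    """Return (column values..., copies) for every value combination seen more than once."""
    cols = ", ".join(columns)
    sql = (f"SELECT {cols}, COUNT(*) AS copies FROM {table} "
           f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}")
    return conn.execute(sql).fetchall()


def redundancy_ratio(conn, table, columns):
    """Fraction of rows that are redundant copies of another row."""
    cols = ", ".join(columns)
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    distinct = conn.execute(
        f"SELECT COUNT(*) FROM (SELECT DISTINCT {cols} FROM {table})"
    ).fetchone()[0]
    return (total - distinct) / total if total else 0.0


# Hypothetical data: two logical customers, each stored three times.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
rows = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")] * 3
conn.executemany("INSERT INTO customers (name, email) VALUES (?, ?)", rows)

groups = duplicate_groups(conn, "customers", ["name", "email"])
ratio = redundancy_ratio(conn, "customers", ["name", "email"])
```

The table and column names are interpolated into the SQL text, so this sketch is only appropriate when those identifiers come from the schema itself rather than from untrusted input.
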
### 3. Semantic Agent
**Mission**: Infer business domain and entity types

**Responsibilities**:
- Business domain identification
- Entity type classification (master vs transactional)
- Business rule discovery
- Entity lifecycle analysis
- State machine identification

**Catalog Entries**: `semantic_discovery`

**Key Deliverables**:
- Complete domain model
- Business rules documentation
- Entity lifecycle definitions
- Missing capabilities identification

### 4. Query Agent
**Mission**: Analyze access patterns and optimization opportunities

**Responsibilities**:
- Query pattern identification
- Index usage analysis
- Performance bottleneck detection
- N+1 query risk assessment
- Optimization recommendations

**Catalog Entries**: `query_discovery`

**Key Deliverables**:
- Access pattern analysis
- Index recommendations (prioritized)
- Query optimization strategies
- EXPLAIN analysis results

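To make the index usage analysis concrete, the sketch below compares a query plan before and after adding an index on a date column. It relies on SQLite's `EXPLAIN QUERY PLAN` through the built-in `sqlite3` module, which is an assumption chosen for portability; MySQL's `EXPLAIN` output has a different shape, and the helper name `plan_details` and the index name are hypothetical.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the human-readable 'detail' column of SQLite's EXPLAIN QUERY PLAN."""
    # Each plan row is (id, parent, notused, detail).
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)
query = "SELECT id FROM orders WHERE order_date >= ?"

# Without an index on order_date the planner has to scan the whole table.
before = plan_details(conn, query, ("2024-01-01",))

# With the index in place the planner can search the index instead.
conn.execute("CREATE INDEX idx_orders_order_date ON orders(order_date)")
after = plan_details(conn, query, ("2024-01-01",))

uses_index_before = any("idx_orders_order_date" in d for d in before)
uses_index_after = any("idx_orders_order_date" in d for d in after)
```

The exact wording of the plan text varies between SQLite versions, so the sketch only checks whether the index name appears in it.
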
## Discovery Process

### Round Structure

Each agent runs 4 rounds of analysis:

#### Round 1: Blind Exploration
- Initial schema/data analysis
- First observations cataloged
- Initial hypotheses formed

#### Round 2: Pattern Recognition
- Read other agents' findings from catalog
- Identify patterns and anomalies
- Form and test hypotheses

#### Round 3: Hypothesis Testing
- Validate business rules against actual data
- Cross-reference findings with other agents
- Confirm or reject hypotheses

#### Round 4: Final Synthesis
- Compile comprehensive findings
- Generate actionable recommendations
- Create final mission summary

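The round structure above can be sketched as a small loop. This is a simplified model under stated assumptions, not the actual orchestration: the agents run one after another instead of in parallel, the catalog is a plain dictionary, and `run_rounds`, `demo_analyze`, and the `roundN` keys are hypothetical names. What it does capture is that round 1 is blind, while later rounds read the other agents' earlier findings.

```python
ROUNDS = [
    "blind_exploration",
    "pattern_recognition",
    "hypothesis_testing",
    "final_synthesis",
]
AGENTS = ["structural", "statistical", "semantic", "query"]


def run_rounds(analyze):
    """Run every agent through all rounds and return the resulting catalog."""
    catalog = {}  # (kind, key) -> document
    for round_no, round_name in enumerate(ROUNDS, start=1):
        # Freeze the catalog at the start of the round, so agents in the
        # same round do not see each other's half-finished work.
        visible = dict(catalog)
        for agent in AGENTS:
            own_kind = f"{agent}_discovery"
            if round_no == 1:
                peers = {}  # blind exploration: no peer findings yet
            else:
                peers = {k: v for k, v in visible.items() if k[0] != own_kind}
            catalog[(own_kind, f"round{round_no}")] = analyze(agent, round_name, peers)
    return catalog


def demo_analyze(agent, round_name, peers):
    """Placeholder analysis that only records how much peer context was available."""
    return f"{agent}/{round_name}: saw {len(peers)} peer entries"


catalog = run_rounds(demo_analyze)
```

In this model each agent sees 0, 3, 6, and 9 peer entries across the four rounds, which is one way to picture how context accumulates before the final synthesis.
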
### Catalog-Based Collaboration

```python
# Agent writes findings
catalog_upsert(
    kind="structural_discovery",
    key="table_customers",
    document="...",
    tags="structural,table,schema"
)

# Agent reads other agents' findings
findings = catalog_list(kind="statistical_discovery")
```

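For experimenting with this pattern outside Claude Code, an in-memory stand-in for the catalog calls can be useful. The sketch below mirrors the argument names used above (`kind`, `key`, `document`, `tags`), but it is only an assumption about their behavior, in particular that an upsert with an existing `kind` and `key` overwrites the earlier entry. It is not the ProxySQL MCP server's implementation, and the document strings are placeholders.

```python
_CATALOG = {}  # (kind, key) -> entry dict


def catalog_upsert(kind, key, document, tags=""):
    """Insert a finding, or overwrite the existing one with the same kind and key."""
    _CATALOG[(kind, key)] = {
        "kind": kind,
        "key": key,
        "document": document,
        "tags": [t.strip() for t in tags.split(",") if t.strip()],
    }


def catalog_get(kind, key):
    """Return a single entry, or None if it has not been written yet."""
    return _CATALOG.get((kind, key))


def catalog_list(kind):
    """Return all entries of one kind, ordered by key."""
    return [_CATALOG[k] for k in sorted(_CATALOG) if k[0] == kind]


# The structural agent records a finding, then refines it in a later round.
catalog_upsert("structural_discovery", "table_customers",
               "first pass", tags="structural,table,schema")
catalog_upsert("structural_discovery", "table_customers",
               "refined pass", tags="structural,table,schema")
catalog_upsert("statistical_discovery", "table_customers",
               "row counts", tags="statistical")

structural = catalog_list(kind="structural_discovery")
```

Keying entries by `(kind, key)` keeps one current document per finding, so readers in later rounds always get the latest version.
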
## Example Discovery Output

### Database: testdb (E-commerce Order Management)

#### True Statistics (After Deduplication)
| Metric | Raw (with duplicates) | After deduplication |
|--------|-----------------------|---------------------|
| Customers | 15 | 5 |
| Products | 15 | 5 |
| Orders | 15 | 5 |
| Order Items | 27 | 9 |
| Revenue | $10,886.67 | $3,628.85 |

#### Critical Findings
1. **Data Quality**: 5/100 (Catastrophic). Rows are stored in triplicate, so about 67% of rows are redundant copies
2. **Missing Index**: `orders.order_date` (P0, critical)
3. **Missing Constraints**: No UNIQUE or FK constraints
4. **Business Domain**: E-commerce order management system

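The 67% figure follows directly from the row counts in the table above: if 15 stored rows represent 5 real entities, then 10 of the 15 rows are redundant. The short calculation below reproduces that arithmetic from the reported counts; the variable names are illustrative only.

```python
# (raw_count, deduplicated_count) as reported in the example output above.
counts = {
    "customers": (15, 5),
    "products": (15, 5),
    "orders": (15, 5),
    "order_items": (27, 9),
}

# How many stored rows exist per real entity.
duplication_factor = {name: raw / dedup for name, (raw, dedup) in counts.items()}

# Share of stored rows that are redundant copies.
redundant_fraction = {name: 1 - dedup / raw for name, (raw, dedup) in counts.items()}

for name in counts:
    print(f"{name}: x{duplication_factor[name]:.1f} duplication, "
          f"{redundant_fraction[name]:.1%} redundant rows")
```

Every table shows a duplication factor of 3, which is why the inflated metrics are roughly three times the deduplicated ones.
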
## Launching the Discovery System

```python
# In Claude Code, launch 4 agents in parallel:
Task(
    description="Structural Discovery",
    prompt=STRUCTURAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Statistical Discovery",
    prompt=STATISTICAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Semantic Discovery",
    prompt=SEMANTIC_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Query Discovery",
    prompt=QUERY_AGENT_PROMPT,
    subagent_type="general-purpose"
)
```

## MCP Tools Used

The agents use these MCP tools for database analysis:

- `list_schemas` - List all databases
- `list_tables` - List tables in a schema
- `describe_table` - Get table schema
- `sample_rows` - Get sample data from table
- `column_profile` - Get column statistics
- `run_sql_readonly` - Execute read-only queries
- `catalog_upsert` - Store findings in catalog
- `catalog_list` / `catalog_get` - Retrieve findings from catalog

### Target Scoping Requirement

Discovery and catalog/LLM tools are target-scoped. Always pass `target_id`:

- `discovery.run_static(target_id=..., schema_filter=...)`
- `catalog.*(target_id=..., run_id=...)`
- `agent.run_start(target_id=..., run_id=...)`
- `llm.*(target_id=..., run_id=...)`

`run_id` resolution is no longer global. The same schema name can exist on multiple targets, so `target_id` is required to resolve the correct discovery run.

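The reason for this requirement can be illustrated with a toy lookup table. In the sketch below, discovery runs are keyed by `(target_id, schema)` instead of by schema name alone, so two targets that both contain a `testdb` schema resolve to different runs. The `RunRegistry` class, the target names, and the run IDs are all hypothetical; the real server's resolution logic is not shown in this document.

```python
class RunRegistry:
    """Toy model of target-scoped run resolution."""

    def __init__(self):
        self._runs = {}  # (target_id, schema) -> run_id

    def register(self, target_id, schema, run_id):
        self._runs[(target_id, schema)] = run_id

    def resolve(self, target_id, schema):
        """Look up the run for one schema on one specific target."""
        try:
            return self._runs[(target_id, schema)]
        except KeyError:
            raise LookupError(
                f"no discovery run for schema {schema!r} on target {target_id!r}"
            ) from None


registry = RunRegistry()
# The same schema name exists on two different targets.
registry.register("target-a", "testdb", run_id=101)
registry.register("target-b", "testdb", run_id=202)

run_a = registry.resolve("target-a", "testdb")
run_b = registry.resolve("target-b", "testdb")
```

A lookup by schema name alone would be ambiguous here, which is the situation the `target_id` argument is meant to rule out.
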
## Benefits of Multi-Agent Approach

1. **Parallel Execution**: All 4 agents run simultaneously
2. **Specialized Expertise**: Each agent focuses on its domain
3. **Cross-Validation**: Agents validate each other's findings
4. **Comprehensive Coverage**: All aspects of database analyzed
5. **Knowledge Synthesis**: Final report combines all perspectives

## Output Format

The system produces:

1. **40+ Catalog Entries** - Detailed findings organized by agent
2. **Comprehensive Report** - Executive summary with:
   - Structure & Schema (ERD, table definitions)
   - Business Domain (entity model, business rules)
   - Key Insights (data quality, performance)
   - Data Quality Assessment (score, recommendations)

## Future Enhancements

- [ ] Additional specialized agents (Security, Performance, Compliance)
- [ ] Automated remediation scripts
- [ ] Continuous monitoring mode
- [ ] Integration with CI/CD pipelines
- [ ] Web-based dashboard for findings

## Related Files

- `simple_discovery.py` - Simplified demo of multi-agent pattern
- `mcp_catalog.db` - Catalog database for storing findings

## References

- Claude Code Task Tool Documentation
- MCP (Model Context Protocol) Specification
- ProxySQL MCP Server Implementation