Multi-Agent Database Discovery System

Overview

This document describes a multi-agent database discovery system implemented using Claude Code's autonomous agent capabilities. The system uses 4 specialized subagents that collaborate via the MCP (Model Context Protocol) catalog to perform comprehensive database analysis.

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│                     Main Agent (Orchestrator)                       │
│  - Launches 4 specialized subagents in parallel                     │
│  - Coordinates via MCP catalog                                      │
│  - Synthesizes final report                                         │
└────────────────┬────────────────────────────────────────────────────┘
                 │
    ┌────────────┼────────────┬────────────┬────────────┐
    │            │            │            │            │
    ▼            ▼            ▼            ▼            ▼
┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐  ┌────────┐
│Struct. │  │Statist.│  │Semantic│  │Query   │  │  MCP   │
│ Agent  │  │ Agent  │  │ Agent  │  │ Agent  │  │Catalog │
└────────┘  └────────┘  └────────┘  └────────┘  └────────┘
     │            │            │            │            │
     └────────────┴────────────┴────────────┴────────────┘
                          │
                   ▼              ▼
              ┌─────────┐  ┌─────────────┐
              │ Database│  │   Catalog   │
              │ (testdb)│  │ (Shared Mem)│
              └─────────┘  └─────────────┘

The Four Discovery Agents

1. Structural Agent

Mission: Map tables, relationships, indexes, and constraints

Responsibilities:

  • Complete ERD documentation
  • Table schema analysis (columns, types, constraints)
  • Foreign key relationship mapping
  • Index inventory and assessment
  • Architectural pattern identification

Catalog Entries: structural_discovery

Key Deliverables:

  • Entity Relationship Diagram
  • Complete table definitions
  • Index inventory with recommendations
  • Relationship cardinality mapping

2. Statistical Agent

Mission: Profile data distributions, patterns, and anomalies

Responsibilities:

  • Table row counts and cardinality analysis
  • Data distribution profiling
  • Anomaly detection (duplicates, outliers)
  • Statistical summaries (min/max/avg/stddev)
  • Business metrics calculation

Catalog Entries: statistical_discovery

Key Deliverables:

  • Data quality score
  • Duplicate detection reports
  • Statistical distributions
  • True vs inflated metrics
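
The duplicate detection listed above can be sketched with a GROUP BY / HAVING query. This is a minimal illustration under stated assumptions, not the agent's actual query: Python's built-in sqlite3 module stands in for the target database, and the customers table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Stand-in database (assumption): the real target is reached through the MCP tools.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Hypothetical sample data: each logical customer is stored three times.
logical = [("Alice", "alice@example.com"), ("Bob", "bob@example.com")]
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [row for row in logical for _ in range(3)],
)

# Group on the business columns; any group with more than one row is a duplicate set.
duplicates = conn.execute(
    """
    SELECT name, email, COUNT(*) AS copies
    FROM customers
    GROUP BY name, email
    HAVING COUNT(*) > 1
    ORDER BY name
    """
).fetchall()

# Raw row count versus the count after collapsing identical business rows.
total_rows = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
distinct_rows = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT name, email FROM customers)"
).fetchone()[0]
```

Comparing total_rows with distinct_rows gives the kind of "true vs inflated" figure listed among the deliverables.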

3. Semantic Agent

Mission: Infer business domain and entity types

Responsibilities:

  • Business domain identification
  • Entity type classification (master vs transactional)
  • Business rule discovery
  • Entity lifecycle analysis
  • State machine identification

Catalog Entries: semantic_discovery

Key Deliverables:

  • Complete domain model
  • Business rules documentation
  • Entity lifecycle definitions
  • Missing capabilities identification

4. Query Agent

Mission: Analyze access patterns and optimization opportunities

Responsibilities:

  • Query pattern identification
  • Index usage analysis
  • Performance bottleneck detection
  • N+1 query risk assessment
  • Optimization recommendations

Catalog Entries: query_discovery

Key Deliverables:

  • Access pattern analysis
  • Index recommendations (prioritized)
  • Query optimization strategies
  • EXPLAIN analysis results
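
As a rough illustration of the EXPLAIN analysis mentioned above, the sketch below compares a query plan before and after adding an index. It assumes SQLite (via Python's sqlite3) as a stand-in engine; the orders table layout, the query, and the index name are hypothetical, and the plan output of the real target database will look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table layout; only order_date matters for this example.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT)"
)

QUERY = "SELECT id FROM orders WHERE order_date = '2024-01-15'"

def plan(connection, sql):
    """Return the human-readable detail column of SQLite's query plan rows."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[3] for row in rows]

# Without an index on order_date, SQLite has to scan the whole table.
plan_before = plan(conn, QUERY)

# After adding the index, the same query can be answered by an index lookup.
conn.execute("CREATE INDEX idx_orders_order_date ON orders (order_date)")
plan_after = plan(conn, QUERY)

print(plan_before)
print(plan_after)
```

The exact plan wording varies between SQLite versions, so the useful signal is the change from a table scan to a lookup that names the index.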

Discovery Process

Round Structure

Each agent runs 4 rounds of analysis:

Round 1: Blind Exploration

  • Initial schema/data analysis
  • First observations cataloged
  • Initial hypotheses formed

Round 2: Pattern Recognition

  • Read other agents' findings from catalog
  • Identify patterns and anomalies
  • Form and test hypotheses

Round 3: Hypothesis Testing

  • Validate business rules against actual data
  • Cross-reference findings with other agents
  • Confirm or reject hypotheses

Round 4: Final Synthesis

  • Compile comprehensive findings
  • Generate actionable recommendations
  • Create final mission summary
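
The round structure can be pictured as a loop in which each agent reads what its peers wrote in earlier rounds and then records its own finding. The sketch below is only a toy model: a plain dict stands in for the catalog, run_round is a hypothetical helper, and it assumes rounds are synchronized so that agents see only entries from previous rounds (the source does not say whether the real system enforces this).

```python
AGENTS = ["structural", "statistical", "semantic", "query"]
ROUNDS = ["blind_exploration", "pattern_recognition", "hypothesis_testing", "final_synthesis"]

# Toy catalog (assumption): kind -> {key: document}.
catalog = {f"{agent}_discovery": {} for agent in AGENTS}

def run_round(agent, round_no, round_name, snapshot):
    """Count peer entries visible at round start, then write one finding."""
    own_kind = f"{agent}_discovery"
    peer_entries = sum(
        len(entries) for kind, entries in snapshot.items() if kind != own_kind
    )
    catalog[own_kind][f"round{round_no}_{round_name}"] = (
        f"{agent} finding informed by {peer_entries} peer entries"
    )
    return peer_entries

peer_counts = {}
for round_no, round_name in enumerate(ROUNDS, start=1):
    # Freeze the catalog so every agent in this round sees the same prior state.
    snapshot = {kind: dict(entries) for kind, entries in catalog.items()}
    for agent in AGENTS:
        peer_counts[(agent, round_no)] = run_round(agent, round_no, round_name, snapshot)
```

In this model Round 1 is "blind" simply because the snapshot is still empty; from Round 2 onward each agent has peer findings to build on.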

Catalog-Based Collaboration

# Agent writes findings
catalog_upsert(
    kind="structural_discovery",
    key="table_customers",
    document="...",
    tags="structural,table,schema"
)

# Agent reads other agents' findings
findings = catalog_list(kind="statistical_discovery")
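
To make the upsert and list semantics concrete, here is a minimal local stand-in for the catalog. It is a sketch, not the MCP server's implementation: the SQLite schema, the (kind, key) primary key, and the Python functions that reuse the tool names are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (kind, key) pair.
conn.execute(
    """
    CREATE TABLE catalog (
        kind     TEXT NOT NULL,
        key      TEXT NOT NULL,
        document TEXT NOT NULL,
        tags     TEXT NOT NULL DEFAULT '',
        PRIMARY KEY (kind, key)
    )
    """
)

def catalog_upsert(kind, key, document, tags=""):
    """Insert a finding, or overwrite the existing one with the same (kind, key)."""
    conn.execute(
        "INSERT OR REPLACE INTO catalog (kind, key, document, tags) VALUES (?, ?, ?, ?)",
        (kind, key, document, tags),
    )

def catalog_list(kind):
    """Return all findings of one kind as (key, document, tags), ordered by key."""
    return conn.execute(
        "SELECT key, document, tags FROM catalog WHERE kind = ? ORDER BY key",
        (kind,),
    ).fetchall()

# Writing the same key twice keeps only the latest document.
catalog_upsert("structural_discovery", "table_customers", "first draft", "structural,table,schema")
catalog_upsert("structural_discovery", "table_customers", "revised", "structural,table,schema")
catalog_upsert("statistical_discovery", "row_counts", "placeholder", "statistical")

structural = catalog_list("structural_discovery")
```

Keying on (kind, key) lets an agent refine a finding across rounds without leaving stale copies behind for its peers to read.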

Example Discovery Output

Database: testdb (E-commerce Order Management)

True Statistics (After Deduplication)

Metric        Current (with duplicates)   Actual (deduplicated)
Customers     15                          5
Products      15                          5
Orders        15                          5
Order Items   27                          9
Revenue       $10,886.67                  $3,628.85

Critical Findings

  1. Data Quality: 5/100 (Catastrophic) - data is triplicated, so about 67% of rows are redundant copies
  2. Missing Index: orders.order_date (P0 critical)
  3. Missing Constraints: No UNIQUE or FK constraints
  4. Business Domain: E-commerce order management system
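
The 67% figure follows directly from the row counts in the statistics table above. A short worked calculation, using only those numbers:

```python
# Row counts from the statistics table: (current, after deduplication).
counts = {
    "Customers": (15, 5),
    "Products": (15, 5),
    "Orders": (15, 5),
    "Order Items": (27, 9),
}

results = {}
for metric, (current, actual) in counts.items():
    inflation = current / actual              # stored copies per logical row
    redundant = (current - actual) / current  # share of rows that are duplicates
    results[metric] = (inflation, redundant)
    print(f"{metric}: x{inflation:.0f} inflation, {redundant:.0%} redundant")
```

Every table shows a factor of 3, which means two out of every three stored rows (about 67%) are redundant.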

Launching the Discovery System

# In Claude Code, launch 4 agents in parallel:
Task(
    description="Structural Discovery",
    prompt=STRUCTURAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Statistical Discovery",
    prompt=STATISTICAL_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Semantic Discovery",
    prompt=SEMANTIC_AGENT_PROMPT,
    subagent_type="general-purpose"
)

Task(
    description="Query Discovery",
    prompt=QUERY_AGENT_PROMPT,
    subagent_type="general-purpose"
)
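
Task is a Claude Code tool invoked by the orchestrating agent, not a Python function. For readers who want to picture the fan-out and join in ordinary code, the sketch below mimics the pattern with concurrent.futures; run_agent and its return value are hypothetical stand-ins for a subagent run.

```python
from concurrent.futures import ThreadPoolExecutor

AGENT_NAMES = ["Structural", "Statistical", "Semantic", "Query"]

def run_agent(name):
    """Hypothetical stand-in for one subagent; returns a one-line summary."""
    return f"{name} Discovery complete"

# Fan out: one worker per agent. Fan in: map() yields results in input order.
with ThreadPoolExecutor(max_workers=len(AGENT_NAMES)) as pool:
    summaries = list(pool.map(run_agent, AGENT_NAMES))

# The orchestrator's synthesis step is reduced here to joining the summaries.
final_report = "\n".join(summaries)
```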

MCP Tools Used

The agents use these MCP tools for database analysis:

  • list_schemas - List all databases
  • list_tables - List tables in a schema
  • describe_table - Get table schema
  • sample_rows - Get sample data from table
  • column_profile - Get column statistics
  • run_sql_readonly - Execute read-only queries
  • catalog_upsert - Store findings in catalog
  • catalog_list / catalog_get - Retrieve findings from catalog

Target Scoping Requirement

Discovery and catalog/LLM tools are target-scoped. Always pass target_id:

  • discovery.run_static(target_id=..., schema_filter=...)
  • catalog.*(target_id=..., run_id=...)
  • agent.run_start(target_id=..., run_id=...)
  • llm.*(target_id=..., run_id=...)

run_id resolution is no longer global. The same schema name can exist on multiple targets, so target_id is required to resolve the correct discovery run.
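
The ambiguity that target scoping removes can be shown with a small lookup table. This is an illustrative sketch only: the registry, the target names, the run identifiers, and both resolver functions are hypothetical and do not reflect how the server stores runs.

```python
# Hypothetical registry of discovery runs, keyed by (target_id, schema name).
runs = {
    ("mysql-prod", "testdb"): "run-001",
    ("mysql-staging", "testdb"): "run-002",
}

def resolve_run_id(target_id, schema):
    """Target-scoped lookup: the (target_id, schema) pair is unambiguous."""
    return runs[(target_id, schema)]

def resolve_run_id_global(schema):
    """Schema-only lookup: breaks once the schema exists on several targets."""
    matches = [run for (_, name), run in runs.items() if name == schema]
    if len(matches) != 1:
        raise LookupError(
            f"{len(matches)} runs match schema {schema!r}; target_id is required"
        )
    return matches[0]

scoped = resolve_run_id("mysql-staging", "testdb")
try:
    resolve_run_id_global("testdb")
    global_lookup_failed = False
except LookupError:
    global_lookup_failed = True
```

Because testdb appears under two targets, only the scoped lookup can return a single run.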

Benefits of Multi-Agent Approach

  1. Parallel Execution: All 4 agents run simultaneously
  2. Specialized Expertise: Each agent focuses on its domain
  3. Cross-Validation: Agents validate each other's findings
  4. Comprehensive Coverage: All aspects of the database are analyzed
  5. Knowledge Synthesis: Final report combines all perspectives

Output Format

The system produces:

  1. 40+ Catalog Entries - Detailed findings organized by agent
  2. Comprehensive Report - Executive summary with:
    • Structure & Schema (ERD, table definitions)
    • Business Domain (entity model, business rules)
    • Key Insights (data quality, performance)
    • Data Quality Assessment (score, recommendations)

Future Enhancements

  • Additional specialized agents (Security, Performance, Compliance)
  • Automated remediation scripts
  • Continuous monitoring mode
  • Integration with CI/CD pipelines
  • Web-based dashboard for findings

Related Files

  • simple_discovery.py - Simplified demo of multi-agent pattern
  • mcp_catalog.db - Catalog database for storing findings

References

  • Claude Code Task Tool Documentation
  • MCP (Model Context Protocol) Specification
  • ProxySQL MCP Server Implementation