# Headless Database Discovery with Claude Code

Multi-agent database discovery system for comprehensive analysis through MCP (Model Context Protocol).

## Overview

This directory contains scripts for running **4-agent collaborative database discovery** in headless (non-interactive) mode using Claude Code.

**Key Features:**

- **4 Collaborating Agents:** STRUCTURAL, STATISTICAL, SEMANTIC, QUERY
- **4-Round Protocol:** Blind exploration → Pattern recognition → Hypothesis testing → Final synthesis
- **MCP Catalog Collaboration:** Agents share findings via catalog
- **Comprehensive Reports:** Structured markdown with health scores and prioritized recommendations
- **Evidence-Based:** 15+ hypothesis validations with direct database evidence

## Quick Start

### Using the Python Script (Recommended)

```bash
# Basic discovery - discovers the first available database
python ./headless_db_discovery.py

# Discover a specific database
python ./headless_db_discovery.py --database mydb

# Specify output file
python ./headless_db_discovery.py --output my_report.md

# With verbose output
python ./headless_db_discovery.py --verbose
```

### Using the Bash Script

```bash
# Basic discovery
./headless_db_discovery.sh

# Discover specific database
./headless_db_discovery.sh -d mydb

# With custom timeout
./headless_db_discovery.sh -t 600
```

## Multi-Agent Discovery Architecture

### The 4 Agents

| Agent | Focus | Key MCP Tools |
|-------|-------|---------------|
| **STRUCTURAL** | Schemas, tables, relationships, indexes, constraints | `list_schemas`, `list_tables`, `describe_table`, `get_constraints`, `suggest_joins` |
| **STATISTICAL** | Data distributions, quality, anomalies | `table_profile`, `sample_rows`, `column_profile`, `sample_distinct`, `run_sql_readonly` |
| **SEMANTIC** | Business domain, entities, rules, terminology | `sample_rows`, `sample_distinct`, `run_sql_readonly` |
| **QUERY** | Index efficiency, query patterns, optimization | `describe_table`, `explain_sql`, `suggest_joins`, `run_sql_readonly` |

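The agent-to-tool assignment in the table can also be written down as plain data, which makes overlaps easy to inspect. The following is a minimal sketch, not code from the discovery scripts: the names `AGENT_TOOLS`, `agents_by_tool`, and `shared_tools` are hypothetical, and only the agent and tool names come from the table.

```python
# Agent-to-tool mapping transcribed from the table above.
# The dictionary and helper names are illustrative, not part of the scripts.
AGENT_TOOLS = {
    "STRUCTURAL": ["list_schemas", "list_tables", "describe_table",
                   "get_constraints", "suggest_joins"],
    "STATISTICAL": ["table_profile", "sample_rows", "column_profile",
                    "sample_distinct", "run_sql_readonly"],
    "SEMANTIC": ["sample_rows", "sample_distinct", "run_sql_readonly"],
    "QUERY": ["describe_table", "explain_sql", "suggest_joins",
              "run_sql_readonly"],
}


def agents_by_tool(agent_tools):
    """Invert the mapping: tool name -> sorted list of agents that use it."""
    index = {}
    for agent, tools in agent_tools.items():
        for tool in tools:
            index.setdefault(tool, []).append(agent)
    return {tool: sorted(agents) for tool, agents in index.items()}


def shared_tools(agent_tools):
    """Return the tools used by more than one agent, sorted by name."""
    index = agents_by_tool(agent_tools)
    return sorted(tool for tool, agents in index.items() if len(agents) > 1)


if __name__ == "__main__":
    index = agents_by_tool(AGENT_TOOLS)
    for tool in shared_tools(AGENT_TOOLS):
        print(tool, "->", ", ".join(index[tool]))
```

Seen this way, `run_sql_readonly` is the one tool shared by three agents, while tools such as `explain_sql` belong to a single agent.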
### 4-Round Protocol

1. **Round 1: Blind Exploration** (Parallel)
   - All 4 agents explore independently
   - Each discovers patterns without seeing others' findings
   - Findings written to MCP catalog

2. **Round 2: Pattern Recognition** (Collaborative)
   - All agents read each other's findings via `catalog_search`
   - Identify cross-cutting patterns and anomalies
   - Collaborative analysis documented

3. **Round 3: Hypothesis Testing** (Validation)
   - Each agent validates 3-4 specific hypotheses
   - Results documented with PASS/FAIL/MIXED and evidence
   - 15+ hypothesis validations total

4. **Round 4: Final Synthesis**
   - All findings synthesized into comprehensive report
   - Written to MCP catalog and local file

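To make the Round 3 bookkeeping concrete, here is a minimal sketch of how hypothesis results could be recorded and tallied by verdict. It is an illustration only: the `HypothesisResult` class and `tally` function are hypothetical names, and the actual agents store their findings in the MCP catalog rather than in Python objects. Only the agent names and the PASS/FAIL/MIXED verdicts come from the protocol above.

```python
from collections import Counter
from dataclasses import dataclass

AGENTS = ("STRUCTURAL", "STATISTICAL", "SEMANTIC", "QUERY")
VERDICTS = ("PASS", "FAIL", "MIXED")


@dataclass(frozen=True)
class HypothesisResult:
    """One Round 3 validation (hypothetical record layout)."""
    agent: str
    hypothesis: str
    verdict: str
    evidence: str

    def __post_init__(self):
        # Reject values outside the vocabulary used by the protocol.
        if self.agent not in AGENTS:
            raise ValueError(f"unknown agent: {self.agent}")
        if self.verdict not in VERDICTS:
            raise ValueError(f"unknown verdict: {self.verdict}")


def tally(results):
    """Count results per verdict, always reporting all three verdicts."""
    counts = Counter(result.verdict for result in results)
    return {verdict: counts.get(verdict, 0) for verdict in VERDICTS}


if __name__ == "__main__":
    # Placeholder entries for illustration, not real discovery output.
    results = [
        HypothesisResult("QUERY", "hypothesis text", "PASS", "evidence text"),
        HypothesisResult("SEMANTIC", "hypothesis text", "MIXED", "evidence text"),
    ]
    print(tally(results))
```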
## What Gets Discovered

### 1. Structural Analysis
- Complete table schemas (columns, types, constraints)
- Primary keys, foreign keys, unique constraints
- Indexes and their purposes
- Entity Relationship Diagram (ERD)
- Design patterns and anti-patterns

### 2. Statistical Analysis
- Row counts and cardinality
- Data distributions for key columns
- Null value percentages
- Distinct value counts and selectivity
- Statistical summaries (min/max/avg)
- Anomaly detection (duplicates, outliers, skew)

### 3. Semantic Analysis
- Business domain identification (e.g., e-commerce, healthcare)
- Entity type classification (master vs transactional)
- Business rules and constraints
- Entity lifecycles and state machines
- Domain terminology glossary

### 4. Query Analysis
- Index coverage and efficiency
- Missing index identification
- Composite index opportunities
- Join performance analysis
- Query pattern identification
- Optimization recommendations with expected improvements

## Output Format

The generated report includes:

```markdown
# COMPREHENSIVE DATABASE DISCOVERY REPORT

## Executive Summary
- Database identity (system type, purpose, scale)
- Critical findings (top 3)
- Health score: current X/10 → potential Y/10
- Top 3 recommendations (prioritized)

## 1. STRUCTURAL ANALYSIS
- Schema inventory
- Relationship diagram
- Design patterns
- Issues & recommendations

## 2. STATISTICAL ANALYSIS
- Table profiles
- Data quality score
- Distribution profiles
- Anomalies detected

## 3. SEMANTIC ANALYSIS
- Business domain identification
- Entity catalog
- Business rules inference
- Domain glossary

## 4. QUERY ANALYSIS
- Index coverage assessment
- Query pattern analysis
- Optimization opportunities
- Expected improvements

## 5. CRITICAL FINDINGS
- Each with: description, impact quantification, root cause, remediation

## 6. RECOMMENDATIONS ROADMAP
- URGENT: [actions with impact/effort]
- HIGH: [actions]
- MODERATE: [actions]
- Expected timeline with metrics

## Appendices
- A. Table DDL
- B. Query examples with EXPLAIN
- C. Statistical distributions
- D. Business glossary
```

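If reports are tracked over time, the health score line from the Executive Summary is a natural value to extract. The sketch below assumes the line appears exactly in the `Health score: current X/10 → potential Y/10` shape shown in the template; real reports are model-generated and may word it differently, so treat the pattern as a starting point. The function name `parse_health_score` is hypothetical.

```python
import re

# Matches "Health score: current 6/10 → potential 9/10".
# An ASCII "->" is accepted as well, in case the arrow is rendered that way.
HEALTH_SCORE_RE = re.compile(
    r"Health score:\s*current\s+(\d+(?:\.\d+)?)/10\s*(?:→|->)\s*"
    r"potential\s+(\d+(?:\.\d+)?)/10",
    re.IGNORECASE,
)


def parse_health_score(report_text):
    """Return (current, potential) as floats, or None if no score line is found."""
    match = HEALTH_SCORE_RE.search(report_text)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    # Illustrative input following the template, not output from a real run.
    sample = "## Executive Summary\n- Health score: current 6/10 → potential 9/10\n"
    print(parse_health_score(sample))
```

Returning `None` instead of raising keeps the helper usable in scheduled jobs, where a report without a recognizable score line should be flagged rather than crash the run.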
## Command-Line Options

| Option | Short | Description | Default |
|--------|-------|-------------|---------|
| `--database` | `-d` | Database name to discover | First available |
| `--schema` | `-s` | Schema name to analyze | All schemas |
| `--output` | `-o` | Output file path | `discovery_YYYYMMDD_HHMMSS.md` |
| `--timeout` | `-t` | Timeout in seconds | 300 |
| `--verbose` | `-v` | Enable verbose output | Disabled |
| `--help` | `-h` | Show help message | - |

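For readers who want to wrap or extend the scripts, the option table maps directly onto an `argparse` definition. The following is a sketch that mirrors the table, not the parser used in `headless_db_discovery.py`, which may be structured differently. The helper names `build_parser` and `default_output_name` are hypothetical.

```python
import argparse
from datetime import datetime


def build_parser():
    """Build a parser mirroring the option table (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Headless database discovery (option sketch)"
    )
    parser.add_argument("-d", "--database", default=None,
                        help="Database name to discover (default: first available)")
    parser.add_argument("-s", "--schema", default=None,
                        help="Schema name to analyze (default: all schemas)")
    parser.add_argument("-o", "--output", default=None,
                        help="Output file path (default: timestamped file name)")
    parser.add_argument("-t", "--timeout", type=int, default=300,
                        help="Timeout in seconds")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="Enable verbose output")
    # argparse adds -h/--help automatically.
    return parser


def default_output_name(now=None):
    """Return a name in the discovery_YYYYMMDD_HHMMSS.md pattern."""
    now = now or datetime.now()
    return now.strftime("discovery_%Y%m%d_%H%M%S.md")


if __name__ == "__main__":
    args = build_parser().parse_args()
    output = args.output or default_output_name()
    print(f"database={args.database} schema={args.schema} "
          f"output={output} timeout={args.timeout} verbose={args.verbose}")
```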
## System Prompts

Discovery runs use the system prompt in `prompts/multi_agent_discovery_prompt.md`. The `prompts/` directory contains two files:

- **`prompts/multi_agent_discovery_prompt.md`** - Concise system prompt for actual use
- **`prompts/multi_agent_discovery_reference.md`** - Comprehensive reference documentation

## Examples

### CI/CD Integration

```yaml
# .github/workflows/database-discovery.yml
name: Database Discovery

on:
  schedule:
    - cron: '0 0 * * 0'  # Weekly
  workflow_dispatch:

jobs:
  discovery:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code
      - name: Run Discovery
        env:
          PROXYSQL_MCP_ENDPOINT: ${{ secrets.PROXYSQL_MCP_ENDPOINT }}
          PROXYSQL_MCP_TOKEN: ${{ secrets.PROXYSQL_MCP_TOKEN }}
        run: |
          cd scripts/mcp/DiscoveryAgent/ClaudeCode_Headless
          python ./headless_db_discovery.py \
            --database production \
            --output discovery_$(date +%Y%m%d).md
      - name: Upload Report
        uses: actions/upload-artifact@v4
        with:
          name: discovery-report
          # Relative to the workspace root; the `cd` above does not carry over
          path: scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/discovery_*.md
```

### Monitoring Automation

```bash
#!/bin/bash
# weekly_discovery.sh - Run weekly and compare results

REPORT_DIR="/var/db-discovery/reports"
# Compute the report path once so a date rollover mid-run cannot change it
REPORT="$REPORT_DIR/discovery_$(date +%Y%m%d).md"
mkdir -p "$REPORT_DIR"

# Run discovery
python ./headless_db_discovery.py \
    --database mydb \
    --output "$REPORT"

# Compare with the previous report (second-newest file by modification time)
PREV=$(ls -t "$REPORT_DIR"/discovery_*.md | head -2 | tail -1)
if [ -f "$PREV" ] && [ "$PREV" != "$REPORT" ]; then
    echo "=== Changes since last discovery ==="
    diff "$PREV" "$REPORT" || true
fi
```

### Custom Discovery Focus

```python
from typing import Optional

# Modify the prompt in the script for focused discovery
def build_discovery_prompt(database: Optional[str]) -> str:
    # Fall back to a generic phrase so the prompt never contains "None"
    target = database or "the first available database"
    prompt = f"""Using the 4-agent discovery protocol, focus on:
1. Security aspects of {target}
2. Performance optimization opportunities
3. Data quality issues

Follow the standard 4-round protocol but prioritize these areas.
"""
    return prompt
```

## Troubleshooting

### "Claude Code executable not found"

Set the `CLAUDE_PATH` environment variable:

```bash
export CLAUDE_PATH="/path/to/claude"
python ./headless_db_discovery.py
```

Or install Claude Code:

```bash
npm install -g @anthropic-ai/claude-code
```

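One plausible way to resolve the executable is to honor `CLAUDE_PATH` first and fall back to a `PATH` lookup. The sketch below shows that lookup order as an assumption; the actual script may search other locations or behave differently when the override is invalid. The function name `find_claude_executable` is hypothetical.

```python
import os
import shutil


def find_claude_executable(env=None):
    """Return the path to the Claude Code CLI, or None if it cannot be found.

    Assumed lookup order (not confirmed against the script):
    1. CLAUDE_PATH, if set, must point to an executable file.
    2. Otherwise search PATH for a binary named "claude".
    """
    env = os.environ if env is None else env
    override = env.get("CLAUDE_PATH")
    if override:
        is_executable = os.path.isfile(override) and os.access(override, os.X_OK)
        return override if is_executable else None
    # shutil.which falls back to the process PATH when path is None.
    return shutil.which("claude", path=env.get("PATH"))


if __name__ == "__main__":
    path = find_claude_executable()
    print(path or "Claude Code executable not found")
```

Treating an invalid `CLAUDE_PATH` as "not found" rather than silently falling back to `PATH` makes a mistyped override visible immediately.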
### "No MCP servers available"

Ensure MCP servers are configured in your Claude Code settings or provide MCP configuration via command line.

### Discovery times out

Increase the timeout:

```bash
python ./headless_db_discovery.py --timeout 600
```

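When wrapping the discovery script in your own tooling, a timeout can be enforced with `subprocess.run`, which raises `subprocess.TimeoutExpired` once the limit is exceeded. This is a generic sketch of that pattern, not the timeout handling in `headless_db_discovery.py`; the helper name `run_with_timeout` and its return convention are hypothetical.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_seconds=300):
    """Run cmd and return (returncode, stdout).

    On timeout, return (None, message) instead of raising, so callers can
    decide whether to retry with a larger limit.
    """
    try:
        completed = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, f"timed out after {timeout_seconds}s; consider a larger --timeout"
    return completed.returncode, completed.stdout


if __name__ == "__main__":
    # Stand-in command; replace with the discovery script invocation.
    code, output = run_with_timeout([sys.executable, "-c", "print('ok')"], 30)
    print(code, output.strip())
```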
### Output is truncated

The multi-agent prompt is designed to produce comprehensive output. If the report is truncated:
1. Increase the timeout
2. Check MCP server connection stability
3. Review the MCP catalog for partial results

## Directory Structure

```
ClaudeCode_Headless/
├── README.md                              # This file
├── prompts/
│   ├── multi_agent_discovery_prompt.md    # Concise system prompt
│   └── multi_agent_discovery_reference.md # Comprehensive reference
├── headless_db_discovery.py               # Python script
├── headless_db_discovery.sh               # Bash script
└── examples/
    ├── DATABASE_DISCOVERY_REPORT.md       # Example output
    └── DATABASE_QUESTION_CAPABILITIES.md  # Feature documentation
```

## Related Documentation

- [Multi-Agent Database Discovery System](../../doc/multi_agent_database_discovery.md)
- [Claude Code Documentation](https://docs.anthropic.com/claude-code)
- [MCP Specification](https://modelcontextprotocol.io/)

## License

Same license as the proxysql-vec project.