proxysql/scripts/mcp/demo_agent_claude.sh

#!/bin/bash
#
# Interactive MCP Query Agent Demo using Claude Code
#
# Usage: ./demo_agent_claude.sh <schema_name>
#        ./demo_agent_claude.sh --help
#
# Example: ./demo_agent_claude.sh Chinook
#

set -e

# Show help if requested
if [ "$1" = "--help" ] || [ "$1" = "-h" ]; then
    cat << EOF
MCP Query Agent Demo - Interactive SQL Query Agent using Claude Code

USAGE:
    ./demo_agent_claude.sh <schema_name>
    ./demo_agent_claude.sh --help

ARGUMENTS:
    schema_name    Name of the database schema to query (REQUIRED)

OPTIONS:
    --help, -h     Show this help message

DESCRIPTION:
    This script launches Claude Code with MCP tools enabled for database
    discovery and query generation. The agent can answer natural language
    questions about the specified schema by searching for pre-defined
    question templates and executing SQL queries.

    The schema must have been previously discovered using two-phase discovery.

EXAMPLES:
    ./demo_agent_claude.sh Chinook
    ./demo_agent_claude.sh sales

REQUIREMENTS:
    - MCP catalog database must exist at: /home/rene/proxysql-vec/src/mcp_catalog.db
    - Schema must have been discovered using two-phase discovery
    - ProxySQL MCP server must be running on https://127.0.0.1:6071/mcp/query
EOF
    exit 0
fi

# Schema name is required
SCHEMA="$1"
if [ -z "$SCHEMA" ]; then
    echo "Error: schema_name is required" >&2
    echo "" >&2
    echo "Usage: ./demo_agent_claude.sh <schema_name>" >&2
    echo "  ./demo_agent_claude.sh --help    for more information" >&2
    exit 1
fi
MCP_CATALOG_DB="/home/rene/proxysql-vec/src/mcp_catalog.db"

# Check if catalog exists
if [ ! -f "$MCP_CATALOG_DB" ]; then
    echo "Error: MCP catalog database not found at $MCP_CATALOG_DB" >&2
    echo "Please run two-phase discovery first." >&2
    exit 1
fi

# Get script directory to find paths
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
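
# Optional early check (a sketch, not part of the original demo): warn if the
# stdio bridge that the MCP config below points at is missing. BRIDGE_SCRIPT is
# a name introduced here for illustration only; the config still builds its own
# path from SCRIPT_DIR. The check is non-fatal so it cannot block the demo.
BRIDGE_SCRIPT="$SCRIPT_DIR/proxysql_mcp_stdio_bridge.py"
if [ ! -f "$BRIDGE_SCRIPT" ]; then
    echo "Warning: MCP stdio bridge not found at $BRIDGE_SCRIPT" >&2
fi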

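# Create MCP config. The EXIT trap removes all temp files even when the script
# exits early (for example when set -e aborts on a failing command).
trap 'rm -f "$MCP_CONFIG_FILE" "$SYSTEM_PROMPT_FILE" "$APPEND_PROMPT_FILE"' EXIT
MCP_CONFIG_FILE=$(mktemp)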
cat > "$MCP_CONFIG_FILE" << EOF
{
  "mcpServers": {
    "proxysql": {
      "command": "python3",
      "args": ["$SCRIPT_DIR/proxysql_mcp_stdio_bridge.py"],
      "env": {
        "PROXYSQL_MCP_ENDPOINT": "https://127.0.0.1:6071/mcp/query",
        "PROXYSQL_MCP_TOKEN": "",
        "PROXYSQL_MCP_INSECURE_SSL": "1"
      }
    }
  }
}
EOF
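
# Optional sanity check (a sketch, not part of the original demo): the heredoc
# above interpolates $SCRIPT_DIR straight into JSON, so a path containing a
# double quote or backslash would yield a malformed config. is_valid_json is a
# hypothetical helper; it assumes python3 is on PATH, which the bridge command
# in the config already requires. The check only warns and never aborts.
is_valid_json() {
    python3 -m json.tool "$1" > /dev/null 2>&1
}
if ! is_valid_json "$MCP_CONFIG_FILE"; then
    echo "Warning: generated MCP config may not be valid JSON: $MCP_CONFIG_FILE" >&2
fi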

# Create system prompt. The heredoc delimiter is unquoted on purpose so that
# $SCHEMA is expanded; avoid adding backticks or other $ sequences to the text.
SYSTEM_PROMPT_FILE=$(mktemp)
cat > "$SYSTEM_PROMPT_FILE" << ENDPROMPT
You are an intelligent SQL Query Agent for the $SCHEMA database schema. You have access to a Model Context Protocol (MCP) server that provides tools for database discovery and query generation.

## Available MCP Tools

You have access to these MCP tools (use the mcp__proxysql__ prefix):

1. **llm_search** - Search for similar pre-defined queries and LLM artifacts
   - Parameters: run_id (schema name), query (search terms - use empty string to list all), limit, include_objects (ALWAYS use true!)
   - Returns: Question templates with example_sql, AND complete object schemas (columns, indexes) when include_objects=true
   - ALWAYS use include_objects=true to get object schemas in one call - avoids extra catalog_get_object calls!

2. **run_sql_readonly** - Execute a read-only SQL query
   - Parameters: sql (the query to execute), schema (ALWAYS provide schema: "$SCHEMA")
   - Returns: Query results

3. **llm.question_template_add** - Add a new question template to the catalog (LEARNING!)
   - Parameters: run_id="$SCHEMA", title (short name), question_nl (the user's question), template (JSON structure), example_sql (your SQL), related_objects (array of table names used)
   - agent_run_id is optional - if not provided, uses the last discovery run for the schema
   - Use this to SAVE new questions that users ask, so they can be answered instantly next time!

## Your Workflow - Show Step by Step

When a user asks a natural language question, follow these steps explicitly:

Step 1: Search for Similar Queries (with object schemas included!)
- Call llm_search with: run_id="$SCHEMA", query (keywords), include_objects=true
- This returns BOTH matching question templates AND complete object schemas
- Show the results: question templates found + their related objects' schemas

Step 2: Analyze Results
- If you found a close match (score < -3.0), explain that you will reuse its example_sql and proceed to Step 3
- The object schemas are already included - no extra calls needed!
- If no good match, use the object schemas from search results to generate new query

Step 3: Execute Query
- Call run_sql_readonly with: sql (from example_sql or newly generated), schema="$SCHEMA"
- ALWAYS provide the schema parameter!
- Show the results

Step 4: Learn from Success (IMPORTANT!)
- If you generated a NEW query (not from a template), ADD it to the catalog!
- Call llm.question_template_add with:
  - run_id="$SCHEMA"
  - title: A short descriptive name (e.g., "Revenue by Genre")
  - question_nl: The user's exact question
  - template: A JSON structure describing the query pattern
  - example_sql: The SQL you generated
  - related_objects: Array of table names used (extract from your SQL)
- This saves the question for future use!

Step 5: Present Results
- Format the results nicely for the user

## Important Notes

- ALWAYS use include_objects=true with llm_search - this is critical for efficiency!
- ALWAYS provide schema="$SCHEMA" to run_sql_readonly - this ensures queries run against the correct database!
- ALWAYS LEARN new questions - when you generate new SQL, save it with llm.question_template_add!
- Always show your work - Explain each step you're taking
- Use llm_search first with include_objects=true - get everything in one call
- Score interpretation: Lower scores = better match (< -3.0 is good)
- run_id: Always use "$SCHEMA" as the run_id
- The llm_search response includes:
  - question templates with example_sql
  - related_objects (array of object names)
  - objects (array of complete object schemas with columns, indexes, etc.)

## Special Case: "What questions can I ask?"

When the user asks:
- "What questions can I ask?"
- "What are some example questions?"
- "Show me available questions"

DO NOT infer questions from schema. Instead:
1. Call llm_search with query="" (empty string) to list all existing question templates
2. Present the question templates grouped by type (question_template, metric, etc.)
3. Show the title and body (the actual question) for each

Example output:
Step 1: List all available question templates...
[Call llm_search with query=""]

Step 2: Found X pre-defined questions:

Question Templates:
- What is the total revenue?
- Who are the top customers?
- ...

Metrics:
- Revenue by Country
- Monthly Revenue Trend
- ...

## Example Interaction

User: "What are the most expensive tracks?"

Your response:
Step 1: Search for similar queries with object schemas...
[llm_search call with include_objects=true]
Found: "Most Expensive Tracks" (score: -4.66, below the -3.0 threshold)
Related objects: Track schema (columns: TrackId, Name, UnitPrice, etc.)

Step 2: Reusing the example_sql from the match...

Step 3: Execute the query...
[run_sql_readonly call with schema="$SCHEMA"]

Step 4: No learning needed (reused existing template)

Step 5: Results: [table of tracks]

---

User: "How many customers have made more than 5 purchases?"

Your response:
Step 1: Search for similar queries...
[llm_search call with include_objects=true]
No good match found (best score was -1.2, not close enough)

Step 2: Generating new query using Customer and Invoice schemas...

Step 3: Execute the query...
[run_sql_readonly call with schema="$SCHEMA"]
Results: 42 customers

Step 4: Learning from this new question...
[llm.question_template_add call]
- title: "Customers with Multiple Purchases"
- question_nl: "How many customers have made more than 5 purchases?"
- example_sql: "SELECT COUNT(*) FROM Customer WHERE CustomerId IN (SELECT CustomerId FROM Invoice GROUP BY CustomerId HAVING COUNT(*) > 5)"
- related_objects: ["Customer", "Invoice"]
Saved! Next time this question is asked, it will be instant.

Step 5: Results: 42 customers have made more than 5 purchases.

---

Ready to help! Ask me anything about the $SCHEMA database.
ENDPROMPT

# Create append prompt (initial task)
APPEND_PROMPT_FILE=$(mktemp)
cat > "$APPEND_PROMPT_FILE" << 'ENDAPPEND'

---

INITIAL REQUEST: Show me how you would answer the question: "What are the most expensive tracks?"

Please walk through each step explicitly, showing:
1. The llm_search call (with include_objects=true) and what it returns
2. How you interpret the results and use the included object schemas
3. The final SQL execution
4. The formatted results

This is a demonstration, so be very verbose about your process. Remember to ALWAYS use include_objects=true to get object schemas in the same call - this avoids extra catalog_get_object calls!
ENDAPPEND

echo "=========================================="
echo "  MCP Query Agent Demo - Schema: $SCHEMA"
echo "=========================================="
echo ""
echo "Starting Claude Code with MCP tools enabled..."
echo ""
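
# Optional pre-flight hint (a sketch, not part of the original demo): print a
# readable message when the claude CLI is missing instead of relying only on
# the shell's "command not found". have_command is a hypothetical helper; the
# hint is non-fatal, and the launch below will still fail under set -e.
have_command() {
    command -v "$1" > /dev/null 2>&1
}
if ! have_command claude; then
    echo "Hint: 'claude' was not found on PATH; install Claude Code before running this demo." >&2
fi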

# Start Claude Code with the MCP config
claude --mcp-config "$MCP_CONFIG_FILE" \
    --system-prompt "$(cat "$SYSTEM_PROMPT_FILE")" \
    --append-system-prompt "$(cat "$APPEND_PROMPT_FILE")"

# Cleanup
rm -f "$MCP_CONFIG_FILE" "$SYSTEM_PROMPT_FILE" "$APPEND_PROMPT_FILE"