# ProxySQL Stats MCP Tools - Implementation Guide
This document provides implementation guidance for the `/stats` endpoint tools, including database access patterns, table mappings, SQL queries, data flow documentation, and design rationale.
## Table of Contents
- [1. Database Access Patterns](#1-database-access-patterns)
- [2. Table-to-Tool Mapping](#2-table-to-tool-mapping)
- [3. Data Flow Patterns](#3-data-flow-patterns)
- [4. Interval-to-Table Resolution](#4-interval-to-table-resolution)
- [5. Tool Implementation Details](#5-tool-implementation-details)
- [6. Helper Functions](#6-helper-functions)
- [7. Error Handling Patterns](#7-error-handling-patterns)
- [8. Testing Strategies](#8-testing-strategies)
---
## 1. Database Access Patterns
### 1.1 Database Types
ProxySQL maintains several SQLite databases:
| Database | Variable | Purpose | Schema Prefix |
|---|---|---|---|
| `admindb` | `GloAdmin->admindb` | Configuration and admin interface | (none) |
| `statsdb` | `GloAdmin->statsdb` | In-memory real-time statistics | `stats.` |
| `statsdb_disk` | `GloAdmin->statsdb_disk` | Persistent historical statistics | `stats_history.` |
| `statsdb_mem` | Internal | Internal metrics collection | N/A (not directly accessible) |
### 1.2 Access Rules
**Real-time stats tables:** Access through `GloAdmin->admindb` with the `stats.` schema prefix.
```cpp
// Example: Query stats_mysql_connection_pool
GloAdmin->admindb->execute_statement(
    "SELECT * FROM stats.stats_mysql_connection_pool",
    &error, &cols, &affected_rows, &resultset
);
```
**Historical data tables:** Access through `GloAdmin->statsdb_disk` directly (no prefix needed as it's the default schema).
```cpp
// Example: Query mysql_connections history (direct access - preferred)
GloAdmin->statsdb_disk->execute_statement(
    "SELECT * FROM mysql_connections WHERE timestamp > ?",
    &error, &cols, &affected_rows, &resultset
);
```
Alternatively, historical tables can be accessed through `GloAdmin->admindb` using the `stats_history.` prefix, as `statsdb_disk` is attached to both databases:
```cpp
// Example: Query mysql_connections history (via admindb with prefix)
GloAdmin->admindb->execute_statement(
    "SELECT * FROM stats_history.mysql_connections WHERE timestamp > ?",
    &error, &cols, &affected_rows, &resultset
);
```
Direct access via `statsdb_disk` is preferred for performance.
**Never use `GloAdmin->statsdb` directly** — it's for internal ProxySQL use only.
### 1.3 Admin Commands vs. Direct Function Calls
Some ProxySQL operations are exposed as admin commands (e.g., `DUMP EVENTSLOG FROM BUFFER TO MEMORY`, `SAVE MYSQL DIGEST TO DISK`). These commands are intercepted by `Admin_Handler.cpp` when received via the MySQL admin interface and routed to the appropriate C++ functions.
When implementing MCP tools, these admin commands cannot be executed via `admindb->execute_statement()` because SQLite doesn't recognize them as valid SQL. Instead, call the underlying C++ functions directly:
| Admin Command | Direct Function Call | Returns |
|---------------|---------------------|---------|
| `DUMP EVENTSLOG FROM BUFFER TO MEMORY` | `GloMyLogger->processEvents(statsdb, nullptr)` | Event count |
| `DUMP EVENTSLOG FROM BUFFER TO DISK` | `GloMyLogger->processEvents(nullptr, statsdb_disk)` | Event count |
| `DUMP EVENTSLOG FROM BUFFER TO BOTH` | `GloMyLogger->processEvents(statsdb, statsdb_disk)` | Event count |
| `SAVE MYSQL DIGEST TO DISK` | `GloAdmin->FlushDigestTableToDisk<SERVER_TYPE_MYSQL>(statsdb_disk)` | Digest count |
| `SAVE PGSQL DIGEST TO DISK` | `GloAdmin->FlushDigestTableToDisk<SERVER_TYPE_PGSQL>(statsdb_disk)` | Digest count |
Both functions are thread-safe:
- `processEvents()` uses `std::mutex` internally for the circular buffer
- `FlushDigestTableToDisk()` uses `pthread_rwlock` for the digest hash map
Required includes for these functions:
```cpp
#include "proxysql_admin.h"
#include "MySQL_Logger.hpp"
extern MySQL_Logger *GloMyLogger;
```
### 1.4 Query Execution Pattern
```cpp
json Stats_Tool_Handler::execute_query(const std::string& sql, SQLite3DB* db) {
    SQLite3_result* resultset = NULL;
    char* error = NULL;
    int cols = 0;
    int affected_rows = 0;
    int rc = db->execute_statement(sql.c_str(), &error, &cols, &affected_rows, &resultset);
    if (rc != SQLITE_OK) {
        std::string err_msg = error ? error : "Query execution failed";
        if (error) free(error);
        return create_error_response(err_msg);
    }
    json rows = resultset_to_json(resultset, cols);
    delete resultset;
    return rows;
}
```
---
## 2. Table-to-Tool Mapping
### 2.1 Live Data Tools
| Tool | MySQL Tables | PostgreSQL Tables |
|---|---|---|
| `show_status` | `stats.stats_mysql_global`, `stats.stats_memory_metrics` | `stats.stats_pgsql_global`, `stats.stats_memory_metrics` |
| `show_processlist` | `stats.stats_mysql_processlist` | `stats.stats_pgsql_processlist` |
| `show_queries` | `stats.stats_mysql_query_digest` | `stats.stats_pgsql_query_digest` |
| `show_commands` | `stats.stats_mysql_commands_counters` | `stats.stats_pgsql_commands_counters` |
| `show_connections` | `stats.stats_mysql_connection_pool`, `stats.stats_mysql_free_connections` | `stats.stats_pgsql_connection_pool`, `stats.stats_pgsql_free_connections` |
| `show_errors` | `stats.stats_mysql_errors` | `stats.stats_pgsql_errors` (uses `sqlstate` instead of `errno`) |
| `show_users` | `stats.stats_mysql_users` | `stats.stats_pgsql_users` |
| `show_client_cache` | `stats.stats_mysql_client_host_cache` | `stats.stats_pgsql_client_host_cache` |
| `show_query_rules` | `stats.stats_mysql_query_rules` | `stats.stats_pgsql_query_rules` |
| `show_prepared_statements` | `stats.stats_mysql_prepared_statements_info` | `stats.stats_pgsql_prepared_statements_info` |
| `show_gtid` | `stats.stats_mysql_gtid_executed` | N/A |
| `show_cluster` | `stats.stats_proxysql_servers_status`, `stats.stats_proxysql_servers_metrics`, `stats.stats_proxysql_servers_checksums`, `stats.stats_proxysql_servers_clients_status` | Same (shared) |
### 2.2 Historical Data Tools
| Tool | MySQL Tables | PostgreSQL Tables |
|---|---|---|
| `show_system_history` | `system_cpu`, `system_cpu_hour`, `system_memory`, `system_memory_hour` | Same (shared) |
| `show_query_cache_history` | `mysql_query_cache`, `mysql_query_cache_hour` | N/A |
| `show_connection_history` | `mysql_connections`, `mysql_connections_hour`, `myhgm_connections`, `myhgm_connections_hour`, `history_stats_mysql_connection_pool` | N/A |
| `show_query_history` | `history_mysql_query_digest` | `history_pgsql_query_digest` |
### 2.3 Utility Tools
| Tool | MySQL Tables | PostgreSQL Tables |
|---|---|---|
| `flush_query_log` | `stats.stats_mysql_query_events`, `history_mysql_query_events` | N/A |
| `show_query_log` | `stats.stats_mysql_query_events`, `history_mysql_query_events` | N/A |
| `flush_queries` | `history_mysql_query_digest` | `history_pgsql_query_digest` |
### 2.4 Column Naming: MySQL vs PostgreSQL
ProxySQL uses different column names for the same concept between MySQL and PostgreSQL:
| Concept | MySQL Column | PostgreSQL Column | API Field |
|---------|--------------|-------------------|-----------|
| Database/Schema | `schemaname` | `database` | `database` |
| Error Code | `errno` | `sqlstate` | `errno`/`sqlstate` |
| Process DB | `db` | `database` | `database` |
**Implementation Note:** The history table `history_pgsql_query_digest` uses `schemaname` (matching MySQL convention) rather than `database`, creating an inconsistency with the live `stats_pgsql_query_digest` table. Implementation must handle this when building queries for PostgreSQL.
---
## 3. Data Flow Patterns
### 3.1 Query Events Flow
Query events use a circular buffer that must be explicitly flushed to tables.
```text
┌─────────────────────────────────────────────────────────────────┐
│ Query Execution │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ MySQL_Logger::log_request() │
│ Creates MySQL_Event, adds to circular buffer │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Circular Buffer (MyLogCB) │
│ Size controlled by eventslog_table_memory_size │
│ Events accumulate until flushed │
└─────────────────────────────────────────────────────────────────┘
┌───────────────┼───────────────┐
│ │ │
DUMP TO MEMORY DUMP TO DISK DUMP TO BOTH
│ │ │
▼ ▼ ▼
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ stats_mysql_ │ │ history_mysql_ │ │ Both │
│ query_events │ │ query_events │ │ tables │
│ (in-memory, │ │ (on-disk, │ │ │
│ capped size) │ │ append-only) │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
```
**Implementation for `flush_query_log`:**
This tool calls `GloMyLogger->processEvents()` directly (see [Section 1.3](#13-admin-commands-vs-direct-function-calls)).
```cpp
json Stats_Tool_Handler::handle_flush_query_log(const json& arguments) {
    std::string destination = arguments.value("destination", "memory");
    if (destination != "memory" && destination != "disk" && destination != "both") {
        return create_error_response("Invalid destination");
    }
    if (!GloMyLogger || !GloAdmin) {
        return create_error_response("Required components not available");
    }
    SQLite3DB* statsdb = nullptr;
    SQLite3DB* statsdb_disk = nullptr;
    if (destination == "memory" || destination == "both") {
        statsdb = GloAdmin->statsdb;
    }
    if (destination == "disk" || destination == "both") {
        statsdb_disk = GloAdmin->statsdb_disk;
    }
    int events_flushed = GloMyLogger->processEvents(statsdb, statsdb_disk);
    json result;
    result["events_flushed"] = events_flushed;
    result["destination"] = destination;
    return create_success_response(result);
}
```
### 3.2 Query Digest Flow
Query digest statistics are maintained in an in-memory hash map, not SQLite.
```text
┌─────────────────────────────────────────────────────────────────┐
│ Query Completes │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Query_Processor::update_query_digest() │
│ Updates digest_umap (hash map in memory) │
│ Aggregates: count_star, sum_time, min/max, rows │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────┴─────────────────────┐
│ │
SELECT query SAVE TO DISK
(non-destructive) (destructive)
│ │
▼ ▼
┌─────────────────────────┐ ┌─────────────────────────────┐
│ get_query_digests_v2() │ │ FlushDigestTableToDisk() │
│ - Swap map with empty │ │ - get_query_digests_reset() │
│ - Serialize to SQLite │ │ - Atomic swap (empties map) │
│ - Merge back │ │ - Write to history table │
│ - Data preserved │ │ - Delete swapped data │
└─────────────────────────┘ │ - Map starts fresh │
└─────────────────────────────┘
```
**Key Implementation Notes:**
1. **Reading live data (`show_queries`):** Non-destructive. ProxySQL handles the swap-serialize-merge internally when you query `stats_mysql_query_digest`.
2. **Saving to history (`flush_queries`):** Destructive. The live map is emptied. This tool calls `FlushDigestTableToDisk()` directly (see [Section 1.3](#13-admin-commands-vs-direct-function-calls)).
```cpp
json Stats_Tool_Handler::handle_flush_queries(const json& arguments) {
    std::string db_type = arguments.value("db_type", "mysql");
    if (db_type != "mysql" && db_type != "pgsql") {
        return create_error_response("Invalid db_type");
    }
    if (!GloAdmin || !GloAdmin->statsdb_disk) {
        return create_error_response("Stats disk database not available");
    }
    int digests_saved;
    if (db_type == "mysql") {
        digests_saved = GloAdmin->FlushDigestTableToDisk<SERVER_TYPE_MYSQL>(GloAdmin->statsdb_disk);
    } else {
        digests_saved = GloAdmin->FlushDigestTableToDisk<SERVER_TYPE_PGSQL>(GloAdmin->statsdb_disk);
    }
    json result;
    result["db_type"] = db_type;
    result["digests_saved"] = digests_saved;
    result["dump_time"] = (long long)time(NULL);
    return create_success_response(result);
}
```
### 3.3 Historical Tables Flow
Historical tables are populated by periodic timers and aggregated into hourly tables.
```text
┌─────────────────────────────────────────────────────────────────┐
│ Admin Thread Timer Check │
│ (every poll cycle) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ *_timetoget(curtime) returns true? │
│ (checks if interval has elapsed) │
└─────────────────────────────────────────────────────────────────┘
│ yes
┌─────────────────────────────────────────────────────────────────┐
│ Collect current metrics │
│ (e.g., system_cpu from times()) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ INSERT INTO raw table │
│ (e.g., system_cpu) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ Check if hourly aggregation needed │
│ (current time >= last_hour_entry + 3600) │
└─────────────────────────────────────────────────────────────────┘
│ yes
┌─────────────────────────────────────────────────────────────────┐
│ INSERT INTO *_hour SELECT ... GROUP BY │
│ (aggregation: SUM/AVG/MAX depending on column) │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│ DELETE old data │
│ - Raw: older than 7 days │
│ - Hourly: older than 365 days │
└─────────────────────────────────────────────────────────────────┘
```
---
## 4. Interval-to-Table Resolution
Historical tools accept user-friendly interval parameters and automatically select the appropriate table.
### 4.1 Interval Mapping
| User Interval | Seconds | Table Type | Rationale |
|---|---|---|---|
| `30m` | 1800 | Raw | Fine-grained, small dataset |
| `1h` | 3600 | Raw | Fine-grained, small dataset |
| `2h` | 7200 | Raw | Fine-grained, moderate dataset |
| `4h` | 14400 | Raw | Raw data still manageable |
| `6h` | 21600 | Raw | Raw data still manageable |
| `8h` | 28800 | Hourly | Hourly aggregation preferred |
| `12h` | 43200 | Hourly | Hourly aggregation preferred |
| `1d` | 86400 | Hourly | Raw would have ~1440 rows, hourly has 24 |
| `3d` | 259200 | Hourly | Hourly aggregation more efficient |
| `7d` | 604800 | Hourly | Raw data may not exist (7-day retention) |
| `30d` | 2592000 | Hourly | Raw data doesn't exist this far back |
| `90d` | 7776000 | Hourly | Raw data doesn't exist this far back |
### 4.2 Implementation
```cpp
struct IntervalConfig {
    int seconds;
    bool use_hourly;
};

std::map<std::string, IntervalConfig> interval_map = {
    {"30m", {1800, false}},
    {"1h", {3600, false}},
    {"2h", {7200, false}},
    {"4h", {14400, false}},
    {"6h", {21600, false}},
    {"8h", {28800, true}},
    {"12h", {43200, true}},
    {"1d", {86400, true}},
    {"3d", {259200, true}},
    {"7d", {604800, true}},
    {"30d", {2592000, true}},
    {"90d", {7776000, true}}
};

std::string get_table_name(const std::string& base_table, const std::string& interval) {
    auto it = interval_map.find(interval);
    if (it == interval_map.end()) {
        return base_table; // Default to raw
    }
    if (it->second.use_hourly) {
        return base_table + "_hour";
    }
    return base_table;
}

std::string build_time_range_query(const std::string& table, int seconds) {
    time_t now = time(NULL);
    time_t start = now - seconds;
    return "SELECT * FROM " + table +
           " WHERE timestamp BETWEEN " + std::to_string(start) +
           " AND " + std::to_string(now) +
           " ORDER BY timestamp";
}
```
---
## 5. Tool Implementation Details
### 5.1 show_status
**Source Tables:**
- MySQL: `stats.stats_mysql_global`, `stats.stats_memory_metrics`
- PostgreSQL: `stats.stats_pgsql_global`, `stats.stats_memory_metrics`
**Category Mapping:**
```cpp
std::map<std::string, std::vector<std::string>> category_prefixes = {
    {"connections", {"Client_Connections_", "Server_Connections_", "Active_Transactions"}},
    {"queries", {"Questions", "Slow_queries", "GTID_", "Queries_", "Query_Processor_", "Backend_query_time_"}},
    {"commands", {"Com_"}},
    {"pool_ops", {"ConnPool_", "MyHGM_"}},
    {"monitor", {"MySQL_Monitor_", "PgSQL_Monitor_"}},
    {"query_cache", {"Query_Cache_"}},
    {"prepared_stmts", {"Stmt_"}},
    {"security", {"automatic_detected_sql_injection", "ai_", "mysql_whitelisted_"}},
    {"memory", {"_buffers_bytes", "_internal_bytes", "SQLite3_memory_bytes", "ConnPool_memory_bytes",
                "jemalloc_", "Auth_memory", "query_digest_memory", "query_rules_memory",
                "prepare_statement_", "firewall_", "stack_memory_"}},
    {"errors", {"generated_error_packets", "Access_Denied_", "client_host_error_", "mysql_unexpected_"}},
    {"logger", {"MySQL_Logger_"}},
    {"system", {"ProxySQL_Uptime", "MySQL_Thread_Workers", "PgSQL_Thread_Workers",
                "Servers_table_version", "mysql_listener_paused", "pgsql_listener_paused", "OpenSSL_"}},
    {"mirror", {"Mirror_"}}
};
```
**SQL Query:**
```sql
-- For category filter
SELECT Variable_Name, Variable_Value
FROM stats.stats_mysql_global
WHERE Variable_Name LIKE 'Client_Connections_%'
OR Variable_Name LIKE 'Server_Connections_%'
OR Variable_Name = 'Active_Transactions';
-- For variable_name filter (using LIKE)
SELECT Variable_Name, Variable_Value
FROM stats.stats_mysql_global
WHERE Variable_Name LIKE ?;
-- Also query memory_metrics for 'memory' category
SELECT Variable_Name, Variable_Value
FROM stats.stats_memory_metrics;
```
**Description Lookup:**
Maintain a static map of variable descriptions:
```cpp
std::map<std::string, std::string> variable_descriptions = {
    {"Client_Connections_connected", "Currently connected clients"},
    {"Client_Connections_created", "Total client connections ever created"},
    {"Questions", "Total queries processed"},
    // ... etc
};
```
### 5.2 show_processlist
**Source Tables:**
- MySQL: `stats.stats_mysql_processlist`
- PostgreSQL: `stats.stats_pgsql_processlist`
**SQL Query:**
```sql
SELECT ThreadID, SessionID, user, db, cli_host, cli_port,
hostgroup, l_srv_host, l_srv_port, srv_host, srv_port,
command, time_ms, info, status_flags, extended_info
FROM stats.stats_mysql_processlist
WHERE (user = ? OR ? IS NULL)
AND (hostgroup = ? OR ? IS NULL)
AND (time_ms >= ? OR ? IS NULL)
ORDER BY time_ms DESC
LIMIT ? OFFSET ?;
```
**Note:** The `l_srv_host` and `l_srv_port` columns represent the local ProxySQL interface, while `srv_host` and `srv_port` represent the backend server.
**Summary Aggregation:**
```cpp
json build_summary(const json& sessions) {
    std::map<std::string, int> by_user, by_hostgroup, by_command;
    for (const auto& session : sessions) {
        by_user[session["user"].get<std::string>()]++;
        by_hostgroup[std::to_string(session["hostgroup"].get<int>())]++;
        by_command[session["command"].get<std::string>()]++;
    }
    json summary;
    summary["by_user"] = by_user;
    summary["by_hostgroup"] = by_hostgroup;
    summary["by_command"] = by_command;
    return summary;
}
```
### 5.3 show_queries
**Source Tables:**
- MySQL: `stats.stats_mysql_query_digest` (uses `schemaname` column)
- PostgreSQL: `stats.stats_pgsql_query_digest` (uses `database` column)
**SQL Query (MySQL):**
```sql
SELECT hostgroup, schemaname AS database, username, client_address, digest,
digest_text, count_star, first_seen, last_seen,
sum_time AS sum_time_us, min_time AS min_time_us, max_time AS max_time_us,
sum_rows_affected, sum_rows_sent
FROM stats.stats_mysql_query_digest
WHERE (count_star >= ? OR ? IS NULL)
AND (hostgroup = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (schemaname = ? OR ? IS NULL) -- database parameter maps to schemaname column
AND (digest = ? OR ? IS NULL)
AND (sum_time / count_star >= ? OR ? IS NULL)
ORDER BY count_star DESC
LIMIT ? OFFSET ?;
```
**SQL Query (PostgreSQL):**
```sql
SELECT hostgroup, database, username, client_address, digest,
digest_text, count_star, first_seen, last_seen,
sum_time AS sum_time_us, min_time AS min_time_us, max_time AS max_time_us,
sum_rows_affected, sum_rows_sent
FROM stats.stats_pgsql_query_digest
WHERE (count_star >= ? OR ? IS NULL)
AND (hostgroup = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (database = ? OR ? IS NULL) -- database parameter maps to database column
AND (digest = ? OR ? IS NULL)
AND (sum_time / count_star >= ? OR ? IS NULL)
ORDER BY count_star DESC
LIMIT ? OFFSET ?;
```
**Calculated Fields:**
```cpp
for (auto& query : queries) {
    long long count = query["count_star"].get<long long>();
    // sum_time is in microseconds and can exceed 32-bit range on busy digests
    long long sum_time = query["sum_time_us"].get<long long>();
    query["avg_time_us"] = count > 0 ? sum_time / count : 0;
}
```
### 5.4 show_commands
**Source Tables:**
- MySQL: `stats.stats_mysql_commands_counters`
- PostgreSQL: `stats.stats_pgsql_commands_counters`
**SQL Query:**
```sql
SELECT Command, Total_Time_us, Total_cnt,
cnt_100us, cnt_500us, cnt_1ms, cnt_5ms, cnt_10ms, cnt_50ms,
cnt_100ms, cnt_500ms, cnt_1s, cnt_5s, cnt_10s, cnt_INFs
FROM stats.stats_mysql_commands_counters
WHERE Command = ? OR ? IS NULL;
```
**Percentile Calculation:**
See [Section 6.1](#61-percentile-calculation-from-histograms).
### 5.5 show_connections
**Source Tables:**
- MySQL: `stats.stats_mysql_connection_pool`, `stats.stats_mysql_free_connections`
- PostgreSQL: `stats.stats_pgsql_connection_pool`, `stats.stats_pgsql_free_connections`
**SQL Query (main):**
```sql
SELECT hostgroup, srv_host, srv_port, status,
ConnUsed, ConnFree, ConnOK, ConnERR, MaxConnUsed,
Queries, Queries_GTID_sync, Bytes_data_sent, Bytes_data_recv, Latency_us
FROM stats.stats_mysql_connection_pool
WHERE (hostgroup = ? OR ? IS NULL)
AND (status = ? OR ? IS NULL)
ORDER BY hostgroup, srv_host, srv_port;
```
**SQL Query (detail - MySQL):**
```sql
SELECT fd, hostgroup, srv_host, srv_port, user, schema AS database,
init_connect, time_zone, sql_mode, autocommit, idle_ms
FROM stats.stats_mysql_free_connections
WHERE (hostgroup = ? OR ? IS NULL);
```
**SQL Query (detail - PostgreSQL):**
```sql
SELECT fd, hostgroup, srv_host, srv_port, user, database,
init_connect, time_zone, sql_mode, idle_ms
FROM stats.stats_pgsql_free_connections
WHERE (hostgroup = ? OR ? IS NULL);
```
**PostgreSQL Notes:**
- The `stats_pgsql_free_connections` table uses `database` column (MySQL uses `schema`)
- The `stats_pgsql_free_connections` table does not have the `autocommit` column
- The `stats_pgsql_connection_pool` table does not have the `Queries_GTID_sync` column
**Calculated Fields:**
```cpp
for (auto& server : servers) {
    int used = server["conn_used"].get<int>();
    int free = server["conn_free"].get<int>();
    int total = used + free;
    server["utilization_pct"] = total > 0 ? (double)used / total * 100 : 0;
    int ok = server["conn_ok"].get<int>();
    int err = server["conn_err"].get<int>();
    int total_conns = ok + err;
    server["error_rate"] = total_conns > 0 ? (double)err / total_conns : 0;
}
```
### 5.6 show_errors
**Source Tables:**
- MySQL: `stats.stats_mysql_errors` (uses `schemaname` column, `errno` for error codes)
- PostgreSQL: `stats.stats_pgsql_errors` (uses `database` column, `sqlstate` for error codes)
**SQL Query (MySQL):**
```sql
SELECT hostgroup, hostname, port, username, client_address,
schemaname AS database, errno, count_star, first_seen, last_seen, last_error
FROM stats.stats_mysql_errors
WHERE (count_star >= ? OR ? IS NULL)
AND (errno = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (schemaname = ? OR ? IS NULL) -- database parameter maps to schemaname column
ORDER BY count_star DESC
LIMIT ? OFFSET ?;
```
**SQL Query (PostgreSQL):**
```sql
SELECT hostgroup, hostname, port, username, client_address,
database, sqlstate, count_star, first_seen, last_seen, last_error
FROM stats.stats_pgsql_errors
WHERE (count_star >= ? OR ? IS NULL)
AND (sqlstate = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (database = ? OR ? IS NULL) -- database parameter maps to database column
ORDER BY count_star DESC
LIMIT ? OFFSET ?;
```
**Note:** The tool normalizes to `database` field name in responses for consistency across both databases. Error codes use `errno` for MySQL and `sqlstate` for PostgreSQL as these are fundamentally different concepts.
**Calculated Fields:**
```cpp
for (auto& error : errors) {
    int count = error["count_star"].get<int>();
    int first = error["first_seen"].get<int>();
    int last = error["last_seen"].get<int>();
    double hours = (last - first) / 3600.0;
    error["frequency_per_hour"] = hours > 0 ? count / hours : count;
}
```
### 5.7 show_cluster
**Source Tables (shared):**
- `stats.stats_proxysql_servers_status`
- `stats.stats_proxysql_servers_metrics`
- `stats.stats_proxysql_servers_checksums`
**SQL Queries:**
```sql
-- Node status
SELECT hostname, port, weight, master, global_version,
check_age_us, ping_time_us, checks_OK, checks_ERR
FROM stats.stats_proxysql_servers_status
WHERE hostname = ? OR ? IS NULL;
-- Node metrics
SELECT hostname, port, weight, response_time_ms, Uptime_s,
last_check_ms, Queries, Client_Connections_connected, Client_Connections_created
FROM stats.stats_proxysql_servers_metrics;
-- Configuration checksums
SELECT hostname, port, name, version, epoch, checksum,
changed_at, updated_at, diff_check
FROM stats.stats_proxysql_servers_checksums;
```
**Health Calculation:**
```cpp
std::string calculate_cluster_health(const json& nodes) {
    int total = nodes.size();
    int healthy = 0;
    for (const auto& node : nodes) {
        int ok = node["checks_ok"].get<int>();
        int err = node["checks_err"].get<int>();
        double success_rate = (ok + err) > 0 ? (double)ok / (ok + err) : 0;
        if (success_rate >= 0.95) healthy++;
    }
    if (healthy == total) return "healthy";
    if (healthy >= total / 2) return "degraded";
    return "unhealthy";
}
```
### 5.8 show_connection_history
**Source Tables:**
- Global: `mysql_connections`, `mysql_connections_hour`, `myhgm_connections`, `myhgm_connections_hour`
- Per-server: `history_stats_mysql_connection_pool`
**SQL Queries:**
```sql
-- Global connections (raw)
SELECT timestamp, Client_Connections_aborted, Client_Connections_connected,
Client_Connections_created, Server_Connections_aborted, Server_Connections_connected,
Server_Connections_created, ConnPool_get_conn_failure, ConnPool_get_conn_immediate,
ConnPool_get_conn_success, Questions, Slow_queries, GTID_consistent_queries
FROM mysql_connections
WHERE timestamp BETWEEN ? AND ?
ORDER BY timestamp;
-- Global connections (hourly)
SELECT timestamp, Client_Connections_aborted, Client_Connections_connected,
Client_Connections_created, Server_Connections_aborted, Server_Connections_connected,
Server_Connections_created, ConnPool_get_conn_failure, ConnPool_get_conn_immediate,
ConnPool_get_conn_success, Questions, Slow_queries, GTID_consistent_queries
FROM mysql_connections_hour
WHERE timestamp BETWEEN ? AND ?
ORDER BY timestamp;
-- MyHGM connections (raw)
SELECT timestamp, MyHGM_myconnpoll_destroy, MyHGM_myconnpoll_get,
MyHGM_myconnpoll_get_ok, MyHGM_myconnpoll_push, MyHGM_myconnpoll_reset
FROM myhgm_connections
WHERE timestamp BETWEEN ? AND ?
ORDER BY timestamp;
-- Per-server history
SELECT timestamp, hostgroup, srv_host, srv_port, status,
ConnUsed, ConnFree, ConnOK, ConnERR, MaxConnUsed,
Queries, Queries_GTID_sync, Bytes_data_sent, Bytes_data_recv, Latency_us
FROM history_stats_mysql_connection_pool
WHERE timestamp BETWEEN ? AND ?
AND (hostgroup = ? OR ? IS NULL)
ORDER BY timestamp, hostgroup, srv_host;
```
### 5.9 show_query_history
**Source Tables:**
- MySQL: `history_mysql_query_digest` (uses `schemaname` column)
- PostgreSQL: `history_pgsql_query_digest` (uses `schemaname` column)
**Note:** Both MySQL and PostgreSQL history tables use `schemaname` column. This differs from the live `stats_pgsql_query_digest` table which uses `database`. The tool normalizes to `database` in responses.
**SQL Query (MySQL):**
```sql
SELECT dump_time, hostgroup, schemaname AS database, username, client_address,
digest, digest_text, count_star, first_seen, last_seen,
sum_time AS sum_time_us, min_time AS min_time_us, max_time AS max_time_us,
sum_rows_affected, sum_rows_sent
FROM history_mysql_query_digest
WHERE (dump_time = ? OR ? IS NULL)
AND (dump_time >= ? OR ? IS NULL)
AND (dump_time <= ? OR ? IS NULL)
AND (digest = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (schemaname = ? OR ? IS NULL) -- database parameter maps to schemaname column
ORDER BY dump_time DESC, count_star DESC
LIMIT ? OFFSET ?;
```
**SQL Query (PostgreSQL):**
```sql
-- Note: history_pgsql_query_digest uses 'schemaname' (unlike live stats_pgsql_query_digest which uses 'database')
SELECT dump_time, hostgroup, schemaname AS database, username, client_address,
digest, digest_text, count_star, first_seen, last_seen,
sum_time AS sum_time_us, min_time AS min_time_us, max_time AS max_time_us,
sum_rows_affected, sum_rows_sent
FROM history_pgsql_query_digest
WHERE (dump_time = ? OR ? IS NULL)
AND (dump_time >= ? OR ? IS NULL)
AND (dump_time <= ? OR ? IS NULL)
AND (digest = ? OR ? IS NULL)
AND (username = ? OR ? IS NULL)
AND (schemaname = ? OR ? IS NULL) -- database parameter maps to schemaname column
ORDER BY dump_time DESC, count_star DESC
LIMIT ? OFFSET ?;
```
**Grouping by Snapshot:**
```cpp
json group_by_snapshot(SQLite3_result* resultset) {
    std::map<int, json> snapshots;
    for (size_t i = 0; i < resultset->rows_count; i++) {
        SQLite3_row* row = resultset->rows[i];
        int dump_time = atoi(row->fields[0]);
        if (snapshots.find(dump_time) == snapshots.end()) {
            snapshots[dump_time] = json::array();
        }
        snapshots[dump_time].push_back(row_to_json(row));
    }
    json result = json::array();
    for (const auto& [dump_time, queries] : snapshots) {
        json snapshot;
        snapshot["dump_time"] = dump_time;
        snapshot["queries"] = queries;
        result.push_back(snapshot);
    }
    return result;
}
```
### 5.10 show_query_log
**Source Tables:**
- Memory: `stats.stats_mysql_query_events`
- Disk: `history_mysql_query_events`
**Note:** This tool is MySQL-only. The `id` column is used internally for row management and is not exposed in the response.
**SQL Query:**
```sql
SELECT thread_id, username, schemaname AS database, start_time, end_time,
query_digest, query, server, client, event_type, hid,
extra_info, affected_rows, last_insert_id, rows_sent,
client_stmt_id, gtid, errno, error
FROM stats.stats_mysql_query_events -- or history_mysql_query_events for disk
WHERE (username = ? OR ? IS NULL)
AND (schemaname = ? OR ? IS NULL) -- database parameter maps to schemaname column
AND (query_digest = ? OR ? IS NULL)
AND (server = ? OR ? IS NULL)
AND (errno = ? OR ? IS NULL)
AND (errno != 0 OR ? = 0) -- errors_only filter
AND (start_time >= ? OR ? IS NULL)
AND (start_time <= ? OR ? IS NULL)
ORDER BY start_time DESC
LIMIT ? OFFSET ?;
```
---
## 6. Helper Functions
### 6.1 Percentile Calculation from Histograms
The `stats_mysql_commands_counters` table provides latency histograms. To calculate percentiles:
```cpp
struct HistogramBucket {
    int threshold_us;
    int count;
};

std::vector<int> bucket_thresholds = {
    100, 500, 1000, 5000, 10000, 50000, 100000, 500000, 1000000, 5000000, 10000000, INT_MAX
};

int calculate_percentile(const std::vector<int>& bucket_counts, double percentile) {
    if (bucket_counts.empty() || bucket_thresholds.empty()) {
        return 0;
    }
    if (percentile < 0.0) {
        percentile = 0.0;
    } else if (percentile > 1.0) {
        percentile = 1.0;
    }
    long long total = 0;
    for (int count : bucket_counts) {
        if (count > 0) {
            total += count;
        }
    }
    if (total == 0) {
        return 0;
    }
    if (percentile == 0.0) {
        for (size_t i = 0; i < bucket_counts.size() && i < bucket_thresholds.size(); i++) {
            if (bucket_counts[i] > 0) {
                return bucket_thresholds[i];
            }
        }
        return 0;
    }
    long long target = std::ceil(total * percentile);
    if (target < 1) target = 1;
    long long cumulative = 0;
    for (size_t i = 0; i < bucket_counts.size() && i < bucket_thresholds.size(); i++) {
        if (bucket_counts[i] > 0) {
            cumulative += bucket_counts[i];
        }
        if (cumulative >= target) {
            return bucket_thresholds[i];
        }
    }
    return bucket_thresholds.empty() ? 0 : bucket_thresholds.back();
}

json calculate_percentiles(SQLite3_row* row) {
    std::vector<int> counts = {
        atoi(row->fields[3]),  // cnt_100us
        atoi(row->fields[4]),  // cnt_500us
        atoi(row->fields[5]),  // cnt_1ms
        atoi(row->fields[6]),  // cnt_5ms
        atoi(row->fields[7]),  // cnt_10ms
        atoi(row->fields[8]),  // cnt_50ms
        atoi(row->fields[9]),  // cnt_100ms
        atoi(row->fields[10]), // cnt_500ms
        atoi(row->fields[11]), // cnt_1s
        atoi(row->fields[12]), // cnt_5s
        atoi(row->fields[13]), // cnt_10s
        atoi(row->fields[14])  // cnt_INFs
    };
    json percentiles;
    percentiles["p50_us"] = calculate_percentile(counts, 0.50);
    percentiles["p90_us"] = calculate_percentile(counts, 0.90);
    percentiles["p95_us"] = calculate_percentile(counts, 0.95);
    percentiles["p99_us"] = calculate_percentile(counts, 0.99);
    return percentiles;
}
```
### 6.2 SQLite Result to JSON Conversion
```cpp
bool is_numeric(const char* str);  // forward declaration; defined below

json resultset_to_json(SQLite3_result* resultset, int cols) {
	json rows = json::array();
	if (!resultset || resultset->rows_count == 0) {
		return rows;
	}
	for (int i = 0; i < resultset->rows_count; i++) {
SQLite3_row* row = resultset->rows[i];
json obj;
for (int j = 0; j < cols; j++) {
const char* field = row->fields[j];
const char* column = resultset->column_definition[j]->name;
if (field == nullptr) {
obj[column] = nullptr;
} else if (is_numeric(field)) {
// Try to parse as integer first, then as double
char* endptr;
long long ll = strtoll(field, &endptr, 10);
if (*endptr == '\0') {
obj[column] = ll;
} else {
obj[column] = std::stod(field);
}
} else {
obj[column] = field;
}
}
rows.push_back(obj);
}
return rows;
}
// Caveat: strtod() also accepts strings such as "inf" and "nan"; this is
// acceptable here because stats tables emit plain integer/decimal values.
bool is_numeric(const char* str) {
	if (str == nullptr || *str == '\0') return false;
char* endptr;
strtod(str, &endptr);
return *endptr == '\0';
}
```
### 6.3 Time Range Builder
```cpp
// interval_map (interval string -> interval info, including .seconds)
// is defined in Section 4, "Interval-to-Table Resolution".
std::pair<time_t, time_t> get_time_range(const std::string& interval) {
	auto it = interval_map.find(interval);
if (it == interval_map.end()) {
throw std::invalid_argument("Invalid interval: " + interval);
}
time_t now = time(NULL);
time_t start = now - it->second.seconds;
return {start, now};
}
```
---
## 7. Error Handling Patterns
### 7.1 Standard Error Response
```cpp
json create_error_response(const std::string& message) {
json response;
response["success"] = false;
response["error"] = message;
return response;
}
json create_success_response(const json& result) {
json response;
response["success"] = true;
response["result"] = result;
return response;
}
```
### 7.2 Common Error Scenarios
**Database Query Failure:**
```cpp
if (rc != SQLITE_OK) {
std::string err_msg = error ? error : "Query execution failed";
if (error) free(error);
return create_error_response(err_msg);
}
```
**Invalid Parameters:**
```cpp
if (!arguments.contains("required_param")) {
	return create_error_response("Missing required parameter: required_param");
}
std::string value = arguments["required_param"];
if (!is_valid_value(value)) {
	return create_error_response("Invalid value for parameter 'required_param': " + value);
}
```
**PostgreSQL Not Supported:**
```cpp
std::string db_type = arguments.value("db_type", "mysql");
if (db_type == "pgsql") {
return create_error_response("PostgreSQL is not supported for this tool. Historical connection data is only available for MySQL.");
}
```
**Empty Result Set:**
```cpp
if (!resultset || resultset->rows_count == 0) {
json result;
result["message"] = "No data found";
result["data"] = json::array();
return create_success_response(result);
}
```
---
## 8. Testing Strategies
### 8.1 Unit Tests
Test each handler function independently:
```cpp
TEST(StatsToolHandler, ShowStatus) {
Stats_Tool_Handler handler(GloMCPH);
handler.init();
json args;
args["db_type"] = "mysql";
args["category"] = "connections";
json response = handler.execute_tool("show_status", args);
ASSERT_TRUE(response["success"].get<bool>());
ASSERT_TRUE(response["result"].contains("variables"));
ASSERT_GT(response["result"]["variables"].size(), 0);
}
TEST(StatsToolHandler, ShowStatusWithVariableFilter) {
Stats_Tool_Handler handler(GloMCPH);
handler.init();
json args;
args["db_type"] = "mysql";
args["variable_name"] = "Client_Connections_%";
json response = handler.execute_tool("show_status", args);
ASSERT_TRUE(response["success"].get<bool>());
for (const auto& var : response["result"]["variables"]) {
std::string name = var["variable_name"].get<std::string>();
ASSERT_TRUE(name.find("Client_Connections_") == 0);
}
}
```
### 8.2 Integration Tests
Test with actual ProxySQL instance:
```bash
# Start ProxySQL with test configuration
proxysql -f -c test_proxysql.cnf &
# Generate some traffic
mysql -h 127.0.0.1 -P6033 -utest -ptest -e "SELECT 1"
# Test MCP endpoint
curl -X POST http://localhost:6071/mcp/stats \
-H "Content-Type: application/json" \
-H "Authorization: Bearer test-token" \
-d '{
"jsonrpc": "2.0",
"method": "tools/call",
"params": {
"name": "show_queries",
"arguments": {"db_type": "mysql", "limit": 10}
},
"id": 1
}'
# Verify response structure
# ...
```
### 8.3 Test Data Setup
```sql
-- Populate test data via admin interface
-- (Note: Most stats tables are read-only and populated by ProxySQL internally)
-- For testing historical tables, wait for timer-based collection
-- or manually trigger collection via internal mechanisms
-- For testing query events, generate queries and then flush
SELECT 1;
SELECT 2;
-- Admin: DUMP EVENTSLOG FROM BUFFER TO MEMORY;
```
### 8.4 Edge Cases to Test
1. **Empty tables** — Ensure graceful handling when no data exists
2. **Large result sets** — Test with limit parameter, verify truncation
3. **Invalid parameters** — Test error responses for bad input
4. **PostgreSQL fallback** — Test error messages for unsupported PostgreSQL operations
5. **Time range boundaries** — Test historical queries at retention boundaries (7 days, 365 days)
6. **Concurrent access** — Test behavior under concurrent tool calls