Replace direct stats-schema reads in MCP show_processlist with in-memory processlist access via SQL3_Processlist query options.
Add processlist query option types in proxysql_admin interfaces and implement filtering, sorting, and pagination post-processing in both MySQL and PgSQL thread implementations.
Introduce configurable MCP cap mcp_stats_show_processlist_max_rows (default 200, max 1000), wire it through MCP variable loading, and expose it in default config.
Expand Stats_Tool_Handler processlist handling to parse filter/sort arguments, enforce cap metadata, and return consistent MCP payload fields from in-memory snapshots.
Add extensive doxygen documentation in the touched headers and source blocks to describe API contracts, filtering behavior, and runtime constraints.
Add new TAP test mcp_pgsql_concurrency_stress-t that generates sustained concurrent PgSQL traffic (simple reads, read/write table workload, randomized pg_sleep) while polling MCP show_processlist and show_queries in parallel with filter and ordering assertions.
Validation performed locally: mcp_pgsql_concurrency_stress-t (31 assertions) and mcp_show_queries_topk-t (12 assertions).
- Finalized MySQL and PostgreSQL protocol state machines.
- Implemented accurate affected_rows and rows_sent capture.
- Added defensive null checks to prevent early-session crashes.
- Enhanced TAP tests with result-set draining and success verification.
- Verified 100% pass rate for MySQL CRUD, binary protocol, and memory bypass.
Replace stats table reads in show_queries with Query Processor in-memory aggregation to avoid stale/empty admin-db snapshots.\n\nIntroduce Top-K digest filtering primitives and wire them into Stats_Tool_Handler so filtering/sorting/limit are executed against live digest memory.
Add runtime MCP variable mcp-stats_show_queries_max_rows with validation, defaulting, and configuration plumbing, plus cap metadata in the tool response.\n\nEnforce configurable hard cap semantics while preserving deterministic ordering and pagination behavior for MCP clients.
Add internal DEBUG validator path (PROXYSQLTEST 56) in admin tests to stress generated digest data and validate the Top-K/filter pipeline against in-memory state.\n\nKeep the validator callable from startup/testing workflows without depending on MCP linkage.
Add TAP coverage in test/tap/tests/mcp_show_queries_topk-t.cpp for:\n- MCP endpoint setup and reachability\n- digest generation + optional DEBUG internal validator\n- payload validation and cap metadata assertions\n- descending count order validation\n- match_digest_text filter verification\n\nMake payload parsing backward-compatible in TAP (direct tool result object and legacy wrapped success/result payloads).
Problem
MCP stats tools were querying stats.* tables directly through admindb.
Unlike admin-session SQL, that path bypassed GenericRefreshStatistics() and could
return stale or empty data for runtime-populated tables.
What changed
- Updated Stats_Tool_Handler::execute_admin_query() to mirror admin-session
semantics for stats reads:
- acquire GloAdmin->sql_query_global_mutex
- optionally invoke ProxySQL_Admin::GenericRefreshStatistics(sql, ..., false)
- execute admindb statement
- release sql_query_global_mutex
- Added strict input validation and explicit lock/unlock error reporting.
- Added refresh_before_query parameter (default true) to avoid duplicate refresh
passes for secondary count queries.
- Switched secondary COUNT(*) calls in show_processlist/show_queries/
show_errors/show_query_rules to refresh_before_query=false.
Documentation
- Expanded Doxygen in include/Stats_Tool_Handler.h for execute_admin_query()
to document locking, refresh behavior, and performance tradeoff.
- Added detailed Doxygen above the execute_admin_query() implementation in
lib/Stats_Tool_Handler.cpp.
Testing
- Added TAP integration test test/tap/tests/mcp_stats_refresh-t.cpp.
- The test injects a synthetic stale marker row into stats.stats_mysql_global,
calls /mcp/stats show_status, and verifies the marker disappears after
refresh-before-read.
- Added helper-level Doxygen in the test for setup/parsing behavior.
Build verification
- Compiled lib object: make -C lib obj/Stats_Tool_Handler.oo
- Compiled TAP target: make -C test/tap/tests mcp_stats_refresh-t
- Direct runtime execution of the TAP test in this workspace could not fully run
because no local admin listener was available (connection to 127.0.0.1:6032 failed).
- Resolved signature mismatch in report_query_stats
- Fixed missing type declarations for Query_Processor in FFTO classes
- Verified Metric Parity for affected_rows and rows_sent
- Confirmed full coverage in TAP tests for text/binary protocols and bypass logic
- Verified clean build with make debug
Address two minor correctness/style follow-ups from final review:
1) `PgSQL_Event::set_error()` overwrite safety
- Free previous `sqlstate` / `errmsg` buffers when either ownership mode is active
(`free_on_delete` or `free_error_on_delete`) before replacing pointers.
- This avoids latent leaks if `set_error()` is called on deep-copied events.
2) Explicit atomic access for circular-buffer capacity
- Keep `buffer_size` atomic and use explicit `load/store` operations with
`std::memory_order_relaxed` in hot-path and helper methods.
- Update log-path check, insert path, getter, and setter to use explicit atomic
operations for clearer thread-safety semantics.
- Implemented MySQL OK_Packet parsing to extract affected_rows
- Implemented PostgreSQL CommandComplete parsing to extract rows affected/sent
- Updated TAP tests to validate sum_rows_affected and sum_rows_sent metrics
- Fixed incorrect assertion in test_ffto_mysql-t.cpp
- Introduce proxysql_query_logger_logged_queries_total with protocol label
(mysql, pgsql) for direct query-log accounting via Prometheus.
- Wire MySQL and PgSQL loggers to increment/export the counter and hook
PgSQL logger metrics into admin scrape updates.
- Add reg_test_5389-flush_logs_no_drop-t.cpp to stress concurrent
client traffic + PROXYSQL FLUSH LOGS and assert no dropped logs via
metric delta == executed queries.
- Document the new TAP with doxygen comments for env knobs and flow.
Admin command behavior:
- `DUMP PGSQL EVENTSLOG ...` returned `OK 0` when `GloPgSQL_Logger` was null.
- This masked unavailable logger state from operators.
Change:
- Return an explicit admin error (`PgSQL logger not initialized`) when logger is unavailable.
- Preserve normal `OK` response path when logger is present.
Stats path behavior:
- PgSQL logger metrics insertion into `stats_pgsql_global` used per-row string formatting and `execute()` calls.
Change:
- Replace per-row dynamic SQL construction with a prepared-statement row insert path.
- Reuse `sqlite3_global_stats_row_step()` for consistent binding/reset semantics.
Problem 1:
- `PgSQL_Event::set_error()` stored borrowed pointers from connection-owned strings.
- This could create lifetime hazards before/around event serialization and buffering.
Fix:
- Add explicit ownership tracking for error fields via `free_error_on_delete`.
- Make `set_error()` duplicate incoming SQLSTATE/error strings.
- Free owned error strings in destructor when the event does not own all fields.
- Keep copy-constructor behavior for deep-copied buffered events and disable copy-assignment to avoid accidental shallow assignment.
Problem 2:
- `processEvents()` could repopulate `stats_pgsql_query_events` using the beginning of the drained batch after retention pruning.
- This keeps older rows instead of the most recent rows when truncation is required.
Fix:
- Insert the tail of the drained batch (`events.end() - numEventsToInsert`) so memory table retention always preserves latest events.
- Tighten row retention arithmetic using `size_t` to avoid signed/unsigned mismatch in keep/delete calculations.
Context:
- Admin commands are exposed by one shared Admin module over two protocols:
- MySQL admin endpoint (`6032`)
- PostgreSQL admin endpoint (`6132`)
- `DUMP PGSQL EVENTSLOG ...` must behave identically regardless of which admin protocol executes the command.
Issue fixed:
- PGSQL events dump path used thread-local PGSQL sizing variables when draining to memory (`stats_pgsql_query_events`).
- When command execution happened from a non-PGSQL protocol admin worker, TLS values could be out of sync for PGSQL sizing decisions.
Changes:
- In `lib/PgSQL_Logger.cpp`, add runtime-size helper functions that read canonical values from `GloPTH->variables` with TLS fallback:
- `eventslog_buffer_max_query_length`
- `eventslog_table_memory_size`
- Use these helpers in:
- `PgSQL_Event` copy constructor (buffered query truncation)
- `PgSQL_Logger::processEvents()` (in-memory retention for `stats_pgsql_query_events`)
- Update internal architecture doc to explicitly state command availability from both admin protocol endpoints.
Why:
- Ensure `DUMP PGSQL EVENTSLOG ...` semantics remain consistent across `6032` and `6132`, matching the shared-admin-module design.
Context:
- Bison/Flex-generated parser files in `lib/` can be created locally during parser work.
- These files are generated artifacts and should not be tracked.
Changes:
- Add generated parser artifacts to `.gitignore`:
- `lib/MySQL_Lexer.yy.c`
- `lib/MySQL_Parser.output`
- `lib/MySQL_Parser.tab.c`
- `lib/MySQL_Parser.tab.h`
- Extend `lib/Makefile` `clean` target to remove the same generated files.
Why:
- Prevent accidental inclusion of regenerated parser outputs in feature commits.
- Keep worktrees clean and make `make clean` deterministic for parser-related changes.
- Implemented statement ID tracking for MySQL COM_STMT_PREPARE/EXECUTE
- Implemented Parse/Bind/Execute portal mapping for PostgreSQL Extended Query
- Corrected documentation regarding binary protocol support
- Ensured metric parity for prepared statements in Fast Forward mode
- Added TrafficObserver base interface
- Implemented MySQLFFTO and PgSQLFFTO classes for protocol-aware observation
- Integrated FFTO hooks into MySQL and PostgreSQL session handlers in FAST_FORWARD state
- Added configuration variables: mysql-ffto_enabled, mysql-ffto_max_buffer_size, pgsql-ffto_enabled, pgsql-ffto_max_buffer_size
- Implemented query digest extraction and reporting parity with internal stats
- Added session-level bypass when packet size exceeds buffer threshold
- Included architectural documentation in doc/ffto_design.md
This commit addresses three correctness regressions introduced by
thread-local log buffering in PR #5364 (query logging performance):
1) stale logfile pointer/UAF race during concurrent rotate/close
2) stale logfile-open state after close
3) non-global flush behavior in admin/format-switch paths
What was fixed
- Make `flush_and_rotate()` consume the current logfile pointer under lock
- Signature changed from `std::fstream*` to `std::fstream*&`
- Prevents dereferencing a stale stream pointer captured before lock
acquisition while another thread rotates/closes the file
- Updated declaration/definition and all call sites
- Add explicit synchronization for cross-thread buffer draining
- Added `LogBufferThreadContext::buffer_lock`
- Any path that appends or flushes a thread buffer now locks this mutex
- Guarantees force-flush from admin/config paths cannot race with
worker-thread appends on the same context
- Restore global forced flush semantics where required
- Extended `MySQL_Logger::flush` and `PgSQL_Logger::flush` to
`flush(bool force = false)`
- `force=false`: preserves existing low-overhead worker-loop behavior
(per-thread timeout-based flush)
- `force=true`: snapshots all known thread contexts and drains both
events/audit buffers regardless of timeout
- `flush_log()` now calls `flush(true)` before file rotation, so admin
flush and format-switch operations no longer miss pending thread buffers
- Avoid unintended rotation during forced draining
- In `force=true` path, flush uses `rotate_fn=nullptr`
- Drains buffered payload into the current file first
- `flush_log()` then performs one controlled rotate/open step
- Fix logfile state tracking after close
- `events_close_log_unlocked()` and `audit_close_log_unlocked()` now set
`logfile_open=false` when the stream is closed
- Prevents write paths from treating a closed stream as open
- Remove per-thread-context dependency during metadata header write
- `events_open_log_unlocked()` now uses a local `LogBuffer` for metadata
emission in format=1 instead of reusing a thread context buffer
- Keeps open/rotate path independent from worker context lifecycle
- Keep callers consistent and non-duplicative
- Since `flush_log()` now force-drains internally, removed redundant
explicit `flush()` calls from:
- MySQL/PgSQL `eventslog_format` switch handlers
- `ProxySQL_Admin::flush_logs`
Behavioral outcome
- No stale stream pointer use when close/rotate interleaves with flush
- No false-positive logfile-open state after close
- `FLUSH LOGS` and eventslog format switch now drain all known thread
buffers before rotating, preventing dropped/misplaced buffered records
Validation
- Built modified objects directly
- Ran full debug build with GENAI enabled:
make clean && export PROXYSQLGENAI=1 && make debug -j24
Build completed successfully.
Summary of changes:
- Fixed allocator mismatches (sqlite3_free vs free) across multiple files.
- Converted manual SQL construction to prepared statements for TSDB variables.
- Resolved memory leaks in statistics and monitoring loops.
- Fixed critical 'LOAD TSDB VARIABLES FROM CONFIG' command mismatch.
- Improved command parsing robustness using string searching.
- Standardized SQL LIKE patterns with escaped wildcards (\%).
- Switched to data-driven variable management in ProxySQL_Statistics.
- Improved web dashboard security with URL encoding and const correctness.
- Clarified documentation regarding configuration lifecycle and schema names.
- Updated TAP tests for better reliability and correct persistence verification.
The queries_sent counter in stats.stats_pgsql_connection_pool was not being incremented for prepared statement executions (ASYNC_STMT_EXECUTE_START), only for regular queries (ASYNC_QUERY_START) and statement preparation (ASYNC_STMT_PREPARE_START).
In the extended query protocol flow (Parse/Bind/Execute/Sync messages), queries without matching query rules were incorrectly retaining the current_hostgroup value from the previous query instead of being reset to the default_hostgroup.
This happened because:
- Simple query protocol goes through STATE_SLEEP between queries, which
resets current_hostgroup to default_hostgroup
- Extended query protocol does NOT go through STATE_SLEEP between
Parse/Bind/Execute messages, so current_hostgroup was never reset
Address Gemini Code Assist review feedback on PR #5386:
1. Discovery_Schema::resolve_run_id - Replace string concatenation with
replace_str escaping with proper parameterized queries using
sqlite3_prepare_v2 and sqlite3_bind_* functions.
2. PgSQL_Static_Harvester - Remove unsafe escape_sql_string() function
and replace its usages with:
- New lookup_object_id() helper using parameterized queries
- Parameterized UPDATE statements in harvest_view_definitions()
These changes ensure complete protection against SQL injection for the
catalog database, following the pattern already used in create_run().
This squashed commit consolidates the recent MCP stabilization work across query execution and static-discovery tests.
1) Query endpoint routing/executability now uses Hostgroup Manager runtime snapshots
- File: lib/Query_Tool_Handler.cpp
- Removed dependency on direct SQL probes against runtime_mysql_servers/runtime_pgsql_servers from inside the MCP query handler path.
- Target backend resolution now reuses the same Hostgroup Manager runtime snapshot APIs that feed Admin runtime server visibility:
- MyHGM->dump_table_mysql("mysql_servers")
- PgHGM->dump_table_pgsql("pgsql_servers")
- Non-executable diagnostics now summarize statuses from HGM snapshots instead of ad hoc runtime-table SQL.
2) Backend endpoint selection changed to weighted random
- File: lib/Query_Tool_Handler.cpp
- Selection remains constrained by:
- matching target hostgroup_id
- ONLINE backend status
- Candidate choice now uses random selection with linear probability by weight, replacing deterministic ordering.
3) Read-only validation and guardrail adjustments
- File: lib/Query_Tool_Handler.cpp
- Fixed false negatives for valid read-only queries such as SELECT literals (no FROM clause required).
- Added additional dangerous-query patterns for file-write variants:
- INTO OUTFILE
- INTO DUMPFILE
4) PostgreSQL static harvester schema query fix
- File: lib/PgSQL_Static_Harvester.cpp
- Fixed harvest_schemas() failure on PostgreSQL caused by querying non-existent information_schema.schemata.default_collation_name.
- Replaced with PostgreSQL-safe projection while preserving the insert_schema() column contract.
5) TAP reliability improvements for MCP test phases
- File: test/tap/tests/mcp_rules_testing/test_phase11_pgsql_target.sh
- Isolated phase state by clearing MCP rules before setup.
- Added verbose response prints for T11.1-T11.4.
- Added LOGENTRY phase start/end markers and EXIT trap for guaranteed end watermark.
- Replaced fragile grep/escaping checks with robust string assertions.
- Fixed NULL-concat stats assertions with COALESCE(username,'') and COALESCE(target_id,'').
- File: test/tap/tests/test_mcp_static_harvest-t.sh
- Hardened escaped-JSON parsing in MCP response checks (run_id extraction + key/value assertions).
- Reworked target/protocol/object assertions to avoid transport-format false negatives.
- Corrected cross-target negative assertion to check MCP isError semantics.
Observed result in user verification
- Phase 11 pgsql target/rules/stats checks pass.
- Static harvest phase-A checks pass for mixed mysql/pgsql target_id flow.
This commit intentionally keeps runtime ONLINE-only target executability semantics and improves observability/diagnostic quality for future MCP routing/test failures.
- Refactored variable flushing to use prepared statements for safety.
- Replaced brittle command parsing with robust string searching.
- Extracted TSDB dashboard HTML/JS to separate source file for maintainability.
- Added NULL guards for GloProxyStats in Admin and REST API.
- Implemented batched metric insertions using transactions for performance.
- Refactored backend probing to use non-blocking connect() with timeout.
- Fixed potential integer overflows in timer calculations.
- Improved error handling and fixed memory leaks in statistics reporting.
- Updated TAP tests to verify UI and REST API functionality.
Refactor the MCP query/discovery stack to remove prototype-era MySQL-only assumptions and make discovery/catalog semantics explicitly target-scoped. This commit is intentionally not backward compatible with legacy single-target catalog metadata.
Key implementation details:
- Discovery_Schema:
- runs table now stores target_id, protocol, and server_version
- resolve_run_id() now requires target_id and resolves schemas within target scope
- create_run() now records target/protocol/server version
- added legacy schema detection + destructive catalog table rebuild when old runs layout is found
- fixed resultset lifetime/escaping issues in target-aware run resolution
- New PostgreSQL static harvester:
- added PgSQL_Static_Harvester class and build wiring
- harvests schemas/objects/columns/indexes/fks/view definitions into Discovery_Schema
- maps pg metadata to object_id-based insert APIs used by Discovery_Schema
- Query_Tool_Handler:
- constructor simplified to catalog_path only (no mcp-mysql_* runtime ctor deps)
- discovery.run_static now requires target_id and dispatches protocol-aware harvester at runtime
- catalog.*, agent.run_start, and llm.* tool contracts now require target_id when resolving run_id
- all run_id resolution call sites switched to resolve_run_id(target_id, ...)
- list_schemas catalog query now scoped by target_id via runs join
- digest logging path now resolves run_id in target context
- ProxySQL_MCP_Server:
- updated Query_Tool_Handler construction to new signature
- Docs updated:
- added explicit target_id scoping guidance for discovery/catalog/agent/llm workflows
- clarified protocol-aware routing and noted legacy examples still present in large script docs
Build validation:
- Recompiled changed objects with PROXYSQLGENAI=1:
- Query_Tool_Handler.oo
- Discovery_Schema.oo
- PgSQL_Static_Harvester.oo
- ProxySQL_MCP_Server.oo
Observed confusion:
- MCP failures like 'Error 100' / 'Error 101' were visible at endpoint level but gave no direct clue whether they came from query-rule blocks or backend execution failures.
- Connection/firewall-style failures also lacked enough target/query context in lower-level logs.
Changes in this commit:
1) Log MCP query-rule block/OK matches with rule id and context
- In Discovery_Schema::evaluate_mcp_query_rules():
- when error_msg rule action is applied, log:
rule_id, tool, target_id, schema, error_msg, query preview.
- when OK_msg rule action is applied, log:
rule_id, tool, target_id, schema, ok_msg.
- This makes it explicit which runtime_mcp_query_rules row produced the returned error.
2) Add detailed backend execution error logs in Query_Tool_Handler
- For MySQL and PostgreSQL execution paths (with and without schema switching), failures now log:
- target_id
- schema (when applicable)
- backend error text
- SQL query text
- Added context for PG search_path failures too.
- This disambiguates rule-block errors from actual backend connectivity/authorization/firewall failures.
Behavioral notes:
- API responses are unchanged (to avoid breaking tests expecting exact messages).
- Logging is now sufficiently descriptive to trace each failure to either:
- a specific MCP rule id/action, or
- a concrete backend execution error.
Validation:
- Recompiled successfully:
- lib/obj/Discovery_Schema.oo
- lib/obj/Query_Tool_Handler.oo
Issue:
- MCP variable load path was writing rows into runtime_global_variables.
- For MCP this is incorrect and misleading; runtime_global_variables should not be used as a write target in this flow.
- Logs showed explicit INSERT statements into runtime_global_variables for mcp-* entries.
Fix implemented:
1) Removed runtime_global_variables population from MCP database->runtime flow
- In flush_mcp_variables___database_to_runtime(), removed the block that called:
flush_mcp_variables___runtime_to_database(..., runtime=true, ...)
- LOAD MCP VARIABLES TO RUNTIME now applies mcp-* variables to GloMCPH and reloads MCP server state, without runtime_global_variables writes.
2) Removed runtime_global_variables writes from MCP runtime->database helper
- In flush_mcp_variables___runtime_to_database():
- removed DELETE FROM runtime_global_variables WHERE variable_name LIKE 'mcp-%'
- removed INSERT INTO runtime_global_variables(...) per variable
- Function now persists MCP variables only to global_variables (main/disk paths), which is the intended save behavior.
3) Removed temporary MCP info-level debug noise related to runtime_global_variables inserts
- Deleted MCP info logs that printed per-variable INSERT SQL into runtime_global_variables.
Build validation:
- Recompiled successfully:
- lib/obj/Admin_FlushVariables.oo
Expected result after this commit:
- Executing LOAD MCP VARIABLES TO RUNTIME should no longer emit any INSERT/DELETE against runtime_global_variables for mcp-* variables.
- MCP runtime behavior remains: variables are applied in-memory and MCP server reload logic is triggered via load_mcp_server().
Adds targeted debug context for non-executable MCP targets.
When Query_Tool_Handler reports that a target has no ONLINE backend, it now appends a status summary read from the corresponding runtime server table for that hostgroup:
- MySQL: runtime_mysql_servers
- PostgreSQL: runtime_pgsql_servers
Example diagnostics now include status aggregates such as:
- ONLINE=0, SHUNNED=1
- OFFLINE_SOFT=1
- no rows in runtime_*_servers for hostgroup X
This closes the observability gap where users saw 'target non-executable' but could not immediately tell why the hostgroup failed ONLINE eligibility.
Validation:
- Recompiled lib/obj/Query_Tool_Handler.oo successfully.
This change does two things:
1) Reverts executable-target policy to strict ONLINE backend status
- Query target resolution in Query_Tool_Handler is restored to require UPPER(status)='ONLINE' in runtime_mysql_servers/runtime_pgsql_servers.
- This matches expected semantics: non-ONLINE backends should not be considered executable MCP targets.
2) Replaces generic non-executable errors with actionable diagnostics
- Added Query_Tool_Handler::format_target_unavailable_error(target_id).
- All query-tool paths that previously returned the generic:
'Target is not executable by this handler'
now return a reasoned error with context, including:
- target_id
- protocol
- hostgroup_id
- auth_profile_id
- concrete reason (for example: empty db_username in auth profile, or no ONLINE backend in runtime_*_servers)
- Internal logs in get_connection()/get_pgsql_connection() now emit the same detailed reason.
3) Improves MCP endpoint failure logging with SQL/target context
- In MCP_JSONRPC_Resource::handle_tools_call(), failed tool calls now log:
- endpoint/tool/error
- full arguments payload (existing)
- additional parsed details when present:
target_id, schema, sql (trimmed for safety)
- This makes it explicit what SQL was requested when a tool call fails before execution.
4) Protocol normalization retained for robustness
- MCP_Threads_Handler::load_target_auth_map() lowercases protocol values when loading runtime profile joins.
- This avoids protocol-casing drift causing inconsistent routing behavior.
Build validation:
- Successfully recompiled modified objects:
- lib/obj/Query_Tool_Handler.oo
- lib/obj/MCP_Endpoint.oo
- lib/obj/MCP_Thread.oo
Resulting behavior expected on next test run:
- If a target is non-executable, the response and logs will state exactly why (ONLINE/backend/auth reason), instead of a generic error.
- Failure logs will also include the attempted SQL text (for tool calls that carry sql arguments), clarifying whether a backend query was actually executed or blocked before execution.
Observed failure:
- MCP tools received valid target_id values, but /mcp/query returned 'Target is not executable by this handler'.
- This prevented rule evaluation, digest accounting, and hit counters from advancing, causing multiple TAP phases to fail.
Root cause addressed in this commit:
- Target endpoint resolution required runtime server status to be exactly ONLINE.
- In practice, reachable backends can temporarily appear in other statuses (for example monitor-related transitions), while MCP direct connections with profile credentials are still valid.
- Protocol casing inconsistencies could also lead to fragile routing behavior when profile data is consumed.
Changes:
1) Relax backend eligibility during endpoint resolution
- In Query_Tool_Handler::refresh_target_registry(), endpoint discovery now accepts any server status except OFFLINE_HARD.
- ONLINE is still preferred deterministically via ORDER BY, but non-ONLINE candidates remain eligible for MCP execution.
2) Normalize protocol values to lowercase
- In MCP_Threads_Handler::load_target_auth_map(), target protocol is normalized to lowercase when loaded from runtime profiles.
- In Query_Tool_Handler::refresh_target_registry(), protocol is normalized again defensively before routing decisions.
3) Improve diagnostics for target executability
- Added explicit warnings when:
- a target resolves to a backend but has empty db_username,
- a target has no eligible backend for its hostgroup/protocol.
- These logs make it clear why a target is being marked non-executable.
Expected behavioral impact:
- MCP query tools should execute for valid target_id entries even when monitor status is not strictly ONLINE.
- Query-rules blocking/rewriting/OK actions should be exercised again.
- stats_mcp_query_rules hits and stats_mcp_query_digest counters should resume incrementing once queries run.
Validation performed:
- Recompiled modified objects successfully:
- lib/obj/Query_Tool_Handler.oo
- lib/obj/MCP_Thread.oo
Problem addressed:
- MCP query endpoint could stay unusable with 'Tool Handler not initialized' after restart/reload flows.
- This was triggered when Query_Tool_Handler could not build an executable pool at init time (for example profiles loaded before ONLINE servers), leaving runtime commands like 'LOAD MCP QUERY RULES FROM MEMORY' blocked behind a NULL query tool handler.
- Users had to manually toggle MCP enablement to recover, which is the opposite of expected self-healing behavior.
What this commit changes:
1) Query_Tool_Handler pool init is now resilient and idempotent
- init_connection_pool() now starts with close() so reinitialization fully resets stale mysql/pgsql pool state before rebuilding from runtime profile + server tables.
- If no executable targets are available, init_connection_pool() now returns success with a warning instead of hard failure. This allows the query tool handler (and /mcp/query endpoint) to stay initialized even before backends are ready.
2) Lazy auto-rebuild on first query usage
- get_connection() and get_pgsql_connection() now:
- refresh target registry before resolution,
- attempt to use an existing pooled connection,
- if unavailable, trigger a full pool rebuild (init_connection_pool()) and retry once.
- This provides automatic recovery when hostgroups/servers/profiles are loaded or changed after MCP startup, without requiring manual MCP disable/enable.
3) Admin runtime rule load path attempts MCP recovery
- load_mcp_query_rules_to_runtime() now detects NULL query tool handler and calls load_mcp_server() once before failing.
- This turns a hard, immediate admin error into a self-recovery attempt consistent with MCP runtime semantics.
Behavioral impact:
- /mcp/query endpoint remains online even when there are temporarily zero executable targets.
- As soon as compatible runtime targets/backends exist, run_sql_readonly/explain_sql can recover automatically on demand.
- LOAD MCP QUERY RULES TO/FROM RUNTIME no longer fails immediately on first NULL handler condition; it retries after MCP server recovery.
Validation performed:
- Recompiled modified objects successfully:
- lib/obj/Query_Tool_Handler.oo
- lib/obj/ProxySQL_Admin.oo
- Full TAP runtime test execution is not possible in this sandbox due blocked local TCP socket creation; validation should be run in the normal test environment where ProxySQL/MySQL/PGSQL are reachable.
This change fixes recurring MCP TAP failures where `/mcp/query` returned:
Tool Handler not initialized for endpoint: query
and where backend monitor auth failures flooded logs.
Problem summary
- MCP server startup can occur before runtime target/auth profiles and backend server mappings are loaded.
- If that happens, Query_Tool_Handler initialization sees no executable targets and remains NULL.
- MCP endpoint resources bind the handler pointer at creation time, so a NULL query handler at startup breaks `/mcp/query` until server restart.
Code changes
1) Add explicit admin command logging for MCP PROFILES commands
- Added `Received <command>` logging in the MCP PROFILES command block, matching behavior of other admin command handlers.
- File: `lib/Admin_Handler.cpp`
2) Trigger MCP server refresh after `LOAD MCP PROFILES TO RUNTIME`
- After copying profiles into runtime and rebuilding target/auth map, call `ProxySQL_Admin::load_mcp_server()`.
- This allows MCP to self-heal when profiles become available after initial startup.
- File: `lib/Admin_Handler.cpp`
3) Restart MCP server when query handler is missing
- Extended `ProxySQL_Admin::load_mcp_server()` restart checks to include:
- running server + `query_tool_handler == NULL`
- Restart reason now includes tool handler initialization mismatch.
- File: `lib/ProxySQL_Admin.cpp`
4) Fix TAP configurator load order to avoid early MCP startup
- Reordered `test/tap/tests/mcp_rules_testing/configure_mcp.sh` runtime sequence:
- `LOAD MYSQL SERVERS TO RUNTIME`
- `LOAD PGSQL SERVERS TO RUNTIME` (best effort)
- `LOAD MCP PROFILES TO RUNTIME`
- `LOAD MCP VARIABLES TO RUNTIME` (last)
- This ensures MCP starts only after routing/auth context is present.
5) Seed monitor credentials in AI local infra pre-hook
- Added backend user/role creation for default monitor credentials `monitor/monitor`:
- MySQL: create user + monitor-relevant grants
- PostgreSQL: create role + `pg_monitor` + DB connect grants
- Reduces monitor auth noise in local AI TAP dockerized setup.
- File: `test/tap/groups/ai/pre-proxysql.bash`
6) Mark new TAP phase scripts executable
- `test_phase10_eval_explain.sh`
- `test_phase11_pgsql_target.sh`
Expected outcome
- MCP query endpoint no longer stays stuck with an uninitialized tool handler after TAP configuration.
- MCP query-rules admin commands stop failing due to missing Query_Tool_Handler.
- MCP profile command flow is visible in logs for easier debugging.
- Local AI TAP infra no longer emits continuous monitor authentication failures for default monitor credentials.
This commit completes end-to-end MCP query-rules validation for multi-target routing and introduces a self-contained TAP infra for the `ai` group that can run both in Jenkins and manually.
Main MCP/runtime changes:
- Extended MCP query-rule matching context to include both `target_id` and resolved backend `username`.
- Added `target_id` column to `mcp_query_rules` and `runtime_mcp_query_rules` table definitions.
- Extended `stats_mcp_query_rules` to include `username` and `target_id` alongside `rule_id` and `hits`.
- Updated load/save/runtime refresh paths to persist and rehydrate the expanded MCP rule schema.
- Wired the rule engine into `explain_sql` so MCP rules apply consistently across `run_sql_readonly` and `explain_sql`.
- Included startup-order fix in `src/main.cpp` to initialize MCP/GenAI thread handlers early, preventing startup crashes in PROXYSQLGENAI builds.
Test coverage changes:
- Updated existing MCP TAP phases to assert target-aware and username-aware behavior:
- `test_phase4_stats.sh`
- `test_phase6_eval_block.sh`
- Added new MCP TAP phases:
- `test_phase10_eval_explain.sh` (rule engine coverage for `explain_sql`)
- `test_phase11_pgsql_target.sh` (pgsql target routing/rule/stats coverage; graceful skip if no pgsql target is configured)
- Updated `test_mcp_query_rules-t.sh` runner to execute new phases.
AI group isolated infra (manual + CI compatible):
- Added `test/tap/groups/ai/docker-compose.yml` with MySQL 9.0 and PostgreSQL 16 backends.
- Added lifecycle scripts:
- `docker-compose-init.bash`
- `docker-compose-destroy.bash`
- `pre-proxysql.bash`
- `post-proxysql.bash`
- Extended `test/tap/groups/ai/env.sh` with local default ports/credentials/target IDs used by MCP TAP tests.
- Added `test/tap/groups/ai/README.md` documenting manual execution flow outside Jenkins.
Outcome:
- MCP tests now validate routing-aware rule enforcement and stats attribution for both MySQL and PostgreSQL targets.
- The `ai` TAP group can be run with an isolated local backend stack without relying on external Jenkins infra repositories.
Bug fix:
- Mirror sessions had their destination_hostgroup incorrectly overwritten
by fast routing rules, causing duplicate query execution to the same hostgroup
- Added check in Query_Processor.cpp to skip fast routing for mirror sessions
Changes:
- lib/Query_Processor.cpp: Wrapped fast routing logic with mirror session check
- test/tap/tests/reg_test_2233_mirror_fast_routing-t.cpp: New regression test
- test/tap/groups/groups.json: Added test to default-g1 test group
- test/tap/tests/Makefile: Added build rule for the new test (commented out)
The fix ensures mirror sessions maintain their assigned destination_hostgroup
and are not affected by fast routing rules.
Extend \d meta-command to support \d tablename for describing specific tables. Implements handlers for all 8 queries in the describe sequence:
- Query 1: pg_class lookup with c.relname OPERATOR (extracts/caches table)
- Query 2: pg_class attributes with c.oid= (table metadata)
- Query 3: pg_attribute with PRAGMA table_info() (column information)
- Queries 4-8: pg_policy, pg_statistic_ext, pg_publication, pg_inherits
(return empty results as SQLite lacks these features)
Add time-series database extension to ProxySQL_Statistics with:
- Three SQLite tables: tsdb_metrics, tsdb_metrics_hour, tsdb_backend_health
- Automatic downsampling from raw to hourly aggregates
- Retention management (7 days raw, 1 year hourly)
- Prometheus metrics sampling
- Backend health monitoring via TCP probes
- SQL query interface with JSON label filtering
- Admin variable integration
Documentation:
- embedded_tsdb_overview.md - Architecture and benefits
- embedded_tsdb_architecture.md - System design and components
- embedded_tsdb_reference.md - Configuration and API reference
- embedded_tsdb_specs.md - Technical specifications
- embedded_tsdb_quickstart.md - Getting started guide
All 4 build combinations pass:
- make debug / make release
- PROXYSQLGENAI=1 make debug / make release
When pkts_recv==0, the code reads 4 bytes expecting a MySQL packet header,
but PROXY protocol starts with 'PROXY ' which gets misinterpreted as a
MySQL header with a huge packet length (0x504f5250 = ~1.3GB).
Add check for 'PROX' prefix before parsing as MySQL header. This prevents
passing an incorrect length parameter to recv() that exceeds the buffer size.
Fixes#5376
Change malloc() to calloc() in get_result_buffer() to zero-initialize
the query digest buffer before use. This prevents stage_2_parsing()
from reading uninitialized memory when processing query digests.
Fixes#5375
Add static initialization PTHREAD_RWLOCK_INITIALIZER to filters_rwlock
in debug.cpp to prevent race condition where proxy_debug() is called
before init_debug_struct() initializes the rwlock.
This fixes Valgrind errors in filter_debug_entry() where threads were
accessing an uninitialized rwlock.
Fix race condition where threads were accessing GloMTH before it was fully
initialized, causing Valgrind errors in get_variable_int() and related
functions.
Added wait loop pattern (while GloMTH==NULL { usleep(50000); }) followed
by usleep(100000) to ensure GloMTH is fully initialized before use:
In lib/MySQL_Monitor.cpp:
- monitor_connect(), monitor_ping(), monitor_read_only()
- monitor_group_replication(), monitor_galera(), monitor_replication_lag()
- monitor_AWS_Aurora_thread_HG(), monitor_aws_aurora()
- init_mysql_thread_struct()
In src/SQLite3_Server.cpp:
- SQLite3_Server_session_handler
In lib/ProxySQL_Admin.cpp:
- admin_session_handler
In lib/ClickHouse_Server.cpp:
- ClickHouse_Server_session_handler
The pattern ensures threads wait for GloMTH initialization during startup,
then check for NULL during shutdown to exit cleanly.
Fix race condition where monitor threads were accessing GloMTH before
it was fully initialized, causing Valgrind errors in get_variable_int().
Added wait loop pattern (while GloMTH==NULL { usleep(50000); }) followed
by usleep(100000) to:
- monitor_ping_thread
- monitor_read_only_thread
- monitor_replication_lag_thread
- monitor_group_replication_thread (also moved check before MySQL_Thread creation)
- monitor_galera_thread (also moved check before MySQL_Thread creation)
This matches the existing pattern in monitor_connect_thread.
Add wait loop for GloMTH initialization before accessing variables.
Other monitor threads (MonitorPing, MonitorReadOnly, etc.) already had
this pattern, but monitor_connect_thread was missing it.
This fixes Valgrind errors about uninitialized values in
get_variable_uint16() when strcasecmp reads from memory that hasn't
been fully initialized yet.
Initialize tmp_charset=0 in MySQL_Data_Stream and PgSQL_Data_Stream
constructors. This field was used uninitialized in
process_pkt_handshake_response() when calling proxy_debug(), causing
Valgrind errors about uninitialized values in vsnprintf.
Initialize fd=-1 and status=0 in MySQL_Data_Stream and PgSQL_Data_Stream
constructors. These fields were not initialized, causing Valgrind
errors when GPFC_Statuses2() checked mybe->server_myds->status and fd.
1. Base_Thread.cpp: Initialize member variables in constructor
- curtime, last_move_to_idle_thread_time, epoll_thread, shutdown, mysql_sessions
were not initialized, causing uninitialized value use in
ProcessAllMyDS_BeforePoll and ProcessAllMyDS_AfterPoll.
2. MySQL_Logger.cpp: Initialize query_ptr, query_len, and buf in MySQL_Event
constructor. These fields were used without initialization, causing
Valgrind errors in JSON serialization and other operations.
Replace _counters{} initialization syntax with explicit memset() to ensure
the _counters array is properly zero-initialized. The _counters{} syntax
may not reliably zero-initialize arrays in all compiler implementations,
causing Valgrind to report use of uninitialized values when
add_and_reset() reads from the array.
Also remove the safety checks added previously since the root cause
is now properly fixed.
1. MySQL_Logger.cpp: Initialize username_len, schemaname_len, and server_len
to 0 in MySQL_Event constructor. These fields were uninitialized and
used in copy constructor and JSON serialization.
2. debug.cpp: Add default member initializers to DebugLogEntry struct
to ensure POD fields are zero-initialized. Move errno save/restore
to after early return checks to avoid potential TLS access issues
before thread is fully initialized.
1. MySQL_Logger.cpp: Add NULL checks for myconn->parent before accessing
address/port. Client connections (frontend) have parent=NULL, causing
uninitialized value reads when logging.
2. debug.cpp: Add NULL check for GloVars.global.gdbg_lvl before accessing
debug level configuration. Prevents use of uninitialized data when
proxy_debug_func is called before init_debug_struct() completes.
3. ProxySQL_Admin.cpp: Add NULL checks for debug filter database fields
before constructing filter strings. Prevents undefined behavior when
constructing std::string from NULL pointers.
Use memset to zero-initialize the entire QueryParserArgs struct instead
of individually setting only digest_text and first_comment to NULL.
This fixes Valgrind errors about use of uninitialised values in
strcasestr() when reading from the uninitialized buf array.
Fixes errors like:
Use of uninitialised value of size 8
at tolower (ctype.c:46)
by strcasestr
by MySQL_Connection::ProcessQueryAndSetStatusFlags(char*)
- Added <pthread.h> to log_utils.h and removed the header from log_utils.cpp
- Pass 0 to reset_time argument in flush_and_rotate inside logger's destructor
- Replace remaining sprintf with snprintf in Admin_FlushVariables.cpp
- Fix indentation in flush_GENERIC_variables__process__database_to_runtime
- Apply same pattern to ProxySQL_Admin.cpp and ProxySQL_Admin_Stats.cpp
- Convert prepare_v2() calls to use unique_ptr RAII pattern
- Remove manual sqlite3_finalize calls (now handled by RAII)
- Update flush_*_variables functions to use new prepare_v2 return pattern
- Apply changes to Admin, Cluster, Monitor, Catalog, and other modules
- Replace sprintf with snprintf in flush_*_variables functions
- Add null value safety with safe_val in variable flushing
- Fix error message to use send_error_msg_to_client instead of send_ok_msg_to_client
- Add coordinated shutdown for PROXYSQL KILL/SHUTDOWN in debug builds
- Track active admin client threads with admin_client_threads_active counter
- Add shutdown checks to prevent new connections during shutdown
- Wait for all admin client threads to finish before cleanup in admin_shutdown()
- Handle accept failures gracefully
Implement handling for psql meta-commands (\dt, \di, \dv, \d, \l) in ProxySQL's PostgreSQL admin interface by intercepting expanded queries and converting them to appropriate SQLite queries.
Supported commands:
- \l, \l+: List databases (maps to PRAGMA DATABASE_LIST)
- \dt [pattern]: List tables with optional pattern matching
- \di [pattern]: List indexes with optional pattern matching
- \dv [pattern]: List views with optional pattern matching
- \d: List all relations (tables, views, indexes, triggers)
Pattern matching supports psql regex syntax like '\dt mytest*' by converting PostgreSQL regex patterns (^(pattern).*$) to SQLite LIKE patterns (pattern%).
Detailed Changes:
- Introduced LogBuffer and LogBufferThreadContext to implement per-thread buffering, reducing lock contention.
- Replaced localtime() with localtime_r() for improved performance and thread safety during timestamp generation.
- Implemented configurable sampling for event logs to reduce I/O overhead.
- Added global configuration variables to control flush size, flush interval, and sampling rate for event and audit logs.
This commit resolves the prometheus metric collision between MySQL and
PostgreSQL Query Cache instances by creating separate metric maps with
protocol labels and using compile-time type trait selection.
Changes:
- Add #include <type_traits> to Query_Cache.cpp
- Create qc_metrics_map_mysql with { "protocol", "mysql" } labels (8 metrics)
- Create qc_metrics_map_pgsql with { "protocol", "pgsql" } labels (8 metrics)
- Modify Query_Cache constructor to use if constexpr with std::is_same_v
to select the appropriate metrics map based on derived type
- Enable GloPgQC metrics export in update_modules_metrics()
This allows MySQL and PostgreSQL query cache metrics to coexist without
collision, following the same pattern as Thread Handler metrics.
Related: #5068, #5069
This commit extends PR #5069 by adding protocol labels to distinguish
MySQL and PostgreSQL metrics in the Thread Handler modules.
Changes:
- Add { "protocol", "mysql" } labels to 65 MySQL_Thread.cpp metrics
- Add { "protocol", "pgsql" } labels to 60 PgSQL_Thread.cpp metrics
- Enable GloPTH (PostgreSQL threads handler) metrics export
- Document Query Cache limitation (template base class collision)
This resolves Issue #5068 by ensuring PostgreSQL prometheus metrics
are properly exported with protocol labels, distinguishing them from
MySQL metrics.
Related: #5068, #5069
The statement1 was prepared but never finalized, causing a memory leak.
Migrated to RAII pattern for automatic cleanup.
Also updates mysql_hostgroup_attributes and mysql_servers_ssl_params
to use RAII for consistency (lines 2231 and 2267), even though those
had manual finalize calls.
Reported-by: CodeRabbit AI
The RAII-managed unique_ptr already handles statement finalization.
Manual finalize calls cause double-free which leads to undefined behavior.
Reported-by: CodeRabbit AI
Move statement2_unique to function scope to prevent use-after-free.
The unique_ptr was declared inside the if (runtime) block but
statement2 (raw pointer) was used in the loop outside that block,
causing the statement to be finalized before use.
Affected functions:
- flush_variables___runtime_to_database (PgSQL variables)
- flush_genai_variables___runtime_to_database (GenAI variables)
- flush_mysql_variables___runtime_to_database (MySQL variables)
Reported-by: CodeRabbit AI
Fix NULL pointer dereference in CopyCmdMatcher that caused ProxySQL
to crash immediately when running any PostgreSQL query on macOS and
FreeBSD systems.
Root cause: copy_cmd_matcher was guarded by #ifdef IDLE_THREADS, but
IDLE_THREADS is explicitly disabled on macOS (__APPLE__) and FreeBSD
in proxy_defines.h. This caused the pointer to be NULL when accessed.
The fix removes #ifdef IDLE_THREADS guards from copy_cmd_matcher since
COPY command detection is a fundamental PostgreSQL feature for fast-
forward mode and has no dependency on idle threads.
Affected systems: macOS, FreeBSD (where IDLE_THREADS is disabled)
Fixes: #5359
- Fixed SAFE_SQLITE3_STEP2 macro errors: renamed 'rc1' to 'rc' in PgSQL_Monitor.cpp
functions (update_connect_table, update_ping_table, update_readonly_table,
maint_monitor_table) to match the macro's expected variable name
- Removed redundant NULL declarations in ProxySQL_Admin_Stats.cpp functions
(stats___pgsql_processlist, stats___pgsql_free_connections,
stats___pgsql_errors, stats___save_pgsql_query_digest_to_sqlite)
- Completed migration of stats___pgsql_errors and
stats___save_pgsql_query_digest_to_sqlite to RAII pattern
Migrated 5 prepare_v2 calls to RAII-based pattern in:
- update_monitor_pgsql_servers(): 2 statements (stmt1, stmt32)
- update_connect_table(): 1 statement (stmt)
- update_ping_table(): 1 statement (stmt)
- update_readonly_table(): 1 statement (stmt)
- maint_monitor_table(): 1 statement (stmt)
Note: Functions shunn_non_resp_srv() and update_srv_latency() were NOT
migrated because they use global thread_local cached statements, which is
a valid use case for the old API.
The RAII pattern ensures automatic cleanup and eliminates manual
sqlite_finalize_statement() calls.
Migrated the template function FlushDigestTableToDisk to use RAII-based
prepare_v2. This function is instantiated for both MySQL and PostgreSQL,
eliminating 8 deprecation warnings (4 instantiations x 2 prepare_v2 calls).
The RAII pattern ensures automatic cleanup and eliminates manual
(*proxy_sqlite3_finalize) calls.
The RAII-based prepare_v2 migration introduced duplicate variable declarations
that caused compilation errors. Removed duplicate 'int rc;' declarations in:
- dump_checksums_values_table()
- save_mysql_users_runtime_to_database()
- save_pgsql_users_runtime_to_database()
The 'int rc' variable was already declared at function scope, so the
duplicate declarations after the structured bindings were redundant.
Migrated:
- save_pgsql_ldap_mapping_runtime_to_database()
- save_mysql_servers_runtime_to_database()
All statements are now automatically managed by RAII smart pointers.
Migrated:
- save_pgsql_users_runtime_to_database()
- save_mysql_ldap_mapping_runtime_to_database()
All statements are now automatically managed by RAII smart pointers.
Migrated:
- dump_checksums_values_table()
- save_mysql_users_runtime_to_database()
All statements are now automatically managed by RAII smart pointers.
Migrated:
- save_mysql_firewall_whitelist_rules_from_runtime()
- save_pgsql_firewall_whitelist_rules_from_runtime()
All statements are now automatically managed by RAII smart pointers.
Migrated the following functions:
- save_mysql_query_rules_fast_routing_from_runtime()
- save_pgsql_query_rules_fast_routing_from_runtime()
- save_mysql_firewall_whitelist_sqli_fingerprints_from_runtime()
- save_pgsql_firewall_whitelist_sqli_fingerprints_from_runtime()
- save_mysql_firewall_whitelist_users_from_runtime()
- save_pgsql_firewall_whitelist_users_from_runtime()
All statements are now automatically managed by RAII smart pointers,
eliminating manual finalize calls and potential memory leaks.
Migrated the following functions:
- populate_monitor_mysql_server_aws_aurora_log()
- populate_monitor_mysql_server_aws_aurora_check_status()
- monitor_ping_process_ready_task_thread()
- monitor_read_only_process_ready_tasks() (with if/else branches for replication_lag)
All statements are now automatically managed by RAII smart pointers,
eliminating manual finalize calls and potential memory leaks.
- Changed from raw pointer to smart pointer-based prepare_v2
- Fixes existing memory leak where statement was never finalized
- RAII ensures automatic cleanup when function exits
- Changed from raw pointer to smart pointer-based prepare_v2
- Use raw pointer aliases to avoid .get() verbosity
- Remove manual finalize - RAII handles cleanup automatically
This commit addresses a use-after-free race condition that could cause
ProxySQL to crash with SIGSEGV during COM_CHANGE_USER timeout scenarios.
Root Cause:
-----------
When a session is in RESETTING_CONNECTION status and the backend
times out during COM_CHANGE_USER, the code calls RequestEnd() to
handle the error. However, RequestEnd() accesses client_myds without
null checks. If the session is marked as unhealthy and deleted by
ProcessAllSessions_Healthy0() while RequestEnd() is still executing,
this results in a use-after-free crash.
The crash is particularly likely during AWS Aurora failures when:
1. Monitor detects Aurora master timeouts
2. Multiple sessions experience connection issues
3. Many client connections are marked unhealthy
4. The high volume of session deletions increases race condition probability
Fix:
----
Added a null pointer check before accessing client_myds in the
unexp_com_pings handling code within RequestEnd().
This is a defensive programming fix that prevents the crash when
client_myds has been freed. A more robust fix would involve proper
session lifecycle management with reference counting.
Fixes: #5355
Migrate 2 test functions from deprecated prepare_v2() to the new RAII-based API:
- ProxySQL_Test___GenerateRandom_mysql_query_rules_fast_routing()
- ProxySQL_Test___MySQL_HostGroups_Manager_generate_many_clusters()
Both are test functions with clear structure - simple cases.
Uses raw pointer aliases to avoid .get() verbosity.
Pattern used:
auto [rc, statement_unique] = admindb->prepare_v2(query);
sqlite3_stmt *statement = statement_unique.get();
// ... use statement directly ...
// No manual finalize - RAII handles it
This addresses 3 of the ~150 deprecated prepare_v2 warnings.
Migrate 4 functions from deprecated prepare_v2() to the new RAII-based API:
- flush_pgsql_variables___runtime_to_database()
- flush_genai_variables___runtime_to_database()
- flush_mysql_variables___runtime_to_database()
- flush_mcp_variables___runtime_to_database()
All early returns happen before prepare_v2(), making these simple cases.
Uses raw pointer aliases to avoid .get() verbosity.
Pattern used:
auto [rc, statement_unique] = db->prepare_v2(query);
sqlite3_stmt *statement = statement_unique.get();
// ... use statement directly ...
// No manual finalize - RAII handles it
This addresses 4 of the ~150 deprecated prepare_v2 warnings.
Migrate insertMysqlEventsIntoDb() function from deprecated prepare_v2()
to the new RAII-based API using raw pointer aliases.
Pattern used:
auto [rc, statement_unique] = db->prepare_v2(query);
sqlite3_stmt *statement = statement_unique.get();
// ... use statement directly ...
// No manual finalize - RAII handles it
This addresses 2 of the ~150 deprecated prepare_v2 warnings.
Migrate 11 functions in MySQL_HostGroups_Manager.cpp from the deprecated
prepare_v2() API to the new RAII-based API. Uses raw pointer aliases to
avoid .get() verbosity throughout the code.
Changes per function:
- servers_add()
- All mysql_* table generation functions
- Group replication, Galera, AWS Aurora hostgroups functions
Pattern used:
auto [rc, statement_unique] = mydb->prepare_v2(query);
sqlite3_stmt *statement = statement_unique.get();
// ... use statement directly, no .get() needed ...
// No manual finalize - RAII handles it
Benefits:
- Automatic cleanup via RAII (no manual finalize needed)
- Exception-safe cleanup on early returns
- Reduced memory leak risk
This addresses ~11 of the ~150 deprecated prepare_v2 warnings.
Found during compiler warning analysis with 'make debug'.
1. Fix write-strings warnings (12 occurrences):
- MySQL_HostGroups_Manager.cpp: Change char* to const char*
for src_status and dst_status variables
- MySQL_Monitor.cpp: Change ConsumerThread constructor
parameter from char thread_name[16] to const char *thread_name
2. Fix unused-but-set-variable warnings (6 occurrences) by
wrapping PROXYSQLGENAI-specific variables in #ifdef blocks:
- stats_mcp_query_tools_counters
- stats_mcp_query_tools_counters_reset
- stats_mcp_query_digest
- stats_mcp_query_digest_reset
- stats_mcp_query_rules
- runtime_mcp_query_rules
These variables were assigned values but appeared unused when
PROXYSQLGENAI was not defined. Now they are only defined and
assigned when PROXYSQLGENAI is enabled.
This eliminates 18 compiler warnings: 12 write-strings and 6
unused-but-set-variable warnings.
Found during compiler warning analysis with 'make debug'.
Fix two HIGH priority signed/unsigned comparison warnings:
1. MySQL_HostGroups_Manager.cpp:1792 - Change loop variable from
'int j' to 'unsigned int j' to match the unsigned comparison
with servers->len.
2. MySQL_Thread.cpp:4357 - Cast int to size_t when comparing with
getBufferSize() which returns size_t.
These changes prevent potential logic errors from negative values
being interpreted as large unsigned values.
Found during compiler warning analysis with 'make debug'.
Fix use-after-free vulnerability where the function frees the 'arg'
pointer at line 2306 but then returns it at line 2323. Change the
return value to NULL instead of the freed pointer.
This fixes a -Wuse-after-free compiler warning and prevents potential
memory corruption.
Found during compiler warning analysis with 'make debug'.
The fix ensures that when the purge checks ref_count_client under write lock, it sees the latest value from concurrent threads holding raw pointers, preventing reading stale data.
This commit fixes a bug where connection cleanup loops were removing
the wrong connection from the pool. The loops checked each connection
by index (i), but when an expired connection was found, they removed
index 0 instead of index i.
This caused:
- Fresh connections at index 0 to be incorrectly deleted
- Expired connections to remain in the pool
Fixed files:
- lib/Base_HostGroups_Manager.cpp:2603 (MySQL)
- lib/MySQL_HostGroups_Manager.cpp:2939 (MySQL)
- lib/PgSQL_HostGroups_Manager.cpp:2782 (PgSQL)
The fix changes `remove(0)` to `remove(i)` to remove the correct
connection.
Related: #5094 (fixes similar issue in drop_all_idle_connections)
This commit addresses critical issues identified in the PR review:
1. Build system redundancy (src/Makefile):
- Remove redundant PROXYSQLGENAI conditionals in linking commands
- The flag doesn't affect linking, so simplify to PROXYSQLCLICKHOUSE only
2. Missing PROXYSQLGENAI guard (lib/Admin_Handler.cpp):
- Add #ifdef PROXYSQLGENAI around MCP VARIABLES DISK commands
- Ensures MCP commands are only available when GenAI is enabled
3. Broken retry logic (lib/LLM_Clients.cpp):
- Remove misleading is_retryable_error() checks that used stale values
- last_curl_code and last_http_code were never updated
- Simplify to retry on empty responses only (documented limitation)
- Fix thread-safety: use thread_local std::mt19937 instead of rand()
4. Resource leak (lib/MySQL_Tool_Handler.cpp):
- Clean up previously created connections on init failure
- Both mysql_init and mysql_real_connect error paths now clean up
5. Race condition (lib/Query_Tool_Handler.cpp):
- Add missing pool_lock in find_connection()
- Prevents race condition when accessing connection_pool
Fixes identified by automated PR review agents.
This commit fixes compilation errors when building with PROXYSQLGENAI=1:
Header file fixes:
- Add missing #endif /* PROXYSQLGENAI */ before header guards in multiple headers
- Remove #include cpp.h from GenAI headers to avoid circular dependencies
Source file fixes:
- Add #include proxysql.h to GenAI .cpp files that were missing it
- Add #include Static_Harvester.h to Query_Tool_Handler.cpp for forward decl
Build system fixes:
- Remove vec.o from libproxysql.a (it's linked separately in src/Makefile)
- Prevents duplicate symbol errors during linking
Test files:
- Rename MCP test files with .sh suffix to prevent make clean deletion
Both build modes now work:
- make build_lib_debug (without GenAI)
- PROXYSQLGENAI=1 make build_lib_debug (with GenAI)
This commit merges the experimental v4.0 GenAI/MCP features into the stable
v3.0 branch using conditional compilation. All v4.0 features are disabled by
default and only enabled when PROXYSQLGENAI=1 is set at compile time.
Changes:
Build System:
- Modified main Makefile to pass PROXYSQLGENAI flag to sub-makefiles
- Modified deps/Makefile to conditionally build sqlite-vec and sqlite-rembed
- Modified lib/Makefile to add PSQLGA flag and include GenAI object files
- Modified src/Makefile to add PSQLGA flag and conditional linking
Headers (wrapped with #ifdef PROXYSQLGENAI):
- All 20 new GenAI header files in include/
- Modified cpp.h, proxysql_glovars.hpp, proxysql_admin.h
- Modified ProxySQL_Admin_Tables_Definitions.h for GenAI/MCP tables
Source Files:
- All 22 new GenAI source files in lib/ wrapped with #ifdef PROXYSQLGENAI
- Modified src/main.cpp for conditional global variables and init/shutdown
- Modified Admin_Handler.cpp for conditional command handlers
- Modified Admin_Bootstrap.cpp for conditional table registration
- Modified Admin_FlushVariables.cpp for conditional variable flushing
- Modified ProxySQL_Admin.cpp for conditional admin methods
- Modified ProxySQL_Admin_Stats.cpp for conditional MCP stats functions
- Modified proxy_sqlite3_symbols.cpp to always compile (needed by core)
- Modified MySQL_Session.cpp for conditional GenAI function calls
Test Files:
- Renamed test_mcp_query_rules-t to test_mcp_query_rules-t.sh
- Renamed test_mcp_rag_metrics-t to test_mcp_rag_metrics-t.sh
- Modified anomaly_detection-t.cpp for conditional test execution
Usage:
# Build without GenAI (v3.0 mode - default)
make clean && make build_deps -j$(nproc) && make build_lib -j$(nproc) && make build_src -j$(nproc)
# Build with GenAI (v4.0 mode)
make clean && PROXYSQLGENAI=1 make build_deps -j$(nproc) && PROXYSQLGENAI=1 make build_lib -j$(nproc) && PROXYSQLGENAI=1 make build_src -j$(nproc)
- `sqlite-vec` requires that `knn` queries (using the `MATCH` operator for
vector similarity search) must have a `LIMIT` clause at the same query
level as the `MATCH` clause.
- Execute `knn` queries as a subquery and then do `JOIN`s with
`rag_chunks` and `rag_documents`.
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Merge changes from PR #5318 (RAG ingestion feature) into the new
development branch v4.0_rag_ingest_2.
Changes include:
- RAG ingestion tool (rag_ingest) with chunking and embeddings
- MySQL_Catalog fixes for NULL pointer handling
- MCP server updates for RAG tools
- Comprehensive documentation and test scripts
- Fix NULL pointer dereference in rag_ingest.cpp: use str_or_empty() helper
for all sqlite3_column_text results assigned to std::string
- Fix NULL tags/links crash in MySQL_Catalog.cpp: add null guards before
assigning sqlite3_column_text results to std::string
- Fix missing curl_global_cleanup on error path in rag_ingest.cpp
- Fix std::out_of_range exception in rag_ingest.cpp: wrap std::stoll calls
in try-catch blocks, fall back to string comparison on overflow
- Fix Makefile: Use $(CXXFLAGS) directly for consistency with build philosophy
- Fix MySQL_Catalog: Return proper error JSON instead of empty array on missing query
This commit addresses concerns raised by AI code reviewers (gemini-code-assist,
Copilot, coderabbitai) on the initial security fixes.
Critical fixes:
- Fix lock.release() → lock.unlock() in GenAI_Thread.cpp worker_loop
(lock.release() detaches without unlocking, causing deadlock)
- Add missing early return after schema validation failure in Query_Tool_Handler.cpp
Code quality improvements:
- Improve escape_string() memory management in MySQL_Tool_Handler.cpp:
- Use std::string instead of new[]/delete[] for buffer management
- Check return value of mysql_real_escape_string() for errors
- Remove redundant validation checks in validate_sql_identifier functions
(character class loop already rejects unsafe characters)
- Add backslash escaping to escape_string_literal() for defense-in-depth
- Improve column list validation in MySQL_Tool_Handler sample_rows():
- Replace blacklist approach with proper column identifier parsing
- Allow qualified identifiers (table.column)
- Allow AS aliases (column AS alias)
- No longer rejects legitimate column names containing "JOIN"
These changes improve robustness while maintaining the security posture
of the original SQL injection fixes.
This commit addresses critical and important security vulnerabilities found
during comprehensive code review of the Gen AI features merge.
Critical fixes:
- SQL injection vulnerabilities in MySQL_Tool_Handler.cpp:
- Added validate_sql_identifier() for schema/table validation
- Added escape_string() for MySQL string escaping using mysql_real_escape_string
- Fixed list_tables(), describe_table(), sample_rows(), sample_distinct()
- SQL injection vulnerabilities in Query_Tool_Handler.cpp:
- Added validate_sql_identifier_sqlite() for identifier validation
- Added escape_string_literal() for SQLite string escaping
- Fixed list_tables tool and catalog.get_relationships function
- Use-after-free race condition in GenAI_Thread:
- Changed shutdown_ from int to std::atomic<int> for proper memory ordering
- Added additional shutdown check in worker_loop after popping request
Important fixes:
- Buffer overflow risks from sprintf usage:
- Converted all sprintf() calls to snprintf() in GenAI_Thread.cpp
- Converted sprintf() to snprintf() in MySQL_Session.cpp
- Worker loop shutdown race condition:
- Added shutdown check after popping request from queue
- Properly clean up client_fd when shutdown is detected
These fixes ensure:
1. All user input is properly validated before use in SQL queries
2. String values are properly escaped using database-specific escaping
3. Thread-safe shutdown with proper memory ordering guarantees
4. Bounds-safe string formatting to prevent buffer overflows
This addresses an issue from PR #22 where LoadPlugin() was completely
disabled. The function performs necessary initialization even when no
plugin is loaded (initializes built-in sqlite3 function pointers).
Changes:
- Added `const bool allow_load_plugin = false` flag in LoadPlugin()
- Modified `if (plugin_name)` to `if (plugin_name && allow_load_plugin == true)`
- Re-enabled the LoadPlugin() call in LoadPlugins()
The plugin loading code remains disabled (allow_load_plugin=false) while
the function pointer initialization from built-in SQLite3 now works correctly.
TODO: Revisit plugin loading safety mechanism to allow actual plugin loading.
- Fix comment mismatch: Changed _2 suffix to match actual function name
(mysql_query_digest_and_first_comment, not _2)
- Make get_def_mysql_opts() static to avoid symbol pollution
- Fix NULL first_comment parameter to prevent potential segfault
- Pass valid char* pointer instead of NULL
- Free first_comment if allocated by the function
Address coderabbitai review - implement full JSON escaping for SQL digest:
- Handle backslash (\) and double quote (")
- Handle control characters: newline (\n), carriage return (\r), tab (\t)
- Handle other control characters (U+0000 through U+001F) with \uXXXX escapes
This ensures digest_text in stats_mcp_query_digest is always valid JSON,
preventing parsing errors for consumers of this data.
Fix issues identified by coderabbitai review:
1. Admin_Handler.cpp: Fix typo in strncasecmp for LOAD MCP QUERY RULES
- Line 2365 had "LOAD MCP RULES FROM DISK" instead of "LOAD MCP QUERY RULES FROM DISK"
2. Admin_Handler.cpp: Fix use-after-free and missing client response
- Removed l_free(*ql,*q) which freed *q before caller used it
- Added send_error_msg_to_client calls on all error paths
- Added send_ok_msg_to_client call on success path
- Changed return value from true to false (to match handler pattern)
- Applied to both LOAD MCP QUERY RULES and SAVE MCP QUERY RULES handlers
3. ProxySQL_Admin_Stats.cpp: Fix sprintf SQL injection in stats___mcp_query_rules
- Replaced sprintf with prepared statement using positional parameters
- Changed from char* query with malloc/free to sqlite3_stmt with prepare_v2
- Both columns now bound as parameters (?1, ?2)
Fix compilation errors in the SQL injection fixes:
1. ProxySQL_Admin_Stats.cpp: Use public statsdb->prepare_v2() API
- Changed from direct proxy_sqlite3_prepare_v2() calls with statsdb->db
- statsdb->db is private, must use public prepare_v2(query, &stmt) method
2. Admin_Handler.cpp: Add SPA cast for template function access
- Added ProxySQL_Admin *SPA=(ProxySQL_Admin *)pa; declaration
- Changed all admindb->execute to SPA->admindb->execute
- Removed unused 'error' and 'success' variables
The build now completes successfully.
Fix two SQL injection vulnerabilities identified by coderabbitai in
ProxySQL_Admin_Stats.cpp by converting sprintf/snprintf interpolation
to SQLite prepared statements.
1. stats___mcp_query_digest (lines 2581-2636):
- Changed from sprintf with format string to prepared statement
- digest_text field now properly bound as parameter (was unsafe)
- All 10 columns now use positional parameters (?1-?10)
2. stats___mcp_query_tools_counters (lines 1587-1635):
- Changed from snprintf with unescaped fields to prepared statement
- Fixed incorrect table name logic (was appending _reset incorrectly)
- All 8 columns now use positional parameters (?1-?8)
These changes prevent SQL injection when resultset fields contain
quotes, backslashes, or other SQL metacharacters.
Fix multi-statement execution in Admin_Handler.cpp for MCP query rules.
The previous code built a single SQL string with "DELETE ...; INSERT ..." but
SQLite only executed the first statement.
Changed to execute statements as an explicit transaction:
1. BEGIN
2. DELETE FROM target_table
3. INSERT OR REPLACE INTO target_table SELECT * FROM source_table
4. COMMIT (or ROLLBACK on error)
Applied to both:
- LOAD MCP QUERY RULES FROM DISK/TO MEMORY
- SAVE MCP QUERY RULES TO DISK
Addresses coderabbitai review comment.
Change free(resultset) to delete resultset in
ProxySQL_Admin::load_mcp_query_rules_to_runtime.
SQLite3_result is a C++ class allocated with new, so it must be
deallocated with delete, not free(). Using free() causes undefined
behavior and memory leaks.
Addresses coderabbitai review comment.
Fix two issues in Query_Tool_Handler's execute_query functions:
1. mysql_query() failure path now returns immediately after
return_connection() instead of continuing to process on bad state.
2. Capture affected_rows BEFORE return_connection() to avoid race
condition. Previously, mysql_affected_rows() was called after
return_connection(), potentially accessing a stale connection.
Apply fixes to both execute_query() and execute_query_with_schema().
Addresses coderabbitai review comments.
Add escape_sql_string() helper function that doubles single quotes
to prevent SQL injection when strings are used in SQL string
concatenation. Update harvest_view_definitions to use this function
for view_def, schema_name, and view_name.
This prevents SQL injection in the UPDATE statement that stores
view definitions in the catalog.
Addresses coderabbitai review comment.
The execute_parameterized_query function in RAG_Tool_Handler was creating
a prepared statement and binding parameters, but then executing the raw query
string instead of the prepared statement. This completely bypassed the
parameter binding, making the function useless for preventing SQL injection.
Changed to use vector_db->execute_prepared(stmt, ...) to execute the bound
statement instead of the raw query, so the bound parameters are actually used.
Addresses coderabbitai review comment.
Change free(resultset) to delete resultset in Query_Tool_Handler
(extract_schema_name function). SQLite3_result is a C++ class allocated
with new, so it must be deallocated with delete, not free(). Using free()
causes mixed allocator UB.
Addresses coderabbitai review comment.
Add catalog_au AFTER UPDATE trigger in MySQL_Catalog that mirrors the
delete+insert pattern used in catalog_ad/catalog_ai. This keeps the FTS
index current when upserts occur (INSERT OR REPLACE ... ON CONFLICT ...
DO UPDATE), since the UPDATE doesn't trigger INSERT/DELETE triggers.
The trigger first deletes the old entry from catalog_fts then inserts
the new entry, ensuring the full-text search index stays synchronized
with the catalog table.
Addresses coderabbitai review comment.
Add assert(proxy_sqlite3_bind_blob) to the assertion block in
SQLite3DB::LoadPlugin to ensure the symbol is provided by plugins.
Without this, if a plugin fails to provide the symbol, the code will
crash at runtime with no safety check.
proxy_sqlite3_bind_blob is actively used in Anomaly_Detector.cpp:765
to bind embeddings.
Addresses coderabbitai review comment.
- lib/MySQL_Catalog.cpp: Convert search/list/remove to use SQLite prepared
statements instead of string concatenation for user parameters
- lib/RAG_Tool_Handler.cpp: Add escape_fts_query() function to properly
escape single quotes in FTS5 MATCH clauses; update all FTS and vector
MATCH queries to use escaped values
- lib/Static_Harvester.cpp: Add is_valid_schema_name() validation function
to ensure schema names only contain safe characters (alphanumeric,
underscore, dollar sign) before using in INFORMATION_SCHEMA queries
- lib/Query_Tool_Handler.cpp: Add clarifying comments to validate_readonly_query
explaining the blacklist (quick exit) + whitelist (allowed query types) approach
- Remove backup file lib/Anomaly_Detector.cpp.bak
Addresses gemini-code-assist review comments from PR #26.
- Fixed invalid memory accesses for tables:
+ mcp_query_rules
+ stats_mcp_query_rules
+ stats_mcp_query_digests
- Fixed inactive 'mcp_query_rules' being loaded to runtime.
- Fixed hash computation in 'compute_mcp_digest'.
- Fixed invalid escaping during 'stats_mcp_query_digests' gen.
- Fixed digest generation for MCP arguments:
+ SQL queries are now preserved using
'mysql_query_digest_and_first_comment'.
+ TODO: Options for the tokenizer are right now hardcoded.
- Added initial testing and testing plan for MCP query_(rules/digests).
+ TODO: Test finished on phase8. Timeouts destroy the MCP
connection, leaving it unusable for subsequent queries this should
be fixed for continuing testing.
- TODO: There are several limitations to fix in
'validate_readonly_query'. This reflect in some query hacks in the
testing.
+ 'SELECT' starting with comments (--) gets flagged as non-read.
+ 'SELECT' must have a 'SELECT .* FROM' structure. While common,
simple testing queries many times lack this form.