This commit fixes a serious design flaw where AI configuration variables
were not integrated with the ProxySQL admin interface. All ai_*
variables have been migrated to the GenAI module as genai-* variables.
Changes:
- Added 21 new genai_* variables to GenAI_Thread.h structure
- Implemented get/set functions for all new variables in GenAI_Thread.cpp
- Removed internal variables struct from AI_Features_Manager
- AI_Features_Manager now reads from GloGATH instead of internal state
- Updated documentation to reference genai-* variables
- Fixed debug.cpp assertion for PROXY_DEBUG_NL2SQL and PROXY_DEBUG_ANOMALY
Variable mapping:
- ai_nl2sql_enabled → genai-nl2sql_enabled
- ai_anomaly_detection_enabled → genai-anomaly_enabled
- ai_features_enabled → genai-enabled
- All other ai_* variables follow the same pattern
The flush functions automatically handle all variables in the
genai_thread_variables_names array, so database persistence
works correctly without additional changes.
Related to: https://github.com/ProxySQL/proxysql-vec/pull/13
- Fix retry logic to use is_retryable_error function for proper HTTP error handling
- Add exception handling to get_json_int function with try-catch around std::stoi
- Improve validate_numeric_range to use strtol instead of atoi for better error reporting
- Fix Chinese characters in documentation (replaced with the English equivalent "non-zero")
- Replace placeholder tests with actual comprehensive tests for anomaly detection functionality
- Create new standalone unit test anomaly_detector_unit-t.cpp with 29 tests covering:
* SQL injection pattern detection (12 tests)
* Query normalization (8 tests)
* Risk scoring calculations (5 tests)
* Configuration validation (4 tests)
- All tests pass successfully, providing meaningful validation of core anomaly detection logic
Thanks to gemini-code-assist for the thorough code review and recommendations.
- Rename validate_provider_name to validate_provider_format for clarity
- Add null checks and error handling for all strdup() operations
- Enhance error messages with more context and HTTP status codes
- Implement performance monitoring with timing metrics for LLM calls and cache operations
- Add comprehensive test coverage for edge cases, retry scenarios, and performance
- Extend status variables to track performance metrics
- Update MySQL session to report timing information to AI manager
This issue occurred while the MCP server was already running; it was caused by
improper cleanup of handler objects during reinitialization.
Root cause:
- ProxySQL_MCP_Server destructor deletes mysql_tool_handler
- The old code tried to delete handlers again after deleting the server,
causing double-free corruption
The fix properly handles handler lifecycle during reinitialization:
1. Delete Query_Tool_Handler first (server destructor doesn't clean this)
2. Delete the server (which also deletes MySQL_Tool_Handler via destructor)
3. Delete other handlers (config/admin/cache/observe) created by old server
4. Create new MySQL_Tool_Handler with updated configuration
5. Create new Query_Tool_Handler
6. Create new server (recreates all handlers with new endpoints)
This ensures proper cleanup and prevents double-free issues while allowing
runtime reconfiguration of MySQL connection parameters.
This commit adds comprehensive unit tests for the AI configuration
validation functions used in AI_Features_Manager.
Changes:
- Add test/tap/tests/ai_validation-t.cpp with 61 unit tests
- Test URL format validation (validate_url_format)
- Test API key format validation (validate_api_key_format)
- Test numeric range validation (validate_numeric_range)
- Test provider name validation (validate_provider_name)
- Test edge cases and boundary conditions
The test file is self-contained with its own copies of the validation
functions to avoid complex linking dependencies on libproxysql.
Test Categories:
- URL validation: 15 tests (http://, https:// protocols)
- API key validation: 14 tests (OpenAI, Anthropic formats)
- Numeric range: 13 tests (min/max boundaries)
- Provider name: 8 tests (openai, anthropic)
- Edge cases: 11 tests (NULL handling, long values)
All 61 tests pass successfully.
Part of: Phase 4 of NL2SQL improvement plan
Add comprehensive structured logging for NL2SQL LLM API calls with
request correlation, timing metrics, and detailed error context.
Changes:
- Add request_id field to NL2SQLRequest with UUID-like auto-generation
- Add structured logging macros:
* LOG_LLM_REQUEST: Logs URL, model, prompt length with request ID
* LOG_LLM_RESPONSE: Logs HTTP status, duration_ms, response preview
* LOG_LLM_ERROR: Logs error phase, message, and status code
- Update call_generic_openai() signature to accept req_id parameter
- Update call_generic_anthropic() signature to accept req_id parameter
- Add timing metrics to both LLM call functions using clock_gettime()
- Replace existing debug logging with structured logging macros
- Update convert() to pass request_id to LLM calls
Request IDs are generated as UUID-like strings (e.g., "12345678-9abc-def0-1234-567890abcdef")
and are included in all log messages for correlation. This allows tracking
a single NL2SQL request through all log lines from request to response.
Timing is measured using CLOCK_MONOTONIC for accurate duration tracking
of LLM API calls, reported in milliseconds.
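As a hedged sketch of the two mechanisms described above, request-id generation and monotonic timing could look like this (the exact id format beyond the 8-4-4-4-12 grouping, and the helper names, are illustrative, not the actual ProxySQL code):

```cpp
#include <cstdio>
#include <ctime>
#include <random>
#include <string>

// Sketch: generate a UUID-like request id (random hex in 8-4-4-4-12 grouping;
// not an RFC 4122 UUID) so all log lines for one request can be correlated.
static std::string generate_request_id() {
    static std::mt19937_64 rng{std::random_device{}()};
    char buf[37];
    std::snprintf(buf, sizeof(buf), "%08llx-%04llx-%04llx-%04llx-%012llx",
                  (unsigned long long)(rng() & 0xffffffffULL),
                  (unsigned long long)(rng() & 0xffffULL),
                  (unsigned long long)(rng() & 0xffffULL),
                  (unsigned long long)(rng() & 0xffffULL),
                  (unsigned long long)(rng() & 0xffffffffffffULL));
    return std::string(buf);
}

// Elapsed milliseconds between two CLOCK_MONOTONIC samples taken with
// clock_gettime(), as the commit describes for LLM call timing.
static long long elapsed_ms(const struct timespec& start, const struct timespec& end) {
    return (end.tv_sec - start.tv_sec) * 1000LL +
           (end.tv_nsec - start.tv_nsec) / 1000000LL;
}
```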
This provides much better debugging capability when troubleshooting
NL2SQL issues, as administrators can now:
- Correlate all log lines for a single request
- See exact timing of LLM API calls
- Identify which phase of processing failed
- Track request/response metrics
Fixes #2 - Add Structured Logging
Add comprehensive validation for AI features configuration variables
to prevent invalid states and improve error messages.
Changes:
- Add validate_url_format(): Checks for http:// or https:// prefix and host part
- Add validate_api_key_format(): Validates API key format, checks for whitespace,
minimum length, and incomplete key patterns (sk- with <20 chars, sk-ant- with <25 chars)
- Add validate_numeric_range(): Validates numeric values are within min/max range
- Add validate_provider_name(): Ensures provider is 'openai' or 'anthropic'
- Update set_variable() to call validation functions before setting values
Validated variables:
- ai_nl2sql_provider: Must be 'openai' or 'anthropic'
- ai_nl2sql_provider_url: Must have http:// or https:// prefix
- ai_nl2sql_provider_key: No whitespace, minimum 10 chars
- ai_nl2sql_cache_similarity_threshold: Range [0, 100]
- ai_nl2sql_timeout_ms: Range [1000, 300000] (1 second to 5 minutes)
- ai_nl2sql_max_cloud_requests_per_hour: Range [1, 10000]
- ai_anomaly_similarity_threshold: Range [0, 100]
- ai_anomaly_risk_threshold: Range [0, 100]
- ai_anomaly_rate_limit: Range [1, 10000]
- ai_vector_dimension: Range [128, 4096]
This prevents misconfigurations and provides clear error messages to users
when invalid values are provided.
Also fixes a compilation issue by moving the validation helper functions before
set_variable(), resolving forward-declaration errors.
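A minimal sketch of range validation using strtol instead of atoi, so overflow and trailing garbage are detectable (the signature is illustrative, not the exact AI_Features_Manager API):

```cpp
#include <cerrno>
#include <cstdlib>

// Validate that a string holds an integer within [min, max]. Unlike atoi,
// strtol lets us detect overflow (ERANGE) and non-numeric trailing input.
static bool validate_numeric_range(const char* value, long min, long max, long* out) {
    if (value == nullptr || *value == '\0') return false;
    errno = 0;
    char* end = nullptr;
    long v = strtol(value, &end, 10);
    if (errno == ERANGE) return false;            // overflow / underflow
    if (end == value || *end != '\0') return false; // no digits, or trailing junk
    if (v < min || v > max) return false;         // outside the allowed range
    if (out) *out = v;
    return true;
}
```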
Add comprehensive SQL validation with confidence scoring based on:
- SQL keyword detection (17 keywords covering DDL/DML/transactions)
- Structural validation (balanced parentheses and quotes)
- SQL injection pattern detection
- Length and quality checks
Confidence scoring:
- Base 0.4 for valid SQL keyword
- +0.15 for balanced parentheses
- +0.15 for balanced quotes
- +0.1 for minimum length
- +0.1 for FROM clause in SELECT statements
- +0.1 for no injection patterns
- -0.3 penalty for injection patterns detected
Low confidence (< 0.5) results are logged with detailed info.
Cache storage threshold updated to 0.5 confidence (from implicit valid_sql).
This improves detection of malformed or potentially malicious SQL
while providing granular confidence scores for downstream use.
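The additive scoring scheme above can be sketched as follows; the real checks (keyword list, quote balancing, injection-pattern matching) are reduced to boolean inputs here for illustration:

```cpp
// Illustrative re-implementation of the confidence scheme described above.
// Weights match the commit message: 0.4 base, +0.15/+0.15/+0.1/+0.1/+0.1
// bonuses, and a -0.3 penalty when injection patterns are detected.
static double sql_confidence(bool has_keyword, bool balanced_parens,
                             bool balanced_quotes, bool min_length,
                             bool select_has_from, bool injection_detected) {
    if (!has_keyword) return 0.0;             // not recognizable as SQL
    double score = 0.4;                       // base for a valid SQL keyword
    if (balanced_parens)     score += 0.15;
    if (balanced_quotes)     score += 0.15;
    if (min_length)          score += 0.10;
    if (select_has_from)     score += 0.10;
    if (!injection_detected) score += 0.10;
    else                     score -= 0.30;   // injection-pattern penalty
    return score;
}
```

A fully clean query reaches 1.0, while the same query with injection patterns drops to 0.6, and anything below the 0.5 threshold is logged and excluded from the cache.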
Remove Ollama-specific provider code and use only generic OpenAI-compatible
and Anthropic-compatible providers. Ollama is now used via its
OpenAI-compatible endpoint at /v1/chat/completions.
Changes:
- Remove LOCAL_OLLAMA from ModelProvider enum
- Remove ai_nl2sql_ollama_model and ai_nl2sql_ollama_url variables
- Remove call_ollama() function from LLM_Clients.cpp
- Update default configuration to use OpenAI provider with Ollama URL
- Update all documentation to reflect generic-only approach
Configuration:
- ai_nl2sql_provider: 'openai' or 'anthropic' (default: 'openai')
- ai_nl2sql_provider_url: endpoint URL (default: Ollama OpenAI-compatible)
- ai_nl2sql_provider_model: model name
- ai_nl2sql_provider_key: API key (optional for local endpoints)
This simplifies the codebase by removing a separate code path for Ollama
and aligns with the goal of avoiding provider-specific variables.
NL2SQL_Converter improvements:
- Implement get_query_embedding() using GenAI module
- Implement check_vector_cache() with KNN search via sqlite-vec
- Implement store_in_vector_cache() with embedding storage
- All stub methods now fully functional
Anomaly_Detector improvements:
- Implement add_threat_pattern() with embedding generation
- Stores patterns in both main table and virtual vec table
- Returns pattern ID on success, -1 on error
Documentation:
- Add comprehensive VECTOR_FEATURES documentation
- README.md (471 lines): User guide and quick start
- API.md (736 lines): Complete API reference
- ARCHITECTURE.md (358 lines): System architecture
- TESTING.md (767 lines): Testing guide and procedures
This completes the vector features implementation, enabling:
- Semantic similarity caching for NL2SQL queries
- Embedding-based threat pattern detection
- Full CRUD operations for threat patterns
Improve Anomaly_Detector with full threat pattern CRUD operations:
Changes to lib/Anomaly_Detector.cpp:
- Implement list_threat_patterns():
* Returns JSON array of all threat patterns
* Shows pattern_name, pattern_type, query_example, severity, created_at
* Ordered by severity DESC (highest risk first)
- Implement remove_threat_pattern():
* Deletes from both anomaly_patterns and anomaly_patterns_vec tables
* Proper error handling with error messages
* Returns true on success, false on failure
- Improve get_statistics():
* Add threat_patterns_count to statistics
* Add threat_patterns_by_type breakdown
* Shows patterns grouped by type (sql_injection, dos, etc.)
- Add count_by_pattern_type query for categorization
Features:
- Full CRUD operations for threat patterns
- JSON-formatted output for API integration
- Statistics include both counts and categorization
- Proper cleanup of both main and virtual tables
Implemented embedding-based threat pattern detection using GenAI and sqlite-vec:
Changes to lib/Anomaly_Detector.cpp:
- Add GenAI_Thread.h include and GloGATH extern
- Implement get_query_embedding():
* Calls GloGATH->embed_documents() via llama-server
* Normalizes query before embedding for better quality
* Returns std::vector<float> with embedding
- Implement check_embedding_similarity():
* Generates embedding for query if not provided
* Performs sqlite-vec KNN search against anomaly_patterns table
* Uses cosine distance (vec_distance_cosine) for similarity
* Calculates risk score based on severity and distance
* Returns AnomalyResult with pattern details and blocking decision
- Implement add_threat_pattern():
* Generates embedding for threat pattern example
* Stores pattern with embedding in anomaly_patterns table
* Updates anomaly_patterns_vec virtual table for KNN search
* Returns pattern ID on success
Features:
- Semantic similarity detection against known threat patterns
- Configurable similarity threshold (ai_anomaly_similarity_threshold)
- Risk scoring based on pattern severity (1-10) and similarity
- Automatic threat pattern management with vector indexing
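The commit says the risk score combines pattern severity (1-10) with the cosine distance from the sqlite-vec KNN search. One plausible formula is sketched below; this particular combination (similarity times normalized severity, scaled to 0-100) is an assumption, not ProxySQL's actual formula:

```cpp
// Hypothetical risk-score formula: higher severity and higher similarity
// (lower cosine distance) yield a higher score in [0, 100].
static double risk_score(int severity, double cosine_distance) {
    double similarity = 1.0 - cosine_distance;   // cosine distance is >= 0
    if (similarity < 0.0) similarity = 0.0;      // clamp dissimilar matches
    return similarity * (severity / 10.0) * 100.0;
}
```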
- Add NL2SQL_Converter with prompt building and model selection
- Add LLM clients for Ollama, OpenAI, Anthropic APIs
- Update Makefile for new source files
- Add AI_Features_Manager coordinator class
- Add AI_Vector_Storage interface (stub)
- Add Anomaly_Detector class (stub for Phase 3)
- Update includes and main initialization
Bug Description:
ProxySQL would deadlock when processing extended query frames where:
1. Many Close Statement messages accumulate responses in PSarrayOUT
2. Total response size exceeds pgsql-threshold_resultset_size
3. A backend operation (Describe/Execute) follows in the same frame
Root Cause:
- Close Statement operations are handled locally by ProxySQL (no backend routing)
- Their CloseComplete responses accumulate in PSarrayOUT
- When threshold_resultset_size is exceeded, ProxySQL stops reading from backend
- Subsequent backend operations (Describe/Execute) need backend responses to complete
- This creates a deadlock: ProxySQL won't read, backend operation can't complete
- Extended query frame never finishes, query times out
The Fix:
When PSarrayOUT exceeds threshold_resultset_size and a backend operation is pending,
ProxySQL now flushes all accumulated data in PSarrayOUT to the client first, then
continues processing backend operations. This breaks the deadlock by clearing the
buffer before attempting to read more data from the backend.
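The deadlock-breaking condition reduces to a simple predicate, sketched here with illustrative names (the real code operates on PSarrayOUT and the pgsql-threshold_resultset_size variable):

```cpp
#include <cstddef>

// Flush queued responses to the client before reading from the backend when
// both conditions described above hold: the output buffer exceeds the
// resultset threshold AND a backend operation still needs responses.
static bool must_flush_before_backend_read(size_t queued_bytes,
                                           size_t threshold_resultset_size,
                                           bool backend_op_pending) {
    return backend_op_pending && queued_bytes > threshold_resultset_size;
}
```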
The Bind message is used to obtain parameter information, rather than inferring from the query text whether the query is parameterized. Multiple parameters are not a concern here: PostgreSQL itself rejects multi-parameter pg_cancel_backend() and pg_terminate_backend() calls and accepts only a single argument for these functions.
This commit extends the existing pg_cancel_backend() and pg_terminate_backend()
support to work with parameterized queries in the extended query protocol.
While literal PID values were already supported in both simple and extended
query protocols, this enhancement adds support for parameterized queries like
SELECT pg_cancel_backend($1).
The catalog_search() and catalog_list() methods in MySQL_Catalog.cpp
were manually building JSON strings by concatenating raw TEXT from
SQLite without proper escaping. This caused parse errors when stored
JSON contained quotes, backslashes, or newlines.
Changes:
- MySQL_Catalog.cpp: Use nlohmann::json to build proper nested JSON
in search() and list() methods instead of manual concatenation
- MySQL_Tool_Handler.cpp: Add try-catch for JSON parsing in catalog_get()
- test_catalog.sh: Fix MCP URL path, add jq extraction for MCP protocol
responses, add 3 special character tests (CAT013-CAT015)
Test Results: All 15 catalog tests pass, including new tests that
verify special characters (quotes, backslashes) are preserved.
Python bridge (scripts/mcp/proxysql_mcp_stdio_bridge.py):
- Make log file path configurable via PROXYSQL_MCP_BRIDGE_LOG env var
- Add httpx.RequestError exception handling for network issues
- Fix asyncio.CancelledError not being re-raised (HIGH priority)
- Replace deprecated asyncio.get_event_loop() with get_running_loop()
C++ server (lib/MCP_Endpoint.cpp):
- Refactor handle_tools_call() to reduce code duplication
- Handle string responses directly without calling .dump()
- Single shared wrapping block for all response types
Per review: https://github.com/ProxySQL/proxysql-vec/pull/11
The ProxySQL MCP server now wraps tool results in the correct MCP format:
- result.content: array of content items (type: "text", text: "...")
- result.isError: boolean
Per MCP spec: https://modelcontextprotocol.io/specification/2025-11-25/server/tools
Also simplified the bridge to pass through results directly since the server
now returns the correct format.
- Move explicit template instantiations for send_ok_msg_to_client and
send_error_msg_to_client to after template definitions
- Add missing closing brace for init_mcp_variables()
- Fix missing #endif and closing brace for GloMCPH shutdown block
- Fixed NULL value handling in execute_query: use empty string instead
of nullptr to avoid "basic_string: construction from null" errors
- Fixed validate_readonly_query: corrected substring length check
from substr(0,6)!="SELECT " to substr(0,6)!="SELECT"
- Fixed test script: added proper variable_name parameter for
get_config/set_config tools
Query endpoint tools now pass all tests.
The nlohmann::json value() method can throw "basic_string: construction
from null is not valid" when trying to convert a JSON null value to std::string.
Added helper functions get_json_string() and get_json_int() that:
- Check if key exists before accessing
- Check if value is not null
- Check if value has correct type
- Return default value if any check fails
This prevents crashes when:
1. Arguments are missing (returns default)
2. Arguments are explicitly null (returns default)
3. Arguments have wrong type (returns default)
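The same defensive pattern, shown on a plain string instead of nlohmann::json so the sketch stays self-contained: convert with std::stoi inside a try-catch and fall back to a default on any failure (the helper name is illustrative):

```cpp
#include <string>

// Return the parsed integer, or `def` when the input is empty, non-numeric,
// has trailing characters, or overflows — mirroring the "return default on
// any failed check" behavior of get_json_int() described above.
static int get_int_or_default(const std::string& value, int def) {
    try {
        size_t pos = 0;
        int v = std::stoi(value, &pos);
        if (pos != value.size()) return def;  // trailing non-numeric characters
        return v;
    } catch (...) {                           // invalid_argument / out_of_range
        return def;
    }
}
```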
MySQL returns column names in uppercase for information_schema tables,
but the code was expecting lowercase column names. This caused crashes
when accessing JSON keys that didn't exist.
Changes:
1. Convert all column names to lowercase in execute_query()
2. Store lowercase column names in a vector for efficient access
3. Use lowercase column names as keys in JSON row objects
This ensures consistent column name casing across all queries,
preventing JSON access errors for information_schema columns.
Also includes the previous use-after-free fix.
The code was creating a dangling pointer by calling c_str() on a
temporary std::string object, causing undefined behavior and crashes
when processing query results.
Before:
const char* col_name = columns[i].get<std::string>().c_str();
// ^ temporary string destroyed here, col_name is dangling
After:
std::string col_name = columns[i].get<std::string>();
// ^ col_name is valid until end of scope
This bug was causing ProxySQL to crash when running MCP tool tests.
When LOAD MCP VARIABLES TO RUNTIME is called and the MCP server is
already running, the MySQL Tool Handler is now recreated with the
current configuration values. This allows changing MySQL connection
parameters without restarting ProxySQL.
The reinitialization:
1. Deletes the old MySQL Tool Handler
2. Creates a new one with current mcp-mysql_* values
3. Initializes the new handler
4. Logs success or failure
- Initialize MySQL_Tool_Handler in ProxySQL_MCP_Server constructor
with MySQL configuration from MCP variables
- Use GloVars.get_SSL_pem_mem() to get SSL certificates correctly
- Add MySQL_Tool_Handler cleanup in destructor
- Change configure_mcp.sh default MySQL port from 3307 to 3306
- Change configure_mcp.sh default password from test123 to empty
- Update help text and examples to match new defaults
- Add automatic MCP HTTPS server start/stop based on mcp-enabled flag
- Server starts when mcp-enabled=true and LOAD MCP VARIABLES TO RUNTIME
- Server stops when mcp-enabled=false and LOAD MCP VARIABLES TO RUNTIME
- Validates SSL certificates before starting
- Added to both flush_mcp_variables___database_to_runtime() and
flush_mcp_variables___runtime_to_database() functions
- Update configure_mcp.sh to respect environment variables
- MYSQL_HOST, MYSQL_PORT, MYSQL_USER, MYSQL_PASSWORD
- TEST_DB_NAME (mapped to MYSQL_DATABASE)
- MCP_PORT
- Updated --help documentation with all supported variables
This commit fixes several issues with MCP (Model Context Protocol) variables
not being properly persisted across storage layers and adds support for DISK
commands.
Changes:
1. lib/Admin_FlushVariables.cpp:
- Fixed flush_mcp_variables___runtime_to_database() to properly insert
variables into runtime_global_variables using db->execute() with
formatted strings (matching admin pattern)
- Fixed SQL format string to avoid double-prefix bug (qualified_name
already contains "mcp-" prefix)
- Fixed lock ordering by releasing outer wrlock before calling
runtime_to_database with use_lock=true, then re-acquiring
- Removed explicit BEGIN/COMMIT transactions to match admin pattern
2. lib/Admin_Handler.cpp:
- Added MCP DISK command handlers that rewrite commands to SQL queries:
* LOAD MCP VARIABLES FROM DISK -> INSERT OR REPLACE INTO main.global_variables
* SAVE MCP VARIABLES TO DISK -> INSERT OR REPLACE INTO disk.global_variables
* SAVE MCP VARIABLES FROM MEMORY/MEM -> INSERT OR REPLACE INTO disk.global_variables
- Separated DISK command handlers from MEMORY/RUNTIME handlers
3. lib/ProxySQL_Admin.cpp:
- Added flush_mcp_variables___runtime_to_database() call to stats section
to ensure MCP variables are repopulated when runtime_global_variables
is cleared and refreshed
4. tests/mcp_module-t.cpp:
- Added verbose diagnostic output throughout tests
- Added section headers and test numbers for clarity
- Added variable value logging and error logging
All 52 MCP module tests now pass.
The old read_only_action() implementations were marked for deletion after 2025-07-14.
These were replaced with new implementation that doesn't depend on the admin table.
This change removes the deprecated code paths to clean up the codebase.
The checksum generation caused an assert failure because the MCP module
was not yet added to the checksums_values struct. For now, we skip
checksum generation for MCP until the feature is complete and stable.
Changes:
- Removed flush_GENERIC_variables__checksum__database_to_runtime() call
- Kept flush_mcp_variables___runtime_to_database() to populate runtime_global_variables
- Added comment explaining checksum is skipped until MCP is complete
This allows ProxySQL to start without crashing while MCP is under development.
The crash was caused by incorrect lock ordering. The admin version has:
1. wrlock() (acquire admin lock)
2. Process variables
3. checksum_mutex lock() (acquire checksum lock)
4. flush to runtime + generate checksum
5. checksum_mutex unlock() (release checksum lock)
6. wrunlock() (release admin lock)
The MCP version had the wrong order with the checksum_mutex lock outside
the wrlock/wrunlock region. This also added the missing 'lock' parameter
that exists in the admin version but was missing in MCP.
Changes:
- Added 'lock' parameter to flush_mcp_variables___database_to_runtime()
- Added conditional wrlock()/wrunlock() calls (if lock=true)
- Moved checksum generation inside the wrlock/wrunlock region
- Updated function signature in header file
The MCP module's flush_mcp_variables___database_to_runtime() was missing
the logic to populate runtime_global_variables table. This caused the
table to remain empty even though global_variables was correctly populated.
Following the same pattern as admin variables (line 268), this commit adds:
1. Call to flush_mcp_variables___runtime_to_database(admindb, ..., true)
to populate runtime_global_variables
2. Checksum generation for cluster sync
After this fix, both global_variables and runtime_global_variables will
contain MCP variables after ProxySQL startup.
The MCP module was not being loaded because:
1. The admin bootstrap process was not calling flush_mcp_variables___database_to_runtime
- Added the call after flush_sqliteserver_variables___database_to_runtime
2. There was no SHOW MCP VARIABLES command handler
- Added the handler in Admin_Handler.cpp, following the same pattern as
SHOW MYSQL VARIABLES and SHOW PGSQL VARIABLES
Now after this change:
- MCP variables (mcp-enabled, mcp-port, mcp-mysql_hosts, etc.) will be
automatically inserted into global_variables table during ProxySQL startup
- Users can run "SHOW MCP VARIABLES" to list all MCP configuration variables
- The configure_mcp.sh script will work correctly
Note: Requires rebuilding ProxySQL for changes to take effect.
- Change mcp-catalog_path default from /var/lib/proxysql/mcp_catalog.db to mcp_catalog.db
- SQLite accepts relative paths, which are resolved relative to the process working directory
- ProxySQL's working directory is its datadir, so the catalog will be stored there
- Update configure_mcp.sh to set mcp-catalog_path='mcp_catalog.db'
- Update lib/MCP_Thread.cpp default to "mcp_catalog.db"
- Update README.md to document relative path behavior
Added missing documentation for MySQL connection pool implementation:
Header (MySQL_Tool_Handler.h):
- Added MySQLConnection struct documentation with member descriptions
- Added member variable documentation using ///< Doxygen style
Implementation (MySQL_Tool_Handler.cpp):
- Added Doxygen blocks for close() method
- Added Doxygen blocks for init_connection_pool() with detailed behavior
- Added Doxygen blocks for get_connection() with thread-safety notes
- Added Doxygen blocks for return_connection() with reuse behavior
- Added Doxygen blocks for execute_query() with JSON format documentation
All new connection pool methods now have complete @brief, @param, and
@return documentation following Doxygen conventions.
Added built-in connection pool to MySQL_Tool_Handler for direct MySQL
connections to backend servers.
Changes:
- Added MySQLConnection struct with MYSQL* pointer, host, port, in_use flag
- Added connection_pool vector, pool_lock mutex, pool_size counter
- Implemented init_connection_pool() to create MYSQL connections using mysql_init/mysql_real_connect
- Implemented get_connection() and return_connection() with thread-safe locking
- Implemented execute_query() helper method for executing SQL and returning JSON results
- Updated tool methods to use actual MySQL connections:
- list_schemas: Query information_schema.schemata
- list_tables: Query information_schema.tables with metadata
- describe_table: Query columns, primary keys, indexes
- sample_rows: Execute SELECT with LIMIT
- sample_distinct: Execute SELECT DISTINCT with GROUP BY
- run_sql_readonly: Execute validated SELECT queries
- explain_sql: Execute EXPLAIN queries
- Fixed MYSQL forward declaration (use typedef struct st_mysql MYSQL)
The connection pool creates one connection per configured host:port pair
with 5-second timeouts for connect/read/write operations.
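A toy sketch of the pool's checkout/return logic under a mutex. The real handler stores a MYSQL* obtained from mysql_real_connect(); a plain struct stands in for it here so the example has no libmysqlclient dependency:

```cpp
#include <mutex>
#include <string>
#include <vector>

// Stand-in for the real MySQLConnection (which holds a MYSQL* pointer).
struct PooledConn {
    std::string host;
    int port = 0;
    bool in_use = false;
};

class ConnPool {
    std::vector<PooledConn> pool;
    std::mutex pool_lock;   // guards pool and the in_use flags
public:
    void add(const std::string& host, int port) {
        std::lock_guard<std::mutex> g(pool_lock);
        pool.push_back({host, port, false});
    }
    // Mark a free connection as in use and return its index, or -1 if busy.
    int get_connection() {
        std::lock_guard<std::mutex> g(pool_lock);
        for (size_t i = 0; i < pool.size(); i++) {
            if (!pool[i].in_use) { pool[i].in_use = true; return (int)i; }
        }
        return -1;
    }
    // Release a connection back to the pool for reuse.
    void return_connection(int idx) {
        std::lock_guard<std::mutex> g(pool_lock);
        if (idx >= 0 && idx < (int)pool.size()) pool[idx].in_use = false;
    }
};
```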
When SET commands use boolean literals (true/false), SQLite was
interpreting them as boolean keywords and storing 1/0 instead of
the string values "true"/"false".
Fixed by detecting boolean literals in admin_handler_command_set()
and quoting them as strings in the UPDATE statement.
All 52 MCP module TAP tests now pass.
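The fix amounts to quoting bare boolean literals before building the UPDATE statement, so SQLite treats them as strings rather than keywords; a minimal sketch (the case handling shown is an assumption):

```cpp
#include <string>

// Quote bare boolean literals so SQLite stores "true"/"false" as strings
// instead of interpreting them as the keywords 1/0.
static std::string quote_boolean_literal(const std::string& value) {
    if (value == "true" || value == "false" ||
        value == "TRUE" || value == "FALSE") {
        return "'" + value + "'";
    }
    return value;   // non-boolean values pass through unchanged
}
```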
- Add MCP variables to load_save_disk_commands map for LOAD/SAVE commands
- Add MCP variable validation in is_valid_global_variable() for SET commands
- Implement has_variable() method in MCP_Threads_Handler
- Add CHECKSUM command handlers for MCP VARIABLES (DISK/MEMORY/MEM)
Test results improved from 28 passed / 16 failed to 49 passed / 3 failed.
Remaining 3 failures are test expectation issues (boolean representation).
Remove unnecessary inheritance from MySQL_Threads_Handler. The MCP module
should be independent and not depend on MySQL/PostgreSQL thread handlers.
Changes:
- MCP_Threads_Handler now manages its own pthread_rwlock_t for synchronization
- Simplified init() signature (removed unused num/stack parameters)
- Added ProxySQL_Main_init_MCP_module() call in main initialization phase
- Include only standard C++ headers (pthread.h, cstring, cstdlib)
Add new MCP module supporting multiple MCP server endpoints over HTTPS
with JSON-RPC 2.0 protocol skeleton. Each endpoint (/mcp/config,
/mcp/observe, /mcp/query, /mcp/admin, /mcp/cache) is a distinct MCP
server with its own authentication configuration.
Features:
- HTTPS server using existing ProxySQL TLS certificates
- JSON-RPC 2.0 skeleton implementation (actual protocol TBD)
- 5 MCP endpoints with per-endpoint auth configuration
- LOAD/SAVE MCP VARIABLES admin commands
- Configuration file support (mcp_variables section)
Implementation follows GenAI module pattern:
- MCP_Threads_Handler: Main module handler with variable management
- ProxySQL_MCP_Server: HTTPS server wrapper using libhttpserver
- MCP_JSONRPC_Resource: Base endpoint class with JSON-RPC skeleton
This commit addresses critical issues identified in the code review:
1. Fix non-blocking read handling:
- lib/GenAI_Thread.cpp (listener_loop): Properly handle EAGAIN/EWOULDBLOCK
- Return early on EAGAIN/EWOULDBLOCK instead of closing connection
- Handle EOF (n==0) separately from errors (n<0)
- lib/MySQL_Session.cpp (handle_genai_response): Properly handle EAGAIN/EWOULDBLOCK
- Return early on EAGAIN/EWOULDBLOCK instead of cleaning up request
- Use goto for cleaner control flow
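The corrected read handling can be summarized as a classification of the read() result, sketched here with an illustrative enum (not the actual ProxySQL control flow): EAGAIN/EWOULDBLOCK means "retry later" and must not close the connection, n==0 is a clean EOF, and only other errors tear the connection down.

```cpp
#include <cerrno>
#include <sys/types.h>

enum class ReadOutcome { DATA, WOULD_BLOCK, CLOSED, ERROR };

// Classify the result of a non-blocking read: n is the read() return value,
// err the errno observed when n < 0.
static ReadOutcome classify_read(ssize_t n, int err) {
    if (n > 0) return ReadOutcome::DATA;
    if (n == 0) return ReadOutcome::CLOSED;      // EOF, handled separately
    if (err == EAGAIN || err == EWOULDBLOCK)
        return ReadOutcome::WOULD_BLOCK;         // not an error: return early
    return ReadOutcome::ERROR;                   // genuine failure
}
```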
2. Refactor JSON building/parsing to use nlohmann/json:
- lib/GenAI_Thread.cpp (call_llama_batch_embedding):
- Replace manual stringstream JSON building with nlohmann/json
- Replace fragile string-based parsing with nlohmann/json::parse()
- Support multiple response formats (results, data, embeddings)
- Add proper error handling with try/catch
- lib/GenAI_Thread.cpp (call_llama_rerank):
- Replace manual stringstream JSON building with nlohmann/json
- Replace fragile string-based parsing with nlohmann/json::parse()
- Support multiple response formats and field names
- Add proper error handling with try/catch
These changes:
- Fix potential connection drops due to incorrect EAGAIN handling
- Improve security and robustness of JSON handling
- Reduce code complexity and improve maintainability
- Add support for multiple API response formats
- Add check_genai_events() function for non-blocking epoll_wait on GenAI response fds
- Integrate GenAI event checking into main handler() WAITING_CLIENT_DATA case
- Add goto handler_again to process multiple GenAI responses in one iteration
The async GenAI architecture is now fully integrated. MySQL threads no longer
block when processing GENAI: queries - they send requests asynchronously via
socketpair and continue processing other queries while GenAI workers handle
the embedding/reranking operations.
- Add GenAI_RequestHeader and GenAI_ResponseHeader protocol structures for socketpair communication
- Implement GenAI listener_loop to read requests from epoll and queue to workers
- Implement GenAI worker_loop to process requests and send responses via socketpair
- Add GenAI_PendingRequest state management to MySQL_Session/Base_Session
- Implement MySQL_Session async handlers: genai_send_async(), handle_genai_response(), genai_cleanup_request()
- Modify MySQL_Session genai handler to use async path when epoll is available
- Initialize GenAI epoll fd in Base_Session::init()
This completes the async architecture that was planned but never fully implemented
(previously had only placeholder comments). The GenAI module now processes
requests asynchronously without blocking MySQL threads.
- Fix double prefix bug in genai_thread_variables_names[] where variable
names included the "genai_" prefix, but flush functions added "genai-"
prefix, creating names like "genai-genai_threads"
- Update get_variable() and set_variable() to use names without prefix
- Add comprehensive TAP tests for GenAI embedding and reranking with 40 tests
covering configuration, single/batch embedding, reranking, error handling,
and GENAI: query syntax variations
- Fix test expectations for leading space behavior (should be rejected)
- Add tests for genai-embedding_timeout_ms and genai-rerank_timeout_ms
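The double-prefix bug can be illustrated with two tiny helpers (names are illustrative): the names array must hold bare names, because the flush code prepends "genai-" itself.

```cpp
#include <string>

// How the flush functions build the qualified variable name.
static std::string qualified_name(const std::string& bare_name) {
    return "genai-" + bare_name;
}

// Detects the bug described above: an entry that already carries a prefix
// would produce a doubled name such as "genai-genai_threads".
static bool has_double_prefix(const std::string& name) {
    return name.rfind("genai_", 0) == 0 || name.rfind("genai-", 0) == 0;
}
```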
Move all JSON parsing and operation routing logic from MySQL_Session to
GenAI module. MySQL_Session now simply passes GENAI: queries to the GenAI
module via process_json_query(), which handles everything autonomously.
This simplifies the architecture and achieves better separation of concerns:
- MySQL_Session: Detects GENAI: prefix and forwards to GenAI module
- GenAI module: Handles JSON parsing, operation routing, and result formatting
Changes:
- GenAI_Thread.h: Add GENAI_OP_JSON operation type, json_query field, and
process_json_query() method declaration
- GenAI_Thread.cpp: Implement process_json_query() with embed/rerank support
and document_from_sql framework (stubbed for future MySQL connection handling)
- MySQL_Session.cpp: Simplify genai handler to just call process_json_query()
and parse JSON result (reduces net code by ~215 lines)
This commit refactors the experimental GenAI query syntax to use a single
GENAI: keyword with type-based operations instead of separate EMBED: and RERANK: keywords.
Changes:
- Replace EMBED: and RERANK: detection with unified GENAI: detection
- Merge genai_embedding and genai_rerank handlers into single genai handler
- Add 'type' field to operation JSON ("embed" or "rerank")
- Add 'columns' field for rerank operation (2 or 3, default 3)
- columns=2: Returns only index and score
- columns=3: Returns index, score, and document (default)
Old syntax:
EMBED: ["doc1", "doc2"]
RERANK: {"query": "...", "documents": [...], "top_n": 5}
New syntax:
GENAI: {"type": "embed", "documents": ["doc1", "doc2"]}
GENAI: {"type": "rerank", "query": "...", "documents": [...], "top_n": 5, "columns": 2}
This provides a cleaner, more extensible API for future GenAI operations.
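A minimal sketch of the unified dispatch (not ProxySQL's actual handler, which parses the JSON properly; here a naive substring scan stands in for a JSON library):

```cpp
#include <cassert>
#include <string>

enum class GenAIOp { Embed, Rerank, Unknown };

// Detect the GENAI: prefix, then route on the "type" field.
GenAIOp route_genai_query(const std::string& q) {
    const std::string prefix = "GENAI:";
    if (q.compare(0, prefix.size(), prefix) != 0) return GenAIOp::Unknown;
    if (q.find("\"type\": \"embed\"") != std::string::npos ||
        q.find("\"type\":\"embed\"") != std::string::npos) return GenAIOp::Embed;
    if (q.find("\"type\": \"rerank\"") != std::string::npos ||
        q.find("\"type\":\"rerank\"") != std::string::npos) return GenAIOp::Rerank;
    return GenAIOp::Unknown;
}
```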
This commit adds experimental support for reranking documents directly
from MySQL queries using a special RERANK: syntax.
Changes:
- Add handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___genai_rerank()
- Add RERANK: query detection alongside EMBED: detection
- Implement JSON parsing for query, documents array, and optional top_n
- Build resultset with index, score, and document columns
- Use MySQL ERR_Packet for error handling
Query format: RERANK: {"query": "search query", "documents": ["doc1", "doc2", ...], "top_n": 5}
Result format: 1 row per result, 3 columns (index, score, document)
This commit adds experimental support for generating embeddings directly
from MySQL queries using a special EMBED: syntax.
Changes:
- Add MYDS_INTERNAL_GENAI to MySQL_DS_type enum for GenAI connections
- Add handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___genai_embedding()
- Implement EMBED: query detection and JSON parsing for document arrays
- Build CSV resultset with embeddings (1 row per document, 1 column)
- Add myconn NULL check in MySQL_Thread for INTERNAL_GENAI type
- Add "debug_genai" name to debug module array
- Remove HAVE_LIBCURL checks (libcurl is always statically linked)
- Use static curl header: "curl/curl.h" instead of <curl/curl.h>
- Remove curl_global_cleanup() from GenAI module (should only be in main())
Query format: EMBED: ["doc1", "doc2", ...]
Result format: 1 row per document, 1 column with CSV embeddings
Error handling uses MySQL ERR_Packet instead of resultsets.
Enhance ProxySQL_Poll class documentation with detailed usage patterns:
- lib/ProxySQL_Poll.cpp: Enhanced file-level documentation with architecture
overview, template specialization, memory management, and event processing
pipeline explanations
- lib/MySQL_Thread.cpp: Added usage documentation for listener registration,
removal patterns, client session setup, and main poll loop
- lib/PgSQL_Thread.cpp: Added equivalent PostgreSQL usage documentation
mirroring MySQL patterns with protocol-specific details
- lib/mysql_data_stream.cpp: Documented cleanup, receive activity tracking,
and send activity tracking patterns
- lib/PgSQL_Data_Stream.cpp: Documented equivalent PostgreSQL data stream
patterns for cleanup and activity tracking
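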
All documentation is placed directly where code is used, avoiding specific
line numbers for better maintainability. Includes comprehensive explanations
of when, why, and how ProxySQL_Poll methods are used throughout ProxySQL's
event-driven architecture.
[skip-ci]
This change adds compile-time detection and fallback to poll() on systems
that don't support epoll(), improving portability across different platforms.
Header changes (include/GenAI_Thread.h):
- Make sys/epoll.h include conditional on #ifdef epoll_create1
Implementation changes (lib/GenAI_Thread.cpp):
- Add poll.h include for poll() support
- Add EPOLL_CREATE compatibility macro (epoll_create1 or epoll_create)
- Update init() to use pipe() for wakeup when epoll is not available
- Update register_client() to skip epoll_ctl when epoll is not available
- Update unregister_client() to skip epoll_ctl when epoll is not available
- Update listener_loop() to use poll() when epoll is not available
The compile-time detection works by checking if epoll_create1 is defined
(Linux-specific glibc function since 2.9). On systems without epoll, the
code falls back to using poll() with a pipe for wakeup signaling.
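The fallback strategy can be sketched roughly as follows (macro names and the `__linux__` check are simplifications of the commit's epoll_create1 detection): one code path creates an epoll instance, the other uses a pipe whose read end is watched by poll() so a one-byte write can wake the loop.

```cpp
#if defined(__linux__)
#include <sys/epoll.h>
#define HAVE_EPOLL 1
#else
#include <poll.h>
#define HAVE_EPOLL 0
#endif
#include <cassert>
#include <unistd.h>

// Returns 0 if the platform's wakeup mechanism can be set up.
int wakeup_mechanism_ok() {
#if HAVE_EPOLL
    int ep = epoll_create1(0);      // epoll path: no pipe needed
    if (ep < 0) return -1;
    close(ep);
    return 0;
#else
    int fds[2];                     // fallback: self-pipe wakes poll()
    if (pipe(fds) != 0) return -1;
    close(fds[0]);
    close(fds[1]);
    return 0;
#endif
}
```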
The bug was checking query_no_space[5] == 'A' for GENAI commands,
but position 5 in "SAVE GENAI VARIABLES" is 'G', not 'A'.
Fixed two locations:
1. LOAD/SAVE VARIABLES command handler (line 1659)
2. LOAD FROM CONFIG command handler (line 1734)
All GenAI admin commands now work correctly:
- SAVE GENAI VARIABLES TO DISK
- LOAD GENAI VARIABLES FROM DISK
- SAVE GENAI VARIABLES FROM RUNTIME
- LOAD GENAI VARIABLES TO RUNTIME
- SAVE GENAI VARIABLES TO MEMORY
- LOAD GENAI VARIABLES FROM MEMORY
- LOAD GENAI VARIABLES FROM CONFIG
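The corrected prefix check can be reduced to this illustrative helper (the real admin parser is more involved): for both `SAVE GENAI ...` and `LOAD GENAI ...` the module keyword starts at offset 5, so the dispatch must test for 'G' there.

```cpp
#include <cassert>
#include <cstring>
#include <strings.h>  // strncasecmp

// Sketch, not the actual Admin_Handler code: after "SAVE " or "LOAD "
// (5 characters), the GENAI keyword begins at index 5.
bool is_genai_command(const char* query_no_space) {
    return strlen(query_no_space) > 10 &&
           strncasecmp(query_no_space + 5, "GENAI ", 6) == 0;
}
```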
PostgreSQL allows a Bind message to specify a single parameter format
(num_param_formats = 1), which applies to all parameters.
libpq, however, always expects a format entry per parameter and previously
sent uninitialized values for the remaining parameters when only one format
was specified. This caused ProxySQL to forward malformed Bind packets to
backend.
ProxySQL now detects this case and propagates the single provided parameter
format to all parameters, matching PostgreSQL semantics.
Implement a new GenAI module for ProxySQL with basic infrastructure:
- GenAI_Threads_Handler class for managing GenAI module configuration
- Support for genai- prefixed variables in global_variables table
- Dummy variables: genai-var1 (string) and genai-var2 (integer)
- Config file support via genai_variables section
- Flush functions for runtime_to_database and database_to_runtime
- Module lifecycle: initialization at startup, graceful shutdown
- LOAD/SAVE GENAI VARIABLES admin command infrastructure
Core functionality verified:
- Config file loading works
- Variables persist in global_variables table
- Disk save/load via SQL works
- Module initializes and shuts down properly
Related files:
- include/GenAI_Thread.h: New GenAI thread handler class
- lib/GenAI_Thread.cpp: Implementation with dummy variables
- lib/Admin_Handler.cpp: Added GENAI command vectors and handlers
- lib/Admin_FlushVariables.cpp: Added genai flush functions
- lib/ProxySQL_Admin.cpp: Added init_genai_variables() and load_save_disk_commands entry
- include/proxysql_admin.h: Added function declarations
- lib/Makefile: Added GenAI_Thread.oo to build
- src/main.cpp: Added module initialization and cleanup
- src/proxysql.cfg: Added genai_variables configuration section
This commit fixes a parsing error in the MySQL SET statement parser that
occurred when processing `SET time_zone` statements with:
1. Three-component IANA timezone names (e.g., America/Argentina/Buenos_Aires)
2. Timezone names containing hyphens (e.g., America/Port-au-Prince)
Previously, the regex pattern `(?:\w+/\w+)` only matched 2-component
timezone names and did not support hyphens. This caused parsing errors
logged as:
"[ERROR] Unable to parse query. If correct, report it as a bug:
SET time_zone=\"America/Argentina/Buenos_Aires\";"
When multiplexing is enabled, this bug causes timestamps to be incorrectly
written to the database.
Changes:
- Updated timezone regex from `(?:\w+/\w+)` to `(?:[\w-]+(?:/[\w-]+){1,2})`
- Supports 2-3 components: Area/Location or Area/Country/Location
- Supports hyphens in component names (e.g., Port-au-Prince)
- Added comprehensive Doxygen documentation for timezone parsing
- Extended TAP test cases with new timezone formats
Note: Bare words like 'SYSTEM' and 'UTC' were already supported via
other patterns in the parser (vp2 pattern for word matching).
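A quick way to sanity-check the new pattern in isolation (the real parser embeds it inside a much larger SET-statement regex; this standalone check is only illustrative):

```cpp
#include <cassert>
#include <regex>
#include <string>

// The updated timezone pattern from the commit, tested on its own:
// 2-3 slash-separated components, hyphens allowed in each component.
bool matches_timezone(const std::string& tz) {
    static const std::regex re("(?:[\\w-]+(?:/[\\w-]+){1,2})");
    return std::regex_match(tz, re);
}
```

Bare words such as "UTC" intentionally fail this sub-pattern, since the commit notes they are matched elsewhere in the parser.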
Fixes: #4993
Related: gemini-code-assist review comments
This commit addresses all review comments from gemini-code-assist on PR #5279:
1. Fixed FLUSH LOGS documentation - clarified that file is reopened for
appending, not truncating, and updated the note about preserving contents
2. Fixed callback documentation - clarified that the callback attaches to
all frontend connections, not just admin connections
3. Updated security warning - focused on passive eavesdropping and offline
decryption as the primary threats
4. Fixed typo: proxyql_ip -> proxysql_ip in tcpdump example
5. Removed misleading @see HPKP link - HPKP is unrelated to NSS Key Log
Format and is a deprecated feature
6. Updated NSS Key Log Format URL to use official MDN link instead of
unofficial mirror
7. Fixed buffer size comment to accurately reflect 256-byte buffer and
254-byte line length validation
8. Clarified fputs comment to emphasize the read lock's role in allowing
concurrent writes from multiple threads
This commit addresses critical issues identified in PR #5276 by
gemini-code-assist's code review, which could undermine the goal of
being allocation-free and cause hangs or silent failures.
Bug 1: Vector Passed by Value (Critical)
------------------------------------------
The function took std::vector<int> excludeFDs by value, causing heap
allocation during the copy operation. This undermines the PR's goal of
avoiding heap allocations after fork() to prevent deadlocks in
multi-threaded programs.
Fix: Change to pass by const reference to avoid heap allocation.
void close_all_non_term_fd(const std::vector<int>& excludeFDs)
Bug 2: Infinite Loop Risk (Critical)
------------------------------------
The loop used an unsigned int loop variable while comparing against
rlim_t (unsigned long long). If rlim_cur exceeded UINT_MAX, this would
create an infinite loop.
Fix: Use rlim_t type for the loop variable and cap at INT_MAX.
for (rlim_t fd_rlim = 3; fd_rlim < nlimit.rlim_cur && fd_rlim <= INT_MAX; fd_rlim++)
Bug 3: close_range() Detection Logic (High)
------------------------------------------
The original detection logic had two problems:
1. Executed close_range syscall twice on first successful call
2. Incorrectly cached availability on transient failures (EINTR),
leaving file descriptors open without fallback
Fix: Reordered logic to only cache on success, allow retry on
transient failures. Only cache as "not available" on ENOSYS.
For other errors (EBADF, EINVAL, etc.), don't cache - might be transient.
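The caching policy described in the fix can be modeled as follows (enum and helper are illustrative, not the actual proxysql_utils code):

```cpp
#include <cassert>
#include <cerrno>

enum class Avail { Unknown, Yes, No };

// Cache "available" only on success, "unavailable" only on ENOSYS;
// leave the cache untouched for possibly-transient errors so the
// next call falls back and can retry the syscall later.
Avail update_close_range_cache(Avail cached, int rc, int err) {
    if (rc == 0) return Avail::Yes;
    if (err == ENOSYS) return Avail::No;   // kernel lacks the syscall
    return cached;                         // EINTR/EBADF/...: don't cache
}
```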
Files Modified
--------------
- include/proxysql_utils.h
- lib/proxysql_utils.cpp
This commit adds extensive documentation for the ssl_keylog_file feature
(introduced in PR #4236), which enables TLS key logging for debugging
encrypted traffic.
## Background
The ssl_keylog_file variable (exposed as admin-ssl_keylog_file in SQL
interface) allows ProxySQL to write TLS secrets to a file in NSS Key Log
Format. These secrets can be used by tools like Wireshark and tshark to
decrypt and analyze TLS traffic for debugging purposes.
## Changes
### Inline Documentation (Code)
1. include/proxysql_sslkeylog.h (+96 lines)
- File-level documentation explaining the module purpose and security
- Doxygen comments for all 5 public APIs
- Thread-safety annotations
- Parameter descriptions and return values
2. lib/proxysql_sslkeylog.cpp (+136 lines)
- Implementation-level documentation
- Algorithm explanations (double-checked locking, thread safety)
- Reference to NSS Key Log Format specification
3. include/proxysql_admin.h (+19 lines)
- Variable documentation for ssl_keylog_file
- Path handling rules (absolute vs relative)
- Security implications
### Developer Documentation (doc/ssl_keylog/ssl_keylog_developer_guide.md)
Target audience: Developers working on ProxySQL codebase
Contents:
- Variable naming convention (SQL vs config file vs internal)
- Architecture diagrams
- Thread safety model (pthread rwlock)
- NSS Key Log Format specification
- Complete API reference for all public functions
- Integration points in the codebase
- Security considerations and code review checklist
- Testing procedures
### User Documentation (doc/ssl_keylog/ssl_keylog_user_guide.md)
Target audience: End users and system administrators
Contents:
- What is SSL key logging and when to use it
- Variable naming: admin-ssl_keylog_file (SQL) vs ssl_keylog_file (config)
- Step-by-step enable/disable instructions
- Path resolution (absolute vs relative)
- Log rotation procedures
- Production workflow: tcpdump capture → offline analysis
- Wireshark (GUI) integration tutorial
- tshark (command-line) usage examples
- Troubleshooting common issues
- Security best practices
- Quick reference card
## Key Features Documented
1. **Variable Naming Convention**
- SQL interface: SET admin-ssl_keylog_file = '/path';
- Config file: ssl_keylog_file='/path' (in admin_variables section)
- Internal code: ssl_keylog_file
2. **Production Workflow**
- Capture traffic with tcpdump (no GUI on production server)
- Transfer pcap + keylog to analysis system
- Analyze offline with Wireshark (GUI) or tshark (CLI)
3. **tshark Examples**
- Command-line analysis of encrypted traffic
- Filter examples for debugging TLS issues
- JSON export for automated analysis
## Security Notes
The documentation emphasizes that:
- Key log files contain cryptographic secrets that decrypt ALL TLS traffic
- Access must be restricted (permissions 0600)
- Only enable for debugging, never in production
- Securely delete old key log files
## Files Modified
- include/proxysql_admin.h
- include/proxysql_sslkeylog.h
- lib/proxysql_sslkeylog.cpp
## Files Added
- doc/ssl_keylog/ssl_keylog_developer_guide.md
- doc/ssl_keylog/ssl_keylog_user_guide.md
Since ProxySQL 3.0.4, SELECT VERSION() queries were intercepted and returned
ProxySQL's mysql-server_version variable instead of proxying to backends.
This broke SQLAlchemy for MariaDB which expects "MariaDB" in the version
string.
This commit adds a new variable `mysql-select_version_forwarding` with 4 modes:
- 0 = never: Always return ProxySQL's version (3.0.4+ behavior)
- 1 = always: Always proxy to backend (3.0.3 behavior)
- 2 = smart (fallback to 0): Try backend connection, else ProxySQL version
- 3 = smart (fallback to 1): Try backend connection, else proxy (default)
The implementation includes:
- New global variable mysql_thread___select_version_forwarding
- New function get_backend_version_for_hostgroup() to peek at backend
connection versions without removing them from the pool
- Modified SELECT VERSION() handler to support all 4 modes
- ProxySQL backend detection to avoid recursion
Mode 3 (default) ensures SQLAlchemy always gets the real MariaDB version
string while maintaining fast response when connections are available.
This commit fixes two critical bugs in close_all_non_term_fd() that caused
undefined behavior and potential deadlocks when called after fork() before
execve() in multi-threaded programs.
Bug 1: Self-Referential Directory FD Closure
----------------------------------------------
When iterating through /proc/self/fd, opendir() creates a file descriptor
for the directory stream. This fd appears in the enumeration while we're
iterating, and if we close it, readdir() operates on a corrupted DIR*
stream, causing undefined behavior, crashes, or missed file descriptors.
Fix: Use dirfd() to obtain the directory's fd and explicitly skip closing it.
Bug 2: Heap Allocation After fork() in Multi-Threaded Programs
----------------------------------------------------------------
In multi-threaded programs, when fork() is called while other threads hold
malloc locks, the child process inherits a "frozen" state where those locks
remain locked (the owning threads don't exist in the child). Any heap
allocation (malloc/free/new/delete) in the child before execve() can deadlock.
The original code used:
- std::stol(std::string(dir->d_name)) - creates a temporary std::string
- std::find() - may allocate internally
Fix: Replace with heap-allocation-free alternatives:
- atoi(dir->d_name) instead of std::stol(std::string(...))
- Simple C loops instead of std::find()
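A simplified, allocation-free version of the fixed iteration (it only counts rather than closes, and omits the exclude list, so it is safe to run anywhere; the dirfd() skip and atoi() parsing mirror the fix above):

```cpp
#include <cassert>
#include <cstdlib>
#include <dirent.h>
#include <unistd.h>

// Count open fds >= min_fd by walking /proc/self/fd, skipping the
// DIR*'s own descriptor (Bug 1) and avoiding heap allocation (Bug 2).
int count_open_fds_above(int min_fd) {
    DIR* d = opendir("/proc/self/fd");
    if (d == nullptr) return -1;
    int self_fd = dirfd(d);            // must not be touched mid-iteration
    int count = 0;
    struct dirent* e;
    while ((e = readdir(d)) != nullptr) {
        if (e->d_name[0] == '.') continue;   // "." and ".."
        int fd = atoi(e->d_name);            // no std::string temporary
        if (fd >= min_fd && fd != self_fd) count++;
    }
    closedir(d);
    return count;
}
```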
Additional Improvements
-----------------------
1. Added close_range() syscall support (Linux 5.9+) with runtime detection
- O(1) atomic operation, most efficient method
- Only used when excludeFDs is empty (closes all fds >= 3)
- Falls back to /proc/self/fd iteration when excludeFDs is non-empty
2. Added extensive doxygen documentation covering:
- Security implications (preventing fd leaks to child processes)
- Resource management (preventing fd exhaustion)
- Deadlock prevention in multi-threaded fork() contexts
- Implementation details (three strategies: close_range, /proc/self/fd, rlimit)
- fork() safety design considerations
- Example usage and portability notes
3. Added required includes: dirent.h, sys/syscall.h, linux/close_range.h
Workflow Safety
---------------
The function is now safe to use in the common fork() -> close_all_non_term_fd()
-> execve() workflow, even in multi-threaded programs.
Files Modified
--------------
- lib/proxysql_utils.cpp
* Change MySQL_Monitor_Connection_Pool::put_connection signature to accept MySQL_Monitor_State_Data* instead of raw MYSQL*/port.
* Centralize access to mysql and port via mmsd, reducing parameter mismatch and misuse.
* Improve DEBUG bookkeeping: ensure connections are properly unregistered from the global debug registry with clearer assertions and logs.
* Add consistent proxy_debug messages for connection register/unregister events.
* Simplify server lookup/creation logic when returning connections to the pool.
* Fix ordering of error handling to always unregister before closing connections.
* Minor cleanup: remove unused labels/variables and modernize casts.
* This refactor improves correctness, debuggability, and safety of monitor connection lifecycle management.
Add support for sqlite-rembed Rust SQLite extension to enable
text embedding generation from remote AI APIs (OpenAI, Nomic,
Ollama, Cohere, etc.) within ProxySQL's SQLite3 Server.
Changes:
1. Build system integration for Rust static library compilation
- Rust toolchain detection in deps/Makefile
- Static library target: sqlite3/libsqlite_rembed.a
- Linking integration in lib/Makefile and src/Makefile
2. Extension auto-registration in Admin_Bootstrap.cpp
- Declare sqlite3_rembed_init() extern C function
- Register via sqlite3_auto_extension() after sqlite-vec
3. Documentation updates
- doc/sqlite-rembed-integration.md: comprehensive integration guide
- doc/SQLite3-Server.md: usage examples and provider list
4. Source code inclusion
- deps/sqlite3/sqlite-rembed-source/: upstream sqlite-rembed v0.0.1-alpha.9
The integration follows the same pattern as sqlite-vec (static linking
with auto-registration). Provides rembed() function and temp.rembed_clients
virtual table for embedding generation.
Build requires Rust toolchain (cargo, rustc) and clang/libclang-dev.
This commit integrates sqlite-vec (https://github.com/asg017/sqlite-vec)
as a statically linked extension, enabling vector search capabilities
in all ProxySQL SQLite databases (admin, stats, config, monitor).
Changes:
1. Added sqlite-vec source files to deps/sqlite3/sqlite-vec-source/
- sqlite-vec.c: main extension source
- sqlite-vec.h: header for static linking
- sqlite-vec.h.tmpl: template header
2. Modified deps/Makefile:
- Added target sqlite3/sqlite3/vec.o that copies sources and compiles
with flags -DSQLITE_CORE -DSQLITE_VEC_STATIC
- Made sqlite3 target depend on vec.o
3. Modified lib/Makefile:
- Added $(SQLITE3_LDIR)/vec.o to libproxysql.a prerequisites
- Included vec.o in the static library archive
4. Modified lib/Admin_Bootstrap.cpp:
- Added extern "C" declaration for sqlite3_vec_init
- Enabled load extension support for all databases:
- admindb, statsdb, configdb, monitordb, statsdb_disk
- Registered sqlite3_vec_init as auto-extension at database open
(replacing commented sqlite3_json_init)
5. Updated top-level Makefile:
- Made GIT_VERSION fallback to git describe --always when tags missing
Result:
- Vector search functions (vec0 virtual tables, vector operations) are
available in all ProxySQL SQLite databases without runtime dependencies
- No separate shared library required; fully embedded in proxysql binary
- Extension automatically loaded at database initialization
Logging messages now include 'client address', 'session status' and
'data stream status'. Client address is also logged when OK packets are
dispatched; this should help track whether a client has received the
expected packets or not.
Implements a workaround for the handling of unexpected 'COM_PING'
packets received during query processing, while a resultset is still being
streamed to the client. Received 'COM_PING' packets are queued in the
form of a counter. This counter is later used to send the corresponding
number of 'OK' packets to the client after 'MySQL_Session' has finished
processing the current query.
This commit documents:
1. The vacuum_stats() function's purpose, behavior, and the reason why
stats_pgsql_stat_activity is excluded from bulk deletion operations
2. The fact that stats_pgsql_stat_activity is a SQL VIEW (not a table)
and attempting DELETE on it would cause SQLite error:
"cannot modify stats_pgsql_stat_activity because it is a view"
The documentation explains:
- Why TRUNCATE stats_mysql_query_digest triggers vacuum_stats(true)
- Why both MySQL and PostgreSQL tables are cleared regardless of protocol
- How the view is automatically cleared via its underlying table
stats_pgsql_processlist
- The importance of keeping the view excluded from deletion lists
The `cache_empty_result` field in query rules has three possible values:
• -1: Use global setting (`query_cache_stores_empty_result`)
• 0: Do NOT cache empty resultsets, but cache non-empty resultsets
• 1: Always cache resultsets (both empty and non-empty)
Previously, when `cache_empty_result` was set to 0, nothing was cached at all,
even for non-empty resultsets. This prevented users from disabling caching
for empty resultsets while still allowing caching of non-empty resultsets
on a per-rule basis.
Changes:
1. Modified caching logic in MySQL_Session.cpp and PgSQL_Session.cpp to
add the condition `(qpo->cache_empty_result == 0 && MyRS->num_rows)`
(MySQL) and `(qpo->cache_empty_result == 0 && num_rows)` (PgSQL)
to allow caching when cache_empty_result=0 AND result has rows.
2. Added comprehensive Doxygen documentation in query_processor.h explaining
the semantics of cache_empty_result values.
3. Updated Query_Processor.cpp with inline comments explaining the
three possible values.
Now when cache_empty_result is set to 0:
- Empty resultsets (0 rows) are NOT cached
- Non-empty resultsets (>0 rows) ARE cached
- This matches the intended per-rule behavior described in issue #5248.
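The resulting decision logic, reduced to a standalone truth table (names are illustrative; the real condition lives inline in the session handlers):

```cpp
#include <cassert>

// cache_empty_result: 1 = always cache, 0 = cache only non-empty,
// -1 = defer to the global query_cache_stores_empty_result setting.
bool should_cache(int cache_empty_result, bool global_stores_empty,
                  unsigned long num_rows) {
    if (cache_empty_result == 1) return true;
    if (cache_empty_result == 0) return num_rows > 0;   // the fix
    return global_stores_empty || num_rows > 0;         // -1: global
}
```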
Fixes: https://github.com/sysown/proxysql/issues/5248
Replace sprintf-based SQL query construction with prepared statements using
bound parameters to prevent SQL injection attacks. This addresses the security
issue identified in PR #5247 review.
Changes:
- Use SQLite prepared statement with placeholders ?1, ?2
- Bind variable names and values securely using proxy_sqlite3_bind_text
- Use ASSERT_SQLITE_OK for error handling as per ProxySQL conventions
- Remove malloc/sprintf vulnerable code pattern
- Add necessary includes for SQLite functions and ASSERT_SQLITE_OK macro
Security: SQL injection could have occurred if configuration variable names
or values contained malicious quotes. Prepared statements eliminate this risk.
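To illustrate why the change matters (this is a mock, not the ProxySQL code): with sprintf the value becomes part of the SQL text, while with placeholders the SQL text is a constant and values travel out-of-band via the bind calls.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Vulnerable pattern: the value is spliced into the SQL text, so a
// quote in the value rewrites the statement's structure.
std::string build_unsafe(const char* name, const char* value) {
    char buf[256];
    snprintf(buf, sizeof(buf),
             "REPLACE INTO global_variables VALUES ('%s','%s')", name, value);
    return buf;
}

// Safe pattern: constant SQL with ?1/?2 placeholders; values would be
// bound separately (proxy_sqlite3_bind_text in the real code).
const char* SAFE_SQL = "REPLACE INTO global_variables VALUES (?1, ?2)";
```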
This commit adds detailed Doxygen documentation for:
1. The ProxySQL_Config class - describes its role in configuration management
2. The Read_Global_Variables_from_configfile() method - documents its behavior,
parameters, return value, and the automatic prefix stripping feature
The documentation explains the automatic prefix stripping behavior that handles
cases where users mistakenly include module prefix (e.g., "mysql-") in variable
names within configuration files.
The previous implementation stripped the prefix before calling
group.lookupValue(), which would fail because the config file
contains the prefixed name (e.g., "mysql-log_unhealthy_connections").
The lookup must use the original name from the config file.
This commit moves the prefix stripping logic to after the value
lookup but before constructing the SQL query, ensuring both:
1. The correct value is retrieved from the config using the
original prefixed name
2. The variable is stored in the database with a single prefix
Also includes a test to verify the fix works for mysql_variables,
pgsql_variables, and admin_variables sections.
When users mistakenly include the module prefix (e.g., mysql-log_unhealthy_connections)
in the mysql_variables section, the variable gets stored with a double prefix
(e.g., mysql-mysql-log_unhealthy_connections). This fix automatically strips
the prefix if present, ensuring variables are stored correctly.
The same logic applies to pgsql_variables (pgsql-) and admin_variables (admin-).
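The strip-then-store rule can be sketched as follows (helper name is illustrative): look up the value under the name exactly as written in the config file, then remove a redundant module prefix before prepending it once for storage.

```cpp
#include <cassert>
#include <string>

// Compute the global_variables key for a name found in a module's
// config section, stripping the prefix if the user already wrote it.
std::string storage_name(const std::string& prefix /* e.g. "mysql-" */,
                         const std::string& config_name) {
    std::string base = config_name;
    if (base.compare(0, prefix.size(), prefix) == 0) {
        base.erase(0, prefix.size());   // avoid "mysql-mysql-..."
    }
    return prefix + base;
}
```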
Fixes #5246
Allow permanent fast-forward sessions (SESSION_FORWARD_TYPE_PERMANENT)
to continue processing when bidirectional data flow is detected,
instead of treating it as a fatal error. This prevents unnecessary
session termination in these specific cases while maintaining the
original strict validation for all other session types.
This change introduces PostgreSQL-aware tokenization by adding support for dollar-quoted strings, PostgreSQL's double-quoted identifiers, and its comment rules. The tokenizer now correctly parses $$...$$ and $tag$...$tag$, treats " as an identifier delimiter in PostgreSQL, disables MySQL-only # comments, and accepts -- as a comment starter without requiring a trailing space. All new behavior is fully isolated behind the dialect flag to avoid impacting MySQL parsing.
Add PostgreSQL dollar-quoted strings
* New parser state: st_dollar_quote_string.
* Recognizes $$ ... $$ and $tag$ ... $tag$ sequences.
* Tracks opening tag and searches for matching terminator.
* Normalizes entire literal to ?.
* Integrated into get_next_st() and stage_1_parsing().
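A simplified model of the dollar-quote scan (the real tokenizer is a character-by-character state machine; this string-search version only illustrates the tag matching and normalization, and handles a single literal):

```cpp
#include <cassert>
#include <string>

// Capture the opening $tag$ (or bare $$), find the matching
// terminator, and normalize the whole literal to '?'.
std::string normalize_dollar_quote(const std::string& q) {
    size_t open = q.find('$');
    if (open == std::string::npos) return q;
    size_t tag_end = q.find('$', open + 1);
    if (tag_end == std::string::npos) return q;
    std::string tag = q.substr(open, tag_end - open + 1);   // "$...$"
    size_t close = q.find(tag, tag_end + 1);
    if (close == std::string::npos) return q;               // unterminated
    return q.substr(0, open) + "?" + q.substr(close + tag.size());
}
```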
The get_status_variable() function was only scanning worker threads
but ignoring auxiliary threads (idle threads) where timeout
terminations are detected. This caused the timeout termination
counter to show incorrect/zero values.
- Added idle thread scanning to both overloaded versions of
get_status_variable() function
- Now properly collects metrics from both worker and idle threads
- Fixes the issue where proxysql_mysql_timeout_terminated_connections_total
showed zero despite actual timeout terminations
Resolves the metrics reading issue identified in the previous commits.
Enhance logging clarity:
- Replace generic IP address with detailed connection info including IP and port
- Use client_myds->addr.addr and client_myds->addr.port for precise identification
- Improve debuggability of timeout clamping and enforcement warnings
The warning messages now provide complete connection details, making it easier
to identify and troubleshoot timeout-related issues in ProxySQL logs.
Code improvements:
- Extract SESS_TO_SCAN_idle_thread constant to header file for better maintainability
- Replace magic number 128 with named constant in idle_thread_to_kill_idle_sessions()
- Improve code readability and consistency in session scanning logic
Test enhancements:
- Add mysql-poll_timeout configuration for more precise timeout testing
- Reduce test sleep times to 13 seconds for faster test execution
- Add diagnostic messages to clearly show timeout configurations in test output
- Ensure tests properly validate timeout enforcement with precise timing
The changes improve code maintainability and make tests more reliable and faster
while maintaining accurate timeout validation.
Key improvements:
- Fix timeout comparison in MySQL_Thread::idle_thread_to_kill_idle_sessions() to prevent underflow
- Use effective wait_timeout (minimum of global and session values) for idle timeout calculations
- Add proper newline characters to proxy_warning messages for consistent log formatting
- Increase test sleep times to account for global timeout enforcement
- Fix session timeout test durations to properly test timeout behavior
Technical changes:
- Replace broken min_idle calculation with proper effective wait_timeout logic
- Add std::min() usage to determine effective timeout from global and session values
- Ensure warning messages end with newline characters for proper log formatting
- Update test sleep durations to ensure proper timeout testing
Resolves potential timeout calculation bugs and ensures consistent timeout enforcement behavior.
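The corrected comparison can be sketched as follows (names and units are illustrative): take the minimum of the global and session timeouts, and compare the elapsed idle time against it rather than subtracting a timeout from "now".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Effective timeout is the smaller of global and session values;
// comparing elapsed time avoids the old underflow-prone arithmetic.
bool session_idle_expired(uint64_t now_ms, uint64_t last_activity_ms,
                          uint64_t global_timeout_ms,
                          uint64_t session_timeout_ms) {
    uint64_t effective = std::min(global_timeout_ms, session_timeout_ms);
    return (now_ms - last_activity_ms) > effective;
}
```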
- Add range validation for client SET wait_timeout commands
- Implement clamping between 1 second (1000ms) and 20 days (1,728,000,000ms)
- Add warning messages when values are clamped due to ProxySQL limits
- Maintain MySQL compatibility by accepting larger values than global config
- Fix signed/unsigned comparison warning in wait_timeout assignment
- Ensures client applications don't break while enforcing safety limits
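The clamping rule, in isolation (limits taken from the commit message; the function name is illustrative):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Clamp a client-requested wait_timeout to ProxySQL's safety limits.
uint64_t clamp_wait_timeout_ms(uint64_t requested_ms) {
    const uint64_t MIN_MS = 1000ULL;          // 1 second
    const uint64_t MAX_MS = 1728000000ULL;    // 20 days
    return std::min(std::max(requested_ms, MIN_MS), MAX_MS);
}
```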
- Add wait_timeout member variable declaration to Base_Session class
- Fix constructor initialization to use this->wait_timeout
- Fix assignment in handler to properly scope member variable
- Resolves compilation error for wait_timeout functionality
Enhance the match_ff_req_options function to better handle CLIENT_DEPRECATE_EOF
flag validation in fast forward replication scenarios. The function now performs
a more robust check by examining the actual MySQL command type when the initial
CLIENT_DEPRECATE_EOF flags don't match between frontend and backend connections.
Key improvements:
- Special handling for binlog-related commands (_MYSQL_COM_BINLOG_DUMP,
_MYSQL_COM_BINLOG_DUMP_GTID, _MYSQL_COM_REGISTER_SLAVE) that should be
allowed even when CLIENT_DEPRECATE_EOF flags don't match
- Proper packet parsing to extract and validate MySQL command types
- Enhanced compatibility for fast forward replication connections with
mixed deprecate EOF configurations
This change ensures that ProxySQL can handle more complex replication
scenarios while maintaining proper protocol validation.
PROBLEM:
The initial fix used a DDL detection approach which required maintaining a list
of query types that should return 0 affected rows. This approach was brittle
and could miss edge cases like commented queries or complex statements.
SOLUTION:
Instead of detecting DDL queries, use sqlite3_total_changes64() to measure the
actual change count before and after each query execution. The difference between
total_changes before and after represents the true affected rows count for the
current query, regardless of query type.
CHANGES:
- Added proxy_sqlite3_total_changes64 function pointer and initialization
- Rewrote execute_statement() and execute_statement_raw() to use total_changes
difference approach
- This automatically handles all query types (DDL, DML, comments, etc.)
- Added comprehensive TAP test covering INSERT, CREATE, DROP, VACUUM, UPDATE, and
BEGIN operations
BENEFITS:
- More robust and accurate than DDL detection approach
- Handles edge cases like commented queries automatically
- No maintenance overhead for new query types
- Simpler and cleaner implementation
- Still fixes both Admin interface and SQLite3 Server
This approach is mathematically sound: affected_rows = total_changes_after -
total_changes_before, which gives the exact number of rows changed by the current
query execution.
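The counting approach can be modeled without SQLite itself (MockDb stands in for the database handle and sqlite3_total_changes64()): the per-statement affected-rows count is the delta of the cumulative counter, which DDL statements simply leave unchanged.

```cpp
#include <cassert>
#include <cstdint>

struct MockDb { int64_t total_changes = 0; };

// affected_rows = total_changes_after - total_changes_before.
int64_t run_and_count(MockDb& db, bool is_dml, int64_t rows_touched) {
    int64_t before = db.total_changes;            // sqlite3_total_changes64()
    if (is_dml) db.total_changes += rows_touched; // statement execution
    return db.total_changes - before;
}
```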
Fixes #4855
Problem:
When executing DDL queries (CREATE TABLE, DROP TABLE, VACUUM, etc.) in the
ProxySQL Admin interface after DML operations, the affected rows count from
the previous DML operation was incorrectly reported instead of 0. This is
because SQLite's sqlite3_changes() function doesn't reset the counter for
DDL statements.
Root Cause:
SQLite's sqlite3_changes() returns the number of rows affected by the most
recent INSERT, UPDATE, or DELETE statement. For DDL statements that don't
modify rows, SQLite doesn't reset this counter, so it continues to return
the value from the last DML operation.
Solution:
- Added is_ddl_query_without_row_changes() function to identify DDL queries
that don't affect row counts
- Modified both execute_statement() and execute_statement_raw() in SQLite3DB
to return 0 for affected_rows when executing DDL queries
- The fix ensures that affected_rows is reset to 0 for:
CREATE, DROP, ALTER, TRUNCATE, VACUUM, REINDEX, ANALYZE, CHECKPOINT,
PRAGMA, BEGIN, COMMIT, ROLLBACK, SAVEPOINT, RELEASE, EXPLAIN
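A minimal sketch of the keyword check (the function body here is illustrative, not the actual ProxySQL implementation): extract the first word, skipping leading whitespace, and compare it case-insensitively against the statement types listed above.

```cpp
#include <string>
#include <cctype>
#include <set>

// Illustrative sketch of is_ddl_query_without_row_changes(): pull out the
// first keyword and check it against statements that never change row counts.
static bool is_ddl_query_without_row_changes(const std::string& query) {
    static const std::set<std::string> ddl_keywords = {
        "CREATE", "DROP", "ALTER", "TRUNCATE", "VACUUM", "REINDEX",
        "ANALYZE", "CHECKPOINT", "PRAGMA", "BEGIN", "COMMIT",
        "ROLLBACK", "SAVEPOINT", "RELEASE", "EXPLAIN"
    };
    size_t i = 0;
    while (i < query.size() && std::isspace((unsigned char)query[i])) i++;
    std::string word;
    while (i < query.size() && std::isalpha((unsigned char)query[i]))
        word += (char)std::toupper((unsigned char)query[i++]);
    return ddl_keywords.count(word) != 0;
}
```

As the follow-up commit notes, a keyword list like this misses edge cases such as leading comments, which is what motivated the later total_changes-based rewrite.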
Testing:
- Created and ran comprehensive tests for DDL detection function
- Verified build completes successfully
- Confirmed the fix correctly identifies DDL vs DML queries
Impact:
This fix resolves the issue where Admin interface incorrectly shows affected
rows for DDL operations, improving the accuracy and reliability of the
ProxySQL Admin interface.
Fixes #4855
- Document addGtidInterval() function with parameter details and reconnection behavior
- Add documentation for readall() method explaining robust error handling
- Document connect_cb() and reader_cb() callbacks with resource management details
- Document generate_mysql_gtid_executed_tables() with multi-phase process explanation
- Focus on functionality, thread safety, and performance improvements
- Provide clear parameter descriptions and return value semantics
- Fix crash by using get_variable_int() instead of get_variable_string() for boolean use_tcp_keepalive variable
- use_tcp_keepalive is a boolean variable, not a string, so get_variable_int() returns 0/1 instead of a string
- Fix syntax errors by removing duplicate code and fixing brace structure
- Add comprehensive Doxygen documentation for both MySQL and PostgreSQL warnings
Resolves assertion failure: "Not existing variable: use_tcp_keepalive"
Resolves: #5212
- Add warnings in flush_mysql_variables___database_to_runtime() when mysql-use_tcp_keepalive=false
- Add warnings in flush_pgsql_variables___database_to_runtime() when pgsql-use_tcp_keepalive=false
- Include comprehensive Doxygen documentation explaining why disabling TCP keepalive is unsafe
- Warn users about potential connection drops when ProxySQL is deployed behind network load balancers
When TCP keepalive is disabled:
- Load balancers may drop idle connections from connection pools
- NAT devices may remove connection state
- Cloud load balancers may terminate connections during idle periods
- Results in sudden connection failures and "connection reset" errors
Resolves: #5212
- This patch was originally added by commit 0a70fd5 and
reverted by 8d1b5b5, prior to the release of `v3.0.3`.
- The following issues are addressed in this update,
- Fix for `use-after-free` issue which occurred during CI test.
- Fix for deadlock issue between `GTID_syncer` and `MySQL_Worker`.
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
Concurrency and Memory Management
* Lock-Free Ref Counting: Replaced global mutex-protected integer reference counts with `std::atomic<uint32_t>` within `PgSQL_STMT_Global_info`, eliminating lock contention during statement referencing.
* Modern Ownership: Adopted std::shared_ptr<const PgSQL_STMT_Global_info> for global and local storage, providing automatic, thread-safe memory and lifecycle management.
* Memory Optimization: Removed redundant auxiliary maps `global_id_to_stmt_names` and `map_stmt_id_to_info` from local and global statement managers respectively, reducing overall memory overhead.
* Optimized Purging: Statement removal logic was simplified for efficiently identifying and cleaning up unused statements.
Hot Path Performance (`BIND`, `DESCRIBE`, `EXECUTE`)
* Bypassed Global Lookups: Local session maps now store the `shared_ptr` directly, removing the need to acquire the global lock and search the global map during hot path operations.
* Direct Refcount Manipulation: Refcount modification functions now operate directly on the passed statement object, eliminating the overhead of searching the global map to find the object pointer based on statement id.
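The hot-path pattern above can be sketched as follows (type and function names are assumed for illustration, simplified from the actual PgSQL_STMT_Global_info): the counter lives inside the statement object, so a session holding the shared_ptr adjusts it directly, with no global mutex or map lookup.

```cpp
#include <atomic>
#include <memory>
#include <cstdint>

// Sketch of lock-free reference counting on a statement object.
struct STMT_Global_info_sketch {
    uint64_t hash = 0;                      // query digest
    std::atomic<uint32_t> ref_count{0};     // lock-free reference count
};

// Operate directly on the passed object: no global map search by stmt id.
inline void stmt_ref(const std::shared_ptr<STMT_Global_info_sketch>& s) {
    s->ref_count.fetch_add(1, std::memory_order_relaxed);
}
inline void stmt_unref(const std::shared_ptr<STMT_Global_info_sketch>& s) {
    s->ref_count.fetch_sub(1, std::memory_order_acq_rel);
}
```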
Safety and Protocol Logic (`PARSE`)
* Efficient Statement Reuse: Implemented a **local fast path** check for the unnamed statement (`""`), allowing immediate reuse of an identical query (same hash) upon re-parse, which bypasses global processing and locks.
Cleanup
* Cleaned up and renamed class `PgSQL_STMT_Manager_v14` -> `PgSQL_STMT_Manager`.
- Rename and modify test to use MySQL C API mysql_binlog_* functions
- Implement throttled binlog reading with 5 iterations (no limit, 2s, 5s, 20s, 60s targets)
- Add diagnostics for debugging binlog fetch issues
- Set RPL options for file, position, server_id, and non-blocking flag
- Update Makefile to compile with MySQL client library
Problem: In fast forward mode, ProxySQL forwards packets directly from client
to backend without buffering them. If the backend connection closes
unexpectedly (e.g., due to server crash, network failure, or other issues),
ProxySQL immediately closes the client session. This can result in data loss
because the client may have sent additional data that hasn't been fully
transmitted yet, as ProxySQL does not wait for the output buffers to drain.
Solution: Implement a configurable grace period for session closure in fast
forward mode. When the backend closes unexpectedly, instead of closing the
session immediately, ProxySQL waits for a configurable timeout
(fast_forward_grace_close_ms, default 5000ms) to allow any pending client
output data to be sent. During this grace period:
- If the client output buffers become empty, the session closes gracefully.
- If the timeout expires, the session closes anyway to prevent indefinite
hanging.
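The decision during the grace period can be sketched as a small pure function (names are hypothetical; the real logic lives in the session handler's FAST_FORWARD case):

```cpp
#include <cstdint>
#include <cstddef>

// Sketch of the grace-close decision after backend EOF in fast forward mode:
// keep the session alive until the client's output buffer drains, or until
// the configured timeout (fast_forward_grace_close_ms, default 5000) expires.
enum class SessionAction { KeepWaiting, CloseGracefully, CloseOnTimeout };

SessionAction grace_close_decision(uint64_t now_ms,
                                   uint64_t grace_start_ms,
                                   uint64_t grace_timeout_ms,
                                   size_t client_outbuf_bytes) {
    if (client_outbuf_bytes == 0)
        return SessionAction::CloseGracefully;          // all data flushed
    if (now_ms - grace_start_ms >= grace_timeout_ms)
        return SessionAction::CloseOnTimeout;           // bounded lifetime
    return SessionAction::KeepWaiting;                  // let buffers drain
}
```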
Changes:
- Added global variable mysql_thread___fast_forward_grace_close_ms (0-3600000ms)
- Added session flags: backend_closed_in_fast_forward, fast_forward_grace_start_time
- Added data stream flag: defer_close_due_to_fast_forward
- Modified MySQL_Data_Stream::read_from_net() to detect backend EOF and initiate
grace close if client buffers are not empty
- Modified MySQL_Session::handler() FAST_FORWARD case to implement grace close
logic with timeout and buffer checks
- Added extensive inline documentation explaining the feature and its mechanics
This prevents data loss in fast forward scenarios while maintaining bounded
session lifetime.
Previously, the parser always tokenized the full command, even when we only
needed to check whether it was a transaction command. Now, it first extracts
the first word to determine relevance and performs full tokenization only
when necessary.
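The fast path can be sketched as below (the keyword set and helper names are illustrative): only the first word is extracted, and full tokenization runs only when that word could start a transaction command.

```cpp
#include <string>
#include <cctype>

// Extract the first word of a command, uppercased, skipping leading spaces.
static std::string first_word_upper(const std::string& q) {
    size_t i = 0;
    while (i < q.size() && std::isspace((unsigned char)q[i])) i++;
    std::string w;
    while (i < q.size() && std::isalpha((unsigned char)q[i]))
        w += (char)std::toupper((unsigned char)q[i++]);
    return w;
}

// Cheap relevance check: full tokenization is only worth doing when the
// first word could begin a transaction statement (illustrative keyword set).
static bool may_be_transaction_command(const std::string& q) {
    const std::string w = first_word_upper(q);
    return w == "BEGIN" || w == "START" || w == "COMMIT" || w == "ROLLBACK";
}
```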
According to MySQL protocol, variable length strings are encoded using
length encoded integers. For reference, see:
- https://dev.mysql.com/doc/dev/mysql-server/9.4.0/page_protocol_com_stmt_execute.html
- https://dev.mysql.com/doc/dev/mysql-server/9.4.0/page_protocol_basic_dt_integers.html#a_protocol_type_int2
The protocol specifies that values greater than 2^24 (16777216) should
be encoded as '0xFE + 8-byte integer'. In practice, however, MySQL
ignores the upper section of these 8-byte integers, effectively treating
them as 4 bytes. For the sake of compatibility this commit changes the
decoding behavior for 'COM_STMT_EXECUTE' to match MySQL's. The
difference is subtle but important: since MySQL itself never uses the
full 8 bytes of the field, connectors that are compatible with MySQL
could hit issues when sending these packets through ProxySQL (like the
NodeJS 'mysql2' connector, which writes the 8 bytes as a duplicated
4-byte value, motivating these changes), a situation that could result
in rejection due to malformed packet detection (or crashes/invalid
handling in the worst case).
The previous decoding function is now renamed into
'mysql_decode_length_ll' to honor MySQL naming 'net_field_length_ll'.
For now, this protocol change is limited to 'COM_STMT_EXECUTE'.
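A sketch of the MySQL-compatible decoding (the function name is illustrative, not the renamed ProxySQL function): for the 0xFE prefix, 8 length bytes are consumed from the wire, but only the low 4 bytes contribute to the value, matching MySQL's observed behavior.

```cpp
#include <cstdint>
#include <cstddef>

// Decode a MySQL length-encoded integer, MySQL-compatible variant:
// 0xFE consumes 8 value bytes but honors only the low 4.
static uint64_t decode_length_mysql_compat(const uint8_t* p, size_t* consumed) {
    if (p[0] < 0xFB) { *consumed = 1; return p[0]; }           // 1-byte int
    if (p[0] == 0xFC) {                                        // 2-byte int
        *consumed = 3;
        return (uint64_t)p[1] | ((uint64_t)p[2] << 8);
    }
    if (p[0] == 0xFD) {                                        // 3-byte int
        *consumed = 4;
        return (uint64_t)p[1] | ((uint64_t)p[2] << 8) | ((uint64_t)p[3] << 16);
    }
    // 0xFE: 8 bytes on the wire, but the upper 4 are ignored like MySQL does
    *consumed = 9;
    return (uint64_t)p[1] | ((uint64_t)p[2] << 8) |
           ((uint64_t)p[3] << 16) | ((uint64_t)p[4] << 24);
}
```

With this decoding, a packet whose upper 4 bytes duplicate the lower 4 (as 'mysql2' emits) still yields the intended length instead of being flagged as malformed.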
caching_sha2_password full authentication is a complex task that
requires a lot of packets being sent back and forth between client and
server (ProxySQL in this case). Every packet needs to have an
increased sequence ID (sid) according to the protocol.
ProxySQL was incorrectly forgetting to increase the sid when
requesting a full authentication.
For some clients this is not a problem, while other clients will
consider the incorrect sid a serious issue and abort the connection.
This commit ensures that sid is correctly increased when requesting
caching_sha2_password full authentication.
When true, all `min_gtid` query annotations are ignored; see
https://proxysql.com/documentation/query-annotations/ for details.
This is useful on ProxySQL setups with multiple layers, where some
layers mandate GTID-based routing while others don't.
- Backport PQsendPipelineSync to PostgreSQL 16.3, enabling pipeline
synchronization without flushing the send buffer.
- Replace calls to PQPipelineSync in code with PQsendPipelineSync
to use the new functionality.
Accesses by 'stats___pgsql_processlist' to 'myconn->query.ptr' could
lead to invalid memory accesses, as the pointed query could already have
been freed by the session after being issued.
Accesses by 'stats___mysql_processlist' to 'myconn->query.ptr' could
lead to invalid memory accesses, as the pointed query could already have
been freed by the session after being issued.
- Add new mysql/pgsql variable `processlist_max_query_length`.
- Min: 1K
- Max: 32M
- Default: 2M
- Truncate current query based on the configuration before inserting into
`stats_*_processlist` tables.
- Refactor/fix code related to other processlist configurations.
1. `session_idle_show_processlist` value was not updated in `ProxySQL_Admin.variables`.
2. Pass processlist config as an argument to `MySQL_Threads_Handler::SQL3_Processlist`
instead of using thread-local variables.
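The truncation rule can be sketched as follows (constant and function names are illustrative): the configured limit is clamped to the documented [1K, 32M] range before the query text is cut for storage in the stats tables.

```cpp
#include <string>
#include <algorithm>
#include <cstddef>

// Documented bounds for processlist_max_query_length (default 2M).
static const size_t PROCESSLIST_QUERY_LEN_MIN = 1024;              // 1K
static const size_t PROCESSLIST_QUERY_LEN_MAX = 32 * 1024 * 1024;  // 32M

// Clamp the configured limit, then truncate the query before it is
// inserted into a stats_*_processlist table.
static std::string truncate_for_processlist(const std::string& query,
                                            size_t limit) {
    limit = std::max(PROCESSLIST_QUERY_LEN_MIN,
                     std::min(limit, PROCESSLIST_QUERY_LEN_MAX));
    return query.size() > limit ? query.substr(0, limit) : query;
}
```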
Signed-off-by: Wazir Ahmed <wazir@proxysql.com>
This message is dumped with each call to 'process_pkt_handshake_response',
printing the updated context. When the verbosity value for module
'debug_mysql_protocol' is >= 5, the stored and client-supplied passwords
are dumped in HEX format; for values < 5, the passwords are
masked.
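The verbosity gate can be sketched like this (the helper name is hypothetical; only the >= 5 threshold comes from the commit):

```cpp
#include <string>
#include <cstdio>

// At debug_mysql_protocol verbosity >= 5, render the password bytes in HEX;
// below that, emit a fixed mask so secrets never reach the log.
static std::string format_password_for_debug(const std::string& pass,
                                             int verbosity) {
    if (verbosity < 5) return "********";
    std::string hex;
    char buf[3];
    for (unsigned char c : pass) {
        std::snprintf(buf, sizeof(buf), "%02X", c);
        hex += buf;
    }
    return hex;
}
```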
Previously, query cache metrics were shared between MySQL and PostgreSQL,
causing both to reflect the same values when performing cache operations.
This change isolates the metrics for each database type.
- Added `backend_pid` and `backend_state` columns to `stats_pgsql_processlist`
to display PostgreSQL backend process ID and connection state.
- Created `stats_pgsql_stat_activity` view on top of `stats_pgsql_processlist`
with column aliases matching PostgreSQL's `pg_stat_activity` for consistency.
These parameters use capitalized names in PostgreSQL for historical reasons.
ProxySQL now sends them using canonical capitalization to ensure client compatibility.
Updated PgSQL_DateStyle_Util::parse_datestyle() to support prefix-based
matching for known tokens (POSTGRES, EURO, NONEURO). This allows variants
like "PostgreSQL" and "European" to be recognized as valid inputs.
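The matching rule can be sketched as a case-insensitive "input starts with token" check (the helper name is illustrative, simplified from parse_datestyle()):

```cpp
#include <string>
#include <cctype>

// An input matches a known DateStyle token (POSTGRES, EURO, NONEURO) when
// the input begins with that token, compared case-insensitively.
static bool matches_token_prefix(const std::string& input,
                                 const std::string& token) {
    if (input.size() < token.size()) return false;
    for (size_t i = 0; i < token.size(); i++) {
        if (std::toupper((unsigned char)input[i]) !=
            std::toupper((unsigned char)token[i])) return false;
    }
    return true;
}
```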
Centralize escaping/formatting of connection parameters (key='value').
Replace duplicate escape/append/free sequences in connect_start and PgSQL_backend_kill_thread.
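A minimal sketch of the centralized formatting (the function name is hypothetical): the value is wrapped in single quotes, with backslashes and embedded quotes backslash-escaped, following libpq connection-string conventions.

```cpp
#include <string>

// Build a key='value' connection parameter, escaping \ and ' in the value.
static std::string format_conn_param(const std::string& key,
                                     const std::string& value) {
    std::string out = key + "='";
    for (char c : value) {
        if (c == '\\' || c == '\'') out += '\\';
        out += c;
    }
    out += '\'';
    return out;
}
```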
Add support for PostgreSQL query cancellation and backend termination
features to allow clients to cancel long-running queries and terminate
connections through the standard PostgreSQL protocol.
Features implemented:
- Intercept pg_backend_pid() queries and return ProxySQL session thread ID
- Intercept pg_terminate_backend() to terminate client connections asynchronously
- Intercept pg_cancel_backend() to cancel queries on backend connections
- Support Cancel Request protocol via separate connection with PID and secret key validation
- Return BackendKeyData message on successful authentication with session thread ID and unique cancel secret key
This enables clients to use standard PostgreSQL cancellation mechanisms
(pg_cancel_backend, pg_terminate_backend, and Cancel Request protocol)
while ProxySQL maintains proper session isolation and maps client requests
to appropriate backend connections.