mirror of https://github.com/sysown/proxysql
fix-fc-parsing
v3.0_pgsql_ssl_backend_5582
infra-mysql57-binlog
v3.0
v3.0-ci0405
feature/gtid-range-update
fix/postgresql-cluster-sync_2
feature/pgbouncer-compat
v3.0_pgsql_sslkeylog_5281
v3.0_ci_min_proxysql_version_5561
v3.0-ci_84_mysql84
copilot/add-flag-choose-ipv4-or-ipv6
v3.0-issue5556
fix/5554-resolution-family-limitation
v3.0-CodeCov0325
GH-Actions
v3.0-ci260323
fix/3p-ci-error-handling
feat/ffto-error-recording
v3.0-5493
v3.0-ci260322
v3.0-ci260322_cluster
v3.0-5516
v3.0-5517
copilot/feature-load-restapi-routes-config
copilot/add-ssl-tls-certificate-stats-table
unit-tests-skip-proxysql
private/multi-group-runner
v3.0-5473
v3.0-5499
v3.0-5498
v3.0-5497
v3.0-5495
v3.0-5494
v3.0-5492
v3.0-5496
v3.0-5491
v3.0-5478
copilot/extract-server-selection-algorithm
copilot/extract-health-state-logic
v3.0-5490
v3.0-5489
copilot/extract-query-rule-matching-logic
copilot/extract-connection-pool-logic
v3.0-5474
v3.0-5477
v3.0-5476
v3.0-5475
v3.0-issue5483
add-claude-github-actions-1774116391376
freebsd-build-fixes
v3.0-mac260320
private/ai260320
public/pr260319
copilot/support-zstd-compression
copilot/fix-admin-credentials-logging
copilot/fix-proxysql-multiple-modules-port
v3.0-ci_84
v3.0-set_parser_v3
v3.0_fix_dup_entries_groups
feature/arm-builds
v3.0_pgsql_meta_command_describe_table
v3.0-build-improvements
v3.0-test_binlog_reader_2602
v3.0-moveInfra
v3.0_unify-PR-5433-5442
v3.0_pgsql-pipeline-set-reset-discard-fix-5432
fix/pgsql-metric-fix-1
v3.0-del260308
v3.0-ci_fix0301
v3.0_31_base
v3.0-pgsql_monitor_repl_lag
v3.0-future_build_fixes
v3.0_fix-pgsql-transaction_state_management_pipeline
v3.0_pgsql-copy-error-recovery-issue-5415
v3.0_strengthen-pgsql-protocol-validation
v3.0-genai_keys
v3.0_pgsql-resync-error-detection
release-notes-3.0.6-4.0.6-draft
v3.0.6-add-tap-test_stats_table_check
session-track-system-variable
v3.0-ff_inspect
v4.0-mcp-stats2
v3.0-5384
v3.0-ai260221
v3.0-misc0221
v2.7.3-test260221
v3.0-2233
v3.0_3596_3597
v3.0_fix-pgsql-extended-query-routing_5387
v3.0-5243
v3.0-tsdb-feature
v4.0-mcp-stats
v3.0_pgsql_advanced_logging
copilot/uninstall-amazon-linux-2023
v3.0-MCP_multi
fix-prometheus-labels-test
v3.0-test0213
v3.0_pgsql-prepared-statement-refcount-race-5352
v3.0_improve_mysql_monitoring_5256
tap-mcp-client
v3.0-openssl-fix
v3.0-fixes0212
v3.0-5069
agent-skill-tap-test
v3.0_pgsql-meta-cmd-admin-5365
v4.0-tsdb1
v3.0-fix_5256
v3.0-5359
v3.0-misc260209
gh-pages
feature/modern-docs
v4.0
v4.0_rag_ingest_sqlite_server
v4.0-fix-vec-search
v4.0-rag_tools_stats
v4.0_rag_ingest_2
v4.0_rag_sys_prompt
v3.0.6-fix_reg_test_5233_set_warning-t
v4.0-mcp_rules_test
v4.0_rag_mcp
v4.0-tsdb
pr-5312-fixes
feature/v4-docs-init
v4.0-update-docker-build-image-versions
otel_system_libs
otel_clean
v3.0-5288
otel
otel_2
v3.0.6-bump_version
v3.0_fix-pgsql-threshold-deadlock_5300
v3.0_pg-cancel-terminate-backend-param-support_5298
fix/postgresql-cluster-sync
v3.0-releate_notes_scripts_fixes
test_gh-actions_triggers
v3.0-timezone-parser-fix
v3.0-keylog_doc
v3.0_5272
v3.0_fork
v3.0_readme_update_postgres
v3.0_restapi_improvement
postgresql-digest-testing-improvement
v3.0_select_auto_commit
v3.0-5218
fix-5221
fix/5186-proxysql-stop-admin-crash
v3.0-4951
add-claude-github-actions-1763877527835
fix-rpm
v3.0-DS_crash
add-claude-github-actions-1763663272333
add-claude-github-actions-1763663091346
add-claude-github-actions-1763663091411
add-claude-github-actions-1763476725261
add-claude-github-actions-1763476725489
v3.0_optimizations_and_stability
v2.7.3.1
v3.0.3-upgrade_json
v3.0.sonar-cli
v3.0.sonar-config
otel-tracepoint
v3.0.2-merge-upgrade_deps-add_new_distros
v3.0.2-upgrade_deps
v3.0.2-add_new_distros
v3.0-add_more_testing_groups
v3.0-upgrade_prometheus-cpp
v3.0-upgrade_json
v3.0-upgrade_sqlite3
v3.0-upgrade_libmicrohttpd
v3.0-upgrade_curl
v3.0-add_centos10_builds
v3.0-add_fedora42_builds
v3.0_PG_PrepStmt
v3.0-sliced_groups
v3.0_auth_negotiation
v2.7
v2.7-fix_run_name
v3.0_4799_4827
v3.0-3687
v2.7-pmm_runtime_servers_metrics
v2.7-4839
v2.7-4841
v2.7-bump_version_to_2.7.3
2.6.6-4841
v2.x_pg_PrepStmtBase_240714
v3.0-4803-4817
v3.0-4803
v2.7-minorBugs
v3.0-privates
v2.x-logging_mem_2
v2.7_fix
v2.7_amd64_build_fix
v2.7-fix_aux_threads_ssl_leaks
v2.7-fix_ssl_params_leak
v2.7-rm_malloc_conf_on_version
v2.7_compression
v2.7-actions-add-3p-tests-parameter
none
v2.7-fix_hang_on_resume
v2.x-logging_mem
v2.7_servers_defaults
v2.7-mariadb_column_metadata_integrity_check
ssl_optimization
v2.7_reg_test_4716_single_semicolon
v2.7_issue_4707_threshold_resultset_size
v2.7_reg_test_4723_query_cache_stores_empty_result
2.7_randomized_cache_ttl
v3.0_fix_multiple_builds
v3.0_servers_defaults
v2.7-update_actions_triggers_v2
v2.7-update_actions_triggers
v2.6
v2.6.x-update_triggers
v2.6-4646
v2.7.1-update_actions
v2.x
v2.6.x-testing-global-multiplexing-disabled
use-wrlock-in-dns-cache-empty
v2.6.x-fix-darwin
v2.x-admin_list_ciphers
v2.x-sqlite3_pass_exts
v2.x-tap_tests_opt_ssl
v2.6.0-update_to_libhttpserver_v0.19
v2.x_router_2ports
v2.6.0-update_to_openssl_v3.1.5
v2.x-2411025
v2.x-profiling_poc1
v2.x_sha2pass_draft2
v2.x-webui_fixes
v2.6.0-more-makefile-fixes
v2.x-20230914_test
v2.x-20230913_test
v2.5.5-branch
v2.5.5-branch_255_patches
v2.x-aurora_autodiscovery-refactor_cluster_mysql_servers-gr_bootstrap_mode_2
v2.x_mysql_connector_j_fixes
v2.6-deprecate_old_clickhouse
v2.x_refactor_cluster_mysql_servers
v2.x-aurora_autodiscovery
v2.x-zd70545
v2.x-aurora_autodiscovery_shunned_promotion
v2.x-tap20230609
v2.x-test20230530
v2.x_sha2pass_draft2-TEST
v2.x-session_track_system_variables_v2
v2.x-status-variables-for-set-stmts
v2.x-enable_session_state_trackers
v2.x-increase-logging-eof_fast_forward-t
v2.x-3863-special-query
v2.x-session_track_system_variables
v2.x_refactor_read_only_action
v2.x_sha2pass_draft1
v2.2.0-sqliteserver_read_only
v2.x-digest_umap_aux-comparison
v2.4.8
v2.x-4105_4114
v2.x-3583-server_closed_conn
v2.x-group_replication_rework-SHUNNED_promotion
v2.1.0-var-global-multiplex
v2.x-CI-hostname-tap-test-fixes
v2.x-limit-version-check
v2.x-fix_deprecate_eof_warning
v2.x-3698
v2.x_tidb_replica_read
v2.x-HostGroups_attributes
v2.0.18.221009
v2.x-ci_reg_test_3273_ssl_con
TAP_test_restapi
v2.x-tap_tests_groups
v2.x-tap_test_sqlite3_server-t
PRS_3888_3903_2
PRS_3888_3903
v2.x_code_refactor_2206
v2.x-multipacket_poc_1
v2.x-impr_hg_latency_obsv
v2.x-gcc-warnings
v2.x-hg_lock_session_id
v2.x-3768
v2.x-3371
v2.x-ci_verifications
v2.x-thread_local_qps_limit
v2.x-parser_table
v2.1.1-3207
v2.x-qps_limits
v2.x-3711
v2.x-3642
v2.x-3674
v2.x-ssl3_warnings
V2
v2.3.2
v2.3.2_3646_3647
v2.x-client_err_limit_conn_timeout
v2.x-keep_multiplexing_regression_fix
v2.3.2-3628
v2.2.2-to-v2.3.0-7
v2.2.2-to-v2.3.0-7_merge
v2.2.2-to-v2.3.0-6
v2.2.2-to-v2.3.0-6_merge
v2.2.2-to-v2.3.0-5
v2.2.2-to-v2.3.0-4
v2.2.2-to-v2.3.0-3
v2.2.2-to-v2.3.0-2
v2.2.2-to-v2.3.0-1
v2.3.1
v2.0.14-70226
v2.3.0
v2.x-client_err_limit-gr_replication_lag_action
v2.2.2
v2.2.1-3603
v2.2.1-centos7-ASAN
v2.2.1
v2.2.1-3601
v2.2.1-3599
v2.2.1-3597
v2.2.1-3595
v2.2.0-restapi_server_exc_log
v2.x-3574
v2.x-3558
v2.2.0-3546-centos-7-gcc-8
v2.x-3549
v2.x-cluster_large_mysql_users
v2.x-cov_ci_verification
v2.0.14-tb1
v2.0.14-tb1-3494
v2.0.14-tb1-3488
v2.0.14-tb1-3117
v2.0.14-tb1-2762
v2.0.14-2762
v1.4.13-arm
v2.1.1-3296
v2.2.0
v2.0.18
v2.1.1
v2.0.18-3342
v2.0.18-3182
v2.1.1-3184
v2.1.0-revert-da7fdfe14
v2.0.18-revert-da7fdfe14
v1.4.13-70160
v2.0.18-3354
v2.0.18-3350
v2.0.14-3339
1.4.13-70160
v2.0.18-3339
v2.1.1-3317
v2.1.1-3319
v2.0.18-3317
v2.1.2-LBalgo
v2.0.18-1574
v2.1.2-hgman
v2.0.17
v2.1.0
v2.0.17-3288
v2.0.17-3276
v2.0.17-3273
v2.0.16
v2.0.16-3267
v2.0.16-3265
v2.0.16-3262
v2.0.16-3261
v2.1.1-3252
v2.1.1-collation
v2.0.16-3252
v2.0.16-collation
v2.1.0-parser
v2.0.16-3219
v2.0.16-3216
v2.0.16-3201
v2.0.16-2330
revert-3191-v2.0.16-3190
v2.0.16-3204
v2.0.16-3177
v2.0.16-2619
v2.0.16-3190
v2.0.16-3187
v2.1.0-70118
v2.0.16-3133
v2.0.16-3133_ci_verification
v2.0.16-3150
v2.0.16-change_user
v2.0.15
v2.0.15_amd64_fix
v2.0.15_arm64_packages
v1.4.14-ssl
v2.0.15_arm64
v2.1.0-2820
v2.0.15-sslbug
v2.0.15-KillTrx
v2.0.14
v2.0.14-ch_build_fix
v2.0.14-focal
v2.0.14-valgrind20200904
v2.1.0-3042
v2.0.14-3035
v2.0.14-3036
v2.0.14-2955
v2.0.14-vars
v2.0.14-3005
v2.0.14-3003
v2.0.14_2970_2979
v2.0.14-NOTSOCK
v2.1.0'
v2.0.14-2958
v1.4.10-zd
v2.0.13
v2.0.13-autocommit_fix
v2.1.0-2892
v2.0.13-2711
v2.0.13-duplicated_variables
v2.0.13-duplicated_variables_for_2.1.0
v2.0.12-deprecate_eof
v2.1.0-1377
v2.1.0-admin_queries
v2.0.12-var-global-multiplex
v2.1.0-var-foreign-key
v2.0.12
v2.0.12-tab-small-log
v2.0.12-var-foreign-key
v2.0.12-var-long-query-time
v2.0.12-galera-shunned
v2.1.0-admin_queries_2
v2.1.0-tap-rm-config
v2.0.12-tap-rm-config-test
v2.1.0-QP_stmt_3
v2.0.11-fix-multi-2-ci
v2.0.11-fix-multi
v2.0.11-266_0-3
2.1.0
v2.0.11
v2.1.0-track-vars
v2.1.0-track-variables
v2.0.11-track-variables
v2.0.11-2526
v2.0.11-tap-tests
v2.0.13-2698-commit1
v2.0.10-galera-pxc-maint-mode
v2.0.11-track-vars
v2.0.10-2647
v2.0.11-track
v2.0.11-track-session-vars
v2.0.9-var-array-review
v2.0.11-stats
v2.0.10
v2.0.10-centos67
v1.4.14.2
1.4.14.2
v1.4.14-show-warnings
v2.0.9
v2.0.9-var-array_2
v2.0.9-var-array
v1.4.16
v2.0.8
val214-changing_charset
v2.0.6
v1.4.16-1922_2
v1.4.13.2
v2.0.4-charset248
v2.0.5
v1.4.10-67841
v2.0.4
v2.0.4-sqlite327
v2.0.3
v2.0.2
v1.4.15
v2.0.1
v1.4.14
v2.0.0
v1.4.14-ping_shun
v1.4.14-1828
v1.4.14-latency_awareness
v1.4.12
v1.4.13-admin_deadlock
v2.0.0-improve_speed
v1.4.13
v1.4.13-autocommit_revert
v1.4.11.2
v1.4.13-ps
v2.0.0_bionic_deb_fix
v2_962
v1.4.12-1640
v1.4.11-names_tz
v1.4.12-1693
master
v1.4.11
v1.4.10
v1.4.6
v1.3.10
jenkins_test
v2.0.0-cachegrind
v1.4.9
v2.0-lab
v149_1511
v149_1382
v1.4.7-f2
v1.4.7-f1
v149_1491
v1.4.5-kub
v1.4.8
v2.0-web2
v1.4.3
v1.4.7
bsd_install_update
v1.4.2
v1.4.1-ch2
v1.4.1
v1.3.9
v1.4.1-ch
v1.3.8
v1.3.8-dev
v1.3.7
v1.3.7-dev
v1.3.6-dev
v1.4.0-clickhouse
v1.4.0
v1.3.6
v1.3.5
v1.3.5-dev
v1.4.0-955
v1.3.4
v1.3.4-dev
v1.3.3
v1.3.3-dev
v1.3.2
v1.3.2-dev
v1.3.2-766
v1.3.0h
v1.3.1-utf8mb4
1.4.0-840
v1.3.1
v1.2.6
v1.3.0
v1.4.0-797
v1.2.5-715
v1.2.5
v1.2.4-lowmem
v1.3.1-dev-mem
v1.2.0
connleak
lab-1.2.0
v1.1.2
T107_add_proxysql_consul_requirements
T89_write_consul_integration_doc
T98_consul_multi_table_config
mongoose
evhttp
SQLiteServer
1.0
3.0.7
3.0.6
3.0.5
3.0.4
3.0.3
3.0.2
2.7.3
2.6.6.1
3.0.1
2.7.2
3.0.0
2.6.6
2.7.1
2.7.0
2.6.5
2.6.4
2.6.3
2.6.2
2.6.1
2.5.5
2.6.0
2.5.4
2.5.3
2.5.2
2.5.1
2.4.8
2.5.0
2.4.7
2.4.6
2.4.5
2.4.4
2.4.3
2.4.2
2.4.1
2.4.0
2.3.2
2.3.1
2.3.0
2.2.2
2.2.1
2.2.0
2.0.18
2.1.1
2.0.17
2.0.16
2.0.15
2.0.14
2.0.13
2.0.12
2.0.11
2.1.0
2.0.10
2.0.9
2.0.8
2.0.7
2.0.6
2.0.5
2.0.4
2.0.3
2.0.2
1.4.16
1.4.15
2.0.1
1.4.14
1.4.13
1.4.12
1.4.11
1.4.10
1.3.10
1.4.9
2.0.0
1.4.8
1.4.7
1.4.6
1.4.4
1.4.3
1.4.2
1.3.9
1.3.8
1.4.1
1.3.7
1.4.0
1.3.6
1.3.5
1.3.4
1.3.3
1.3.2
1.3.1
1.3.0h
1.3.0g
1.3.0f
1.3.0e
1.3.0d
1.3.0c
v1.3.0b
1.4.5
v1.1.0
v1.1.0-rc
v1.1.1-beta
v1.1.1-beta.1
v1.1.1-beta.2
v1.1.1-beta.3
v1.1.1-beta.4
v1.1.1-beta.5
v1.1.1-beta.6
v1.1.2
v1.2.0a
v1.2.0b
v1.2.0c
v1.2.0d
v1.2.0e
v1.2.0f
v1.2.0g
v1.2.0h
v1.2.0i
v1.2.0j
v1.2.0k
v1.2.1
v1.2.2
v1.2.3
v1.2.4
v1.2.4.0923
v1.2.5
v1.2.6
v1.3.0
v1.3.0a
v1.3.0g
v1.3.0h
v1.3.1
v1.3.10
v1.3.2
v1.3.2-1
v1.3.3
v1.3.4
v1.3.5
v1.3.6
v1.3.7
v1.3.8
v1.3.9
v1.3.9-prev.1
v1.4.0
v1.4.1
v1.4.10
v1.4.11
v1.4.12
v1.4.13
v1.4.14
v1.4.15
v1.4.16
v1.4.2
v1.4.3
v1.4.4
v1.4.5
v1.4.6
v1.4.7
v1.4.8
v1.4.9
v2.0.0-beta.1
v2.0.0-rc1
v2.0.0-rc2
v2.0.1
v2.0.10
v2.0.11
v2.0.12
v2.0.13
v2.0.14
v2.0.15
v2.0.16
v2.0.17
v2.0.18
v2.0.2
v2.0.3
v2.0.4
v2.0.5
v2.0.6
v2.0.7
v2.0.8
v2.0.9
v2.1.0
v2.1.1
v2.2.0
v2.2.1
v2.2.2
v2.3.0
v2.3.1
v2.3.2
v2.4.0
v2.4.1
v2.4.2
v2.4.3
v2.4.4
v2.4.5
v2.4.6
v2.4.7
v2.4.8
v2.5.0
v2.5.1
v2.5.2
v2.5.3
v2.5.4
v2.5.5
v2.6.0
v2.6.1
v2.6.2
v2.6.3
v2.6.4
v2.6.5
v2.6.6
v2.7.0
v2.7.1
v2.7.2
v2.7.3
v3.0.0-alpha
v3.0.1
v3.0.2
v3.0.3
v3.0.4
v3.0.5
v3.0.6
v3.0.7
v3.1.6
v3.1.7
v4.0.6
v4.0.7
1993 Commits (ad166c6b8a3a30d61ddd0dbbfc0b7b45f32f4df6)
ad166c6b8a (3 months ago)
docs: Add comprehensive Doxygen documentation for RAG subsystem

- Enhanced inline Doxygen comments in RAG_Tool_Handler.h and RAG_Tool_Handler.cpp
- Added detailed parameter descriptions, return values, and cross-references
- Created Doxyfile for documentation generation
- Added documentation summary and guidelines
- Documented all RAG tools with their schemas and usage patterns
- Added security and performance considerations documentation

The RAG subsystem is now fully documented with comprehensive Doxygen comments that provide clear guidance for developers working with the codebase.
3daaa5c592 (3 months ago)
feat: Implement RAG (Retrieval-Augmented Generation) subsystem

Adds a complete RAG subsystem to ProxySQL with:
- RAG_Tool_Handler implementing all MCP tools for retrieval operations
- Database schema with FTS and vector support
- FTS, vector, and hybrid search capabilities
- Fetch and refetch tools for document/chunk retrieval
- Admin tools for monitoring
- Configuration variables for RAG parameters
- Comprehensive documentation and test scripts

Implements v0 deliverables from the RAG blueprint:
- SQLite schema initialization
- Source registry management
- MCP tools: search_fts, search_vector, search_hybrid, get_chunks, get_docs, fetch_from_source, admin.stats
- Unit/integration tests and examples
5b502c0864 (3 months ago)
feat: Add question learning capability to demo agent

Add the ability for the demo agent to learn new questions and add them to the catalog, making it smarter over time.

Changes:
- Added get_last_agent_run_id() function to Discovery_Schema:
  - Queries the agent_runs table for the most recent agent_run_id for a run_id
  - Returns 0 if no agent runs exist for the schema
- Updated llm.question_template_add handler:
  - Made agent_run_id optional (defaults to 0 when not provided)
  - When agent_run_id <= 0, auto-fetches the last agent_run_id for the schema
  - Returns a helpful error if no agent run exists for the schema
  - Returns agent_run_id in the response for visibility
- Updated llm.question_template_add tool schema:
  - Moved agent_run_id from required to optional parameters
  - Updated the description to explain the auto-fetch behavior
- Updated demo_agent_claude.sh prompt:
  - Added llm.question_template_add to the available tools
  - Added Step 4, "Learn from Success", to the workflow
  - Added an explicit instruction to ALWAYS LEARN new questions
  - Added an example showing the learning workflow
  - Expanded the workflow from 4 steps to 5 to include learning

Now the demo agent can:
1. Search for existing questions
2. Reuse SQL if a good match exists
3. Generate new SQL if no good match exists
4. Learn new questions by adding them to the catalog
5. Present results

This enables continuous learning: the more users interact with it, the smarter it becomes.
7e522aa2c0 (3 months ago)
feat: Add schema parameter to run_sql_readonly with per-connection tracking

Add an optional schema parameter to the run_sql_readonly tool that allows queries to be executed against a specific schema, independent of the default schema configured in mcp-mysql_schema.

Changes:
- Added current_schema field to the MySQLConnection structure to track the
  currently selected schema for each connection in the pool
- Added find_connection() helper to find a connection wrapper by mysql pointer
- Added execute_query_with_schema() function that:
  - Uses mysql_select_db() instead of a 'USE schema' SQL statement
  - Only calls mysql_select_db() if the requested schema differs from the
    current schema (an optimization to avoid unnecessary switches)
  - Updates current_schema after a successful schema switch
- Updated run_sql_readonly handler:
  - Extracts the optional 'schema' parameter
  - Calls execute_query_with_schema() instead of execute_query()
  - Returns an error response when the query fails (instead of success)
- Updated the tool schema to document the new 'schema' parameter

This fixes the issue where queries would run against the default schema (configured in mcp-mysql_schema) instead of the schema being queried, causing "Table doesn't exist" errors when the default schema differs from the discovered schema.
ee13e4bf13 (3 months ago)
feat: Add include_objects parameter to llm_search for complete object retrieval

Enhance the llm_search MCP tool to return complete question template data and optionally include full object schemas, reducing the need for additional MCP calls when answering questions.

Changes:
- Added related_objects column to the llm_question_templates table
- Updated add_question_template() to accept and store a related_objects JSON array
- Enhanced fts_search_llm() with an include_objects parameter:
  - LEFT JOIN with llm_question_templates to return example_sql,
    related_objects, template_json, and confidence
  - When include_objects=true, fetches full object schemas (columns, indexes)
    for all related objects in a single batch operation
  - Added error checking for SQL execution failures
- Fixed the fts_search_llm() get_object() call to pass schema_name and object_name
  separately instead of a combined object_key
- Updated Query_Tool_Handler:
  - Added is_boolean() handling to the json_int() helper to properly convert
    JSON boolean true/false to int 1/0
  - Updated the llm.search handler to extract and pass the include_objects parameter
  - Updated llm.question_template_add to extract and pass related_objects
- Updated the tool schemas to document the new parameters

This change allows agents to get all necessary schema information in a single llm_search call instead of making multiple catalog_get_object calls, significantly reducing MCP call overhead.
5668c86809 (3 months ago)
fix: Implement FTS indexing for LLM artifacts and fix reserved keyword issue

- Rename the llm_search_log column from "limit" to "lmt" to avoid the SQL reserved keyword
- Add FTS inserts to all LLM artifact upsert functions:
  - add_question_template(): index question templates for search
  - add_llm_note(): index notes for search
  - upsert_llm_summary(): index object summaries for search
  - upsert_llm_domain(): index domains for search
  - upsert_llm_metric(): index metrics for search
- Remove content='' from the fts_llm table to store content directly
- Add the <functional> header for std::hash usage

This fixes the bug where llm_search always returned empty results because the FTS index was never populated.
2250b762a3 (3 months ago)
feat: Add query_tool_calls table to log MCP tool invocations

Add a query_tool_calls table to the Discovery Schema to track all MCP tool invocations via the /mcp/query/ endpoint. Logged fields:
- tool_name: name of the tool that was called
- schema: schema name (nullable, empty if not applicable)
- run_id: run ID from discovery (nullable, 0 if not applicable)
- start_time: start monotonic time in microseconds
- execution_time: execution duration in microseconds
- error: error message (null on success)

Modified files:
- Discovery_Schema.cpp: added table creation and the log_query_tool_call function
- Discovery_Schema.h: added the function declaration
- Query_Tool_Handler.cpp: added logging after each tool execution
77643859e3 (3 months ago)
feat: Add timing columns to stats_mcp_query_tools_counters

Extend the stats_mcp_query_tools_counters table with timing statistics (first_seen, last_seen, sum_time, min_time, max_time), following the same pattern as stats_mysql_query_digest. All timing values are in microseconds using monotonic_time().

New schema:
- tool VARCHAR
- schema VARCHAR
- count INT
- first_seen INTEGER (microseconds)
- last_seen INTEGER (microseconds)
- sum_time INTEGER (microseconds, total execution time)
- min_time INTEGER (microseconds, minimum execution time)
- max_time INTEGER (microseconds, maximum execution time)
fb66af7c1b (3 months ago)
feat: Expose MCP catalog database in ProxySQL Admin interface

The MCP catalog database is now accessible as the 'mcp_catalog' schema from the ProxySQL Admin interface, enabling direct SQL queries against discovered schemas and LLM memories.
35b0b224ff (3 months ago)
refactor: Remove mcp-catalog_path variable and hardcode catalog path

Remove the mcp-catalog_path configuration variable and hardcode the catalog database path to datadir/mcp_catalog.db for stability.

Rationale: the catalog database is session state, not user configuration. Runtime swapping of the catalog could cause tables to be missed and the catalog to fail even if it was succeeding a second earlier.

Changes:
- Removed catalog_path from the mcp_thread_variables_names array
- Removed mcp_catalog_path from the MCP_Thread variables struct
- Removed the getter/setter logic for catalog_path
- Hardcoded the catalog path to GloVars.datadir/mcp_catalog.db in:
  - ProxySQL_MCP_Server.cpp (Query_Tool_Handler initialization)
  - Admin_FlushVariables.cpp (MySQL_Tool_Handler reinitialization)
- Updated VARIABLES.md to document the hardcoded path
- Updated configure_mcp.sh to remove the catalog_path configuration
- Updated the MCP README to remove catalog_path references
a816a756d4 (3 months ago)
feat: Add MCP query tool usage counters to stats schema

Add stats_mcp_query_tools_counters and stats_mcp_query_tools_counters_reset tables to track MCP query tool usage statistics.

- Added get_tool_usage_stats_resultset() method to Query_Tool_Handler
- Defined the table schemas in ProxySQL_Admin_Tables_Definitions.h
- Registered the tables in Admin_Bootstrap.cpp
- Added pattern matching in ProxySQL_Admin.cpp
- Added stats___mcp_query_tools_counters() in ProxySQL_Admin_Stats.cpp
- Fixed the friend declaration for track_tool_invocation()
- Fixed Discovery_Schema.cpp log_llm_search() to use prepare_v2/finalize
623675b369 (3 months ago)
feat: Add schema name resolver and deprecate direct DB tools

- Add resolve_run_id() to map schema names to the latest run_id
- Update all catalog and LLM tools to accept schema names
- Deprecate describe_table, table_profile, column_profile
- Deprecate get_constraints, suggest_joins, find_reference_candidates
- Keep sample_rows, sample_distinct for data preview
1b7335acfe (3 months ago)
Fix two-phase discovery documentation and scripts

- Add mcp_config.example.json for Claude Code MCP configuration
- Fix the MCP bridge path in the example config (../../proxysql_mcp_stdio_bridge.py)
- Update Two_Phase_Discovery_Implementation.md with correct Phase 1/Phase 2 usage
- Fix the Two_Phase_Discovery_Implementation.md DELETE FROM fts_objects to scope to run_id
- Update README.md with a two-phase discovery section and a multi-agent legacy note
- Create static_harvest.sh bash wrapper for Phase 1
- Create two_phase_discovery.py orchestration script with prompts
- Add --run-id parameter to skip auto-fetch
- Fix the RUN_ID placeholder mismatch (<USE_THE_PROVIDED_RUN_ID>)
- Fix the catalog path default to mcp_catalog.db
- Add test_catalog.sh to verify the catalog tools work
- Fix Discovery_Schema.cpp FTS5 syntax (missing space)
- Remove invalid CREATE INDEX on FTS virtual tables
- Add MCP tool call logging to track tool usage
- Fix Static_Harvester::get_harvest_stats() to accept a run_id parameter
- Fix DELETE FROM fts_objects to only delete rows for the specific run_id
- Update system prompts to say DO NOT call discovery.run_static
- Update user prompts to say Phase 1 is already complete
- Add --mcp-only flag to restrict Claude Code to MCP tools only
- Make FTS table failures non-fatal (check whether the table exists first)
- Add comprehensive documentation for both discovery approaches
6f23d5bcd0 (3 months ago)
feat: Implement two-phase schema discovery architecture

Phase 1 (static/deterministic):
- Add Discovery_Schema: SQLite catalog with deterministic and LLM tables
- Add Static_Harvester: MySQL INFORMATION_SCHEMA metadata extraction
- Harvest schemas, objects, columns, indexes, foreign keys, view definitions
- Compute derived hints: is_time, is_id_like, has_pk, has_fks, has_time
- Build quick profiles and FTS5 indexes

Phase 2 (LLM agent):
- Add 19 new MCP tools for two-phase discovery
- discovery.run_static: trigger ProxySQL's static harvest
- Catalog tools: init, search, get_object, list_objects, get_relationships
- Agent tools: run_start, run_finish, event_append
- LLM tools: summary_upsert, relationship_upsert, domain_upsert, etc.

Files:
- include/Discovery_Schema.h, lib/Discovery_Schema.cpp
- include/Static_Harvester.h, lib/Static_Harvester.cpp
- include/Query_Tool_Handler.h, lib/Query_Tool_Handler.cpp (updated)
- lib/Makefile (updated)
- scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/
- scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/two_phase_discovery.py
7de3f0c510 (3 months ago)
feat: Add schema separation to MCP catalog and discovery scope constraint

This commit addresses two issues:

1. MCP catalog schema separation:
   - Add a 'schema' column to the catalog table for proper isolation
   - Update all catalog methods (upsert, get, search, list, remove) to accept a schema parameter
   - Update MCP tool handlers and JSON-RPC parameter parsing
   - Change the unique constraint from (kind, key) to (schema, kind, key)
   - Update the FTS table to include the schema column

2. Discovery prompt scope constraint:
   - Add an explicit SCOPE CONSTRAINT section to multi_agent_discovery_prompt.md
   - Agents now respect the Target Schema and skip list_schemas when it is specified
   - Prevents analyzing all schemas when only one is targeted

Files modified:
- include/MySQL_Catalog.h: add the schema parameter to all catalog methods
- include/MySQL_Tool_Handler.h: update wrapper method signatures
- lib/MySQL_Catalog.cpp: implement schema filtering in all operations
- lib/MySQL_Tool_Handler.cpp: update wrapper implementations
- lib/Query_Tool_Handler.cpp: extract schema from JSON-RPC params, update tool descriptions
- scripts/mcp/DiscoveryAgent/ClaudeCode_Headless/prompts/multi_agent_discovery_prompt.md: add the scope constraint
a3f0bade4e (3 months ago)
feat: Convert NL2SQL to generic LLM bridge

- Rename NL2SQL_Converter to LLM_Bridge for generic prompt processing
- Update the MySQL protocol handler from /* NL2SQL: */ to /* LLM: */
- Remove SQL-specific fields (sql_query, confidence, tables_used)
- Add GENAI_OP_LLM operation type to the GenAI module
- Rename all genai_nl2sql_* variables to genai_llm_*
- Update AI_Features_Manager to use LLM_Bridge
- Deprecate the ai_nl2sql_convert MCP tool with an error message
- The LLM bridge now handles any prompt type via the MySQL protocol

This enables generic LLM access (summarization, code generation, translation, analysis) while preserving the infrastructure for a future NL2SQL implementation via the Web UI plus external agents.
3fe8a48f70 (3 months ago)
Fix genai variable handling and add API key masking

- Add has_variable() method to GenAI_Threads_Handler for variable validation
- Add a genai- prefix check in is_valid_global_variable()
- Auto-initialize the NL2SQL converter when genai-nl2sql_enabled is set to true at runtime
- Make init_nl2sql() public to allow runtime initialization
- Mask API keys in logs (show only the first 2 characters, the rest as 'x')
4018a0ad3b (3 months ago)
fix: Follow MCP pattern for GenAI variables runtime table population

Update flush_genai_variables___database_to_runtime() to match the MCP pattern exactly:
- Add a 'lock' parameter (default true) for flexibility
- Use ProxySQL_Admin's wrlock()/wrunlock() instead of GloGATH's
- Use consistent variable naming (var_name = name + 6 for the 'genai-' prefix)
- Follow the exact same locking pattern as the MCP variables

This fixes the issue where the runtime_global_variables table was not being populated on startup because the locking pattern was incorrect.
527bfed297 (3 months ago)
fix: Migrate AI variables to GenAI module for proper architecture

This commit fixes a serious design flaw where AI configuration variables were not integrated with the ProxySQL admin interface. All ai_* variables have been migrated to the GenAI module as genai-* variables.

Changes:
- Added 21 new genai_* variables to the GenAI_Thread.h structure
- Implemented get/set functions for all new variables in GenAI_Thread.cpp
- Removed the internal variables struct from AI_Features_Manager
- AI_Features_Manager now reads from GloGATH instead of internal state
- Updated documentation to reference the genai-* variables
- Fixed the debug.cpp assertion for PROXY_DEBUG_NL2SQL and PROXY_DEBUG_ANOMALY

Variable mapping:
- ai_nl2sql_enabled → genai-nl2sql_enabled
- ai_anomaly_detection_enabled → genai-anomaly_enabled
- ai_features_enabled → genai-enabled
- All other ai_* variables follow the same pattern

The flush functions automatically handle all variables in the genai_thread_variables_names array, so database persistence works correctly without additional changes.

Related to: https://github.com/ProxySQL/proxysql-vec/pull/13
ae4200dbc0 (3 months ago)
Enhance AI features with improved validation, memory safety, error handling, and performance monitoring

- Rename validate_provider_name to validate_provider_format for clarity
- Add null checks and error handling for all strdup() operations
- Enhance error messages with more context and HTTP status codes
- Implement performance monitoring with timing metrics for LLM calls and cache operations
- Add comprehensive test coverage for edge cases, retry scenarios, and performance
- Extend status variables to track performance metrics
- Update the MySQL session to report timing information to the AI manager
8f38b8a577 (3 months ago)
feat: Add exponential backoff retry for transient LLM failures

This commit adds configurable retry logic with exponential backoff for NL2SQL LLM API calls.

Changes:
- Add retry configuration to NL2SQLRequest (max_retries, retry_backoff_ms, retry_multiplier, retry_max_backoff_ms)
- Add is_retryable_error() to identify retryable HTTP/CURL errors
- Add sleep_with_jitter() for exponential backoff with 10% jitter
- Add call_generic_openai_with_retry() wrapper
- Add call_generic_anthropic_with_retry() wrapper
- Update NL2SQL_Converter::convert() to use the retry wrappers

Default retry behavior:
- 3 retries with 1000ms initial backoff
- 2.0x multiplier, 30000ms maximum backoff
- Retries on empty responses (transient failures)

Part of: Phase 3 of the NL2SQL improvement plan
d0dc36ac0b (3 months ago)
feat: Add structured logging with timing and request IDs

Add comprehensive structured logging for NL2SQL LLM API calls with request correlation, timing metrics, and detailed error context.

Changes:
- Add request_id field to NL2SQLRequest with UUID-like auto-generation
- Add structured logging macros:
  - LOG_LLM_REQUEST: logs URL, model, and prompt length with the request ID
  - LOG_LLM_RESPONSE: logs HTTP status, duration_ms, and a response preview
  - LOG_LLM_ERROR: logs the error phase, message, and status code
- Update the call_generic_openai() signature to accept a req_id parameter
- Update the call_generic_anthropic() signature to accept a req_id parameter
- Add timing metrics to both LLM call functions using clock_gettime()
- Replace the existing debug logging with the structured logging macros
- Update convert() to pass request_id to the LLM calls

Request IDs are generated as UUID-like strings (e.g., "12345678-9abc-def0-1234-567890abcdef") and are included in all log messages for correlation. This allows tracking a single NL2SQL request through all log lines from request to response.

Timing is measured using CLOCK_MONOTONIC for accurate duration tracking of LLM API calls, reported in milliseconds.

This provides much better debugging capability when troubleshooting NL2SQL issues, as administrators can now:
- Correlate all log lines for a single request
- See the exact timing of LLM API calls
- Identify which phase of processing failed
- Track request/response metrics

Fixes #2 - Add Structured Logging
45e592b623 (3 months ago)
feat: Add structured error messages with context to NL2SQL

Add comprehensive error details to help users debug NL2SQL conversion issues.

Changes:
- Add error_code, error_details, http_status_code, and provider_used fields to NL2SQLResult
- Add NL2SQLErrorCode enum with structured error codes:
  - SUCCESS, ERR_API_KEY_MISSING, ERR_API_KEY_INVALID, ERR_TIMEOUT
  - ERR_CONNECTION_FAILED, ERR_RATE_LIMITED, ERR_SERVER_ERROR
  - ERR_EMPTY_RESPONSE, ERR_INVALID_RESPONSE, ERR_SQL_INJECTION_DETECTED
  - ERR_VALIDATION_FAILED, ERR_UNKNOWN_PROVIDER, ERR_REQUEST_TOO_LARGE
- Add nl2sql_error_code_to_string() function for error code conversion
- Add format_error_context() helper to create detailed error messages including:
  - the query (truncated if too long)
  - the schema name
  - the provider attempted
  - the endpoint URL
  - the specific error message
- Add set_error_details() helper to populate the error fields
- Update error handling in convert() to use the new error details
- Track provider_used in successful conversions

This provides much better debugging information when NL2SQL conversions fail, making it easier to identify misconfigurations and connectivity issues.

Fixes #1 - Improve Error Messages
|
|
36b11223b2 |
feat: Improve SQL validation with multi-factor scoring
Add comprehensive SQL validation with confidence scoring based on:
- SQL keyword detection (17 keywords covering DDL/DML/transactions)
- Structural validation (balanced parentheses and quotes)
- SQL injection pattern detection
- Length and quality checks
Confidence scoring:
- Base 0.4 for valid SQL keyword
- +0.15 for balanced parentheses
- +0.15 for balanced quotes
- +0.1 for minimum length
- +0.1 for FROM clause in SELECT statements
- +0.1 for no injection patterns
- -0.3 penalty for injection patterns detected
Low-confidence (< 0.5) results are logged with detailed info. The cache storage threshold is now 0.5 confidence (previously the implicit valid_sql flag).
This improves detection of malformed or potentially malicious SQL while providing granular confidence scores for downstream use. |
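The additive scoring model above can be sketched as a single pass over the generated SQL. This is a simplified stand-in, not the real validator: the keyword and injection-pattern lists are truncated to a few entries, whereas the commit describes 17 keywords and a larger pattern set.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Simplified sketch of the multi-factor confidence score described above.
static double score_sql_sketch(const std::string& sql) {
    std::string low;
    for (char c : sql) low += (char)tolower((unsigned char)c);
    double score = 0.0;
    bool is_select = low.rfind("select", 0) == 0;
    if (is_select || low.rfind("insert", 0) == 0 || low.rfind("update", 0) == 0)
        score += 0.4;                                    // base: valid SQL keyword
    int depth = 0;
    for (char c : low) { if (c == '(') depth++; if (c == ')') depth--; }
    if (depth == 0) score += 0.15;                       // balanced parentheses
    if (std::count(low.begin(), low.end(), '\'') % 2 == 0)
        score += 0.15;                                   // balanced single quotes
    if (low.size() >= 10) score += 0.1;                  // minimum length
    if (is_select && low.find(" from ") != std::string::npos)
        score += 0.1;                                    // FROM clause in SELECT
    bool injection = low.find("or 1=1") != std::string::npos ||
                     low.find(";--") != std::string::npos;
    score += injection ? -0.3 : 0.1;                     // injection penalty / bonus
    return score;
}
```

A clean SELECT scores the full 1.0, while a tautology-laden query drops below the 0.7 range even though it is syntactically valid.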
3 months ago |
|
|
897d306d2d |
Refactor: Simplify NL2SQL to use only generic providers
Remove Ollama-specific provider code and use only generic OpenAI-compatible and Anthropic-compatible providers. Ollama is now used via its OpenAI-compatible endpoint at /v1/chat/completions.
Changes:
- Remove LOCAL_OLLAMA from ModelProvider enum
- Remove ai_nl2sql_ollama_model and ai_nl2sql_ollama_url variables
- Remove call_ollama() function from LLM_Clients.cpp
- Update default configuration to use OpenAI provider with Ollama URL
- Update all documentation to reflect generic-only approach
Configuration:
- ai_nl2sql_provider: 'openai' or 'anthropic' (default: 'openai')
- ai_nl2sql_provider_url: endpoint URL (default: Ollama OpenAI-compatible)
- ai_nl2sql_provider_model: model name
- ai_nl2sql_provider_key: API key (optional for local endpoints)
This simplifies the codebase by removing a separate code path for Ollama and aligns with the goal of avoiding provider-specific variables. |
3 months ago |
|
|
fec7d64093 |
feat: Implement NL2SQL vector cache with GenAI embedding generation
Implemented semantic caching for NL2SQL using sqlite-vec and the GenAI module.
Changes to lib/AI_Features_Manager.cpp:
- Create virtual vec0 tables for similarity search:
  * nl2sql_cache_vec for the NL2SQL cache
  * anomaly_patterns_vec for threat patterns
  * query_history_vec for query history
Changes to include/NL2SQL_Converter.h:
- Add get_query_embedding() method declaration
Changes to lib/NL2SQL_Converter.cpp:
- Add GenAI_Thread.h include and GloGATH extern
- Implement get_query_embedding() - calls GloGATH->embed_documents()
- Implement check_vector_cache() - sqlite-vec KNN search with cosine distance
- Implement store_in_vector_cache() - stores embedding and updates vec table
- Implement clear_cache() - deletes from both main and vec tables
- Implement get_cache_stats() - returns cache entry/hit counts
- Add vector_to_json() helper for sqlite-vec MATCH queries
Features:
- Uses GenAI module (llama-server) for embedding generation
- Cosine similarity search via sqlite-vec vec_distance_cosine()
- Configurable similarity threshold (ai_nl2sql_cache_similarity_threshold)
- Automatic hit counting and timestamp tracking |
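A helper in the spirit of vector_to_json() can be sketched as below: sqlite-vec accepts query vectors for MATCH as JSON arrays of floats (e.g. `[0.100000,0.200000]`). The function name and fixed precision here are assumptions for illustration.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Serializes an embedding into the JSON-array form sqlite-vec can parse
// in a MATCH clause. Fixed 6-decimal precision is an arbitrary choice.
static std::string vector_to_json_sketch(const std::vector<float>& v) {
    std::string out = "[";
    char buf[32];
    for (size_t i = 0; i < v.size(); i++) {
        snprintf(buf, sizeof(buf), "%s%.6f", i ? "," : "", v[i]);
        out += buf;
    }
    out += "]";
    return out;
}
```

The resulting string would then be bound into a KNN query against the vec0 virtual table (the exact SQL used by check_vector_cache() is not shown in the log).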
3 months ago |
|
|
52a70b0b09 |
feat: Implement AI-based Anomaly Detection for ProxySQL
Phase 3: Anomaly Detection Implementation
This commit implements a comprehensive multi-stage anomaly detection
system for real-time SQL query security analysis.
**Core Detection Methods:**
1. **SQL Injection Pattern Detection** (lib/Anomaly_Detector.cpp)
- Regex-based detection of 11 SQL injection patterns
- Suspicious keyword detection (11 patterns)
- Covers: tautologies, union-based, comment-based, stacked queries
2. **Query Normalization** (lib/Anomaly_Detector.cpp:normalize_query)
- Converts to lowercase
- Removes SQL comments
- Replaces string/numeric literals with placeholders
- Normalizes whitespace
3. **Rate Limiting** (lib/Anomaly_Detector.cpp:check_rate_limiting)
- Per user/host query rate tracking
- Configurable time windows (3600s default)
- Auto-block on threshold exceeded
- Prevents DoS and brute force attacks
4. **Statistical Anomaly Detection** (lib/Anomaly_Detector.cpp:check_statistical_anomaly)
- Z-score based outlier detection
- Abnormal execution time detection (>5s)
- Large result set detection (>10000 rows)
- Behavioral profiling per user
5. **Embedding-based Similarity** (lib/Anomaly_Detector.cpp:check_embedding_similarity)
- Placeholder for vector similarity search
- Framework for sqlite-vec integration
- Detects novel attack variations
**Query Flow Integration:**
- Added `detect_ai_anomaly()` to MySQL_Session (line 3626)
- Integrated after libinjection SQLi detection (line 5150)
- Blocks queries when risk threshold exceeded (default: 0.70)
- Sends error response with anomaly details
**Status Variables Added:**
- `ai_detected_anomalies`: Total anomalies detected
- `ai_blocked_queries`: Total queries blocked
- Available via: `SELECT * FROM stats_mysql_global`
**Configuration (defaults):**
- `enabled`: true
- `risk_threshold`: 70 (0-100)
- `similarity_threshold`: 85 (0-100)
- `rate_limit`: 100 queries/hour
- `auto_block`: true
- `log_only`: false
**Detection Pipeline:**
```
Query → SQLi Check → AI Anomaly Check → [Block if needed] → Execute
(libinjection) (Multi-stage)
```
**Files Modified:**
- include/MySQL_Session.h: Added detect_ai_anomaly() declaration
- include/MySQL_Thread.h: Added AI status variables
- lib/Anomaly_Detector.cpp: Full implementation (700+ lines)
- lib/MySQL_Session.cpp: Integration and query flow
- lib/MySQL_Thread.cpp: Status variable definitions
**Next Steps:**
- Add unit tests for each detection method
- Add integration tests with sample attacks
- Add user and developer documentation
Related: Phase 1-2 (NL2SQL foundation and testing)
Related: Phase 4 (Vector storage for embeddings)
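The statistical stage above can be sketched with a plain z-score computation: a query metric (execution time, rows returned) is compared against the per-user history, and large deviations are flagged. This is an illustrative sketch of the technique, not the Anomaly_Detector code.

```cpp
#include <cmath>
#include <vector>

// Z-score of `value` against a user's metric history: how many standard
// deviations the observation sits from the historical mean.
static double z_score_sketch(const std::vector<double>& history, double value) {
    double mean = 0.0;
    for (double v : history) mean += v;
    mean /= history.size();
    double var = 0.0;
    for (double v : history) var += (v - mean) * (v - mean);
    double sd = std::sqrt(var / history.size());
    if (sd == 0.0) return 0.0;          // no spread in history: nothing is an outlier
    return (value - mean) / sd;
}
```

A detector would combine |z| > some bound with the absolute limits described above (>5s execution time, >10000 rows) before contributing to the overall risk score.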
|
3 months ago |
|
|
3f44229e28 |
feat: Add MCP AI Tool Handler for NL2SQL with test script
Phase 5: MCP Tool Implementation for NL2SQL
This commit implements the AI Tool Handler for the MCP (Model Context
Protocol) server, exposing NL2SQL functionality as an MCP tool.
**New Files:**
- include/AI_Tool_Handler.h: Header for AI_Tool_Handler class
- Provides ai_nl2sql_convert tool via MCP protocol
- Wraps NL2SQL_Converter and Anomaly_Detector
- Inherits from MCP_Tool_Handler base class
- lib/AI_Tool_Handler.cpp: Implementation
- Implements ai_nl2sql_convert tool execution
- Accepts parameters: natural_language (required), schema,
context_tables, max_latency_ms, allow_cache
- Returns JSON response with sql_query, confidence, explanation,
cached, cache_id
- scripts/mcp/test_nl2sql_tools.sh: Test script for NL2SQL MCP tool
- Tests ai_nl2sql_convert via JSON-RPC over HTTPS
- 10 test cases covering SELECT, WHERE, JOIN, aggregation, etc.
- Includes error handling test for empty queries
- Supports --verbose, --quiet options
**Modified Files:**
- include/MCP_Thread.h: Add AI_Tool_Handler forward declaration and pointer
- lib/Makefile: Add AI_Tool_Handler.oo to _OBJ_CXX list
- lib/ProxySQL_MCP_Server.cpp: Initialize and register AI tool handler
- Creates AI_Tool_Handler with GloAI components
- Registers /mcp/ai endpoint
- Adds cleanup in destructor
**MCP Tool Details:**
- Endpoint: /mcp/ai
- Tool: ai_nl2sql_convert
- Parameters:
- natural_language (string, required): Natural language query
- schema (string, optional): Database schema name
- context_tables (string, optional): Comma-separated table list
- max_latency_ms (integer, optional): Max acceptable latency
- allow_cache (boolean, optional): Check semantic cache (default: true)
**Testing:**
Run the test script with:
./scripts/mcp/test_nl2sql_tools.sh [--verbose] [--quiet]
See scripts/mcp/test_nl2sql_tools.sh --help for usage.
Related: Phase 1-4 (Documentation, Unit Tests, Integration Tests, E2E Tests)
Related: Phase 6-8 (User Docs, Developer Docs, Test Docs)
|
3 months ago |
|
|
4f45c25945 |
docs: Add comprehensive doxygen comments to NL2SQL headers and LLM_Clients
- Add file-level doxygen documentation with @file, @brief, @date, @version
- Add detailed class and method documentation with @param, @return, @note, @see
- Document data structures (NL2SQLRequest, NL2SQLResult, ModelProvider)
- Add section comments and inline documentation for implementation files
- Document all three LLM provider APIs (Ollama, OpenAI, Anthropic) |
3 months ago |
|
|
bc4fff12ce |
feat: Add NL2SQL query interception in MySQL_Session
- Add NL2SQL handler declaration
- Add routing for 'NL2SQL:' prefix
- Return resultset with generated SQL and metadata |
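The routing check can be sketched as a prefix match on the incoming query text; whether the real handler matches case-insensitively is an assumption here.

```cpp
#include <string>
#include <strings.h>  // strncasecmp (POSIX)

// Sketch of the interception test: queries starting with "NL2SQL:" are
// diverted to the converter instead of being forwarded to a backend.
static bool is_nl2sql_query_sketch(const std::string& q) {
    static const char prefix[] = "NL2SQL:";
    const size_t plen = sizeof(prefix) - 1;
    // require some content after the prefix
    return q.size() > plen && strncasecmp(q.c_str(), prefix, plen) == 0;
}
```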
3 months ago |
|
|
147a059781 |
feat: Add NL2SQL converter with hybrid LLM support
- Add NL2SQL_Converter with prompt building and model selection
- Add LLM clients for Ollama, OpenAI, and Anthropic APIs
- Update Makefile for new source files |
3 months ago |
|
|
d9346fe64d |
feat: Add AI features manager foundation
- Add AI_Features_Manager coordinator class
- Add AI_Vector_Storage interface (stub)
- Add Anomaly_Detector class (stub for Phase 3)
- Update includes and main initialization |
3 months ago |
|
|
313f637cf0
|
Merge branch 'v3.1-vec' into v3.1-MCP1
Signed-off-by: René Cannaò <rene.cannao@gmail.com> |
3 months ago |
|
|
c86a048d9c |
Implement MCP multi-endpoint architecture with dedicated tool handlers
This commit implements Option 1 (Multiple Tool Handlers) for the MCP module, where each of the 5 endpoints has its own dedicated tool handler with specific tools.
## Architecture Changes
- Created MCP_Tool_Handler base class interface for all tool handlers
- Each endpoint now has its own dedicated tool handler:
  - /mcp/config → Config_Tool_Handler (configuration management)
  - /mcp/query → Query_Tool_Handler (database exploration)
  - /mcp/admin → Admin_Tool_Handler (administrative operations)
  - /mcp/cache → Cache_Tool_Handler (cache management)
  - /mcp/observe → Observe_Tool_Handler (monitoring & metrics)
## New Files
Base interface:
- include/MCP_Tool_Handler.h - Base class for all tool handlers
Tool handlers:
- include/Config_Tool_Handler.h, lib/Config_Tool_Handler.cpp
- include/Query_Tool_Handler.h, lib/Query_Tool_Handler.cpp
- include/Admin_Tool_Handler.h, lib/Admin_Tool_Handler.cpp
- include/Cache_Tool_Handler.h, lib/Cache_Tool_Handler.cpp
- include/Observe_Tool_Handler.h, lib/Observe_Tool_Handler.cpp
Documentation:
- doc/MCP/Architecture.md - Comprehensive architecture documentation
## Modified Files
- include/MCP_Thread.h, lib/MCP_Thread.cpp - Added 5 tool handler pointers
- include/MCP_Endpoint.h, lib/MCP_Endpoint.cpp - Use tool_handler base class
- lib/ProxySQL_MCP_Server.cpp - Create and pass handlers to endpoints
- lib/Makefile - Added new source files
## Implementation Status
- Config_Tool_Handler: Functional (get_config, set_config, list_variables, get_status)
- Query_Tool_Handler: Functional (wraps MySQL_Tool_Handler, all 18 tools)
- Admin_Tool_Handler: Stub implementations (TODO: implement)
- Cache_Tool_Handler: Stub implementations (TODO: implement)
- Observe_Tool_Handler: Stub implementations (TODO: implement)
See GitHub Issue #8 for detailed TODO list.
Co-authored-by: Claude <claude@anthropic.com> |
3 months ago |
|
|
2e7109d894 |
Fix lock ordering in flush_mcp_variables___database_to_runtime
The crash was caused by incorrect lock ordering. The admin version has:
1. wrlock() (acquire admin lock)
2. Process variables
3. checksum_mutex lock() (acquire checksum lock)
4. flush to runtime + generate checksum
5. checksum_mutex unlock() (release checksum lock)
6. wrunlock() (release admin lock)
The MCP version had the wrong order, with the checksum_mutex lock taken outside the wrlock/wrunlock region. This change also adds the missing 'lock' parameter that exists in the admin version but was absent in MCP.
Changes:
- Added 'lock' parameter to flush_mcp_variables___database_to_runtime()
- Added conditional wrlock()/wrunlock() calls (if lock=true)
- Moved checksum generation inside the wrlock/wrunlock region
- Updated function signature in header file |
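The corrected nesting can be sketched as below: the checksum mutex is acquired and released strictly inside the wrlock/wrunlock region, so the two locks are always taken in the same order and cannot deadlock against the admin path. Function and lock names are illustrative stand-ins.

```cpp
#include <pthread.h>

static pthread_rwlock_t admin_rwlock   = PTHREAD_RWLOCK_INITIALIZER;
static pthread_mutex_t  checksum_mutex = PTHREAD_MUTEX_INITIALIZER;

// Sketch of the fixed ordering; `lock` mirrors the admin version's
// parameter that lets callers that already hold the rwlock skip step 1/6.
static int flush_to_runtime_sketch(bool lock) {
    if (lock) pthread_rwlock_wrlock(&admin_rwlock);  // 1. acquire admin lock
    /* 2. process variables ... */
    pthread_mutex_lock(&checksum_mutex);             // 3. checksum lock, inside
    /* 4. flush to runtime + generate checksum ... */
    pthread_mutex_unlock(&checksum_mutex);           // 5. release inner lock first
    if (lock) pthread_rwlock_unlock(&admin_rwlock);  // 6. release outer lock last
    return 0;
}
```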
3 months ago |
|
|
06aa6d6ef7 |
Add comprehensive Doxygen documentation for connection pool
Added missing documentation for the MySQL connection pool implementation.
Header (MySQL_Tool_Handler.h):
- Added MySQLConnection struct documentation with member descriptions
- Added member variable documentation using ///< Doxygen style
Implementation (MySQL_Tool_Handler.cpp):
- Added Doxygen blocks for close() method
- Added Doxygen blocks for init_connection_pool() with detailed behavior
- Added Doxygen blocks for get_connection() with thread-safety notes
- Added Doxygen blocks for return_connection() with reuse behavior
- Added Doxygen blocks for execute_query() with JSON format documentation
All new connection pool methods now have complete @brief, @param, and @return documentation following Doxygen conventions. |
3 months ago |
|
|
4eab519848 |
Implement MySQL connection pool for MySQL_Tool_Handler
Added a built-in connection pool to MySQL_Tool_Handler for direct MySQL connections to backend servers.
Changes:
- Added MySQLConnection struct with MYSQL* pointer, host, port, in_use flag
- Added connection_pool vector, pool_lock mutex, pool_size counter
- Implemented init_connection_pool() to create MYSQL connections using mysql_init/mysql_real_connect
- Implemented get_connection() and return_connection() with thread-safe locking
- Implemented execute_query() helper method for executing SQL and returning JSON results
- Updated tool methods to use actual MySQL connections:
  - list_schemas: Query information_schema.schemata
  - list_tables: Query information_schema.tables with metadata
  - describe_table: Query columns, primary keys, indexes
  - sample_rows: Execute SELECT with LIMIT
  - sample_distinct: Execute SELECT DISTINCT with GROUP BY
  - run_sql_readonly: Execute validated SELECT queries
  - explain_sql: Execute EXPLAIN queries
- Fixed MYSQL forward declaration (use typedef struct st_mysql MYSQL)
The connection pool creates one connection per configured host:port pair, with 5-second timeouts for connect/read/write operations. |
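The pool pattern described above can be sketched generically, with a placeholder handle standing in for a real MYSQL*: entries are guarded by a mutex and handed out via an in_use flag, exactly the mechanism the commit describes. Struct and method names mirror the commit but the details are assumptions.

```cpp
#include <mutex>
#include <string>
#include <vector>

// Placeholder for the MySQLConnection struct (a real MYSQL* would live here).
struct PooledConnSketch {
    std::string host;
    int port;
    bool in_use;   // true while a caller holds this connection
};

struct ConnPoolSketch {
    std::vector<PooledConnSketch> pool;
    std::mutex pool_lock;

    // Hands out a free connection matching host:port, or nullptr if none.
    PooledConnSketch* get_connection(const std::string& host, int port) {
        std::lock_guard<std::mutex> g(pool_lock);
        for (auto& c : pool)
            if (!c.in_use && c.host == host && c.port == port) {
                c.in_use = true;
                return &c;
            }
        return nullptr;
    }
    // Marks the connection reusable; the MYSQL handle stays open for reuse.
    void return_connection(PooledConnSketch* c) {
        std::lock_guard<std::mutex> g(pool_lock);
        if (c) c->in_use = false;
    }
};
```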
3 months ago |
|
|
221ff23991 |
Add MySQL exploration MCP tools with SQLite catalog
Implemented MCP (Model Context Protocol) server providing tools for
LLM-based MySQL database exploration:
- MySQL_Catalog: SQLite-based catalog for LLM external memory with
upsert, get, search, list, merge, delete operations and FTS support
- MySQL_Tool_Handler: 17+ database exploration tools with guardrails:
* Inventory: list_schemas, list_tables
* Structure: describe_table, get_constraints, describe_view
* Profiling: table_profile, column_profile
* Sampling: sample_rows (max 20), sample_distinct (max 50)
* Query: run_sql_readonly (max 200 rows, 2s timeout, SELECT-only)
* Relationship: suggest_joins, find_reference_candidates
* Catalog: catalog_upsert, catalog_get, catalog_search,
catalog_list, catalog_merge, catalog_delete
- MCP Module Integration:
* Added 6 new configuration variables for MySQL tool handler
(mysql_hosts, mysql_ports, mysql_user, mysql_password,
mysql_schema, catalog_path)
* Added MySQL_Tool_Handler pointer to MCP_Threads_Handler
* Implemented tool routing in MCP endpoint for tools/list,
tools/describe, and tools/call methods
- TAP Tests: Updated to expect 14 MCP variables (was 8)
Files:
- include/MySQL_Catalog.h, lib/MySQL_Catalog.cpp
- include/MySQL_Tool_Handler.h, lib/MySQL_Tool_Handler.cpp
- include/MCP_Thread.h, lib/MCP_Thread.cpp
- include/MCP_Endpoint.h, lib/MCP_Endpoint.cpp
- lib/Makefile, test/tap/tests/mcp_module-t.cpp
|
3 months ago |
|
|
81c53896bc |
Fix MCP module TAP test failures
- Add MCP variables to load_save_disk_commands map for LOAD/SAVE commands
- Add MCP variable validation in is_valid_global_variable() for SET commands
- Implement has_variable() method in MCP_Threads_Handler
- Add CHECKSUM command handlers for MCP VARIABLES (DISK/MEMORY/MEM)
Test results improved from 28 passed / 16 failed to 49 passed / 3 failed. The remaining 3 failures are test expectation issues (boolean representation). |
3 months ago |
|
|
245e61ee86 |
Make MCP_Threads_Handler a standalone independent class
Remove the unnecessary inheritance from MySQL_Threads_Handler. The MCP module should be independent and not depend on the MySQL/PostgreSQL thread handlers.
Changes:
- MCP_Threads_Handler now manages its own pthread_rwlock_t for synchronization
- Simplified init() signature (removed unused num/stack parameters)
- Added ProxySQL_Main_init_MCP_module() call in main initialization phase
- Include only standard C++ headers (pthread.h, cstring, cstdlib) |
3 months ago |
|
|
87fff9e046 |
Add MCP (Model Context Protocol) module skeleton
Add a new MCP module supporting multiple MCP server endpoints over HTTPS with a JSON-RPC 2.0 protocol skeleton. Each endpoint (/mcp/config, /mcp/observe, /mcp/query, /mcp/admin, /mcp/cache) is a distinct MCP server with its own authentication configuration.
Features:
- HTTPS server using existing ProxySQL TLS certificates
- JSON-RPC 2.0 skeleton implementation (actual protocol TBD)
- 5 MCP endpoints with per-endpoint auth configuration
- LOAD/SAVE MCP VARIABLES admin commands
- Configuration file support (mcp_variables section)
The implementation follows the GenAI module pattern:
- MCP_Threads_Handler: Main module handler with variable management
- ProxySQL_MCP_Server: HTTPS server wrapper using libhttpserver
- MCP_JSONRPC_Resource: Base endpoint class with JSON-RPC skeleton |
3 months ago |
|
|
db2485be37 |
Add comprehensive doxygen documentation to GenAI async module
This commit adds extensive doxygen-format documentation to all key functions in the GenAI async module to improve code maintainability and API clarity.
Documented functions:
- lib/GenAI_Thread.cpp:
  - unregister_client() - cleanup flow and usage
  - call_llama_batch_embedding() - HTTP client with JSON format
  - call_llama_rerank() - HTTP client with JSON format
  - execute_sql_for_documents() - stub for document_from_sql
  - process_json_query() - autonomous JSON query processing
- lib/MySQL_Session.cpp:
  - genai_send_async() - async flow and error handling
  - handle_genai_response() - response handling flow
  - genai_cleanup_request() - resource cleanup details
  - check_genai_events() - main loop integration
Enhanced header documentation:
- GenAI_RequestHeader - communication flow details
- GenAI_ResponseHeader - response format details
- register_client() - registration flow
- unregister_client() - cleanup flow
- embed_documents() - BLOCKING warning
- rerank_documents() - BLOCKING warning
- process_json_query() - supported formats
All documentation includes:
- @brief descriptions
- @param parameter details
- @return return value explanations
- @note important warnings and usage notes
- @see cross-references to related functions
- Detailed workflow descriptions
- Error handling details
- Memory management notes |
3 months ago |
|
|
8405027124 |
Integrate GenAI async event handling into main MySQL session loop
- Add check_genai_events() function for non-blocking epoll_wait on GenAI response fds
- Integrate GenAI event checking into main handler() WAITING_CLIENT_DATA case
- Add goto handler_again to process multiple GenAI responses in one iteration
The async GenAI architecture is now fully integrated. MySQL threads no longer block when processing GENAI: queries - they send requests asynchronously via socketpair and continue processing other queries while GenAI workers handle the embedding/reranking operations. |
3 months ago |
|
|
0ff2e38e22 |
Implement async GenAI module with socketpair-based non-blocking architecture
- Add GenAI_RequestHeader and GenAI_ResponseHeader protocol structures for socketpair communication
- Implement GenAI listener_loop to read requests from epoll and queue to workers
- Implement GenAI worker_loop to process requests and send responses via socketpair
- Add GenAI_PendingRequest state management to MySQL_Session/Base_Session
- Implement MySQL_Session async handlers: genai_send_async(), handle_genai_response(), genai_cleanup_request()
- Modify MySQL_Session genai handler to use async path when epoll is available
- Initialize GenAI epoll fd in Base_Session::init()
This completes the async architecture that was planned but never fully implemented (previously there were only placeholder comments). The GenAI module now processes requests asynchronously without blocking MySQL threads. |
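The header-framed socketpair exchange can be sketched as below. The actual fields of GenAI_RequestHeader are not shown in the log, so this struct is a guess; the point is the fixed-size-header-plus-payload framing that lets the listener know how many payload bytes to read.

```cpp
#include <cstdint>
#include <cstring>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical request header: operation type plus payload length.
struct ReqHeaderSketch {
    uint32_t op;           // e.g. embed / rerank / json
    uint32_t payload_len;  // bytes of JSON payload that follow
};

// Writes header + payload on one end of a socketpair, reads them back on
// the other, and verifies the framing round-trips intact.
static bool roundtrip_sketch() {
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) return false;
    ReqHeaderSketch h{1, 4};
    const char payload[4] = {'t', 'e', 's', 't'};
    bool ok = write(sv[0], &h, sizeof(h)) == (ssize_t)sizeof(h) &&
              write(sv[0], payload, sizeof(payload)) == (ssize_t)sizeof(payload);
    ReqHeaderSketch r{};
    char buf[4] = {0};
    ok = ok && read(sv[1], &r, sizeof(r)) == (ssize_t)sizeof(r) &&
               read(sv[1], buf, r.payload_len) == (ssize_t)r.payload_len;
    close(sv[0]); close(sv[1]);
    return ok && r.op == 1 && memcmp(buf, payload, 4) == 0;
}
```

In the real module, the session side would register its fd with the GenAI epoll instance and return to the event loop instead of blocking on the read.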
3 months ago |
|
|
a82f58e22b |
Refactor GenAI module for autonomous JSON query processing
Move all JSON parsing and operation routing logic from MySQL_Session to the GenAI module. MySQL_Session now simply passes GENAI: queries to the GenAI module via process_json_query(), which handles everything autonomously.
This simplifies the architecture and achieves better separation of concerns:
- MySQL_Session: Detects the GENAI: prefix and forwards to the GenAI module
- GenAI module: Handles JSON parsing, operation routing, and result formatting
Changes:
- GenAI_Thread.h: Add GENAI_OP_JSON operation type, json_query field, and process_json_query() method declaration
- GenAI_Thread.cpp: Implement process_json_query() with embed/rerank support and a document_from_sql framework (stubbed for future MySQL connection handling)
- MySQL_Session.cpp: Simplify the genai handler to just call process_json_query() and parse the JSON result (reduces net code by ~215 lines) |
3 months ago |
|
|
cc3e97b7b8 |
Merge EMBED and RERANK into unified GENAI: query syntax
This commit refactors the experimental GenAI query syntax to use a single
GENAI: keyword with type-based operations instead of separate EMBED: and RERANK: keywords.
Changes:
- Replace EMBED: and RERANK: detection with unified GENAI: detection
- Merge genai_embedding and genai_rerank handlers into single genai handler
- Add 'type' field to operation JSON ("embed" or "rerank")
- Add 'columns' field for rerank operation (2 or 3, default 3)
- columns=2: Returns only index and score
- columns=3: Returns index, score, and document (default)
Old syntax:
EMBED: ["doc1", "doc2"]
RERANK: {"query": "...", "documents": [...], "top_n": 5}
New syntax:
GENAI: {"type": "embed", "documents": ["doc1", "doc2"]}
GENAI: {"type": "rerank", "query": "...", "documents": [...], "top_n": 5, "columns": 2}
This provides a cleaner, more extensible API for future GenAI operations.
|
3 months ago |
|
|
39939f598b |
Add experimental GenAI RERANK: query support for MySQL
This commit adds experimental support for reranking documents directly
from MySQL queries using a special RERANK: syntax.
Changes:
- Add handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___genai_rerank()
- Add RERANK: query detection alongside EMBED: detection
- Implement JSON parsing for query, documents array, and optional top_n
- Build resultset with index, score, and document columns
- Use MySQL ERR_Packet for error handling
Query format: RERANK: {"query": "search query", "documents": ["doc1", "doc2", ...], "top_n": 5}
Result format: 1 row per result, 3 columns (index, score, document)
|
3 months ago |
|
|
253591d262 |
Add experimental GenAI EMBED: query support for MySQL
This commit adds experimental support for generating embeddings directly from MySQL queries using a special EMBED: syntax.
Changes:
- Add MYDS_INTERNAL_GENAI to MySQL_DS_type enum for GenAI connections
- Add handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY___genai_embedding()
- Implement EMBED: query detection and JSON parsing for document arrays
- Build CSV resultset with embeddings (1 row per document, 1 column)
- Add myconn NULL check in MySQL_Thread for INTERNAL_GENAI type
- Add "debug_genai" name to debug module array
- Remove HAVE_LIBCURL checks (libcurl is always statically linked)
- Use static curl header: "curl/curl.h" instead of <curl/curl.h>
- Remove curl_global_cleanup() from GenAI module (should only be in main())
Query format: EMBED: ["doc1", "doc2", ...]
Result format: 1 row per document, 1 column with CSV embeddings
Error handling uses a MySQL ERR_Packet instead of resultsets. |
3 months ago |
|
|
1da9e384d2 |
Add poll() fallback for GenAI module when epoll is not available
This change adds compile-time detection and a fallback to poll() on systems that don't support epoll(), improving portability across platforms.
Header changes (include/GenAI_Thread.h):
- Make the sys/epoll.h include conditional on #ifdef epoll_create1
Implementation changes (lib/GenAI_Thread.cpp):
- Add #include <poll.h> for poll() support
- Add EPOLL_CREATE compatibility macro (epoll_create1 or epoll_create)
- Update init() to use pipe() for wakeup when epoll is not available
- Update register_client() to skip epoll_ctl when epoll is not available
- Update unregister_client() to skip epoll_ctl when epoll is not available
- Update listener_loop() to use poll() when epoll is not available
The compile-time detection works by checking whether epoll_create1 is defined (a Linux-specific glibc function available since glibc 2.9). On systems without epoll, the code falls back to poll() with a pipe for wakeup signaling. |
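The fallback pattern can be sketched as one function with two compiled-in backends. For simplicity this sketch keys on `__linux__` rather than on the epoll_create1 symbol the commit describes; both are compile-time switches selecting epoll on Linux and poll() elsewhere.

```cpp
#include <unistd.h>
#if defined(__linux__)
#include <sys/epoll.h>
#else
#include <poll.h>
#endif

// Waits up to timeout_ms for fd to become readable; returns true if it did.
// On Linux a throwaway epoll instance is used, elsewhere plain poll().
static bool wait_readable_sketch(int fd, int timeout_ms) {
#if defined(__linux__)
    int ep = epoll_create1(0);
    if (ep < 0) return false;
    struct epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev);
    struct epoll_event out;
    int n = epoll_wait(ep, &out, 1, timeout_ms);
    close(ep);
    return n > 0;
#else
    struct pollfd p{fd, POLLIN, 0};
    return poll(&p, 1, timeout_ms) > 0;
#endif
}
```

The real listener keeps one long-lived epoll fd (or pollfd array) and uses a pipe write as the wakeup signal when new clients register.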
3 months ago |
|
|
960704066d |
Implement real GenAI module with embedding and rerank support
Header changes (include/GenAI_Thread.h):
- Add GenAI_EmbeddingResult, GenAI_RerankResult, GenAI_RerankResultArray structs
- Add GenAI_Document, GenAI_Request structures for the internal queue
- Add 5 configuration variables: genai_threads, genai_embedding_uri, genai_rerank_uri, genai_embedding_timeout_ms, genai_rerank_timeout_ms
- Add status variables: threads_initialized, active_requests, completed_requests, failed_requests
- Add public API methods: embed_documents(), rerank_documents()
- Add client management: register_client(), unregister_client()
- Add threading components: worker threads, listener thread, epoll
Implementation changes (lib/GenAI_Thread.cpp):
- Implement move constructors/destructors for result structures
- Initialize default values for variables (threads=4, embedding port 8013, rerank port 8012, timeout 30s)
- Implement get_variable/set_variable with validation for all 5 variables
- Implement call_llama_batch_embedding() using libcurl
- Implement call_llama_rerank() using libcurl
- Implement embed_documents() public API (single or batch)
- Implement rerank_documents() public API with top_n parameter
- Implement register_client() for socket pair integration
- Implement listener_loop() and worker_loop() for async processing
- Add proper error handling and status tracking
Debug integration (include/proxysql_structs.h):
- Add PROXY_DEBUG_GENAI to debug_module enum |
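Before the libcurl POST, call_llama_batch_embedding() needs a JSON request body holding the document batch. The field name `"input"` and the minimal escaping below are assumptions for illustration (the exact schema llama-server expects is not shown in the log); the point is the batch-payload construction that precedes the HTTP call.

```cpp
#include <string>
#include <vector>

// Minimal JSON string escaping (quotes and backslashes only; a real
// implementation must also handle control characters).
static std::string escape_json_sketch(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Builds a body of the shape {"input":["doc1","doc2"]} for a batch
// embedding request; the field name is a hypothetical placeholder.
static std::string build_embed_body_sketch(const std::vector<std::string>& docs) {
    std::string body = "{\"input\":[";
    for (size_t i = 0; i < docs.size(); i++) {
        if (i) body += ",";
        body += "\"" + escape_json_sketch(docs[i]) + "\"";
    }
    body += "]}";
    return body;
}
```

The resulting string would be handed to libcurl via CURLOPT_POSTFIELDS, with the response parsed into GenAI_EmbeddingResult entries.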
3 months ago |