# LLM Bridge Architecture

## System Overview

```
Client Query (NL2SQL: ...)
        ↓
MySQL_Session (detects prefix)
        ↓
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
        ↓
GenAI Module (async via socketpair)
 ├─ GenAI worker thread processes request
 └─ AI_Features_Manager::get_nl2sql()
        ↓
LLM_Bridge::convert()
 ├─ check_vector_cache()  ← sqlite-vec similarity search
 ├─ build_prompt()        ← Schema context via MySQL_Tool_Handler
 ├─ select_model()        ← Ollama/OpenAI/Anthropic selection
 ├─ call_llm_api()        ← libcurl HTTP request
 └─ validate_sql()        ← Keyword validation
        ↓
Async response back to MySQL_Session
        ↓
Return Resultset (text_response, confidence, ...)
```

**Important**: NL2SQL uses an **asynchronous, non-blocking architecture**. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.

## Async Flow Details

1. **MySQL Thread** (non-blocking):
   - Detects `NL2SQL:` prefix
   - Constructs JSON: `{"type": "nl2sql", "query": "...", "schema": "..."}`
   - Creates socketpair for async communication
   - Sends request to GenAI module immediately
   - Returns to handle other queries

2. **GenAI Worker Thread**:
   - Receives request via socketpair
   - Calls `process_json_query()` with nl2sql operation type
   - Invokes `LLM_Bridge::convert()`
   - Processes LLM response (HTTP via libcurl)
   - Sends result back via socketpair

3. **Response Delivery**:
   - MySQL thread receives notification via epoll
   - Retrieves result from socketpair
   - Builds resultset and sends to client

## Components

### 1. LLM_Bridge

**Location**: `include/LLM_Bridge.h`, `lib/LLM_Bridge.cpp`

Main class coordinating the NL2SQL conversion pipeline.

**Key Methods:**
- `convert()`: Main entry point for conversion
- `check_vector_cache()`: Semantic similarity search
- `build_prompt()`: Construct LLM prompt with schema context
- `select_model()`: Choose best LLM provider
- `call_ollama()`, `call_openai()`, `call_anthropic()`: LLM API calls

**Configuration:**

```cpp
struct {
    bool enabled;
    char* query_prefix;              // Default: "NL2SQL:"
    char* model_provider;            // Default: "ollama"
    char* ollama_model;              // Default: "llama3.2"
    char* openai_model;              // Default: "gpt-4o-mini"
    char* anthropic_model;           // Default: "claude-3-haiku"
    int cache_similarity_threshold;  // Default: 85
    int timeout_ms;                  // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;
```

### 2. LLM_Clients

**Location**: `lib/LLM_Clients.cpp`

HTTP clients for each LLM provider using libcurl.
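All three clients reduce to the same primitive: a JSON POST with a timeout and a captured response body. Below is a minimal sketch of that primitive using the libcurl easy interface; the helper name `http_post_json` is illustrative, not the actual `LLM_Clients.cpp` API.

```cpp
#include <curl/curl.h>
#include <string>

// Append each chunk of the HTTP response body to a std::string.
static size_t write_cb(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

// Hypothetical helper: POST a JSON body and return the raw response body.
// An empty return value signals failure to the caller.
std::string http_post_json(const std::string& url, const std::string& body,
                           long timeout_ms) {
    std::string response;
    CURL* curl = curl_easy_init();
    if (!curl) return response;
    struct curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, url.c_str());
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, timeout_ms);  // maps to genai_llm_timeout_ms
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, write_cb);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    if (curl_easy_perform(curl) != CURLE_OK) response.clear();
    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    return response;
}
```

Per-provider differences (auth headers such as `Authorization: Bearer ...` or `x-api-key: ...`, shown in the subsections below) would be layered on top of this primitive.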
#### Ollama (Local)

**Endpoint**: `POST http://localhost:11434/api/generate`

**Request Format:**

```json
{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}
```

**Response Format:**

```json
{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
```

#### OpenAI (Cloud)

**Endpoint**: `POST https://api.openai.com/v1/chat/completions`

**Headers:**
- `Content-Type: application/json`
- `Authorization: Bearer sk-...`

**Request Format:**

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}
```

**Response Format:**

```json
{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
```

#### Anthropic (Cloud)

**Endpoint**: `POST https://api.anthropic.com/v1/messages`

**Headers:**
- `Content-Type: application/json`
- `x-api-key: sk-ant-...`
- `anthropic-version: 2023-06-01`

**Request Format:**

```json
{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}
```

**Response Format:**

```json
{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}
```

### 3. Vector Cache

**Location**: Uses `SQLite3DB` with the sqlite-vec extension

**Tables:**

```sql
-- Cache entries
CREATE TABLE llm_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    text_response TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
    embedding FLOAT[1536],  -- Dimension depends on embedding model
    id INTEGER PRIMARY KEY
);
```

**Similarity Search:**

```sql
SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
  AND k = 10              -- KNN: consider the 10 nearest candidates
ORDER BY distance
LIMIT 1;                  -- keep only the single closest match
```

### 4. MySQL_Session Integration

**Location**: `lib/MySQL_Session.cpp` (around line 6867)

Query interception flow:

1. Detect the `NL2SQL:` prefix in the query
2. Extract the natural language text
3. Call `GloAI->get_nl2sql()->convert()`
4. Return the generated SQL as a resultset
5. The user can review and execute it
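A minimal sketch of the detection step (1-2 above), assuming a case-insensitive match against the configured prefix; the helper name `extract_nl2sql_payload` is illustrative and not the actual `MySQL_Session.cpp` code:

```cpp
#include <cstring>
#include <string>
#include <strings.h>  // strncasecmp (POSIX)

// Hypothetical helper: if the query starts with the configured prefix
// (genai_llm_query_prefix, default "NL2SQL:"), return the natural-language
// payload; otherwise return "" so the query is handled as normal SQL.
std::string extract_nl2sql_payload(const std::string& query,
                                   const char* prefix = "NL2SQL:") {
    size_t plen = std::strlen(prefix);
    if (query.size() <= plen ||
        strncasecmp(query.c_str(), prefix, plen) != 0) {
        return "";
    }
    // Strip the prefix plus any whitespace that follows it
    size_t start = query.find_first_not_of(" \t", plen);
    return (start == std::string::npos) ? "" : query.substr(start);
}
```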
### 5. AI_Features_Manager

**Location**: `include/AI_Features_Manager.h`, `lib/AI_Features_Manager.cpp`

Coordinates all AI features, including NL2SQL.

**Responsibilities:**
- Initialize the vector database
- Create and manage the LLM_Bridge instance
- Handle configuration variables with the `genai_llm_` prefix
- Provide thread-safe access to components

## Flow Diagrams

### Conversion Flow

```
┌──────────────────────┐
│ NL2SQL Request       │
└──────────┬───────────┘
           ▼
┌──────────────────────┐
│ Check Vector Cache   │
│ - Generate embedding │
│ - Similarity search  │
└──────────┬───────────┘
           │
      Cache hit? ── Yes ──► Return cached result
           │ No
           ▼
┌──────────────────┐      ┌──────────────────┐
│ Build Prompt     │      │ Select Model     │
│ - System role    │      │ - Latency        │
│ - Schema context │      │ - Preference     │
│ - User query     │      │ - API keys       │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         └────────────┬────────────┘
                      ▼
           ┌──────────────────┐
           │ Call LLM API     │
           │ - libcurl HTTP   │
           │ - JSON parse     │
           └────────┬─────────┘
                    ▼
           ┌──────────────────┐
           │ Validate SQL     │
           │ - Keyword check  │
           │ - Clean output   │
           └────────┬─────────┘
                    ▼
           ┌──────────────────┐
           │ Store in Cache   │
           │ - Embed query    │
           │ - Save result    │
           └────────┬─────────┘
                    ▼
           ┌──────────────────┐
           │ Return Result    │
           │ - text_response  │
           │ - confidence     │
           │ - explanation    │
           └──────────────────┘
```

### Model Selection Logic

```
┌─────────────────────────────┐
│ Start: Select Model         │
└──────────────┬──────────────┘
               ▼
      max_latency_ms < 500ms? ── Yes ──► Return OLLAMA
               │ No
               ▼
      Check provider preference
        │             │             │
        ▼             ▼             ▼
     OpenAI       Anthropic      Ollama
        │             │             │
  API key set?  API key set?  Return OLLAMA
        │ Yes         │ Yes
        └──────┬──────┘
               ▼
     Return cloud provider
```

## Data Structures

### NL2SQLRequest

```cpp
struct NL2SQLRequest {
    std::string natural_language;            // Input query
    std::string schema_name;                 // Current schema
    int max_latency_ms;                      // Latency requirement
    bool allow_cache;                        // Enable cache lookup
    std::vector<std::string> context_tables; // Optional table hints
};
```

### NL2SQLResult

```cpp
struct NL2SQLResult {
    std::string text_response;            // Generated SQL
    float confidence;                     // 0.0-1.0 score
    std::string explanation;              // Model info
    std::vector<std::string> tables_used; // Referenced tables
    bool cached;                          // From cache
    int64_t cache_id;                     // Cache entry ID
};
```

## Configuration Management

### Variable Namespacing

All LLM variables use the `genai_llm_` prefix:

```
genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local
```

### Variable Persistence

```
Runtime (memory)
      ↑
      │ LOAD MYSQL VARIABLES TO RUNTIME
      │
SET genai_llm_... = 'value'
      │
      │ SAVE MYSQL VARIABLES TO DISK
      ↓
Disk (config file)
```

## Thread Safety

- **LLM_Bridge**: NOT thread-safe by itself
- **AI_Features_Manager**: Provides thread-safe access via `wrlock()`/`wrunlock()`
- **Vector Cache**: Thread-safe via the SQLite mutex

## Error Handling

### Error Categories

1. **LLM API Errors**: Timeout, connection failure, auth failure
   - Fallback: Try the next available provider (see the sketch below)
   - Return: Empty SQL with the error in the explanation
2. **SQL Validation Failures**: Output doesn't look like SQL
   - Return: SQL with a warning comment
   - Confidence: Low (0.3)
3. **Cache Errors**: Database failures
   - Fallback: Continue without the cache
   - Log: Warning in the ProxySQL log
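The category-1 fallback can be pictured as a loop over an ordered provider list. A minimal sketch, assuming a hypothetical `try_provider` callback; the type and function names are illustrative, not the actual `LLM_Bridge` API:

```cpp
#include <functional>
#include <string>
#include <vector>

enum class Provider { OLLAMA, OPENAI, ANTHROPIC };

struct LLMReply {
    std::string sql;
    std::string explanation;
    bool ok = false;
};

// Try each provider in order; return the first successful reply.
LLMReply convert_with_fallback(
        const std::vector<Provider>& order,
        const std::function<LLMReply(Provider)>& try_provider) {
    std::string errors;
    for (Provider p : order) {
        LLMReply r = try_provider(p);  // may fail: timeout, connection, auth
        if (r.ok) return r;
        errors += r.explanation + "; ";
    }
    // All providers failed: empty SQL, accumulated errors in the explanation
    LLMReply failed;
    failed.explanation = errors;
    return failed;
}
```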
### Logging

All NL2SQL operations are logged to `proxysql.log`:

```
NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85
```

## Performance Considerations

### Optimization Strategies

1. **Caching**: Enable the vector cache for repeated queries
2. **Local First**: Prefer Ollama for lower latency
3. **Timeout**: Set an appropriate `genai_llm_timeout_ms`
4. **Batch Requests**: Not yet implemented (planned)

### Resource Usage

- **Memory**: The vector cache grows with usage
- **Network**: One HTTP request per cache miss
- **CPU**: Embedding generation for cache entries

## Future Enhancements

- **Phase 3**: Full vector cache implementation
- **Phase 3**: Schema context retrieval via MySQL_Tool_Handler
- **Phase 4**: Async conversion API
- **Phase 5**: Batch query conversion
- **Phase 6**: Custom fine-tuned models

## See Also

- [README.md](README.md) - User documentation
- [API.md](API.md) - Complete API reference
- [TESTING.md](TESTING.md) - Testing guide