# LLM Bridge Architecture

## System Overview
```
Client Query (NL2SQL: ...)
↓
MySQL_Session (detects prefix)
↓
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
↓
GenAI Module (async via socketpair)
├─ GenAI worker thread processes request
└─ AI_Features_Manager::get_nl2sql()
↓
LLM_Bridge::convert()
├─ check_vector_cache() ← sqlite-vec similarity search
├─ build_prompt() ← Schema context via MySQL_Tool_Handler
├─ select_model() ← Ollama/OpenAI/Anthropic selection
├─ call_llm_api() ← libcurl HTTP request
└─ validate_sql() ← Keyword validation
↓
Async response back to MySQL_Session
↓
Return Resultset (text_response, confidence, ...)
```
**Important:** NL2SQL uses an asynchronous, non-blocking architecture. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.
## Async Flow Details

1. **MySQL Thread (non-blocking):**
   - Detects the `NL2SQL:` prefix
   - Constructs JSON: `{"type": "nl2sql", "query": "...", "schema": "..."}`
   - Creates a socketpair for async communication
   - Sends the request to the GenAI module immediately
   - Returns to handle other queries

2. **GenAI Worker Thread:**
   - Receives the request via socketpair
   - Calls `process_json_query()` with the nl2sql operation type
   - Invokes `LLM_Bridge::convert()`
   - Processes the LLM response (HTTP via libcurl)
   - Sends the result back via socketpair

3. **Response Delivery:**
   - MySQL thread receives a notification via epoll
   - Retrieves the result from the socketpair
   - Builds a resultset and sends it to the client
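The sketch below illustrates this socketpair-plus-worker handoff in isolation. It is a minimal, self-contained example, not the ProxySQL implementation: the real code frames requests as JSON, registers the fd with the session's epoll loop instead of calling `poll()` directly, and uses its own message framing.

```cpp
#include <sys/socket.h>
#include <poll.h>
#include <unistd.h>
#include <cstdio>
#include <cstring>
#include <string>
#include <thread>

// Worker side: read one request, "process" it, write the result back.
static void genai_worker(int fd) {
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf) - 1);
    if (n <= 0) return;
    buf[n] = '\0';
    std::string result = std::string("SELECT 1; -- converted from: ") + buf;
    write(fd, result.c_str(), result.size());
}

int main() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return 1;

    std::thread worker(genai_worker, fds[1]);

    // "MySQL thread" side: send the request, then go back to serving queries.
    const char* req = R"({"type":"nl2sql","query":"Show top customers"})";
    write(fds[0], req, strlen(req));

    // ProxySQL registers the fd with epoll; poll() stands in for that here.
    struct pollfd pfd = {fds[0], POLLIN, 0};
    while (poll(&pfd, 1, 100) == 0) {
        // ...the thread is free to handle other sessions meanwhile...
    }

    char buf[4096];
    ssize_t n = read(fds[0], buf, sizeof(buf) - 1);
    if (n > 0) { buf[n] = '\0'; printf("result: %s\n", buf); }

    worker.join();
    close(fds[0]);
    close(fds[1]);
    return 0;
}
```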
## Components

### 1. LLM_Bridge

**Location:** `include/LLM_Bridge.h`, `lib/LLM_Bridge.cpp`

Main class coordinating the NL2SQL conversion pipeline.

**Key Methods:**

- `convert()`: Main entry point for conversion
- `check_vector_cache()`: Semantic similarity search
- `build_prompt()`: Construct the LLM prompt with schema context
- `select_model()`: Choose the best LLM provider
- `call_ollama()`, `call_openai()`, `call_anthropic()`: LLM API calls
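The following condensed sketch shows how these methods compose inside `convert()`. All bodies are stubs and all signatures are illustrative; the real declarations live in `include/LLM_Bridge.h`.

```cpp
#include <string>
#include <vector>

// Illustrative stand-ins for the real structs; see the Data Structures
// section for the documented fields.
struct NL2SQLRequest {
    std::string natural_language;
    std::string schema_name;
    int max_latency_ms = 0;
    bool allow_cache = true;
    std::vector<std::string> context_tables;
};
struct NL2SQLResult {
    std::string text_response;
    float confidence = 0.0f;
    bool cached = false;
};

// Stubbed pipeline steps (the real ones are LLM_Bridge methods).
static bool check_vector_cache(const NL2SQLRequest&, NL2SQLResult&) { return false; }
static std::string build_prompt(const NL2SQLRequest& r) { return "Convert to SQL: " + r.natural_language; }
static std::string select_model(const NL2SQLRequest&) { return "ollama"; }
static std::string call_llm_api(const std::string&, const std::string&) { return "SELECT 1;"; }
static std::string validate_sql(const std::string& raw, float& conf) { conf = 0.85f; return raw; }

NL2SQLResult convert(const NL2SQLRequest& req) {
    NL2SQLResult res;
    if (req.allow_cache && check_vector_cache(req, res)) {      // cache hit: skip the LLM
        res.cached = true;
        return res;
    }
    std::string prompt   = build_prompt(req);                   // schema context + user query
    std::string provider = select_model(req);                   // latency / preference / API keys
    std::string raw      = call_llm_api(provider, prompt);      // libcurl HTTP round trip
    res.text_response    = validate_sql(raw, res.confidence);   // keyword validation + cleanup
    return res;
}
```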
**Configuration:**

```cpp
struct {
    bool enabled;
    char* query_prefix;              // Default: "NL2SQL:"
    char* model_provider;            // Default: "ollama"
    char* ollama_model;              // Default: "llama3.2"
    char* openai_model;              // Default: "gpt-4o-mini"
    char* anthropic_model;           // Default: "claude-3-haiku"
    int cache_similarity_threshold;  // Default: 85
    int timeout_ms;                  // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;
```
### 2. LLM_Clients

**Location:** `lib/LLM_Clients.cpp`

HTTP clients for each LLM provider, implemented with libcurl.

#### Ollama (Local)

**Endpoint:** `POST http://localhost:11434/api/generate`
**Request Format:**

```json
{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}
```

**Response Format:**

```json
{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
```
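A minimal libcurl client for this endpoint might look like the following. This is a self-contained sketch, not the code in `LLM_Clients.cpp`; the JSON body mirrors the request format above, and the timeout mirrors the `genai_llm_timeout_ms` default.

```cpp
#include <curl/curl.h>
#include <cstdio>
#include <string>

// Collect the HTTP response body into a std::string.
static size_t on_body(char* data, size_t size, size_t nmemb, void* userp) {
    static_cast<std::string*>(userp)->append(data, size * nmemb);
    return size * nmemb;
}

int main() {
    CURL* curl = curl_easy_init();
    if (!curl) return 1;

    const char* body = R"({
        "model": "llama3.2",
        "prompt": "Convert to SQL: Show top customers",
        "stream": false,
        "options": {"temperature": 0.1, "num_predict": 500}
    })";

    std::string response;
    struct curl_slist* hdrs = curl_slist_append(nullptr, "Content-Type: application/json");

    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body);
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, on_body);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, 30000L);  // genai_llm_timeout_ms default

    CURLcode rc = curl_easy_perform(curl);
    if (rc == CURLE_OK) printf("%s\n", response.c_str());

    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return rc == CURLE_OK ? 0 : 1;
}
```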
#### OpenAI (Cloud)

**Endpoint:** `POST https://api.openai.com/v1/chat/completions`

**Headers:**

- `Content-Type: application/json`
- `Authorization: Bearer sk-...`

**Request Format:**

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}
```

**Response Format:**

```json
{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
```
#### Anthropic (Cloud)

**Endpoint:** `POST https://api.anthropic.com/v1/messages`

**Headers:**

- `Content-Type: application/json`
- `x-api-key: sk-ant-...`
- `anthropic-version: 2023-06-01`

**Request Format:**

```json
{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}
```

**Response Format:**

```json
{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}
```
### 3. Vector Cache

**Location:** Uses SQLite3DB with the sqlite-vec extension

**Tables:**

```sql
-- Cache entries
CREATE TABLE llm_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    text_response TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
    embedding FLOAT[1536],  -- Dimension depends on embedding model
    id INTEGER PRIMARY KEY
);
```
**Similarity Search:**

```sql
SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
  AND k = 10            -- Consider the 10 nearest neighbors
ORDER BY distance
LIMIT 1;
```
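A sketch of how `check_vector_cache()` might execute this query through the SQLite C API, binding the query embedding as a raw float32 blob (one of the vector encodings sqlite-vec accepts). Error handling and the comparison of `distance` against `genai_llm_cache_similarity_threshold` are left to the caller.

```cpp
#include <sqlite3.h>
#include <string>
#include <vector>

// Look up the nearest cached conversion for a query embedding.
// Assumes the llm_cache / llm_cache_vec tables above exist and the
// sqlite-vec extension is already loaded into `db`.
bool cache_lookup(sqlite3* db, const std::vector<float>& embedding,
                  std::string& sql_out, double& distance_out) {
    static const char* q =
        "SELECT nc.text_response, distance "
        "FROM llm_cache_vec "
        "JOIN llm_cache nc ON llm_cache_vec.id = nc.id "
        "WHERE embedding MATCH ? AND k = 10 "
        "ORDER BY distance LIMIT 1;";

    sqlite3_stmt* stmt = nullptr;
    if (sqlite3_prepare_v2(db, q, -1, &stmt, nullptr) != SQLITE_OK) return false;

    // sqlite-vec accepts the query vector as a raw float32 blob.
    sqlite3_bind_blob(stmt, 1, embedding.data(),
                      (int)(embedding.size() * sizeof(float)), SQLITE_STATIC);

    bool hit = false;
    if (sqlite3_step(stmt) == SQLITE_ROW) {
        sql_out      = (const char*)sqlite3_column_text(stmt, 0);
        distance_out = sqlite3_column_double(stmt, 1);
        hit = true;  // caller still compares distance against the threshold
    }
    sqlite3_finalize(stmt);
    return hit;
}
```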
### 4. MySQL_Session Integration

**Location:** `lib/MySQL_Session.cpp` (around line 6867)

Query interception flow:

1. Detect the `NL2SQL:` prefix in the query
2. Extract the natural-language text
3. Call `GloAI->get_nl2sql()->convert()`
4. Return the generated SQL as a resultset
5. The user can review and execute it
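The interception itself reduces to a case-insensitive prefix check on the query text. A stand-alone sketch (the actual code in `MySQL_Session.cpp` operates on the parsed packet buffer):

```cpp
#include <string>
#include <strings.h>  // strncasecmp

// Return the natural-language part if the query carries the NL2SQL: prefix,
// or an empty string otherwise. The prefix is configurable via
// genai_llm_query_prefix; "NL2SQL:" is the default.
std::string extract_nl2sql(const std::string& query,
                           const std::string& prefix = "NL2SQL:") {
    if (query.size() <= prefix.size()) return "";
    if (strncasecmp(query.c_str(), prefix.c_str(), prefix.size()) != 0) return "";
    std::string nl = query.substr(prefix.size());
    nl.erase(0, nl.find_first_not_of(" \t"));  // trim leading whitespace
    return nl;
}
```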
### 5. AI_Features_Manager

**Location:** `include/AI_Features_Manager.h`, `lib/AI_Features_Manager.cpp`

Coordinates all AI features, including NL2SQL.

**Responsibilities:**

- Initialize the vector database
- Create and manage the LLM_Bridge instance
- Handle configuration variables with the `genai_llm_` prefix
- Provide thread-safe access to components
## Flow Diagrams

### Conversion Flow
```
┌─────────────────┐
│ NL2SQL Request │
└────────┬────────┘
│
▼
┌─────────────────────────┐
│ Check Vector Cache │
│ - Generate embedding │
│ - Similarity search │
└────────┬────────────────┘
│
┌────┴────┐
│ Cache │ No ───────────────┐
│ Hit? │ │
└────┬────┘ │
│ Yes │
▼ │
Return Cached ▼
┌──────────────────┐ ┌─────────────────┐
│ Build Prompt │ │ Select Model │
│ - System role │ │ - Latency │
│ - Schema context │ │ - Preference │
│ - User query │ │ - API keys │
└────────┬─────────┘ └────────┬────────┘
│ │
└─────────┬───────────────┘
▼
┌──────────────────┐
│ Call LLM API │
│ - libcurl HTTP │
│ - JSON parse │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Validate SQL │
│ - Keyword check │
│ - Clean output │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Store in Cache │
│ - Embed query │
│ - Save result │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Return Result │
│ - text_response │
│ - confidence │
│ - explanation │
└──────────────────┘
```

### Model Selection Logic
```
┌─────────────────────────────────┐
│ Start: Select Model │
└────────────┬────────────────────┘
│
▼
┌─────────────────────┐
│ max_latency_ms < │──── Yes ────┐
│ 500ms? │ │
└────────┬────────────┘ │
│ No │
▼ │
┌─────────────────────┐ │
│ Check provider │ │
│ preference │ │
└────────┬────────────┘ │
│ │
┌──────┴──────┐ │
│ │ │
▼ ▼ │
OpenAI Anthropic Ollama
│ │ │
▼ ▼ │
┌─────────┐ ┌─────────┐ ┌─────────┐
│ API key │ │ API key │ │ Return │
│ set? │ │ set? │ │ OLLAMA │
└────┬────┘ └────┬────┘ └─────────┘
│ │
Yes Yes
│ │
└──────┬─────┘
│
▼
┌──────────────┐
│ Return cloud │
│ provider │
└──────────────┘
```
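A compact transcription of this diagram into code. Parameter names map to the `genai_llm_*` variables; the exact latency threshold and fallback order in the real `select_model()` may differ.

```cpp
#include <string>

enum class Provider { OLLAMA, OPENAI, ANTHROPIC };

Provider select_model(int max_latency_ms,            // from the request
                      const std::string& preference, // genai_llm_model_provider
                      bool has_openai_key,           // genai_llm_openai_key set?
                      bool has_anthropic_key,        // genai_llm_anthropic_key set?
                      bool prefer_local) {           // genai_llm_prefer_local
    // Tight latency budgets (or an explicit local preference) stay on Ollama.
    if ((max_latency_ms > 0 && max_latency_ms < 500) || prefer_local)
        return Provider::OLLAMA;

    // Otherwise honor the provider preference if its API key is configured.
    if (preference == "openai" && has_openai_key)       return Provider::OPENAI;
    if (preference == "anthropic" && has_anthropic_key) return Provider::ANTHROPIC;

    // No usable cloud provider: fall back to the local model.
    return Provider::OLLAMA;
}
```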
## Data Structures

### NL2SQLRequest
```cpp
struct NL2SQLRequest {
    std::string natural_language;             // Input query
    std::string schema_name;                  // Current schema
    int max_latency_ms;                       // Latency requirement
    bool allow_cache;                         // Enable cache lookup
    std::vector<std::string> context_tables;  // Optional table hints
};
```
### NL2SQLResult
```cpp
struct NL2SQLResult {
    std::string text_response;             // Generated SQL
    float confidence;                      // 0.0-1.0 score
    std::string explanation;               // Model info
    std::vector<std::string> tables_used;  // Referenced tables
    bool cached;                           // From cache
    int64_t cache_id;                      // Cache entry ID
};
```
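Typical usage of the two structs, assuming a hypothetical `bridge` pointer to the `LLM_Bridge` instance obtained through `AI_Features_Manager`:

```cpp
// Hypothetical usage; `bridge` stands in for the managed LLM_Bridge instance.
NL2SQLRequest req;
req.natural_language = "Show top customers";
req.schema_name      = "sales";        // current schema of the session
req.max_latency_ms   = 2000;           // generous budget: cloud providers allowed
req.allow_cache      = true;           // permit the sqlite-vec lookup
req.context_tables   = {"customers"};  // optional hint for prompt building

NL2SQLResult res = bridge->convert(req);
if (!res.cached && res.confidence < 0.5f) {
    // Low confidence: the SQL is still returned, but flagged for review.
}
// res.text_response is what the client receives as a resultset.
```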
## Configuration Management

### Variable Namespacing

All LLM variables use the `genai_llm_` prefix:

```
genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local
```
### Variable Persistence

```
Runtime (memory)
      ↑
      | LOAD MYSQL VARIABLES TO RUNTIME
      |
      | SET genai_llm_... = 'value'
      |
      | SAVE MYSQL VARIABLES TO DISK
      ↓
Disk (config file)
```
### Thread Safety

- **LLM_Bridge**: NOT thread-safe by itself
- **AI_Features_Manager**: Provides thread-safe access via `wrlock()`/`wrunlock()`
- **Vector Cache**: Thread-safe via the SQLite mutex
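A sketch of the guarded-access pattern, assuming the lock is a plain pthreads rwlock. The method names mirror the list above; the real locking granularity in `AI_Features_Manager` may differ.

```cpp
#include <pthread.h>

// AI_Features_Manager serializes access to the (non-thread-safe)
// LLM_Bridge behind a rwlock.
class AI_Features_Manager_Sketch {
    pthread_rwlock_t rwlock;
public:
    AI_Features_Manager_Sketch()  { pthread_rwlock_init(&rwlock, nullptr); }
    ~AI_Features_Manager_Sketch() { pthread_rwlock_destroy(&rwlock); }

    void wrlock()   { pthread_rwlock_wrlock(&rwlock); }
    void wrunlock() { pthread_rwlock_unlock(&rwlock); }

    template <typename F>
    void with_bridge(F&& f) {
        wrlock();    // exclusive: LLM_Bridge itself is not thread-safe
        f();         // e.g. invoke LLM_Bridge::convert()
        wrunlock();
    }
};
```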
## Error Handling

### Error Categories

1. **LLM API Errors**: Timeout, connection failure, auth failure
   - Fallback: Try the next available provider
   - Return: Empty SQL with the error in `explanation`

2. **SQL Validation Failures**: Output doesn't look like SQL
   - Return: SQL with a warning comment
   - Confidence: Low (0.3)

3. **Cache Errors**: Database failures
   - Fallback: Continue without the cache
   - Log: Warning in the ProxySQL log
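A sketch of the provider-fallback policy for category 1, written against the `NL2SQLResult` struct from the Data Structures section; `call_provider` is a hypothetical stand-in for the three `call_*()` methods.

```cpp
#include <functional>
#include <string>
#include <vector>

// call_provider returns false on timeout, connection, or auth failures.
using CallFn = std::function<bool(const std::string& provider,
                                  const std::string& prompt,
                                  std::string& sql_out)>;

NL2SQLResult convert_with_fallback(const std::string& prompt,
                                   const std::vector<std::string>& providers,
                                   const CallFn& call_provider) {
    NL2SQLResult res;
    for (const auto& p : providers) {          // e.g. {"ollama", "openai", "anthropic"}
        if (call_provider(p, prompt, res.text_response)) {
            res.explanation = "provider: " + p;
            return res;
        }
    }
    res.text_response.clear();                 // empty SQL on total failure
    res.explanation = "all providers failed";  // error surfaced in explanation
    res.confidence  = 0.0f;
    return res;
}
```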
### Logging

All NL2SQL operations log to `proxysql.log`:

```
NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85
```
## Performance Considerations

### Optimization Strategies

- **Caching**: Enable for repeated queries
- **Local First**: Prefer Ollama for lower latency
- **Timeout**: Set an appropriate `genai_llm_timeout_ms`
- **Batch Requests**: Not yet implemented (planned)
### Resource Usage

- **Memory**: The vector cache grows with usage
- **Network**: One HTTP request per cache miss
- **CPU**: Embedding generation for cache entries
## Future Enhancements
- Phase 3: Full vector cache implementation
- Phase 3: Schema context retrieval via MySQL_Tool_Handler
- Phase 4: Async conversion API
- Phase 5: Batch query conversion
- Phase 6: Custom fine-tuned models
## See Also
- README.md - User documentation
- API.md - Complete API reference
- TESTING.md - Testing guide