You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
proxysql/doc/LLM_Bridge/ARCHITECTURE.md

464 lines
14 KiB

# LLM Bridge Architecture
## System Overview
```
Client Query (NL2SQL: ...)
MySQL_Session (detects prefix)
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
GenAI Module (async via socketpair)
├─ GenAI worker thread processes request
└─ AI_Features_Manager::get_nl2sql()
LLM_Bridge::convert()
├─ check_vector_cache() ← sqlite-vec similarity search
├─ build_prompt() ← Schema context via MySQL_Tool_Handler
├─ select_model() ← Ollama/OpenAI/Anthropic selection
├─ call_llm_api() ← libcurl HTTP request
└─ validate_sql() ← Keyword validation
Async response back to MySQL_Session
Return Resultset (text_response, confidence, ...)
```
**Important**: NL2SQL uses an **asynchronous, non-blocking architecture**. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.
## Async Flow Details
1. **MySQL Thread** (non-blocking):
- Detects `NL2SQL:` prefix
- Constructs JSON: `{"type": "nl2sql", "query": "...", "schema": "..."}`
- Creates socketpair for async communication
- Sends request to GenAI module immediately
- Returns to handle other queries
2. **GenAI Worker Thread**:
- Receives request via socketpair
- Calls `process_json_query()` with nl2sql operation type
- Invokes `LLM_Bridge::convert()`
- Processes LLM response (HTTP via libcurl)
- Sends result back via socketpair
3. **Response Delivery**:
- MySQL thread receives notification via epoll
- Retrieves result from socketpair
- Builds resultset and sends to client
## Components
### 1. LLM_Bridge
**Location**: `include/LLM_Bridge.h`, `lib/LLM_Bridge.cpp`
Main class coordinating the NL2SQL conversion pipeline.
**Key Methods:**
- `convert()`: Main entry point for conversion
- `check_vector_cache()`: Semantic similarity search
- `build_prompt()`: Construct LLM prompt with schema context
- `select_model()`: Choose best LLM provider
- `call_ollama()`, `call_openai()`, `call_anthropic()`: LLM API calls
**Configuration:**
```cpp
struct {
bool enabled;
char* query_prefix; // Default: "NL2SQL:"
char* model_provider; // Default: "ollama"
char* ollama_model; // Default: "llama3.2"
char* openai_model; // Default: "gpt-4o-mini"
char* anthropic_model; // Default: "claude-3-haiku"
int cache_similarity_threshold; // Default: 85
int timeout_ms; // Default: 30000
char* openai_key;
char* anthropic_key;
bool prefer_local;
} config;
```
### 2. LLM_Clients
**Location**: `lib/LLM_Clients.cpp`
HTTP clients for each LLM provider using libcurl.
#### Ollama (Local)
**Endpoint**: `POST http://localhost:11434/api/generate`
**Request Format:**
```json
{
"model": "llama3.2",
"prompt": "Convert to SQL: Show top customers",
"stream": false,
"options": {
"temperature": 0.1,
"num_predict": 500
}
}
```
**Response Format:**
```json
{
"response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
"model": "llama3.2",
"total_duration": 123456789
}
```
#### OpenAI (Cloud)
**Endpoint**: `POST https://api.openai.com/v1/chat/completions`
**Headers:**
- `Content-Type: application/json`
- `Authorization: Bearer sk-...`
**Request Format:**
```json
{
"model": "gpt-4o-mini",
"messages": [
{"role": "system", "content": "You are a SQL expert..."},
{"role": "user", "content": "Convert to SQL: Show top customers"}
],
"temperature": 0.1,
"max_tokens": 500
}
```
**Response Format:**
```json
{
"choices": [{
"message": {
"content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
"role": "assistant"
},
"finish_reason": "stop"
}],
"usage": {"total_tokens": 123}
}
```
#### Anthropic (Cloud)
**Endpoint**: `POST https://api.anthropic.com/v1/messages`
**Headers:**
- `Content-Type: application/json`
- `x-api-key: sk-ant-...`
- `anthropic-version: 2023-06-01`
**Request Format:**
```json
{
"model": "claude-3-haiku-20240307",
"max_tokens": 500,
"messages": [
{"role": "user", "content": "Convert to SQL: Show top customers"}
],
"system": "You are a SQL expert...",
"temperature": 0.1
}
```
**Response Format:**
```json
{
"content": [{"type": "text", "text": "SELECT * FROM customers..."}],
"model": "claude-3-haiku-20240307",
"usage": {"input_tokens": 10, "output_tokens": 20}
}
```
### 3. Vector Cache
**Location**: Uses `SQLite3DB` with sqlite-vec extension
**Tables:**
```sql
-- Cache entries
CREATE TABLE llm_cache (
id INTEGER PRIMARY KEY AUTOINCREMENT,
natural_language TEXT NOT NULL,
text_response TEXT NOT NULL,
model_provider TEXT,
confidence REAL,
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
embedding FLOAT[1536], -- Dimension depends on embedding model
id INTEGER PRIMARY KEY
);
```
**Similarity Search:**
```sql
SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
AND k = 10 -- Return top 10 matches
ORDER BY distance
LIMIT 1;
```
### 4. MySQL_Session Integration
**Location**: `lib/MySQL_Session.cpp` (around line ~6867)
Query interception flow:
1. Detect `NL2SQL:` prefix in query
2. Extract natural language text
3. Call `GloAI->get_nl2sql()->convert()`
4. Return generated SQL as resultset
5. User can review and execute
### 5. AI_Features_Manager
**Location**: `include/AI_Features_Manager.h`, `lib/AI_Features_Manager.cpp`
Coordinates all AI features including NL2SQL.
**Responsibilities:**
- Initialize vector database
- Create and manage LLM_Bridge instance
- Handle configuration variables with `genai_llm_` prefix
- Provide thread-safe access to components
## Flow Diagrams
### Conversion Flow
```
┌─────────────────┐
│ NL2SQL Request │
└────────┬────────┘
┌─────────────────────────┐
│ Check Vector Cache │
│ - Generate embedding │
│ - Similarity search │
└────────┬────────────────┘
┌────┴────┐
│ Cache │ No ───────────────┐
│ Hit? │ │
└────┬────┘ │
│ Yes │
▼ │
Return Cached ▼
┌──────────────────┐ ┌─────────────────┐
│ Build Prompt │ │ Select Model │
│ - System role │ │ - Latency │
│ - Schema context │ │ - Preference │
│ - User query │ │ - API keys │
└────────┬─────────┘ └────────┬────────┘
│ │
└─────────┬───────────────┘
┌──────────────────┐
│ Call LLM API │
│ - libcurl HTTP │
│ - JSON parse │
└────────┬─────────┘
┌──────────────────┐
│ Validate SQL │
│ - Keyword check │
│ - Clean output │
└────────┬─────────┘
┌──────────────────┐
│ Store in Cache │
│ - Embed query │
│ - Save result │
└────────┬─────────┘
┌──────────────────┐
│ Return Result │
│ - text_response │
│ - confidence │
│ - explanation │
└──────────────────┘
```
### Model Selection Logic
```
┌─────────────────────────────────┐
│ Start: Select Model │
└────────────┬────────────────────┘
┌─────────────────────┐
│ max_latency_ms < │──── Yes ────┐
│ 500ms? │ │
└────────┬────────────┘ │
│ No │
▼ │
┌─────────────────────┐ │
│ Check provider │ │
│ preference │ │
└────────┬────────────┘ │
│ │
┌──────┴──────┐ │
│ │ │
▼ ▼ │
OpenAI Anthropic Ollama
│ │ │
▼ ▼ │
┌─────────┐ ┌─────────┐ ┌─────────┐
│ API key │ │ API key │ │ Return │
│ set? │ │ set? │ │ OLLAMA │
└────┬────┘ └────┬────┘ └─────────┘
│ │
Yes Yes
│ │
└──────┬─────┘
┌──────────────┐
│ Return cloud │
│ provider │
└──────────────┘
```
## Data Structures
### LLM BridgeRequest
```cpp
struct NL2SQLRequest {
std::string natural_language; // Input query
std::string schema_name; // Current schema
int max_latency_ms; // Latency requirement
bool allow_cache; // Enable cache lookup
std::vector<std::string> context_tables; // Optional table hints
};
```
### LLM BridgeResult
```cpp
struct NL2SQLResult {
std::string text_response; // Generated SQL
float confidence; // 0.0-1.0 score
std::string explanation; // Model info
std::vector<std::string> tables_used; // Referenced tables
bool cached; // From cache
int64_t cache_id; // Cache entry ID
};
```
## Configuration Management
### Variable Namespacing
All LLM variables use `genai_llm_` prefix:
```
genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local
```
### Variable Persistence
```
Runtime (memory)
| LOAD MYSQL VARIABLES TO RUNTIME
|
| SET genai_llm_... = 'value'
|
| SAVE MYSQL VARIABLES TO DISK
Disk (config file)
```
## Thread Safety
- **LLM_Bridge**: NOT thread-safe by itself
- **AI_Features_Manager**: Provides thread-safe access via `wrlock()`/`wrunlock()`
- **Vector Cache**: Thread-safe via SQLite mutex
## Error Handling
### Error Categories
1. **LLM API Errors**: Timeout, connection failure, auth failure
- Fallback: Try next available provider
- Return: Empty SQL with error in explanation
2. **SQL Validation Failures**: Doesn't look like SQL
- Return: SQL with warning comment
- Confidence: Low (0.3)
3. **Cache Errors**: Database failures
- Fallback: Continue without cache
- Log: Warning in ProxySQL log
### Logging
All NL2SQL operations log to `proxysql.log`:
```
NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85
```
## Performance Considerations
### Optimization Strategies
1. **Caching**: Enable for repeated queries
2. **Local First**: Prefer Ollama for lower latency
3. **Timeout**: Set appropriate `genai_llm_timeout_ms`
4. **Batch Requests**: Not yet implemented (planned)
### Resource Usage
- **Memory**: Vector cache grows with usage
- **Network**: HTTP requests for each cache miss
- **CPU**: Embedding generation for cache entries
## Future Enhancements
- **Phase 3**: Full vector cache implementation
- **Phase 3**: Schema context retrieval via MySQL_Tool_Handler
- **Phase 4**: Async conversion API
- **Phase 5**: Batch query conversion
- **Phase 6**: Custom fine-tuned models
## See Also
- [README.md](README.md) - User documentation
- [API.md](API.md) - Complete API reference
- [TESTING.md](TESTING.md) - Testing guide