mirror of https://github.com/sysown/proxysql
# LLM Bridge Architecture

## System Overview

```
Client Query (NL2SQL: ...)
        ↓
MySQL_Session (detects prefix)
        ↓
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
        ↓
GenAI Module (async via socketpair)
  ├─ GenAI worker thread processes request
  └─ AI_Features_Manager::get_nl2sql()
        ↓
LLM_Bridge::convert()
  ├─ check_vector_cache() ← sqlite-vec similarity search
  ├─ build_prompt()       ← Schema context via MySQL_Tool_Handler
  ├─ select_model()       ← Ollama/OpenAI/Anthropic selection
  ├─ call_llm_api()       ← libcurl HTTP request
  └─ validate_sql()       ← Keyword validation
        ↓
Async response back to MySQL_Session
        ↓
Return Resultset (text_response, confidence, ...)
```

**Important**: NL2SQL uses an **asynchronous, non-blocking architecture**. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.

## Async Flow Details

1. **MySQL Thread** (non-blocking):
   - Detects the `NL2SQL:` prefix
   - Constructs JSON: `{"type": "nl2sql", "query": "...", "schema": "..."}`
   - Creates a socketpair for async communication
   - Sends the request to the GenAI module immediately
   - Returns to handle other queries

2. **GenAI Worker Thread**:
   - Receives the request via socketpair
   - Calls `process_json_query()` with the nl2sql operation type
   - Invokes `LLM_Bridge::convert()`
   - Processes the LLM response (HTTP via libcurl)
   - Sends the result back via socketpair

3. **Response Delivery**:
   - MySQL thread receives a notification via epoll
   - Retrieves the result from the socketpair
   - Builds the resultset and sends it to the client

## Components

### 1. LLM_Bridge

**Location**: `include/LLM_Bridge.h`, `lib/LLM_Bridge.cpp`

Main class coordinating the NL2SQL conversion pipeline.

**Key Methods:**

- `convert()`: Main entry point for conversion
- `check_vector_cache()`: Semantic similarity search
- `build_prompt()`: Construct the LLM prompt with schema context
- `select_model()`: Choose the best LLM provider
- `call_ollama()`, `call_openai()`, `call_anthropic()`: LLM API calls

**Configuration:**

```cpp
struct {
    bool enabled;
    char* query_prefix;             // Default: "NL2SQL:"
    char* model_provider;           // Default: "ollama"
    char* ollama_model;             // Default: "llama3.2"
    char* openai_model;             // Default: "gpt-4o-mini"
    char* anthropic_model;          // Default: "claude-3-haiku"
    int cache_similarity_threshold; // Default: 85
    int timeout_ms;                 // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;
```

### 2. LLM_Clients

**Location**: `lib/LLM_Clients.cpp`

HTTP clients for each LLM provider, implemented with libcurl.

#### Ollama (Local)

**Endpoint**: `POST http://localhost:11434/api/generate`

**Request Format:**

```json
{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}
```

**Response Format:**

```json
{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
```


#### OpenAI (Cloud)

**Endpoint**: `POST https://api.openai.com/v1/chat/completions`

**Headers:**

- `Content-Type: application/json`
- `Authorization: Bearer sk-...`

**Request Format:**

```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}
```

**Response Format:**

```json
{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
```

#### Anthropic (Cloud)

**Endpoint**: `POST https://api.anthropic.com/v1/messages`

**Headers:**

- `Content-Type: application/json`
- `x-api-key: sk-ant-...`
- `anthropic-version: 2023-06-01`

**Request Format:**

```json
{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}
```

**Response Format:**

```json
{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}
```

### 3. Vector Cache

**Location**: Uses `SQLite3DB` with the sqlite-vec extension

**Tables:**

```sql
-- Cache entries
CREATE TABLE llm_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    text_response TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
    embedding FLOAT[1536],  -- Dimension depends on the embedding model
    id INTEGER PRIMARY KEY
);
```

**Similarity Search:**

```sql
SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
  AND k = 10            -- KNN: consider the 10 nearest neighbors
ORDER BY distance
LIMIT 1;                -- keep only the closest match
```

### 4. MySQL_Session Integration

**Location**: `lib/MySQL_Session.cpp` (around line 6867)

Query interception flow:

1. Detect the `NL2SQL:` prefix in the query
2. Extract the natural language text
3. Call `GloAI->get_nl2sql()->convert()`
4. Return the generated SQL as a resultset
5. The user can review and execute it

### 5. AI_Features_Manager

**Location**: `include/AI_Features_Manager.h`, `lib/AI_Features_Manager.cpp`

Coordinates all AI features, including NL2SQL.

**Responsibilities:**

- Initialize the vector database
- Create and manage the LLM_Bridge instance
- Handle configuration variables with the `genai_llm_` prefix
- Provide thread-safe access to components

## Flow Diagrams

### Conversion Flow

```
┌─────────────────┐
│ NL2SQL Request  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Check Vector Cache      │
│ - Generate embedding    │
│ - Similarity search     │
└────────┬────────────────┘
         │
    ┌────┴────┐
    │ Cache   │ No ─────────────────┐
    │ Hit?    │                     │
    └────┬────┘                     │
         │ Yes                      │
         ▼                          │
   Return Cached                    ▼
┌──────────────────┐       ┌─────────────────┐
│ Build Prompt     │       │ Select Model    │
│ - System role    │       │ - Latency       │
│ - Schema context │       │ - Preference    │
│ - User query     │       │ - API keys      │
└────────┬─────────┘       └────────┬────────┘
         │                          │
         └────────────┬─────────────┘
                      ▼
            ┌──────────────────┐
            │ Call LLM API     │
            │ - libcurl HTTP   │
            │ - JSON parse     │
            └────────┬─────────┘
                     │
                     ▼
            ┌──────────────────┐
            │ Validate SQL     │
            │ - Keyword check  │
            │ - Clean output   │
            └────────┬─────────┘
                     │
                     ▼
            ┌──────────────────┐
            │ Store in Cache   │
            │ - Embed query    │
            │ - Save result    │
            └────────┬─────────┘
                     │
                     ▼
            ┌──────────────────┐
            │ Return Result    │
            │ - text_response  │
            │ - confidence     │
            │ - explanation    │
            └──────────────────┘
```

### Model Selection Logic

```
┌─────────────────────────────────┐
│ Start: Select Model             │
└────────────┬────────────────────┘
             │
             ▼
┌─────────────────────┐
│ max_latency_ms <    │──── Yes ────┐
│ 500ms?              │             │
└────────┬────────────┘             │
         │ No                       │
         ▼                          │
┌─────────────────────┐             │
│ Check provider      │             │
│ preference          │             │
└────────┬────────────┘             │
         │                          │
   ┌─────┴──────┐                   │
   │            │                   │
   ▼            ▼                   ▼
 OpenAI     Anthropic            Ollama
   │            │                   │
   ▼            ▼                   ▼
┌─────────┐ ┌─────────┐        ┌─────────┐
│ API key │ │ API key │        │ Return  │
│ set?    │ │ set?    │        │ OLLAMA  │
└────┬────┘ └────┬────┘        └─────────┘
     │           │
    Yes         Yes
     │           │
     └─────┬─────┘
           ▼
   ┌──────────────┐
   │ Return cloud │
   │ provider     │
   └──────────────┘
```


## Data Structures

### NL2SQLRequest

```cpp
struct NL2SQLRequest {
    std::string natural_language;            // Input query
    std::string schema_name;                 // Current schema
    int max_latency_ms;                      // Latency requirement
    bool allow_cache;                        // Enable cache lookup
    std::vector<std::string> context_tables; // Optional table hints
};
```

### NL2SQLResult

```cpp
struct NL2SQLResult {
    std::string text_response;            // Generated SQL
    float confidence;                     // 0.0-1.0 score
    std::string explanation;              // Model info
    std::vector<std::string> tables_used; // Referenced tables
    bool cached;                          // From cache
    int64_t cache_id;                     // Cache entry ID
};
```

## Configuration Management

### Variable Namespacing

All LLM variables use the `genai_llm_` prefix:

```
genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local
```

### Variable Persistence

```
Runtime (memory)
        ↑
        | LOAD MYSQL VARIABLES TO RUNTIME
        |
        | SET genai_llm_... = 'value'
        |
        | SAVE MYSQL VARIABLES TO DISK
        ↓
Disk (config file)
```

## Thread Safety

- **LLM_Bridge**: NOT thread-safe by itself
- **AI_Features_Manager**: Provides thread-safe access via `wrlock()`/`wrunlock()`
- **Vector Cache**: Thread-safe via the SQLite mutex

## Error Handling

### Error Categories

1. **LLM API Errors**: Timeout, connection failure, auth failure
   - Fallback: Try the next available provider
   - Return: Empty SQL with the error in `explanation`

2. **SQL Validation Failures**: The output does not look like SQL
   - Return: SQL with a warning comment
   - Confidence: Low (0.3)

3. **Cache Errors**: Database failures
   - Fallback: Continue without the cache
   - Log: Warning in the ProxySQL log

### Logging

All NL2SQL operations log to `proxysql.log`:

```
NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85
```

## Performance Considerations

### Optimization Strategies

1. **Caching**: Enable for repeated queries
2. **Local First**: Prefer Ollama for lower latency
3. **Timeout**: Set an appropriate `genai_llm_timeout_ms`
4. **Batch Requests**: Not yet implemented (planned)

### Resource Usage

- **Memory**: The vector cache grows with usage
- **Network**: One HTTP request per cache miss
- **CPU**: Embedding generation for cache entries

## Future Enhancements

- **Phase 3**: Full vector cache implementation
- **Phase 3**: Schema context retrieval via MySQL_Tool_Handler
- **Phase 4**: Async conversion API
- **Phase 5**: Batch query conversion
- **Phase 6**: Custom fine-tuned models

## See Also

- [README.md](README.md) - User documentation
- [API.md](API.md) - Complete API reference
- [TESTING.md](TESTING.md) - Testing guide