LLM Bridge Architecture

System Overview

Client Query (NL2SQL: ...)
    ↓
MySQL_Session (detects prefix)
    ↓
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
    ↓
GenAI Module (async via socketpair)
    ├─ GenAI worker thread processes request
    └─ AI_Features_Manager::get_nl2sql()
        ↓
    LLM_Bridge::convert()
        ├─ check_vector_cache()  ← sqlite-vec similarity search
        ├─ build_prompt()         ← Schema context via MySQL_Tool_Handler
        ├─ select_model()         ← Ollama/OpenAI/Anthropic selection
        ├─ call_llm_api()         ← libcurl HTTP request
        └─ validate_sql()         ← Keyword validation
        ↓
    Async response back to MySQL_Session
    ↓
Return Resultset (text_response, confidence, ...)

Important: NL2SQL uses an asynchronous, non-blocking architecture. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.

Async Flow Details

  1. MySQL Thread (non-blocking):

    • Detects NL2SQL: prefix
    • Constructs JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
    • Creates socketpair for async communication
    • Sends request to GenAI module immediately
    • Returns to handle other queries
  2. GenAI Worker Thread:

    • Receives request via socketpair
    • Calls process_json_query() with nl2sql operation type
    • Invokes LLM_Bridge::convert()
    • Processes LLM response (HTTP via libcurl)
    • Sends result back via socketpair
  3. Response Delivery:

    • MySQL thread receives notification via epoll
    • Retrieves result from socketpair
    • Builds resultset and sends to client
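
Sketched in code, the handoff looks roughly like this (simplified; the function name is hypothetical, and the real logic is spread across MySQL_Session and the GenAI module):

#include <sys/socket.h>
#include <unistd.h>
#include <string>

void dispatch_nl2sql(const std::string& json_request, int& result_fd) {
    // One end per thread: the MySQL thread keeps fds[0], the GenAI
    // worker reads the request from and writes the result to fds[1].
    int fds[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    write(fds[0], json_request.data(), json_request.size());
    // fds[0] is registered with epoll; the MySQL thread returns to its
    // event loop instead of blocking, and wakes up when the worker replies.
    result_fd = fds[0];
}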

Components

1. LLM_Bridge

Location: include/LLM_Bridge.h, lib/LLM_Bridge.cpp

Main class coordinating the NL2SQL conversion pipeline.

Key Methods:

  • convert(): Main entry point for conversion
  • check_vector_cache(): Semantic similarity search
  • build_prompt(): Construct LLM prompt with schema context
  • select_model(): Choose best LLM provider
  • call_ollama(), call_openai(), call_anthropic(): LLM API calls
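
Together these methods form a short pipeline. A simplified sketch follows (not the actual implementation; error handling is omitted, and Provider and store_in_cache() are hypothetical names for the selection result and the cache write-back shown in the flow diagram below):

NL2SQLResult LLM_Bridge::convert(const NL2SQLRequest& req) {
    NL2SQLResult res;
    if (req.allow_cache && check_vector_cache(req, res))
        return res;                              // semantic cache hit
    std::string prompt = build_prompt(req);      // schema context + user query
    Provider provider = select_model(req);       // Ollama / OpenAI / Anthropic
    res = call_llm_api(provider, prompt);        // libcurl HTTP request
    validate_sql(res);                           // keyword check, cleanup
    store_in_cache(req, res);                    // embed query, save result
    return res;
}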

Configuration:

struct {
    bool enabled;
    char* query_prefix;              // Default: "NL2SQL:"
    char* model_provider;            // Default: "ollama"
    char* ollama_model;              // Default: "llama3.2"
    char* openai_model;              // Default: "gpt-4o-mini"
    char* anthropic_model;           // Default: "claude-3-haiku"
    int cache_similarity_threshold;  // Default: 85
    int timeout_ms;                  // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;

2. LLM_Clients

Location: lib/LLM_Clients.cpp

HTTP clients for each LLM provider using libcurl.

Ollama (Local)

Endpoint: POST http://localhost:11434/api/generate

Request Format:

{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}

Response Format:

{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
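
A minimal libcurl call against this endpoint might look like the following sketch (illustrative; the real client also applies genai_llm_timeout_ms and parses the JSON response):

#include <curl/curl.h>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

std::string post_ollama(const std::string& body) {
    std::string response;
    CURL* curl = curl_easy_init();
    struct curl_slist* hdrs = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, 30000L);   // genai_llm_timeout_ms
    curl_easy_perform(curl);
    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return response;
}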

OpenAI (Cloud)

Endpoint: POST https://api.openai.com/v1/chat/completions

Headers:

  • Content-Type: application/json
  • Authorization: Bearer sk-...

Request Format:

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}

Response Format:

{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
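
Extracting the generated SQL from this payload is a single path lookup, sketched here assuming the nlohmann::json library:

#include "json.hpp"   // nlohmann::json
#include <string>

std::string extract_openai_sql(const std::string& http_body) {
    auto j = nlohmann::json::parse(http_body);
    return j["choices"][0]["message"]["content"].get<std::string>();
}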

Anthropic (Cloud)

Endpoint: POST https://api.anthropic.com/v1/messages

Headers:

  • Content-Type: application/json
  • x-api-key: sk-ant-...
  • anthropic-version: 2023-06-01

Request Format:

{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}

Response Format:

{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}

3. Vector Cache

Location: Uses SQLite3DB with sqlite-vec extension

Tables:

-- Cache entries
CREATE TABLE llm_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    text_response TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
    embedding FLOAT[1536],  -- Dimension depends on embedding model
    id INTEGER PRIMARY KEY
);

Similarity Search:

SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
AND k = 10          -- KNN: consider the 10 nearest neighbors
ORDER BY distance
LIMIT 1;            -- keep only the closest match
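
On the C++ side, the returned distance is mapped to a similarity score and compared against cache_similarity_threshold. The sketch below is illustrative only; the method name and scoring formula are assumptions, not the actual implementation:

// Decide whether the nearest neighbor counts as a cache hit.
bool LLM_Bridge::is_cache_hit(float distance) {
    // Illustrative mapping: smaller distance = more similar, scaled to 0-100.
    int similarity = static_cast<int>((1.0f - distance) * 100.0f);
    return similarity >= config.cache_similarity_threshold;   // default: 85
}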

4. MySQL_Session Integration

Location: lib/MySQL_Session.cpp (around line 6867)

Query interception flow:

  1. Detect NL2SQL: prefix in query
  2. Extract natural language text
  3. Call GloAI->get_nl2sql()->convert()
  4. Return generated SQL as resultset
  5. User can review and execute
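
Step 1 is a plain prefix comparison, sketched here (simplified; not the exact code at that location):

#include <cstring>
#include <strings.h>   // strncasecmp
#include <string>

bool is_nl2sql(const char* query, std::string& natural_language) {
    const char* prefix = "NL2SQL:";              // genai_llm_query_prefix
    size_t len = strlen(prefix);
    if (strncasecmp(query, prefix, len) != 0)
        return false;
    natural_language.assign(query + len);        // text after the prefix
    return true;                                 // hand off to the GenAI module
}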

5. AI_Features_Manager

Location: include/AI_Features_Manager.h, lib/AI_Features_Manager.cpp

Coordinates all AI features including NL2SQL.

Responsibilities:

  • Initialize vector database
  • Create and manage LLM_Bridge instance
  • Handle configuration variables with genai_llm_ prefix
  • Provide thread-safe access to components

Flow Diagrams

Conversion Flow

┌─────────────────┐
│ NL2SQL Request  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Check Vector Cache      │
│ - Generate embedding    │
│ - Similarity search     │
└────────┬────────────────┘
         │
    ┌────┴────┐
    │ Cache   │ No ────────────────┐
    │ Hit?    │                    │
    └────┬────┘                    │
         │ Yes                     │
         ▼                         │
    Return Cached                  ▼
┌──────────────────┐      ┌─────────────────┐
│   Build Prompt   │      │ Select Model    │
│ - System role    │      │ - Latency       │
│ - Schema context │      │ - Preference    │
│ - User query     │      │ - API keys      │
└────────┬─────────┘      └────────┬────────┘
         │                         │
         └─────────┬───────────────┘
                   ▼
         ┌──────────────────┐
         │   Call LLM API   │
         │ - libcurl HTTP   │
         │ - JSON parse     │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │  Validate SQL    │
         │ - Keyword check  │
         │ - Clean output   │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │ Store in Cache   │
         │ - Embed query    │
         │ - Save result    │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │  Return Result   │
         │ - text_response  │
         │ - confidence     │
         │ - explanation    │
         └──────────────────┘

Model Selection Logic

┌─────────────────────────────────┐
│     Start: Select Model         │
└────────────┬────────────────────┘
             │
             ▼
    ┌─────────────────────┐
    │ max_latency_ms <    │──── Yes ─────┐
    │ 500ms?              │              │
    └────────┬────────────┘              │
             │ No                        │
             ▼                           │
    ┌─────────────────────┐              │
    │ Check provider      │              │
    │ preference          │              │
    └─────────┬───────────┘              │
              │                          │
      ┌──────┴──────┐                   │
      │             │                   │
      ▼             ▼                   │
   OpenAI      Anthropic             Ollama
      │             │                   │
      ▼             ▼                   │
 ┌─────────┐  ┌─────────┐         ┌─────────┐
 │ API key │  │ API key │         │ Return  │
 │ set?    │  │ set?    │         │ OLLAMA  │
 └────┬────┘  └────┬────┘         └─────────┘
      │            │
     Yes          Yes
      │            │
      └──────┬─────┘
             │
             ▼
     ┌──────────────┐
     │ Return cloud │
     │ provider     │
     └──────────────┘
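
In code form, the tree collapses to a few guarded checks (illustrative sketch; the enum and parameter names are assumptions):

#include <string>

enum class Provider { OLLAMA, OPENAI, ANTHROPIC };

Provider select_model_sketch(int max_latency_ms, const std::string& preference,
                             bool has_openai_key, bool has_anthropic_key) {
    if (max_latency_ms > 0 && max_latency_ms < 500)
        return Provider::OLLAMA;                 // tight latency: stay local
    if (preference == "openai" && has_openai_key)
        return Provider::OPENAI;
    if (preference == "anthropic" && has_anthropic_key)
        return Provider::ANTHROPIC;
    return Provider::OLLAMA;                     // no usable cloud key: fall back
}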

Data Structures

NL2SQLRequest

struct NL2SQLRequest {
    std::string natural_language;           // Input query
    std::string schema_name;                 // Current schema
    int max_latency_ms;                      // Latency requirement
    bool allow_cache;                        // Enable cache lookup
    std::vector<std::string> context_tables; // Optional table hints
};

NL2SQLResult

struct NL2SQLResult {
    std::string text_response;               // Generated SQL
    float confidence;                        // 0.0-1.0 score
    std::string explanation;                 // Model info
    std::vector<std::string> tables_used;    // Referenced tables
    bool cached;                             // From cache
    int64_t cache_id;                        // Cache entry ID
};
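
Illustrative caller-side usage, where bridge stands for the LLM_Bridge instance obtained from AI_Features_Manager:

void example(LLM_Bridge* bridge) {
    NL2SQLRequest req;
    req.natural_language = "Show top customers";
    req.schema_name = "sales";
    req.max_latency_ms = 2000;     // high enough to allow a cloud provider
    req.allow_cache = true;

    NL2SQLResult res = bridge->convert(req);
    if (!res.cached && res.confidence >= 0.5f) {
        // Fresh, reasonably confident conversion: return res.text_response
        // to the client for review before execution.
    }
}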

Configuration Management

Variable Namespacing

All LLM variables use the genai_llm_ prefix:

genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local

Variable Persistence

Runtime (memory)
    ↑
    | LOAD MYSQL VARIABLES TO RUNTIME
    |
    | SET genai_llm_... = 'value'
    |
    | SAVE MYSQL VARIABLES TO DISK
    ↓
Disk (proxysql.db)

Thread Safety

  • LLM_Bridge: NOT thread-safe by itself
  • AI_Features_Manager: Provides thread-safe access via wrlock()/wrunlock()
  • Vector Cache: Thread-safe via SQLite mutex
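
Callers therefore go through the manager rather than the bridge directly, along these lines (sketch using the accessors named above):

NL2SQLResult convert_guarded(AI_Features_Manager* mgr, const NL2SQLRequest& req) {
    mgr->wrlock();                               // LLM_Bridge is not thread-safe
    NL2SQLResult res = mgr->get_nl2sql()->convert(req);
    mgr->wrunlock();
    return res;
}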

Error Handling

Error Categories

  1. LLM API Errors: Timeout, connection failure, auth failure

    • Fallback: Try the next available provider (see the sketch after this list)
    • Return: Empty SQL with error in explanation
  2. SQL Validation Failures: The LLM output does not look like valid SQL

    • Return: SQL with warning comment
    • Confidence: Low (0.3)
  3. Cache Errors: Database failures

    • Fallback: Continue without cache
    • Log: Warning in ProxySQL log
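
The provider fallback in category 1 amounts to a loop over candidate providers (illustrative sketch; the per-provider call signature is hypothetical, and Provider reuses the enum from the selection sketch above):

#include <string>
#include <vector>

NL2SQLResult convert_with_fallback(LLM_Bridge* bridge, const std::string& prompt,
                                   const std::vector<Provider>& candidates) {
    NL2SQLResult res;
    for (Provider p : candidates) {              // preferred provider first
        res = bridge->call_llm_api(p, prompt);   // hypothetical signature
        if (!res.text_response.empty())
            return res;                          // first provider to answer wins
    }
    res.explanation = "all providers failed";    // empty SQL, error in explanation
    return res;
}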

Logging

All NL2SQL operations log to proxysql.log:

NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85

Performance Considerations

Optimization Strategies

  1. Caching: Enable for repeated queries
  2. Local First: Prefer Ollama for lower latency
  3. Timeout: Set appropriate genai_llm_timeout_ms
  4. Batch Requests: Not yet implemented (planned)

Resource Usage

  • Memory: Vector cache grows with usage
  • Network: HTTP requests for each cache miss
  • CPU: Embedding generation for cache entries

Future Enhancements

  • Phase 3: Full vector cache implementation
  • Phase 3: Schema context retrieval via MySQL_Tool_Handler
  • Phase 4: Async conversion API
  • Phase 5: Batch query conversion
  • Phase 6: Custom fine-tuned models

See Also