LLM Bridge Architecture

System Overview

Client Query (NL2SQL: ...)
    ↓
MySQL_Session (detects prefix)
    ↓
Convert to JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
    ↓
GenAI Module (async via socketpair)
    ├─ GenAI worker thread processes request
    └─ AI_Features_Manager::get_nl2sql()
        ↓
    LLM_Bridge::convert()
        ├─ check_vector_cache()  ← sqlite-vec similarity search
        ├─ build_prompt()         ← Schema context via MySQL_Tool_Handler
        ├─ select_model()         ← Ollama/OpenAI/Anthropic selection
        ├─ call_llm_api()         ← libcurl HTTP request
        └─ validate_sql()         ← Keyword validation
        ↓
    Async response back to MySQL_Session
    ↓
Return Resultset (text_response, confidence, ...)

Important: NL2SQL uses an asynchronous, non-blocking architecture. The MySQL thread is not blocked while waiting for the LLM response. The request is sent via socketpair to the GenAI module, which processes it in a worker thread and delivers the result asynchronously.

Async Flow Details

  1. MySQL Thread (non-blocking):

    • Detects NL2SQL: prefix
    • Constructs JSON: {"type": "nl2sql", "query": "...", "schema": "..."}
    • Creates socketpair for async communication
    • Sends request to GenAI module immediately
    • Returns to handle other queries
  2. GenAI Worker Thread:

    • Receives request via socketpair
    • Calls process_json_query() with nl2sql operation type
    • Invokes LLM_Bridge::convert()
    • Processes LLM response (HTTP via libcurl)
    • Sends result back via socketpair
  3. Response Delivery:

    • MySQL thread receives notification via epoll
    • Retrieves result from socketpair
    • Builds resultset and sends to client
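
Sketched in code, the handoff looks roughly like this (simplified; the function name is hypothetical, and the real logic is spread across MySQL_Session and the GenAI module):

#include <sys/socket.h>
#include <unistd.h>
#include <string>

void dispatch_nl2sql(const std::string& json_request, int& result_fd) {
    // One end per thread: the MySQL thread keeps fds[0], the GenAI
    // worker reads the request from and writes the result to fds[1].
    int fds[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    write(fds[0], json_request.data(), json_request.size());
    // fds[0] is registered with epoll; the MySQL thread returns to its
    // event loop instead of blocking, and wakes up when the worker replies.
    result_fd = fds[0];
}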

Components

1. LLM_Bridge

Location: include/LLM_Bridge.h, lib/LLM_Bridge.cpp

Main class coordinating the NL2SQL conversion pipeline.

Key Methods:

  • convert(): Main entry point for conversion
  • check_vector_cache(): Semantic similarity search
  • build_prompt(): Construct LLM prompt with schema context
  • select_model(): Choose best LLM provider
  • call_ollama(), call_openai(), call_anthropic(): LLM API calls
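
Together these methods form a short pipeline. A simplified sketch follows (not the actual implementation; error handling is omitted, and Provider and store_in_cache() are hypothetical names for the selection result and the cache write-back shown in the flow diagram below):

NL2SQLResult LLM_Bridge::convert(const NL2SQLRequest& req) {
    NL2SQLResult res;
    if (req.allow_cache && check_vector_cache(req, res))
        return res;                              // semantic cache hit
    std::string prompt = build_prompt(req);      // schema context + user query
    Provider provider = select_model(req);       // Ollama / OpenAI / Anthropic
    res = call_llm_api(provider, prompt);        // libcurl HTTP request
    validate_sql(res);                           // keyword check, cleanup
    store_in_cache(req, res);                    // embed query, save result
    return res;
}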

Configuration:

struct {
    bool enabled;
    char* query_prefix;              // Default: "NL2SQL:"
    char* model_provider;            // Default: "ollama"
    char* ollama_model;              // Default: "llama3.2"
    char* openai_model;              // Default: "gpt-4o-mini"
    char* anthropic_model;           // Default: "claude-3-haiku"
    int cache_similarity_threshold;  // Default: 85
    int timeout_ms;                  // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;

2. LLM_Clients

Location: lib/LLM_Clients.cpp

HTTP clients for each LLM provider using libcurl.

Ollama (Local)

Endpoint: POST http://localhost:11434/api/generate

Request Format:

{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}

Response Format:

{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
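
A minimal libcurl call against this endpoint might look like the following sketch (illustrative; the real client also applies genai_llm_timeout_ms and parses the JSON response):

#include <curl/curl.h>
#include <string>

static size_t collect(char* data, size_t size, size_t nmemb, void* out) {
    static_cast<std::string*>(out)->append(data, size * nmemb);
    return size * nmemb;
}

std::string post_ollama(const std::string& body) {
    std::string response;
    CURL* curl = curl_easy_init();
    struct curl_slist* hdrs = curl_slist_append(nullptr, "Content-Type: application/json");
    curl_easy_setopt(curl, CURLOPT_URL, "http://localhost:11434/api/generate");
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, hdrs);
    curl_easy_setopt(curl, CURLOPT_POSTFIELDS, body.c_str());
    curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, collect);
    curl_easy_setopt(curl, CURLOPT_WRITEDATA, &response);
    curl_easy_setopt(curl, CURLOPT_TIMEOUT_MS, 30000L);   // genai_llm_timeout_ms
    curl_easy_perform(curl);
    curl_slist_free_all(hdrs);
    curl_easy_cleanup(curl);
    return response;
}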

OpenAI (Cloud)

Endpoint: POST https://api.openai.com/v1/chat/completions

Headers:

  • Content-Type: application/json
  • Authorization: Bearer sk-...

Request Format:

{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}

Response Format:

{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
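
Extracting the generated SQL from this payload is a single path lookup, sketched here assuming the nlohmann::json library:

#include "json.hpp"   // nlohmann::json
#include <string>

std::string extract_openai_sql(const std::string& http_body) {
    auto j = nlohmann::json::parse(http_body);
    return j["choices"][0]["message"]["content"].get<std::string>();
}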

Anthropic (Cloud)

Endpoint: POST https://api.anthropic.com/v1/messages

Headers:

  • Content-Type: application/json
  • x-api-key: sk-ant-...
  • anthropic-version: 2023-06-01

Request Format:

{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}

Response Format:

{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}

3. Vector Cache

Location: Uses SQLite3DB with sqlite-vec extension

Tables:

-- Cache entries
CREATE TABLE llm_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    text_response TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE llm_cache_vec USING vec0(
    embedding FLOAT[1536],  -- Dimension depends on embedding model
    id INTEGER PRIMARY KEY
);

Similarity Search:

SELECT nc.text_response, nc.confidence, distance
FROM llm_cache_vec
JOIN llm_cache nc ON llm_cache_vec.id = nc.id
WHERE embedding MATCH ?
AND k = 10          -- KNN: consider the 10 nearest neighbors
ORDER BY distance
LIMIT 1;            -- keep only the closest match
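
On the C++ side, the returned distance is mapped to a similarity score and compared against cache_similarity_threshold. The sketch below is illustrative only; the method name and scoring formula are assumptions, not the actual implementation:

// Decide whether the nearest neighbor counts as a cache hit.
bool LLM_Bridge::is_cache_hit(float distance) {
    // Illustrative mapping: smaller distance = more similar, scaled to 0-100.
    int similarity = static_cast<int>((1.0f - distance) * 100.0f);
    return similarity >= config.cache_similarity_threshold;   // default: 85
}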

4. MySQL_Session Integration

Location: lib/MySQL_Session.cpp (around line 6867)

Query interception flow:

  1. Detect NL2SQL: prefix in query
  2. Extract natural language text
  3. Call GloAI->get_nl2sql()->convert()
  4. Return generated SQL as resultset
  5. User can review and execute
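
Step 1 is a plain prefix comparison, sketched here (simplified; not the exact code at that location):

#include <cstring>
#include <strings.h>   // strncasecmp
#include <string>

bool is_nl2sql(const char* query, std::string& natural_language) {
    const char* prefix = "NL2SQL:";              // genai_llm_query_prefix
    size_t len = strlen(prefix);
    if (strncasecmp(query, prefix, len) != 0)
        return false;
    natural_language.assign(query + len);        // text after the prefix
    return true;                                 // hand off to the GenAI module
}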

5. AI_Features_Manager

Location: include/AI_Features_Manager.h, lib/AI_Features_Manager.cpp

Coordinates all AI features including NL2SQL.

Responsibilities:

  • Initialize vector database
  • Create and manage LLM_Bridge instance
  • Handle configuration variables with genai_llm_ prefix
  • Provide thread-safe access to components

Flow Diagrams

Conversion Flow

┌─────────────────┐
│ NL2SQL Request  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Check Vector Cache      │
│ - Generate embedding    │
│ - Similarity search     │
└────────┬────────────────┘
         │
    ┌────┴────┐
    │ Cache   │ No ────────────────┐
    │ Hit?    │                    │
    └────┬────┘                    │
         │ Yes                     │
         ▼                         │
    Return Cached                  ▼
┌──────────────────┐      ┌─────────────────┐
│   Build Prompt   │      │ Select Model    │
│ - System role    │      │ - Latency       │
│ - Schema context │      │ - Preference    │
│ - User query     │      │ - API keys      │
└────────┬─────────┘      └────────┬────────┘
         │                         │
         └─────────┬───────────────┘
                   ▼
         ┌──────────────────┐
         │   Call LLM API   │
         │ - libcurl HTTP   │
         │ - JSON parse     │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │  Validate SQL    │
         │ - Keyword check  │
         │ - Clean output   │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │ Store in Cache   │
         │ - Embed query    │
         │ - Save result    │
         └────────┬─────────┘
                  │
                  ▼
         ┌──────────────────┐
         │  Return Result   │
         │ - text_response  │
         │ - confidence     │
         │ - explanation    │
         └──────────────────┘

Model Selection Logic

┌─────────────────────────────────┐
│     Start: Select Model         │
└────────────┬────────────────────┘
             │
             ▼
    ┌─────────────────────┐
    │ max_latency_ms <    │──── Yes ─────┐
    │ 500ms?              │              │
    └────────┬────────────┘              │
             │ No                        │
             ▼                           │
    ┌─────────────────────┐              │
    │ Check provider      │              │
    │ preference          │              │
    └─────────┬───────────┘              │
              │                          │
      ┌──────┴──────┐                   │
      │             │                   │
      ▼             ▼                   │
   OpenAI      Anthropic             Ollama
      │             │                   │
      ▼             ▼                   │
 ┌─────────┐  ┌─────────┐         ┌─────────┐
 │ API key │  │ API key │         │ Return  │
 │ set?    │  │ set?    │         │ OLLAMA  │
 └────┬────┘  └────┬────┘         └─────────┘
      │            │
     Yes          Yes
      │            │
      └──────┬─────┘
             │
             ▼
     ┌──────────────┐
     │ Return cloud │
     │ provider     │
     └──────────────┘
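
In code form, the tree collapses to a few guarded checks (illustrative sketch; the enum and parameter names are assumptions):

#include <string>

enum class Provider { OLLAMA, OPENAI, ANTHROPIC };

Provider select_model_sketch(int max_latency_ms, const std::string& preference,
                             bool has_openai_key, bool has_anthropic_key) {
    if (max_latency_ms > 0 && max_latency_ms < 500)
        return Provider::OLLAMA;                 // tight latency: stay local
    if (preference == "openai" && has_openai_key)
        return Provider::OPENAI;
    if (preference == "anthropic" && has_anthropic_key)
        return Provider::ANTHROPIC;
    return Provider::OLLAMA;                     // no usable cloud key: fall back
}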

Data Structures

NL2SQLRequest

struct NL2SQLRequest {
    std::string natural_language;           // Input query
    std::string schema_name;                 // Current schema
    int max_latency_ms;                      // Latency requirement
    bool allow_cache;                        // Enable cache lookup
    std::vector<std::string> context_tables; // Optional table hints
};

NL2SQLResult

struct NL2SQLResult {
    std::string text_response;               // Generated SQL
    float confidence;                        // 0.0-1.0 score
    std::string explanation;                 // Model info
    std::vector<std::string> tables_used;    // Referenced tables
    bool cached;                             // From cache
    int64_t cache_id;                        // Cache entry ID
};
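
Illustrative caller-side usage, where bridge stands for the LLM_Bridge instance obtained from AI_Features_Manager:

void example(LLM_Bridge* bridge) {
    NL2SQLRequest req;
    req.natural_language = "Show top customers";
    req.schema_name = "sales";
    req.max_latency_ms = 2000;     // high enough to allow a cloud provider
    req.allow_cache = true;

    NL2SQLResult res = bridge->convert(req);
    if (!res.cached && res.confidence >= 0.5f) {
        // Fresh, reasonably confident conversion: return res.text_response
        // to the client for review before execution.
    }
}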

Configuration Management

Variable Namespacing

All LLM variables use the genai_llm_ prefix:

genai_llm_enabled
genai_llm_query_prefix
genai_llm_model_provider
genai_llm_ollama_model
genai_llm_openai_model
genai_llm_anthropic_model
genai_llm_cache_similarity_threshold
genai_llm_timeout_ms
genai_llm_openai_key
genai_llm_anthropic_key
genai_llm_prefer_local

Variable Persistence

Runtime (memory)
    ↑
    | LOAD MYSQL VARIABLES TO RUNTIME
    |
    | SET genai_llm_... = 'value'
    |
    | SAVE MYSQL VARIABLES TO DISK
    ↓
Disk (proxysql.db)

Thread Safety

  • LLM_Bridge: NOT thread-safe by itself
  • AI_Features_Manager: Provides thread-safe access via wrlock()/wrunlock()
  • Vector Cache: Thread-safe via SQLite mutex
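
Callers therefore go through the manager rather than the bridge directly, along these lines (sketch using the accessors named above):

NL2SQLResult convert_guarded(AI_Features_Manager* mgr, const NL2SQLRequest& req) {
    mgr->wrlock();                               // LLM_Bridge is not thread-safe
    NL2SQLResult res = mgr->get_nl2sql()->convert(req);
    mgr->wrunlock();
    return res;
}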

Error Handling

Error Categories

  1. LLM API Errors: Timeout, connection failure, auth failure

    • Fallback: Try the next available provider (see the sketch after this list)
    • Return: Empty SQL with error in explanation
  2. SQL Validation Failures: The LLM output does not look like valid SQL

    • Return: SQL with warning comment
    • Confidence: Low (0.3)
  3. Cache Errors: Database failures

    • Fallback: Continue without cache
    • Log: Warning in ProxySQL log
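
The provider fallback in category 1 amounts to a loop over candidate providers (illustrative sketch; the per-provider call signature is hypothetical, and Provider reuses the enum from the selection sketch above):

#include <string>
#include <vector>

NL2SQLResult convert_with_fallback(LLM_Bridge* bridge, const std::string& prompt,
                                   const std::vector<Provider>& candidates) {
    NL2SQLResult res;
    for (Provider p : candidates) {              // preferred provider first
        res = bridge->call_llm_api(p, prompt);   // hypothetical signature
        if (!res.text_response.empty())
            return res;                          // first provider to answer wins
    }
    res.explanation = "all providers failed";    // empty SQL, error in explanation
    return res;
}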

Logging

All NL2SQL operations log to proxysql.log:

NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85

Performance Considerations

Optimization Strategies

  1. Caching: Enable for repeated queries
  2. Local First: Prefer Ollama for lower latency
  3. Timeout: Set appropriate genai_llm_timeout_ms
  4. Batch Requests: Not yet implemented (planned)

Resource Usage

  • Memory: Vector cache grows with usage
  • Network: HTTP requests for each cache miss
  • CPU: Embedding generation for cache entries

Future Enhancements

  • Phase 3: Full vector cache implementation
  • Phase 3: Schema context retrieval via MySQL_Tool_Handler
  • Phase 4: Async conversion API
  • Phase 5: Batch query conversion
  • Phase 6: Custom fine-tuned models

See Also