LLM Bridge - Generic LLM Access for ProxySQL

Overview

LLM Bridge is a ProxySQL feature that provides generic access to Large Language Models (LLMs) through the MySQL protocol. It allows you to send any prompt to an LLM and receive the response as a MySQL resultset.

Note: This feature was previously called "NL2SQL" (Natural Language to SQL) but has been converted to a generic LLM bridge. Future NL2SQL functionality will be implemented as a Web UI using external agents (Claude Code + MCP server).

Features

  • Generic Provider Support: Works with any OpenAI-compatible or Anthropic-compatible endpoint
  • Semantic Caching: Vector-based cache for similar prompts using sqlite-vec
  • Multi-Provider: Switch between LLM providers seamlessly
  • Versatile: Use LLMs for summarization, code generation, translation, analysis, etc.

Supported Endpoints:

  • Ollama (via OpenAI-compatible /v1/chat/completions endpoint)
  • OpenAI
  • Anthropic
  • vLLM
  • LM Studio
  • Z.ai
  • Any other OpenAI-compatible or Anthropic-compatible endpoint

Quick Start

1. Enable LLM Bridge

-- Via admin interface
SET genai-llm_enabled='true';
LOAD GENAI VARIABLES TO RUNTIME;
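
To confirm the change took effect, you can inspect the runtime variables. A minimal check (this assumes the GenAI variables are exposed through the admin interface's runtime_global_variables table, like other ProxySQL variables):

SELECT variable_name, variable_value
FROM runtime_global_variables
WHERE variable_name = 'genai-llm_enabled';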

2. Configure LLM Provider

ProxySQL uses a generic provider configuration that supports any OpenAI-compatible or Anthropic-compatible endpoint.

Using Ollama (default):

Ollama is used via its OpenAI-compatible endpoint:

SET genai-llm_provider='openai';
SET genai-llm_provider_url='http://localhost:11434/v1/chat/completions';
SET genai-llm_provider_model='llama3.2';
SET genai-llm_provider_key='';  -- Empty for local Ollama
LOAD GENAI VARIABLES TO RUNTIME;

Using OpenAI:

SET genai-llm_provider='openai';
SET genai-llm_provider_url='https://api.openai.com/v1/chat/completions';
SET genai-llm_provider_model='gpt-4';
SET genai-llm_provider_key='sk-...';  -- Your OpenAI API key
LOAD GENAI VARIABLES TO RUNTIME;

Using Anthropic:

SET genai-llm_provider='anthropic';
SET genai-llm_provider_url='https://api.anthropic.com/v1/messages';
SET genai-llm_provider_model='claude-3-opus-20240229';
SET genai-llm_provider_key='sk-ant-...';  -- Your Anthropic API key
LOAD GENAI VARIABLES TO RUNTIME;

Using any OpenAI-compatible endpoint:

This works with any OpenAI-compatible API (vLLM, LM Studio, Z.ai, etc.):

SET genai-llm_provider='openai';
SET genai-llm_provider_url='https://your-endpoint.com/v1/chat/completions';
SET genai-llm_provider_model='your-model-name';
SET genai-llm_provider_key='your-api-key';  -- Your API key (leave empty for local endpoints)
LOAD GENAI VARIABLES TO RUNTIME;

3. Use the LLM Bridge

Once configured, you can send prompts using the /* LLM: */ prefix:

-- Summarize text
mysql> /* LLM: */ Summarize the customer feedback from last week

-- Explain SQL queries
mysql> /* LLM: */ Explain this query: SELECT COUNT(*) FROM users WHERE active = 1

-- Generate code
mysql> /* LLM: */ Generate a Python function to validate email addresses

-- Translate text
mysql> /* LLM: */ Translate "Hello world" to Spanish

-- Analyze data
mysql> /* LLM: */ Analyze the following sales data and provide insights

Important: LLM queries are executed in the MySQL module (your regular SQL client), not in the ProxySQL Admin interface. The Admin interface is only for configuration.

Response Format

The LLM Bridge returns a resultset with the following columns:

Column         Description
text_response  The LLM's text response
explanation    Which model/provider generated the response
cached         Whether the response was from cache (true/false)
provider       The provider used (openai/anthropic)

Configuration Variables

Variable                              Default                                      Description
genai-llm_enabled                     false                                        Master switch for the LLM bridge
genai-llm_provider                    openai                                       Provider type (openai/anthropic)
genai-llm_provider_url                http://localhost:11434/v1/chat/completions   LLM endpoint URL
genai-llm_provider_model              llama3.2                                     Model name
genai-llm_provider_key                (empty)                                      API key (optional for local endpoints)
genai-llm_cache_enabled               true                                         Enable semantic cache
genai-llm_cache_similarity_threshold  85                                           Cache similarity threshold (0-100)
genai-llm_timeout_ms                  30000                                        Request timeout in milliseconds
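
To review the configured values in one place, the variables can be listed from the admin interface. A sketch (assuming the GenAI variables are stored in the standard global_variables table):

SELECT variable_name, variable_value
FROM global_variables
WHERE variable_name LIKE 'genai-llm%';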

Request Configuration (Advanced)

When using the LLM bridge programmatically, you can configure retry behavior with the following parameters:

Parameter             Default  Description
max_retries           3        Maximum retry attempts for transient failures
retry_backoff_ms      1000     Initial backoff in milliseconds
retry_multiplier      2.0      Backoff multiplier for exponential backoff
retry_max_backoff_ms  30000    Maximum backoff in milliseconds
allow_cache           true     Enable semantic cache lookup
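
With the defaults above, the wait before retry n follows the usual exponential-backoff pattern, backoff(n) = min(retry_backoff_ms * retry_multiplier^(n-1), retry_max_backoff_ms). This formula is an assumption inferred from the parameter names and the retry log shown under Troubleshooting; as a worked illustration in plain MySQL arithmetic:

-- Hypothetical illustration of the default backoff schedule (not a ProxySQL command)
SELECT n AS attempt,
       LEAST(1000 * POW(2.0, n - 1), 30000) AS backoff_ms
FROM (SELECT 1 AS n UNION SELECT 2 UNION SELECT 3) t;
-- attempt 1 -> 1000 ms, attempt 2 -> 2000 ms, attempt 3 -> 4000 ms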

Error Handling

LLM Bridge provides structured error information to help diagnose issues:

Error Code             Description                  HTTP Status
ERR_API_KEY_MISSING    API key not configured       N/A
ERR_API_KEY_INVALID    API key format is invalid    N/A
ERR_TIMEOUT            Request timed out            N/A
ERR_CONNECTION_FAILED  Network connection failed    0
ERR_RATE_LIMITED       Rate limited by provider     429
ERR_SERVER_ERROR       Server error                 500-599
ERR_EMPTY_RESPONSE     Empty response from LLM      N/A
ERR_INVALID_RESPONSE   Malformed response from LLM  N/A
ERR_VALIDATION_FAILED  Input validation failed      N/A
ERR_UNKNOWN_PROVIDER   Invalid provider name        N/A
ERR_REQUEST_TOO_LARGE  Request exceeds size limit   413

Result Fields:

  • error_code: Structured error code (e.g., "ERR_API_KEY_MISSING")
  • error_details: Detailed error context with query, provider, URL
  • http_status_code: HTTP status code if applicable
  • provider_used: Which provider was attempted

Request Correlation

Each LLM request generates a unique request ID for log correlation:

LLM [a1b2c3d4-e5f6-7890-abcd-ef1234567890]: REQUEST url=http://... model=llama3.2
LLM [a1b2c3d4-e5f6-7890-abcd-ef1234567890]: RESPONSE status=200 duration_ms=1234

This allows tracing a single request through all log lines for debugging.

Use Cases

1. Text Summarization

/* LLM: */ Summarize this text: [long text...]

2. Code Generation

/* LLM: */ Write a Python function to check if a number is prime
/* LLM: */ Generate a SQL query to find duplicate users

3. Query Explanation

/* LLM: */ Explain what this query does: SELECT * FROM orders WHERE status = 'pending'
/* LLM: */ Why is this query slow: SELECT * FROM users JOIN orders ON...

4. Data Analysis

/* LLM: */ Analyze this CSV data and identify trends: [data...]
/* LLM: */ What insights can you derive from these sales figures?

5. Translation

/* LLM: */ Translate "Good morning" to French, German, and Spanish
/* LLM: */ Convert this SQL query to PostgreSQL dialect

6. Documentation

/* LLM: */ Write documentation for this function: [code...]
/* LLM: */ Generate API documentation for the users endpoint

7. Code Review

/* LLM: */ Review this code for security issues: [code...]
/* LLM: */ Suggest optimizations for this query

Examples

Basic Usage

-- Get a summary
mysql> /* LLM: */ What is machine learning?

-- Generate code
mysql> /* LLM: */ Write a function to calculate fibonacci numbers in JavaScript

-- Explain concepts
mysql> /* LLM: */ Explain the difference between INNER JOIN and LEFT JOIN

Complex Prompts

-- Multi-step reasoning
mysql> /* LLM: */ Analyze the performance implications of using VARCHAR(255) vs TEXT in MySQL

-- Code with specific requirements
mysql> /* LLM: */ Write a Python script that reads a CSV file, filters rows where amount > 100, and outputs to JSON

-- Technical documentation
mysql> /* LLM: */ Create API documentation for a user registration endpoint with validation rules

Results

LLM Bridge returns a resultset with:

Column            Type     Description
text_response     TEXT     LLM's text response
explanation       TEXT     Which model was used
cached            BOOLEAN  Whether from semantic cache
error_code        TEXT     Structured error code (if error)
error_details     TEXT     Detailed error context (if error)
http_status_code  INT      HTTP status code (if applicable)
provider          TEXT     Which provider was used

Example successful response:

+---------------------------------------------------------+-----------------------+--------+----------+
| text_response                                           | explanation           | cached | provider |
+---------------------------------------------------------+-----------------------+--------+----------+
| Machine learning is a subset of artificial intelligence | Generated by llama3.2 |      0 | openai   |
| that enables systems to learn from data...              |                       |        |          |
+---------------------------------------------------------+-----------------------+--------+----------+

Example error response:

+-----------------------------------------------------+
| text_response                                       |
+-----------------------------------------------------+
| -- LLM processing failed                            |
|                                                     |
| error_code: ERR_API_KEY_MISSING                     |
| error_details: LLM processing failed:               |
|   Query: What is machine learning?                  |
|   Provider: openai                                  |
|   Error: API key not configured                     |
|   URL: https://api.openai.com/v1/chat/completions   |
|                                                     |
| http_status_code: 0                                 |
| provider_used: openai                               |
+-----------------------------------------------------+

Troubleshooting

LLM Bridge returns empty result

  1. Check that the GenAI variables are loaded at runtime:

    SELECT * FROM runtime_global_variables WHERE variable_name LIKE 'genai-%';
    
  2. Verify LLM is accessible:

    # For Ollama
    curl http://localhost:11434/api/tags
    
    # For cloud APIs, check your API keys
    
  3. Check logs with request ID:

    # Find all log lines for a specific request
    tail -f proxysql.log | grep "LLM \[a1b2c3d4"
    
  4. Check error details:

    • Review error_code for structured error type
    • Review error_details for full context including query, provider, URL
    • Review http_status_code for HTTP-level errors (429 = rate limit, 500+ = server error)

Retry Behavior

LLM Bridge automatically retries on transient failures:

  • Rate limiting (HTTP 429): Retries with exponential backoff
  • Server errors (500-504): Retries with exponential backoff
  • Network errors: Retries with exponential backoff

Default retry behavior:

  • Maximum retries: 3
  • Initial backoff: 1000ms
  • Multiplier: 2.0x
  • Maximum backoff: 30000ms

Log output during retry:

LLM [request-id]: ERROR phase=llm error=Empty response status=0
LLM [request-id]: Retryable error (status=0), retrying in 1000ms (attempt 1/4)
LLM [request-id]: Request succeeded after 1 retries

Slow Responses

  1. Try a different model:

    SET genai-llm_provider_model='llama3.2';  -- Faster than GPT-4
    LOAD GENAI VARIABLES TO RUNTIME;
    
  2. Use local Ollama for faster responses:

    SET genai-llm_provider_url='http://localhost:11434/v1/chat/completions';
    LOAD GENAI VARIABLES TO RUNTIME;
    
  3. Increase timeout for complex prompts:

    SET genai-llm_timeout_ms=60000;
    LOAD GENAI VARIABLES TO RUNTIME;
    

Cache Issues

-- Check cache stats
SHOW STATUS LIKE 'llm_%';

-- Cache is automatically managed based on semantic similarity
-- Adjust similarity threshold if needed
SET genai-llm_cache_similarity_threshold=80;  -- Lower = more matches
LOAD GENAI VARIABLES TO RUNTIME;

Status Variables

Monitor LLM bridge usage:

SELECT * FROM stats_mysql_global WHERE variable_name LIKE 'llm_%';

Available status variables:

  • llm_total_requests - Total number of LLM requests
  • llm_cache_hits - Number of cache hits
  • llm_cache_misses - Number of cache misses
  • llm_local_model_calls - Calls to local models
  • llm_cloud_model_calls - Calls to cloud APIs
  • llm_total_response_time_ms - Total response time
  • llm_cache_total_lookup_time_ms - Total cache lookup time
  • llm_cache_total_store_time_ms - Total cache store time
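
These counters can be combined into derived metrics such as the cache hit ratio and the average response time. A sketch (it assumes the counters in stats_mysql_global can be used as numeric values):

SELECT
  (SELECT variable_value FROM stats_mysql_global WHERE variable_name = 'llm_cache_hits') * 100.0 /
  NULLIF((SELECT variable_value FROM stats_mysql_global WHERE variable_name = 'llm_cache_hits') +
         (SELECT variable_value FROM stats_mysql_global WHERE variable_name = 'llm_cache_misses'), 0)
      AS cache_hit_pct,
  (SELECT variable_value FROM stats_mysql_global WHERE variable_name = 'llm_total_response_time_ms') /
  NULLIF((SELECT variable_value FROM stats_mysql_global WHERE variable_name = 'llm_total_requests'), 0)
      AS avg_response_time_ms;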

Performance

Operation     Typical Latency
Local Ollama  ~1-2 seconds
Cloud API     ~2-5 seconds
Cache hit     < 50ms

Tips for better performance:

  • Use local Ollama for faster responses
  • Enable caching for repeated prompts
  • Use genai-llm_timeout_ms to limit wait time
  • Consider pre-warming the cache with common prompts (see the example below)
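
One way to pre-warm the cache is to issue your most common prompts once, for example from a deployment script, so that later similar prompts are served from the semantic cache. The prompts below are only illustrative:

-- Issued once after deployment; subsequent similar prompts should hit the cache
/* LLM: */ Explain the difference between INNER JOIN and LEFT JOIN
/* LLM: */ Summarize the customer feedback from last week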

Migration from NL2SQL

If you were using the old /* NL2SQL: */ prefix:

  1. Update your queries from /* NL2SQL: */ to /* LLM: */
  2. Update configuration variables from genai-nl2sql_* to genai-llm_*
  3. Note that the response format has changed:
    • Removed: sql_query, confidence columns
    • Added: text_response, provider columns
  4. The ai_nl2sql_convert MCP tool is deprecated and will return an error

Old NL2SQL Usage:

/* NL2SQL: */ Show top 10 customers by revenue
-- Returns: sql_query, confidence, explanation, cached

New LLM Bridge Usage:

/* LLM: */ Show top 10 customers by revenue
-- Returns: text_response, explanation, cached, provider

For true NL2SQL functionality (schema-aware SQL generation with iteration), consider using external agents that can:

  1. Analyze your database schema
  2. Iterate on query refinement
  3. Validate generated queries
  4. Execute and review results

Security

Important Notes

  • LLM responses are NOT executed automatically
  • Text responses are returned for review
  • Always validate generated code before execution
  • Keep API keys secure (use environment variables)

Best Practices

  1. Review generated code: Always check output before running
  2. Use read-only accounts: Test with limited permissions first
  3. Keep API keys secure: Don't commit them to version control
  4. Use caching wisely: Balance speed vs. data freshness
  5. Monitor usage: Check status variables regularly

API Reference

For complete API documentation, see API.md.

Architecture

For system architecture details, see ARCHITECTURE.md.

Testing

For testing information, see TESTING.md.

License

This feature is part of ProxySQL and follows the same license.