mirror of https://github.com/sysown/proxysql

User Documentation:
- README.md: Complete user guide with examples, configuration, troubleshooting

Developer Documentation:
- ARCHITECTURE.md: System architecture, components, flow diagrams
- API.md: Complete API reference for all variables, structures, and methods
- TESTING.md: Testing guide with templates and best practices

All documentation follows "very very very" thorough standards with comprehensive examples, diagrams, and cross-references.

pull/5310/head
parent aee9c3117b
commit e2d71ec4a2

@@ -0,0 +1,438 @@
# NL2SQL API Reference

## Complete API Documentation

This document provides a comprehensive reference for all NL2SQL APIs, including configuration variables, data structures, and methods.

## Table of Contents

- [Configuration Variables](#configuration-variables)
- [Data Structures](#data-structures)
- [NL2SQL_Converter Class](#nl2sql_converter-class)
- [AI_Features_Manager Class](#ai_features_manager-class)
- [MySQL Protocol Integration](#mysql-protocol-integration)
- [Error Codes](#error-codes)
- [Status Variables](#status-variables)

## Configuration Variables

All NL2SQL variables use the `ai_nl2sql_` prefix and are accessible via the ProxySQL admin interface.

### Master Switch

#### `ai_nl2sql_enabled`

- **Type**: Boolean
- **Default**: `true`
- **Description**: Enable/disable NL2SQL feature
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_enabled='true';
  LOAD MYSQL VARIABLES TO RUNTIME;
  ```

### Query Detection

#### `ai_nl2sql_query_prefix`

- **Type**: String
- **Default**: `NL2SQL:`
- **Description**: Prefix that identifies NL2SQL queries
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_query_prefix='SQL:';
  -- Now use: SQL: Show customers
  ```

### Model Selection

#### `ai_nl2sql_model_provider`

- **Type**: Enum (`ollama`, `openai`, `anthropic`)
- **Default**: `ollama`
- **Description**: Preferred LLM provider
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_model_provider='openai';
  LOAD MYSQL VARIABLES TO RUNTIME;
  ```

#### `ai_nl2sql_ollama_model`

- **Type**: String
- **Default**: `llama3.2`
- **Description**: Ollama model name
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_ollama_model='llama3.3';
  ```

#### `ai_nl2sql_openai_model`

- **Type**: String
- **Default**: `gpt-4o-mini`
- **Description**: OpenAI model name
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_openai_model='gpt-4o';
  ```

#### `ai_nl2sql_anthropic_model`

- **Type**: String
- **Default**: `claude-3-haiku`
- **Description**: Anthropic model name
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_anthropic_model='claude-3-5-sonnet-20241022';
  ```

### API Keys

#### `ai_nl2sql_openai_key`

- **Type**: String (sensitive)
- **Default**: NULL
- **Description**: OpenAI API key
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_openai_key='sk-proj-...';
  ```

#### `ai_nl2sql_anthropic_key`

- **Type**: String (sensitive)
- **Default**: NULL
- **Description**: Anthropic API key
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_anthropic_key='sk-ant-...';
  ```

### Cache Configuration

#### `ai_nl2sql_cache_similarity_threshold`

- **Type**: Integer (0-100)
- **Default**: `85`
- **Description**: Minimum similarity score for cache hit
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_cache_similarity_threshold='90';
  ```
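
As a rough illustration of how a 0-100 threshold might be applied, the sketch below computes the cosine similarity of two embeddings and compares it, scaled to a percentage, against the threshold. The helper names `cosine_similarity` and `is_cache_hit` are hypothetical, and the choice of cosine similarity is an assumption; the actual scoring used by the cache is not described here.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: cosine similarity of two embeddings, in [-1, 1].
// Returns 0 for mismatched sizes, empty input, or zero-length vectors.
double cosine_similarity(const std::vector<float>& a, const std::vector<float>& b) {
    if (a.size() != b.size() || a.empty()) return 0.0;
    double dot = 0.0, norm_a = 0.0, norm_b = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        dot    += static_cast<double>(a[i]) * b[i];
        norm_a += static_cast<double>(a[i]) * a[i];
        norm_b += static_cast<double>(b[i]) * b[i];
    }
    if (norm_a == 0.0 || norm_b == 0.0) return 0.0;
    return dot / (std::sqrt(norm_a) * std::sqrt(norm_b));
}

// Assumption: the 0-100 threshold is compared against similarity * 100.
bool is_cache_hit(double similarity, int threshold_percent) {
    return similarity * 100.0 >= static_cast<double>(threshold_percent);
}
```

Under this assumption, a similarity of 0.90 counts as a hit with the default threshold of 85, while 0.80 does not.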

### Performance

#### `ai_nl2sql_timeout_ms`

- **Type**: Integer
- **Default**: `30000` (30 seconds)
- **Description**: Maximum time to wait for LLM response
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_timeout_ms='60000';
  ```

### Routing

#### `ai_nl2sql_prefer_local`

- **Type**: Boolean
- **Default**: `true`
- **Description**: Prefer local Ollama over cloud APIs
- **Runtime**: Yes
- **Example**:
  ```sql
  SET ai_nl2sql_prefer_local='false';
  ```

## Data Structures

### NL2SQLRequest

```cpp
struct NL2SQLRequest {
    std::string natural_language;             // Natural language query text
    std::string schema_name;                  // Current database/schema name
    int max_latency_ms;                       // Max acceptable latency (ms)
    bool allow_cache;                         // Enable semantic cache lookup
    std::vector<std::string> context_tables;  // Optional table hints for schema

    NL2SQLRequest() : max_latency_ms(0), allow_cache(true) {}
};
```

#### Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `natural_language` | string | "" | The user's query in natural language |
| `schema_name` | string | "" | Current database/schema name |
| `max_latency_ms` | int | 0 | Max acceptable latency (0 = no constraint) |
| `allow_cache` | bool | true | Whether to check semantic cache |
| `context_tables` | vector<string> | {} | Optional table hints for schema context |

### NL2SQLResult

```cpp
struct NL2SQLResult {
    std::string sql_query;                 // Generated SQL query
    float confidence;                      // Confidence score 0.0-1.0
    std::string explanation;               // Which model generated this
    std::vector<std::string> tables_used;  // Tables referenced in SQL
    bool cached;                           // True if from semantic cache
    int64_t cache_id;                      // Cache entry ID for tracking

    NL2SQLResult() : confidence(0.0f), cached(false), cache_id(0) {}
};
```

#### Fields

| Field | Type | Default | Description |
|-------|------|---------|-------------|
| `sql_query` | string | "" | Generated SQL query |
| `confidence` | float | 0.0 | Confidence score (0.0-1.0) |
| `explanation` | string | "" | Model/provider info |
| `tables_used` | vector<string> | {} | Tables referenced in SQL |
| `cached` | bool | false | Whether result came from cache |
| `cache_id` | int64 | 0 | Cache entry ID |

### ModelProvider Enum

```cpp
enum class ModelProvider {
    LOCAL_OLLAMA,     // Local models via Ollama
    CLOUD_OPENAI,     // OpenAI API
    CLOUD_ANTHROPIC,  // Anthropic API
    FALLBACK_ERROR    // No model available
};
```
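
The enum values line up with the strings accepted by `ai_nl2sql_model_provider`. A minimal sketch of that mapping follows; `provider_to_string` and `provider_from_string` are hypothetical helpers written for illustration (they are not part of the documented API), and the `"none"` label for `FALLBACK_ERROR` is likewise an assumption.

```cpp
#include <string>

enum class ModelProvider {
    LOCAL_OLLAMA,     // Local models via Ollama
    CLOUD_OPENAI,     // OpenAI API
    CLOUD_ANTHROPIC,  // Anthropic API
    FALLBACK_ERROR    // No model available
};

// Hypothetical helper: enum value -> ai_nl2sql_model_provider string.
const char* provider_to_string(ModelProvider p) {
    switch (p) {
        case ModelProvider::LOCAL_OLLAMA:    return "ollama";
        case ModelProvider::CLOUD_OPENAI:    return "openai";
        case ModelProvider::CLOUD_ANTHROPIC: return "anthropic";
        case ModelProvider::FALLBACK_ERROR:  return "none";  // assumed label
    }
    return "none";
}

// Hypothetical helper: unknown names map to FALLBACK_ERROR.
ModelProvider provider_from_string(const std::string& name) {
    if (name == "ollama")    return ModelProvider::LOCAL_OLLAMA;
    if (name == "openai")    return ModelProvider::CLOUD_OPENAI;
    if (name == "anthropic") return ModelProvider::CLOUD_ANTHROPIC;
    return ModelProvider::FALLBACK_ERROR;
}
```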

## NL2SQL_Converter Class

### Constructor

```cpp
NL2SQL_Converter::NL2SQL_Converter();
```

Initializes with default configuration values.

### Destructor

```cpp
NL2SQL_Converter::~NL2SQL_Converter();
```

Frees allocated resources.

### Methods

#### `init()`

```cpp
int NL2SQL_Converter::init();
```

Initialize the NL2SQL converter.

**Returns**: `0` on success, non-zero on failure

#### `close()`

```cpp
void NL2SQL_Converter::close();
```

Shut down the converter and clean up its resources.

#### `convert()`

```cpp
NL2SQLResult NL2SQL_Converter::convert(const NL2SQLRequest& req);
```

Convert natural language to SQL.

**Parameters**:
- `req`: NL2SQL request with natural language query and context

**Returns**: NL2SQLResult with generated SQL and metadata

**Example**:
```cpp
NL2SQLRequest req;
req.natural_language = "Show top 10 customers";
req.allow_cache = true;
NL2SQLResult result = converter->convert(req);
if (result.confidence > 0.7f) {
    execute_sql(result.sql_query);
}
```

#### `clear_cache()`

```cpp
void NL2SQL_Converter::clear_cache();
```

Clear all cached NL2SQL conversions.

#### `get_cache_stats()`

```cpp
std::string NL2SQL_Converter::get_cache_stats();
```

Get cache statistics as JSON.

**Returns**: JSON string with cache metrics

**Example**:
```json
{
  "entries": 150,
  "hits": 1200,
  "misses": 300
}
```
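
One quantity that can be derived from these counters is the cache hit rate, `hits / (hits + misses)`. The sketch below assumes the JSON has already been parsed into a plain struct; `CacheStats` and `cache_hit_rate` are hypothetical names used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical mirror of the JSON fields returned by get_cache_stats().
struct CacheStats {
    std::uint64_t entries;
    std::uint64_t hits;
    std::uint64_t misses;
};

// Fraction of lookups served from the cache; 0.0 when there were no lookups.
double cache_hit_rate(const CacheStats& s) {
    const std::uint64_t lookups = s.hits + s.misses;
    if (lookups == 0) return 0.0;
    return static_cast<double>(s.hits) / static_cast<double>(lookups);
}
```

For the sample values above, this gives 1200 / (1200 + 300) = 0.8, that is, 80% of lookups were cache hits.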

## AI_Features_Manager Class

### Methods

#### `get_nl2sql()`

```cpp
NL2SQL_Converter* AI_Features_Manager::get_nl2sql();
```

Get the NL2SQL converter instance.

**Returns**: Pointer to NL2SQL_Converter or NULL

**Example**:
```cpp
NL2SQL_Converter* nl2sql = GloAI->get_nl2sql();
if (nl2sql) {
    NL2SQLResult result = nl2sql->convert(req);
}
```

#### `get_variable()`

```cpp
char* AI_Features_Manager::get_variable(const char* name);
```

Get configuration variable value.

**Parameters**:
- `name`: Variable name (without `ai_nl2sql_` prefix)

**Returns**: Variable value or NULL

**Example**:
```cpp
char* model = GloAI->get_variable("ollama_model");
```

#### `set_variable()`

```cpp
bool AI_Features_Manager::set_variable(const char* name, const char* value);
```

Set configuration variable value.

**Parameters**:
- `name`: Variable name (without `ai_nl2sql_` prefix)
- `value`: New value

**Returns**: true on success, false on failure

**Example**:
```cpp
GloAI->set_variable("ollama_model", "llama3.3");
```

## MySQL Protocol Integration

### Query Format

NL2SQL queries use a special prefix:

```sql
NL2SQL: <natural language query>
```

### Result Format

Results are returned as a standard MySQL resultset with columns:

| Column | Type | Description |
|--------|------|-------------|
| `sql_query` | TEXT | Generated SQL query |
| `confidence` | FLOAT | Confidence score |
| `explanation` | TEXT | Model info |
| `cached` | BOOLEAN | From cache |
| `cache_id` | BIGINT | Cache entry ID |

### Example Session

```sql
mysql> USE my_database;
mysql> NL2SQL: Show top 10 customers by revenue;
+---------------------------------------------+------------+-------------------------+--------+----------+
| sql_query                                   | confidence | explanation             | cached | cache_id |
+---------------------------------------------+------------+-------------------------+--------+----------+
| SELECT * FROM customers ORDER BY revenue    | 0.850      | Generated by Ollama     | 0      | 0        |
| DESC LIMIT 10                               |            | llama3.2                |        |          |
+---------------------------------------------+------------+-------------------------+--------+----------+
1 row in set (1.23 sec)
```

## Error Codes

| Code | Description | Action |
|------|-------------|--------|
| `ER_NL2SQL_DISABLED` | NL2SQL feature is disabled | Enable via `ai_nl2sql_enabled` |
| `ER_NL2SQL_TIMEOUT` | LLM request timed out | Increase `ai_nl2sql_timeout_ms` |
| `ER_NL2SQL_NO_MODEL` | No LLM model available | Configure API key or Ollama |
| `ER_NL2SQL_API_ERROR` | LLM API returned error | Check logs and API key |
| `ER_NL2SQL_INVALID_QUERY` | Query doesn't start with prefix | Use correct prefix format |

## Status Variables

Inspect the NL2SQL configuration and monitor its performance from the admin interface:

```sql
-- View all NL2SQL configuration variables currently in effect
SELECT * FROM runtime_global_variables
WHERE variable_name LIKE 'ai_nl2sql_%';

-- Key metrics
SELECT * FROM stats_ai_nl2sql;
```

| Variable | Description |
|----------|-------------|
| `nl2sql_total_requests` | Total NL2SQL conversions |
| `nl2sql_cache_hits` | Cache hit count |
| `nl2sql_local_model_calls` | Ollama API calls |
| `nl2sql_cloud_model_calls` | Cloud API calls |

## See Also

- [README.md](README.md) - User documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [TESTING.md](TESTING.md) - Testing guide

@@ -0,0 +1,434 @@
# NL2SQL Architecture

## System Overview

```
Client Query (NL2SQL: ...)
    ↓
MySQL_Session (detects prefix)
    ↓
AI_Features_Manager::get_nl2sql()
    ↓
NL2SQL_Converter::convert()
    ├─ check_vector_cache()    ← sqlite-vec similarity search
    ├─ build_prompt()          ← Schema context via MySQL_Tool_Handler
    ├─ select_model()          ← Ollama/OpenAI/Anthropic selection
    ├─ call_llm_api()          ← libcurl HTTP request
    └─ validate_sql()          ← Keyword validation
    ↓
Return Resultset (sql_query, confidence, ...)
```

## Components

### 1. NL2SQL_Converter

**Location**: `include/NL2SQL_Converter.h`, `lib/NL2SQL_Converter.cpp`

Main class coordinating the NL2SQL conversion pipeline.

**Key Methods:**
- `convert()`: Main entry point for conversion
- `check_vector_cache()`: Semantic similarity search
- `build_prompt()`: Construct LLM prompt with schema context
- `select_model()`: Choose best LLM provider
- `call_ollama()`, `call_openai()`, `call_anthropic()`: LLM API calls

**Configuration:**
```cpp
struct {
    bool enabled;
    char* query_prefix;              // Default: "NL2SQL:"
    char* model_provider;            // Default: "ollama"
    char* ollama_model;              // Default: "llama3.2"
    char* openai_model;              // Default: "gpt-4o-mini"
    char* anthropic_model;           // Default: "claude-3-haiku"
    int cache_similarity_threshold;  // Default: 85
    int timeout_ms;                  // Default: 30000
    char* openai_key;
    char* anthropic_key;
    bool prefer_local;
} config;
```

### 2. LLM_Clients

**Location**: `lib/LLM_Clients.cpp`

HTTP clients for each LLM provider using libcurl.

#### Ollama (Local)

**Endpoint**: `POST http://localhost:11434/api/generate`

**Request Format:**
```json
{
  "model": "llama3.2",
  "prompt": "Convert to SQL: Show top customers",
  "stream": false,
  "options": {
    "temperature": 0.1,
    "num_predict": 500
  }
}
```

**Response Format:**
```json
{
  "response": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
  "model": "llama3.2",
  "total_duration": 123456789
}
```
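
To make the request format concrete, here is a minimal sketch that assembles the Ollama request body shown above as a string. `json_escape` and `build_ollama_request` are hypothetical helpers; the hand-rolled escaping only keeps the example dependency-free, and a real client would more likely rely on a JSON library. The fixed `temperature` and `num_predict` values are copied from the example request.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: escape a string for use inside a JSON string literal.
std::string json_escape(const std::string& in) {
    std::string out;
    out.reserve(in.size() + 8);
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX escapes.
                    char buf[8];
                    std::snprintf(buf, sizeof(buf), "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

// Hypothetical helper: build the non-streaming /api/generate request body.
std::string build_ollama_request(const std::string& model, const std::string& prompt) {
    return std::string("{\"model\":\"") + json_escape(model) +
           "\",\"prompt\":\"" + json_escape(prompt) +
           "\",\"stream\":false,\"options\":{\"temperature\":0.1,\"num_predict\":500}}";
}
```

Escaping matters here because the prompt embeds user-supplied natural language, which may contain quotes or newlines.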

#### OpenAI (Cloud)

**Endpoint**: `POST https://api.openai.com/v1/chat/completions`

**Headers:**
- `Content-Type: application/json`
- `Authorization: Bearer sk-...`

**Request Format:**
```json
{
  "model": "gpt-4o-mini",
  "messages": [
    {"role": "system", "content": "You are a SQL expert..."},
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "temperature": 0.1,
  "max_tokens": 500
}
```

**Response Format:**
```json
{
  "choices": [{
    "message": {
      "content": "SELECT * FROM customers ORDER BY revenue DESC LIMIT 10",
      "role": "assistant"
    },
    "finish_reason": "stop"
  }],
  "usage": {"total_tokens": 123}
}
```

#### Anthropic (Cloud)

**Endpoint**: `POST https://api.anthropic.com/v1/messages`

**Headers:**
- `Content-Type: application/json`
- `x-api-key: sk-ant-...`
- `anthropic-version: 2023-06-01`

**Request Format:**
```json
{
  "model": "claude-3-haiku-20240307",
  "max_tokens": 500,
  "messages": [
    {"role": "user", "content": "Convert to SQL: Show top customers"}
  ],
  "system": "You are a SQL expert...",
  "temperature": 0.1
}
```

**Response Format:**
```json
{
  "content": [{"type": "text", "text": "SELECT * FROM customers..."}],
  "model": "claude-3-haiku-20240307",
  "usage": {"input_tokens": 10, "output_tokens": 20}
}
```

### 3. Vector Cache

**Location**: Uses `SQLite3DB` with sqlite-vec extension

**Tables:**

```sql
-- Cache entries
CREATE TABLE nl2sql_cache (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    natural_language TEXT NOT NULL,
    sql_query TEXT NOT NULL,
    model_provider TEXT,
    confidence REAL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP
);

-- Virtual table for similarity search
CREATE VIRTUAL TABLE nl2sql_cache_vec USING vec0(
    embedding FLOAT[1536], -- Dimension depends on embedding model
    id INTEGER PRIMARY KEY
);
```

**Similarity Search:**
```sql
SELECT nc.sql_query, nc.confidence, distance
FROM nl2sql_cache_vec
JOIN nl2sql_cache nc ON nl2sql_cache_vec.id = nc.id
WHERE embedding MATCH ?
  AND k = 10 -- Return top 10 matches
ORDER BY distance
LIMIT 1;
```

### 4. MySQL_Session Integration

**Location**: `lib/MySQL_Session.cpp` (around line 6867)

Query interception flow:

1. Detect `NL2SQL:` prefix in query
2. Extract natural language text
3. Call `GloAI->get_nl2sql()->convert()`
4. Return generated SQL as resultset
5. User can review and execute
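
Steps 1 and 2 of the flow above can be sketched as a small standalone function. This is an illustration only: `extract_nl_query` is a hypothetical name, and the case-insensitive match, the skipping of leading whitespace, and the stripping of a trailing `;` are assumptions, not documented behavior of `MySQL_Session`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: if `query` starts with `prefix` (ignoring leading
// whitespace and letter case), store the remaining natural-language text
// in `out` and return true; otherwise return false and leave `out` untouched.
bool extract_nl_query(const std::string& query, const std::string& prefix, std::string& out) {
    if (prefix.empty()) return false;

    // Skip leading whitespace.
    std::size_t pos = 0;
    while (pos < query.size() && std::isspace(static_cast<unsigned char>(query[pos]))) ++pos;
    if (query.size() - pos < prefix.size()) return false;

    // Case-insensitive prefix comparison (assumption).
    for (std::size_t i = 0; i < prefix.size(); ++i) {
        const int a = std::toupper(static_cast<unsigned char>(query[pos + i]));
        const int b = std::toupper(static_cast<unsigned char>(prefix[i]));
        if (a != b) return false;
    }
    pos += prefix.size();

    // Trim whitespace after the prefix, plus trailing whitespace and ';'.
    while (pos < query.size() && std::isspace(static_cast<unsigned char>(query[pos]))) ++pos;
    std::size_t end = query.size();
    while (end > pos &&
           (std::isspace(static_cast<unsigned char>(query[end - 1])) || query[end - 1] == ';')) {
        --end;
    }
    out = query.substr(pos, end - pos);
    return true;
}
```

The extracted text would then be placed in `NL2SQLRequest::natural_language` before calling `convert()`.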

### 5. AI_Features_Manager

**Location**: `include/AI_Features_Manager.h`, `lib/AI_Features_Manager.cpp`

Coordinates all AI features including NL2SQL.

**Responsibilities:**
- Initialize vector database
- Create and manage NL2SQL_Converter instance
- Handle configuration variables with `ai_nl2sql_` prefix
- Provide thread-safe access to components

## Flow Diagrams

### Conversion Flow

```
┌─────────────────┐
│ NL2SQL Request  │
└────────┬────────┘
         │
         ▼
┌─────────────────────────┐
│ Check Vector Cache      │
│ - Generate embedding    │
│ - Similarity search     │
└────────┬────────────────┘
         │
    ┌────┴────┐
    │ Cache   │ No ───────────────┐
    │ Hit?    │                   │
    └────┬────┘                   │
         │ Yes                    │
         ▼                        │
   Return Cached                  ▼
┌──────────────────┐     ┌─────────────────┐
│ Build Prompt     │     │ Select Model    │
│ - System role    │     │ - Latency       │
│ - Schema context │     │ - Preference    │
│ - User query     │     │ - API keys      │
└────────┬─────────┘     └────────┬────────┘
         │                        │
         └─────────┬──────────────┘
                   ▼
          ┌──────────────────┐
          │ Call LLM API     │
          │ - libcurl HTTP   │
          │ - JSON parse     │
          └────────┬─────────┘
                   │
                   ▼
          ┌──────────────────┐
          │ Validate SQL     │
          │ - Keyword check  │
          │ - Clean output   │
          └────────┬─────────┘
                   │
                   ▼
          ┌──────────────────┐
          │ Store in Cache   │
          │ - Embed query    │
          │ - Save result    │
          └────────┬─────────┘
                   │
                   ▼
          ┌──────────────────┐
          │ Return Result    │
          │ - sql_query      │
          │ - confidence     │
          │ - explanation    │
          └──────────────────┘
```

### Model Selection Logic

```
┌─────────────────────────────────┐
│ Start: Select Model             │
└────────────┬────────────────────┘
             │
             ▼
    ┌─────────────────────┐
    │ max_latency_ms <    │──── Yes ────┐
    │ 500ms?              │             │
    └────────┬────────────┘             │
             │ No                       │
             ▼                          │
    ┌─────────────────────┐             │
    │ Check provider      │             │
    │ preference          │             │
    └────────┬────────────┘             │
             │                          │
      ┌──────┴──────┐                   │
      │             │                   │
      ▼             ▼                   │
    OpenAI      Anthropic            Ollama
      │             │                   │
      ▼             ▼                   │
 ┌─────────┐   ┌─────────┐         ┌─────────┐
 │ API key │   │ API key │         │ Return  │
 │ set?    │   │ set?    │         │ OLLAMA  │
 └────┬────┘   └────┬────┘         └─────────┘
      │             │
     Yes           Yes
      │             │
      └──────┬──────┘
             │
             ▼
      ┌──────────────┐
      │ Return cloud │
      │ provider     │
      └──────────────┘
```
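
The decision tree above can be expressed as a short function. The sketch below is one reading of the diagram, not the actual `select_model()` implementation: `ModelConfig` is a hypothetical subset of the configuration, a `max_latency_ms` of `0` is treated as "no constraint" (as stated in the `NL2SQLRequest` field table), and a missing API key falls back to Ollama, as the README describes.

```cpp
#include <string>

enum class ModelProvider {
    LOCAL_OLLAMA,
    CLOUD_OPENAI,
    CLOUD_ANTHROPIC,
    FALLBACK_ERROR
};

// Hypothetical subset of the NL2SQL configuration used for selection.
struct ModelConfig {
    std::string model_provider;  // "ollama", "openai" or "anthropic"
    std::string openai_key;      // empty when not configured
    std::string anthropic_key;   // empty when not configured
};

ModelProvider select_model(int max_latency_ms, const ModelConfig& cfg) {
    // Tight latency budget: prefer the local model.
    // Assumption: 0 means "no constraint" and does not trigger this branch.
    if (max_latency_ms > 0 && max_latency_ms < 500) {
        return ModelProvider::LOCAL_OLLAMA;
    }
    // Honor the provider preference only when its API key is set.
    if (cfg.model_provider == "openai" && !cfg.openai_key.empty()) {
        return ModelProvider::CLOUD_OPENAI;
    }
    if (cfg.model_provider == "anthropic" && !cfg.anthropic_key.empty()) {
        return ModelProvider::CLOUD_ANTHROPIC;
    }
    // Missing key or explicit "ollama" preference: use the local model.
    return ModelProvider::LOCAL_OLLAMA;
}
```

The diagram does not show what happens when an API key is missing; the fallback to Ollama in the last line is taken from the README's "Falls back to Ollama if keys missing".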

## Data Structures

### NL2SQLRequest

```cpp
struct NL2SQLRequest {
    std::string natural_language;             // Input query
    std::string schema_name;                  // Current schema
    int max_latency_ms;                       // Latency requirement
    bool allow_cache;                         // Enable cache lookup
    std::vector<std::string> context_tables;  // Optional table hints
};
```

### NL2SQLResult

```cpp
struct NL2SQLResult {
    std::string sql_query;                 // Generated SQL
    float confidence;                      // 0.0-1.0 score
    std::string explanation;               // Model info
    std::vector<std::string> tables_used;  // Referenced tables
    bool cached;                           // From cache
    int64_t cache_id;                      // Cache entry ID
};
```

## Configuration Management

### Variable Namespacing

All NL2SQL variables use `ai_nl2sql_` prefix:

```
ai_nl2sql_enabled
ai_nl2sql_query_prefix
ai_nl2sql_model_provider
ai_nl2sql_ollama_model
ai_nl2sql_openai_model
ai_nl2sql_anthropic_model
ai_nl2sql_cache_similarity_threshold
ai_nl2sql_timeout_ms
ai_nl2sql_openai_key
ai_nl2sql_anthropic_key
ai_nl2sql_prefer_local
```

### Variable Persistence

```
Runtime (values in use by ProxySQL)
    ↑
    | LOAD MYSQL VARIABLES TO RUNTIME
    |
Memory (admin configuration, changed by SET ai_nl2sql_... = 'value')
    |
    | SAVE MYSQL VARIABLES TO DISK
    ↓
Disk (on-disk SQLite database)
```

## Thread Safety

- **NL2SQL_Converter**: NOT thread-safe by itself
- **AI_Features_Manager**: Provides thread-safe access via `wrlock()`/`wrunlock()`
- **Vector Cache**: Thread-safe via SQLite mutex

## Error Handling

### Error Categories

1. **LLM API Errors**: Timeout, connection failure, auth failure
   - Fallback: Try next available provider
   - Return: Empty SQL with error in explanation

2. **SQL Validation Failures**: Doesn't look like SQL
   - Return: SQL with warning comment
   - Confidence: Low (0.3)

3. **Cache Errors**: Database failures
   - Fallback: Continue without cache
   - Log: Warning in ProxySQL log
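
The first category (try the next available provider, and return empty SQL with the error in the explanation when all fail) can be sketched as follows. Everything here is illustrative: `LLMAttempt`, `ConversionOutcome`, and `convert_with_fallback` are hypothetical names, and the provider calls are modeled as plain callables instead of real HTTP requests.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical pairing of a provider name with the call that queries it.
// The callable fills `sql` on success or `error` on failure.
struct LLMAttempt {
    std::string name;
    std::function<bool(std::string& sql, std::string& error)> call;
};

// Hypothetical reduced result: only the fields relevant to error handling.
struct ConversionOutcome {
    std::string sql_query;    // empty when every provider failed
    std::string explanation;  // model info on success, error summary on failure
};

ConversionOutcome convert_with_fallback(const std::vector<LLMAttempt>& attempts) {
    ConversionOutcome out;
    if (attempts.empty()) {
        out.explanation = "No providers configured";
        return out;
    }
    std::string errors;
    for (const LLMAttempt& attempt : attempts) {
        std::string sql;
        std::string error;
        if (attempt.call(sql, error) && !sql.empty()) {
            out.sql_query = sql;
            out.explanation = "Generated by " + attempt.name;
            return out;
        }
        // Record the failure and move on to the next provider.
        if (error.empty()) error = "unknown error";
        if (!errors.empty()) errors += "; ";
        errors += attempt.name + ": " + error;
    }
    out.explanation = "All providers failed (" + errors + ")";
    return out;
}
```

Returning the collected errors in `explanation` keeps the resultset shape unchanged while still telling the user why no SQL was produced.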

### Logging

All NL2SQL operations log to `proxysql.log`:

```
NL2SQL: Converting query: Show top customers
NL2SQL: Selecting local Ollama due to latency constraint
NL2SQL: Calling Ollama with model: llama3.2
NL2SQL: Conversion complete. Confidence: 0.85
```

## Performance Considerations

### Optimization Strategies

1. **Caching**: Enable for repeated queries
2. **Local First**: Prefer Ollama for lower latency
3. **Timeout**: Set appropriate `ai_nl2sql_timeout_ms`
4. **Batch Requests**: Not yet implemented (planned)

### Resource Usage

- **Memory**: Vector cache grows with usage
- **Network**: HTTP requests for each cache miss
- **CPU**: Embedding generation for cache entries

## Future Enhancements

- **Phase 3**: Full vector cache implementation
- **Phase 3**: Schema context retrieval via MySQL_Tool_Handler
- **Phase 4**: Async conversion API
- **Phase 5**: Batch query conversion
- **Phase 6**: Custom fine-tuned models

## See Also

- [README.md](README.md) - User documentation
- [API.md](API.md) - Complete API reference
- [TESTING.md](TESTING.md) - Testing guide

@@ -0,0 +1,220 @@
# NL2SQL - Natural Language to SQL for ProxySQL

## Overview

NL2SQL (Natural Language to SQL) is a ProxySQL feature that converts natural language questions into SQL queries using Large Language Models (LLMs).

## Features

- **Hybrid Deployment**: Local Ollama + Cloud APIs (OpenAI, Anthropic)
- **Semantic Caching**: Vector-based cache for similar queries using sqlite-vec
- **Schema Awareness**: Understands your database schema for better conversions
- **Multi-Provider**: Switch between LLM providers seamlessly
- **Security**: Generated SQL is returned for review before execution

## Quick Start

### 1. Enable NL2SQL

```sql
-- Via admin interface
SET ai_nl2sql_enabled='true';
LOAD MYSQL VARIABLES TO RUNTIME;
```

### 2. Configure LLM Provider

**Using local Ollama (default):**

```sql
SET ai_nl2sql_model_provider='ollama';
SET ai_nl2sql_ollama_model='llama3.2';
LOAD MYSQL VARIABLES TO RUNTIME;
```

**Using OpenAI:**

```sql
SET ai_nl2sql_model_provider='openai';
SET ai_nl2sql_openai_model='gpt-4o-mini';
SET ai_nl2sql_openai_key='sk-...';
LOAD MYSQL VARIABLES TO RUNTIME;
```

**Using Anthropic:**

```sql
SET ai_nl2sql_model_provider='anthropic';
SET ai_nl2sql_anthropic_model='claude-3-haiku';
SET ai_nl2sql_anthropic_key='sk-ant-...';
LOAD MYSQL VARIABLES TO RUNTIME;
```

### 3. Use NL2SQL

```sql
-- Optional: confirm from the admin interface that NL2SQL is enabled
mysql> SELECT * FROM runtime_global_variables WHERE variable_name='ai_nl2sql_enabled';

-- In your SQL client, prefix your query with "NL2SQL:" to have it converted to SQL
mysql> NL2SQL: Show top 10 customers by revenue;
```

## Configuration

### Variables

| Variable | Default | Description |
|----------|---------|-------------|
| `ai_nl2sql_enabled` | true | Enable/disable NL2SQL |
| `ai_nl2sql_query_prefix` | NL2SQL: | Prefix for NL2SQL queries |
| `ai_nl2sql_model_provider` | ollama | LLM provider (ollama/openai/anthropic) |
| `ai_nl2sql_ollama_model` | llama3.2 | Ollama model name |
| `ai_nl2sql_openai_model` | gpt-4o-mini | OpenAI model name |
| `ai_nl2sql_anthropic_model` | claude-3-haiku | Anthropic model name |
| `ai_nl2sql_cache_similarity_threshold` | 85 | Semantic similarity threshold (0-100) |
| `ai_nl2sql_timeout_ms` | 30000 | LLM request timeout in milliseconds |
| `ai_nl2sql_prefer_local` | true | Prefer local models when possible |

### Model Selection

The system automatically selects the best model based on:

1. **Latency requirements**: Local Ollama for fast queries (< 500ms)
2. **API key availability**: Falls back to Ollama if keys missing
3. **User preference**: Respects `ai_nl2sql_model_provider` setting

## Examples

### Basic Queries

```
NL2SQL: Show all users
NL2SQL: Find orders with amount > 100
NL2SQL: Count customers by country
```

### Complex Queries

```
NL2SQL: Show top 5 customers by total order amount
NL2SQL: Find customers who placed orders in the last 30 days
NL2SQL: What is the average order value per month?
```

### Schema-Aware Queries

```
-- Switch to your schema first
USE my_database;
NL2SQL: List all products in the Electronics category
NL2SQL: Find orders that contain specific products
```

### Results

NL2SQL returns a resultset with:
- `sql_query`: Generated SQL
- `confidence`: 0.0-1.0 score
- `explanation`: Which model was used
- `cached`: Whether from semantic cache

## Troubleshooting

### NL2SQL returns empty result

1. Check AI module is initialized:
   ```sql
   SELECT * FROM runtime_global_variables WHERE variable_name LIKE 'ai_%';
   ```

2. Verify LLM is accessible:
   ```bash
   # For Ollama
   curl http://localhost:11434/api/tags

   # For cloud APIs, check your API keys
   ```

3. Check logs:
   ```bash
   tail -f proxysql.log | grep NL2SQL
   ```

### Poor quality SQL

1. **Try a different model:**
   ```sql
   SET ai_nl2sql_ollama_model='llama3.3';
   ```

2. **Increase timeout for complex queries:**
   ```sql
   SET ai_nl2sql_timeout_ms=60000;
   ```

3. **Check confidence score:**
   - High confidence (> 0.7): Generally reliable
   - Medium confidence (0.4-0.7): Review before using
   - Low confidence (< 0.4): May need manual correction
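
The confidence bands above translate directly into a small classifier, which may be handy when post-processing results in application code. `ConfidenceBand` and `classify_confidence` are hypothetical names used only for this sketch; the thresholds are the ones listed above, with the boundary value 0.7 counted as medium.

```cpp
enum class ConfidenceBand { LOW, MEDIUM, HIGH };

// Hypothetical helper mapping a 0.0-1.0 confidence score to a band:
//   > 0.7      -> HIGH   (generally reliable)
//   0.4 - 0.7  -> MEDIUM (review before using)
//   < 0.4      -> LOW    (may need manual correction)
ConfidenceBand classify_confidence(float confidence) {
    if (confidence > 0.7f) return ConfidenceBand::HIGH;
    if (confidence >= 0.4f) return ConfidenceBand::MEDIUM;
    return ConfidenceBand::LOW;
}
```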

### Cache Issues

```sql
-- Clear cache (Phase 3 feature)
-- TODO: Add cache clearing command

-- Check cache stats
SELECT * FROM stats_ai_nl2sql_cache;
```

## Performance

| Operation | Typical Latency |
|-----------|-----------------|
| Local Ollama | ~1-2 seconds |
| Cloud API | ~2-5 seconds |
| Cache hit | < 50ms |

**Tips for better performance:**
- Use local Ollama for faster responses
- Enable caching for repeated queries
- Use `ai_nl2sql_timeout_ms` to limit wait time
- Consider pre-warming cache with common queries

## Security

### Important Notes

- NL2SQL queries are **NOT executed automatically**
- Generated SQL is returned for **review first**
- Always validate generated SQL before execution
- Keep API keys secure (use environment variables)

### Best Practices

1. **Review generated SQL**: Always check the output before running
2. **Use read-only accounts**: Test with limited permissions first
3. **Monitor confidence scores**: Low confidence may indicate errors
4. **Keep API keys secure**: Don't commit them to version control
5. **Use caching wisely**: Balance speed vs. data freshness

## API Reference

For complete API documentation, see [API.md](API.md).

## Architecture

For system architecture details, see [ARCHITECTURE.md](ARCHITECTURE.md).

## Testing

For testing information, see [TESTING.md](TESTING.md).

## Version History

- **0.1.0** (2025-01-16): Initial release with Ollama, OpenAI, Anthropic support

## License

This feature is part of ProxySQL and follows the same license.

@ -0,0 +1,411 @@
# NL2SQL Testing Guide

## Test Suite Overview

| Test Type | Location | Purpose | LLM Required |
|-----------|----------|---------|--------------|
| Unit Tests | `test/tap/tests/nl2sql_*.cpp` | Test individual components | Mocked |
| Integration | `test/tap/tests/nl2sql_integration-t.cpp` | Test with real database | Mocked/Live |
| E2E | `scripts/mcp/test_nl2sql_e2e.sh` | Complete workflow | Live |
| MCP Tools | `scripts/mcp/test_nl2sql_tools.sh` | MCP protocol | Live |

## Test Infrastructure

### TAP Framework

ProxySQL uses the Test Anything Protocol (TAP) for C++ tests.

**Key Functions:**
```cpp
plan(number_of_tests);  // Declare how many tests
ok(condition, description);  // Test with description
diag(message);  // Print diagnostic message
skip(count, reason);  // Skip tests
exit_status();  // Return proper exit code
```

**Example:**
```cpp
#include "tap.h"

int main() {
    plan(2);  // Must match the number of ok() calls below
    ok(1 + 1 == 2, "Basic math works");
    ok(true, "Always true");
    diag("This is a diagnostic message");
    return exit_status();
}
```
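
To make the protocol itself concrete, the sketch below formats the lines a TAP producer emits: a plan line (`1..N`) followed by one `ok` or `not ok` line per test. It is a standalone illustration using only the standard library; the function names are hypothetical and this is not how ProxySQL's `tap.h` is implemented.

```cpp
#include <string>

// Plan line announcing how many tests will run, e.g. "1..2".
std::string tap_plan_line(int count) {
    return "1.." + std::to_string(count);
}

// Result line for one test, e.g. "ok 1 - Basic math works".
std::string tap_result_line(int number, bool passed, const std::string& description) {
    return std::string(passed ? "ok " : "not ok ") + std::to_string(number) + " - " + description;
}
```

A TAP consumer compares the count in the plan line with the number of result lines, which is why the argument to `plan()` must match the number of `ok()` calls.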

### CommandLine Helper

Reads test connection parameters from the environment:

```cpp
CommandLine cl;
if (cl.getEnv()) {
    diag("Failed to get environment");
    return -1;
}

// cl.host, cl.admin_username, cl.admin_password, cl.admin_port
```
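
The general pattern behind such a helper is "read an environment variable, fall back to a default, validate the value". The sketch below shows that pattern with the standard library only. It is not the `CommandLine` implementation, and the function names and any variable names you pass in are hypothetical.

```cpp
#include <cstdlib>
#include <string>

// Returns the value of an environment variable, or `fallback` when unset or empty.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return (value && *value) ? std::string(value) : fallback;
}

// Parses a TCP port; returns `fallback` for non-numeric or out-of-range input.
int parse_port(const std::string& text, int fallback) {
    if (text.empty()) return fallback;
    char* end = nullptr;
    const long value = std::strtol(text.c_str(), &end, 10);
    if (*end != '\0' || value < 1 || value > 65535) return fallback;
    return static_cast<int>(value);
}
```

For example, `parse_port(env_or("MY_ADMIN_PORT", "6032"), 6032)` yields 6032 unless the (hypothetical) variable `MY_ADMIN_PORT` holds another valid port.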

## Running Tests

### Unit Tests

```bash
cd test/tap/tests

# Build specific test
make nl2sql_unit_base-t

# Run the test
./nl2sql_unit_base-t

# Build all NL2SQL tests (one make target per test source file)
for src in nl2sql_*-t.cpp; do make "${src%.cpp}"; done
```

### Integration Tests

```bash
cd test/tap/tests
make nl2sql_integration-t
./nl2sql_integration-t
```

### E2E Tests

```bash
# With mocked LLM (faster)
./scripts/mcp/test_nl2sql_e2e.sh --mock

# With live LLM
./scripts/mcp/test_nl2sql_e2e.sh --live
```

### All Tests

```bash
# Run all NL2SQL tests
make test_nl2sql

# Run with verbose output
PROXYSQL_VERBOSE=1 make test_nl2sql
```

## Test Coverage

### Unit Tests (`nl2sql_unit_base-t.cpp`)

- [x] Initialization
- [x] Basic conversion (mocked)
- [x] Configuration management
- [x] Variable persistence
- [x] Error handling

### Prompt Builder Tests (`nl2sql_prompt_builder-t.cpp`)

- [x] Basic prompt construction
- [x] Schema context inclusion
- [x] System instruction formatting
- [x] Edge cases (empty, special characters)
- [x] Prompt structure validation

### Model Selection Tests (`nl2sql_model_selection-t.cpp`)

- [x] Latency-based selection
- [x] Provider preference handling
- [x] API key fallback logic
- [x] Default selection
- [x] Configuration integration

### Integration Tests (`nl2sql_integration-t.cpp`)

- [ ] Schema-aware conversion
- [ ] Multi-table queries
- [ ] Complex SQL patterns
- [ ] Error recovery

### E2E Tests (`test_nl2sql_e2e.sh`)

- [x] Simple SELECT
- [x] WHERE conditions
- [x] JOIN queries
- [x] Aggregations
- [x] Date handling

## Writing New Tests

### Test File Template

```cpp
/**
 * @file nl2sql_your_feature-t.cpp
 * @brief TAP tests for your feature
 *
 * @date 2025-01-16
 */

#include <algorithm>
#include <string>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <vector>

#include "mysql.h"
#include "mysqld_error.h"

#include "tap.h"
#include "command_line.h"
#include "utils.h"

using std::string;

MYSQL* g_admin = NULL;

// ============================================================================
// Helper Functions
// ============================================================================

string get_variable(const char* name) {
    // Implementation: query g_admin for the variable and return its value
    return "";  // placeholder return value
}

bool set_variable(const char* name, const char* value) {
    // Implementation: issue the SET statement through g_admin
    return false;  // placeholder return value
}

// ============================================================================
// Test: Your Test Category
// ============================================================================

void test_your_category() {
    diag("=== Your Test Category ===");

    // Test 1 (replace `condition` with the expression under test)
    ok(condition, "Test description");

    // Test 2
    ok(condition, "Another test");
}

// ============================================================================
// Main
// ============================================================================

int main(int argc, char** argv) {
    CommandLine cl;
    if (cl.getEnv()) {
        diag("Error getting environment");
        return -1;  // no plan declared yet, so report the failure directly
    }

    g_admin = mysql_init(NULL);
    if (!mysql_real_connect(g_admin, cl.host, cl.admin_username,
                            cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
        diag("Failed to connect to admin: %s", mysql_error(g_admin));
        mysql_close(g_admin);
        return -1;
    }

    plan(number_of_tests);  // must equal the total number of ok() calls

    test_your_category();

    mysql_close(g_admin);
    return exit_status();
}
```
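
One way to fill in `set_variable()` is to first build the statement text and then send it with `mysql_query()`. The sketch below covers only the string-building half so it can be tested without a server. It assumes, based on the `ai_nl2sql_` prefix described in the user guide, that helpers take the short variable name and prepend the prefix; the function names are hypothetical, and only single quotes are escaped (backslashes are left untouched).

```cpp
#include <string>

// Doubles single quotes so the value can sit inside a '...' SQL literal.
std::string escape_single_quotes(const std::string& value) {
    std::string out;
    for (char c : value) {
        out += c;
        if (c == '\'') out += '\'';
    }
    return out;
}

// Builds the SET statement for a short variable name.
// Assumption: the full variable name is "ai_nl2sql_" + short_name.
std::string build_set_statement(const std::string& short_name, const std::string& value) {
    return "SET ai_nl2sql_" + short_name + "='" + escape_single_quotes(value) + "'";
}
```

Inside `set_variable()`, the result would be passed as `mysql_query(g_admin, stmt.c_str())`, followed by the `LOAD ... TO RUNTIME` command your configuration requires.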

### Test Naming Conventions

- **Files**: `nl2sql_feature_name-t.cpp`
- **Functions**: `test_feature_category()`
- **Descriptions**: "Feature does something"

### Test Organization

```cpp
// Section dividers
// ============================================================================
// Section Name
// ============================================================================

// Test function with docstring
/**
 * @test Test name
 * @description What it tests
 * @expected What should happen
 */
void test_something() {
    diag("=== Test Category ===");
    // Tests...
}
```

### Best Practices

1. **Use diag() for section headers**:
   ```cpp
   diag("=== Configuration Tests ===");
   ```

2. **Provide meaningful test descriptions**:
   ```cpp
   ok(result == expected, "Variable set to 'value' reflects in runtime");
   ```

3. **Clean up after tests**:
   ```cpp
   // Restore original values
   set_variable("model", orig_value.c_str());
   ```

4. **Handle both stub and real implementations**:
   ```cpp
   ok(value == expected || value.empty(),
      "Value matches expected or is empty (stub)");
   ```
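
For the clean-up practice, a scope guard can restore the original value even when a test function returns early. This is a minimal sketch, not part of the ProxySQL test utilities: `ScopedRestore` is a hypothetical class, and the setter it receives stands in for a call such as `set_variable()`.

```cpp
#include <functional>
#include <string>
#include <utility>

// Calls `setter(original)` when the guard goes out of scope.
// The setter should not throw, because it runs inside a destructor.
class ScopedRestore {
public:
    ScopedRestore(std::function<void(const std::string&)> setter, std::string original)
        : setter_(std::move(setter)), original_(std::move(original)) {}
    ~ScopedRestore() {
        if (setter_) setter_(original_);
    }
    ScopedRestore(const ScopedRestore&) = delete;
    ScopedRestore& operator=(const ScopedRestore&) = delete;

private:
    std::function<void(const std::string&)> setter_;
    std::string original_;
};
```

In a test, the guard would be created right after reading the original value and before changing it, so every exit path from the function restores the variable.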

## Mocking LLM Responses

For fast unit tests, mock LLM responses:

```cpp
string mock_llm_response(const string& query) {
    if (query.find("SELECT") != string::npos) {
        return "SELECT * FROM table";
    }
    // Other patterns...
    return "";  // no pattern matched
}
```
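
When the number of patterns grows, a table of rules is easier to maintain than a chain of `if` statements. The sketch below is one possible shape for such a mock: the function name, the rule format, and the canned SQL strings are all hypothetical test data, not output from a real model.

```cpp
#include <string>
#include <utility>
#include <vector>

// Each rule is (substring to look for in the prompt, canned SQL to return).
// Rules are checked in order and the first match wins, so list specific rules first.
std::string mock_llm_lookup(const std::string& prompt,
                            const std::vector<std::pair<std::string, std::string>>& rules,
                            const std::string& fallback) {
    for (const auto& rule : rules) {
        if (prompt.find(rule.first) != std::string::npos) return rule.second;
    }
    return fallback;  // nothing matched
}
```

Returning an explicit fallback keeps the mock deterministic, which lets a test assert on the "no match" path as well.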

## Debugging Tests

### Enable Verbose Output

```bash
# Verbose TAP output
./nl2sql_unit_base-t -v

# ProxySQL debug output
PROXYSQL_VERBOSE=1 ./nl2sql_unit_base-t
```

### GDB Debugging

```bash
gdb ./nl2sql_unit_base-t
(gdb) break main
(gdb) run
(gdb) backtrace
```

### SQL Debugging

```cpp
// Print generated SQL
diag("Generated SQL: %s", sql.c_str());

// Check MySQL errors
if (mysql_query(g_admin, query)) {
    diag("MySQL error: %s", mysql_error(g_admin));
}
```

## Continuous Integration

### GitHub Actions (Planned)

```yaml
name: NL2SQL Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Build ProxySQL
        run: make
      - name: Run NL2SQL Tests
        run: make test_nl2sql
```

## Test Data

### Sample Schema

Tests use a standard test schema:

```sql
CREATE TABLE customers (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    country VARCHAR(50),
    created_at DATE
);

CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    total DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
```

### Sample Queries

```sql
-- Simple
NL2SQL: Show all customers

-- With conditions
NL2SQL: Find customers from USA

-- JOIN
NL2SQL: Show orders with customer names

-- Aggregation
NL2SQL: Count customers by country
```
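
Each sample query starts with the configured prefix (`NL2SQL:` by default), which is what marks it as a natural-language request. A test helper that separates the prefix from the request text could look like the sketch below. The function name is hypothetical, and the case-sensitive comparison is an assumption; this is not ProxySQL's own detection logic.

```cpp
#include <string>

// If `query` starts with `prefix`, stores the remaining text (without leading
// spaces or tabs) in `out` and returns true. The comparison is case-sensitive.
bool strip_nl2sql_prefix(const std::string& query, const std::string& prefix, std::string& out) {
    if (prefix.empty() || query.compare(0, prefix.size(), prefix) != 0) return false;
    const std::size_t pos = query.find_first_not_of(" \t", prefix.size());
    out = (pos == std::string::npos) ? std::string() : query.substr(pos);
    return true;
}
```

With the default prefix, `"NL2SQL: Show all customers"` yields the request text `"Show all customers"`, while an ordinary `SELECT` is left alone.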

## Performance Testing

### Benchmark Script

```bash
#!/bin/bash
# benchmark_nl2sql.sh
# Requires GNU date for nanosecond resolution (%N).

for i in {1..100}; do
    start=$(date +%s%N)
    # Discard the query output so that only the timings reach awk
    mysql -h 127.0.0.1 -P 6033 -e "NL2SQL: Show top customers" > /dev/null
    end=$(date +%s%N)
    echo $((end - start))
done | awk '{sum+=$1} END {print sum/NR " ns average"}'
```
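
An average alone can hide slow outliers, so it can help to look at a percentile as well. The sketch below post-processes nanosecond samples like the ones the script prints. It uses the nearest-rank percentile definition; the function names are hypothetical and the code is independent of ProxySQL.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean of nanosecond samples, in milliseconds (0 for an empty input).
double mean_ms(const std::vector<long long>& samples_ns) {
    if (samples_ns.empty()) return 0.0;
    double total = 0.0;
    for (long long ns : samples_ns) total += static_cast<double>(ns);
    return total / static_cast<double>(samples_ns.size()) / 1e6;
}

// Nearest-rank percentile (p in (0, 100]) in milliseconds (0 for an empty input).
double percentile_ms(std::vector<long long> samples_ns, double p) {
    if (samples_ns.empty()) return 0.0;
    std::sort(samples_ns.begin(), samples_ns.end());
    const std::size_t count = samples_ns.size();
    std::size_t rank = static_cast<std::size_t>(std::ceil(p / 100.0 * static_cast<double>(count)));
    if (rank < 1) rank = 1;
    if (rank > count) rank = count;
    return static_cast<double>(samples_ns[rank - 1]) / 1e6;
}
```

Comparing `mean_ms` with `percentile_ms(samples, 95)` shows whether a few slow LLM calls dominate the average.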

## Known Issues

1. **Stub Implementation**: Many features return empty/placeholder values
2. **Live LLM Required**: Some tests need Ollama running
3. **Timing Dependent**: Cache tests may fail on slow systems
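
Timing-dependent failures can often be reduced by polling for a condition with a deadline instead of sleeping for a fixed time. The sketch below shows that pattern with the standard library; `wait_until` is a hypothetical helper, not an existing ProxySQL test utility.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` until it returns true or `timeout` elapses.
// Returns true if the condition was met before the deadline.
bool wait_until(const std::function<bool()>& condition,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds poll_interval) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (condition()) return true;
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(poll_interval);
    }
}
```

A cache test could then wrap the result in `ok(...)`, using a timeout that is generous on slow machines without slowing down fast ones.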

## Contributing Tests

When contributing new tests:

1. Follow the template above
2. Add to Makefile if needed
3. Update this documentation
4. Ensure tests pass with `make test_nl2sql`

## See Also

- [README.md](README.md) - User documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [API.md](API.md) - API reference