mirror of https://github.com/sysown/proxysql
LLM Bridge Testing Guide
Test Suite Overview
| Test Type | Location | Purpose | LLM Required |
|---|---|---|---|
| Unit Tests | test/tap/tests/nl2sql_*.cpp | Test individual components | Mocked |
| Validation Tests | test/tap/tests/ai_validation-t.cpp | Test config validation | No |
| Integration | test/tap/tests/nl2sql_integration-t.cpp | Test with real database | Mocked/Live |
| E2E | scripts/mcp/test_nl2sql_e2e.sh | Complete workflow | Live |
| MCP Tools | scripts/mcp/test_nl2sql_tools.sh | MCP protocol | Live |
Test Infrastructure
TAP Framework
ProxySQL uses the Test Anything Protocol (TAP) for C++ tests.
Key Functions:
plan(number_of_tests); // Declare how many tests
ok(condition, description); // Test with description
diag(message); // Print diagnostic message
skip(count, reason); // Skip tests
exit_status(); // Return proper exit code
Example:
#include "tap.h"
int main() {
    plan(2);  // Must match the number of ok() calls below
    ok(1 + 1 == 2, "Basic math works");
    ok(true, "Always true");
    diag("This is a diagnostic message");
    return exit_status();
}
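To make the output format concrete, here is a minimal sketch of what a TAP emitter produces for a two-test run. `MiniTap` and `demo_output` are hypothetical names invented for illustration; this is not ProxySQL's `tap.h`, which among other things prefixes diagnostics with a timestamp.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Illustrative stand-in for a TAP emitter; NOT ProxySQL's tap.h.
class MiniTap {
public:
    explicit MiniTap(std::ostream& out) : out_(out) {}

    // Emit the plan line, e.g. "1..2".
    void plan(int count) {
        planned_ = count;
        out_ << "1.." << count << "\n";
    }

    // Emit one numbered result line and record failures.
    bool ok(bool condition, const std::string& description) {
        ++run_;
        if (!condition) ++failed_;
        out_ << (condition ? "ok " : "not ok ") << run_ << " - " << description << "\n";
        return condition;
    }

    // Diagnostics start with '#', so TAP consumers treat them as comments.
    void diag(const std::string& message) { out_ << "# " << message << "\n"; }

    // Zero only when every planned test ran and none failed.
    int exit_status() const { return (failed_ == 0 && run_ == planned_) ? 0 : 1; }

private:
    std::ostream& out_;
    int planned_ = 0;
    int run_ = 0;
    int failed_ = 0;
};

// Runs a two-test plan against the stand-in and returns the emitted text.
std::string demo_output() {
    std::ostringstream out;
    MiniTap tap(out);
    tap.plan(2);
    tap.ok(1 + 1 == 2, "Basic math works");
    tap.ok(true, "Always true");
    tap.diag("This is a diagnostic message");
    return out.str();
}
```

In this sketch, a plan that does not match the number of executed tests yields a non-zero exit status, which is why the planned count should be kept in sync with the `ok()` calls.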
CommandLine Helper
Gets test connection parameters from environment:
CommandLine cl;
if (cl.getEnv()) {
    diag("Failed to get environment");
    return -1;
}
// cl.host, cl.admin_username, cl.admin_password, cl.admin_port
Running Tests
Unit Tests
cd test/tap
# Build specific test
make nl2sql_unit_base-t
# Run the test
./nl2sql_unit_base-t
# Build all NL2SQL tests
make nl2sql_*
Integration Tests
cd test/tap
make nl2sql_integration-t
./nl2sql_integration-t
E2E Tests
# With mocked LLM (faster)
./scripts/mcp/test_nl2sql_e2e.sh --mock
# With live LLM
./scripts/mcp/test_nl2sql_e2e.sh --live
All Tests
# Run all NL2SQL tests
make test_nl2sql
# Run with verbose output
PROXYSQL_VERBOSE=1 make test_nl2sql
Test Coverage
Unit Tests (nl2sql_unit_base-t.cpp)
- Initialization
- Basic conversion (mocked)
- Configuration management
- Variable persistence
- Error handling
Prompt Builder Tests (nl2sql_prompt_builder-t.cpp)
- Basic prompt construction
- Schema context inclusion
- System instruction formatting
- Edge cases (empty, special characters)
- Prompt structure validation
Model Selection Tests (nl2sql_model_selection-t.cpp)
- Latency-based selection
- Provider preference handling
- API key fallback logic
- Default selection
- Configuration integration
Validation Tests (ai_validation-t.cpp)
These are self-contained unit tests for configuration validation functions. They test the validation logic without requiring a running ProxySQL instance or LLM.
Test Categories:
- URL format validation (15 tests)
- Valid URLs (http://, https://)
- Invalid URLs (missing protocol, wrong protocol, missing host)
- Edge cases (NULL, empty, long URLs)
- API key format validation (14 tests)
- Valid keys (OpenAI, Anthropic, custom)
- Whitespace rejection (spaces, tabs, newlines)
- Length validation (minimums, provider-specific formats)
- Numeric range validation (13 tests)
- Boundary values (min, max, within range)
- Invalid values (out of range, empty, non-numeric)
- Variable-specific ranges (cache threshold, timeout, rate limit)
- Provider name validation (8 tests)
- Valid providers (openai, anthropic)
- Invalid providers (ollama, uppercase, unknown)
- Edge cases (NULL, empty, with spaces)
- Edge cases and boundary conditions (11 tests)
- NULL pointer handling
- Very long values
- URL special characters (query strings, ports, fragments)
- API key boundary lengths
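As a rough illustration of the URL and provider categories listed above, the following sketch shows validators that would satisfy those cases. The function names `is_valid_url` and `is_valid_provider` and the exact rules are assumptions made for this example; the real validation functions may differ.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical validator: accepts only http:// or https:// URLs with a non-empty host.
bool is_valid_url(const char* url) {
    if (url == nullptr || *url == '\0') return false;  // NULL and empty are rejected
    const std::string s(url);
    std::size_t host_start = 0;
    if (s.compare(0, 7, "http://") == 0) {
        host_start = 7;
    } else if (s.compare(0, 8, "https://") == 0) {
        host_start = 8;
    } else {
        return false;  // Missing or unsupported protocol
    }
    // The host ends at the first port, path, query, or fragment delimiter.
    std::size_t host_end = s.find_first_of(":/?#", host_start);
    if (host_end == std::string::npos) host_end = s.size();
    return host_end > host_start;  // Reject a missing host
}

// Hypothetical validator: exact, case-sensitive match against the accepted providers.
bool is_valid_provider(const char* name) {
    if (name == nullptr) return false;
    return std::strcmp(name, "openai") == 0 || std::strcmp(name, "anthropic") == 0;
}
```

Because the provider comparison is an exact match, uppercase names and names with surrounding spaces are rejected without any extra code.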
Running Validation Tests:
cd test/tap/tests
make ai_validation-t
./ai_validation-t
Expected Output:
1..61
# 2026-01-16 18:47:09 === URL Format Validation Tests ===
ok 1 - URL 'http://localhost:11434/v1/chat/completions' is valid
...
ok 61 - Anthropic key at 25 character boundary accepted
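The numeric range and API key categories can be pictured with helpers along these lines. This is a sketch only: `is_in_range`, `is_valid_api_key`, and the idea of passing the minimum key length as a parameter are assumptions for illustration, not the functions under test.

```cpp
#include <cctype>
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical check: value must be a plain base-10 integer within [min_value, max_value].
bool is_in_range(const char* value, long min_value, long max_value) {
    if (value == nullptr || *value == '\0') return false;  // NULL and empty are rejected
    // strtol skips leading whitespace, so reject it explicitly.
    if (std::isspace(static_cast<unsigned char>(value[0]))) return false;
    errno = 0;
    char* end = nullptr;
    const long parsed = std::strtol(value, &end, 10);
    // Reject overflow, input with no digits, and trailing garbage such as "12x".
    if (errno == ERANGE || end == value || *end != '\0') return false;
    return parsed >= min_value && parsed <= max_value;
}

// Hypothetical check: key must reach a minimum length and contain no whitespace.
bool is_valid_api_key(const char* key, std::size_t min_length) {
    if (key == nullptr) return false;
    const std::size_t length = std::strlen(key);
    if (length < min_length) return false;
    for (std::size_t i = 0; i < length; ++i) {
        if (std::isspace(static_cast<unsigned char>(key[i]))) return false;
    }
    return true;
}
```

Boundary tests would then call each helper with the minimum, the maximum, and one value on either side of the range.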
Integration Tests (nl2sql_integration-t.cpp)
- Schema-aware conversion
- Multi-table queries
- Complex SQL patterns
- Error recovery
E2E Tests (test_nl2sql_e2e.sh)
- Simple SELECT
- WHERE conditions
- JOIN queries
- Aggregations
- Date handling
Writing New Tests
Test File Template
/**
* @file nl2sql_your_feature-t.cpp
* @brief TAP tests for your feature
*
* @date 2025-01-16
*/
#include <algorithm>
#include <string>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <vector>
#include "mysql.h"
#include "mysqld_error.h"
#include "tap.h"
#include "command_line.h"
#include "utils.h"
using std::string;
MYSQL* g_admin = NULL;
// ============================================================================
// Helper Functions
// ============================================================================
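string get_variable(const char* name) {
    // Implementation
    return string();  // Placeholder so the template compiles
}
bool set_variable(const char* name, const char* value) {
    // Implementation
    return false;  // Placeholder so the template compiles
}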
// ============================================================================
// Test: Your Test Category
// ============================================================================
void test_your_category() {
    diag("=== Your Test Category ===");
    // Test 1
    ok(condition, "Test description");
    // Test 2
    ok(condition, "Another test");
}
// ============================================================================
// Main
// ============================================================================
int main(int argc, char** argv) {
    CommandLine cl;
    if (cl.getEnv()) {
        diag("Error getting environment");
        return exit_status();
    }
    g_admin = mysql_init(NULL);
    if (!mysql_real_connect(g_admin, cl.host, cl.admin_username,
                            cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
        diag("Failed to connect to admin");
        return exit_status();
    }
    plan(number_of_tests);
    test_your_category();
    mysql_close(g_admin);
    return exit_status();
}
Test Naming Conventions
- Files: nl2sql_feature_name-t.cpp
- Functions: test_feature_category()
- Descriptions: "Feature does something"
Test Organization
// Section dividers
// ============================================================================
// Section Name
// ============================================================================
// Test function with docstring
/**
* @test Test name
* @description What it tests
* @expected What should happen
*/
void test_something() {
    diag("=== Test Category ===");
    // Tests...
}
Best Practices
- Use diag() for section headers:
diag("=== Configuration Tests ===");
- Provide meaningful test descriptions:
ok(result == expected, "Variable set to 'value' reflects in runtime");
- Clean up after tests:
// Restore original values
set_variable("model", orig_value.c_str());
- Handle both stub and real implementations:
ok(value == expected || value.empty(), "Value matches expected or is empty (stub)");
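One way to make the cleanup practice hard to forget is a small scope guard that restores a variable automatically, even when a test returns early. The sketch below is an assumption-laden illustration: `VariableGuard` is a hypothetical class, and `get_variable`/`set_variable` are backed here by an in-memory map so the example runs without a ProxySQL admin connection.

```cpp
#include <map>
#include <string>

// In-memory stand-in for the admin interface (assumption for this sketch only).
static std::map<std::string, std::string> g_fake_vars;

std::string get_variable(const char* name) {
    const auto it = g_fake_vars.find(name);
    return it == g_fake_vars.end() ? std::string() : it->second;
}

bool set_variable(const char* name, const char* value) {
    g_fake_vars[name] = value;
    return true;
}

// Hypothetical guard: remembers a variable's value and restores it on scope exit.
class VariableGuard {
public:
    explicit VariableGuard(const char* name)
        : name_(name), original_(get_variable(name)) {}
    ~VariableGuard() { set_variable(name_.c_str(), original_.c_str()); }

    // Copying would restore the value twice, so it is disabled.
    VariableGuard(const VariableGuard&) = delete;
    VariableGuard& operator=(const VariableGuard&) = delete;

private:
    std::string name_;
    std::string original_;
};

// Changes "model" inside a scope and returns the value seen after the scope ends.
std::string demo_restore() {
    set_variable("model", "original-model");
    {
        VariableGuard guard("model");
        set_variable("model", "temporary-model");
    }  // guard's destructor restores "original-model" here
    return get_variable("model");
}
```

In a real test the guard would wrap the admin-backed helpers from the template instead of the map used here.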
Mocking LLM Responses
For fast unit tests, mock LLM responses:
string mock_llm_response(const string& query) {
    if (query.find("SELECT") != string::npos) {
        return "SELECT * FROM table";
    }
    // Other patterns...
    return "";  // No pattern matched
}
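When the number of patterns grows, a table-driven mock can be easier to maintain than a chain of `if` statements. The following is a sketch under assumptions: `MockRule`, `mock_llm_lookup`, and `sample_rules` are hypothetical names, and the canned SQL strings are illustrative rather than actual model output.

```cpp
#include <string>
#include <vector>

// One canned response: returned when `needle` occurs in the query text.
struct MockRule {
    std::string needle;  // Substring expected in the natural-language query
    std::string sql;     // Canned SQL returned on a match
};

// Returns the SQL of the first matching rule, or an empty string if none match.
std::string mock_llm_lookup(const std::string& query,
                            const std::vector<MockRule>& rules) {
    for (const MockRule& rule : rules) {
        if (query.find(rule.needle) != std::string::npos) {
            return rule.sql;
        }
    }
    return std::string();
}

// Hypothetical rules; more specific needles come first because the first match wins.
std::vector<MockRule> sample_rules() {
    return {
        {"Count customers", "SELECT country, COUNT(*) FROM customers GROUP BY country"},
        {"all customers", "SELECT * FROM customers"},
    };
}
```

An empty return value gives tests an unambiguous "no canned answer" signal to assert against.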
Debugging Tests
Enable Verbose Output
# Verbose TAP output
./nl2sql_unit_base-t -v
# ProxySQL debug output
PROXYSQL_VERBOSE=1 ./nl2sql_unit_base-t
GDB Debugging
gdb ./nl2sql_unit_base-t
(gdb) break main
(gdb) run
(gdb) backtrace
SQL Debugging
// Print generated SQL
diag("Generated SQL: %s", sql.c_str());
// Check MySQL errors
if (mytext_response(admin, query)) {
    diag("MySQL error: %s", mysql_error(admin));
}
Continuous Integration
GitHub Actions (Planned)
name: NL2SQL Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build ProxySQL
        run: make
      - name: Run NL2SQL Tests
        run: make test_nl2sql
Test Data
Sample Schema
Tests use a standard test schema:
CREATE TABLE customers (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    country VARCHAR(50),
    created_at DATE
);
CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    total DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
Sample Queries
-- Simple
NL2SQL: Show all customers
-- With conditions
NL2SQL: Find customers from USA
-- JOIN
NL2SQL: Show orders with customer names
-- Aggregation
NL2SQL: Count customers by country
Performance Testing
Benchmark Script
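#!/bin/bash
# benchmark_nl2sql.sh
# Requires GNU date for nanosecond precision (%N).
for i in {1..100}; do
    start=$(date +%s%N)
    # Discard query output so that only the timings reach awk
    mysql -h 127.0.0.1 -P 6033 -e "NL2SQL: Show top customers" > /dev/null
    end=$(date +%s%N)
    echo $((end - start))
done | awk '{sum+=$1} END {print sum/NR " ns average"}'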
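An average alone can hide slow outliers, so a benchmark may also want a percentile. The sketch below shows one way to collect per-call timings and summarize them from C++; `measure_ns`, `mean_ns`, and `percentile_ns` are hypothetical helpers, and the workload passed to `measure_ns` is left to the caller.

```cpp
#include <algorithm>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

// Times `iterations` calls of `work` and returns each duration in nanoseconds.
template <typename Work>
std::vector<long long> measure_ns(Work work, int iterations) {
    std::vector<long long> samples;
    if (iterations > 0) samples.reserve(static_cast<std::size_t>(iterations));
    for (int i = 0; i < iterations; ++i) {
        const auto start = std::chrono::steady_clock::now();
        work();
        const auto end = std::chrono::steady_clock::now();
        samples.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    }
    return samples;
}

// Arithmetic mean of the samples; 0 for empty input.
double mean_ns(const std::vector<long long>& samples) {
    if (samples.empty()) return 0.0;
    return std::accumulate(samples.begin(), samples.end(), 0.0) /
           static_cast<double>(samples.size());
}

// Nearest-rank percentile for p in (0, 100]; 0 for empty input.
long long percentile_ns(std::vector<long long> samples, double p) {
    if (samples.empty()) return 0;
    std::sort(samples.begin(), samples.end());
    const double raw_rank = std::ceil(p / 100.0 * static_cast<double>(samples.size()));
    std::size_t rank = raw_rank < 1.0 ? 1 : static_cast<std::size_t>(raw_rank);
    if (rank > samples.size()) rank = samples.size();
    return samples[rank - 1];
}
```

`steady_clock` is used because it is monotonic, so the measured durations are not affected by wall-clock adjustments during the run.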
Known Issues
- Stub Implementation: Many features return empty/placeholder values
- Live LLM Required: Some tests need Ollama running
- Timing Dependent: Cache tests may fail on slow systems
Contributing Tests
When contributing new tests:
- Follow the template above
- Add to Makefile if needed
- Update this documentation
- Ensure tests pass with make test_nl2sql
See Also
- README.md - User documentation
- ARCHITECTURE.md - System architecture
- API.md - API reference