# NL2SQL Testing Guide
## Test Suite Overview
| Test Type | Location | Purpose | LLM Required |
|-----------|----------|---------|--------------|
| Unit Tests | `test/tap/tests/nl2sql_*.cpp` | Test individual components | Mocked |
| Integration | `test/tap/tests/nl2sql_integration-t.cpp` | Test with real database | Mocked/Live |
| E2E | `scripts/mcp/test_nl2sql_e2e.sh` | Complete workflow | Live |
| MCP Tools | `scripts/mcp/test_nl2sql_tools.sh` | MCP protocol | Live |
## Test Infrastructure
### TAP Framework
ProxySQL uses the Test Anything Protocol (TAP) for C++ tests.
**Key Functions:**
```cpp
plan(number_of_tests); // Declare how many tests
ok(condition, description); // Test with description
diag(message); // Print diagnostic message
skip(count, reason); // Skip tests
exit_status(); // Return proper exit code
```
**Example:**
```cpp
#include "tap.h"

int main() {
    plan(2);  // must match the number of ok() calls below
    ok(1 + 1 == 2, "Basic math works");
    ok(true, "Always true");
    diag("This is a diagnostic message");
    return exit_status();
}
```
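To make the output format concrete, here is a minimal sketch of the text a TAP producer emits. `MiniTap` is a hypothetical stand-in written for illustration; it is not ProxySQL's `tap.h`, and the real implementation may differ in details such as how the exit status is computed.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical miniature TAP emitter, for illustration only.
// It is NOT ProxySQL's tap.h; it only shows the text a TAP harness parses.
class MiniTap {
public:
    explicit MiniTap(int planned) : planned_(planned) {
        assert(planned > 0);
        out_ << "1.." << planned_ << "\n";  // the plan line
    }

    // Record one test point and return whether it passed.
    bool ok(bool condition, const std::string& description) {
        ++run_;
        if (!condition) ++failed_;
        out_ << (condition ? "ok " : "not ok ") << run_
             << " - " << description << "\n";
        return condition;
    }

    // Diagnostics start with '#', so a harness treats them as comments.
    void diag(const std::string& message) { out_ << "# " << message << "\n"; }

    // 0 only when every planned test ran and none failed.
    int exit_status() const {
        return (failed_ == 0 && run_ == planned_) ? 0 : 1;
    }

    std::string output() const { return out_.str(); }

private:
    int planned_;
    int run_ = 0;
    int failed_ = 0;
    std::ostringstream out_;
};
```

In this sketch a mismatch between the planned and executed test counts is treated as a failure, which is why the argument to `plan()` should equal the number of `ok()` calls.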
### CommandLine Helper
`CommandLine` reads the test connection parameters from environment variables:
```cpp
CommandLine cl;
if (cl.getEnv()) {  // a non-zero return means the environment could not be read
    diag("Failed to get environment");
    return -1;
}
// Available afterwards: cl.host, cl.admin_username, cl.admin_password, cl.admin_port
```
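The idea behind such a helper can be sketched with the standard library alone. Everything below is an assumption made for illustration: the `ConnParams` struct, the `TAP_ADMIN*` variable names, and the fallback values are not taken from this guide, so check `command_line.h` for the names the real helper uses.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for the fields the real CommandLine exposes.
struct ConnParams {
    std::string host;
    std::string admin_username;
    std::string admin_password;
    int admin_port = 0;
};

// Return the value of an environment variable, or a fallback when it is unset.
static std::string env_or(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Variable names and defaults are assumptions, not confirmed by this guide.
ConnParams read_conn_params() {
    ConnParams p;
    p.host = env_or("TAP_ADMINHOST", "127.0.0.1");
    p.admin_username = env_or("TAP_ADMINUSERNAME", "admin");
    p.admin_password = env_or("TAP_ADMINPASSWORD", "admin");
    p.admin_port = std::atoi(env_or("TAP_ADMINPORT", "6032").c_str());
    return p;
}
```

Falling back to defaults keeps a test runnable on a developer machine while still letting CI override every value.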
## Running Tests
### Unit Tests
```bash
cd test/tap
# Build specific test
make nl2sql_unit_base-t
# Run the test
./nl2sql_unit_base-t
# Build all NL2SQL tests
make nl2sql_*
```
### Integration Tests
```bash
cd test/tap
make nl2sql_integration-t
./nl2sql_integration-t
```
### E2E Tests
```bash
# With mocked LLM (faster)
./scripts/mcp/test_nl2sql_e2e.sh --mock
# With live LLM
./scripts/mcp/test_nl2sql_e2e.sh --live
```
### All Tests
```bash
# Run all NL2SQL tests
make test_nl2sql
# Run with verbose output
PROXYSQL_VERBOSE=1 make test_nl2sql
```
## Test Coverage
### Unit Tests (`nl2sql_unit_base-t.cpp`)
- [x] Initialization
- [x] Basic conversion (mocked)
- [x] Configuration management
- [x] Variable persistence
- [x] Error handling
### Prompt Builder Tests (`nl2sql_prompt_builder-t.cpp`)
- [x] Basic prompt construction
- [x] Schema context inclusion
- [x] System instruction formatting
- [x] Edge cases (empty, special characters)
- [x] Prompt structure validation
### Model Selection Tests (`nl2sql_model_selection-t.cpp`)
- [x] Latency-based selection
- [x] Provider preference handling
- [x] API key fallback logic
- [x] Default selection
- [x] Configuration integration
### Integration Tests (`nl2sql_integration-t.cpp`)
- [ ] Schema-aware conversion
- [ ] Multi-table queries
- [ ] Complex SQL patterns
- [ ] Error recovery
### E2E Tests (`test_nl2sql_e2e.sh`)
- [x] Simple SELECT
- [x] WHERE conditions
- [x] JOIN queries
- [x] Aggregations
- [x] Date handling
## Writing New Tests
### Test File Template
```cpp
/**
 * @file nl2sql_your_feature-t.cpp
 * @brief TAP tests for your feature
 *
 * @date 2025-01-16
 */
#include <algorithm>
#include <string>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <vector>

#include "mysql.h"
#include "mysqld_error.h"
#include "tap.h"
#include "command_line.h"
#include "utils.h"

using std::string;

MYSQL* g_admin = NULL;

// ============================================================================
// Helper Functions
// ============================================================================
string get_variable(const char* name) {
    // Implementation
    return "";  // placeholder so the template compiles
}

bool set_variable(const char* name, const char* value) {
    // Implementation
    return false;  // placeholder so the template compiles
}

// ============================================================================
// Test: Your Test Category
// ============================================================================
void test_your_category() {
    diag("=== Your Test Category ===");
    // Test 1
    ok(condition, "Test description");
    // Test 2
    ok(condition, "Another test");
}

// ============================================================================
// Main
// ============================================================================
int main(int argc, char** argv) {
    CommandLine cl;
    if (cl.getEnv()) {
        diag("Error getting environment");
        return exit_status();
    }

    g_admin = mysql_init(NULL);
    if (!mysql_real_connect(g_admin, cl.host, cl.admin_username,
                            cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
        diag("Failed to connect to admin: %s", mysql_error(g_admin));
        mysql_close(g_admin);
        return exit_status();
    }

    plan(number_of_tests);  // replace with the total number of ok() calls
    test_your_category();

    mysql_close(g_admin);
    return exit_status();
}
```
### Test Naming Conventions
- **Files**: `nl2sql_feature_name-t.cpp`
- **Functions**: `test_feature_category()`
- **Descriptions**: "Feature does something"
### Test Organization
```cpp
// Section dividers
// ============================================================================
// Section Name
// ============================================================================
// Test function with docstring
/**
 * @test Test name
 * @description What it tests
 * @expected What should happen
 */
void test_something() {
    diag("=== Test Category ===");
    // Tests...
}
```
### Best Practices
1. **Use diag() for section headers**:
   ```cpp
   diag("=== Configuration Tests ===");
   ```
2. **Provide meaningful test descriptions**:
   ```cpp
   ok(result == expected, "Variable set to 'value' reflects in runtime");
   ```
3. **Clean up after tests**:
   ```cpp
   // Restore original values
   set_variable("model", orig_value.c_str());
   ```
4. **Handle both stub and real implementations**:
   ```cpp
   ok(value == expected || value.empty(),
      "Value matches expected or is empty (stub)");
   ```
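The "clean up after tests" practice is easy to forget on early returns. One possible approach, sketched here under the assumption that you have getter and setter callables similar to the template's `get_variable()` and `set_variable()`, is a small RAII guard. `VariableRestorer` is a hypothetical name and is not part of the ProxySQL test utilities.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical RAII helper: remembers a variable's value on construction
// and writes it back when the guard goes out of scope.
class VariableRestorer {
public:
    using Getter = std::function<std::string(const std::string&)>;
    using Setter = std::function<bool(const std::string&, const std::string&)>;

    VariableRestorer(std::string name, Getter get, Setter set)
        : name_(std::move(name)), set_(std::move(set)) {
        assert(static_cast<bool>(get) && static_cast<bool>(set_));
        original_ = get(name_);  // snapshot before the test changes anything
    }

    // Restore the snapshot; the setter's result is ignored in this sketch.
    ~VariableRestorer() { set_(name_, original_); }

    VariableRestorer(const VariableRestorer&) = delete;
    VariableRestorer& operator=(const VariableRestorer&) = delete;

private:
    std::string name_;
    Setter set_;
    std::string original_;
};
```

A test would create the guard at the top of its scope, change the variable freely, and rely on the destructor to restore the original value even if the function returns early.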
## Mocking LLM Responses
For fast unit tests, mock LLM responses:
```cpp
// Assumes #include <string> and using std::string;
string mock_llm_response(const string& query) {
    if (query.find("SELECT") != string::npos) {
        return "SELECT * FROM table";
    }
    // Other patterns...
    return "";  // no pattern matched
}
```
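As the number of patterns grows, a table of canned responses may be easier to maintain than a chain of `if` statements. The sketch below is one way to do that; `MockRules` and `mock_llm_lookup` are hypothetical names, and returning an empty string for unmatched prompts is an assumption chosen to match the stub convention used elsewhere in this guide.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical canned-response table: (substring to look for, SQL to return).
// Order matters: put more specific patterns first.
using MockRules = std::vector<std::pair<std::string, std::string>>;

// Return the SQL of the first rule whose pattern occurs in the prompt,
// or an empty string when nothing matches.
std::string mock_llm_lookup(const MockRules& rules, const std::string& prompt) {
    for (const auto& rule : rules) {
        assert(!rule.first.empty());  // an empty pattern would match everything
        if (prompt.find(rule.first) != std::string::npos) {
            return rule.second;
        }
    }
    return "";
}
```

Because the first matching rule wins, the rule list doubles as a readable record of which prompts the unit tests are expected to cover.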
## Debugging Tests
### Enable Verbose Output
```bash
# Verbose TAP output
./nl2sql_unit_base-t -v
# ProxySQL debug output
PROXYSQL_VERBOSE=1 ./nl2sql_unit_base-t
```
### GDB Debugging
```bash
gdb ./nl2sql_unit_base-t
(gdb) break main
(gdb) run
(gdb) backtrace
```
### SQL Debugging
```cpp
// Print generated SQL
diag("Generated SQL: %s", sql.c_str());
// Check MySQL errors
if (mysql_query(admin, query)) {
    diag("MySQL error: %s", mysql_error(admin));
}
```
## Continuous Integration
### GitHub Actions (Planned)
```yaml
name: NL2SQL Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build ProxySQL
        run: make
      - name: Run NL2SQL Tests
        run: make test_nl2sql
```
## Test Data
### Sample Schema
Tests use a standard test schema:
```sql
CREATE TABLE customers (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    country VARCHAR(50),
    created_at DATE
);

CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    total DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);
```
### Sample Queries
```sql
-- Simple
NL2SQL: Show all customers
-- With conditions
NL2SQL: Find customers from USA
-- JOIN
NL2SQL: Show orders with customer names
-- Aggregation
NL2SQL: Count customers by country
```
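With a live LLM the generated SQL for the queries above is unlikely to be byte-for-byte stable, so exact string comparison tends to be brittle. One hedged alternative is to assert only on the fragments that must be present. `sql_contains_all` below is a hypothetical helper, not an existing utility in the test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of the input (ASCII only, which is enough for SQL keywords).
static std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// True when every fragment occurs somewhere in the SQL, ignoring case.
bool sql_contains_all(const std::string& sql,
                      const std::vector<std::string>& fragments) {
    assert(!fragments.empty());
    const std::string haystack = to_lower(sql);
    return std::all_of(fragments.begin(), fragments.end(),
                       [&](const std::string& fragment) {
                           return haystack.find(to_lower(fragment)) !=
                                  std::string::npos;
                       });
}
```

For "Find customers from USA", a test could require the fragments `select`, `from customers`, and `usa` without caring about column order or whitespace. This is a substring check, not a SQL parser, so it cannot tell whether the statement is valid.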
## Performance Testing
### Benchmark Script
```bash
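#!/bin/bash
# benchmark_nl2sql.sh
# Note: each sample includes mysql client startup and connection time.
for i in {1..100}; do
    start=$(date +%s%N)  # nanoseconds; %N requires GNU date
    # Discard the result set so only the timings reach awk
    mysql -h 127.0.0.1 -P 6033 -e "NL2SQL: Show top customers" > /dev/null
    end=$(date +%s%N)
    echo $((end - start))
done | awk '{sum+=$1} END {print sum/NR " ns average"}'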
```
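The shell loop measures client startup along with the query. To time only the request itself, the same measurement can be sketched in-process with `std::chrono`. `average_ns` is a hypothetical helper; it assumes the caller supplies a callable that issues the query, for example through `mysql_query()` on an already open connection.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Average wall-clock time of `iterations` calls to `op`, in nanoseconds.
double average_ns(const std::function<void()>& op, int iterations) {
    assert(static_cast<bool>(op) && iterations > 0);
    using clock = std::chrono::steady_clock;  // monotonic, unaffected by clock changes
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) {
        op();
    }
    const auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(
        clock::now() - start);
    return static_cast<double>(elapsed.count()) / iterations;
}
```

An average hides outliers, so for live LLM calls it may be worth recording individual samples as well.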
## Known Issues
1. **Stub Implementation**: Many features return empty/placeholder values
2. **Live LLM Required**: Some tests need Ollama running
3. **Timing Dependent**: Cache tests may fail on slow systems
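For the timing-dependent failures listed above, replacing fixed sleeps with a bounded poll might reduce flakiness on slow systems. The sketch below is one way to do it; `wait_until` is a hypothetical helper and the default 10 ms interval is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll `condition` until it holds or `timeout` elapses.
// Returns true if the condition became true in time, false otherwise.
bool wait_until(const std::function<bool()>& condition,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    assert(static_cast<bool>(condition));
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // gave up: the condition never held
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

A cache test could then wrap its check, for instance `ok(wait_until(entry_is_cached, std::chrono::seconds(2)), "Entry appears in cache")`, where `entry_is_cached` is a callable you provide. Fast machines finish immediately and slow ones get the full timeout.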
## Contributing Tests
When contributing new tests:
1. Follow the template above
2. Add to Makefile if needed
3. Update this documentation
4. Ensure tests pass with `make test_nl2sql`
## See Also
- [README.md](README.md) - User documentation
- [ARCHITECTURE.md](ARCHITECTURE.md) - System architecture
- [API.md](API.md) - API reference