LLM Bridge Testing Guide

Test Suite Overview

| Test Type | Location | Purpose | LLM Required |
|---|---|---|---|
| Unit Tests | `test/tap/tests/nl2sql_*.cpp` | Test individual components | Mocked |
| Validation Tests | `test/tap/tests/ai_validation-t.cpp` | Test config validation | No |
| Integration Tests | `test/tap/tests/nl2sql_integration-t.cpp` | Test with real database | Mocked/Live |
| E2E Tests | `scripts/mcp/test_nl2sql_e2e.sh` | Complete workflow | Live |
| MCP Tools | `scripts/mcp/test_nl2sql_tools.sh` | MCP protocol | Live |

Test Infrastructure

TAP Framework

ProxySQL uses the Test Anything Protocol (TAP) for C++ tests.

Key Functions:

plan(number_of_tests);     // Declare how many tests
ok(condition, description); // Test with description
diag(message);              // Print diagnostic message
skip(count, reason);        // Skip tests
exit_status();              // Return proper exit code

Example:

#include "tap.h"

int main() {
    plan(3);
    ok(1 + 1 == 2, "Basic math works");
    ok(true, "Always true");
    diag("This is a diagnostic message");
    return exit_status();
}
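
To make the output format concrete, here is a minimal sketch of the lines a TAP producer emits. `MiniTap` is a hypothetical stand-in written for this guide; it is not ProxySQL's `tap.h`, which real tests should keep using.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for illustration only; real tests use tap.h.
class MiniTap {
public:
    explicit MiniTap(std::ostream& out) : out_(out) {}

    // Emit the plan line, e.g. "1..3".
    void plan(int count) {
        planned_ = count;
        out_ << "1.." << count << "\n";
    }

    // Emit "ok N - desc" or "not ok N - desc" and track failures.
    void ok(bool pass, const std::string& desc) {
        ++run_;
        if (!pass) ++failed_;
        out_ << (pass ? "ok " : "not ok ") << run_ << " - " << desc << "\n";
    }

    // Diagnostics are prefixed with '#', so TAP consumers ignore them.
    void diag(const std::string& msg) { out_ << "# " << msg << "\n"; }

    // Zero only if every planned test ran and none failed.
    int exit_status() const {
        return (failed_ == 0 && run_ == planned_) ? 0 : 1;
    }

private:
    std::ostream& out_;
    int planned_ = 0;
    int run_ = 0;
    int failed_ = 0;
};
```

Writing to a `std::ostringstream` makes the emitted lines easy to inspect. The sketch also shows why the number passed to `plan()` must match the number of `ok()` calls: a mismatch counts as a failure.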

CommandLine Helper

The `CommandLine` helper reads test connection parameters from the environment. `getEnv()` returns non-zero on failure:

CommandLine cl;
if (cl.getEnv()) {
    diag("Failed to get environment");
    return -1;
}

// cl.host, cl.admin_username, cl.admin_password, cl.admin_port
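
As a rough illustration of environment-driven configuration, the sketch below reads a value with a fallback and validates a port. The function names and the variable name in the usage note are assumptions for this example, not the actual `CommandLine` implementation.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Return the value of an environment variable, or a fallback if unset.
std::string env_or(const char* name, const std::string& fallback) {
    const char* value = std::getenv(name);
    return value ? std::string(value) : fallback;
}

// Parse a port from the environment; fall back on unset or invalid input.
int env_port_or(const char* name, int fallback) {
    const char* value = std::getenv(name);
    if (!value || *value == '\0') return fallback;
    char* end = nullptr;
    long port = std::strtol(value, &end, 10);
    if (*end != '\0' || port < 1 || port > 65535) return fallback;
    return static_cast<int>(port);
}
```

For example, `env_port_or("MY_ADMIN_PORT", 6032)` (a hypothetical variable name) falls back to ProxySQL's default admin port when the variable is unset or malformed.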

Running Tests

Unit Tests

cd test/tap/tests

# Build specific test
make nl2sql_unit_base-t

# Run the test
./nl2sql_unit_base-t

# Build all NL2SQL tests (strip .cpp so each glob match is a make target)
for src in nl2sql_*-t.cpp; do make "${src%.cpp}"; done

Integration Tests

cd test/tap/tests
make nl2sql_integration-t
./nl2sql_integration-t

E2E Tests

# With mocked LLM (faster)
./scripts/mcp/test_nl2sql_e2e.sh --mock

# With live LLM
./scripts/mcp/test_nl2sql_e2e.sh --live

All Tests

# Run all NL2SQL tests
make test_nl2sql

# Run with verbose output
PROXYSQL_VERBOSE=1 make test_nl2sql

Test Coverage

Unit Tests (`nl2sql_unit_base-t.cpp`)

Initialization
Basic conversion (mocked)
Configuration management
Variable persistence
Error handling

Prompt Builder Tests (`nl2sql_prompt_builder-t.cpp`)

Basic prompt construction
Schema context inclusion
System instruction formatting
Edge cases (empty, special characters)
Prompt structure validation
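
To illustrate what schema inclusion and structure checks might look like, here is a minimal sketch. The prompt layout, `sketch_build_prompt()`, and `prompt_has_section()` are hypothetical and written for this guide; the real prompt builder may format prompts differently.

```cpp
#include <cassert>
#include <string>

// Hypothetical prompt layout for illustration; the real builder may differ.
// Sections are separated by blank lines, and an empty schema is omitted.
std::string sketch_build_prompt(const std::string& system_instruction,
                                const std::string& schema_context,
                                const std::string& question) {
    std::string prompt = system_instruction;
    if (!schema_context.empty()) {
        prompt += "\n\nSchema:\n" + schema_context;
    }
    prompt += "\n\nQuestion: " + question;
    return prompt;
}

// Structural check of the kind a prompt builder test might make.
bool prompt_has_section(const std::string& prompt, const std::string& marker) {
    return prompt.find(marker) != std::string::npos;
}
```

A test could then assert that the schema section appears only when schema context is supplied, and that special characters in the question survive unchanged.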

Model Selection Tests (`nl2sql_model_selection-t.cpp`)

Latency-based selection
Provider preference handling
API key fallback logic
Default selection
Configuration integration

Validation Tests (`ai_validation-t.cpp`)

These are self-contained unit tests for configuration validation functions. They test the validation logic without requiring a running ProxySQL instance or LLM.

Test Categories:

- URL format validation (15 tests)
  - Valid URLs (http://, https://)
  - Invalid URLs (missing protocol, wrong protocol, missing host)
  - Edge cases (NULL, empty, long URLs)
- API key format validation (14 tests)
  - Valid keys (OpenAI, Anthropic, custom)
  - Whitespace rejection (spaces, tabs, newlines)
  - Length validation (minimums, provider-specific formats)
- Numeric range validation (13 tests)
  - Boundary values (min, max, within range)
  - Invalid values (out of range, empty, non-numeric)
  - Variable-specific ranges (cache threshold, timeout, rate limit)
- Provider name validation (8 tests)
  - Valid providers (openai, anthropic)
  - Invalid providers (ollama, uppercase, unknown)
  - Edge cases (NULL, empty, with spaces)
- Edge cases and boundary conditions (11 tests)
  - NULL pointer handling
  - Very long values
  - URL special characters (query strings, ports, fragments)
  - API key boundary lengths
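
The URL and provider categories above can be sketched roughly as follows. These functions are illustrative assumptions, not the validators under test. In particular, rejecting NULL and empty URLs is a choice made for this sketch.

```cpp
#include <cassert>
#include <cstring>

// Sketch: accept only http:// or https:// URLs with a non-empty host.
// Treating NULL and empty input as invalid is an assumption of this sketch.
bool sketch_validate_url(const char* url) {
    if (url == nullptr || *url == '\0') return false;
    const char* rest = nullptr;
    if (std::strncmp(url, "http://", 7) == 0) {
        rest = url + 7;
    } else if (std::strncmp(url, "https://", 8) == 0) {
        rest = url + 8;
    } else {
        return false;  // missing or unsupported protocol
    }
    // The host must start right after the scheme and be non-empty.
    return *rest != '\0' && *rest != '/' && *rest != ':' &&
           *rest != '?' && *rest != '#';
}

// Sketch: case-sensitive match against the providers listed as valid.
bool sketch_validate_provider(const char* provider) {
    if (provider == nullptr) return false;
    return std::strcmp(provider, "openai") == 0 ||
           std::strcmp(provider, "anthropic") == 0;
}
```

Because the provider comparison is exact, `"OpenAI"` and `"open ai"` are rejected, which matches the uppercase and with-spaces cases listed above.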

Running Validation Tests:

cd test/tap/tests
make ai_validation-t
./ai_validation-t

Expected Output:

1..61
# 2026-01-16 18:47:09  === URL Format Validation Tests ===
ok 1 - URL 'http://localhost:11434/v1/chat/completions' is valid
...
ok 61 - Anthropic key at 25 character boundary accepted
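
The API key and numeric range categories in this section can be sketched in a few lines. Both functions are illustrative assumptions. `min_len` is a hypothetical parameter, since the real minimum lengths and provider-specific formats are not reproduced here.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <cstring>

// Sketch: reject NULL, keys shorter than min_len, and keys with whitespace.
// min_len is a hypothetical parameter; the real minimums are not shown here.
bool sketch_validate_api_key(const char* key, std::size_t min_len) {
    if (key == nullptr) return false;
    std::size_t len = std::strlen(key);
    if (len < min_len) return false;
    for (std::size_t i = 0; i < len; ++i) {
        if (std::isspace(static_cast<unsigned char>(key[i]))) return false;
    }
    return true;
}

// Sketch: parse a decimal integer and check that it lies in [min, max].
bool sketch_validate_range(const char* value, long min, long max) {
    if (value == nullptr || *value == '\0') return false;
    char* end = nullptr;
    errno = 0;
    long parsed = std::strtol(value, &end, 10);
    // Reject overflow, input with no digits, and trailing garbage.
    if (errno == ERANGE || end == value || *end != '\0') return false;
    return parsed >= min && parsed <= max;
}
```

Note that `strtol` skips leading whitespace, so a stricter validator would check the first character before parsing.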

Integration Tests (`nl2sql_integration-t.cpp`)

Schema-aware conversion
Multi-table queries
Complex SQL patterns
Error recovery

E2E Tests (`test_nl2sql_e2e.sh`)

Simple SELECT
WHERE conditions
JOIN queries
Aggregations
Date handling

Writing New Tests

Test File Template

/**
 * @file nl2sql_your_feature-t.cpp
 * @brief TAP tests for your feature
 *
 * @date 2025-01-16
 */

#include <algorithm>
#include <string>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <vector>

#include "mysql.h"
#include "mysqld_error.h"

#include "tap.h"
#include "command_line.h"
#include "utils.h"

using std::string;

MYSQL* g_admin = NULL;

// ============================================================================
// Helper Functions
// ============================================================================

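string get_variable(const char* name) {
    // TODO: query the admin interface via g_admin and return the value
    (void)name;
    return string();  // placeholder so the template compiles
}

bool set_variable(const char* name, const char* value) {
    // TODO: issue the matching SET statement via g_admin
    (void)name;
    (void)value;
    return false;  // placeholder so the template compiles
}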

// ============================================================================
// Test: Your Test Category
// ============================================================================

void test_your_category() {
    diag("=== Your Test Category ===");

    // Test 1
    ok(condition, "Test description");

    // Test 2
    ok(condition, "Another test");
}

// ============================================================================
// Main
// ============================================================================

int main(int argc, char** argv) {
    CommandLine cl;
    if (cl.getEnv()) {
        diag("Error getting environment");
        return exit_status();
    }

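    g_admin = mysql_init(NULL);
    if (!g_admin) {
        diag("mysql_init() failed");
        return exit_status();
    }
    if (!mysql_real_connect(g_admin, cl.host, cl.admin_username,
                            cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
        // Report the client library's error to make failures diagnosable
        diag("Failed to connect to admin: %s", mysql_error(g_admin));
        mysql_close(g_admin);
        return exit_status();
    }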

    plan(number_of_tests);

    test_your_category();

    mysql_close(g_admin);
    return exit_status();
}

Test Naming Conventions

Files: nl2sql_feature_name-t.cpp
Functions: test_feature_category()
Descriptions: "Feature does something"

Test Organization

// Section dividers
// ============================================================================
// Section Name
// ============================================================================

// Test function with docstring
/**
 * @test Test name
 * @description What it tests
 * @expected What should happen
 */
void test_something() {
    diag("=== Test Category ===");
    // Tests...
}

Best Practices

Use diag() for section headers:
```
diag("=== Configuration Tests ===");
```

Provide meaningful test descriptions:

ok(result == expected, "Variable set to 'value' reflects in runtime");

Clean up after tests:

// Restore original values
set_variable("model", orig_value.c_str());
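
One way to make the restore step hard to forget is a small RAII guard, sketched below. `ScopedVariableRestore` is a hypothetical helper. The getter and setter callbacks stand in for helpers such as `get_variable()` and `set_variable()`, and their signatures here are assumptions.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Restores a variable to its original value when the guard leaves scope.
// The callbacks stand in for get_variable()/set_variable(); their exact
// signatures here are assumptions made for this sketch.
class ScopedVariableRestore {
public:
    using Getter = std::function<std::string(const std::string&)>;
    using Setter = std::function<bool(const std::string&, const std::string&)>;

    ScopedVariableRestore(std::string name, Getter get, Setter set)
        : name_(std::move(name)), set_(std::move(set)), original_(get(name_)) {}

    // Write the saved value back, even if the test returned early.
    ~ScopedVariableRestore() { set_(name_, original_); }

    ScopedVariableRestore(const ScopedVariableRestore&) = delete;
    ScopedVariableRestore& operator=(const ScopedVariableRestore&) = delete;

private:
    std::string name_;
    Setter set_;
    std::string original_;
};
```

Declaring the guard at the top of a test function means the original value is written back on every exit path, including early returns.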

Handle both stub and real implementations:

ok(value == expected || value.empty(),
   "Value matches expected or is empty (stub)");

Mocking LLM Responses

For fast unit tests, mock LLM responses:

string mock_llm_response(const string& query) {
    if (query.find("SELECT") != string::npos) {
        // `table` is a reserved word, so use a real table name
        return "SELECT * FROM customers";
    }
    // Other patterns...
    return "";  // No pattern matched
}
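
A table-driven variant keeps the canned responses in one place. This is a minimal sketch: `MockRule`, `mock_llm_lookup()`, and the rule contents are illustrative names and data chosen for this guide, with SQL written against the `customers` and `orders` sample tables.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One canned response: if the prompt contains `keyword`, return `sql`.
struct MockRule {
    std::string keyword;
    std::string sql;
};

// Return the SQL of the first matching rule, or an empty string if none match.
std::string mock_llm_lookup(const std::vector<MockRule>& rules,
                            const std::string& prompt) {
    for (const MockRule& rule : rules) {
        if (prompt.find(rule.keyword) != std::string::npos) {
            return rule.sql;
        }
    }
    return std::string();
}

// Example rule set; more specific keywords come first so they win.
std::vector<MockRule> default_mock_rules() {
    return {
        {"by country", "SELECT country, COUNT(*) FROM customers GROUP BY country"},
        {"customers", "SELECT * FROM customers"},
        {"orders", "SELECT * FROM orders"},
    };
}
```

Because the first match wins, rule order matters: a prompt such as "Count customers by country" hits the aggregation rule before the generic `customers` rule.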

Debugging Tests

Enable Verbose Output

# Verbose TAP output
./nl2sql_unit_base-t -v

# ProxySQL debug output
PROXYSQL_VERBOSE=1 ./nl2sql_unit_base-t

GDB Debugging

gdb ./nl2sql_unit_base-t
(gdb) break main
(gdb) run
(gdb) backtrace

SQL Debugging

// Print generated SQL
diag("Generated SQL: %s", sql.c_str());

// Check MySQL errors
if (mysql_query(admin, query)) {  // returns non-zero on error
    diag("MySQL error: %s", mysql_error(admin));
}

Continuous Integration

GitHub Actions (Planned)

name: NL2SQL Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build ProxySQL
        run: make
      - name: Run NL2SQL Tests
        run: make test_nl2sql

Test Data

Sample Schema

Tests use a standard test schema:

CREATE TABLE customers (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    country VARCHAR(50),
    created_at DATE
);

CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    total DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

Sample Queries

-- Simple
NL2SQL: Show all customers

-- With conditions
NL2SQL: Find customers from USA

-- JOIN
NL2SQL: Show orders with customer names

-- Aggregation
NL2SQL: Count customers by country
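
Because a live LLM may phrase the same query in different ways, exact string comparison against expected SQL is brittle. A looser check, sketched below, asserts only that key fragments are present. `sql_contains_all()` is a hypothetical helper, not part of the existing test utilities.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Lower-case a copy of the input (ASCII only).
std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// True if every fragment occurs somewhere in sql, ignoring case.
bool sql_contains_all(const std::string& sql,
                      const std::vector<std::string>& fragments) {
    const std::string haystack = to_lower_ascii(sql);
    for (const std::string& fragment : fragments) {
        if (haystack.find(to_lower_ascii(fragment)) == std::string::npos) {
            return false;
        }
    }
    return true;
}
```

For "Count customers by country", a test might require only the fragments `"count("`, `"customers"`, and `"group by"`, leaving column order and aliases to the model.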

Performance Testing

Benchmark Script

#!/bin/bash
# benchmark_nl2sql.sh

for i in {1..100}; do
    start=$(date +%s%N)
    # Discard query output so only the timings reach awk
    mysql -h 127.0.0.1 -P 6033 -e "NL2SQL: Show top customers" > /dev/null
    end=$(date +%s%N)
    echo $((end - start))
done | awk '{sum+=$1} END {print sum/NR " ns average"}'

Known Issues

Stub Implementation: Many features return empty/placeholder values
Live LLM Required: Some tests need Ollama running
Timing Dependent: Cache tests may fail on slow systems
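
For the timing-dependent cases, polling for a condition with a deadline is usually more robust than a fixed sleep. The sketch below shows one way to do that. `wait_until()` is a hypothetical helper and the default interval is an arbitrary choice.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Poll `condition` until it returns true or `timeout` elapses.
// Returns true if the condition held, false on timeout.
bool wait_until(const std::function<bool()>& condition,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!condition()) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
    return true;
}
```

A cache test could wrap its lookup in `wait_until(...)` with a generous timeout, so slow systems get more time while fast systems return as soon as the entry appears.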

Contributing Tests

When contributing new tests:

Follow the template above
Add to Makefile if needed
Update this documentation
Ensure tests pass with make test_nl2sql
