NL2SQL Testing Guide

Test Suite Overview

| Test Type | Location | Purpose | LLM Required |
|---|---|---|---|
| Unit Tests | test/tap/tests/nl2sql_*.cpp | Test individual components | Mocked |
| Integration | test/tap/tests/nl2sql_integration-t.cpp | Test with real database | Mocked/Live |
| E2E | scripts/mcp/test_nl2sql_e2e.sh | Complete workflow | Live |
| MCP Tools | scripts/mcp/test_nl2sql_tools.sh | MCP protocol | Live |

Test Infrastructure

TAP Framework

ProxySQL uses the Test Anything Protocol (TAP) for C++ tests.

Key Functions:

plan(number_of_tests);     // Declare how many tests
ok(condition, description); // Test with description
diag(message);              // Print diagnostic message
skip(count, reason);        // Skip tests
exit_status();              // Return proper exit code

Example:

#include "tap.h"

int main() {
    plan(3);
    ok(1 + 1 == 2, "Basic math works");
    ok(true, "Always true");
    diag("This is a diagnostic message");
    return exit_status();
}
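
To make the output format concrete, the following is a minimal stand-alone sketch of an emitter that prints TAP-style lines. It is not ProxySQL's tap.h: MiniTap and its members are hypothetical names used only to illustrate what plan(), ok(), and diag() typically produce, and the real library may differ in details such as how exit_status() is computed.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for a TAP library; NOT ProxySQL's tap.h.
class MiniTap {
public:
    explicit MiniTap(std::ostream& out) : out_(out) {}

    // Emits the plan line, e.g. "1..3".
    void plan(int n) {
        assert(n >= 0);
        planned_ = n;
        out_ << "1.." << n << "\n";
    }

    // Emits "ok N - desc" or "not ok N - desc" and tracks failures.
    bool ok(bool passed, const std::string& desc) {
        ++run_;
        if (!passed) ++failed_;
        out_ << (passed ? "ok " : "not ok ") << run_ << " - " << desc << "\n";
        return passed;
    }

    // Diagnostics are comment lines that TAP consumers ignore.
    void diag(const std::string& msg) { out_ << "# " << msg << "\n"; }

    // Zero only when every planned test ran and none failed.
    int exit_status() const {
        return (failed_ == 0 && run_ == planned_) ? 0 : 1;
    }

private:
    std::ostream& out_;
    int planned_ = 0;
    int run_ = 0;
    int failed_ = 0;
};
```

Writing to a std::ostream rather than directly to stdout keeps the sketch easy to check: pass a std::ostringstream and compare the captured text.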

CommandLine Helper

The CommandLine helper reads the test connection parameters from the environment. getEnv() returns non-zero on failure:

CommandLine cl;
if (cl.getEnv()) {
    diag("Failed to get environment");
    return -1;
}

// cl.host, cl.admin_username, cl.admin_password, cl.admin_port
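
For readers unfamiliar with this pattern, the sketch below shows one way such a helper could read settings from the environment with fallbacks. It is not the real CommandLine class: AdminParams, env_or, load_admin_params, the TAP_ADMIN* variable names, and the default values are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical holder for admin connection settings; not ProxySQL's CommandLine.
struct AdminParams {
    std::string host;
    std::string username;
    std::string password;
    int port;
};

// Returns the value of `name`, or `fallback` when it is unset or empty.
std::string env_or(const char* name, const std::string& fallback) {
    assert(name != nullptr);
    const char* value = std::getenv(name);
    return (value != nullptr && *value != '\0') ? std::string(value) : fallback;
}

// Variable names and defaults below are assumptions, not confirmed by this guide.
AdminParams load_admin_params() {
    AdminParams p;
    p.host = env_or("TAP_ADMINHOST", "127.0.0.1");
    p.username = env_or("TAP_ADMINUSERNAME", "admin");
    p.password = env_or("TAP_ADMINPASSWORD", "admin");
    // atoi yields 0 for non-numeric input; callers may want to reject that.
    p.port = std::atoi(env_or("TAP_ADMINPORT", "6032").c_str());
    return p;
}
```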

Running Tests

Unit Tests

cd test/tap

# Build specific test
make nl2sql_unit_base-t

# Run the test
./nl2sql_unit_base-t

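# Build all NL2SQL tests (one make target per nl2sql_*-t.cpp source;
# a bare "make nl2sql_*" would be expanded by the shell against file names)
for src in tests/nl2sql_*-t.cpp; do make "$(basename "${src%.cpp}")"; done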

Integration Tests

cd test/tap
make nl2sql_integration-t
./nl2sql_integration-t

E2E Tests

# With mocked LLM (faster)
./scripts/mcp/test_nl2sql_e2e.sh --mock

# With live LLM
./scripts/mcp/test_nl2sql_e2e.sh --live

All Tests

# Run all NL2SQL tests
make test_nl2sql

# Run with verbose output
PROXYSQL_VERBOSE=1 make test_nl2sql

Test Coverage

Unit Tests (nl2sql_unit_base-t.cpp)

  • Initialization
  • Basic conversion (mocked)
  • Configuration management
  • Variable persistence
  • Error handling

Prompt Builder Tests (nl2sql_prompt_builder-t.cpp)

  • Basic prompt construction
  • Schema context inclusion
  • System instruction formatting
  • Edge cases (empty, special characters)
  • Prompt structure validation

Model Selection Tests (nl2sql_model_selection-t.cpp)

  • Latency-based selection
  • Provider preference handling
  • API key fallback logic
  • Default selection
  • Configuration integration

Integration Tests (nl2sql_integration-t.cpp)

  • Schema-aware conversion
  • Multi-table queries
  • Complex SQL patterns
  • Error recovery

E2E Tests (test_nl2sql_e2e.sh)

  • Simple SELECT
  • WHERE conditions
  • JOIN queries
  • Aggregations
  • Date handling

Writing New Tests

Test File Template

/**
 * @file nl2sql_your_feature-t.cpp
 * @brief TAP tests for your feature
 *
 * @date 2025-01-16
 */

#include <algorithm>
#include <string>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <vector>

#include "mysql.h"
#include "mysqld_error.h"

#include "tap.h"
#include "command_line.h"
#include "utils.h"

using std::string;

MYSQL* g_admin = NULL;

// ============================================================================
// Helper Functions
// ============================================================================

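string get_variable(const char* name) {
    // Implementation: query g_admin for the variable `name`
    return string();  // placeholder so the template compiles
}

bool set_variable(const char* name, const char* value) {
    // Implementation: update the variable `name` through g_admin
    return false;  // placeholder so the template compiles
}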

// ============================================================================
// Test: Your Test Category
// ============================================================================

void test_your_category() {
    diag("=== Your Test Category ===");

    // Test 1
    ok(condition, "Test description");

    // Test 2
    ok(condition, "Another test");
}

// ============================================================================
// Main
// ============================================================================

int main(int argc, char** argv) {
    CommandLine cl;
    if (cl.getEnv()) {
        diag("Error getting environment");
        return exit_status();
    }

    g_admin = mysql_init(NULL);
    if (!mysql_real_connect(g_admin, cl.host, cl.admin_username,
                            cl.admin_password, NULL, cl.admin_port, NULL, 0)) {
        diag("Failed to connect to admin");
        return exit_status();
    }

    plan(number_of_tests);

    test_your_category();

    mysql_close(g_admin);
    return exit_status();
}

Test Naming Conventions

  • Files: nl2sql_feature_name-t.cpp
  • Functions: test_feature_category()
  • Descriptions: "Feature does something"

Test Organization

// Section dividers
// ============================================================================
// Section Name
// ============================================================================

// Test function with docstring
/**
 * @test Test name
 * @description What it tests
 * @expected What should happen
 */
void test_something() {
    diag("=== Test Category ===");
    // Tests...
}

Best Practices

  1. Use diag() for section headers:

    diag("=== Configuration Tests ===");
    
  2. Provide meaningful test descriptions:

    ok(result == expected, "Variable set to 'value' reflects in runtime");
    
  3. Clean up after tests:

    // Restore original values
    set_variable("model", orig_value.c_str());
    
  4. Handle both stub and real implementations:

    ok(value == expected || value.empty(),
       "Value matches expected or is empty (stub)");
    

Mocking LLM Responses

For fast unit tests, mock LLM responses:

string mock_llm_response(const string& query) {
    if (query.find("SELECT") != string::npos) {
        return "SELECT * FROM table";
    }
    // Other patterns...
    return "";  // no pattern matched; avoids falling off the end of the function
}
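
As the number of patterns grows, a lookup table tends to be easier to maintain than a chain of if statements. The following is a minimal sketch of a table-driven mock, assuming case-insensitive keyword matching where the first matching entry wins. The keywords and canned SQL are illustrative only and are not taken from the actual test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <utility>
#include <vector>

// Lower-cases a copy of `s` so that matching is case-insensitive.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical table-driven mock: the first keyword found in the question wins,
// so more specific keywords must be listed before more general ones.
std::string mock_llm_response(const std::string& question) {
    static const std::vector<std::pair<std::string, std::string>> canned = {
        {"count", "SELECT country, COUNT(*) FROM customers GROUP BY country"},
        {"orders", "SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id"},
        {"customers", "SELECT * FROM customers"},
    };
    const std::string q = to_lower(question);
    for (const auto& entry : canned) {
        assert(!entry.first.empty());  // an empty keyword would match every question
        if (q.find(entry.first) != std::string::npos) {
            return entry.second;
        }
    }
    return "";  // no match: lets tests exercise the error path
}
```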

Debugging Tests

Enable Verbose Output

# Verbose TAP output
./nl2sql_unit_base-t -v

# ProxySQL debug output
PROXYSQL_VERBOSE=1 ./nl2sql_unit_base-t

GDB Debugging

gdb ./nl2sql_unit_base-t
(gdb) break main
(gdb) run
(gdb) backtrace

SQL Debugging

// Print generated SQL
diag("Generated SQL: %s", sql.c_str());

// Check MySQL errors (g_admin is the admin connection from the template)
if (mysql_query(g_admin, query)) {
    diag("MySQL error: %s", mysql_error(g_admin));
}

Continuous Integration

GitHub Actions (Planned)

name: NL2SQL Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build ProxySQL
        run: make
      - name: Run NL2SQL Tests
        run: make test_nl2sql

Test Data

Sample Schema

Tests use a standard test schema:

CREATE TABLE customers (
    id INT PRIMARY KEY AUTO_INCREMENT,
    name VARCHAR(100),
    country VARCHAR(50),
    created_at DATE
);

CREATE TABLE orders (
    id INT PRIMARY KEY AUTO_INCREMENT,
    customer_id INT,
    total DECIMAL(10,2),
    status VARCHAR(20),
    FOREIGN KEY (customer_id) REFERENCES customers(id)
);

Sample Queries

-- Simple
NL2SQL: Show all customers

-- With conditions
NL2SQL: Find customers from USA

-- JOIN
NL2SQL: Show orders with customer names

-- Aggregation
NL2SQL: Count customers by country
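
Because LLM output varies in whitespace, aliasing, and letter case, exact string comparison against expected SQL is usually too strict. A possible alternative, sketched below, pairs each sample question with a few keywords the generated SQL should contain. SampleCase, to_statement, contains_all_ci, and sample_cases are hypothetical names, and the keyword lists are illustrative guesses derived from the sample schema, not assertions taken from the real tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// One natural-language question plus keywords the generated SQL should contain.
struct SampleCase {
    std::string question;
    std::vector<std::string> expected_keywords;
};

// Builds the statement sent to ProxySQL, using the "NL2SQL:" prefix from the samples.
std::string to_statement(const std::string& question) {
    assert(!question.empty());
    return "NL2SQL: " + question;
}

// Case-insensitive check that every keyword occurs somewhere in `sql`.
bool contains_all_ci(const std::string& sql, const std::vector<std::string>& keywords) {
    auto lower = [](std::string s) {
        std::transform(s.begin(), s.end(), s.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        return s;
    };
    const std::string haystack = lower(sql);
    return std::all_of(keywords.begin(), keywords.end(), [&](const std::string& k) {
        return haystack.find(lower(k)) != std::string::npos;
    });
}

// Expected keywords are illustrative assumptions, not the suite's real checks.
std::vector<SampleCase> sample_cases() {
    return {
        {"Show all customers", {"SELECT", "customers"}},
        {"Find customers from USA", {"SELECT", "customers", "WHERE", "USA"}},
        {"Show orders with customer names", {"SELECT", "orders", "JOIN", "customers"}},
        {"Count customers by country", {"COUNT", "customers", "GROUP BY"}},
    };
}
```

In a TAP test, each case would typically become one ok() call whose description includes the question, so that a failing case is identifiable from the output alone.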

Performance Testing

Benchmark Script

#!/bin/bash
# benchmark_nl2sql.sh

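for i in {1..100}; do
    start=$(date +%s%N)   # nanoseconds since the epoch (requires GNU date)
    # Discard the query output so that only the timings reach awk
    mysql -h 127.0.0.1 -P 6033 -e "NL2SQL: Show top customers" > /dev/null
    end=$(date +%s%N)
    echo $((end - start))
done | awk '{sum+=$1} END {print sum/NR " ns average"}'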
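
The script above reports only the mean, which a few slow LLM calls can skew heavily. If the raw samples are collected in a C++ harness instead, percentiles give a fuller picture. The sketch below uses the nearest-rank method; mean_ns and percentile_ns are hypothetical helpers, not part of the test suite.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Arithmetic mean of latency samples in nanoseconds.
double mean_ns(const std::vector<std::int64_t>& samples) {
    assert(!samples.empty());
    const double sum = std::accumulate(samples.begin(), samples.end(), 0.0);
    return sum / static_cast<double>(samples.size());
}

// Nearest-rank percentile: the smallest sample such that at least p percent
// of all samples are less than or equal to it. Takes a copy so it can sort.
std::int64_t percentile_ns(std::vector<std::int64_t> samples, double p) {
    assert(!samples.empty() && p > 0.0 && p <= 100.0);
    std::sort(samples.begin(), samples.end());
    const double n = static_cast<double>(samples.size());
    const auto rank = static_cast<std::size_t>(std::ceil(p / 100.0 * n));
    return samples[std::max<std::size_t>(rank, 1) - 1];
}
```

Reporting the 50th and 95th percentiles alongside the mean makes it easier to tell a uniformly slow run from one dominated by a handful of outliers.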

Known Issues

  1. Stub Implementation: Many features return empty/placeholder values
  2. Live LLM Required: Some tests need Ollama running
  3. Timing Dependent: Cache tests may fail on slow systems
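
For the timing-dependent failures in item 3, one common mitigation is to poll for the expected state with a deadline instead of sleeping for a fixed interval. The following is a minimal sketch of such a helper; wait_until is a hypothetical name and the default 10 ms polling interval is an arbitrary assumption.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Polls `condition` until it returns true or `timeout` elapses.
// Returns true if the condition was met, false on timeout.
bool wait_until(const std::function<bool()>& condition,
                std::chrono::milliseconds timeout,
                std::chrono::milliseconds interval = std::chrono::milliseconds(10)) {
    assert(condition && "condition must be callable");
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (true) {
        if (condition()) {
            return true;  // checked first, so an already-true condition never sleeps
        }
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;
        }
        std::this_thread::sleep_for(interval);
    }
}
```

A cache test could then assert on the result of wait_until with a generous timeout, which passes quickly on fast machines and still tolerates slow ones.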

Contributing Tests

When contributing new tests:

  1. Follow the template above
  2. Add to Makefile if needed
  3. Update this documentation
  4. Ensure tests pass with make test_nl2sql

See Also