sqlite-rembed Integration Test Suite

Overview

This test suite validates the integration of sqlite-rembed (a Rust SQLite extension for text embedding generation) into ProxySQL. The tests cover the full pipeline: client registration, embedding generation, embedding storage, and vector similarity search.

Prerequisites

System Requirements

  • ProxySQL compiled with sqlite-rembed and sqlite-vec extensions
  • MySQL client (mysql command line tool)
  • Bash shell environment
  • Network access to an embedding API endpoint (a remote service, or a local Ollama or OpenAI-compatible server)

ProxySQL Configuration

Ensure ProxySQL is running with the SQLite3 server enabled (adjust the path to match your own checkout):

cd /home/rene/proxysql-vec/src
./proxysql --sqlite3-server

Test Configuration

The test script uses default connection parameters:

  • Host: 127.0.0.1
  • Port: 6030 (default SQLite3 server port)
  • User: root
  • Password: root

Modify these in the script if your configuration differs.

Test Suite Structure

The test suite is organized into 9 phases, each testing specific components:

Phase 1: Basic Connectivity and Function Verification

  • ProxySQL connection
  • Database listing
  • sqlite-vec function availability
  • sqlite-rembed function registration
  • temp.rembed_clients virtual table existence

Phase 2: Client Configuration

  • Create embedding API client with rembed_client_options()
  • Verify client registration in temp.rembed_clients
  • Test rembed_client_options function

Phase 3: Embedding Generation Tests

  • Generate embeddings for short and long text
  • Verify embedding data type (BLOB) and size (768 dimensions × 4 bytes = 3072 bytes)
  • Error handling for non-existent clients
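The size check above is plain arithmetic: each dimension is stored as a 4-byte float, so the expected BLOB length is the dimension count times four. The following is a minimal sketch of that check; the function names are hypothetical and are not taken from sqlite-rembed-test.sh. In practice the actual length would presumably come from a query such as SELECT length(rembed(...)).

```bash
# Expected BLOB size in bytes for a float32 embedding: dimensions * 4.
expected_blob_bytes() {
    echo $(( $1 * 4 ))
}

# Hypothetical helper: compare a reported byte length with the expected size.
# $1 = actual byte length, $2 = number of dimensions
check_embedding_size() {
    if [ "$1" -eq "$(expected_blob_bytes "$2")" ]; then
        echo "ok"
    else
        echo "size mismatch: got $1, expected $(expected_blob_bytes "$2")"
        return 1
    fi
}

expected_blob_bytes 768   # prints 3072
```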

Phase 4: Table Creation and Data Storage

  • Create regular table for document storage
  • Create virtual vector table using vec0
  • Insert test documents with diverse content

Phase 5: Embedding Generation and Storage

  • Generate embeddings for all documents
  • Store embeddings in vector table
  • Verify embedding count matches document count
  • Check embedding storage format

Phase 6: Similarity Search Tests

  • Exact self-match (document with itself, distance = 0.0)
  • Similarity search with query text
  • Verify result ordering by ascending distance
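One way to verify the ordering is to pipe the distance column of a search result through a small filter. This is a sketch only; the helper name is hypothetical and the real script may check ordering differently.

```bash
# Succeeds when the numbers on stdin (one per line) are in non-decreasing order.
is_ascending() {
    awk 'NR > 1 && $1 + 0 < prev { bad = 1 }
         { prev = $1 + 0 }
         END { exit bad + 0 }'
}

# Example with made-up distances; a self-match would appear first at 0.0.
if printf '0.0\n0.42\n0.91\n' | is_ascending; then
    echo "distances are ordered"
fi
```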

Phase 7: Edge Cases and Error Handling

  • Empty text input
  • Very long text input
  • SQL injection attempt safety

Phase 8: Performance and Concurrency

  • Sequential embedding generation timing
  • Basic performance validation (< 10 seconds for 3 embeddings)
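A timing threshold like the one above can be enforced with a small wrapper that measures wall-clock seconds around a command. The sketch below assumes nothing about how the real script measures time; run_within is a hypothetical name.

```bash
# Runs a command and fails if it exits non-zero or takes at least LIMIT seconds.
# Usage: run_within LIMIT command [args...]
run_within() {
    limit=$1; shift
    start=$(date +%s)
    "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    echo "elapsed: ${elapsed}s (limit ${limit}s)"
    [ "$status" -eq 0 ] && [ "$elapsed" -lt "$limit" ]
}

# Example: a trivial command finishes well inside a 10-second budget.
run_within 10 true
```

Because date +%s has one-second resolution, this is only suitable for coarse budgets such as the 10-second limit used here.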

Phase 9: Cleanup and Final Verification

  • Clean up test tables
  • Verify no test artifacts remain

Usage

Running the Full Test Suite

cd /home/rene/proxysql-vec/doc
./sqlite-rembed-test.sh

Expected Output

The script provides color-coded output:

  • 🟢 Green: Test passed
  • 🔴 Red: Test failed
  • 🔵 Blue: Information and headers
  • 🟡 Yellow: Test being executed

Exit Codes

  • 0: All tests passed
  • 1: One or more tests failed
  • 2: Connection issues or missing dependencies
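A calling script can branch on these exit codes. The following sketch maps each documented code to its meaning; describe_exit_code is a hypothetical helper, not part of the test script.

```bash
# Translate the documented exit codes into a short message.
describe_exit_code() {
    case "$1" in
        0) echo "all tests passed" ;;
        1) echo "one or more tests failed" ;;
        2) echo "connection issues or missing dependencies" ;;
        *) echo "unexpected exit code: $1" ;;
    esac
}

# Typical use after a run:
#   ./sqlite-rembed-test.sh; describe_exit_code $?
describe_exit_code 2
```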

Configuration

Modifying Connection Parameters

Edit the following variables in sqlite-rembed-test.sh:

PROXYSQL_HOST="127.0.0.1"
PROXYSQL_PORT="6030"
MYSQL_USER="root"
MYSQL_PASS="root"
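As an alternative to editing the file, the same variables could be made overridable from the environment, using the default-value pattern the script already applies to API_KEY. This is a suggested variation, not how the script is known to behave; mysql_args is a hypothetical helper.

```bash
# Use the environment value when set, otherwise fall back to the documented default.
PROXYSQL_HOST="${PROXYSQL_HOST:-127.0.0.1}"
PROXYSQL_PORT="${PROXYSQL_PORT:-6030}"
MYSQL_USER="${MYSQL_USER:-root}"
MYSQL_PASS="${MYSQL_PASS:-root}"

# Print the mysql client arguments that correspond to the current settings.
mysql_args() {
    echo "-h $PROXYSQL_HOST -P $PROXYSQL_PORT -u $MYSQL_USER -p$MYSQL_PASS"
}

mysql_args
```

With this in place, a one-off run against another port would look like PROXYSQL_PORT=7030 ./sqlite-rembed-test.sh.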

API Configuration

The test uses a synthetic OpenAI-format endpoint by default. To use your own API, set the API_KEY environment variable or edit the variables below:

API_CLIENT_NAME="test-client-$(date +%s)"
API_FORMAT="openai"
API_URL="https://api.synthetic.new/openai/v1/embeddings"
API_KEY="${API_KEY:-YOUR_API_KEY}"  # Uses environment variable or placeholder
API_MODEL="hf:nomic-ai/nomic-embed-text-v1.5"
VECTOR_DIMENSIONS=768

For other providers (Ollama, Cohere, Nomic), adjust the format and URL accordingly. If the model changes, update API_MODEL and VECTOR_DIMENSIONS to match it.
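These variables end up in a client registration statement. The sketch below shows roughly what that statement might look like, based on the temp.rembed_clients table and rembed_client_options() function named in Phase 2; the helper name is hypothetical, and the exact statement in the script may differ.

```bash
# Print an INSERT that registers an embedding client.
# $1 = client name, $2 = format, $3 = URL, $4 = API key, $5 = model
# Note: values are interpolated as-is, so they must not contain single quotes.
register_client_sql() {
    cat <<EOF
INSERT INTO temp.rembed_clients(name, options) VALUES
  ('$1', rembed_client_options('format', '$2', 'url', '$3', 'key', '$4', 'model', '$5'));
EOF
}

register_client_sql "test-client" "openai" \
    "https://api.synthetic.new/openai/v1/embeddings" \
    "YOUR_API_KEY" "hf:nomic-ai/nomic-embed-text-v1.5"
```

The printed statement can then be piped to the mysql client connected to the SQLite3 server port.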

Test Data

Sample Documents

The test creates 4 sample documents:

  1. Machine Learning - "Machine learning algorithms improve with more training data..."
  2. Database Systems - "Database management systems efficiently store, retrieve..."
  3. Artificial Intelligence - "AI enables computers to perform tasks typically..."
  4. Vector Databases - "Vector databases enable similarity search for embeddings..."

Query Texts

Test searches use:

  • Self-match: Document 1 with itself
  • Query: "data science and algorithms"

Troubleshooting

Common Issues

1. Connection Failed

Error: Cannot connect to ProxySQL at 127.0.0.1:6030

Solution: Ensure ProxySQL is running with the --sqlite3-server flag and is listening on the configured host and port.

2. Missing Functions

ERROR 1045 (28000): no such function: rembed

Solution: Verify that sqlite-rembed was compiled and linked into the ProxySQL binary.

3. API Errors

Error from embedding API

Solution: Check network connectivity to the API URL and confirm that the API key and model name are valid.

4. Vector Table Errors

ERROR 1045 (28000): A LIMIT or 'k = ?' constraint is required on vec0 knn queries.

Solution: Every sqlite-vec KNN query needs either a LIMIT clause or a 'k = ?' constraint.
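For illustration, a KNN query that satisfies this requirement can be generated as follows. This is a sketch under assumptions: the helper names are hypothetical, and the table name test_embeddings and column name embedding are placeholders rather than the names used by the script.

```bash
# Double any single quotes so the text is safe inside a SQL string literal.
sql_escape() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# Print a KNN query with the LIMIT clause that vec0 requires.
# $1 = vec0 table, $2 = rembed client name, $3 = query text, $4 = result count
knn_query() {
    printf "SELECT rowid, distance FROM %s WHERE embedding MATCH rembed('%s', '%s') ORDER BY distance LIMIT %d;\n" \
        "$1" "$(sql_escape "$2")" "$(sql_escape "$3")" "$4"
}

knn_query test_embeddings test-client "data science and algorithms" 3
```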

Debug Mode

For detailed debugging, run with trace:

bash -x ./sqlite-rembed-test.sh

Integration with CI/CD

The test script can be integrated into CI/CD pipelines:

# Example GitHub Actions workflow
name: sqlite-rembed Tests
on: [push, pull_request]
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Build ProxySQL with sqlite-rembed
        run: |
          cd deps && make cleanpart && make sqlite3
          cd ../lib && make
          cd ../src && make
      - name: Start ProxySQL
        run: |
          cd src && ./proxysql --sqlite3-server &
          sleep 5
      - name: Run Integration Tests
        run: |
          cd doc && ./sqlite-rembed-test.sh
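The fixed sleep in the workflow can be fragile on slow runners. One possible refinement is to poll until ProxySQL answers; the sketch below is a generic retry helper with a hypothetical name, and the mysql probe in the comment is an assumed readiness check using the default connection parameters.

```bash
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS command [args...]
wait_for() {
    attempts=$1; delay=$2; shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$(( i + 1 ))
        sleep "$delay"
    done
    return 1
}

# Assumed readiness probe for a CI step (not run here):
#   wait_for 30 1 mysql -h 127.0.0.1 -P 6030 -u root -proot -e 'SELECT 1'
wait_for 3 0 true && echo "ready"
```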

Extending the Test Suite

Adding New Tests

  1. Add a new test function that follows the existing pattern
  2. Add it to the appropriate phase section
  3. Update the phase header and test count
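The script's actual test-function pattern is not reproduced in this document, so the following is only a generic sketch of how a pass/fail helper with counters is commonly structured in shell test scripts. All names here (run_test, PASSED, FAILED) are hypothetical.

```bash
PASSED=0
FAILED=0

# Run a command as a named test and record the outcome.
# Usage: run_test "description" command [args...]
run_test() {
    name=$1; shift
    if "$@" >/dev/null 2>&1; then
        PASSED=$(( PASSED + 1 ))
        echo "PASS: $name"
    else
        FAILED=$(( FAILED + 1 ))
        echo "FAIL: $name"
    fi
}

# Example test that needs no external services.
run_test "arithmetic sanity" test 2 -gt 1
echo "passed=$PASSED failed=$FAILED"
```

A script built this way would typically finish with an exit status of 1 when FAILED is greater than zero, which matches the exit codes documented above.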

Testing Different Providers

Modify the API configuration block to test:

  • Ollama: Use format='ollama' and local URL
  • Cohere: Use format='cohere' and appropriate model
  • Nomic: Use format='nomic' and Nomic API endpoint

Performance Testing

Extend Phase 8 for:

  • Concurrent embedding generation
  • Batch processing tests
  • Memory usage monitoring

Results Interpretation

Success Criteria

  • All connectivity tests pass
  • Embeddings generated with correct dimensions
  • Vector search returns ordered results
  • No test artifacts remain after cleanup

Performance Benchmarks

  • Embedding generation: < 3 seconds per request (network-dependent)
  • Similarity search: < 100ms for small datasets
  • Memory: Stable during sequential operations

License

This test suite is part of the ProxySQL project and follows the same licensing terms.


Test Suite Version: 1.0