GenAI Module Documentation

Overview

The GenAI (Generative AI) Module in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.

Version

Module Version: 0.1.0
Last Updated: 2025-01-10
Branch: v3.1-vec_genAI_module

Architecture

Async Design

The GenAI module uses a non-blocking async architecture based on socketpair IPC and epoll event notification:

┌─────────────────┐         socketpair         ┌─────────────────┐
│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
│  (MySQL Thread) │  fds[0]              fds[1]  │  Listener Loop  │
└────────┬────────┘                            └────────┬────────┘
         │                                               │
         │ epoll                                         │ queue
         │                                               │
         └── epoll_wait() ────────────────────────────────┘
                     (GenAI Response Ready)

Key Components

MySQL_Session - Client-facing interface that receives GENAI: queries
GenAI Listener Thread - Monitors socketpair fds via epoll for incoming requests
GenAI Worker Threads - Thread pool that processes requests (blocking HTTP calls)
Socketpair Communication - Bidirectional IPC between MySQL and GenAI modules
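
The pattern in the diagram can be reproduced in a few lines. The following is a minimal sketch, not ProxySQL code: `socketpair_round_trip` is a hypothetical helper, the "worker" is a stand-in thread that simply echoes the request, and the real module adds a listener thread, a queue, and a worker pool between the two ends.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <string>
#include <thread>

// Hypothetical helper: sends `request` on fds[0], lets a stand-in worker
// thread answer on fds[1], and waits for the reply with epoll.
// Returns the reply, or an empty string on failure or timeout.
std::string socketpair_round_trip(const std::string& request, int timeout_ms) {
    assert(request.size() <= 256);  // the stand-in worker reads at most 256 bytes

    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";

    // Stand-in for the GenAI side: read one request, send one reply.
    std::thread worker([fd = fds[1]] {
        char buf[256];
        ssize_t n = recv(fd, buf, sizeof(buf), 0);
        if (n > 0) {
            std::string reply = "echo:" + std::string(buf, static_cast<size_t>(n));
            // MSG_NOSIGNAL avoids SIGPIPE if the session side already closed.
            send(fd, reply.data(), reply.size(), MSG_NOSIGNAL);
        }
    });

    std::string result;
    int epfd = epoll_create1(0);
    if (epfd >= 0) {
        epoll_event ev{};
        ev.events = EPOLLIN;
        ev.data.fd = fds[0];
        bool sent = epoll_ctl(epfd, EPOLL_CTL_ADD, fds[0], &ev) == 0 &&
                    send(fds[0], request.data(), request.size(), MSG_NOSIGNAL) ==
                        static_cast<ssize_t>(request.size());
        epoll_event ready{};
        // A real session would return to its main loop here instead of waiting.
        if (sent && epoll_wait(epfd, &ready, 1, timeout_ms) == 1) {
            char buf[512];
            ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
            if (n > 0) result.assign(buf, static_cast<size_t>(n));
        }
        close(epfd);
    }

    close(fds[0]);  // also unblocks the worker's recv() if nothing was sent
    worker.join();
    close(fds[1]);
    return result;
}
```

Calling `socketpair_round_trip("ping", 2000)` should yield `echo:ping`. The sketch waits inside `epoll_wait()` for brevity; the point of the design is that a session does not have to.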

Communication Protocol

Request Format (MySQL → GenAI)

struct GenAI_RequestHeader {
    uint64_t request_id;      // Client's correlation ID
    uint32_t operation;       // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;       // Length of JSON query that follows
    uint32_t flags;           // Reserved (must be 0)
    uint32_t top_n;           // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
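
To make the framing concrete, here is a minimal sketch of how a sender could assemble one request frame. `encode_request` is a hypothetical helper, not a function from the ProxySQL sources, and the numeric values of the `GENAI_OP_*` constants are not shown in this document, so the operation is passed in as a plain integer. Both ends of a socketpair live in the same process, so host byte order is assumed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Layout copied from the request format above.
struct GenAI_RequestHeader {
    uint64_t request_id;
    uint32_t operation;
    uint32_t query_len;
    uint32_t flags;
    uint32_t top_n;
};
static_assert(sizeof(GenAI_RequestHeader) == 24, "8 + 4 * 4 bytes, no padding");

// Hypothetical helper: builds header + JSON payload as one contiguous frame.
std::vector<uint8_t> encode_request(uint64_t request_id, uint32_t operation,
                                    const std::string& json, uint32_t top_n) {
    assert(json.size() <= UINT32_MAX);  // query_len is a 32-bit field

    GenAI_RequestHeader hdr{};
    hdr.request_id = request_id;
    hdr.operation = operation;
    hdr.query_len = static_cast<uint32_t>(json.size());
    hdr.flags = 0;  // reserved, must be 0
    hdr.top_n = top_n;

    std::vector<uint8_t> frame(sizeof(hdr) + json.size());
    std::memcpy(frame.data(), &hdr, sizeof(hdr));
    std::memcpy(frame.data() + sizeof(hdr), json.data(), json.size());
    return frame;
}
```

Writing the header and payload from a single buffer keeps the two parts together on the stream, which simplifies the reader: it reads a fixed-size header first, then exactly `query_len` more bytes.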

Response Format (GenAI → MySQL)

struct GenAI_ResponseHeader {
    uint64_t request_id;        // Echo of client's request ID
    uint32_t status_code;       // 0 = success, >0 = error
    uint32_t result_len;        // Length of JSON result that follows
    uint32_t processing_time_ms;// Time taken by GenAI worker
    uint64_t result_ptr;        // Reserved (must be 0)
    uint32_t result_count;      // Number of results
    uint32_t reserved;          // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
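
A reader of this format has to guard against short or truncated frames. The sketch below shows one way to do that; `ParsedResponse` and `parse_response` are hypothetical names used only for illustration. Note that `result_ptr` follows three 32-bit fields, so on common 64-bit ABIs the compiler inserts 4 bytes of padding before it and `sizeof(GenAI_ResponseHeader)` is typically 40 rather than 36. The code therefore relies on `sizeof` instead of summing field widths.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Layout copied from the response format above.
struct GenAI_ResponseHeader {
    uint64_t request_id;
    uint32_t status_code;
    uint32_t result_len;
    uint32_t processing_time_ms;
    uint64_t result_ptr;
    uint32_t result_count;
    uint32_t reserved;
};

// Hypothetical result type for this sketch.
struct ParsedResponse {
    bool ok = false;                // true only if header and payload are complete
    GenAI_ResponseHeader header{};
    std::string json;               // result_len bytes following the header
};

// Hypothetical helper: validates lengths before touching the payload.
ParsedResponse parse_response(const uint8_t* data, std::size_t len) {
    ParsedResponse out;
    if (data == nullptr || len < sizeof(GenAI_ResponseHeader)) return out;

    std::memcpy(&out.header, data, sizeof(out.header));
    const std::size_t payload_len = len - sizeof(out.header);
    if (out.header.result_len > payload_len) return out;  // truncated payload

    out.json.assign(reinterpret_cast<const char*>(data) + sizeof(out.header),
                    out.header.result_len);
    out.ok = true;
    return out;
}
```

A caller would check `ok` first, then `header.status_code` (0 = success), and only then interpret `json` as a result or an error object.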

Configuration Variables

Thread Configuration

Variable	Type	Default	Description
`genai-threads`	int	4	Number of GenAI worker threads (1-256)

Service Endpoints

Variable	Type	Default	Description
`genai-embedding_uri`	string	`http://127.0.0.1:8013/embedding`	Embedding service endpoint
`genai-rerank_uri`	string	`http://127.0.0.1:8012/rerank`	Reranking service endpoint

Timeouts

Variable	Type	Default	Description
`genai-embedding_timeout_ms`	int	30000	Embedding request timeout (100-300000ms)
`genai-rerank_timeout_ms`	int	30000	Reranking request timeout (100-300000ms)
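
The documented ranges can be checked before values are applied. The following is an illustrative sketch only: `GenAIConfig`, `in_range`, and `out_of_range` are hypothetical names, and this document does not state whether ProxySQL rejects or adjusts out-of-range values, so the sketch merely reports them.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical mirror of the variables above, with their documented defaults.
struct GenAIConfig {
    int threads = 4;
    int embedding_timeout_ms = 30000;
    int rerank_timeout_ms = 30000;
};

// True if lo <= value <= hi.
bool in_range(int value, int lo, int hi) {
    assert(lo <= hi);
    return value >= lo && value <= hi;
}

// Returns the names of all variables that fall outside their documented range.
std::vector<std::string> out_of_range(const GenAIConfig& cfg) {
    std::vector<std::string> bad;
    if (!in_range(cfg.threads, 1, 256)) bad.push_back("genai-threads");
    if (!in_range(cfg.embedding_timeout_ms, 100, 300000)) {
        bad.push_back("genai-embedding_timeout_ms");
    }
    if (!in_range(cfg.rerank_timeout_ms, 100, 300000)) {
        bad.push_back("genai-rerank_timeout_ms");
    }
    return bad;
}
```

With the defaults, `out_of_range(GenAIConfig{})` returns an empty list; setting `threads` to 0 would report `genai-threads`.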

Admin Commands

-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;

-- Set variables
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';

-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';

-- Checksum
CHECKSUM GENAI VARIABLES;

Query Syntax

GENAI: Query Format

GenAI queries use the special GENAI: prefix followed by JSON:

GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
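
Client code that assembles these queries needs to escape document text so the JSON stays valid, in particular quotes, backslashes, and control characters. A minimal sketch under that assumption follows; `json_escape` and `build_embed_query` are hypothetical helpers written for this example, and a production client would more likely use a JSON library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Escapes a string for use inside a JSON string literal.
std::string json_escape(const std::string& s) {
    std::string out;
    out.reserve(s.size() + 2);
    for (unsigned char c : s) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n"; break;
            case '\r': out += "\\r"; break;
            case '\t': out += "\\t"; break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    char buf[8];
                    int n = std::snprintf(buf, sizeof(buf), "\\u%04x",
                                          static_cast<unsigned>(c));
                    assert(n == 6);
                    (void)n;
                    out.append(buf, 6);
                } else {
                    out += static_cast<char>(c);  // UTF-8 bytes pass through
                }
        }
    }
    return out;
}

// Builds a GENAI: embed query for the given documents.
std::string build_embed_query(const std::vector<std::string>& documents) {
    std::string q = "GENAI: {\"type\": \"embed\", \"documents\": [";
    for (std::size_t i = 0; i < documents.size(); ++i) {
        if (i > 0) q += ", ";
        q += '"' + json_escape(documents[i]) + '"';
    }
    q += "]}";
    return q;
}
```

For the documents `text1` and `text2`, `build_embed_query` produces the first query line shown above. The resulting string is sent like any other statement over a MySQL connection to ProxySQL.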

Supported Operations

1. Embedding

Generate vector embeddings for documents:

GENAI: {
    "type": "embed",
    "documents": [
        "Machine learning is a subset of AI.",
        "Deep learning uses neural networks."
    ]
}

Response:

+------------------------------------------+
| embedding                                |
+------------------------------------------+
| 0.123, -0.456, 0.789, ...                |
| 0.234, -0.567, 0.890, ...                |
+------------------------------------------+

2. Reranking

Rerank documents by relevance to a query:

GENAI: {
    "type": "rerank",
    "query": "What is machine learning?",
    "documents": [
        "Machine learning is a subset of artificial intelligence.",
        "The capital of France is Paris.",
        "Deep learning uses neural networks."
    ],
    "top_n": 2,
    "columns": 3
}

Parameters:

query (required): Search query text
documents (required): Array of documents to rerank
top_n (optional): Maximum results to return (0 = all, default: all)
columns (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)

Response:

+-------+-------+----------------------------------------------+
| index | score | document                                     |
+-------+-------+----------------------------------------------+
| 0     | 0.95  | Machine learning is a subset of AI...        |
| 2     | 0.82  | Deep learning uses neural networks...        |
+-------+-------+----------------------------------------------+
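
The effect of `top_n` and `columns` can be sketched as a small post-processing step. This is an illustration under assumptions, not the module's code: `RerankResult`, `select_top`, and `column_names` are hypothetical names, and ordering by descending score is inferred from the example output above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One scored document; `index` is the position in the input "documents" array.
struct RerankResult {
    std::size_t index;
    double score;
    std::string document;
};

// Orders results by descending score and keeps at most top_n (0 = keep all).
std::vector<RerankResult> select_top(std::vector<RerankResult> results,
                                     uint32_t top_n) {
    std::stable_sort(results.begin(), results.end(),
                     [](const RerankResult& a, const RerankResult& b) {
                         return a.score > b.score;
                     });
    if (top_n != 0 && top_n < results.size()) {
        results.erase(results.begin() + top_n, results.end());
    }
    return results;
}

// Column names for columns = 2 ({index, score}) or 3 ({index, score, document}).
std::vector<std::string> column_names(int columns) {
    assert(columns == 2 || columns == 3);
    std::vector<std::string> names = {"index", "score"};
    if (columns == 3) names.push_back("document");
    return names;
}
```

With the three documents from the example and `top_n` set to 2, the sketch keeps indexes 0 and 2 and drops the unrelated document at index 1.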

Response Format

All GenAI queries return results in MySQL resultset format with:

columns: Array of column names
rows: Array of row data

Success:

{
    "columns": ["index", "score", "document"],
    "rows": [
        [0, 0.95, "Most relevant document"],
        [2, 0.82, "Second most relevant"]
    ]
}

Error:

{
    "error": "Error message describing what went wrong"
}

Usage Examples

Basic Embedding

-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};

-- Batch embedding for multiple documents
GENAI: {
    "type": "embed",
    "documents": ["doc1", "doc2", "doc3"]
};

Basic Reranking

-- Find most relevant documents
GENAI: {
    "type": "rerank",
    "query": "database optimization techniques",
    "documents": [
        "How to bake a cake",
        "Indexing strategies for MySQL",
        "Python programming basics",
        "Query optimization in ProxySQL"
    ]
};

Top N Results

-- Get only top 3 most relevant documents
GENAI: {
    "type": "rerank",
    "query": "best practices for SQL",
    "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
    "top_n": 3
};

Index and Score Only

-- Get only relevance scores (no document text)
GENAI: {
    "type": "rerank",
    "query": "test query",
    "documents": ["doc1", "doc2"],
    "columns": 2
};

Integration with ProxySQL

Session Lifecycle

Session Start: MySQL session creates genai_epoll_fd_ for monitoring GenAI responses
Query Received: GENAI: query detected in handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()
Async Send: Socketpair created, request sent, returns immediately
Main Loop: check_genai_events() called on each iteration
Response Ready: handle_genai_response() processes response
Result Sent: MySQL result packet sent to client
Cleanup: Socketpair closed, resources freed

Main Loop Integration

The GenAI event checking is integrated into the main MySQL handler loop:

handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef epoll_create1
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
    }
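
A check that runs on every loop iteration must never block the MySQL thread. One way to get that property is `epoll_wait()` with a timeout of 0, sketched below. The helper names (`watch_fd`, `poll_ready`, `demo_poll_ready`) are hypothetical, and the actual `check_genai_events()` implementation may differ.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Registers fd for readability notifications; returns true on success.
bool watch_fd(int epfd, int fd) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0;
}

// Returns the fds that are readable right now. The timeout of 0 makes
// epoll_wait() return immediately, so the caller is never blocked.
std::vector<int> poll_ready(int epfd, int max_events = 16) {
    assert(max_events > 0);
    std::vector<epoll_event> events(static_cast<std::size_t>(max_events));
    std::vector<int> ready;
    int n = epoll_wait(epfd, events.data(), max_events, 0);
    for (int i = 0; i < n; ++i) {
        ready.push_back(events[static_cast<std::size_t>(i)].data.fd);
    }
    return ready;
}

// Usage sketch: nothing is ready until the other end of the socketpair writes.
bool demo_poll_ready() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
    int epfd = epoll_create1(0);
    bool ok = epfd >= 0 && watch_fd(epfd, fds[0]);
    ok = ok && poll_ready(epfd).empty();      // no response yet
    ok = ok && write(fds[1], "x", 1) == 1;    // stand-in worker responds
    std::vector<int> ready = ok ? poll_ready(epfd) : std::vector<int>{};
    ok = ok && ready.size() == 1 && ready[0] == fds[0];
    if (epfd >= 0) close(epfd);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

`demo_poll_ready()` returns true when the first poll sees nothing and the second poll reports the session-side descriptor, which is the behaviour the main loop depends on.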

Backend Services

llama-server Integration

The GenAI module is designed to work with llama-server, the high-performance C++ HTTP inference server from the llama.cpp project, which serves GGUF models (not only LLaMA models).

Starting llama-server

# Start embedding server
./llama-server \
    --model /path/to/nomic-embed-text-v1.5.gguf \
    --port 8013 \
    --embedding \
    --ctx-size 512

# Start reranking server (using same model)
./llama-server \
    --model /path/to/nomic-embed-text-v1.5.gguf \
    --port 8012 \
    --ctx-size 512

API Compatibility

The GenAI module expects:

Embedding endpoint: POST /embedding with JSON request
Rerank endpoint: POST /rerank with JSON request

Compatible with:

llama-server
OpenAI-compatible embedding APIs
Custom services with matching request/response format

Testing

TAP Test Suite

Comprehensive TAP tests are available in test/tap/tests/genai_async-t.cpp:

cd test/tap/tests
make genai_async-t
./genai_async-t

Test Coverage:

Single async requests
Sequential requests (embedding and rerank)
Batch requests (10+ documents)
Mixed embedding and rerank
Request/response matching
Error handling (invalid JSON, missing fields)
Special characters (quotes, unicode, etc.)
Large documents (5KB+)
top_n and columns parameters
Concurrent connections

Manual Testing

-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};

-- Test reranking
mysql> GENAI: {
    ->   "type": "rerank",
    ->   "query": "test query",
    ->   "documents": ["doc1", "doc2", "doc3"]
    -> };

Performance Characteristics

Non-Blocking Behavior

MySQL threads: Return immediately after sending request (~1ms)
GenAI workers: Handle blocking HTTP calls (10-100ms typical)
Throughput: Limited by GenAI service capacity and worker thread count

Resource Usage

Per request: 1 socketpair (2 file descriptors)
Memory: Request metadata + pending response storage
Worker threads: Configurable via genai-threads (default: 4)
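
Because each in-flight request holds two descriptors, the open-file limit puts a ceiling on concurrency. The sketch below estimates that ceiling from the soft `RLIMIT_NOFILE` value (the number `ulimit -n` reports). `max_inflight_requests` and `current_fd_limit` are hypothetical helpers, and `reserved_fds` is an assumed allowance for client connections, backend connections, and log files.

```cpp
#include <sys/resource.h>

#include <cassert>

// Upper bound on in-flight GenAI requests for a given descriptor limit.
// fds_per_request defaults to 2: one socketpair per request.
rlim_t max_inflight_requests(rlim_t fd_limit, rlim_t reserved_fds,
                             rlim_t fds_per_request = 2) {
    assert(fds_per_request > 0);
    if (fd_limit <= reserved_fds) return 0;
    return (fd_limit - reserved_fds) / fds_per_request;
}

// Reads the current soft limit on open files; returns false if getrlimit() fails.
// The value may be RLIM_INFINITY when no limit is configured.
bool current_fd_limit(rlim_t* out) {
    struct rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    *out = lim.rlim_cur;
    return true;
}
```

For example, a limit of 1024 with 24 descriptors reserved leaves room for 500 concurrent requests.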

Scalability

Concurrent requests: Limited by genai-threads and GenAI service capacity
Request queue: Unlimited (pending requests stored in session map)
Recommended: Set genai-threads to match expected concurrency
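
The relationship between `genai-threads` and concurrency follows from how a fixed-size worker pool behaves: at most N jobs run at once and the rest wait in a queue. The following is a generic sketch of such a pool, not the implementation in lib/GenAI_Thread.cpp; `WorkerPool` is a hypothetical class, and each job stands in for one blocking HTTP call.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

class WorkerPool {
public:
    explicit WorkerPool(std::size_t n) {
        assert(n > 0);
        for (std::size_t i = 0; i < n; ++i) {
            threads_.emplace_back([this] { run(); });
        }
    }

    // Drains the queue, then joins all workers.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lk(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (std::thread& t : threads_) t.join();
    }

    WorkerPool(const WorkerPool&) = delete;
    WorkerPool& operator=(const WorkerPool&) = delete;

    // Queues a job; returns immediately, like the async send in a session.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lk(mu_);
            jobs_.push(std::move(job));
        }
        cv_.notify_one();
    }

    std::size_t completed() const { return completed_.load(); }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(mu_);
                cv_.wait(lk, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;  // stopping and fully drained
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();  // stand-in for the blocking HTTP call
            ++completed_;
        }
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::atomic<std::size_t> completed_{0};
    std::vector<std::thread> threads_;
};
```

With 4 workers and calls that take about 100 ms each, such a pool completes roughly 40 requests per second regardless of how many are queued, which is why the worker count should track expected concurrency.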

Error Handling

Common Errors

Error	Cause	Solution
`Failed to create GenAI communication channel`	Socketpair creation failed	Check system limits (ulimit -n)
`Failed to register with GenAI module`	GenAI module not initialized	Run `LOAD GENAI VARIABLES TO RUNTIME`
`Failed to send request to GenAI module`	Write error on socketpair	Check connection stability
`GenAI module not initialized`	GenAI threads not started	Set `genai-threads > 0` and reload

Timeout Handling

Requests that exceed genai-embedding_timeout_ms or genai-rerank_timeout_ms fail, with the following outcome:

Status code > 0 in response header
Error message in JSON result
Socketpair cleanup

Monitoring

Status Variables

-- Check GenAI module status (planned, not yet implemented)
-- The planned names use underscores, so the pattern must not require a hyphen
SHOW STATUS LIKE 'genai_%';

Planned status variables:

genai_threads_initialized: Number of worker threads running
genai_active_requests: Currently processing requests
genai_completed_requests: Total successful requests
genai_failed_requests: Total failed requests

Logging

GenAI operations log at debug level:

# Enable GenAI debug logging
SET mysql-debug = 1;

# Check logs
tail -f proxysql.log | grep GenAI

Limitations

Current Limitations

document_from_sql: Not yet implemented (requires MySQL connection handling in workers)
Shared memory: Result pointer field reserved for future optimization
Request size: Limited by socket buffer size (typically 64KB-256KB)

Platform Requirements

Epoll support: Linux systems (kernel 2.6+)
Socketpair: Unix domain sockets
Threading: POSIX threads (pthread)

Future Enhancements

Planned Features

document_from_sql: Execute SQL to retrieve documents for reranking
Shared memory: Zero-copy result transfer for large responses
Connection pooling: Reuse HTTP connections to GenAI services
Metrics: Enhanced monitoring and statistics
Batch optimization: Better support for large document batches
Streaming: Progressive result delivery for large operations

Related Documentation

Posts Table Embeddings Setup - Using sqlite-rembed with GenAI
SQLite3 Server Documentation - SQLite3 backend integration
sqlite-rembed Integration - Embedding generation

Source Files

Core Implementation

include/GenAI_Thread.h - GenAI module interface and structures
lib/GenAI_Thread.cpp - Implementation of listener and worker loops
include/MySQL_Session.h - Session integration (GenAI async state)
lib/MySQL_Session.cpp - Async handlers and main loop integration
include/Base_Session.h - Base session GenAI members

Tests

test/tap/tests/genai_module-t.cpp - Admin commands and variables
test/tap/tests/genai_embedding_rerank-t.cpp - Basic embedding/reranking
test/tap/tests/genai_async-t.cpp - Async architecture tests

License

Same as ProxySQL - See LICENSE file for details.

Contributing

For contributions and issues:

GitHub: https://github.com/sysown/proxysql
Branch: v3.1-vec_genAI_module

Last Updated: 2025-01-10 Module Version: 0.1.0
