GenAI Module Documentation
Overview
The GenAI (Generative AI) Module in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.
Version
- Module Version: 0.1.0
- Last Updated: 2025-01-10
- Branch: v3.1-vec_genAI_module
Architecture
Async Design
The GenAI module uses a non-blocking async architecture based on socketpair IPC and epoll event notification:
┌─────────────────┐          socketpair          ┌─────────────────┐
│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
│  (MySQL Thread) │    fds[0]          fds[1]    │  Listener Loop  │
└────────┬────────┘                              └────────┬────────┘
         │                                                │
         │ epoll                                          │ queue
         │                                                │
         └── epoll_wait() ────────────────────────────────┘
              (GenAI Response Ready)
Key Components
- MySQL_Session - Client-facing interface that receives GENAI: queries
- GenAI Listener Thread - Monitors socketpair fds via epoll for incoming requests
- GenAI Worker Threads - Thread pool that processes requests (blocking HTTP calls)
- Socketpair Communication - Bidirectional IPC between MySQL and GenAI modules
Communication Protocol
Request Format (MySQL → GenAI)
struct GenAI_RequestHeader {
    uint64_t request_id;  // Client's correlation ID
    uint32_t operation;   // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;   // Length of JSON query that follows
    uint32_t flags;       // Reserved (must be 0)
    uint32_t top_n;       // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
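As an illustration of the layout above, a request frame can be sketched as the fixed header followed immediately by the JSON bytes. The helper name `build_request_frame` and the numeric values of the `GENAI_OP_*` constants are assumptions made for this example; only the field names and order come from the header definition.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Operation names come from the header comment; the numeric values
// are assumptions for illustration only.
enum GenAI_Operation : uint32_t {
    GENAI_OP_EMBEDDING = 0,
    GENAI_OP_RERANK = 1,
    GENAI_OP_JSON = 2
};

struct GenAI_RequestHeader {
    uint64_t request_id;
    uint32_t operation;
    uint32_t query_len;
    uint32_t flags;
    uint32_t top_n;
};

// Hypothetical helper: returns the header bytes followed by the JSON payload.
std::vector<unsigned char> build_request_frame(uint64_t request_id,
                                               uint32_t operation,
                                               const std::string& json,
                                               uint32_t top_n) {
    GenAI_RequestHeader hdr{};
    hdr.request_id = request_id;
    hdr.operation = operation;
    hdr.query_len = static_cast<uint32_t>(json.size());
    hdr.flags = 0;  // reserved, must be 0
    hdr.top_n = top_n;

    std::vector<unsigned char> frame(sizeof(hdr) + json.size());
    std::memcpy(frame.data(), &hdr, sizeof(hdr));
    std::memcpy(frame.data() + sizeof(hdr), json.data(), json.size());
    return frame;
}
```

Because both ends of the socketpair live in the same process, the sketch copies the struct in host byte order and does not convert endianness.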
Response Format (GenAI → MySQL)
struct GenAI_ResponseHeader {
    uint64_t request_id;          // Echo of client's request ID
    uint32_t status_code;         // 0 = success, >0 = error
    uint32_t result_len;          // Length of JSON result that follows
    uint32_t processing_time_ms;  // Time taken by GenAI worker
    uint64_t result_ptr;          // Reserved (must be 0)
    uint32_t result_count;        // Number of results
    uint32_t reserved;            // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
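Reading a response could look roughly like the sketch below: copy the header out of the buffer, check that `result_len` bytes are actually present, then take the JSON body. The `ParsedResponse` type and `parse_response_frame` function are hypothetical names for this example, not part of the module. Note that on typical 64-bit ABIs the compiler places 4 padding bytes before `result_ptr`, so the sketch relies on `sizeof` rather than a hand-computed header size.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

struct GenAI_ResponseHeader {
    uint64_t request_id;
    uint32_t status_code;
    uint32_t result_len;
    uint32_t processing_time_ms;
    uint64_t result_ptr;
    uint32_t result_count;
    uint32_t reserved;
};

// Hypothetical container for a decoded response.
struct ParsedResponse {
    GenAI_ResponseHeader header;
    std::string json;
};

// Hypothetical helper: returns false if the buffer is too short for the
// header or for the result_len bytes the header announces.
bool parse_response_frame(const unsigned char* buf, std::size_t len,
                          ParsedResponse& out) {
    if (len < sizeof(GenAI_ResponseHeader)) return false;
    std::memcpy(&out.header, buf, sizeof(out.header));
    if (len - sizeof(out.header) < out.header.result_len) return false;
    out.json.assign(reinterpret_cast<const char*>(buf) + sizeof(out.header),
                    out.header.result_len);
    return true;
}
```

A caller would then treat `header.status_code == 0` as success and match `header.request_id` against the ID it sent.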
Configuration Variables
Thread Configuration
| Variable | Type | Default | Description |
|---|---|---|---|
| genai-threads | int | 4 | Number of GenAI worker threads (1-256) |
Service Endpoints
| Variable | Type | Default | Description |
|---|---|---|---|
| genai-embedding_uri | string | http://127.0.0.1:8013/embedding | Embedding service endpoint |
| genai-rerank_uri | string | http://127.0.0.1:8012/rerank | Reranking service endpoint |
Timeouts
| Variable | Type | Default | Description |
|---|---|---|---|
| genai-embedding_timeout_ms | int | 30000 | Embedding request timeout (100-300000 ms) |
| genai-rerank_timeout_ms | int | 30000 | Reranking request timeout (100-300000 ms) |
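The documented ranges can be summarized in a small range check. This is only a sketch of the limits listed in the tables above; the function `genai_value_in_range` is hypothetical, and how ProxySQL itself validates or rejects out-of-range values is not described here.

```cpp
#include <string>

// Hypothetical validator mirroring the documented ranges.
// Returns false for out-of-range values and for unknown variable names.
bool genai_value_in_range(const std::string& name, long value) {
    if (name == "genai-threads") {
        return value >= 1 && value <= 256;
    }
    if (name == "genai-embedding_timeout_ms" ||
        name == "genai-rerank_timeout_ms") {
        return value >= 100 && value <= 300000;
    }
    return false;
}
```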
Admin Commands
-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;
-- Set variables
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';
-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';
-- Checksum
CHECKSUM GENAI VARIABLES;
Query Syntax
GENAI: Query Format
GenAI queries use the special GENAI: prefix followed by JSON:
GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
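Detecting the prefix and isolating the JSON payload might be sketched as follows. The helper `extract_genai_json` is hypothetical, and two behaviors in it are assumptions rather than documented facts: the prefix match is case-sensitive, and a trailing semicolon and surrounding whitespace are stripped before the JSON is handed on.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: returns true and fills `json` when `query` starts
// with "GENAI:" and carries a non-empty payload.
bool extract_genai_json(const std::string& query, std::string& json) {
    static const std::string prefix = "GENAI:";
    if (query.compare(0, prefix.size(), prefix) != 0) return false;

    // Skip whitespace between the prefix and the payload.
    std::size_t begin = query.find_first_not_of(" \t\r\n", prefix.size());
    if (begin == std::string::npos) return false;

    // Drop trailing whitespace and semicolons (assumed behavior).
    std::size_t end = query.find_last_not_of(" \t\r\n;");
    if (end == std::string::npos || end < begin) return false;

    json = query.substr(begin, end - begin + 1);
    return true;
}
```

Any query that does not match would continue through normal MySQL query processing.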
Supported Operations
1. Embedding
Generate vector embeddings for documents:
GENAI: {
  "type": "embed",
  "documents": [
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks."
  ]
}
Response:
+------------------------------------------+
| embedding |
+------------------------------------------+
| 0.123, -0.456, 0.789, ... |
| 0.234, -0.567, 0.890, ... |
+------------------------------------------+
2. Reranking
Rerank documents by relevance to a query:
GENAI: {
  "type": "rerank",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    "The capital of France is Paris.",
    "Deep learning uses neural networks."
  ],
  "top_n": 2,
  "columns": 3
}
Parameters:
- query (required): Search query text
- documents (required): Array of documents to rerank
- top_n (optional): Maximum results to return (0 = all, default: all)
- columns (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)
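The effect of `top_n` can be illustrated with a short sketch: order the scored documents by descending score and keep at most `top_n` of them, with 0 meaning "keep all". The `RerankResult` type and `select_top_n` function are hypothetical; whether the module or the backend service performs this truncation is not specified here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result row: document position plus relevance score.
struct RerankResult {
    uint32_t index;  // position of the document in the request array
    float score;     // relevance score reported by the reranker
};

// Sort by score (highest first) and keep at most top_n entries.
// top_n == 0 keeps every result, matching the documented default.
std::vector<RerankResult> select_top_n(std::vector<RerankResult> results,
                                       uint32_t top_n) {
    std::stable_sort(results.begin(), results.end(),
                     [](const RerankResult& a, const RerankResult& b) {
                         return a.score > b.score;
                     });
    if (top_n != 0 && results.size() > top_n) {
        results.resize(top_n);
    }
    return results;
}
```

With the three documents from the example above and `top_n` set to 2, the sketch keeps indexes 0 and 2, in line with the response table that follows.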
Response:
+-------+-------+----------------------------------------------+
| index | score | document |
+-------+-------+----------------------------------------------+
| 0 | 0.95 | Machine learning is a subset of AI... |
| 2 | 0.82 | Deep learning uses neural networks... |
+-------+-------+----------------------------------------------+
Response Format
All GenAI queries return results in MySQL resultset format with:
- columns: Array of column names
- rows: Array of row data
Success:
{
  "columns": ["index", "score", "document"],
  "rows": [
    [0, 0.95, "Most relevant document"],
    [2, 0.82, "Second most relevant"]
  ]
}
Error:
{
  "error": "Error message describing what went wrong"
}
Usage Examples
Basic Embedding
-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};
-- Batch embedding for multiple documents
GENAI: {
  "type": "embed",
  "documents": ["doc1", "doc2", "doc3"]
};
Basic Reranking
-- Find most relevant documents
GENAI: {
  "type": "rerank",
  "query": "database optimization techniques",
  "documents": [
    "How to bake a cake",
    "Indexing strategies for MySQL",
    "Python programming basics",
    "Query optimization in ProxySQL"
  ]
};
Top N Results
-- Get only top 3 most relevant documents
GENAI: {
  "type": "rerank",
  "query": "best practices for SQL",
  "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
  "top_n": 3
};
Index and Score Only
-- Get only relevance scores (no document text)
GENAI: {
  "type": "rerank",
  "query": "test query",
  "documents": ["doc1", "doc2"],
  "columns": 2
};
Integration with ProxySQL
Session Lifecycle
- Session Start: MySQL session creates genai_epoll_fd_ for monitoring GenAI responses
- Query Received: GENAI: query detected in handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()
- Async Send: Socketpair created, request sent, returns immediately
- Main Loop: check_genai_events() called on each iteration
- Response Ready: handle_genai_response() processes response
- Result Sent: MySQL result packet sent to client
- Cleanup: Socketpair closed, resources freed
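The lifecycle above can be sketched in miniature with a socketpair and an epoll instance. Everything in this example is a toy stand-in under stated assumptions: `ToyChannel` and its helper functions are hypothetical names, the "request" is just a 64-bit ID, and the worker is a plain function called inline instead of a thread from the GenAI pool. It is meant only to show that the session side can poll for readiness without blocking.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cstdint>

// Hypothetical bundle of the descriptors involved in one request.
struct ToyChannel {
    int session_fd;  // fds[0], kept by the MySQL session
    int module_fd;   // fds[1], handed to the GenAI module
    int epoll_fd;    // watches session_fd for a ready response
};

// Create the socketpair and register the session end with epoll.
bool open_channel(ToyChannel& ch) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return false;
    int ep = epoll_create1(0);
    if (ep < 0) { close(fds[0]); close(fds[1]); return false; }
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) != 0) {
        close(ep); close(fds[0]); close(fds[1]);
        return false;
    }
    ch.session_fd = fds[0];
    ch.module_fd = fds[1];
    ch.epoll_fd = ep;
    return true;
}

// Session side: write the request and return immediately.
bool send_request(const ToyChannel& ch, uint64_t id) {
    return write(ch.session_fd, &id, sizeof(id)) ==
           static_cast<ssize_t>(sizeof(id));
}

// Stand-in for a worker: read one request ID and echo it back.
bool worker_echo(const ToyChannel& ch) {
    uint64_t id = 0;
    if (read(ch.module_fd, &id, sizeof(id)) !=
        static_cast<ssize_t>(sizeof(id))) return false;
    return write(ch.module_fd, &id, sizeof(id)) ==
           static_cast<ssize_t>(sizeof(id));
}

// Session side: true and fills `id` if a response became readable
// within timeout_ms (0 polls without waiting).
bool poll_response(const ToyChannel& ch, int timeout_ms, uint64_t& id) {
    epoll_event ready{};
    if (epoll_wait(ch.epoll_fd, &ready, 1, timeout_ms) != 1) return false;
    return read(ch.session_fd, &id, sizeof(id)) ==
           static_cast<ssize_t>(sizeof(id));
}

// Cleanup: close every descriptor opened for the request.
void close_channel(ToyChannel& ch) {
    close(ch.epoll_fd);
    close(ch.session_fd);
    close(ch.module_fd);
}
```

Calling `poll_response` with a zero timeout before the worker has run returns false at once, which is the property that lets a MySQL thread keep serving other work while a request is in flight.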
Main Loop Integration
The GenAI event checking is integrated into the main MySQL handler loop:
handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef epoll_create1
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
    }
Backend Services
llama-server Integration
The GenAI module is designed to work with llama-server, a high-performance C++ inference server for LLaMA models.
Starting llama-server
# Start embedding server
./llama-server \
--model /path/to/nomic-embed-text-v1.5.gguf \
--port 8013 \
--embedding \
--ctx-size 512
# Start reranking server (using same model)
./llama-server \
--model /path/to/nomic-embed-text-v1.5.gguf \
--port 8012 \
--ctx-size 512
API Compatibility
The GenAI module expects:
- Embedding endpoint: POST /embedding with JSON request
- Rerank endpoint: POST /rerank with JSON request
Compatible with:
- llama-server
- OpenAI-compatible embedding APIs
- Custom services with matching request/response format
Testing
TAP Test Suite
Comprehensive TAP tests are available in test/tap/tests/genai_async-t.cpp:
cd test/tap/tests
make genai_async-t
./genai_async-t
Test Coverage:
- Single async requests
- Sequential requests (embedding and rerank)
- Batch requests (10+ documents)
- Mixed embedding and rerank
- Request/response matching
- Error handling (invalid JSON, missing fields)
- Special characters (quotes, unicode, etc.)
- Large documents (5KB+)
- top_n and columns parameters
- Concurrent connections
Manual Testing
-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};
-- Test reranking
mysql> GENAI: {
-> "type": "rerank",
-> "query": "test query",
-> "documents": ["doc1", "doc2", "doc3"]
-> };
Performance Characteristics
Non-Blocking Behavior
- MySQL threads: Return immediately after sending request (~1ms)
- GenAI workers: Handle blocking HTTP calls (10-100ms typical)
- Throughput: Limited by GenAI service capacity and worker thread count
Resource Usage
- Per request: 1 socketpair (2 file descriptors)
- Memory: Request metadata + pending response storage
- Worker threads: Configurable via genai-threads (default: 4)
Scalability
- Concurrent requests: Limited by genai-threads and GenAI service capacity
- Request queue: Unlimited (pending requests stored in session map)
- Recommended: Set genai-threads to match expected concurrency
Error Handling
Common Errors
| Error | Cause | Solution |
|---|---|---|
| Failed to create GenAI communication channel | Socketpair creation failed | Check system limits (ulimit -n) |
| Failed to register with GenAI module | GenAI module not initialized | Run LOAD GENAI VARIABLES TO RUNTIME |
| Failed to send request to GenAI module | Write error on socketpair | Check connection stability |
| GenAI module not initialized | GenAI threads not started | Set genai-threads > 0 and reload |
Timeout Handling
Requests exceeding genai-embedding_timeout_ms or genai-rerank_timeout_ms will fail with:
- Status code > 0 in response header
- Error message in JSON result
- Socketpair cleanup
Monitoring
Status Variables
-- Check GenAI module status (not yet implemented, planned)
SHOW STATUS LIKE 'genai-%';
Planned status variables:
- genai_threads_initialized: Number of worker threads running
- genai_active_requests: Currently processing requests
- genai_completed_requests: Total successful requests
- genai_failed_requests: Total failed requests
Logging
GenAI operations log at debug level:
# Enable GenAI debug logging
SET mysql-debug = 1;
# Check logs
tail -f proxysql.log | grep GenAI
Limitations
Current Limitations
- document_from_sql: Not yet implemented (requires MySQL connection handling in workers)
- Shared memory: Result pointer field reserved for future optimization
- Request size: Limited by socket buffer size (typically 64KB-256KB)
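To see what the socket buffer size is on a given host, one option is to ask the kernel for `SO_SNDBUF` on a fresh socketpair, as sketched below. The function name `socketpair_sndbuf_bytes` is hypothetical, and treating this value as the practical request-size ceiling is an assumption based on the limitation stated above. On Linux, the value returned by `getsockopt` is double the configured size, because the kernel reserves space for bookkeeping.

```cpp
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: returns the send-buffer size in bytes reported for
// a new AF_UNIX stream socketpair, or -1 if any call fails.
int socketpair_sndbuf_bytes() {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return -1;

    int size = 0;
    socklen_t len = sizeof(size);
    int rc = getsockopt(fds[0], SOL_SOCKET, SO_SNDBUF, &size, &len);

    close(fds[0]);
    close(fds[1]);
    return rc == 0 ? size : -1;
}
```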
Platform Requirements
- Epoll support: Linux systems (kernel 2.6+)
- Socketpair: Unix domain sockets
- Threading: POSIX threads (pthread)
Future Enhancements
Planned Features
- document_from_sql: Execute SQL to retrieve documents for reranking
- Shared memory: Zero-copy result transfer for large responses
- Connection pooling: Reuse HTTP connections to GenAI services
- Metrics: Enhanced monitoring and statistics
- Batch optimization: Better support for large document batches
- Streaming: Progressive result delivery for large operations
Related Documentation
- Posts Table Embeddings Setup - Using sqlite-rembed with GenAI
- SQLite3 Server Documentation - SQLite3 backend integration
- sqlite-rembed Integration - Embedding generation
Source Files
Core Implementation
- include/GenAI_Thread.h - GenAI module interface and structures
- lib/GenAI_Thread.cpp - Implementation of listener and worker loops
- include/MySQL_Session.h - Session integration (GenAI async state)
- lib/MySQL_Session.cpp - Async handlers and main loop integration
- include/Base_Session.h - Base session GenAI members
Tests
- test/tap/tests/genai_module-t.cpp - Admin commands and variables
- test/tap/tests/genai_embedding_rerank-t.cpp - Basic embedding/reranking
- test/tap/tests/genai_async-t.cpp - Async architecture tests
License
Same as ProxySQL - See LICENSE file for details.
Contributing
For contributions and issues:
- GitHub: https://github.com/sysown/proxysql
- Branch: v3.1-vec_genAI_module