# GenAI Module Documentation

## Overview

The **GenAI (Generative AI) Module** in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.

## Version

- **Module Version**: 0.1.0
- **Last Updated**: 2025-01-10
- **Branch**: v3.1-vec_genAI_module

## Architecture

### Async Design

The GenAI module uses a **non-blocking async architecture** based on socketpair IPC and epoll event notification:

```
┌─────────────────┐          socketpair          ┌─────────────────┐
│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
│  (MySQL Thread) │   fds[0]            fds[1]   │  Listener Loop  │
└────────┬────────┘                              └────────┬────────┘
         │                                                │
         │ epoll                                          │ queue
         │                                                │
         └── epoll_wait() ────────────────────────────────┘
             (GenAI Response Ready)
```

### Key Components

1. **MySQL_Session** - Client-facing interface that receives GENAI: queries
2. **GenAI Listener Thread** - Monitors socketpair fds via epoll for incoming requests
3. **GenAI Worker Threads** - Thread pool that processes requests (blocking HTTP calls)
4. **Socketpair Communication** - Bidirectional IPC between MySQL and GenAI modules

### Communication Protocol

#### Request Format (MySQL → GenAI)

```c
struct GenAI_RequestHeader {
    uint64_t request_id;        // Client's correlation ID
    uint32_t operation;         // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;         // Length of JSON query that follows
    uint32_t flags;             // Reserved (must be 0)
    uint32_t top_n;             // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
```

#### Response Format (GenAI → MySQL)

```c
struct GenAI_ResponseHeader {
    uint64_t request_id;        // Echo of client's request ID
    uint32_t status_code;       // 0 = success, >0 = error
    uint32_t result_len;        // Length of JSON result that follows
    uint32_t processing_time_ms;// Time taken by GenAI worker
    uint64_t result_ptr;        // Reserved (must be 0)
    uint32_t result_count;      // Number of results
    uint32_t reserved;          // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
```

## Configuration Variables

### Thread Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-threads` | int | 4 | Number of GenAI worker threads (1-256) |

### Service Endpoints

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-embedding_uri` | string | `http://127.0.0.1:8013/embedding` | Embedding service endpoint |
| `genai-rerank_uri` | string | `http://127.0.0.1:8012/rerank` | Reranking service endpoint |

### Timeouts

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-embedding_timeout_ms` | int | 30000 | Embedding request timeout (100-300000ms) |
| `genai-rerank_timeout_ms` | int | 30000 | Reranking request timeout (100-300000ms) |

### Admin Commands

```sql
-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;

-- Set variables
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';

-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';

-- Checksum
CHECKSUM GENAI VARIABLES;
```

## Query Syntax

### GENAI: Query Format

GenAI queries use the special `GENAI:` prefix followed by JSON:

```sql
GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
```

### Supported Operations

#### 1. Embedding

Generate vector embeddings for documents:

```sql
GENAI: {
  "type": "embed",
  "documents": [
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks."
  ]
}
```

**Response:**

```
+------------------------------------------+
| embedding                                |
+------------------------------------------+
| 0.123, -0.456, 0.789, ...                |
| 0.234, -0.567, 0.890, ...                |
+------------------------------------------+
```

#### 2. Reranking

Rerank documents by relevance to a query:

```sql
GENAI: {
  "type": "rerank",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    "The capital of France is Paris.",
    "Deep learning uses neural networks."
  ],
  "top_n": 2,
  "columns": 3
}
```

**Parameters:**

- `query` (required): Search query text
- `documents` (required): Array of documents to rerank
- `top_n` (optional): Maximum results to return (0 = all, default: all)
- `columns` (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)

**Response:**

```
+-------+-------+----------------------------------------------+
| index | score | document                                     |
+-------+-------+----------------------------------------------+
| 0     | 0.95  | Machine learning is a subset of AI...        |
| 2     | 0.82  | Deep learning uses neural networks...        |
+-------+-------+----------------------------------------------+
```

### Response Format

All GenAI queries return results in MySQL resultset format with:

- `columns`: Array of column names
- `rows`: Array of row data

**Success:**

```json
{
  "columns": ["index", "score", "document"],
  "rows": [
    [0, 0.95, "Most relevant document"],
    [2, 0.82, "Second most relevant"]
  ]
}
```

**Error:**

```json
{
  "error": "Error message describing what went wrong"
}
```

## Usage Examples

### Basic Embedding

```sql
-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};

-- Batch embedding for multiple documents
GENAI: {
  "type": "embed",
  "documents": ["doc1", "doc2", "doc3"]
};
```

### Basic Reranking

```sql
-- Find most relevant documents
GENAI: {
  "type": "rerank",
  "query": "database optimization techniques",
  "documents": [
    "How to bake a cake",
    "Indexing strategies for MySQL",
    "Python programming basics",
    "Query optimization in ProxySQL"
  ]
};
```

### Top N Results

```sql
-- Get only top 3 most relevant documents
GENAI: {
  "type": "rerank",
  "query": "best practices for SQL",
  "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
  "top_n": 3
};
```

### Index and Score Only

```sql
-- Get only relevance scores (no document text)
GENAI: {
  "type": "rerank",
  "query": "test query",
  "documents": ["doc1", "doc2"],
  "columns": 2
};
```

## Integration with ProxySQL

### Session Lifecycle

1. **Session Start**: MySQL session creates `genai_epoll_fd_` for monitoring GenAI responses
2. **Query Received**: `GENAI:` query detected in `handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()`
3. **Async Send**: Socketpair created, request sent, returns immediately
4. **Main Loop**: `check_genai_events()` called on each iteration
5. **Response Ready**: `handle_genai_response()` processes response
6. **Result Sent**: MySQL result packet sent to client
7. **Cleanup**: Socketpair closed, resources freed

### Main Loop Integration

The GenAI event checking is integrated into the main MySQL handler loop:

```cpp
handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef epoll_create1
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
    }
```

## Backend Services

### llama-server Integration

The GenAI module is designed to work with [llama-server](https://github.com/ggerganov/llama.cpp), a high-performance C++ inference server for LLaMA models.

#### Starting llama-server

```bash
# Start embedding server
./llama-server \
  --model /path/to/nomic-embed-text-v1.5.gguf \
  --port 8013 \
  --embedding \
  --ctx-size 512

# Start reranking server (using same model)
./llama-server \
  --model /path/to/nomic-embed-text-v1.5.gguf \
  --port 8012 \
  --ctx-size 512
```

#### API Compatibility

The GenAI module expects:

- **Embedding endpoint**: `POST /embedding` with JSON request
- **Rerank endpoint**: `POST /rerank` with JSON request

Compatible with:

- llama-server
- OpenAI-compatible embedding APIs
- Custom services with matching request/response format

## Testing

### TAP Test Suite

Comprehensive TAP tests are available in `test/tap/tests/genai_async-t.cpp`:

```bash
cd test/tap/tests
make genai_async-t
./genai_async-t
```

**Test Coverage:**

- Single async requests
- Sequential requests (embedding and rerank)
- Batch requests (10+ documents)
- Mixed embedding and rerank
- Request/response matching
- Error handling (invalid JSON, missing fields)
- Special characters (quotes, unicode, etc.)
- Large documents (5KB+)
- `top_n` and `columns` parameters
- Concurrent connections

### Manual Testing

```sql
-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};

-- Test reranking
mysql> GENAI: {
    ->   "type": "rerank",
    ->   "query": "test query",
    ->   "documents": ["doc1", "doc2", "doc3"]
    -> };
```

## Performance Characteristics

### Non-Blocking Behavior

- **MySQL threads**: Return immediately after sending request (~1ms)
- **GenAI workers**: Handle blocking HTTP calls (10-100ms typical)
- **Throughput**: Limited by GenAI service capacity and worker thread count

### Resource Usage

- **Per request**: 1 socketpair (2 file descriptors)
- **Memory**: Request metadata + pending response storage
- **Worker threads**: Configurable via `genai-threads` (default: 4)

### Scalability

- **Concurrent requests**: Limited by `genai-threads` and GenAI service capacity
- **Request queue**: Unlimited (pending requests stored in session map)
- **Recommended**: Set `genai-threads` to match expected concurrency

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Failed to create GenAI communication channel` | Socketpair creation failed | Check system limits (ulimit -n) |
| `Failed to register with GenAI module` | GenAI module not initialized | Run `LOAD GENAI VARIABLES TO RUNTIME` |
| `Failed to send request to GenAI module` | Write error on socketpair | Check connection stability |
| `GenAI module not initialized` | GenAI threads not started | Set `genai-threads > 0` and reload |

### Timeout Handling

Requests exceeding `genai-embedding_timeout_ms` or `genai-rerank_timeout_ms` will fail with:

- Status code > 0 in response header
- Error message in JSON result
- Socketpair cleanup

## Monitoring

### Status Variables

```sql
-- Check GenAI module status (not yet implemented, planned)
SHOW STATUS LIKE 'genai-%';
```

**Planned status variables:**

- `genai_threads_initialized`: Number of worker threads running
- `genai_active_requests`: Currently processing requests
- `genai_completed_requests`: Total successful requests
- `genai_failed_requests`: Total failed requests

### Logging

GenAI operations log at debug level:

```bash
# Enable GenAI debug logging
SET mysql-debug = 1;

# Check logs
tail -f proxysql.log | grep GenAI
```

## Limitations

### Current Limitations

1. **document_from_sql**: Not yet implemented (requires MySQL connection handling in workers)
2. **Shared memory**: Result pointer field reserved for future optimization
3. **Request size**: Limited by socket buffer size (typically 64KB-256KB)

### Platform Requirements

- **Epoll support**: Linux systems (kernel 2.6+)
- **Socketpair**: Unix domain sockets
- **Threading**: POSIX threads (pthread)

## Future Enhancements

### Planned Features

1. **document_from_sql**: Execute SQL to retrieve documents for reranking
2. **Shared memory**: Zero-copy result transfer for large responses
3. **Connection pooling**: Reuse HTTP connections to GenAI services
4. **Metrics**: Enhanced monitoring and statistics
5. **Batch optimization**: Better support for large document batches
6. **Streaming**: Progressive result delivery for large operations

## Related Documentation

- [Posts Table Embeddings Setup](./posts-embeddings-setup.md) - Using sqlite-rembed with GenAI
- [SQLite3 Server Documentation](./SQLite3-Server.md) - SQLite3 backend integration
- [sqlite-rembed Integration](./sqlite-rembed-integration.md) - Embedding generation

## Source Files

### Core Implementation

- `include/GenAI_Thread.h` - GenAI module interface and structures
- `lib/GenAI_Thread.cpp` - Implementation of listener and worker loops
- `include/MySQL_Session.h` - Session integration (GenAI async state)
- `lib/MySQL_Session.cpp` - Async handlers and main loop integration
- `include/Base_Session.h` - Base session GenAI members

### Tests

- `test/tap/tests/genai_module-t.cpp` - Admin commands and variables
- `test/tap/tests/genai_embedding_rerank-t.cpp` - Basic embedding/reranking
- `test/tap/tests/genai_async-t.cpp` - Async architecture tests

## License

Same as ProxySQL - See LICENSE file for details.

## Contributing

For contributions and issues:

- GitHub: https://github.com/sysown/proxysql
- Branch: `v3.1-vec_genAI_module`

---

*Last Updated: 2025-01-10*
*Module Version: 0.1.0*