diff --git a/doc/GENAI.md b/doc/GENAI.md
new file mode 100644
index 000000000..66d5218a4
--- /dev/null
+++ b/doc/GENAI.md
@@ -0,0 +1,490 @@
+# GenAI Module Documentation
+
+## Overview
+
+The **GenAI (Generative AI) Module** in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.
+
+## Version
+
+- **Module Version**: 0.1.0
+- **Last Updated**: 2025-01-10
+- **Branch**: v3.1-vec_genAI_module
+
+## Architecture
+
+### Async Design
+
+The GenAI module uses a **non-blocking async architecture** based on socketpair IPC and epoll event notification:
+
+```
+┌─────────────────┐          socketpair          ┌─────────────────┐
+│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
+│ (MySQL Thread)  │  fds[0]              fds[1]  │  Listener Loop  │
+└────────┬────────┘                              └────────┬────────┘
+         │                                                │
+         │ epoll                                          │ queue
+         │                                                │
+         └── epoll_wait() ────────────────────────────────┘
+                      (GenAI Response Ready)
+```
+
+### Key Components
+
+1. **MySQL_Session** - Client-facing interface that receives GENAI: queries
+2. **GenAI Listener Thread** - Monitors socketpair fds via epoll for incoming requests
+3. **GenAI Worker Threads** - Thread pool that processes requests (blocking HTTP calls)
+4. **Socketpair Communication** - Bidirectional IPC between MySQL and GenAI modules
+
+### Communication Protocol
+
+#### Request Format (MySQL → GenAI)
+
+```c
+struct GenAI_RequestHeader {
+    uint64_t request_id;        // Client's correlation ID
+    uint32_t operation;         // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
+    uint32_t query_len;         // Length of JSON query that follows
+    uint32_t flags;             // Reserved (must be 0)
+    uint32_t top_n;             // For rerank: max results (0 = all)
+};
+// Followed by: JSON query (query_len bytes)
+```
+
+#### Response Format (GenAI → MySQL)
+
+```c
+struct GenAI_ResponseHeader {
+    uint64_t request_id;        // Echo of client's request ID
+    uint32_t status_code;       // 0 = success, >0 = error
+    uint32_t result_len;        // Length of JSON result that follows
+    uint32_t processing_time_ms;// Time taken by GenAI worker
+    uint64_t result_ptr;        // Reserved (must be 0)
+    uint32_t result_count;      // Number of results
+    uint32_t reserved;          // Reserved (must be 0)
+};
+// Followed by: JSON result (result_len bytes)
+```
+
+## Configuration Variables
+
+### Thread Configuration
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `genai-threads` | int | 4 | Number of GenAI worker threads (1-256) |
+
+### Service Endpoints
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `genai-embedding_uri` | string | `http://127.0.0.1:8013/embedding` | Embedding service endpoint |
+| `genai-rerank_uri` | string | `http://127.0.0.1:8012/rerank` | Reranking service endpoint |
+
+### Timeouts
+
+| Variable | Type | Default | Description |
+|----------|------|---------|-------------|
+| `genai-embedding_timeout_ms` | int | 30000 | Embedding request timeout (100-300000ms) |
+| `genai-rerank_timeout_ms` | int | 30000 | Reranking request timeout (100-300000ms) |
+
+### Admin Commands
+
+```sql
+-- Load/Save GenAI variables
+LOAD GENAI VARIABLES TO RUNTIME;
+SAVE GENAI VARIABLES FROM RUNTIME;
+LOAD GENAI VARIABLES FROM DISK;
+SAVE GENAI VARIABLES TO DISK;
+
+-- Set variables
+SET genai-threads = 8;
+SET genai-embedding_uri = 'http://localhost:8080/embed';
+SET genai-rerank_uri = 'http://localhost:8081/rerank';
+
+-- View variables
+SELECT @@genai-threads;
+SHOW VARIABLES LIKE 'genai-%';
+
+-- Checksum
+CHECKSUM GENAI VARIABLES;
+```
+
+## Query Syntax
+
+### GENAI: Query Format
+
+GenAI queries use the special `GENAI:` prefix followed by JSON:
+
+```sql
+GENAI: {"type": "embed", "documents": ["text1", "text2"]}
+GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
+```
+
+### Supported Operations
+
+#### 1. Embedding
+
+Generate vector embeddings for documents:
+
+```sql
+GENAI: {
+  "type": "embed",
+  "documents": [
+    "Machine learning is a subset of AI.",
+    "Deep learning uses neural networks."
+  ]
+}
+```
+
+**Response:**
+```
++------------------------------------------+
+| embedding                                |
++------------------------------------------+
+| 0.123, -0.456, 0.789, ...                |
+| 0.234, -0.567, 0.890, ...                |
++------------------------------------------+
+```
+
+#### 2. Reranking
+
+Rerank documents by relevance to a query:
+
+```sql
+GENAI: {
+  "type": "rerank",
+  "query": "What is machine learning?",
+  "documents": [
+    "Machine learning is a subset of artificial intelligence.",
+    "The capital of France is Paris.",
+    "Deep learning uses neural networks."
+  ],
+  "top_n": 2,
+  "columns": 3
+}
+```
+
+**Parameters:**
+- `query` (required): Search query text
+- `documents` (required): Array of documents to rerank
+- `top_n` (optional): Maximum results to return (0 = all, default: all)
+- `columns` (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)
+
+**Response:**
+```
++-------+-------+----------------------------------------------+
+| index | score | document                                     |
++-------+-------+----------------------------------------------+
+| 0     | 0.95  | Machine learning is a subset of AI...        |
+| 2     | 0.82  | Deep learning uses neural networks...        |
++-------+-------+----------------------------------------------+
+```
+
+### Response Format
+
+All GenAI queries return results in MySQL resultset format with:
+- `columns`: Array of column names
+- `rows`: Array of row data
+
+**Success:**
+```json
+{
+  "columns": ["index", "score", "document"],
+  "rows": [
+    [0, 0.95, "Most relevant document"],
+    [2, 0.82, "Second most relevant"]
+  ]
+}
+```
+
+**Error:**
+```json
+{
+  "error": "Error message describing what went wrong"
+}
+```
+
+## Usage Examples
+
+### Basic Embedding
+
+```sql
+-- Generate embedding for a single document
+GENAI: {"type": "embed", "documents": ["Hello, world!"]};
+
+-- Batch embedding for multiple documents
+GENAI: {
+  "type": "embed",
+  "documents": ["doc1", "doc2", "doc3"]
+};
+```
+
+### Basic Reranking
+
+```sql
+-- Find most relevant documents
+GENAI: {
+  "type": "rerank",
+  "query": "database optimization techniques",
+  "documents": [
+    "How to bake a cake",
+    "Indexing strategies for MySQL",
+    "Python programming basics",
+    "Query optimization in ProxySQL"
+  ]
+};
+```
+
+### Top N Results
+
+```sql
+-- Get only top 3 most relevant documents
+GENAI: {
+  "type": "rerank",
+  "query": "best practices for SQL",
+  "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
+  "top_n": 3
+};
+```
+
+### Index and Score Only
+
+```sql
+-- Get only relevance scores (no document text)
+GENAI: {
+  "type": "rerank",
+  "query": "test query",
+  "documents": ["doc1", "doc2"],
+  "columns": 2
+};
+```
+
+## Integration with ProxySQL
+
+### Session Lifecycle
+
+1. **Session Start**: MySQL session creates `genai_epoll_fd_` for monitoring GenAI responses
+2. **Query Received**: `GENAI:` query detected in `handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()`
+3. **Async Send**: Socketpair created, request sent, returns immediately
+4. **Main Loop**: `check_genai_events()` called on each iteration
+5. **Response Ready**: `handle_genai_response()` processes response
+6. **Result Sent**: MySQL result packet sent to client
+7. **Cleanup**: Socketpair closed, resources freed
+
+### Main Loop Integration
+
+The GenAI event checking is integrated into the main MySQL handler loop:
+
+```cpp
+handler_again:
+    switch (status) {
+        case WAITING_CLIENT_DATA:
+            handler___status_WAITING_CLIENT_DATA();
+#ifdef epoll_create1
+            // Check for GenAI responses before processing new client data
+            if (check_genai_events()) {
+                goto handler_again;  // Process more responses
+            }
+#endif
+            break;
+    }
+```
+
+## Backend Services
+
+### llama-server Integration
+
+The GenAI module is designed to work with [llama-server](https://github.com/ggerganov/llama.cpp), a high-performance C++ inference server for LLaMA models.
+
+#### Starting llama-server
+
+```bash
+# Start embedding server
+./llama-server \
+  --model /path/to/nomic-embed-text-v1.5.gguf \
+  --port 8013 \
+  --embedding \
+  --ctx-size 512
+
+# Start reranking server (using same model)
+./llama-server \
+  --model /path/to/nomic-embed-text-v1.5.gguf \
+  --port 8012 \
+  --ctx-size 512
+```
+
+#### API Compatibility
+
+The GenAI module expects:
+- **Embedding endpoint**: `POST /embedding` with JSON request
+- **Rerank endpoint**: `POST /rerank` with JSON request
+
+Compatible with:
+- llama-server
+- OpenAI-compatible embedding APIs
+- Custom services with matching request/response format
+
+## Testing
+
+### TAP Test Suite
+
+Comprehensive TAP tests are available in `test/tap/tests/genai_async-t.cpp`:
+
+```bash
+cd test/tap/tests
+make genai_async-t
+./genai_async-t
+```
+
+**Test Coverage:**
+- Single async requests
+- Sequential requests (embedding and rerank)
+- Batch requests (10+ documents)
+- Mixed embedding and rerank
+- Request/response matching
+- Error handling (invalid JSON, missing fields)
+- Special characters (quotes, unicode, etc.)
+- Large documents (5KB+)
+- `top_n` and `columns` parameters
+- Concurrent connections
+
+### Manual Testing
+
+```sql
+-- Test embedding
+mysql> GENAI: {"type": "embed", "documents": ["test document"]};
+
+-- Test reranking
+mysql> GENAI: {
+    ->   "type": "rerank",
+    ->   "query": "test query",
+    ->   "documents": ["doc1", "doc2", "doc3"]
+    -> };
+```
+
+## Performance Characteristics
+
+### Non-Blocking Behavior
+
+- **MySQL threads**: Return immediately after sending request (~1ms)
+- **GenAI workers**: Handle blocking HTTP calls (10-100ms typical)
+- **Throughput**: Limited by GenAI service capacity and worker thread count
+
+### Resource Usage
+
+- **Per request**: 1 socketpair (2 file descriptors)
+- **Memory**: Request metadata + pending response storage
+- **Worker threads**: Configurable via `genai-threads` (default: 4)
+
+### Scalability
+
+- **Concurrent requests**: Limited by `genai-threads` and GenAI service capacity
+- **Request queue**: Unbounded (pending requests stored in session map)
+- **Recommended**: Set `genai-threads` to match expected concurrency
+
+## Error Handling
+
+### Common Errors
+
+| Error | Cause | Solution |
+|-------|-------|----------|
+| `Failed to create GenAI communication channel` | Socketpair creation failed | Check system limits (`ulimit -n`) |
+| `Failed to register with GenAI module` | GenAI module not initialized | Run `LOAD GENAI VARIABLES TO RUNTIME` |
+| `Failed to send request to GenAI module` | Write error on socketpair | Check connection stability |
+| `GenAI module not initialized` | GenAI threads not started | Set `genai-threads > 0` and reload |
+
+### Timeout Handling
+
+Requests exceeding `genai-embedding_timeout_ms` or `genai-rerank_timeout_ms` will fail with:
+- Status code > 0 in response header
+- Error message in JSON result
+- Socketpair cleanup
+
+## Monitoring
+
+### Status Variables
+
+```sql
+-- Check GenAI module status (not yet implemented, planned)
+SHOW STATUS LIKE 'genai_%';
+```
+
+**Planned status variables:**
+- `genai_threads_initialized`: Number of worker threads running
+- `genai_active_requests`: Currently processing requests
+- `genai_completed_requests`: Total successful requests
+- `genai_failed_requests`: Total failed requests
+
+### Logging
+
+GenAI operations log at debug level:
+
+```bash
+# Enable GenAI debug logging
+SET mysql-debug = 1;
+
+# Check logs
+tail -f proxysql.log | grep GenAI
+```
+
+## Limitations
+
+### Current Limitations
+
+1. **document_from_sql**: Not yet implemented (requires MySQL connection handling in workers)
+2. **Shared memory**: Result pointer field reserved for future optimization
+3. **Request size**: Limited by socket buffer size (typically 64KB-256KB)
+
+### Platform Requirements
+
+- **Epoll support**: Linux systems (kernel 2.6+)
+- **Socketpair**: Unix domain sockets
+- **Threading**: POSIX threads (pthread)
+
+## Future Enhancements
+
+### Planned Features
+
+1. **document_from_sql**: Execute SQL to retrieve documents for reranking
+2. **Shared memory**: Zero-copy result transfer for large responses
+3. **Connection pooling**: Reuse HTTP connections to GenAI services
+4. **Metrics**: Enhanced monitoring and statistics
+5. **Batch optimization**: Better support for large document batches
+6. **Streaming**: Progressive result delivery for large operations
+
+## Related Documentation
+
+- [Posts Table Embeddings Setup](./posts-embeddings-setup.md) - Using sqlite-rembed with GenAI
+- [SQLite3 Server Documentation](./SQLite3-Server.md) - SQLite3 backend integration
+- [sqlite-rembed Integration](./sqlite-rembed-integration.md) - Embedding generation
+
+## Source Files
+
+### Core Implementation
+
+- `include/GenAI_Thread.h` - GenAI module interface and structures
+- `lib/GenAI_Thread.cpp` - Implementation of listener and worker loops
+- `include/MySQL_Session.h` - Session integration (GenAI async state)
+- `lib/MySQL_Session.cpp` - Async handlers and main loop integration
+- `include/Base_Session.h` - Base session GenAI members
+
+### Tests
+
+- `test/tap/tests/genai_module-t.cpp` - Admin commands and variables
+- `test/tap/tests/genai_embedding_rerank-t.cpp` - Basic embedding/reranking
+- `test/tap/tests/genai_async-t.cpp` - Async architecture tests
+
+## License
+
+Same as ProxySQL - See LICENSE file for details.
+
+## Contributing
+
+For contributions and issues:
+- GitHub: https://github.com/sysown/proxysql
+- Branch: `v3.1-vec_genAI_module`
+
+---
+
+*Last Updated: 2025-01-10*
+*Module Version: 0.1.0*
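
The request framing described under "Communication Protocol" (a fixed header followed by `query_len` bytes of JSON) can be sketched in C++ as below. This is a minimal illustration under stated assumptions, not the module's actual code: only the header field layout comes from the documentation in this patch, while the numeric values of the operation codes and the helpers `frame_request` and `parse_request` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>

// Operation code names appear in the documented header comment;
// the numeric values below are assumptions made for this sketch.
enum : uint32_t { GENAI_OP_EMBEDDING = 0, GENAI_OP_RERANK = 1, GENAI_OP_JSON = 2 };

// Same field layout as the request header documented in the patch.
struct GenAI_RequestHeader {
    uint64_t request_id;  // Client's correlation ID
    uint32_t operation;   // One of the GENAI_OP_* codes
    uint32_t query_len;   // Length of the JSON query that follows
    uint32_t flags;       // Reserved (must be 0)
    uint32_t top_n;       // For rerank: max results (0 = all)
};

// Hypothetical helper: serialize header + JSON payload into one buffer,
// ready to be written to the session's end of the socketpair.
std::string frame_request(uint64_t request_id, uint32_t operation,
                          const std::string& json, uint32_t top_n = 0) {
    GenAI_RequestHeader hdr{};
    hdr.request_id = request_id;
    hdr.operation = operation;
    hdr.query_len = static_cast<uint32_t>(json.size());
    hdr.flags = 0;
    hdr.top_n = top_n;

    std::string buf(sizeof(hdr), '\0');
    std::memcpy(&buf[0], &hdr, sizeof(hdr));
    buf += json;
    return buf;
}

// Hypothetical helper: split a framed buffer back into header and JSON.
// Returns false when the buffer is shorter than the header or shorter
// than the payload length the header announces.
bool parse_request(const std::string& buf, GenAI_RequestHeader& hdr,
                   std::string& json) {
    if (buf.size() < sizeof(hdr)) return false;
    std::memcpy(&hdr, buf.data(), sizeof(hdr));
    if (buf.size() - sizeof(hdr) < hdr.query_len) return false;
    json.assign(buf, sizeof(hdr), hdr.query_len);
    return true;
}
```

A caller would pass the JSON that follows the `GENAI:` prefix to `frame_request` and write the returned buffer to one end of the socketpair; the listener side would read the fixed-size header first and then `query_len` more bytes. Because the header is copied as raw bytes in host byte order, this framing only makes sense for same-process or same-host IPC, which matches the socketpair design described above.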