# GenAI Module Documentation

## Overview

The **GenAI (Generative AI) Module** in ProxySQL provides asynchronous, non-blocking access to embedding generation and document reranking services. It enables ProxySQL to interact with LLM services (like llama-server) for vector embeddings and semantic search operations without blocking MySQL threads.

## Version

- **Module Version**: 0.1.0
- **Last Updated**: 2025-01-10
- **Branch**: v3.1-vec_genAI_module

## Architecture

### Async Design

The GenAI module uses a **non-blocking async architecture** based on socketpair IPC and epoll event notification:

```
┌─────────────────┐          socketpair          ┌─────────────────┐
│  MySQL_Session  │◄────────────────────────────►│  GenAI Module   │
│ (MySQL Thread)  │  fds[0]              fds[1]  │  Listener Loop  │
└────────┬────────┘                              └────────┬────────┘
         │                                                │
         │ epoll                                          │ queue
         │                                                │
         └── epoll_wait() ────────────────────────────────┘
              (GenAI Response Ready)
```

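The socketpair and epoll flow in the diagram can be illustrated with a minimal, self-contained sketch. It is not ProxySQL's code: the function name `socketpair_roundtrip` and its parameters are hypothetical, a single thread plays both sides of the channel, and the 256-byte read buffer is an arbitrary choice for the example. It only uses the Linux `socketpair(2)`, `epoll_create1(2)`, `epoll_ctl(2)` and `epoll_wait(2)` calls.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

#include <string>

// Hypothetical helper (not part of ProxySQL): writes `msg` on fds[1], waits
// for readability on fds[0] with epoll, and returns whatever was read.
// Returns an empty string on any failure or timeout.
std::string socketpair_roundtrip(const std::string& msg, int timeout_ms) {
    int fds[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return "";

    int ep = epoll_create1(0);
    if (ep < 0) {
        close(fds[0]);
        close(fds[1]);
        return "";
    }

    std::string out;
    epoll_event ev{};
    ev.events = EPOLLIN;   // wake up when fds[0] has data to read
    ev.data.fd = fds[0];
    if (epoll_ctl(ep, EPOLL_CTL_ADD, fds[0], &ev) == 0) {
        // The "module" side answers on its end of the pair.
        ssize_t written = write(fds[1], msg.data(), msg.size());

        // The "session" side never blocks in read(); it only reads after
        // epoll reports the descriptor as ready.
        epoll_event ready{};
        if (written == static_cast<ssize_t>(msg.size()) &&
            epoll_wait(ep, &ready, 1, timeout_ms) == 1) {
            char buf[256];  // messages longer than this are truncated here
            ssize_t n = read(ready.data.fd, buf, sizeof(buf));
            if (n > 0) out.assign(buf, static_cast<size_t>(n));
        }
    }

    close(ep);
    close(fds[0]);
    close(fds[1]);
    return out;
}
```

Calling `socketpair_roundtrip("ready", 1000)` returns the same string once epoll signals readability. In a real deployment the write would come from a different thread, which is what lets the reading side keep serving other work in the meantime.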
### Key Components

1. **MySQL_Session** - Client-facing interface that receives `GENAI:` queries
2. **GenAI Listener Thread** - Monitors socketpair fds via epoll for incoming requests
3. **GenAI Worker Threads** - Thread pool that processes requests (blocking HTTP calls)
4. **Socketpair Communication** - Bidirectional IPC between MySQL and GenAI modules

### Communication Protocol

#### Request Format (MySQL → GenAI)

```c
struct GenAI_RequestHeader {
    uint64_t request_id;   // Client's correlation ID
    uint32_t operation;    // GENAI_OP_EMBEDDING, GENAI_OP_RERANK, or GENAI_OP_JSON
    uint32_t query_len;    // Length of JSON query that follows
    uint32_t flags;        // Reserved (must be 0)
    uint32_t top_n;        // For rerank: max results (0 = all)
};
// Followed by: JSON query (query_len bytes)
```

#### Response Format (GenAI → MySQL)

```c
struct GenAI_ResponseHeader {
    uint64_t request_id;          // Echo of client's request ID
    uint32_t status_code;         // 0 = success, >0 = error
    uint32_t result_len;          // Length of JSON result that follows
    uint32_t processing_time_ms;  // Time taken by GenAI worker
    uint64_t result_ptr;          // Reserved (must be 0)
    uint32_t result_count;        // Number of results
    uint32_t reserved;            // Reserved (must be 0)
};
// Followed by: JSON result (result_len bytes)
```

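As an illustration of the "header followed by JSON" framing, the sketch below serializes a request into one contiguous buffer and parses it back. The struct layout is copied from the request format above. The helper names `encode_request` and `decode_request` and the numeric value of `GENAI_OP_RERANK_EXAMPLE` are assumptions made for this example, not the module's real API or constants. Because both ends of a socketpair live in the same process, the header is copied in native byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct GenAI_RequestHeader {
    uint64_t request_id;  // Client's correlation ID
    uint32_t operation;   // Operation code
    uint32_t query_len;   // Length of JSON query that follows
    uint32_t flags;       // Reserved (must be 0)
    uint32_t top_n;       // For rerank: max results (0 = all)
};

// Assumed value for illustration only; the real enum values are not
// given in this document.
constexpr uint32_t GENAI_OP_RERANK_EXAMPLE = 1;

// Hypothetical helper: builds header + JSON payload as one wire buffer.
std::vector<char> encode_request(uint64_t request_id, uint32_t operation,
                                 const std::string& json, uint32_t top_n) {
    GenAI_RequestHeader h{};
    h.request_id = request_id;
    h.operation = operation;
    h.query_len = static_cast<uint32_t>(json.size());
    h.flags = 0;
    h.top_n = top_n;

    std::vector<char> buf(sizeof(h) + json.size());
    std::memcpy(buf.data(), &h, sizeof(h));
    std::memcpy(buf.data() + sizeof(h), json.data(), json.size());
    return buf;
}

// Hypothetical helper: returns false if the buffer is shorter than the
// header or shorter than the length the header announces.
bool decode_request(const std::vector<char>& buf, GenAI_RequestHeader& h,
                    std::string& json) {
    if (buf.size() < sizeof(h)) return false;
    std::memcpy(&h, buf.data(), sizeof(h));
    if (buf.size() - sizeof(h) < h.query_len) return false;
    json.assign(buf.data() + sizeof(h), h.query_len);
    return true;
}
```

Checking `query_len` against the bytes actually received before touching the payload is the important step: a stream socket may deliver a partial message, so a reader should wait for more data instead of trusting the header alone.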
## Configuration Variables

### Thread Configuration

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-threads` | int | 4 | Number of GenAI worker threads (1-256) |

### Service Endpoints

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-embedding_uri` | string | `http://127.0.0.1:8013/embedding` | Embedding service endpoint |
| `genai-rerank_uri` | string | `http://127.0.0.1:8012/rerank` | Reranking service endpoint |

### Timeouts

| Variable | Type | Default | Description |
|----------|------|---------|-------------|
| `genai-embedding_timeout_ms` | int | 30000 | Embedding request timeout (100-300000 ms) |
| `genai-rerank_timeout_ms` | int | 30000 | Reranking request timeout (100-300000 ms) |

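The documented ranges can be expressed as a small check, which may be handy in tooling that generates configuration. This is only a sketch of the ranges in the tables above. The function name `genai_value_in_range` is hypothetical, and whether ProxySQL itself rejects or clamps out-of-range values is not stated in this document.

```cpp
#include <string>

// Hypothetical helper: returns true when `value` falls inside the range
// documented for the named integer variable. Unknown or non-integer
// variables (for example the URI settings) return false.
bool genai_value_in_range(const std::string& name, long value) {
    if (name == "genai-threads") {
        return value >= 1 && value <= 256;
    }
    if (name == "genai-embedding_timeout_ms" ||
        name == "genai-rerank_timeout_ms") {
        return value >= 100 && value <= 300000;
    }
    return false;
}
```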
### Admin Commands

```sql
-- Load/Save GenAI variables
LOAD GENAI VARIABLES TO RUNTIME;
SAVE GENAI VARIABLES FROM RUNTIME;
LOAD GENAI VARIABLES FROM DISK;
SAVE GENAI VARIABLES TO DISK;

-- Set variables
SET genai-threads = 8;
SET genai-embedding_uri = 'http://localhost:8080/embed';
SET genai-rerank_uri = 'http://localhost:8081/rerank';

-- View variables
SELECT @@genai-threads;
SHOW VARIABLES LIKE 'genai-%';

-- Checksum
CHECKSUM GENAI VARIABLES;
```

## Query Syntax

### GENAI: Query Format

GenAI queries use the special `GENAI:` prefix followed by JSON:

```sql
GENAI: {"type": "embed", "documents": ["text1", "text2"]}
GENAI: {"type": "rerank", "query": "search text", "documents": ["doc1", "doc2"]}
```

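Recognizing such a query amounts to a prefix check followed by extraction of the JSON body. The sketch below is one way to do it and is not taken from `MySQL_Session.cpp`. The function name `extract_genai_payload` is hypothetical, and the choices to match the prefix case-sensitively, skip leading whitespace, and drop a trailing `;` are assumptions made for this example.

```cpp
#include <string>

// Hypothetical helper: if `query` starts with "GENAI:" (after optional
// leading whitespace), stores the trimmed JSON body in `payload` and
// returns true. Trailing whitespace and ';' are removed.
bool extract_genai_payload(const std::string& query, std::string& payload) {
    static const std::string kPrefix = "GENAI:";
    static const char* kSpace = " \t\r\n";

    std::string::size_type start = query.find_first_not_of(kSpace);
    if (start == std::string::npos) return false;
    if (query.compare(start, kPrefix.size(), kPrefix) != 0) return false;

    std::string::size_type begin =
        query.find_first_not_of(kSpace, start + kPrefix.size());
    if (begin == std::string::npos) return false;  // prefix with no body

    std::string::size_type end = query.find_last_not_of(" \t\r\n;");
    if (end == std::string::npos || end < begin) return false;

    payload = query.substr(begin, end - begin + 1);
    return true;
}
```

The extracted string would then be handed to a JSON parser, which is where malformed bodies and missing fields are expected to be reported.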
### Supported Operations

#### 1. Embedding

Generate vector embeddings for documents:

```sql
GENAI: {
  "type": "embed",
  "documents": [
    "Machine learning is a subset of AI.",
    "Deep learning uses neural networks."
  ]
}
```

**Response:**
```
+------------------------------------------+
| embedding                                |
+------------------------------------------+
| 0.123, -0.456, 0.789, ...                |
| 0.234, -0.567, 0.890, ...                |
+------------------------------------------+
```

#### 2. Reranking

Rerank documents by relevance to a query:

```sql
GENAI: {
  "type": "rerank",
  "query": "What is machine learning?",
  "documents": [
    "Machine learning is a subset of artificial intelligence.",
    "The capital of France is Paris.",
    "Deep learning uses neural networks."
  ],
  "top_n": 2,
  "columns": 3
}
```

**Parameters:**
- `query` (required): Search query text
- `documents` (required): Array of documents to rerank
- `top_n` (optional): Maximum results to return (0 = all, default: all)
- `columns` (optional): 2 = {index, score}, 3 = {index, score, document} (default: 3)

**Response:**
```
+-------+-------+----------------------------------------------+
| index | score | document                                     |
+-------+-------+----------------------------------------------+
| 0     | 0.95  | Machine learning is a subset of AI...        |
| 2     | 0.82  | Deep learning uses neural networks...        |
+-------+-------+----------------------------------------------+
```

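The `top_n` semantics (0 means all results, otherwise keep the best N) can be sketched as a sort followed by a truncation. The type `RerankHit` and the function `select_top_n` are hypothetical names for this illustration. Whether the ordering is done by the reranking service or by the module is not specified here, so the sort is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical result row: position of the document in the request and
// the relevance score assigned to it.
struct RerankHit {
    uint32_t index;
    double score;
};

// Orders hits by descending score and keeps at most `top_n` of them.
// top_n == 0 keeps every hit.
std::vector<RerankHit> select_top_n(std::vector<RerankHit> hits,
                                    uint32_t top_n) {
    std::stable_sort(hits.begin(), hits.end(),
                     [](const RerankHit& a, const RerankHit& b) {
                         return a.score > b.score;
                     });
    if (top_n != 0 && top_n < hits.size()) {
        hits.resize(top_n);
    }
    return hits;
}
```

Given scores 0.95, 0.10 and 0.82 for documents 0, 1 and 2, `select_top_n(hits, 2)` keeps documents 0 and 2 in that order, which is the shape of the response table above.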
### Response Format

All GenAI queries return results in MySQL resultset format with:
- `columns`: Array of column names
- `rows`: Array of row data

**Success:**
```json
{
  "columns": ["index", "score", "document"],
  "rows": [
    [0, 0.95, "Most relevant document"],
    [2, 0.82, "Second most relevant"]
  ]
}
```

**Error:**
```json
{
  "error": "Error message describing what went wrong"
}
```

## Usage Examples

### Basic Embedding

```sql
-- Generate embedding for a single document
GENAI: {"type": "embed", "documents": ["Hello, world!"]};

-- Batch embedding for multiple documents
GENAI: {
  "type": "embed",
  "documents": ["doc1", "doc2", "doc3"]
};
```

### Basic Reranking

```sql
-- Find most relevant documents
GENAI: {
  "type": "rerank",
  "query": "database optimization techniques",
  "documents": [
    "How to bake a cake",
    "Indexing strategies for MySQL",
    "Python programming basics",
    "Query optimization in ProxySQL"
  ]
};
```

### Top N Results

```sql
-- Get only top 3 most relevant documents
GENAI: {
  "type": "rerank",
  "query": "best practices for SQL",
  "documents": ["doc1", "doc2", "doc3", "doc4", "doc5"],
  "top_n": 3
};
```

### Index and Score Only

```sql
-- Get only relevance scores (no document text)
GENAI: {
  "type": "rerank",
  "query": "test query",
  "documents": ["doc1", "doc2"],
  "columns": 2
};
```

## Integration with ProxySQL

### Session Lifecycle

1. **Session Start**: MySQL session creates `genai_epoll_fd_` for monitoring GenAI responses
2. **Query Received**: `GENAI:` query detected in `handler___status_WAITING_CLIENT_DATA___STATE_SLEEP()`
3. **Async Send**: Socketpair created, request sent, returns immediately
4. **Main Loop**: `check_genai_events()` called on each iteration
5. **Response Ready**: `handle_genai_response()` processes response
6. **Result Sent**: MySQL result packet sent to client
7. **Cleanup**: Socketpair closed, resources freed

### Main Loop Integration

The GenAI event checking is integrated into the main MySQL handler loop:

```cpp
handler_again:
    switch (status) {
        case WAITING_CLIENT_DATA:
            handler___status_WAITING_CLIENT_DATA();
#ifdef epoll_create1
            // Check for GenAI responses before processing new client data
            if (check_genai_events()) {
                goto handler_again;  // Process more responses
            }
#endif
            break;
    }
```

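Matching a response to the request that produced it relies on the `request_id` echoed in the response header. A minimal sketch of that bookkeeping is shown below. The names `PendingRequest` and `PendingRequests` are hypothetical and are not the session members declared in `MySQL_Session.h`; the stored fields are chosen only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical per-request state kept while a response is outstanding.
struct PendingRequest {
    int fd;             // session-side end of the socketpair
    std::string query;  // original JSON, kept here for diagnostics
};

// Hypothetical map from request_id to pending state.
class PendingRequests {
public:
    // Registers a request and returns the correlation ID to send with it.
    uint64_t add(int fd, std::string query) {
        uint64_t id = next_id_++;
        pending_.emplace(id, PendingRequest{fd, std::move(query)});
        return id;
    }

    // Looks up the ID echoed by a response. On success the entry is moved
    // into `out` and removed, so a duplicate response is rejected.
    bool complete(uint64_t request_id, PendingRequest& out) {
        auto it = pending_.find(request_id);
        if (it == pending_.end()) return false;
        out = std::move(it->second);
        pending_.erase(it);
        return true;
    }

    std::size_t size() const { return pending_.size(); }

private:
    uint64_t next_id_ = 1;
    std::unordered_map<uint64_t, PendingRequest> pending_;
};
```

Removing the entry on completion is also a natural place to close the socketpair descriptor stored with it, which corresponds to the cleanup step of the session lifecycle.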
## Backend Services

### llama-server Integration

The GenAI module is designed to work with [llama-server](https://github.com/ggerganov/llama.cpp), a high-performance C++ inference server for LLaMA models.

#### Starting llama-server

```bash
# Start embedding server
./llama-server \
  --model /path/to/nomic-embed-text-v1.5.gguf \
  --port 8013 \
  --embedding \
  --ctx-size 512

# Start reranking server (using same model)
./llama-server \
  --model /path/to/nomic-embed-text-v1.5.gguf \
  --port 8012 \
  --ctx-size 512
```

#### API Compatibility

The GenAI module expects:
- **Embedding endpoint**: `POST /embedding` with JSON request
- **Rerank endpoint**: `POST /rerank` with JSON request

Compatible with:
- llama-server
- OpenAI-compatible embedding APIs
- Custom services with matching request/response format

## Testing

### TAP Test Suite

Comprehensive TAP tests are available in `test/tap/tests/genai_async-t.cpp`:

```bash
cd test/tap/tests
make genai_async-t
./genai_async-t
```

**Test Coverage:**
- Single async requests
- Sequential requests (embedding and rerank)
- Batch requests (10+ documents)
- Mixed embedding and rerank
- Request/response matching
- Error handling (invalid JSON, missing fields)
- Special characters (quotes, unicode, etc.)
- Large documents (5KB+)
- `top_n` and `columns` parameters
- Concurrent connections

### Manual Testing

```sql
-- Test embedding
mysql> GENAI: {"type": "embed", "documents": ["test document"]};

-- Test reranking
mysql> GENAI: {
    ->   "type": "rerank",
    ->   "query": "test query",
    ->   "documents": ["doc1", "doc2", "doc3"]
    -> };
```

## Performance Characteristics

### Non-Blocking Behavior

- **MySQL threads**: Return immediately after sending request (~1ms)
- **GenAI workers**: Handle blocking HTTP calls (10-100ms typical)
- **Throughput**: Limited by GenAI service capacity and worker thread count

### Resource Usage

- **Per request**: 1 socketpair (2 file descriptors)
- **Memory**: Request metadata + pending response storage
- **Worker threads**: Configurable via `genai-threads` (default: 4)

### Scalability

- **Concurrent requests**: Limited by `genai-threads` and GenAI service capacity
- **Request queue**: Unbounded (pending requests stored in session map)
- **Recommended**: Set `genai-threads` to match expected concurrency

## Error Handling

### Common Errors

| Error | Cause | Solution |
|-------|-------|----------|
| `Failed to create GenAI communication channel` | Socketpair creation failed | Check system limits (`ulimit -n`) |
| `Failed to register with GenAI module` | GenAI module not initialized | Run `LOAD GENAI VARIABLES TO RUNTIME` |
| `Failed to send request to GenAI module` | Write error on socketpair | Check connection stability |
| `GenAI module not initialized` | GenAI threads not started | Set `genai-threads > 0` and reload |

### Timeout Handling

Requests exceeding `genai-embedding_timeout_ms` or `genai-rerank_timeout_ms` will fail with:
- Status code > 0 in response header
- Error message in JSON result
- Socketpair cleanup

## Monitoring

### Status Variables

```sql
-- Check GenAI module status (not yet implemented, planned)
SHOW STATUS LIKE 'genai-%';
```

**Planned status variables:**
- `genai_threads_initialized`: Number of worker threads running
- `genai_active_requests`: Currently processing requests
- `genai_completed_requests`: Total successful requests
- `genai_failed_requests`: Total failed requests

### Logging

GenAI operations log at debug level:

```bash
# Enable GenAI debug logging
SET mysql-debug = 1;

# Check logs
tail -f proxysql.log | grep GenAI
```

## Limitations

### Current Limitations

1. **document_from_sql**: Not yet implemented (requires MySQL connection handling in workers)
2. **Shared memory**: Result pointer field reserved for future optimization
3. **Request size**: Limited by socket buffer size (typically 64KB-256KB)

### Platform Requirements

- **Epoll support**: Linux systems (kernel 2.6+)
- **Socketpair**: Unix domain sockets
- **Threading**: POSIX threads (pthread)

## Future Enhancements

### Planned Features

1. **document_from_sql**: Execute SQL to retrieve documents for reranking
2. **Shared memory**: Zero-copy result transfer for large responses
3. **Connection pooling**: Reuse HTTP connections to GenAI services
4. **Metrics**: Enhanced monitoring and statistics
5. **Batch optimization**: Better support for large document batches
6. **Streaming**: Progressive result delivery for large operations

## Related Documentation

- [Posts Table Embeddings Setup](./posts-embeddings-setup.md) - Using sqlite-rembed with GenAI
- [SQLite3 Server Documentation](./SQLite3-Server.md) - SQLite3 backend integration
- [sqlite-rembed Integration](./sqlite-rembed-integration.md) - Embedding generation

## Source Files

### Core Implementation

- `include/GenAI_Thread.h` - GenAI module interface and structures
- `lib/GenAI_Thread.cpp` - Implementation of listener and worker loops
- `include/MySQL_Session.h` - Session integration (GenAI async state)
- `lib/MySQL_Session.cpp` - Async handlers and main loop integration
- `include/Base_Session.h` - Base session GenAI members

### Tests

- `test/tap/tests/genai_module-t.cpp` - Admin commands and variables
- `test/tap/tests/genai_embedding_rerank-t.cpp` - Basic embedding/reranking
- `test/tap/tests/genai_async-t.cpp` - Async architecture tests

## License

Same as ProxySQL - See LICENSE file for details.

## Contributing

For contributions and issues:
- GitHub: https://github.com/sysown/proxysql
- Branch: `v3.1-vec_genAI_module`

---

*Last Updated: 2025-01-10*
*Module Version: 0.1.0*