GenAI Module Prototype

Standalone prototype demonstrating the GenAI module architecture for ProxySQL.

Architecture Overview

This prototype demonstrates a thread-pool based GenAI module that:

Receives requests from multiple clients (MySQL/PgSQL threads) via socket pairs
Queues requests internally for a fixed-size pool of worker threads
Processes requests asynchronously without blocking the clients
Returns responses to clients via the same socket connections

Components

┌─────────────────────────────────────────────────────────┐
│                   GenAI Module                          │
│                                                          │
│  ┌────────────────────────────────────────────────┐    │
│  │  Listener Thread (epoll-based)                 │    │
│  │  - Monitors all client file descriptors        │    │
│  │  - Reads incoming requests                     │    │
│  │  - Pushes to request queue                     │    │
│  └──────────────────┬─────────────────────────────┘    │
│                     │                                  │
│                     ▼                                  │
│  ┌────────────────────────────────────────────────┐    │
│  │  Request Queue                                 │    │
│  │  - Thread-safe queue                           │    │
│  │  - Condition variable for worker notification  │    │
│  └──────────────────┬─────────────────────────────┘    │
│                     │                                  │
│                     ▼                                  │
│  ┌────────────────────────────────────────────────┐    │
│  │  Thread Pool (configurable number of workers)  │    │
│  │  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐      │    │
│  │  │Worker│  │Worker│  │Worker│  │Worker│  ...  │    │
│  │  └───┬──┘  └───┬──┘  └───┬──┘  └───┬──┘        │    │
│  │      └──────────┴──────────┴──────────┘         │    │
│  └────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
         ▲                    │                    ▲
         │                    ▼                    │
    socketpair()         Responses           socketpair()
    from clients          to clients          from clients
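
The request queue and thread pool in the diagram can be sketched roughly as follows. This is a minimal illustration and not the prototype's actual code: the `Request` struct, the `WorkerPool` class, and its `push` method are hypothetical names chosen for this example. Only the pattern itself (a mutex-protected queue, a condition variable to wake workers, and a fixed set of worker threads) comes from the description above.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request record; field names are illustrative only.
struct Request {
    int client_fd;        // GenAI-side end of the client's socket pair
    uint64_t request_id;  // Client's correlation ID
};

class WorkerPool {
public:
    WorkerPool(std::size_t num_workers, std::function<void(const Request&)> handler)
        : handler_(std::move(handler)) {
        assert(num_workers > 0);
        for (std::size_t i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Drains any queued requests, then stops and joins all workers.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Called by the listener thread after it has read a full request.
    void push(Request req) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            queue_.push(req);
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            Request req;
            {
                std::unique_lock<std::mutex> lock(mu_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // Stopping and nothing left to do
                req = queue_.front();
                queue_.pop();
            }
            handler_(req);  // Process outside the lock so workers run in parallel
        }
    }

    std::function<void(const Request&)> handler_;
    std::mutex mu_;
    std::condition_variable cv_;
    std::queue<Request> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // Declared last so it is destroyed first
};
```

In this sketch the handler runs without holding the queue lock, so a slow request only occupies one worker rather than stalling the whole pool.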

Communication Protocol

Client → GenAI (Request):

struct RequestHeader {
    uint64_t request_id;     // Client's correlation ID
    uint32_t operation;      // 0=embedding, 1=completion, 2=rag
    uint32_t input_size;     // Size of following data
    uint32_t flags;          // Reserved
};
// Followed by input_size bytes of input data

GenAI → Client (Response):

struct ResponseHeader {
    uint64_t request_id;     // Echo client's ID
    uint32_t status_code;    // 0=success, >0=error
    uint32_t output_size;    // Size of following data
    uint32_t processing_time_ms;  // Time taken to process
};
// Followed by output_size bytes of output data
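
To make the framing concrete, here is a hedged sketch of how a request could be written to and read from one end of a socket pair. It assumes the header is sent as the raw bytes of the struct, which only works when both ends are built with the same ABI (as with threads in one process); on common 64-bit platforms `sizeof(RequestHeader)` is 24 rather than 20 because of trailing padding. The helper names (`write_all`, `read_all`, `send_request`, `recv_request`) and the `kMaxInputSize` limit are hypothetical and not taken from the prototype.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <sys/socket.h>  // socketpair(), used by callers to create the channel
#include <unistd.h>

struct RequestHeader {
    uint64_t request_id;  // Client's correlation ID
    uint32_t operation;   // 0=embedding, 1=completion, 2=rag
    uint32_t input_size;  // Size of following data
    uint32_t flags;       // Reserved
};

// Hypothetical upper bound so a corrupt header cannot trigger a huge allocation.
constexpr uint32_t kMaxInputSize = 1u << 20;

// Writes exactly len bytes, retrying on short writes and EINTR.
static bool write_all(int fd, const void* buf, std::size_t len) {
    const char* p = static_cast<const char*>(buf);
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Reads exactly len bytes; returns false on error or if the peer closed early.
static bool read_all(int fd, void* buf, std::size_t len) {
    char* p = static_cast<char*>(buf);
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0) return false;
        if (n < 0) {
            if (errno == EINTR) continue;
            return false;
        }
        p += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Sends the header followed by input_size bytes of input data.
bool send_request(int fd, uint64_t id, uint32_t op, const std::string& input) {
    assert(input.size() <= kMaxInputSize);
    RequestHeader hdr;
    std::memset(&hdr, 0, sizeof hdr);  // Also zeroes padding bytes
    hdr.request_id = id;
    hdr.operation = op;
    hdr.input_size = static_cast<uint32_t>(input.size());
    return write_all(fd, &hdr, sizeof hdr) &&
           write_all(fd, input.data(), input.size());
}

// Reads one header and then the payload it announces.
bool recv_request(int fd, RequestHeader& hdr, std::string& input) {
    if (!read_all(fd, &hdr, sizeof hdr)) return false;
    if (hdr.input_size > kMaxInputSize) return false;
    input.resize(hdr.input_size);
    return hdr.input_size == 0 || read_all(fd, &input[0], input.size());
}
```

The response direction would follow the same shape with `ResponseHeader` and `output_size`. These helpers use blocking I/O for brevity; an epoll-based listener would normally read incrementally from non-blocking descriptors instead.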

Building and Running

# Build
make

# Run
make run

# Clean
make clean

# Debug build
make debug

# Show help
make help

Current Status

Implemented:

✅ Thread pool with configurable workers
✅ epoll-based listener thread
✅ Thread-safe request queue
✅ socketpair communication
✅ Multiple concurrent clients
✅ Non-blocking async operation
✅ Simulated processing (random sleep)

TODO (Enhancement Phase):

⬜ Real LLM API integration (OpenAI, local models)
⬜ Request batching for efficiency
⬜ Priority queue for urgent requests
⬜ Timeout and cancellation
⬜ Backpressure handling (queue limits)
⬜ Metrics and monitoring
⬜ Error handling and retry logic
⬜ Configuration file support
⬜ Unit tests
⬜ Performance benchmarking
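
As one possible direction for the backpressure item above, a queue with a hard capacity can reject new work instead of growing without bound. The sketch below is only an assumption about how this might look; `BoundedQueue`, `try_push`, and `try_pop` are hypothetical names, and the prototype may end up handling queue limits differently.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Returns false immediately when the queue is full, so the caller
    // can reply with an error status instead of blocking.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(item));
        return true;
    }

    // Moves the oldest item into out; returns false when the queue is empty.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mu_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mu_;
    std::deque<T> items_;
};
```

With a queue like this, the listener could answer a rejected request with a non-zero `status_code` rather than letting latency grow unbounded.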

Integration Plan

Phase 1: Prototype Enhancement (Current)

Complete TODO items above
Test with real LLM APIs
Performance testing

Phase 2: ProxySQL Integration

Integrate into ProxySQL build system
Add to existing MySQL/PgSQL thread logic
Implement GenAI variable system

Phase 3: Production Features

Connection pooling
Request multiplexing
Caching layer
Fallback strategies

Design Principles

Zero Coupling: The GenAI module doesn't know about client types
Non-Blocking: Clients never block while waiting for GenAI responses
Scalable: Resource usage stays fixed as clients are added (bounded thread pool)
Observable: Easy to monitor and debug
Testable: Runs standalone, so it can be tested independently
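
The non-blocking principle can be illustrated from the client side. A client thread that already runs its own event loop could check its end of the socket pair with a zero-timeout `poll()` and carry on with other work when no response has arrived yet. This is a sketch under that assumption; `response_ready` is a hypothetical helper, not part of the prototype.

```cpp
#include <cassert>
#include <poll.h>
#include <sys/socket.h>  // socketpair(), used by callers to create the channel
#include <unistd.h>      // read()/write()/close(), used by callers

// Returns true if data can be read from fd right now. Never blocks.
bool response_ready(int fd) {
    assert(fd >= 0);
    pollfd p{};
    p.fd = fd;
    p.events = POLLIN;
    // A timeout of 0 makes poll() return immediately.
    return poll(&p, 1, 0) > 0 && (p.revents & POLLIN) != 0;
}
```

In practice a client would more likely add the descriptor to the poll or epoll set it already waits on, so that a response wakes it up alongside its other I/O.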