# GenAI Module Prototype

Standalone prototype demonstrating the GenAI module architecture for ProxySQL.

## Architecture Overview

This prototype demonstrates a thread-pool-based GenAI module that:

1. **Receives requests** from multiple clients (MySQL/PgSQL threads) via socket pairs
2. **Queues requests** internally and hands them to a fixed-size pool of worker threads
3. **Processes requests asynchronously** without blocking the clients
4. **Returns responses** to clients via the same socket connections
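
Steps 2 and 3 can be sketched as a mutex-protected queue plus a condition variable that wakes a fixed number of workers. This is a minimal illustration under stated assumptions, not the prototype's actual code: `QueuedRequest`, `WorkerPool`, and the handler callback are hypothetical names introduced here.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request record; the prototype's real type may differ.
struct QueuedRequest {
    int client_fd;        // descriptor the response should be written to
    uint64_t request_id;  // client's correlation ID
    std::string input;    // raw payload
};

class WorkerPool {
public:
    using Handler = std::function<void(const QueuedRequest&)>;

    WorkerPool(std::size_t num_workers, Handler handler)
        : handler_(std::move(handler)) {
        for (std::size_t i = 0; i < num_workers; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }

    // Signals shutdown, lets workers drain the queue, then joins them.
    ~WorkerPool() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }

    // Called by the producer (e.g. a listener thread); returns immediately.
    void push(QueuedRequest req) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(req));
        }
        cv_.notify_one();
    }

private:
    void worker_loop() {
        for (;;) {
            QueuedRequest req;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
                if (queue_.empty()) return;  // stopping and nothing left to do
                req = std::move(queue_.front());
                queue_.pop();
            }
            handler_(req);  // runs outside the lock so workers do not serialize
        }
    }

    Handler handler_;
    std::mutex mutex_;
    std::condition_variable cv_;
    std::queue<QueuedRequest> queue_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;  // declared last: threads start after other members exist
};
```

The producer only holds the lock long enough to enqueue, so it is never delayed by a slow request. Note that this queue is unbounded, which is why backpressure remains a separate concern.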

### Components

```
┌─────────────────────────────────────────────────────────┐
│                   GenAI Module                          │
│                                                          │
│  ┌────────────────────────────────────────────────┐    │
│  │  Listener Thread (epoll-based)                 │    │
│  │  - Monitors all client file descriptors        │    │
│  │  - Reads incoming requests                     │    │
│  │  - Pushes to request queue                     │    │
│  └──────────────────┬─────────────────────────────┘    │
│                     │                                  │
│                     ▼                                  │
│  ┌────────────────────────────────────────────────┐    │
│  │  Request Queue                                 │    │
│  │  - Thread-safe queue                           │    │
│  │  - Condition variable for worker notification  │    │
│  └──────────────────┬─────────────────────────────┘    │
│                     │                                  │
│                     ▼                                  │
│  ┌────────────────────────────────────────────────┐    │
│  │  Thread Pool (configurable number of workers)  │    │
│  │  ┌──────┐  ┌──────┐  ┌──────┐  ┌──────┐      │    │
│  │  │Worker│  │Worker│  │Worker│  │Worker│  ...  │    │
│  │  └───┬──┘  └───┬──┘  └───┬──┘  └───┬──┘        │    │
│  │      └──────────┴──────────┴──────────┘         │    │
│  └────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────┘
         ▲                    │                    ▲
         │                    │                    │
    socketpair()         Responses           socketpair()
    from clients          to clients          from clients
```
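
The listener box in the diagram can be approximated with two small helpers around the Linux `epoll` API. This is a hedged sketch: `register_client` and `wait_readable` are hypothetical names, and the real listener would go on to read a full request from each ready descriptor and push it onto the queue.

```cpp
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>
#include <vector>

// Add a client descriptor to the epoll set, watching for readable data.
bool register_client(int epoll_fd, int client_fd) {
    epoll_event ev{};
    ev.events = EPOLLIN;
    ev.data.fd = client_fd;
    return epoll_ctl(epoll_fd, EPOLL_CTL_ADD, client_fd, &ev) == 0;
}

// Wait up to timeout_ms and return the descriptors that have data to read.
// Returns an empty vector on timeout or if epoll_wait fails.
std::vector<int> wait_readable(int epoll_fd, int timeout_ms) {
    constexpr int kMaxEvents = 64;
    epoll_event events[kMaxEvents];
    std::vector<int> ready;
    int n = epoll_wait(epoll_fd, events, kMaxEvents, timeout_ms);
    for (int i = 0; i < n; ++i) {
        if (events[i].events & EPOLLIN) ready.push_back(events[i].data.fd);
    }
    return ready;
}
```

A listener loop would create the epoll instance once with `epoll_create1(0)`, call `register_client` for the GenAI end of each socket pair, and then call `wait_readable` repeatedly.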

### Communication Protocol

**Client → GenAI (Request)**:
```cpp
#include <cstdint>

struct RequestHeader {
    uint64_t request_id;     // Client's correlation ID
    uint32_t operation;      // 0=embedding, 1=completion, 2=rag
    uint32_t input_size;     // Size in bytes of the data that follows
    uint32_t flags;          // Reserved
};
// Followed by input_size bytes of input data
```

**GenAI → Client (Response)**:
```cpp
struct ResponseHeader {
    uint64_t request_id;     // Echoes the client's request_id
    uint32_t status_code;    // 0=success, >0=error
    uint32_t output_size;    // Size in bytes of the data that follows
    uint32_t processing_time_ms;  // Processing time in milliseconds
};
// Followed by output_size bytes of output data
```
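
To make the framing concrete, the following sketch sends one request and reads one response over a `socketpair()`. It assumes the headers are written as raw structs, which is only safe because both ends live in the same process and share one ABI. The helpers `write_all`, `read_all`, `send_request`, `recv_response`, and `echo_once` are hypothetical; `echo_once` merely stands in for a worker by echoing the input back.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

struct RequestHeader {
    uint64_t request_id;
    uint32_t operation;
    uint32_t input_size;
    uint32_t flags;
};

struct ResponseHeader {
    uint64_t request_id;
    uint32_t status_code;
    uint32_t output_size;
    uint32_t processing_time_ms;
};

// Write exactly n bytes, retrying on short writes and EINTR.
bool write_all(int fd, const void* buf, std::size_t n) {
    const char* p = static_cast<const char*>(buf);
    while (n > 0) {
        ssize_t w = write(fd, p, n);
        if (w < 0 && errno == EINTR) continue;
        if (w <= 0) return false;
        p += w;
        n -= static_cast<std::size_t>(w);
    }
    return true;
}

// Read exactly n bytes; returns false on error or early EOF.
bool read_all(int fd, void* buf, std::size_t n) {
    char* p = static_cast<char*>(buf);
    while (n > 0) {
        ssize_t r = read(fd, p, n);
        if (r < 0 && errno == EINTR) continue;
        if (r <= 0) return false;
        p += r;
        n -= static_cast<std::size_t>(r);
    }
    return true;
}

// Client side: header first, then input_size bytes of payload.
bool send_request(int fd, uint64_t id, uint32_t op, const std::string& input) {
    RequestHeader h{};
    h.request_id = id;
    h.operation = op;
    h.input_size = static_cast<uint32_t>(input.size());
    h.flags = 0;
    return write_all(fd, &h, sizeof h) && write_all(fd, input.data(), input.size());
}

// Client side: read the header, then output_size bytes of payload.
bool recv_response(int fd, ResponseHeader& h, std::string& output) {
    if (!read_all(fd, &h, sizeof h)) return false;
    output.resize(h.output_size);
    return read_all(fd, &output[0], output.size());
}

// Stand-in for a worker: reads one request and echoes the input back.
bool echo_once(int fd) {
    RequestHeader req;
    if (!read_all(fd, &req, sizeof req)) return false;
    std::string input(req.input_size, '\0');
    if (!read_all(fd, &input[0], input.size())) return false;
    ResponseHeader resp{};
    resp.request_id = req.request_id;  // lets the client correlate the reply
    resp.status_code = 0;
    resp.output_size = static_cast<uint32_t>(input.size());
    resp.processing_time_ms = 0;
    return write_all(fd, &resp, sizeof resp) && write_all(fd, input.data(), input.size());
}
```

Because `request_id` is echoed in the response, a client can keep several requests in flight on one socket and match replies as they arrive, in whatever order the workers finish.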

## Building and Running

```bash
# Build
make

# Run
make run

# Clean
make clean

# Debug build
make debug

# Show help
make help
```

## Current Status

**Implemented:**
- ✅ Thread pool with configurable workers
- ✅ epoll-based listener thread
- ✅ Thread-safe request queue
- ✅ socketpair communication
- ✅ Multiple concurrent clients
- ✅ Non-blocking async operation
- ✅ Simulated processing (random sleep)

**TODO (Enhancement Phase):**
- ⬜ Real LLM API integration (OpenAI, local models)
- ⬜ Request batching for efficiency
- ⬜ Priority queue for urgent requests
- ⬜ Timeout and cancellation
- ⬜ Backpressure handling (queue limits)
- ⬜ Metrics and monitoring
- ⬜ Error handling and retry logic
- ⬜ Configuration file support
- ⬜ Unit tests
- ⬜ Performance benchmarking
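
As an illustration of the backpressure item, one possible approach is a queue with a hard capacity whose `try_push` fails instead of blocking. This is only a sketch of the idea, not a committed design: `BoundedQueue` is a hypothetical name, and what happens on rejection (for example, replying with a non-zero `status_code`) is left open.

```cpp
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

template <typename T>
class BoundedQueue {
public:
    explicit BoundedQueue(std::size_t capacity) : capacity_(capacity) {}

    // Returns false instead of blocking when the queue is full,
    // so the caller can reject or defer the request.
    bool try_push(T item) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.size() >= capacity_) return false;
        items_.push_back(std::move(item));
        return true;
    }

    // Returns false when there is nothing to pop.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (items_.empty()) return false;
        out = std::move(items_.front());
        items_.pop_front();
        return true;
    }

private:
    const std::size_t capacity_;
    std::mutex mutex_;
    std::deque<T> items_;
};
```

A capacity limit bounds memory use as well as thread count, at the cost of having to decide what a rejected client should see.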

## Integration Plan

Phase 1: **Prototype Enhancement** (Current)
- Complete TODO items above
- Test with real LLM APIs
- Performance testing

Phase 2: **ProxySQL Integration**
- Integrate into ProxySQL build system
- Add to existing MySQL/PgSQL thread logic
- Implement GenAI variable system

Phase 3: **Production Features**
- Connection pooling
- Request multiplexing
- Caching layer
- Fallback strategies

## Design Principles

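1. **Zero Coupling**: The GenAI module knows nothing about client types; it sees only socket descriptors and the request/response headers
2. **Non-Blocking**: Clients never block while a request is in flight; the response arrives later on the same socket
3. **Scalable**: The thread count stays fixed regardless of the number of clients (bounded thread pool)
4. **Observable**: Easy to monitor and debug
5. **Testable**: Runs standalone, so it can be tested independently of ProxySQL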
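
The non-blocking principle implies that a client checks its end of the socket pair for readiness rather than calling a blocking `read()`. A minimal sketch using `poll()` with a zero timeout follows; `response_ready` is a hypothetical helper, and a real client thread would more likely add the descriptor to the event loop it already runs.

```cpp
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// True if at least one byte can be read from fd right now; never blocks.
bool response_ready(int fd) {
    pollfd pfd{};
    pfd.fd = fd;
    pfd.events = POLLIN;
    int n = poll(&pfd, 1, 0);  // timeout 0: return immediately
    return n == 1 && (pfd.revents & POLLIN) != 0;
}
```

The client can keep serving its own work and call `response_ready` between tasks, reading the response header only once it returns true.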