# ProxySQL Architecture Overview

> **⚠️ Important Notice**: This documentation was generated by AI and may contain inaccuracies.
> It should be used as a starting point for exploration only. Always verify critical information
> against the actual source code.
>
> **Last AI Update**: 2025-09-11
> **Status**: NON-VERIFIED
> **Maintainer**: Rene Cannao

## Executive Summary

ProxySQL is a MySQL and PostgreSQL protocol-aware proxy server written in C++11/17. It implements a multi-threaded architecture with connection pooling, query routing, caching, and monitoring.

## System Architecture

### Core Design Patterns

1. **Multi-Threaded Worker Model**
   - MySQL worker threads (`MySQL_Thread`) handle client connections
   - PgSQL worker threads (`PgSQL_Thread`) for PostgreSQL support
   - Admin threads for configuration management
   - Monitor threads for backend health checking
   - Idle connection management threads (when `IDLE_THREADS` enabled)

2. **Event-Driven I/O**
   - Uses `libev` for event loop management
   - Poll-based multiplexing handles multiple connections per thread
   - Epoll support for idle thread management on Linux

3. **Connection Pooling & Multiplexing**
   - Per-hostgroup connection pools
   - Connection multiplexing to reduce backend connections
   - Connection reuse based on session state

4. **Protocol Implementation**
   - Full MySQL protocol implementation (`MySQL_Protocol`)
   - PostgreSQL wire protocol support (`PgSQL_Protocol`)
   - Protocol-aware query parsing and routing

## Main Components and Relationships

### 1. Entry Point & Initialization

- **File**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/src/main.cpp`
- **Responsibilities**:
  - Process initialization and daemonization
  - Loads configuration from `proxysql.cfg`
  - Creates global variables structure
  - Starts all subsystems

### 2. Thread Architecture

#### Thread Pool Design

- **Consumer Thread Pattern**: Generic work queue for monitoring tasks
  ```cpp
  template <typename T>
  class ConsumerThread : public Thread {
      // Queue of pending work items that this thread consumes
      wqueue<WorkItem<T>*>& m_queue;
  };
  ```
- **Thread-Local Storage**: `__thread` variables for per-thread configuration
- **Maintenance Threads**: Minimum 8 threads for housekeeping operations
- **Event Loop Integration**: Epoll-based event handling for scalability

#### MySQL Threads (`MySQL_Thread`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Thread.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Thread.h`
- **Key Features**:
  - Worker threads handle MySQL client connections
  - Session management and query processing
  - Connection pool interaction
  - Thread-local statistics for lock-free updates
  - Query cache integration

#### PgSQL Threads (`PgSQL_Thread`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Thread.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Thread.h`
- **Key Features**:
  - PostgreSQL protocol handling
  - SASL/SCRAM authentication support
  - Extended query protocol
  - Transaction state management

### 3. Session Management

#### MySQL Session (`MySQL_Session`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Session.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Session.h`
- **Responsibilities**:
  - Client authentication
  - Query lifecycle management
  - Backend connection assignment
  - State machine for protocol handling
  - Prepared statement management

#### PgSQL Session (`PgSQL_Session`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Session.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Session.h`
- **Features**:
  - PostgreSQL authentication methods
  - Extended query protocol
  - Transaction state tracking

### 4. Connection Pool Management

#### MySQL HostGroups Manager

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_HostGroups_Manager.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_HostGroups_Manager.h`
- **Key Concepts**:
  - Hostgroups logically group database servers
  - Connection pool per hostgroup
  - Server status tracking (ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD)
  - Connection health monitoring

#### Read-Only Server Management (v2.5.1+)

- **Evolution**: Improved from `read_only_action()` to `read_only_action_v2()`
- **Batch Processing**: Processes multiple servers simultaneously
- **State Transitions**:
  - `read_only=0`: Server promoted to writer hostgroup
  - `read_only=1`: Server moved to reader hostgroup
  - `writer_is_also_reader`: Controls writer presence in reader hostgroups
- **Performance**: Optimized lock management and reduced database operations
- Replication topology awareness (master/slave, Galera, Group Replication, Aurora)

#### Connection States

```
ONLINE → SHUNNED (temporary failures) → OFFLINE_SOFT → OFFLINE_HARD
```

### 5. Query Processing

#### MySQL Query Processor

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Processor.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Query_Processor.h`
- **Functions**:
  - Rule-based query routing
  - Query rewriting capabilities
  - Query caching decisions
  - Query digest generation
  - GTID handling

#### Query Digest Generation Pipeline

- **Normalization Stages**:
  1. **Comment Removal**: Hash (#), ANSI (--), C-style (/* */)
  2. **Value Replacement**: Numbers and strings → `?`
  3. **Spacing Normalization**: Collapse multiple spaces
  4. **Grouping Algorithm**: `?,?,?,?` → `?,?,?,...` when exceeding limit
  5. **NULL Handling**: Optional replacement based on `mysql-query_digests_replace_null`
- **Implementation**: `c_tokenizer.cpp` using SpookyV2 hashing
- **Known Limitations**: 12+ documented edge cases including buffer overruns, sign handling issues

#### Query Rules Engine

- Pattern matching (regex support)
- Destination hostgroup routing
- Query modification/rewriting
- Cache TTL configuration
- Query mirroring support
- Fast routing optimization for simple patterns

### 6. Database Layer & Persistence

#### SQLite3 Integration

- **Admin Database**: Runtime configuration storage
- **Stats Database**: Metrics and statistics
- **Monitor Database**: Health check results
- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/sqlite3db.cpp`

#### Configuration Layers

1. **Disk**: Persistent configuration in SQLite
2. **Memory**: In-memory configuration tables, editable through the admin interface
3. **Runtime**: Active configuration in use

### 7. Admin & Monitoring Interfaces

#### Admin Interface (`ProxySQL_Admin`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Admin.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/proxysql_admin.h`
- **Features**:
  - MySQL-compatible admin interface (port 6032)
  - Configuration management via SQL
  - Runtime statistics access
  - Cluster synchronization

#### SQLite3 Server

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/src/SQLite3_Server.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/include/SQLite3_Server.h`
- **Purpose**: SQL interface for admin operations

#### Monitoring (`MySQL_Monitor`, `PgSQL_Monitor`)

- Backend health checking
- Replication lag monitoring
- Read-only status detection
- GTID tracking

#### Galera Cluster Monitoring

- **Health Check Query**: Monitors 8 critical Galera variables
  - `wsrep_local_state` (must be 4=SYNCED or 2=DONOR with conditions)
  - `wsrep_cluster_status` (Primary/Non-Primary detection)
  - `wsrep_desync`, `wsrep_reject_queries`, `pxc_maint_mode`
- **Writer Selection**: Deterministic by `weight DESC, hostname DESC, port DESC`
- **SHUNNED Status**: Preserves connections during writer transitions
- **SST Handling**: Honors `wsrep_sst_donor_rejects_queries`
- **Monitoring Intervals**:
  - `mysql-monitor_galera_healthcheck_interval`: 1000ms default
  - `mysql-monitor_galera_healthcheck_max_timeout_count`: 3 consecutive failures

#### Bootstrap Mode

- **Purpose**: Auto-configuration for MySQL Group Replication clusters
- **Discovery Process**:
  1. Connects to bootstrap server with optional SSL
  2. Queries `performance_schema.replication_group_members`
  3. Auto-discovers topology and creates configuration
- **Account Creation**: Generates monitoring accounts with required permissions
- **MySQL Router Compatibility**: Uses ports 6446 (RW) and 6447 (RO)
- **Configuration Precedence**: Bootstrap → Config File → Command Line

### 8. Network & Protocol Handling

#### Data Streams

- **MySQL_Data_Stream**: MySQL protocol communication
- **PgSQL_Data_Stream**: PostgreSQL protocol communication
- Buffer management for network I/O
- SSL/TLS support

#### Protocol Parsers

- MySQL command parsing
- PostgreSQL message format handling
- Prepared statement protocol
- Result set handling

### 9. Advanced Features

#### Query Cache

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Cache.cpp`, `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Query_Cache.cpp`
- In-memory result caching
- TTL-based expiration
- Cache key generation from query digest

#### Cluster Support (`ProxySQL_Cluster`)

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Cluster.cpp`
- **Architecture**: Decentralized peer-to-peer with Core and Satellite nodes
- **Synchronization Mechanism**:
  - SpookyV2 hash-based checksums for change detection
  - Version-based source of truth selection (version > 1 required)
  - Epoch timestamps for conflict resolution
  - Configurable diff thresholds before sync (default: 3)
- **Protection Mechanisms**:
  - Circular fetching prevention through version checks
  - Split-brain detection with manual resolution
  - Pre-computed resultsets for performance (v2.4.3+)
- **Network Optimization**: ~50KBps per node in 200-node cluster

#### Statistics & Metrics

- **Files**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Statistics.cpp`
- Prometheus metrics integration
- Query statistics
- Connection pool metrics
- Memory usage tracking

## Threading Model & Concurrency

### Thread Types

1. **Main Thread**: Initialization and coordination
2. **MySQL Worker Threads**: Handle MySQL client connections
3. **PgSQL Worker Threads**: Handle PostgreSQL connections
4. **Admin Thread**: Admin interface requests
5. **Monitor Threads**: Backend health monitoring
6. **Idle Threads**: Manage idle connections (optional)
7. **Cluster Threads**: Inter-proxy communication

### Synchronization Mechanisms

- Read-write locks for configuration access
- Mutexes for connection pool operations
- Lock-free structures for statistics
- Atomic operations for counters

## Configuration Management

### Configuration Sources

1. **Configuration File**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/src/proxysql.cfg`
2. **Command Line**: Override options
3. **Admin Interface**: Runtime modifications
4. **Cluster Sync**: Peer configuration updates

### Key Configuration Areas

- `admin_variables`: Admin interface settings
- `mysql_variables`: MySQL protocol settings
- `pgsql_variables`: PostgreSQL settings
- `mysql_servers`: Backend server definitions
- `mysql_users`: User authentication
- `mysql_query_rules`: Query routing rules

## Build System & Dependencies

### Build Configuration

- **Makefile**: Main build configuration
- C++11/17 support detection
- Debug vs Release builds
- Platform-specific optimizations

### Key Dependencies

- **libev**: Event loop
- **libmariadbclient**: MySQL protocol
- **libpq**: PostgreSQL protocol
- **sqlite3**: Embedded database
- **jemalloc**: Memory allocator
- **re2/pcre**: Regular expressions
- **prometheus-cpp**: Metrics
- **libmicrohttpd**: HTTP server
- **clickhouse-cpp**: ClickHouse support

## Testing Framework

### Test Types

- **TAP Tests**: `https://github.com/sysown/proxysql/tree/v3.0.agentics/test/tap/`
- **Unit Tests**: Component-level testing
- **Integration Tests**: Full stack testing
- **Cluster Tests**: Multi-proxy scenarios

## Performance Optimizations

### Connection Pool Optimizations

1. **Multi-Tier Pool Management**:
   - Free connections pool per backend
   - Used connections tracking with statistics
   - Connection warming for pre-emptive establishment
   - Latency-aware connection selection
   - GTID-aware routing for consistency

2. **Pool Algorithms**:
   ```cpp
   // Connection retrieval with multiple criteria
   MySQL_Connection* get_MyConn_from_pool(
       uint32_t wait_until_ms,  // Timeout control
       bool ff_flag,            // Fast forward flag
       char* gtid_uuid,         // GTID consistency
       uint64_t gtid_trxid,     // Transaction ID
       int max_lag_ms           // Max replication lag
   );
   ```

3. **Query Processing**:
   - **Fast Digest Path**: Optimized for queries > 100KB
   - **Multi-threaded Digesting**: 4 threads for parallel processing
   - **Regex Caching**: Compiled patterns cached in `regex_engine1/2`
   - **Digest Statistics**: Low-overhead tracking

4. **Memory Management**:
   - **Buffer Pools**: Reusable buffers for packet handling
   - **Statement Cache**: Prepared statement metadata caching
   - **Result Buffering**: Configurable strategies
   - **jemalloc Integration**: Optimized memory allocation

5. **Lock-Free Structures**:
   - Thread-local statistics counters
   - Lock-free query digest maps
   - Atomic operations for global counters
   - Per-thread configuration caching

## Monitoring & Observability

### Metrics Collection

- Query response times
- Connection pool efficiency
- Backend server health
- Memory usage patterns
- Cache hit rates

### Interfaces

- Admin interface statistics tables
- Prometheus metrics endpoint
- REST API for monitoring
- Log files for debugging

## Security Features

### Authentication Architecture

#### Multi-Stage Authentication Flow

1. **Initial Handshake**: Server greeting and capability negotiation
2. **SSL Negotiation**: Optional TLS upgrade
3. **Auth Plugin Negotiation**: Select authentication method
4. **Credential Verification**: Validate user credentials
5. **Session Establishment**: Create authenticated session

#### Supported Authentication Methods

- **mysql_native_password**: SHA1-based with fast path caching
- **caching_sha2_password**: SHA256 with full/fast authentication modes (v2.6.0+)
- **mysql_clear_password**: For LDAP integration
- **SPIFFE Authentication**: Certificate-based passwordless auth
- **Dual-Password Support**: Zero-downtime password rotation (v3.0+)
- **Auth Plugin Switching**: Dynamic protocol adaptation
- **PostgreSQL SCRAM**: SASL/SCRAM-SHA-256 support

#### SSL/TLS Implementation

- **Non-Standard mTLS**: Certificate verification occurs AFTER handshake completion
- **SPIFFE Integration**: Only validates certificates with `spiffe://` SAN URIs
- **Dynamic Certificate Reloading**: `PROXYSQL RELOAD TLS` without downtime (v2.3.0+)
- **Known Limitation**: No SSL alert messages for certificate failures
- **Future Enhancement**: Standard mTLS verification planned

#### Authentication Caching

- SHA1 passwords cached in `GloMyAuth`
- Passwords cached for `caching_sha2_password` fast authentication
- User attributes cached with JSON validation
- Per-user connection limits and routing rules

### Security Controls

- SSL/TLS support for client and backend connections
- Query firewall with SQL injection detection
- User-level query rules and access controls
- Connection rate limiting per user/hostgroup
- Audit logging capabilities

## Query Processing Pipeline

### Query Digest System

- **Digest Computation**: Optimized for queries > 100KB
- **Digest Structure**:
  ```cpp
  struct QP_query_digest_stats {
      uint64_t digest;                                   // Hash of the normalized query
      time_t first_seen, last_seen;                      // Observation window
      unsigned long long sum_time, min_time, max_time;   // Execution time aggregates
      unsigned long long rows_affected, rows_sent;       // Row counters
  };
  ```

### Rule Processing Engine

- **Weighted Routing**: Rules can specify multiple destinations with weights
- **Rule Chaining**: `next_query_flagIN` enables sequential processing
- **Query Mirroring**: Mirror queries to secondary hostgroups
- **Query Rewriting**: Pattern-based query transformation
- **Cache Control**: Per-rule cache TTL settings

### Advanced Rule Features

- **flagOUT Routing**: Multi-destination with load balancing
- **Regex Optimization**: Compiled patterns cached for performance
- **Conditional Logic**: Username, schema, client address matching
- **Error Injection**: Custom error messages for blocked queries
- **Sticky Sessions**: Maintain connection affinity

## High Availability Features

### Backend Management

1. **Server State Management**:
   ```cpp
   enum MySerStatus {
       MYSQL_SERVER_STATUS_ONLINE = 0,
       MYSQL_SERVER_STATUS_SHUNNED = 1,
       MYSQL_SERVER_STATUS_OFFLINE_SOFT = 2,
       MYSQL_SERVER_STATUS_OFFLINE_HARD = 3,
       MYSQL_SERVER_STATUS_SHUNNED_REPLICATION_LAG = 4
   };
   ```

2. **Automatic Server Management**:
   - Auto-shunning on connection errors
   - Weighted distribution across servers
   - Per-server connection limits
   - Compression support (0-102400 bytes)
   - Per-server SSL configuration

3. **Health Monitoring**:
   - Connect checks for basic connectivity
   - Ping checks for lightweight monitoring
   - Read-only status detection
   - Replication lag measurement
   - Group replication state tracking

### Cluster Synchronization

#### Checksum-Based Sync

- **Global Checksum**: Overall configuration state hash
- **Module Checksums**: Individual module configuration tracking
- **Epoch Tracking**: Version control for changes
- **Diff-Based Sync**: Sync triggered after N differences

#### Sync Decision Algorithm

```
IF (node_version > 1 AND
    (own_version == 1 OR node_epoch > own_epoch))
   AND diff_check >= cluster_module_diffs_before_sync
THEN sync_from_peer
```

#### Cluster Features

- Automatic configuration propagation
- Conflict resolution based on epochs
- Selective module synchronization
- Peer discovery and health checking

## Architecture Characteristics

1. **Scalability**: Horizontal scaling via clustering
2. **Performance**: High throughput optimization
3. **Configuration**: Extensive runtime configuration options
4. **Protocol Support**: Full MySQL and PostgreSQL protocol implementation
5. **Extensibility**: Plugin architecture for authentication and web interface
6. **Monitoring**: Built-in metrics and statistics collection

## Design Decisions

1. **Multi-threaded over Multi-process**: Resource sharing
2. **SQLite for Configuration**: ACID compliance, SQL interface
3. **Connection Pooling per Hostgroup**: Isolation between hostgroups
4. **Protocol-aware Proxy**: Packet inspection and manipulation
5. **Checksum-based Clustering**: Configuration synchronization

## Architecture Extensions

The architecture supports:

- Additional database protocols
- Alternative caching strategies
- Custom routing algorithms
- Extended monitoring capabilities
- Cloud-native deployments
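
## Sketch: Cluster Sync Decision

The pseudocode in the Sync Decision Algorithm section can be read as a single boolean predicate. The following is a minimal sketch of that predicate, not code from `ProxySQL_Cluster.cpp`: the `ModuleState` struct, its field names, and the function `should_sync_from_peer` are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of one module's checksum state (illustrative names).
struct ModuleState {
    uint64_t version;  // configuration version reported for the module
    uint64_t epoch;    // epoch timestamp of the last configuration change
};

// Returns true when the local node should fetch the module from the peer.
// diff_check counts how many checks in a row found differing checksums;
// diffs_before_sync stands in for cluster_module_diffs_before_sync.
bool should_sync_from_peer(const ModuleState& peer, const ModuleState& own,
                           unsigned diff_check, unsigned diffs_before_sync) {
    // A peer at version 1 has never loaded a configuration, so it is not a valid source.
    const bool peer_is_candidate = peer.version > 1;
    // The peer wins if we have no configuration of our own or its change is more recent.
    const bool peer_is_newer = own.version == 1 || peer.epoch > own.epoch;
    return peer_is_candidate && peer_is_newer && diff_check >= diffs_before_sync;
}
```

Under these assumptions, a peer with a newer epoch triggers a sync only once the difference counter reaches the threshold, which is what keeps short-lived divergence from causing repeated fetches.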
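
## Sketch: Galera Writer Ordering

The Galera Cluster Monitoring section states that writer selection is deterministic by `weight DESC, hostname DESC, port DESC`. A rough illustration of that ordering follows. The `WriterCandidate` struct and `order_writer_candidates` function are hypothetical, and hostnames are compared byte-wise here, which may differ from the collation the real query uses.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical view of a healthy Galera node eligible to become writer.
struct WriterCandidate {
    std::string hostname;
    int port;
    int weight;
};

// Sorts candidates so the preferred writer comes first:
// highest weight, then highest hostname, then highest port.
void order_writer_candidates(std::vector<WriterCandidate>& nodes) {
    std::sort(nodes.begin(), nodes.end(),
              [](const WriterCandidate& a, const WriterCandidate& b) {
                  // Tuple comparison is lexicographic; '>' gives descending order.
                  return std::tie(a.weight, a.hostname, a.port) >
                         std::tie(b.weight, b.hostname, b.port);
              });
}
```

Because every field takes part in the comparison, two proxies that see the same set of healthy nodes arrive at the same first element without coordinating with each other.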
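
## Sketch: Query Digest Normalization

The first three stages of the Query Digest Generation Pipeline (comment removal, value replacement, spacing normalization) can be approximated in a few dozen lines. This is a simplified sketch and not the logic of `c_tokenizer.cpp`: the function `normalize_query` and its helper are hypothetical, and the grouping stage, NULL handling, sign handling, and SpookyV2 hashing are deliberately left out.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// True for characters that can be part of an identifier such as t1 or col_2.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Simplified normalization: strips comments, replaces literals with '?',
// and collapses runs of whitespace into a single space.
std::string normalize_query(const std::string& q) {
    std::string out;
    const std::size_t n = q.size();
    std::size_t i = 0;
    // Appends one space unless the output is empty or already ends with a space.
    auto add_space = [&out]() {
        if (!out.empty() && out.back() != ' ') out.push_back(' ');
    };
    while (i < n) {
        const char c = q[i];
        const bool dash_comment =
            c == '-' && i + 1 < n && q[i + 1] == '-' &&
            (i + 2 >= n || std::isspace(static_cast<unsigned char>(q[i + 2])));
        if (c == '/' && i + 1 < n && q[i + 1] == '*') {
            // C-style comment: skip through the closing */ (or to the end if unterminated).
            const std::size_t end = q.find("*/", i + 2);
            i = (end == std::string::npos) ? n : end + 2;
            add_space();
        } else if (c == '#' || dash_comment) {
            // Line comment: skip to the end of the line.
            while (i < n && q[i] != '\n') ++i;
            add_space();
        } else if (c == '\'' || c == '"') {
            // Quoted string literal, honoring backslash escapes.
            const char quote = c;
            ++i;
            while (i < n && q[i] != quote) {
                if (q[i] == '\\' && i + 1 < n) ++i;
                ++i;
            }
            if (i < n) ++i;  // consume the closing quote
            out.push_back('?');
        } else if (std::isdigit(static_cast<unsigned char>(c)) &&
                   (out.empty() || !is_ident_char(out.back()))) {
            // Numeric literal (digits and dots) that is not part of an identifier.
            while (i < n && (std::isdigit(static_cast<unsigned char>(q[i])) || q[i] == '.')) ++i;
            out.push_back('?');
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            ++i;
            add_space();
        } else {
            out.push_back(c);
            ++i;
        }
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}
```

With this sketch, queries that differ only in literal values, comments, or spacing normalize to the same text, which is the property a digest needs before hashing.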