ProxySQL Architecture Overview
⚠️ Important Notice: This documentation was generated by AI and may contain inaccuracies. It should be used as a starting point for exploration only. Always verify critical information against the actual source code.
Last AI Update: 2025-09-11
Status: NON-VERIFIED
Maintainer: Rene Cannao
Executive Summary
ProxySQL is a MySQL and PostgreSQL protocol-aware proxy server written in C++11/17. It implements a multi-threaded architecture with connection pooling, query routing, caching, and monitoring.
System Architecture
Core Design Patterns
- Multi-Threaded Worker Model
  - MySQL worker threads (MySQL_Thread) handle client connections
  - PgSQL worker threads (PgSQL_Thread) for PostgreSQL support
  - Admin threads for configuration management
  - Monitor threads for backend health checking
  - Idle connection management threads (when IDLE_THREADS is enabled)
- Event-Driven I/O
  - Uses libev for event loop management
  - Poll-based multiplexing handles multiple connections per thread
  - Epoll support for idle thread management on Linux
- Connection Pooling & Multiplexing
  - Per-hostgroup connection pools
  - Connection multiplexing to reduce backend connections
  - Connection reuse based on session state
- Protocol Implementation
  - Full MySQL protocol implementation (MySQL_Protocol)
  - PostgreSQL wire protocol support (PgSQL_Protocol)
  - Protocol-aware query parsing and routing
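The poll-based multiplexing listed above can be illustrated with a minimal sketch. This is not ProxySQL's event loop: the helper name readable_fds is hypothetical, and plain file descriptors stand in for client sockets. It only shows how one thread can watch several descriptors with a single POSIX poll() call.

```cpp
#include <poll.h>
#include <unistd.h>
#include <cassert>
#include <vector>

// Hypothetical helper: returns the subset of fds that have data to read,
// waiting at most timeout_ms. One call services many descriptors, which is
// the core idea behind handling multiple connections per thread.
std::vector<int> readable_fds(const std::vector<int>& fds, int timeout_ms) {
    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;  // interested in "data available to read"
        pfds.push_back(p);
    }
    std::vector<int> ready;
    int n = poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
    if (n <= 0) return ready;  // timeout or error: nothing readable
    for (const pollfd& p : pfds) {
        if (p.revents & POLLIN) ready.push_back(p.fd);
    }
    return ready;
}
```

A worker loop would call such a helper repeatedly and run the session handler for each descriptor it returns.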
Main Components and Relationships
1. Entry Point & Initialization
- File: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/main.cpp
- Responsibilities:
  - Process initialization and daemonization
  - Loads configuration from proxysql.cfg
  - Creates global variables structure
  - Starts all subsystems
2. Thread Architecture
Thread Pool Design
- Consumer Thread Pattern: Generic work queue for monitoring tasks
    template<typename T>
    class ConsumerThread : public Thread {
        wqueue<WorkItem<T>*>& m_queue;
    };
- Thread-Local Storage: __thread variables for per-thread configuration
- Maintenance Threads: Minimum 8 threads for housekeeping operations
- Event Loop Integration: Epoll-based event handling for scalability
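The consumer thread pattern can be sketched with the standard library alone. The code below is an illustration under assumptions, not the ConsumerThread or wqueue classes named above: WorkQueue and run_consumers are hypothetical names, and the item type must be default-constructible.

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a work queue: a blocking FIFO shared by
// producers and consumer threads.
template <typename T>
class WorkQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push(std::move(item));
        }
        cv_.notify_one();
    }
    // Tells consumers to exit once the queue has drained.
    void close() {
        {
            std::lock_guard<std::mutex> lk(m_);
            closed_ = true;
        }
        cv_.notify_all();
    }
    // Blocks until an item is available; returns false when closed and empty.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop();
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<T> q_;
    bool closed_ = false;
};

// Starts n consumer threads that apply `handle` to every queued item and
// joins them once the queue is closed and empty.
template <typename T>
void run_consumers(WorkQueue<T>& q, unsigned n,
                   std::function<void(const T&)> handle) {
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < n; ++i) {
        workers.emplace_back([&q, &handle] {
            T item;
            while (q.pop(item)) handle(item);
        });
    }
    for (std::thread& t : workers) t.join();
}
```

Because several consumers call `handle` concurrently, the callback must protect any state it shares between threads.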
MySQL Threads (MySQL_Thread)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Thread.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Thread.h
- Key Features:
  - Worker threads handle MySQL client connections
  - Session management and query processing
  - Connection pool interaction
  - Thread-local statistics for lock-free updates
  - Query cache integration
PgSQL Threads (PgSQL_Thread)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Thread.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Thread.h
- Key Features:
  - PostgreSQL protocol handling
  - SASL/SCRAM authentication support
  - Extended query protocol
  - Transaction state management
3. Session Management
MySQL Session (MySQL_Session)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Session.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Session.h
- Responsibilities:
  - Client authentication
  - Query lifecycle management
  - Backend connection assignment
  - State machine for protocol handling
  - Prepared statement management
PgSQL Session (PgSQL_Session)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Session.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Session.h
- Features:
  - PostgreSQL authentication methods
  - Extended query protocol
  - Transaction state tracking
4. Connection Pool Management
MySQL HostGroups Manager
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_HostGroups_Manager.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_HostGroups_Manager.h
- Key Concepts:
  - Hostgroups logically group database servers
  - Connection pool per hostgroup
  - Server status tracking (ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD)
  - Connection health monitoring
Read-Only Server Management (v2.5.1+)
- Evolution: Improved from read_only_action() to read_only_action_v2()
- Batch Processing: Processes multiple servers simultaneously
- State Transitions:
  - read_only=0: Server promoted to writer hostgroup
  - read_only=1: Server moved to reader hostgroup
  - writer_is_also_reader: Controls writer presence in reader hostgroups
- Performance: Optimized lock management and reduced database operations
- Replication topology awareness (master/slave, Galera, Group Replication, Aurora)
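The state transitions above reduce to a small decision function. The sketch below is a simplification for a single writer/reader hostgroup pair: the names Role and target_roles are hypothetical, and batch processing and locking are left out.

```cpp
#include <cassert>
#include <set>

// Hypothetical roles for one writer/reader hostgroup pair.
enum class Role { Writer, Reader };

// Returns the hostgroups a server should belong to, given its observed
// read_only value and the writer_is_also_reader setting.
std::set<Role> target_roles(bool read_only, bool writer_is_also_reader) {
    if (read_only) return {Role::Reader};        // read_only=1: reader only
    std::set<Role> roles{Role::Writer};          // read_only=0: promoted to writer
    if (writer_is_also_reader) roles.insert(Role::Reader);
    return roles;
}
```

For example, a server reporting read_only=0 with writer_is_also_reader enabled ends up in both hostgroups, while the same server with the setting disabled serves writes only.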
Connection States
ONLINE → SHUNNED (temporary failures) → OFFLINE_SOFT → OFFLINE_HARD
5. Query Processing
MySQL Query Processor
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Processor.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Query_Processor.h
- Functions:
  - Rule-based query routing
  - Query rewriting capabilities
  - Query caching decisions
  - Query digest generation
  - GTID handling
Query Digest Generation Pipeline
- Normalization Stages:
  - Comment Removal: Hash (#), ANSI (--), C-style (/* */)
  - Value Replacement: Numbers and strings → ?
  - Spacing Normalization: Collapse multiple spaces
  - Grouping Algorithm: ?,?,?,? → ?,?,?,... when exceeding limit
  - NULL Handling: Optional replacement based on mysql-query_digests_replace_null
- Implementation: c_tokenizer.cpp using SpookyV2 hashing
- Known Limitations: 12+ documented edge cases including buffer overruns, sign handling issues
Query Rules Engine
- Pattern matching (regex support)
- Destination hostgroup routing
- Query modification/rewriting
- Cache TTL configuration
- Query mirroring support
- Fast routing optimization for simple patterns
6. Database Layer & Persistence
SQLite3 Integration
- Admin Database: Runtime configuration storage
- Stats Database: Metrics and statistics
- Monitor Database: Health check results
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/sqlite3db.cpp
Configuration Layers
- Disk: Persistent configuration in SQLite
- Memory: Runtime configuration tables
- Runtime: Active configuration in use
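The relationship between the three layers can be modeled in a few lines. This is a conceptual sketch only: ConfigLayers and its methods are hypothetical, and std::map stands in for the SQLite tables and internal structures. The copy directions mirror ProxySQL's admin commands of the form LOAD ... TO RUNTIME and SAVE ... TO DISK.

```cpp
#include <cassert>
#include <map>
#include <string>

using Table = std::map<std::string, std::string>;  // variable name -> value

// Hypothetical model of the three configuration layers.
struct ConfigLayers {
    Table disk;     // persistent copy
    Table memory;   // editable copy
    Table runtime;  // copy actually in use

    void load_from_disk()  { memory = disk; }     // disk -> memory
    void load_to_runtime() { runtime = memory; }  // memory -> runtime (activates)
    void save_to_disk()    { disk = memory; }     // memory -> disk (persists)
};
```

The point of the split is that an edit to the memory layer neither takes effect nor survives a restart until it is explicitly promoted to runtime or saved to disk.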
7. Admin & Monitoring Interfaces
Admin Interface (ProxySQL_Admin)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Admin.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/proxysql_admin.h
- Features:
  - MySQL-compatible admin interface (port 6032)
  - Configuration management via SQL
  - Runtime statistics access
  - Cluster synchronization
SQLite3 Server
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/SQLite3_Server.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/SQLite3_Server.h
- Purpose: SQL interface for admin operations
Monitoring (MySQL_Monitor, PgSQL_Monitor)
- Backend health checking
- Replication lag monitoring
- Read-only status detection
- GTID tracking
Galera Cluster Monitoring
- Health Check Query: Monitors 8 critical Galera variables
  - wsrep_local_state (must be 4=SYNCED or 2=DONOR with conditions)
  - wsrep_cluster_status (Primary/Non-Primary detection)
  - wsrep_desync, wsrep_reject_queries, pxc_maint_mode
- Writer Selection: Deterministic by weight DESC, hostname DESC, port DESC
- SHUNNED Status: Preserves connections during writer transitions
- SST Handling: Honors wsrep_sst_donor_rejects_queries
- Monitoring Intervals:
  - mysql-monitor_galera_healthcheck_interval: 1000ms default
  - mysql-monitor_galera_healthcheck_max_timeout_count: 3 consecutive failures
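The deterministic writer ordering (weight DESC, hostname DESC, port DESC) maps directly onto a tuple comparison. The sketch below assumes hypothetical names Candidate and pick_writer and ignores the health checks that decide which servers are candidates in the first place.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical view of a healthy node eligible to become writer.
struct Candidate {
    std::string hostname;
    int port;
    int weight;
};

// Sorts by weight DESC, hostname DESC, port DESC and returns the first entry.
// Precondition: candidates is non-empty.
Candidate pick_writer(std::vector<Candidate> candidates) {
    assert(!candidates.empty());
    std::sort(candidates.begin(), candidates.end(),
              [](const Candidate& a, const Candidate& b) {
                  return std::tie(a.weight, a.hostname, a.port) >
                         std::tie(b.weight, b.hostname, b.port);
              });
    return candidates.front();
}
```

Because the ordering depends only on configured attributes, every proxy evaluating the same candidate set arrives at the same writer.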
Bootstrap Mode
- Purpose: Auto-configuration for MySQL Group Replication clusters
- Discovery Process:
  - Connects to bootstrap server with optional SSL
  - Queries performance_schema.replication_group_members
  - Auto-discovers topology and creates configuration
- Account Creation: Generates monitoring accounts with required permissions
- MySQL Router Compatibility: Uses ports 6446 (RW) and 6447 (RO)
- Configuration Precedence: Bootstrap → Config File → Command Line
8. Network & Protocol Handling
Data Streams
- MySQL_Data_Stream: MySQL protocol communication
- PgSQL_Data_Stream: PostgreSQL protocol communication
- Buffer management for network I/O
- SSL/TLS support
Protocol Parsers
- MySQL command parsing
- PostgreSQL message format handling
- Prepared statement protocol
- Result set handling
9. Advanced Features
Query Cache
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Cache.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Query_Cache.cpp
- In-memory result caching
- TTL-based expiration
- Cache key generation from query digest
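TTL-based expiration keyed by digest can be sketched as below. TtlCache is a hypothetical class, not the MySQL_Query_Cache implementation; it assumes the key is a single 64-bit value, whereas a real cache key may need to include more context such as user or schema. Time is passed in explicitly so expiry is easy to follow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical in-memory result cache with per-entry expiry.
class TtlCache {
public:
    void put(std::uint64_t key, std::string result,
             std::uint64_t now_ms, std::uint64_t ttl_ms) {
        entries_[key] = Entry{std::move(result), now_ms + ttl_ms};
    }
    // Returns true and fills `out` on a hit; expired entries are evicted lazily.
    bool get(std::uint64_t key, std::uint64_t now_ms, std::string& out) {
        auto it = entries_.find(key);
        if (it == entries_.end()) return false;
        if (now_ms >= it->second.expires_at_ms) {
            entries_.erase(it);
            return false;
        }
        out = it->second.result;
        return true;
    }
    std::size_t size() const { return entries_.size(); }
private:
    struct Entry {
        std::string result;
        std::uint64_t expires_at_ms;
    };
    std::unordered_map<std::uint64_t, Entry> entries_;
};
```

Lazy eviction keeps the lookup path simple; a production cache would also need a background purge and a memory limit.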
Cluster Support (ProxySQL_Cluster)
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Cluster.cpp
- Architecture: Decentralized peer-to-peer with Core and Satellite nodes
- Synchronization Mechanism:
  - SpookyV2 hash-based checksums for change detection
  - Version-based source of truth selection (version > 1 required)
  - Epoch timestamps for conflict resolution
  - Configurable diff thresholds before sync (default: 3)
- Protection Mechanisms:
  - Circular fetching prevention through version checks
  - Split-brain detection with manual resolution
  - Pre-computed resultsets for performance (v2.4.3+)
- Network Optimization: ~50KBps per node in 200-node cluster
Statistics & Metrics
- Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Statistics.cpp
- Prometheus metrics integration
- Query statistics
- Connection pool metrics
- Memory usage tracking
Threading Model & Concurrency
Thread Types
- Main Thread: Initialization and coordination
- MySQL Worker Threads: Handle MySQL client connections
- PgSQL Worker Threads: Handle PostgreSQL connections
- Admin Thread: Admin interface requests
- Monitor Threads: Backend health monitoring
- Idle Threads: Manage idle connections (optional)
- Cluster Threads: Inter-proxy communication
Synchronization Mechanisms
- Read-write locks for configuration access
- Mutexes for connection pool operations
- Lock-free structures for statistics
- Atomic operations for counters
Configuration Management
Configuration Sources
- Configuration File: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/proxysql.cfg
- Command Line: Override options
- Admin Interface: Runtime modifications
- Cluster Sync: Peer configuration updates
Key Configuration Areas
- admin_variables: Admin interface settings
- mysql_variables: MySQL protocol settings
- pgsql_variables: PostgreSQL settings
- mysql_servers: Backend server definitions
- mysql_users: User authentication
- mysql_query_rules: Query routing rules
Build System & Dependencies
Build Configuration
- Makefile: Main build configuration
- C++11/17 support detection
- Debug vs Release builds
- Platform-specific optimizations
Key Dependencies
- libev: Event loop
- libmariadbclient: MySQL protocol
- libpq: PostgreSQL protocol
- sqlite3: Embedded database
- jemalloc: Memory allocator
- re2/pcre: Regular expressions
- prometheus-cpp: Metrics
- libmicrohttpd: HTTP server
- clickhouse-cpp: ClickHouse support
Testing Framework
Test Types
- TAP Tests: https://github.com/sysown/proxysql/tree/v3.0.agentics/test/tap/
- Unit Tests: Component-level testing
- Integration Tests: Full stack testing
- Cluster Tests: Multi-proxy scenarios
Performance Optimizations
Connection Pool Optimizations
- Multi-Tier Pool Management:
  - Free connections pool per backend
  - Used connections tracking with statistics
  - Connection warming for pre-emptive establishment
  - Latency-aware connection selection
  - GTID-aware routing for consistency
- Pool Algorithms:
    // Connection retrieval with multiple criteria
    MySQL_Connection* get_MyConn_from_pool(
        uint32_t wait_until_ms,  // Timeout control
        bool ff_flag,            // Fast forward flag
        char* gtid_uuid,         // GTID consistency
        uint64_t gtid_trxid,     // Transaction ID
        int max_lag_ms           // Max replication lag
    );
- Query Processing:
  - Fast Digest Path: Optimized for queries > 100KB
  - Multi-threaded Digesting: 4 threads for parallel processing
  - Regex Caching: Compiled patterns cached in regex_engine1/2
  - Digest Statistics: Low-overhead tracking
- Memory Management:
  - Buffer Pools: Reusable buffers for packet handling
  - Statement Cache: Prepared statement metadata caching
  - Result Buffering: Configurable strategies
  - jemalloc Integration: Optimized memory allocation
- Lock-Free Structures:
  - Thread-local statistics counters
  - Lock-free query digest maps
  - Atomic operations for global counters
  - Per-thread configuration caching
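The per-thread counter idea can be sketched as follows. PerThreadStats and run_demo are hypothetical names, and this is one possible way to get lock-free updates, not necessarily how ProxySQL lays out its statistics: each worker writes only its own slot, and a reader sums the slots on demand.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// One counter slot per worker thread. A worker only ever writes its own slot,
// so writers never contend; relaxed atomics keep cross-thread reads defined.
struct Slot {
    std::atomic<std::uint64_t> queries{0};
};

class PerThreadStats {
public:
    explicit PerThreadStats(std::size_t workers) : slots_(workers) {}
    void inc(std::size_t worker) {
        slots_[worker].queries.fetch_add(1, std::memory_order_relaxed);
    }
    // Aggregates on read, which is where the cost of the design is paid.
    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (const Slot& s : slots_) sum += s.queries.load(std::memory_order_relaxed);
        return sum;
    }
private:
    std::vector<Slot> slots_;
};

// Demo driver: each of `workers` threads records `per_worker` queries.
std::uint64_t run_demo(std::size_t workers, int per_worker) {
    PerThreadStats stats(workers);
    std::vector<std::thread> threads;
    for (std::size_t w = 0; w < workers; ++w) {
        threads.emplace_back([&stats, w, per_worker] {
            for (int i = 0; i < per_worker; ++i) stats.inc(w);
        });
    }
    for (std::thread& t : threads) t.join();
    return stats.total();
}
```

The trade-off is that updates on the hot path stay cheap while reads, which happen far less often, pay for the aggregation.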
Monitoring & Observability
Metrics Collection
- Query response times
- Connection pool efficiency
- Backend server health
- Memory usage patterns
- Cache hit rates
Interfaces
- Admin interface statistics tables
- Prometheus metrics endpoint
- REST API for monitoring
- Log files for debugging
Security Features
Authentication Architecture
Multi-Stage Authentication Flow
- Initial Handshake: Server greeting and capability negotiation
- SSL Negotiation: Optional TLS upgrade
- Auth Plugin Negotiation: Select authentication method
- Credential Verification: Validate user credentials
- Session Establishment: Create authenticated session
Supported Authentication Methods
- mysql_native_password: SHA1-based with fast path caching
- caching_sha2_password: SHA256 with full/fast authentication modes (v2.6.0+)
- mysql_clear_password: For LDAP integration
- SPIFFE Authentication: Certificate-based passwordless auth
- Dual-Password Support: Zero-downtime password rotation (v3.0+)
- Auth Plugin Switching: Dynamic protocol adaptation
- PostgreSQL SCRAM: SASL/SCRAM-SHA-256 support
SSL/TLS Implementation
- Non-Standard mTLS: Certificate verification occurs AFTER handshake completion
- SPIFFE Integration: Only validates certificates with spiffe:// SAN URIs
- Dynamic Certificate Reloading: PROXYSQL RELOAD TLS without downtime (v2.3.0+)
- Known Limitation: No SSL alert messages for certificate failures
- Future Enhancement: Standard mTLS verification planned
Authentication Caching
- SHA1 passwords cached in GloMyAuth
- Passwords cached for caching_sha2_password fast authentication
- User attributes cached with JSON validation
- Per-user connection limits and routing rules
Security Controls
- SSL/TLS support for client and backend connections
- Query firewall with SQL injection detection
- User-level query rules and access controls
- Connection rate limiting per user/hostgroup
- Audit logging capabilities
Query Processing Pipeline
Query Digest System
- Digest Computation: Optimized for queries > 100KB
- Digest Structure:
    struct QP_query_digest_stats {
        uint64_t digest;
        time_t first_seen, last_seen;
        unsigned long long sum_time, min_time, max_time;
        unsigned long long rows_affected, rows_sent;
    };
Rule Processing Engine
- Weighted Routing: Rules can specify multiple destinations with weights
- Rule Chaining: next_query_flagIN enables sequential processing
- Query Mirroring: Mirror queries to secondary hostgroups
- Query Rewriting: Pattern-based query transformation
- Cache Control: Per-rule cache TTL settings
Advanced Rule Features
- flagOUT Routing: Multi-destination with load balancing
- Regex Optimization: Compiled patterns cached for performance
- Conditional Logic: Username, schema, client address matching
- Error Injection: Custom error messages for blocked queries
- Sticky Sessions: Maintain connection affinity
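Flag-based rule chaining within a single query can be sketched as below. The field names echo columns of mysql_query_rules, but the Rule struct, the route function, and the evaluation order are simplifying assumptions rather than ProxySQL's engine; weights, mirroring, rewriting, and next_query_flagIN are not modeled.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Simplified, hypothetical rule representation.
struct Rule {
    int flag_in;                // considered only when the current flag matches
    std::string match_regex;    // pattern tested against the query text
    int flag_out;               // -1 means "leave the flag unchanged"
    int destination_hostgroup;  // -1 means "do not set a destination"
    bool apply;                 // true stops evaluation after this rule
};

// Walks the rules in order, starting with flag 0, and returns the chosen
// hostgroup, or default_hg if no matching rule set one.
int route(const std::vector<Rule>& rules, const std::string& query, int default_hg) {
    int flag = 0;
    int hg = default_hg;
    for (const Rule& r : rules) {
        if (r.flag_in != flag) continue;
        const std::regex re(r.match_regex,
                            std::regex::ECMAScript | std::regex::icase);
        if (!std::regex_search(query, re)) continue;
        if (r.destination_hostgroup >= 0) hg = r.destination_hostgroup;
        if (r.flag_out >= 0) flag = r.flag_out;  // hand over to the next chain
        if (r.apply) break;
    }
    return hg;
}
```

This sketch compiles the regex on every evaluation for brevity; caching compiled patterns, as the list above notes, avoids that cost.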
High Availability Features
Backend Management
- Server State Management:
    enum MySerStatus {
        MYSQL_SERVER_STATUS_ONLINE = 0,
        MYSQL_SERVER_STATUS_SHUNNED = 1,
        MYSQL_SERVER_STATUS_OFFLINE_SOFT = 2,
        MYSQL_SERVER_STATUS_OFFLINE_HARD = 3,
        MYSQL_SERVER_STATUS_SHUNNED_REPLICATION_LAG = 4
    };
- Automatic Server Management:
  - Auto-shunning on connection errors
  - Weighted distribution across servers
  - Per-server connection limits
  - Compression support (0-102400 bytes)
  - Per-server SSL configuration
- Health Monitoring:
  - Connect checks for basic connectivity
  - Ping checks for lightweight monitoring
  - Read-only status detection
  - Replication lag measurement
  - Group replication state tracking
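Weighted distribution combined with server status can be sketched as follows. The enum values are the ones listed above; Backend and pick_backend are hypothetical, and treating only ONLINE servers as eligible is an assumption of this sketch. The random number is supplied by the caller so the selection is easy to trace.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

enum MySerStatus {
    MYSQL_SERVER_STATUS_ONLINE = 0,
    MYSQL_SERVER_STATUS_SHUNNED = 1,
    MYSQL_SERVER_STATUS_OFFLINE_SOFT = 2,
    MYSQL_SERVER_STATUS_OFFLINE_HARD = 3,
    MYSQL_SERVER_STATUS_SHUNNED_REPLICATION_LAG = 4
};

// Hypothetical view of one server in a hostgroup.
struct Backend {
    int id;
    MySerStatus status;
    std::uint32_t weight;
};

// Picks an ONLINE backend with probability proportional to its weight.
// `r` is a caller-supplied random number; returns -1 if none is eligible.
int pick_backend(const std::vector<Backend>& servers, std::uint64_t r) {
    std::uint64_t total = 0;
    for (const Backend& s : servers) {
        if (s.status == MYSQL_SERVER_STATUS_ONLINE) total += s.weight;
    }
    if (total == 0) return -1;
    std::uint64_t point = r % total;  // position on the cumulative weight line
    for (const Backend& s : servers) {
        if (s.status != MYSQL_SERVER_STATUS_ONLINE) continue;
        if (point < s.weight) return s.id;
        point -= s.weight;
    }
    return -1;  // not reached when total > 0
}
```

A shunned server contributes nothing to the total, so traffic shifts to the remaining servers in proportion to their weights without any reconfiguration.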
Cluster Synchronization
Checksum-Based Sync
- Global Checksum: Overall configuration state hash
- Module Checksums: Individual module configuration tracking
- Epoch Tracking: Version control for changes
- Diff-Based Sync: Sync triggered after N differences
Sync Decision Algorithm
IF (node_version > 1 AND
(own_version == 1 OR node_epoch > own_epoch))
AND diff_check >= cluster_module_diffs_before_sync
THEN sync_from_peer
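The decision rule above translates directly into a predicate. ModuleState and should_sync_from_peer are hypothetical names introduced for this sketch; the logic is a literal transcription of the pseudocode, with the threshold passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-module state advertised by a node.
struct ModuleState {
    std::uint64_t version;
    std::uint64_t epoch;
};

// Returns true when the local node should fetch this module from the peer.
// diff_check counts consecutive checks in which the checksums differed.
bool should_sync_from_peer(const ModuleState& own, const ModuleState& peer,
                           unsigned diff_check, unsigned diffs_before_sync) {
    const bool peer_is_candidate =
        peer.version > 1 && (own.version == 1 || peer.epoch > own.epoch);
    return peer_is_candidate && diff_check >= diffs_before_sync;
}
```

With the threshold of 3 mentioned earlier, a peer holding a newer epoch triggers a sync only on the third consecutive differing check, which filters out short-lived mismatches.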
Cluster Features
- Automatic configuration propagation
- Conflict resolution based on epochs
- Selective module synchronization
- Peer discovery and health checking
Architecture Characteristics
- Scalability: Horizontal scaling via clustering
- Performance: High throughput optimization
- Configuration: Extensive runtime configuration options
- Protocol Support: Full MySQL and PostgreSQL protocol implementation
- Extensibility: Plugin architecture for authentication and web interface
- Monitoring: Built-in metrics and statistics collection
Design Decisions
- Multi-threaded over Multi-process: Resource sharing
- SQLite for Configuration: ACID compliance, SQL interface
- Connection Pooling per Hostgroup: Isolation between hostgroups
- Protocol-aware Proxy: Packet inspection and manipulation
- Checksum-based Clustering: Configuration synchronization
Architecture Extensions
The architecture supports:
- Additional database protocols
- Alternative caching strategies
- Custom routing algorithms
- Extended monitoring capabilities
- Cloud-native deployments