ProxySQL Architecture Overview

⚠️ Important Notice: This documentation was generated by AI and may contain inaccuracies. It should be used as a starting point for exploration only. Always verify critical information against the actual source code.

Last AI Update: 2025-09-11
Status: NON-VERIFIED
Maintainer: Rene Cannao

Executive Summary

ProxySQL is a MySQL and PostgreSQL protocol-aware proxy server written in C++11/17. It implements a multi-threaded architecture with connection pooling, query routing, caching, and monitoring.

System Architecture

Core Design Patterns

  1. Multi-Threaded Worker Model

    • MySQL worker threads (MySQL_Thread) handle client connections
    • PgSQL worker threads (PgSQL_Thread) handle PostgreSQL client connections
    • Admin threads for configuration management
    • Monitor threads for backend health checking
    • Idle connection management threads (when IDLE_THREADS enabled)
  2. Event-Driven I/O

    • Uses libev for event loop management
    • Poll-based multiplexing handles multiple connections per thread
    • Epoll support for idle thread management on Linux
  3. Connection Pooling & Multiplexing

    • Per-hostgroup connection pools
    • Connection multiplexing to reduce backend connections
    • Connection reuse based on session state
  4. Protocol Implementation

    • Full MySQL protocol implementation (MySQL_Protocol)
    • PostgreSQL wire protocol support (PgSQL_Protocol)
    • Protocol-aware query parsing and routing

Main Components and Relationships

1. Entry Point & Initialization

  • File: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/main.cpp
  • Responsibilities:
    • Process initialization and daemonization
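    • Loads configuration from proxysql.cfg on first start (when no on-disk database exists yet) or when started with --initial; otherwise the on-disk SQLite database takes precedence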
    • Creates global variables structure
    • Starts all subsystems

2. Thread Architecture

Thread Pool Design

  • Consumer Thread Pattern: Generic work queue for monitoring tasks
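    // Simplified: Thread, wqueue and WorkItem are declared elsewhere in the sources
    template<typename T>
    class ConsumerThread : public Thread {
      wqueue<WorkItem<T>*>& m_queue;  // shared work queue this thread drains
    };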
    
  • Thread-Local Storage: __thread variables for per-thread configuration
  • Maintenance Threads: Minimum 8 threads for housekeeping operations
  • Event Loop Integration: Epoll-based event handling for scalability
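The consumer-thread pattern above can be sketched with standard C++ primitives. This is a minimal illustration, not ProxySQL's code. `WorkQueue`, `Task`, `start_consumers` and `stop_consumers` are hypothetical names that stand in for `wqueue<WorkItem<T>*>` and `ConsumerThread`. An empty `std::function` is assumed to act as the shutdown signal.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking FIFO queue, standing in for wqueue<T>.
template <typename T>
class WorkQueue {
public:
    void add(T item) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push_back(std::move(item));
        }
        m_cond.notify_one();
    }
    // Blocks until an item is available, then removes and returns it.
    T remove() {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_cond.wait(lock, [this] { return !m_items.empty(); });
        T item = std::move(m_items.front());
        m_items.pop_front();
        return item;
    }
private:
    std::mutex m_mutex;
    std::condition_variable m_cond;
    std::deque<T> m_items;
};

// A work item is a callable; an empty std::function is the shutdown sentinel.
using Task = std::function<void()>;

// Start n consumer threads that drain the shared queue until they see a sentinel.
std::vector<std::thread> start_consumers(WorkQueue<Task>& queue, int n) {
    assert(n > 0);
    std::vector<std::thread> threads;
    for (int i = 0; i < n; ++i) {
        threads.emplace_back([&queue] {
            for (;;) {
                Task task = queue.remove();
                if (!task) break;  // shutdown sentinel
                task();
            }
        });
    }
    return threads;
}

// Enqueue one sentinel per consumer, then join them all.
void stop_consumers(WorkQueue<Task>& queue, std::vector<std::thread>& threads) {
    for (size_t i = 0; i < threads.size(); ++i) queue.add(Task{});
    for (auto& t : threads) t.join();
}
```

The queue is FIFO and the sentinels are enqueued last. As a result, every task queued before `stop_consumers` runs before the threads exit.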

MySQL Threads (MySQL_Thread)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Thread.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Thread.h
  • Key Features:
    • Worker threads handle MySQL client connections
    • Session management and query processing
    • Connection pool interaction
    • Thread-local statistics for lock-free updates
    • Query cache integration

PgSQL Threads (PgSQL_Thread)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Thread.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Thread.h
  • Key Features:
    • PostgreSQL protocol handling
    • SASL/SCRAM authentication support
    • Extended query protocol
    • Transaction state management

3. Session Management

MySQL Session (MySQL_Session)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Session.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Session.h
  • Responsibilities:
    • Client authentication
    • Query lifecycle management
    • Backend connection assignment
    • State machine for protocol handling
    • Prepared statement management

PgSQL Session (PgSQL_Session)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Session.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/PgSQL_Session.h
  • Features:
    • PostgreSQL authentication methods
    • Extended query protocol
    • Transaction state tracking

4. Connection Pool Management

MySQL HostGroups Manager

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_HostGroups_Manager.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_HostGroups_Manager.h
  • Key Concepts:
    • Hostgroups logically group database servers
    • Connection pool per hostgroup
    • Server status tracking (ONLINE, SHUNNED, OFFLINE_SOFT, OFFLINE_HARD)
    • Connection health monitoring

Read-Only Server Management (v2.5.1+)

  • Evolution: Improved from read_only_action() to read_only_action_v2()
  • Batch Processing: Processes multiple servers simultaneously
  • State Transitions:
    • read_only=0: Server promoted to writer hostgroup
    • read_only=1: Server moved to reader hostgroup
    • writer_is_also_reader: Controls writer presence in reader hostgroups
  • Performance: Optimized lock management and reduced database operations
  • Topology Awareness: master/slave replication, Galera, Group Replication, Aurora
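The following is a minimal sketch of the read_only state transitions. It makes three simplifications:

- It assumes a single writer/reader hostgroup pair.
- It treats `writer_is_also_reader` as a plain boolean.
- It ignores batching, locking and server status, which the real logic in `read_only_action_v2()` also handles.

`ReplicationHostgroups` and `target_hostgroups` are hypothetical names.

```cpp
#include <cassert>
#include <set>

// Hypothetical description of one writer/reader hostgroup pair.
struct ReplicationHostgroups {
    int writer_hostgroup;
    int reader_hostgroup;
    bool writer_is_also_reader;
};

// Hostgroups a server should belong to, given its observed read_only value.
std::set<int> target_hostgroups(const ReplicationHostgroups& rh, bool read_only) {
    if (read_only) {
        return {rh.reader_hostgroup};        // read_only=1: reader hostgroup only
    }
    std::set<int> hgs{rh.writer_hostgroup};  // read_only=0: promoted to writer
    if (rh.writer_is_also_reader) {
        hgs.insert(rh.reader_hostgroup);     // writer also serves reads
    }
    return hgs;
}
```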

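Server States

  • ONLINE: Fully operational and eligible for new connections
  • SHUNNED: Temporarily excluded after too many connection errors or excessive replication lag; returns to ONLINE automatically once the condition clears
  • OFFLINE_SOFT: No new connections; in-use connections are kept until they become inactive
  • OFFLINE_HARD: No new connections; existing connections are dropped immediately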

5. Query Processing

MySQL Query Processor

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Processor.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/MySQL_Query_Processor.h
  • Functions:
    • Rule-based query routing
    • Query rewriting capabilities
    • Query caching decisions
    • Query digest generation
    • GTID handling

Query Digest Generation Pipeline

  • Normalization Stages:
    1. Comment Removal: Hash (#), ANSI (--), C-style (/* */)
    2. Value Replacement: Numbers and strings → ?
    3. Spacing Normalization: Collapse multiple spaces
    4. Grouping: Value lists longer than mysql-query_digests_grouping_limit are truncated, e.g. IN (?,?,?,...)
    5. NULL Handling: Optional replacement based on mysql-query_digests_replace_null
  • Implementation: c_tokenizer.cpp using SpookyV2 hashing
  • Known Limitations: 12+ documented edge cases including buffer overruns, sign handling issues
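A simplified digest function can illustrate the normalization stages. This sketch approximates c_tokenizer.cpp under several assumptions:

- It handles only the literal forms shown.
- It treats `--` as a comment without requiring a trailing space.
- It removes whitespace after commas and opening parentheses.
- It skips NULL replacement.
- It uses 64-bit FNV-1a as a stand-in for SpookyV2.

`normalize`, `group_lists`, `fnv1a64`, `Digest` and `compute_digest` are hypothetical names. `grouping_limit` plays the role of mysql-query_digests_grouping_limit.

```cpp
#include <cassert>
#include <cctype>
#include <cstdint>
#include <string>

// Stages 1-3: strip comments, replace literals with '?', collapse whitespace.
static std::string normalize(const std::string& q) {
    std::string out;
    const size_t n = q.size();
    size_t i = 0;
    // True when the last emitted char belongs to an identifier (so "t1" keeps its digit).
    auto prev_is_ident = [&out] {
        return !out.empty() &&
               (std::isalnum(static_cast<unsigned char>(out.back())) || out.back() == '_');
    };
    while (i < n) {
        const char c = q[i];
        if (c == '#' || (c == '-' && i + 1 < n && q[i + 1] == '-')) {
            while (i < n && q[i] != '\n') ++i;                 // line comment
        } else if (c == '/' && i + 1 < n && q[i + 1] == '*') {
            const size_t end = q.find("*/", i + 2);            // C-style comment
            i = (end == std::string::npos) ? n : end + 2;
        } else if (c == '\'' || c == '"') {
            ++i;                                               // opening quote
            while (i < n && q[i] != c) i += (q[i] == '\\') ? 2 : 1;
            ++i;                                               // closing quote
            out += '?';
        } else if (std::isdigit(static_cast<unsigned char>(c)) && !prev_is_ident()) {
            while (i < n && (std::isdigit(static_cast<unsigned char>(q[i])) || q[i] == '.')) ++i;
            out += '?';
        } else if (std::isspace(static_cast<unsigned char>(c))) {
            // Emit at most one space, and none right after ',' or '('.
            if (!out.empty() && out.back() != ' ' && out.back() != ',' && out.back() != '(')
                out += ' ';
            ++i;
        } else {
            // Drop a space that precedes ',' or ')'.
            if ((c == ',' || c == ')') && !out.empty() && out.back() == ' ') out.pop_back();
            out += c;
            ++i;
        }
    }
    if (!out.empty() && out.back() == ' ') out.pop_back();
    return out;
}

// Stage 4: collapse a run "?,?,...,?" longer than `limit` into `limit` markers plus ",...".
static std::string group_lists(const std::string& s, int limit) {
    assert(limit > 0);
    std::string out;
    size_t i = 0;
    while (i < s.size()) {
        if (s[i] != '?') { out += s[i++]; continue; }
        size_t j = i + 1;
        int count = 1;
        while (j + 1 < s.size() && s[j] == ',' && s[j + 1] == '?') { j += 2; ++count; }
        if (count > limit) {
            for (int k = 0; k < limit; ++k) out += (k == 0) ? "?" : ",?";
            out += ",...";
        } else {
            out.append(s, i, j - i);
        }
        i = j;
    }
    return out;
}

// 64-bit FNV-1a, used here only as a stand-in for SpookyV2.
static uint64_t fnv1a64(const std::string& s) {
    uint64_t h = 14695981039346656037ULL;
    for (unsigned char ch : s) { h ^= ch; h *= 1099511628211ULL; }
    return h;
}

struct Digest {
    std::string text;
    uint64_t hash;
};

Digest compute_digest(const std::string& query, int grouping_limit) {
    std::string text = group_lists(normalize(query), grouping_limit);
    return {text, fnv1a64(text)};
}
```

Consider `SELECT * FROM t1 WHERE id IN (1, 2, 3, 4, 5)` and the same query with different IN values. With a limit of 3, both normalize to `SELECT * FROM t1 WHERE id IN (?,?,?,...)`, so they share one digest.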

Query Rules Engine

  • Pattern matching (regex support)
  • Destination hostgroup routing
  • Query modification/rewriting
  • Cache TTL configuration
  • Query mirroring support
  • Fast routing optimization for simple patterns
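Here is a minimal sketch of rule-based routing. It assumes the rules are already sorted by rule_id and that every rule behaves as if apply=1, so the first match wins. `QueryRule`, `RoutingDecision` and `route` are hypothetical names. ProxySQL's rules carry many more fields, and `std::regex` with ECMAScript syntax stands in for the RE2/PCRE engines.

```cpp
#include <algorithm>
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical, heavily reduced query rule.
struct QueryRule {
    int rule_id;
    bool active;
    std::string match_pattern;   // case-insensitive ECMAScript regex in this sketch
    int destination_hostgroup;
    int cache_ttl_ms;            // -1 = do not cache
};

struct RoutingDecision {
    int hostgroup;
    int cache_ttl_ms;
};

// First active rule whose pattern matches decides; otherwise use the default hostgroup.
RoutingDecision route(const std::vector<QueryRule>& rules, const std::string& query,
                      int default_hostgroup) {
    assert(std::is_sorted(rules.begin(), rules.end(),
                          [](const QueryRule& a, const QueryRule& b) { return a.rule_id < b.rule_id; }));
    for (const QueryRule& r : rules) {
        if (!r.active) continue;
        // Compiled per call for brevity; a real engine caches compiled patterns.
        if (std::regex_search(query, std::regex(r.match_pattern, std::regex::icase))) {
            return {r.destination_hostgroup, r.cache_ttl_ms};
        }
    }
    return {default_hostgroup, -1};
}
```

The read/write split relies on rule order. The more specific `^SELECT .* FOR UPDATE` rule must come before a generic `^SELECT` rule so that locking reads still reach the writer hostgroup.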

6. Database Layer & Persistence

SQLite3 Integration

  • Admin Database: Runtime configuration storage
  • Stats Database: Metrics and statistics
  • Monitor Database: Health check results
  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/sqlite3db.cpp

Configuration Layers

  1. Disk: Persistent configuration in SQLite
  2. Memory: Runtime configuration tables
  3. Runtime: Active configuration in use

7. Admin & Monitoring Interfaces

Admin Interface (ProxySQL_Admin)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Admin.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/proxysql_admin.h
  • Features:
    • MySQL-compatible admin interface (port 6032)
    • Configuration management via SQL
    • Runtime statistics access
    • Cluster synchronization

SQLite3 Server

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/SQLite3_Server.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/include/SQLite3_Server.h
  • Purpose: Exposes SQLite3 databases over the MySQL protocol (enabled with --sqlite3-server)

Monitoring (MySQL_Monitor, PgSQL_Monitor)

  • Backend health checking
  • Replication lag monitoring
  • Read-only status detection
  • GTID tracking

Galera Cluster Monitoring

  • Health Check Query: Monitors 8 critical Galera variables
    • wsrep_local_state (must be 4=SYNCED or 2=DONOR with conditions)
    • wsrep_cluster_status (Primary/Non-Primary detection)
    • wsrep_desync, wsrep_reject_queries, pxc_maint_mode
  • Writer Selection: Deterministic by weight DESC, hostname DESC, port DESC
  • SHUNNED Status: Preserves connections during writer transitions
  • SST Handling: Honors wsrep_sst_donor_rejects_queries
  • Monitoring Intervals:
    • mysql-monitor_galera_healthcheck_interval: 1000ms default
    • mysql-monitor_galera_healthcheck_max_timeout_count: 3 consecutive failures
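The following is a minimal sketch of deterministic writer selection. It assumes a node is eligible only when all of these hold:

- it is SYNCED (4),
- it is in a Primary component,
- it is not desynced,
- it is not rejecting queries.

The DONOR special case and pxc_maint_mode are deliberately left out. `GaleraNode`, `eligible` and `pick_writer` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <string>
#include <tuple>
#include <vector>

struct GaleraNode {
    std::string hostname;
    int port;
    int weight;
    int wsrep_local_state;             // 4 = Synced
    std::string wsrep_cluster_status;  // "Primary" expected
    bool wsrep_desync;
    bool wsrep_reject_queries;
};

// Simplified eligibility check (no DONOR or pxc_maint_mode handling).
bool eligible(const GaleraNode& n) {
    return n.wsrep_local_state == 4 && n.wsrep_cluster_status == "Primary" &&
           !n.wsrep_desync && !n.wsrep_reject_queries;
}

// Highest (weight, hostname, port) among eligible nodes, or nullopt if none.
std::optional<GaleraNode> pick_writer(std::vector<GaleraNode> nodes) {
    nodes.erase(std::remove_if(nodes.begin(), nodes.end(),
                               [](const GaleraNode& n) { return !eligible(n); }),
                nodes.end());
    if (nodes.empty()) return std::nullopt;
    // "a orders before b" when a's key is greater: descending on all three fields.
    auto best = std::min_element(nodes.begin(), nodes.end(),
        [](const GaleraNode& a, const GaleraNode& b) {
            return std::tie(b.weight, b.hostname, b.port) < std::tie(a.weight, a.hostname, a.port);
        });
    return *best;
}
```

The node list is ordered on (weight, hostname, port), all descending. So every ProxySQL instance that sees the same node list picks the same writer, which makes the selection deterministic.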

Bootstrap Mode

  • Purpose: Auto-configuration for MySQL Group Replication clusters
  • Discovery Process:
    1. Connects to bootstrap server with optional SSL
    2. Queries performance_schema.replication_group_members
    3. Auto-discovers topology and creates configuration
  • Account Creation: Generates monitoring accounts with required permissions
  • MySQL Router Compatibility: Uses ports 6446 (RW) and 6447 (RO)
  • Configuration Precedence: Bootstrap → Config File → Command Line

8. Network & Protocol Handling

Data Streams

  • MySQL_Data_Stream: MySQL protocol communication
  • PgSQL_Data_Stream: PostgreSQL protocol communication
  • Buffer management for network I/O
  • SSL/TLS support

Protocol Parsers

  • MySQL command parsing
  • PostgreSQL message format handling
  • Prepared statement protocol
  • Result set handling

9. Advanced Features

Query Cache

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/MySQL_Query_Cache.cpp, https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/PgSQL_Query_Cache.cpp
  • In-memory result caching
  • TTL-based expiration
  • Cache key generation from query digest
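A minimal TTL cache illustrates in-memory result caching with expiration. The sketch makes these assumptions:

- The 64-bit key comes from a hash such as the query digest.
- Time is passed in explicitly so that behavior is reproducible.
- Eviction is lazy (on lookup), with no memory limit, unlike a production cache.

`QueryCache` is a hypothetical class.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Minimal TTL cache keyed by a 64-bit hash; times are in milliseconds.
class QueryCache {
public:
    void set(uint64_t key, std::string result, uint64_t now_ms, uint64_t ttl_ms) {
        m_entries[key] = Entry{std::move(result), now_ms + ttl_ms};
    }
    // Returns nullptr on a miss or after expiry; expired entries are evicted here.
    const std::string* get(uint64_t key, uint64_t now_ms) {
        auto it = m_entries.find(key);
        if (it == m_entries.end()) return nullptr;
        if (now_ms >= it->second.expire_at_ms) {
            m_entries.erase(it);
            return nullptr;
        }
        return &it->second.result;
    }
    size_t size() const { return m_entries.size(); }
private:
    struct Entry {
        std::string result;
        uint64_t expire_at_ms;
    };
    std::unordered_map<uint64_t, Entry> m_entries;
};
```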

Cluster Support (ProxySQL_Cluster)

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Cluster.cpp
  • Architecture: Decentralized peer-to-peer with Core and Satellite nodes
  • Synchronization Mechanism:
    • SpookyV2 hash-based checksums for change detection
    • Version-based source of truth selection (version > 1 required)
    • Epoch timestamps for conflict resolution
    • Configurable diff thresholds before sync (default: 3)
  • Protection Mechanisms:
    • Circular fetching prevention through version checks
    • Split-brain detection with manual resolution
    • Pre-computed resultsets for performance (v2.4.3+)
  • Network Optimization: ~50KBps per node in 200-node cluster

Statistics & Metrics

  • Files: https://github.com/sysown/proxysql/tree/v3.0.agentics/lib/ProxySQL_Statistics.cpp
  • Prometheus metrics integration
  • Query statistics
  • Connection pool metrics
  • Memory usage tracking

Threading Model & Concurrency

Thread Types

  1. Main Thread: Initialization and coordination
  2. MySQL Worker Threads: Handle MySQL client connections
  3. PgSQL Worker Threads: Handle PostgreSQL connections
  4. Admin Thread: Admin interface requests
  5. Monitor Threads: Backend health monitoring
  6. Idle Threads: Manage idle connections (optional)
  7. Cluster Threads: Inter-proxy communication

Synchronization Mechanisms

  • Read-write locks for configuration access
  • Mutexes for connection pool operations
  • Lock-free structures for statistics
  • Atomic operations for counters
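Lock-free statistics can be sketched as one counter slot per worker thread. Each thread increments only its own slot with a relaxed atomic add, and a reader sums all slots when statistics are queried. `CounterSlot` and `ShardedCounter` are hypothetical names. The 64-byte alignment assumes a 64-byte cache line and needs C++17 for over-aligned allocation in `std::vector`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// One slot per worker thread, padded so neighbouring slots do not share a cache line.
struct alignas(64) CounterSlot {
    std::atomic<uint64_t> value{0};
};

class ShardedCounter {
public:
    explicit ShardedCounter(size_t threads) : m_slots(threads) {}

    // Called only by the owning worker: no lock and no contention with other workers.
    void add(size_t thread_idx, uint64_t n = 1) {
        m_slots[thread_idx].value.fetch_add(n, std::memory_order_relaxed);
    }

    // Called by a statistics reader: sums every slot.
    uint64_t total() const {
        uint64_t sum = 0;
        for (const auto& s : m_slots) sum += s.value.load(std::memory_order_relaxed);
        return sum;
    }
private:
    std::vector<CounterSlot> m_slots;
};
```

The total may be momentarily stale while workers are still incrementing. That is usually acceptable for statistics, and it keeps locks off the query path.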

Configuration Management

Configuration Sources

  1. Configuration File: https://github.com/sysown/proxysql/tree/v3.0.agentics/src/proxysql.cfg
  2. Command Line: Override options
  3. Admin Interface: Runtime modifications
  4. Cluster Sync: Peer configuration updates

Key Configuration Areas

  • admin_variables: Admin interface settings
  • mysql_variables: MySQL protocol settings
  • pgsql_variables: PostgreSQL settings
  • mysql_servers: Backend server definitions
  • mysql_users: User authentication
  • mysql_query_rules: Query routing rules

Build System & Dependencies

Build Configuration

  • Makefile: Main build configuration
  • C++11/17 support detection
  • Debug vs Release builds
  • Platform-specific optimizations

Key Dependencies

  • libev: Event loop
  • libmariadbclient: MySQL protocol
  • libpq: PostgreSQL protocol
  • sqlite3: Embedded database
  • jemalloc: Memory allocator
  • re2/pcre: Regular expressions
  • prometheus-cpp: Metrics
  • libmicrohttpd: HTTP server
  • clickhouse-cpp: ClickHouse support

Testing Framework

Test Types

  • TAP Tests: https://github.com/sysown/proxysql/tree/v3.0.agentics/test/tap/
  • Unit Tests: Component-level testing
  • Integration Tests: Full stack testing
  • Cluster Tests: Multi-proxy scenarios

Performance Optimizations

Connection Pool Optimizations

  1. Multi-Tier Pool Management:

    • Free connections pool per backend
    • Used connections tracking with statistics
    • Connection warming for pre-emptive establishment
    • Latency-aware connection selection
    • GTID-aware routing for consistency
  2. Pool Algorithms:

    // Connection retrieval with multiple criteria
    MySQL_Connection* get_MyConn_from_pool(
      uint32_t wait_until_ms,  // Timeout control
      bool ff_flag,            // Fast forward flag
      char* gtid_uuid,         // GTID consistency
      uint64_t gtid_trxid,     // Transaction ID
      int max_lag_ms           // Max replication lag
    );
    
  3. Query Processing:

    • Fast Digest Path: Optimized for queries > 100KB
    • Multi-threaded Digesting: 4 threads for parallel processing
    • Regex Caching: Compiled patterns cached in regex_engine1/2
    • Digest Statistics: Low-overhead tracking
  4. Memory Management:

    • Buffer Pools: Reusable buffers for packet handling
    • Statement Cache: Prepared statement metadata caching
    • Result Buffering: Configurable strategies
    • jemalloc Integration: Optimized memory allocation
  5. Lock-Free Structures:

    • Thread-local statistics counters
    • Lock-free query digest maps
    • Atomic operations for global counters
    • Per-thread configuration caching
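The pool-selection criteria from the Pool Algorithms item above (server weights and maximum replication lag) can be sketched as a weighted choice over eligible servers. This illustrates the idea; it is not `get_MyConn_from_pool()`. The sketch makes these assumptions:

- A negative `max_lag_ms` means there is no lag limit.
- When a limit is set, servers with unknown lag are excluded.
- The random draw is passed in as `roll` so that results are reproducible.

`BackendServer` and `pick_server` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct BackendServer {
    std::string address;
    unsigned weight;
    int replication_lag_ms;   // -1 if unknown
};

// Index of the chosen server, or -1 if none is eligible.
// `roll` is normally a random value; it is reduced modulo the total eligible weight.
int pick_server(const std::vector<BackendServer>& servers, int max_lag_ms, uint64_t roll) {
    auto eligible = [max_lag_ms](const BackendServer& s) {
        return s.weight > 0 &&
               (max_lag_ms < 0 ||
                (s.replication_lag_ms >= 0 && s.replication_lag_ms <= max_lag_ms));
    };
    uint64_t total = 0;
    for (const auto& s : servers)
        if (eligible(s)) total += s.weight;
    if (total == 0) return -1;
    roll %= total;
    // Walk the eligible servers, subtracting weights until the roll falls inside one.
    for (size_t i = 0; i < servers.size(); ++i) {
        if (!eligible(servers[i])) continue;
        if (roll < servers[i].weight) return static_cast<int>(i);
        roll -= servers[i].weight;
    }
    return -1;  // not reached when total > 0
}
```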

Monitoring & Observability

Metrics Collection

  • Query response times
  • Connection pool efficiency
  • Backend server health
  • Memory usage patterns
  • Cache hit rates

Interfaces

  • Admin interface statistics tables
  • Prometheus metrics endpoint
  • REST API for monitoring
  • Log files for debugging

Security Features

Authentication Architecture

Multi-Stage Authentication Flow

  1. Initial Handshake: Server greeting and capability negotiation
  2. SSL Negotiation: Optional TLS upgrade
  3. Auth Plugin Negotiation: Select authentication method
  4. Credential Verification: Validate user credentials
  5. Session Establishment: Create authenticated session

Supported Authentication Methods

  • mysql_native_password: SHA1-based with fast path caching
  • caching_sha2_password: SHA256 with full/fast authentication modes (v2.6.0+)
  • mysql_clear_password: For LDAP integration
  • SPIFFE Authentication: Certificate-based passwordless auth
  • Dual-Password Support: Zero-downtime password rotation (v3.0+)
  • Auth Plugin Switching: Dynamic protocol adaptation
  • PostgreSQL SCRAM: SASL/SCRAM-SHA-256 support
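The mysql_native_password exchange can be sketched end to end:

- The client sends SHA1(password) XOR SHA1(scramble + SHA1(SHA1(password))).
- The server stores only SHA1(SHA1(password)) and verifies the token by reversing the XOR.

The compact SHA-1 below only keeps the example self-contained; use a vetted library in practice. `sha1`, `xor_bytes`, `native_token` and `verify_native` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

using Bytes = std::vector<uint8_t>;

static uint32_t rol(uint32_t x, int n) { return (x << n) | (x >> (32 - n)); }

// Compact SHA-1 (FIPS 180-4), for illustration only.
Bytes sha1(const Bytes& msg) {
    uint32_t h[5] = {0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0};
    Bytes m = msg;
    const uint64_t bit_len = static_cast<uint64_t>(msg.size()) * 8;
    m.push_back(0x80);                                   // padding: 1 bit, then zeros
    while (m.size() % 64 != 56) m.push_back(0);
    for (int i = 7; i >= 0; --i) m.push_back(static_cast<uint8_t>(bit_len >> (i * 8)));
    for (size_t off = 0; off < m.size(); off += 64) {
        uint32_t w[80];
        for (int i = 0; i < 16; ++i)
            w[i] = (uint32_t(m[off + 4 * i]) << 24) | (uint32_t(m[off + 4 * i + 1]) << 16) |
                   (uint32_t(m[off + 4 * i + 2]) << 8) | uint32_t(m[off + 4 * i + 3]);
        for (int i = 16; i < 80; ++i) w[i] = rol(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
        uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDC; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6; }
            const uint32_t t = rol(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rol(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }
    Bytes out;
    for (uint32_t v : h)
        for (int s = 24; s >= 0; s -= 8) out.push_back(static_cast<uint8_t>(v >> s));
    return out;
}

Bytes xor_bytes(const Bytes& a, const Bytes& b) {
    Bytes out(a.size());
    for (size_t i = 0; i < a.size(); ++i) out[i] = a[i] ^ b[i];
    return out;
}

// Client side: token = SHA1(pw) XOR SHA1(scramble + SHA1(SHA1(pw))).
Bytes native_token(const std::string& password, const Bytes& scramble) {
    const Bytes stage1 = sha1(Bytes(password.begin(), password.end()));
    const Bytes stage2 = sha1(stage1);
    Bytes salted = scramble;
    salted.insert(salted.end(), stage2.begin(), stage2.end());
    return xor_bytes(stage1, sha1(salted));
}

// Server side: knows only stage2 = SHA1(SHA1(pw)); recovers stage1 and checks it.
bool verify_native(const Bytes& token, const Bytes& scramble, const Bytes& stored_stage2) {
    Bytes salted = scramble;
    salted.insert(salted.end(), stored_stage2.begin(), stored_stage2.end());
    const Bytes candidate_stage1 = xor_bytes(token, sha1(salted));
    return sha1(candidate_stage1) == stored_stage2;
}
```

Note that `verify_native` recovers SHA1(password) as an intermediate value. A proxy could cache that value to authenticate to backends on the client's behalf, which appears to be what the fast-path caching above refers to.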

SSL/TLS Implementation

  • Non-Standard mTLS: Certificate verification occurs AFTER handshake completion
  • SPIFFE Integration: Only validates certificates with spiffe:// SAN URIs
  • Dynamic Certificate Reloading: PROXYSQL RELOAD TLS without downtime (v2.3.0+)
  • Known Limitation: No SSL alert messages for certificate failures
  • Future Enhancement: Standard mTLS verification planned

Authentication Caching

  • SHA1 passwords cached in GloMyAuth
  • Passwords cached for caching_sha2_password fast authentication
  • User attributes cached with JSON validation
  • Per-user connection limits and routing rules

Security Controls

  • SSL/TLS support for client and backend connections
  • Query firewall with SQL injection detection
  • User-level query rules and access controls
  • Connection rate limiting per user/hostgroup
  • Audit logging capabilities

Query Processing Pipeline

Query Digest System

  • Digest Computation: Optimized for queries > 100KB
  • Digest Structure:
    #include <cstdint>
    #include <ctime>

    // Fields abridged
    struct QP_query_digest_stats {
      uint64_t digest;
      time_t first_seen, last_seen;
      unsigned long long sum_time, min_time, max_time;
      unsigned long long rows_affected, rows_sent;
    };
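Per-digest statistics like these are updated incrementally each time a query with that digest completes. This minimal sketch assumes times are in microseconds. It adds a `count_star` counter, named after the execution-count column in stats_mysql_query_digest. `DigestStats` and its `add` method are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <ctime>

// Stand-in mirroring the fields above, plus an execution counter.
struct DigestStats {
    uint64_t digest = 0;
    time_t first_seen = 0, last_seen = 0;
    unsigned long long count_star = 0;
    unsigned long long sum_time = 0, min_time = 0, max_time = 0;
    unsigned long long rows_affected = 0, rows_sent = 0;

    // Fold one completed query into the aggregate.
    void add(unsigned long long query_time_us, unsigned long long affected,
             unsigned long long sent, time_t now) {
        if (count_star == 0) {
            first_seen = now;
            min_time = max_time = query_time_us;
        } else {
            if (query_time_us < min_time) min_time = query_time_us;
            if (query_time_us > max_time) max_time = query_time_us;
        }
        ++count_star;
        last_seen = now;
        sum_time += query_time_us;
        rows_affected += affected;
        rows_sent += sent;
    }
};
```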
    

Rule Processing Engine

  • Weighted Routing: Rules can specify multiple destinations with weights
  • Rule Chaining: next_query_flagIN enables sequential processing
  • Query Mirroring: Mirror queries to secondary hostgroups
  • Query Rewriting: Pattern-based query transformation
  • Cache Control: Per-rule cache TTL settings

Advanced Rule Features

  • flagOUT Routing: Multi-destination with load balancing
  • Regex Optimization: Compiled patterns cached for performance
  • Conditional Logic: Username, schema, client address matching
  • Error Injection: Custom error messages for blocked queries
  • Sticky Sessions: Maintain connection affinity

High Availability Features

Backend Management

  1. Server State Management:

    enum MySerStatus {
      MYSQL_SERVER_STATUS_ONLINE = 0,
      MYSQL_SERVER_STATUS_SHUNNED = 1,
      MYSQL_SERVER_STATUS_OFFLINE_SOFT = 2,
      MYSQL_SERVER_STATUS_OFFLINE_HARD = 3,
      MYSQL_SERVER_STATUS_SHUNNED_REPLICATION_LAG = 4
    };
    
  2. Automatic Server Management:

    • Auto-shunning on connection errors
    • Weighted distribution across servers
    • Per-server connection limits
    • Compression support (0-102400 bytes)
    • Per-server SSL configuration
  3. Health Monitoring:

    • Connect checks for basic connectivity
    • Ping checks for lightweight monitoring
    • Read-only status detection
    • Replication lag measurement
    • Group replication state tracking
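Auto-shunning on connection errors can be sketched as a per-server error counter over a one-second window plus a recovery timer. The two thresholds are modeled on the variables mysql-shun_on_failures and mysql-shun_recovery_time_sec. The exact semantics here are assumptions: a fixed one-second window, and recovery checked lazily when the status is queried. `ShunTracker` and `ServerState` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

enum class ServerState { ONLINE, SHUNNED };

// Hypothetical per-server tracker; times are in whole seconds.
class ShunTracker {
public:
    ShunTracker(unsigned shun_on_failures, uint64_t recovery_time_s)
        : m_limit(shun_on_failures), m_recovery_s(recovery_time_s) {}

    // Record one connection error observed at `now_s`.
    void on_error(uint64_t now_s) {
        if (now_s != m_window_s) { m_window_s = now_s; m_errors = 0; }  // new 1-second window
        if (++m_errors >= m_limit && m_state == ServerState::ONLINE) {
            m_state = ServerState::SHUNNED;
            m_shunned_at_s = now_s;
        }
    }

    // Queried before using the server; a shunned server recovers after the timeout.
    ServerState status(uint64_t now_s) {
        if (m_state == ServerState::SHUNNED && now_s >= m_shunned_at_s + m_recovery_s) {
            m_state = ServerState::ONLINE;
            m_errors = 0;
        }
        return m_state;
    }
private:
    unsigned m_limit;
    uint64_t m_recovery_s;
    ServerState m_state = ServerState::ONLINE;
    uint64_t m_window_s = 0, m_shunned_at_s = 0;
    unsigned m_errors = 0;
};
```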

Cluster Synchronization

Checksum-Based Sync

  • Global Checksum: Overall configuration state hash
  • Module Checksums: Individual module configuration tracking
  • Epoch Tracking: Version control for changes
  • Diff-Based Sync: Sync triggered after N differences

Sync Decision Algorithm

IF (node_version > 1 AND 
    (own_version == 1 OR node_epoch > own_epoch))
    AND diff_check >= cluster_module_diffs_before_sync
THEN sync_from_peer
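The decision rule above translates directly into a predicate. `ModuleChecksum`, `should_sync_from_peer` and `update_diff_check` are hypothetical names. The diff counter is assumed to count consecutive checks in which the peer's checksum differed from the local one, and to reset once they agree.

```cpp
#include <cassert>
#include <cstdint>

struct ModuleChecksum {
    uint64_t version;  // 1 = not changed since startup
    uint64_t epoch;    // time of the last change
};

// Same predicate as the pseudo-code: the peer must have a real change (version > 1),
// be newer than us or we must never have changed, and the mismatch must persist.
bool should_sync_from_peer(const ModuleChecksum& peer, const ModuleChecksum& own,
                           unsigned diff_check, unsigned diffs_before_sync) {
    const bool peer_is_candidate =
        peer.version > 1 && (own.version == 1 || peer.epoch > own.epoch);
    return peer_is_candidate && diff_check >= diffs_before_sync;
}

// Assumed counter semantics: consecutive mismatches, reset when checksums agree.
unsigned update_diff_check(unsigned diff_check, uint64_t peer_checksum, uint64_t own_checksum) {
    return peer_checksum == own_checksum ? 0 : diff_check + 1;
}
```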

Cluster Features

  • Automatic configuration propagation
  • Conflict resolution based on epochs
  • Selective module synchronization
  • Peer discovery and health checking

Architecture Characteristics

  1. Scalability: Horizontal scaling via clustering
  2. Performance: High throughput optimization
  3. Configuration: Extensive runtime configuration options
  4. Protocol Support: Full MySQL and PostgreSQL protocol implementation
  5. Extensibility: Plugin architecture for authentication and web interface
  6. Monitoring: Built-in metrics and statistics collection

Design Decisions

  1. Multi-threaded over Multi-process: Worker threads share connection pools, caches, and configuration in a single address space
  2. SQLite for Configuration: ACID compliance, SQL interface
  3. Connection Pooling per Hostgroup: Isolation between hostgroups
  4. Protocol-aware Proxy: Packet inspection and manipulation
  5. Checksum-based Clustering: Configuration synchronization

Architecture Extensions

Architecture supports:

  • Additional database protocols
  • Alternative caching strategies
  • Custom routing algorithms
  • Extended monitoring capabilities
  • Cloud-native deployments