Anomaly Detection - Security Threat Detection for ProxySQL

Overview

The Anomaly Detection module provides real-time security threat detection for ProxySQL using a multi-stage analysis pipeline. It identifies SQL injection attacks, unusual query patterns, rate-limit violations, and statistical anomalies.

Features

Multi-Stage Detection Pipeline: 5-layer analysis for comprehensive threat detection
SQL Injection Pattern Detection: Regex-based and keyword-based detection
Query Normalization: Advanced normalization for pattern matching
Rate Limiting: Per-user and per-host query rate tracking
Statistical Anomaly Detection: Z-score based outlier detection
Configurable Blocking: Auto-block or log-only modes
Prometheus Metrics: Native monitoring integration

Quick Start

1. Enable Anomaly Detection

-- Via admin interface
SET genai-anomaly_enabled='true';

2. Configure Detection

-- Set risk threshold (0-100)
SET genai-anomaly_risk_threshold='70';

-- Set rate limit (queries per minute)
SET genai-anomaly_rate_limit='100';

-- Enable auto-blocking
SET genai-anomaly_auto_block='true';

-- Or enable log-only mode (log anomalies without blocking)
SET genai-anomaly_log_only='true';

3. Monitor Detection Results

-- Check statistics
SHOW STATUS LIKE 'ai_detected_anomalies';
SHOW STATUS LIKE 'ai_blocked_queries';

-- View Prometheus metrics
curl http://localhost:4200/metrics | grep proxysql_ai

Configuration

Variables

Variable	Default	Description
`genai-anomaly_enabled`	true	Enable/disable anomaly detection
`genai-anomaly_risk_threshold`	70	Risk score threshold (0-100) for blocking
`genai-anomaly_rate_limit`	100	Max queries per minute per user/host
`genai-anomaly_similarity_threshold`	85	Similarity threshold for embedding matching (0-100)
`genai-anomaly_auto_block`	true	Automatically block suspicious queries
`genai-anomaly_log_only`	false	Log anomalies without blocking
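
How these variables interact is easiest to see as a small decision function. The Python sketch below is illustrative only: the name `decide_action`, the rule that a query is acted on once its risk score reaches the threshold, and the rule that log-only mode takes precedence over auto-block are all assumptions, not the module's actual code.

```python
def decide_action(risk_score, enabled=True, risk_threshold=70,
                  auto_block=True, log_only=False):
    """Return 'allow', 'log', or 'block' for a query's risk score (0-100).

    Keyword defaults mirror the defaults in the table above.
    """
    if not enabled or risk_score < risk_threshold:
        return "allow"
    # Assumption: log-only mode wins over auto-block when both are set.
    if log_only or not auto_block:
        return "log"
    return "block"
```

Under these assumptions, a score of 80 with default settings is blocked, while the same score with `log_only=True` is only logged.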

Status Variables

Variable	Description
`ai_detected_anomalies`	Total number of anomalies detected
`ai_blocked_queries`	Total number of queries blocked

Detection Methods

1. SQL Injection Pattern Detection

Detects common SQL injection patterns using regex and keyword matching:

Patterns Detected:

OR/AND tautologies: OR 1=1, AND 1=1
Quote sequences: '' OR ''=''
UNION SELECT: UNION SELECT
DROP TABLE: DROP TABLE
Comment injection: --, /* */
Hex encoding: 0x414243
CONCAT attacks: CONCAT(0x41, 0x42)
File operations: INTO OUTFILE, LOAD_FILE
Timing attacks: SLEEP(), BENCHMARK()

Example:

-- This query will be blocked:
SELECT * FROM users WHERE username='admin' OR 1=1--' AND password='xxx'
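
To make the matching concrete, here is a minimal Python sketch of regex-based detection for several of the pattern families listed above. The regular expressions and the function name `match_injection_patterns` are assumptions written for illustration; the module's real pattern list may differ, and naive patterns such as the comment matcher will also flag legitimate queries.

```python
import re

# Illustrative patterns only; the module's real pattern list may differ.
INJECTION_PATTERNS = {
    # OR 1=1 / AND 2=2: a number compared with itself.
    "tautology": re.compile(r"\b(?:or|and)\s+(\d+)\s*=\s*\1\b", re.IGNORECASE),
    "union_select": re.compile(r"\bunion\s+(?:all\s+)?select\b", re.IGNORECASE),
    "drop_table": re.compile(r"\bdrop\s+table\b", re.IGNORECASE),
    "comment": re.compile(r"--|/\*.*?\*/", re.DOTALL),
    "hex_literal": re.compile(r"\b0x[0-9a-f]+\b", re.IGNORECASE),
    "file_operation": re.compile(r"\binto\s+outfile\b|\bload_file\s*\(", re.IGNORECASE),
    "timing": re.compile(r"\b(?:sleep|benchmark)\s*\(", re.IGNORECASE),
}


def match_injection_patterns(query):
    """Return the sorted names of every pattern found in the query."""
    return sorted(name for name, rx in INJECTION_PATTERNS.items() if rx.search(query))
```

Applied to the example query above, this sketch returns `['comment', 'tautology']`.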

2. Query Normalization

Normalizes queries for consistent pattern matching:

Case normalization
Comment removal
Literal replacement
Whitespace normalization

Example:

-- Input:
SELECT * FROM users WHERE name='John' -- comment

-- Normalized:
select * from users where name=?
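
The four normalization steps can be sketched in a few lines of Python. This is a simplified illustration, not the module's implementation: the function name `normalize_query` is hypothetical, and because comments are stripped before literals are replaced, a comment marker inside a string literal would be mishandled.

```python
import re


def normalize_query(query):
    """Strip comments, replace literals with '?', collapse whitespace, lowercase."""
    q = re.sub(r"/\*.*?\*/", " ", query, flags=re.DOTALL)  # block comments
    q = re.sub(r"(?:--|#)[^\n]*", " ", q)                   # line comments
    q = re.sub(r"'(?:[^'\\]|\\.|'')*'", "?", q)            # string literals
    q = re.sub(r"\b\d+(?:\.\d+)?\b", "?", q)               # numeric literals
    return re.sub(r"\s+", " ", q).strip().lower()
```

For the input shown above, the sketch produces `select * from users where name=?`.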

3. Rate Limiting

Tracks query rates per user and host:

Time window: 1 hour
Tracks: Query count, last query time
Action: Block when limit exceeded

Configuration:

SET genai-anomaly_rate_limit='100';
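
A per-user, per-host limiter of this kind can be sketched as a sliding window over recent query timestamps. The class below is a hypothetical Python illustration, not the module's code: the name `RateLimiter` is invented, the window length is a constructor parameter, and the 60-second default is an assumption chosen to match a per-minute limit.

```python
import time
from collections import defaultdict, deque


class RateLimiter:
    """Sliding-window query counter keyed by (user, host)."""

    def __init__(self, limit=100, window_seconds=60.0):
        self.limit = limit
        self.window = window_seconds
        self._hits = defaultdict(deque)  # (user, host) -> recent query timestamps

    def allow(self, user, host, now=None):
        """Record one query; return False once the limit has been reached."""
        now = time.monotonic() if now is None else now
        hits = self._hits[(user, host)]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True
```

Rejected queries are not counted toward the window in this sketch, so a client that keeps retrying is admitted again as soon as older queries expire.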

4. Statistical Anomaly Detection

Uses Z-score analysis to detect outliers:

Query execution time
Result set size
Query frequency
Schema access patterns

Example:

-- Unusually large result set:
SELECT * FROM huge_table -- May trigger statistical anomaly
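
As a rough illustration of Z-score outlier detection, the Python sketch below compares a new observation (for example, a result set size) against a history of earlier values. The function names and the cutoff of 3 standard deviations are assumptions; the cutoff the module actually uses is not stated here.

```python
import statistics


def z_score(value, history):
    """Number of standard deviations between value and the mean of history."""
    if len(history) < 2:
        return 0.0  # not enough data to judge
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    if std == 0:
        return 0.0  # all past values identical; avoid division by zero
    return (value - mean) / std


def is_outlier(value, history, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    return abs(z_score(value, history)) > threshold
```

With a history of result sizes clustered around 10 rows, a query returning 50 rows is flagged, while one returning 11 rows is not.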

5. Embedding-based Similarity

Not yet implemented: the framework is in place for detecting similarity to known threat patterns using vector embeddings.
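
Since this stage is not implemented yet, the following Python sketch only shows one way such a check could work. It assumes cosine similarity scaled to 0-100 so that it can be compared with `genai-anomaly_similarity_threshold`; the similarity measure, the scaling, and both function names are hypothetical.

```python
import math


def cosine_similarity_pct(a, b):
    """Cosine similarity of two equal-length vectors, scaled to a percentage."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return 100.0 * dot / (norm_a * norm_b)


def matches_known_threat(query_vec, threat_vecs, threshold=85):
    """True if the query embedding is close enough to any known threat embedding."""
    return any(cosine_similarity_pct(query_vec, t) >= threshold for t in threat_vecs)
```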

Examples

SQL Injection Detection

-- Blocked: OR 1=1 tautology
mysql> SELECT * FROM users WHERE username='admin' OR 1=1--';
ERROR 1313 (HY000): Query blocked: SQL injection pattern detected

-- Blocked: UNION SELECT
mysql> SELECT name FROM products WHERE id=1 UNION SELECT password FROM users;
ERROR 1313 (HY000): Query blocked: SQL injection pattern detected

-- Blocked: Comment injection
mysql> SELECT * FROM users WHERE id=1-- AND password='xxx';
ERROR 1313 (HY000): Query blocked: SQL injection pattern detected

Rate Limiting

-- Set low rate limit for testing
SET genai-anomaly_rate_limit='10';

-- After 10 queries in 1 minute:
mysql> SELECT 1;
ERROR 1313 (HY000): Query blocked: Rate limit exceeded for user 'app_user'

Statistical Anomaly

-- Unusual query pattern detected
mysql> SELECT * FROM users CROSS JOIN orders CROSS JOIN products;
-- May trigger: Statistical anomaly detected (high result count)

Log-Only Mode

For monitoring without blocking:

-- Enable log-only mode
SET genai-anomaly_log_only='true';
SET genai-anomaly_auto_block='false';

-- Queries will be logged but not blocked
-- Monitor via:
SHOW STATUS LIKE 'ai_detected_anomalies';

Monitoring

Prometheus Metrics

# View AI metrics
curl http://localhost:4200/metrics | grep proxysql_ai

# Output includes:
# proxysql_ai_detected_anomalies_total
# proxysql_ai_blocked_queries_total
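
For scripted monitoring, the metrics text can be parsed instead of grepped. The Python sketch below works on text already fetched from the endpoint and assumes plain `name value` lines without labels or trailing timestamps; the function name `parse_ai_metrics` is hypothetical, and the sample values in the usage lines are made up.

```python
def parse_ai_metrics(exposition_text, prefix="proxysql_ai"):
    """Return {metric_name: value} for metrics whose name starts with prefix."""
    metrics = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.rpartition(" ")
        if name.startswith(prefix):
            metrics[name] = float(value)
    return metrics


# Made-up sample input, for illustration only.
sample = """# TYPE proxysql_ai_detected_anomalies_total counter
proxysql_ai_detected_anomalies_total 3
proxysql_ai_blocked_queries_total 1
"""
ai_metrics = parse_ai_metrics(sample)
```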

Admin Interface

-- Check detection statistics
SELECT * FROM stats_mysql_global WHERE variable_name LIKE 'ai_%';

-- View current configuration
SELECT * FROM runtime_global_variables WHERE variable_name LIKE 'genai-anomaly_%';

Troubleshooting

Queries Being Blocked Incorrectly

Check if legitimate queries match patterns:
- Review the SQL injection patterns list
- Consider log-only mode for testing

Adjust risk threshold:

SET genai-anomaly_risk_threshold='80';  -- Higher threshold

Adjust rate limit:

SET genai-anomaly_rate_limit='200';  -- Higher limit

False Positives

If legitimate queries are being flagged:

Enable log-only mode to investigate:

SET genai-anomaly_log_only='true';
SET genai-anomaly_auto_block='false';

Check logs for specific patterns:
```
tail -f proxysql.log | grep "Anomaly:"
```
Adjust configuration based on findings

No Anomalies Detected

If detection seems inactive:

Verify anomaly detection is enabled:

SELECT * FROM runtime_global_variables WHERE variable_name='genai-anomaly_enabled';

Check logs for errors:
```
tail -f proxysql.log | grep "Anomaly:"
```
Verify AI features are initialized:
```
grep "AI_Features" proxysql.log
```

Security Considerations

Defense in Depth: Anomaly detection complements proper security practices; it does not replace them
Pattern Evasion: Attackers adapt their techniques, so detection patterns need regular updates
Performance Impact: Detection adds roughly 1-2 ms of overhead per query
Log Monitoring: Review anomaly logs regularly
Tune for Your Workload: Adjust thresholds based on your query patterns

Performance

Detection Overhead: ~1-2ms per query
Memory Usage: ~100KB for user statistics
CPU Usage: Minimal (regex-based detection)

API Reference

See API.md for complete API documentation.

Architecture

See ARCHITECTURE.md for detailed architecture information.

Testing

See TESTING.md for testing guide and examples.