# Anomaly Detection API Reference
## Complete API Documentation for Anomaly Detection Module
This document provides a comprehensive API reference for the Anomaly Detection feature in ProxySQL.
---
## Table of Contents
1. [Configuration Variables](#configuration-variables)
2. [Status Variables](#status-variables)
3. [AnomalyResult Structure](#anomalyresult-structure)
4. [Anomaly_Detector Class](#anomaly_detector-class)
5. [MySQL_Session Integration](#mysql_session-integration)
---
## Configuration Variables
All configuration variables are prefixed with `ai_anomaly_` and can be set via the ProxySQL admin interface.
### ai_anomaly_enabled
**Type:** Boolean
**Default:** `true`
**Dynamic:** Yes
Enable or disable the anomaly detection module.
```sql
SET ai_anomaly_enabled='true';
SET ai_anomaly_enabled='false';
```
**Example:**
```sql
-- Disable anomaly detection temporarily
SET ai_anomaly_enabled='false';
LOAD MYSQL VARIABLES TO RUNTIME;
```
---
### ai_anomaly_risk_threshold
**Type:** Integer (0-100)
**Default:** `70`
**Dynamic:** Yes
The risk score threshold for blocking queries. Queries with risk scores above this threshold will be blocked if auto-block is enabled, so a lower threshold blocks more queries.
- **0-49**: Very high sensitivity, may block legitimate queries
- **50-69**: High sensitivity
- **70-89**: Medium sensitivity (the default of 70 falls here)
- **90-100**: Low sensitivity, only severe threats blocked
```sql
SET ai_anomaly_risk_threshold='80';
```
**Risk Score Calculation:**
- Each detection method contributes 0-100 points
- Final score = maximum of all method scores
- Score > threshold = query blocked (if auto-block enabled)
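The calculation above can be sketched as follows. This is a minimal illustration rather than ProxySQL's actual code: `aggregate_risk_score` and `exceeds_threshold` are hypothetical helper names, and the per-method scores are assumed to be already expressed on the 0-100 scale.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: the final score is the maximum of the per-method
// scores (each assumed to be in the range 0-100). An empty list yields 0.
int aggregate_risk_score(const std::vector<int>& method_scores) {
    int final_score = 0;
    for (int score : method_scores) {
        assert(score >= 0 && score <= 100);  // scores are expected in 0-100
        final_score = std::max(final_score, score);
    }
    return final_score;
}

// Hypothetical helper: a query is a blocking candidate only when its score
// is strictly greater than the configured threshold.
bool exceeds_threshold(int final_score, int risk_threshold) {
    return final_score > risk_threshold;
}
```

Because the comparison is strict, a score equal to the threshold does not qualify for blocking in this sketch.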
---
### ai_anomaly_rate_limit
**Type:** Integer
**Default:** `100`
**Dynamic:** Yes
Maximum number of queries allowed per minute per user/host combination.
**Time Window:** 1 hour rolling window
```sql
-- Set rate limit to 200 queries per minute
SET ai_anomaly_rate_limit='200';
-- Set rate limit to 10 for testing
SET ai_anomaly_rate_limit='10';
```
**Rate Limiting Logic:**
1. Tracks query count per (user, host) pair
2. Calculates queries per minute
3. Blocks when rate > limit
4. Auto-resets after time window expires
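The four steps above could look roughly like the sketch below. It is an assumption-laden illustration, not the module's implementation: `RateLimiterSketch` is a hypothetical class, it uses a simple fixed window whose length is passed in by the caller, and timestamps are plain seconds so the logic is easy to follow.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical fixed-window counter keyed by (user, host).
class RateLimiterSketch {
public:
    RateLimiterSketch(int limit_per_window, std::int64_t window_seconds)
        : limit_(limit_per_window), window_(window_seconds) {
        assert(limit_per_window > 0 && window_seconds > 0);
    }

    // Records one query and returns true if it should be blocked.
    bool record_and_check(const std::string& user, const std::string& host,
                          std::int64_t now_seconds) {
        Window& w = windows_[std::make_pair(user, host)];
        if (now_seconds - w.start >= window_) {
            // Step 4: the window expired, so the counter resets.
            w.start = now_seconds;
            w.count = 0;
        }
        ++w.count;                // Step 1: track the count per (user, host)
        return w.count > limit_;  // Steps 2-3: block when the count exceeds the limit
    }

private:
    struct Window {
        std::int64_t start = 0;  // start of the current window, in seconds
        int count = 0;           // queries seen in the current window
    };
    int limit_;
    std::int64_t window_;
    std::map<std::pair<std::string, std::string>, Window> windows_;
};
```

A rolling window would need per-query timestamps or bucketed counters instead of a single counter; the fixed window is used here only to keep the example short.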
---
### ai_anomaly_similarity_threshold
**Type:** Integer (0-100)
**Default:** `85`
**Dynamic:** Yes
Similarity threshold for embedding-based threat detection (future implementation).
Higher values = more exact matching required.
```sql
SET ai_anomaly_similarity_threshold='90';
```
---
### ai_anomaly_auto_block
**Type:** Boolean
**Default:** `true`
**Dynamic:** Yes
Automatically block queries that exceed the risk threshold.
```sql
-- Enable auto-blocking
SET ai_anomaly_auto_block='true';
-- Disable auto-blocking (log-only mode)
SET ai_anomaly_auto_block='false';
```
**When `true`:**
- Queries exceeding risk threshold are blocked
- Error 1313 returned to client
- Query not executed
**When `false`:**
- Queries are logged only
- Query executes normally
- Useful for testing/monitoring
---
### ai_anomaly_log_only
**Type:** Boolean
**Default:** `false`
**Dynamic:** Yes
Enable log-only mode (monitoring without blocking).
```sql
-- Enable log-only mode
SET ai_anomaly_log_only='true';
```
**Log-Only Mode:**
- Anomalies are detected and logged
- Queries are NOT blocked
- Statistics are incremented
- Useful for baselining
---
## Status Variables
Status variables provide runtime statistics about anomaly detection.
### ai_detected_anomalies
**Type:** Counter
**Read-Only:** Yes
Total number of anomalies detected since ProxySQL started.
```sql
SHOW STATUS LIKE 'ai_detected_anomalies';
```
**Example Output:**
```
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| ai_detected_anomalies | 152   |
+-----------------------+-------+
```
**Prometheus Metric:** `proxysql_ai_detected_anomalies_total`
---
### ai_blocked_queries
**Type:** Counter
**Read-Only:** Yes
Total number of queries blocked by anomaly detection.
```sql
SHOW STATUS LIKE 'ai_blocked_queries';
```
**Example Output:**
```
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| ai_blocked_queries | 89    |
+--------------------+-------+
```
**Prometheus Metric:** `proxysql_ai_blocked_queries_total`
---
## AnomalyResult Structure
The `AnomalyResult` structure contains the outcome of an anomaly check.
```cpp
#include <string>
#include <vector>

struct AnomalyResult {
    bool is_anomaly;                         ///< True if anomaly detected
    float risk_score;                        ///< 0.0-1.0 risk score
    std::string anomaly_type;                ///< Type of anomaly
    std::string explanation;                 ///< Human-readable explanation
    std::vector<std::string> matched_rules;  ///< Rule names that matched
    bool should_block;                       ///< Whether to block query
};
```
### Fields
#### is_anomaly
**Type:** `bool`
Indicates whether an anomaly was detected.
**Values:**
- `true`: Anomaly detected
- `false`: No anomaly
---
#### risk_score
**Type:** `float`
**Range:** 0.0 - 1.0
The calculated risk score for the query.
**Interpretation:**
- `0.0 - 0.3`: Low risk
- `0.3 - 0.6`: Medium risk
- `0.6 - 1.0`: High risk
**Note:** Compare against `ai_anomaly_risk_threshold / 100.0`
---
#### anomaly_type
**Type:** `std::string`
Type of anomaly detected.
**Possible Values:**
- `"sql_injection"`: SQL injection pattern detected
- `"rate_limit"`: Rate limit exceeded
- `"statistical"`: Statistical anomaly
- `"embedding_similarity"`: Similar to known threat (future)
- `"multiple"`: Multiple detection methods triggered
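One way the `anomaly_type` value could be chosen from the detection methods that fired is sketched below. `select_anomaly_type` is a hypothetical helper, and returning an empty string when nothing fired is an assumption; the source does not say what the field holds in that case.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: maps the list of triggered detection methods to the
// documented anomaly_type values.
std::string select_anomaly_type(const std::vector<std::string>& triggered_methods) {
    if (triggered_methods.empty()) {
        return "";  // assumption: no anomaly means no type
    }
    if (triggered_methods.size() == 1) {
        return triggered_methods.front();  // e.g. "sql_injection"
    }
    return "multiple";  // more than one method triggered
}
```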
---
#### explanation
**Type:** `std::string`
Human-readable explanation of why the query was flagged.
**Example:**
```
"SQL injection pattern detected: OR 1=1 tautology"
"Rate limit exceeded: 150 queries/min for user 'app'"
```
---
#### matched_rules
**Type:** `std::vector<std::string>`
List of rule names that matched.
**Example:**
```cpp
std::vector<std::string> matched_rules = {"pattern:or_tautology", "pattern:quote_sequence"};
```
---
#### should_block
**Type:** `bool`
Whether the query should be blocked based on configuration.
**Determined by** (all conditions must hold):
1. `is_anomaly == true`
2. `risk_score > ai_anomaly_risk_threshold / 100.0`
3. `ai_anomaly_auto_block == true`
4. `ai_anomaly_log_only == false`
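The conditions listed above combine into a single boolean expression. The sketch below illustrates this under stated assumptions: `AnomalyConfigSketch` and `compute_should_block` are hypothetical names that mirror the documented configuration variables and are not part of the ProxySQL API.

```cpp
#include <cassert>

// Hypothetical mirror of the relevant configuration variables.
struct AnomalyConfigSketch {
    int risk_threshold = 70;  // ai_anomaly_risk_threshold (0-100)
    bool auto_block = true;   // ai_anomaly_auto_block
    bool log_only = false;    // ai_anomaly_log_only
};

// Returns true only when all four documented conditions hold.
bool compute_should_block(bool is_anomaly, float risk_score,
                          const AnomalyConfigSketch& cfg) {
    assert(risk_score >= 0.0f && risk_score <= 1.0f);
    const float threshold = cfg.risk_threshold / 100.0f;  // scale 0-100 to 0.0-1.0
    return is_anomaly && risk_score > threshold && cfg.auto_block && !cfg.log_only;
}
```

Note that `log_only` overrides `auto_block` here: with both set to `true`, the query is not blocked.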
---
## Anomaly_Detector Class
Main class for anomaly detection operations.
```cpp
class Anomaly_Detector {
public:
    Anomaly_Detector();
    ~Anomaly_Detector();
    int init();
    void close();
    AnomalyResult analyze(const std::string& query,
                          const std::string& user,
                          const std::string& client_host,
                          const std::string& schema);
    int add_threat_pattern(const std::string& pattern_name,
                           const std::string& query_example,
                           const std::string& pattern_type,
                           int severity);
    std::string list_threat_patterns();
    bool remove_threat_pattern(int pattern_id);
    std::string get_statistics();
    void clear_user_statistics();
};
```
---
### Constructor/Destructor
```cpp
Anomaly_Detector();
~Anomaly_Detector();
```
**Description:** Creates and destroys the anomaly detector instance.
**Default Configuration:**
- `enabled = true`
- `risk_threshold = 70`
- `similarity_threshold = 85`
- `rate_limit = 100`
- `auto_block = true`
- `log_only = false`
---
### init()
```cpp
int init();
```
**Description:** Initializes the anomaly detector.
**Return Value:**
- `0`: Success
- `non-zero`: Error
**Initialization Steps:**
1. Load configuration
2. Initialize user statistics tracking
3. Prepare detection patterns
**Example:**
```cpp
Anomaly_Detector* detector = new Anomaly_Detector();
if (detector->init() != 0) {
    // Handle error
}
```
---
### close()
```cpp
void close();
```
**Description:** Closes the anomaly detector and releases resources.
**Example:**
```cpp
detector->close();
delete detector;
```
---
### analyze()
```cpp
AnomalyResult analyze(const std::string& query,
                      const std::string& user,
                      const std::string& client_host,
                      const std::string& schema);
```
**Description:** Main entry point for anomaly detection.
**Parameters:**
- `query`: The SQL query to analyze
- `user`: Username executing the query
- `client_host`: Client IP address
- `schema`: Database schema name
**Return Value:** `AnomalyResult` structure
**Detection Pipeline:**
1. Query normalization
2. SQL injection pattern detection
3. Rate limiting check
4. Statistical anomaly detection
5. Embedding similarity check (future)
6. Result aggregation
**Example:**
```cpp
Anomaly_Detector* detector = GloAI->get_anomaly_detector();
AnomalyResult result = detector->analyze(
    "SELECT * FROM users WHERE username='admin' OR 1=1--'",
    "app_user",
    "192.168.1.100",
    "production"
);
if (result.should_block) {
    // Block the query
    std::cerr << "Blocked: " << result.explanation << std::endl;
}
```
---
### add_threat_pattern()
```cpp
int add_threat_pattern(const std::string& pattern_name,
                       const std::string& query_example,
                       const std::string& pattern_type,
                       int severity);
```
**Description:** Adds a custom threat pattern to the detection database.
**Parameters:**
- `pattern_name`: Name for the pattern
- `query_example`: Example query representing the threat
- `pattern_type`: Type of pattern (e.g., "sql_injection", "ddos")
- `severity`: Severity level (1-10)
**Return Value:**
- `> 0`: Pattern ID
- `-1`: Error
**Example:**
```cpp
int pattern_id = detector->add_threat_pattern(
    "custom_sqli",
    "SELECT * FROM users WHERE id='1' UNION SELECT 1,2,3--'",
    "sql_injection",
    8
);
```
---
### list_threat_patterns()
```cpp
std::string list_threat_patterns();
```
**Description:** Returns JSON-formatted list of all threat patterns.
**Return Value:** JSON string containing pattern list
**Example:**
```cpp
std::string patterns = detector->list_threat_patterns();
std::cout << patterns << std::endl;
// Output: {"patterns": [{"id": 1, "name": "sql_injection_or", ...}]}
```
---
### remove_threat_pattern()
```cpp
bool remove_threat_pattern(int pattern_id);
```
**Description:** Removes a threat pattern by ID.
**Parameters:**
- `pattern_id`: ID of pattern to remove
**Return Value:**
- `true`: Success
- `false`: Pattern not found
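The return-value conventions of `add_threat_pattern()` and `remove_threat_pattern()` can be illustrated with a small in-memory stand-in. Everything in this sketch is hypothetical (`ThreatPatternStoreSketch`, its storage, and the validation rule that rejects an empty name or a severity outside 1-10); it shows only the documented contract of a positive ID on success, `-1` on error, and `false` when the ID is unknown.

```cpp
#include <map>
#include <string>

// Hypothetical record for one stored pattern.
struct ThreatPatternSketch {
    std::string name;
    std::string query_example;
    std::string type;
    int severity;
};

// Hypothetical in-memory store mimicking the documented return values.
class ThreatPatternStoreSketch {
public:
    // Returns a positive pattern ID, or -1 when the input is rejected.
    int add(const std::string& name, const std::string& query_example,
            const std::string& type, int severity) {
        if (name.empty() || severity < 1 || severity > 10) {
            return -1;  // assumed validation; the real checks may differ
        }
        const int id = next_id_++;
        patterns_[id] = ThreatPatternSketch{name, query_example, type, severity};
        return id;
    }

    // Returns true if the pattern existed and was removed, false otherwise.
    bool remove(int pattern_id) { return patterns_.erase(pattern_id) > 0; }

private:
    int next_id_ = 1;  // IDs start at 1 so that every valid ID is > 0
    std::map<int, ThreatPatternSketch> patterns_;
};
```

Callers should keep the ID returned by the add call, since removal is by ID rather than by name.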
---
### get_statistics()
```cpp
std::string get_statistics();
```
**Description:** Returns JSON-formatted statistics.
**Return Value:** JSON string with statistics
**Example Output:**
```json
{
  "total_queries_analyzed": 15000,
  "anomalies_detected": 152,
  "queries_blocked": 89,
  "detection_methods": {
    "sql_injection": 120,
    "rate_limiting": 25,
    "statistical": 7
  },
  "user_statistics": {
    "app_user": {"query_count": 5000, "blocked": 5},
    "admin": {"query_count": 200, "blocked": 0}
  }
}
```
---
### clear_user_statistics()
```cpp
void clear_user_statistics();
```
**Description:** Clears all accumulated user statistics.
**Use Case:** Resetting statistics after configuration changes.
---
## MySQL_Session Integration
The anomaly detection is integrated into the MySQL query processing flow.
### Integration Point
**File:** `lib/MySQL_Session.cpp`
**Function:** `MySQL_Session::handler___status_WAITING_CLIENT_DATA___STATE_SLEEP___MYSQL_COM_QUERY_detect_ai_anomaly()`
**Location:** Line ~3626
**Flow:**
```
Client Query
    ↓
Query Parsing
    ↓
libinjection SQLi Detection
    ↓
AI Anomaly Detection ← Integration Point
    ↓
Query Execution
    ↓
Result Return
```
### Error Handling
When a query is blocked:
1. Error code 1313 (HY000) is returned
2. Custom error message includes explanation
3. Query is NOT executed
4. Event is logged
**Example Error:**
```
ERROR 1313 (HY000): Query blocked by anomaly detection: SQL injection pattern detected
```
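The shape of the error shown above can be sketched as a small helper that combines a fixed prefix with the `explanation` field of the result. `BlockedQueryErrorSketch` and `make_block_error` are hypothetical names used only for illustration; the code and SQLSTATE values are copied from the example error, not from the ProxySQL source.

```cpp
#include <string>

// Hypothetical container for the pieces of the error sent to the client.
struct BlockedQueryErrorSketch {
    int code;              // MySQL error code
    std::string sqlstate;  // five-character SQLSTATE
    std::string message;   // human-readable message
};

// Builds the error for a blocked query from the anomaly explanation.
BlockedQueryErrorSketch make_block_error(const std::string& explanation) {
    return BlockedQueryErrorSketch{
        1313,     // value taken from the example error
        "HY000",  // value taken from the example error
        "Query blocked by anomaly detection: " + explanation};
}
```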
### Access Control
Admin users can bypass anomaly detection:
- Queries from admin interface bypass detection
- Configurable via admin username whitelist