Add comprehensive documentation for "Closing killed client connection" warnings

misc251219
Rene Cannao 2 months ago
parent 6966b79db9
commit de214cb558

@ -0,0 +1,215 @@
# ProxySQL: Understanding "Closing killed client connection" Warnings
## Introduction
ProxySQL logs the message `"Closing killed client connection <IP>:<PORT>"` as a final cleanup step when a client session that has already been marked for termination is removed from a worker thread. This warning appears **only after** the session has been marked as killed (`killed=true`), and it does **not** indicate the **reason** for the kill.
To diagnose why connections are being killed, you must look for earlier log entries that explain **why** the session was marked as killed in the first place.
## TwoPhase Kill Process
ProxySQL handles client connection termination in two distinct phases:
1. **Kill Decision Phase**: A condition triggers the session to be marked as killed (`killed=true`).
- **With warning**: Most timeoutbased mechanisms log a `"Killing client connection … because …"` warning at this point.
- **Without warning**: Explicit kill commands (admin interface, client KILL statements) set `killed=true` silently.
2. **Cleanup Phase**: The worker thread detects `killed=true` and performs final cleanup, logging:
`"Closing killed client connection <IP>:<PORT>"`
**Key Insight**: If you see **only** the `"Closing killed client connection …"` warning without a preceding `"Killing client connection because …"` message, the kill was triggered by an explicit command, **not** by a timeout.
---
## Kill Triggers That Log a Warning
These mechanisms log a `"Killing client connection … because …"` warning **before** setting `killed=true`.
If any of these are the cause, you will see **both** the reason warning **and** the final cleanup warning.
### 1. **Idle Timeout** (`wait_timeout`)
- **Trigger**: Client connection inactive (no queries) longer than `wait_timeout`.
- **Warning**:
`"Killing client connection <IP>:<PORT> because inactive for <ms>ms"`
- **Configuration**:
- `mysqlwait_timeout` (global)
- Clientspecific `wait_timeout` (if set via `SET wait_timeout=…`)
### 2. **Transaction Idle Timeout** (`max_transaction_idle_time`)
- **Trigger**: A transaction has been started but remains idle (no statements executed) longer than `max_transaction_idle_time`.
- **Warning**:
`"Killing client connection <IP>:<PORT> because of (possible) transaction idle for <ms>ms"`
- **Configuration**: `mysqlmax_transaction_idle_time`
### 3. **Transaction Running Timeout** (`max_transaction_time`)
- **Trigger**: A transaction has been actively running (executing statements) longer than `max_transaction_time`.
- **Warning**:
`"Killing client connection <IP>:<PORT> because of (possible) transaction running for <ms>ms"`
- **Configuration**: `mysqlmax_transaction_time`
### 4. **FastForward Mode with Offline Backends**
- **Trigger**: Session is in `session_fast_forward` mode and all backends are OFFLINE.
- **Warning**:
`"Killing client connection <IP>:<PORT> due to 'session_fast_forward' and offline backends"`
- **Configuration**: `mysqlsession_fast_forward`
---
## Kill Triggers That Do **Not** Log a Warning
These triggers set `killed=true` **without** logging a `"Killing client connection because …"` warning.
You will see **only** the final `"Closing killed client connection …"` message.
### 1. **AdminInitiated Kill**
- **Trigger**: `KILL CONNECTION` command executed via the ProxySQL Admin interface.
- **Path**: `MySQL_Threads_Handler::kill_session()` → sets `killed=true` directly.
- **No warning logged** at kill decision time.
### 2. **KillQueue Request**
- **Trigger**: `kill_connection_or_query()` called by:
- MySQL client `KILL CONNECTION` or `KILL QUERY` statements
- Admininterface kill commands (same as above)
- **Path**:
`kill_connection_or_query()` → places request in perthread queue → `Scan_Sessions_to_Kill()` processes queue → sets `killed=true`.
- **No warning logged** when `killed=true` is set.
---
## Log Analysis and Troubleshooting Guide
### Scenario 1: You see **both** warnings
```
[WARNING] Killing client connection 192.168.123.45:56789 because inactive for 3600000ms
[WARNING] Closing killed client connection 192.168.123.45:56789
```
**Diagnosis**: A timeout mechanism killed the connection.
**Action**: Adjust the relevant timeout variable (`wait_timeout`, `max_transaction_idle_time`, etc.) or investigate why the client stayed idle so long.
### Scenario 2: You see **only** the cleanup warning
```
[WARNING] Closing killed client connection 192.168.123.45:56789
```
**Diagnosis**: The connection was killed by an explicit command (admin or client KILL).
**Action**:
1. Check if any component (application, connection pool, admin script) is issuing `KILL CONNECTION` commands.
2. Review ProxySQL admin logs for `KILL` commands.
3. Check application logs for connectionpool cleanup activities.
### Scenario 3: Many cleanup warnings appear unexpectedly
**Possible Causes**:
1. **Connectionpool cleanup**: Pools that aggressively close idle connections may issue `KILL` commands.
2. **Admin automation**: Scripts or monitoring tools that kill “stuck” connections.
3. **Client applications**: Applications that manually kill their own connections.
**Investigation Steps**:
1. **Enable audit logging**:
```sql
SET mysql-auditlog_filename='/var/log/proxysql_audit.log';
LOAD MYSQL VARIABLES TO RUNTIME;
```
Audit logs capture `KILL` commands with source IP and username.
2. **Check `stats_mysql_commands_counters`**:
```sql
SELECT * FROM stats_mysql_commands_counters WHERE Command='Kill';
```
Shows how many `KILL` commands have been executed.
3. **Monitor active kills**:
```sql
SELECT * FROM stats_mysql_processlist WHERE info LIKE 'KILL%';
```
Shows currently executing `KILL` commands.
4. **Review clientside logs**: Look for connectionpool or applicationlayer kill patterns.
---
## Configuration Parameters Reference
| Variable | Default | Description |
|----------|---------|-------------|
| `mysqlwait_timeout` | 28800000 ms (8 hours) | Maximum idle time before connection is killed. |
| `mysqlmax_transaction_idle_time` | 0 (disabled) | Maximum idle time for an open transaction. |
| `mysqlmax_transaction_time` | 0 (disabled) | Maximum total time a transaction can run. |
| `mysqlsession_fast_forward` | false | Enable fastforward mode (kills connections if backends go OFFLINE). |
| `mysqlthrottle_max_transaction_time` | 0 (disabled) | Alternative to `max_transaction_time`; throttles instead of kills. |
**Note**: Timeouts are expressed in **milliseconds**. A value of `0` disables the timeout.
---
## Best Practices for Monitoring and Handling
### 1. **Differentiate Between Expected and Unexpected Kills**
- **Expected**: Connectionpool cleanup, scheduled maintenance, applicationcontrolled termination.
- **Unexpected**: Unknown source of `KILL` commands, timeouts that are too aggressive for your workload.
### 2. **Set Appropriate Timeouts**
- **Production**: Align `wait_timeout` with your applications connectionpool `idleTimeout`.
- **Transactions**: Enable `max_transaction_idle_time` and `max_transaction_time` to prevent runaway transactions.
- **Testing**: Start with conservative values and adjust based on observed behavior.
### 3. **Use Audit Logging for Forensic Analysis**
```sql
-- Enable audit logging
SET mysql-auditlog_filename='/var/log/proxysql_audit.log';
SET mysql-auditlog_filesize=1000000;
SET mysql-auditlog=true;
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```
Audit logs record every `KILL` command with timestamp, client IP, and username.
### 4. **Monitor Kill Statistics**
```sql
-- Track kill sources
SELECT
SUM(CASE WHEN Command='Kill' THEN Total_Time_us ELSE 0 END) AS kill_time_us,
SUM(CASE WHEN Command='Kill' THEN cnt ELSE 0 END) AS kill_count
FROM stats_mysql_commands_counters;
-- Check for recent kills in the processlist
SELECT * FROM stats_mysql_processlist
WHERE info LIKE 'KILL%'
ORDER BY time_ms DESC
LIMIT 10;
```
### 5. **Responding to Unexpected Kill Storms**
1. **Identify the source** via audit logs and `stats_mysql_commands_counters`.
2. **If client applications**: Coordinate with developers to adjust connectionpool settings.
3. **If admin scripts**: Review automation logic and add appropriate guards.
4. **If unknown**: Temporarily enable verbose logging (`mysqlquery_processor_log`) to capture more context.
---
## Common Questions and Answers
### Q: Why dont I see a “Killing client connection because …” warning?
**A**: The kill was triggered by an explicit `KILL` command (admin or client), not by a timeout. Explicit kills do not log a reason at the moment `killed=true` is set.
### Q: Can I disable the “Closing killed client connection” warnings?
**A**: Yes, by lowering the logverbosity level. However, doing so removes visibility into connection termination. Instead, investigate and address the root cause of the kills.
### Q: Are PostgreSQL connections handled differently?
**A**: The same twophase pattern applies to PostgreSQL, with analogous timeout variables (`pgsqlwait_timeout`, etc.) and kill mechanisms. The warning messages are similar but may appear in `PgSQL_Thread` instead of `MySQL_Thread`.
### Q: How can I distinguish between a client `KILL` and an admin `KILL`?
**A**: Audit logs show the source IP and username. Client `KILL` commands originate from application IPs; admin `KILL` commands come from the admininterface IP (usually `127.0.0.1` or your admin network).
### Q: What should I do if kills are causing application errors?
**A**:
1. Verify timeout values match your applications expected behavior.
2. Ensure connection pools are configured to `KILL` connections gracefully (e.g., with `COM_QUIT` instead of `KILL CONNECTION`).
3. Consider increasing timeouts temporarily while diagnosing.
---
## Summary
The `"Closing killed client connection …"` warning is a **cleanup message**, not a **rootcause indicator**. Diagnosing why connections are killed requires examining earlier logs for `"Killing client connection because …"` warnings or identifying explicit `KILL` commands via audit logs and statistics.
- **Timeout kills** → preceded by a reason warning.
- **Explicit kills** → no preceding reason warning.
Use the troubleshooting steps and monitoring practices outlined above to identify the source of kills and adjust your configuration or application behavior accordingly.
Loading…
Cancel
Save