Previously, when a PgSQL backend died mid-transaction, ProxySQL's
handler_minus1_ClientLibraryError refused to silently retry the failed
statement on a fresh backend (correct, per 68c76eb42) but then also tore
down the client session by returning handler_ret=-1. The application
had to reconnect before it could make forward progress, even though
Postgres's wire protocol has enough granularity to let us signal the
client "your transaction is aborted, send ROLLBACK to recover" while
keeping the socket open.

This change introduces that alternative path, gated behind a new admin
variable pgsql-preserve_client_on_broken_backend_in_tx (default true).
When the backend dies mid-transaction with the admin var on and the
result-set transfer to the client has not yet started, ProxySQL now
synthesizes on the client OUT queue:

  * ErrorResponse severity=ERROR sqlstate=25P02
    "current transaction is aborted, commands ignored until end of
    transaction block"
  * NoticeResponse carrying the backend's original error-message text
    (no leak of the original 57P01 sqlstate; per design, the only
    client-visible sqlstate is the synthesized 25P02)
  * ReadyForQuery('E'); libpq now reports PQTRANS_INERROR.
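
For orientation, the three synthesized messages can be sketched at the byte level. This is a minimal Python model of the PostgreSQL v3 message framing, not ProxySQL's C++ code. The helper names are hypothetical, and the NOTICE severity and 00000 code on the NoticeResponse are assumptions: the text above only states that the original 57P01 sqlstate is not exposed.

```python
import struct

def field_message(msg_type: bytes, fields) -> bytes:
    """Frame an ErrorResponse (b"E") or NoticeResponse (b"N")."""
    # Each field is a one-byte code followed by a NUL-terminated string.
    body = b"".join(code + text.encode("utf-8") + b"\x00" for code, text in fields)
    body += b"\x00"  # a lone zero byte terminates the field list
    # The int32 length covers itself and the body, but not the type byte.
    return msg_type + struct.pack("!I", 4 + len(body)) + body

def ready_for_query(status: bytes) -> bytes:
    """status is b"I" (idle), b"T" (in transaction) or b"E" (failed transaction)."""
    return b"Z" + struct.pack("!I", 5) + status

ABORTED_TEXT = ("current transaction is aborted, "
                "commands ignored until end of transaction block")

def poisoned_reply(backend_error_text: str) -> bytes:
    """Hypothetical helper: the three messages in the order listed above."""
    error = field_message(
        b"E", [(b"S", "ERROR"), (b"C", "25P02"), (b"M", ABORTED_TEXT)])
    # Assumption: severity NOTICE and code 00000; only the message text is
    # taken from the backend's original error.
    notice = field_message(
        b"N", [(b"S", "NOTICE"), (b"C", "00000"), (b"M", backend_error_text)])
    return error + notice + ready_for_query(b"E")
```

The trailing ReadyForQuery status byte 'E' is what moves libpq into PQTRANS_INERROR.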

The session is marked tx_poisoned. handler_special_queries gains a
gate, evaluated before query rules, digests, and routing, that
dispatches to handler_poisoned_simple_query:

  * ROLLBACK / ROLLBACK TO SAVEPOINT / ABORT -> synthesize
    CommandComplete('ROLLBACK') + ReadyForQuery('I'), clear the flag.
  * COMMIT / END -> match Postgres native: emit NoticeResponse with
    "there is no transaction in progress" then the same synthesized
    ROLLBACK, clear the flag.
  * RELEASE SAVEPOINT -> ERROR 25P02 (matches Postgres native).
  * Anything else -> ERROR 25P02, stay poisoned.
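
The dispatch rules above can be modelled as a small classifier. This is an illustrative Python sketch with hypothetical names (Action, classify_poisoned_query). Matching on the first keyword only is a simplification and says nothing about how handler_poisoned_simple_query actually parses the statement.

```python
from enum import Enum

class Action(Enum):
    # Values describe what the proxy sends back to the client.
    ROLLBACK_AND_CLEAR = "CommandComplete('ROLLBACK') + ReadyForQuery('I')"
    NOTICE_ROLLBACK_AND_CLEAR = (
        "NoticeResponse + CommandComplete('ROLLBACK') + ReadyForQuery('I')")
    REJECT_25P02 = "ErrorResponse 25P02"

def classify_poisoned_query(sql: str) -> Action:
    """Map a Simple-Query statement to the action taken while poisoned."""
    words = sql.strip().rstrip(";").upper().split()
    first = words[0] if words else ""
    # ROLLBACK, ROLLBACK TO SAVEPOINT and ABORT all clear the flag.
    if first in ("ROLLBACK", "ABORT"):
        return Action.ROLLBACK_AND_CLEAR
    # COMMIT and END add a notice, then behave like ROLLBACK.
    if first in ("COMMIT", "END"):
        return Action.NOTICE_ROLLBACK_AND_CLEAR
    # RELEASE SAVEPOINT and everything else are rejected with 25P02.
    return Action.REJECT_25P02
```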

Extended Query (Parse/Bind/Describe/Close/Execute/Sync) while poisoned
is rejected wholesale in V1: P/B/D/C/E are swallowed, Sync emits
ErrorResponse + ReadyForQuery('E'). The client can always recover via
a Simple-Query ROLLBACK. Extended-query recovery is deliberate future
work (documented in the accompanying issue).
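
As a rough model of the V1 extended-query behavior, the sketch below walks a sequence of frontend message type bytes and returns symbolic replies. The function name and the string placeholders are hypothetical; only the type bytes (P, B, D, C, E, S) come from the PostgreSQL protocol.

```python
# Frontend message types that are swallowed without a reply while poisoned:
# Parse, Bind, Describe, Close, Execute.
SWALLOWED = {b"P", b"B", b"D", b"C", b"E"}
SYNC = b"S"

def poisoned_extended_replies(msg_types):
    """Return the symbolic replies sent for a run of extended-query messages."""
    replies = []
    for msg_type in msg_types:
        if msg_type in SWALLOWED:
            continue  # no response until the client sends Sync
        if msg_type == SYNC:
            replies += ["ErrorResponse(25P02)", "ReadyForQuery('E')"]
            continue
        # Other message types are outside what this sketch models.
        raise ValueError(f"not an extended-query message: {msg_type!r}")
    return replies
```

A typical Parse/Bind/Execute/Sync batch therefore yields exactly one error and one ReadyForQuery('E'), which keeps the client's request/response accounting aligned.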

The fallback paths stay intact. With the admin var off, or when the
result-set transfer has already started to the client, the session is
terminated as before — preserving pre-feature behavior for operators
who rely on session termination as an app-layer signal.
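
The choice between the new path and the fallback reduces to two conditions once the backend has died mid-transaction. A minimal sketch, with a hypothetical function name and return values chosen for illustration:

```python
def mid_tx_backend_death_action(preserve_enabled: bool,
                                resultset_transfer_started: bool) -> str:
    """Decide what happens when a backend dies mid-transaction.

    preserve_enabled models pgsql-preserve_client_on_broken_backend_in_tx.
    """
    if preserve_enabled and not resultset_transfer_started:
        return "poison"     # keep the client socket, mark the session tx_poisoned
    return "terminate"      # pre-feature behavior: tear down the client session
```

Once part of a result set has reached the client, the response can no longer be replaced cleanly with a synthesized error, which is presumably why that case falls back to termination.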

Three counters expose the feature's activity on stats_pgsql_global:

  pgsql_tx_poisoned_total
  pgsql_tx_poisoned_recovered_total
  pgsql_tx_poisoned_rejected_statements_total

Each PgSQL thread maintains its own lock-free copy;
PgSQL_Threads_Handler aggregates across threads when the admin table
is rendered. Prometheus wiring for these counters is deliberately not
added here because the existing PgSQL Prometheus pipeline
(p_gauge_array) is incomplete even for active_transactions; that is
follow-up work.
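
The per-thread counter scheme can be illustrated as follows. This Python sketch only models the aggregation step; the class and function names are hypothetical, and it does not demonstrate the lock-free property itself, which in the real code comes from each thread writing only to its own copy.

```python
from dataclasses import dataclass, fields

@dataclass
class PoisonCounters:
    """One instance per worker thread; only that thread increments it."""
    pgsql_tx_poisoned_total: int = 0
    pgsql_tx_poisoned_recovered_total: int = 0
    pgsql_tx_poisoned_rejected_statements_total: int = 0

def aggregate(per_thread):
    """Sum each counter across threads, as done when the stats table is rendered."""
    return {
        f.name: sum(getattr(counters, f.name) for counters in per_thread)
        for f in fields(PoisonCounters)
    }

# Usage: two threads, each with its own copy.
threads = [PoisonCounters(), PoisonCounters()]
threads[0].pgsql_tx_poisoned_total += 2
threads[0].pgsql_tx_poisoned_recovered_total += 1
threads[1].pgsql_tx_poisoned_total += 1
threads[1].pgsql_tx_poisoned_rejected_statements_total += 4
totals = aggregate(threads)
```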

tx_poisoned resets on PgSQL_Session::reset(). No retry loop interacts
with the flag; session handoff to a fresh backend only happens after
the client itself sends ROLLBACK/COMMIT/ABORT to clear it.

Related: issue #5658.
'M',"current transaction is aborted, commands ignored until end of transaction block (extended-query path; issue ROLLBACK via simple query to recover)",