feat(passthrough-auth): counters + stats_mysql_passthrough_auth_metrics

Adds operational-visibility metrics for the pass-through auth feature.
Previously the only diagnostics available to an SRE were the audit
log (optional, often off) and proxy_warning (added in the preceding
commit). Both are good but neither lets an operator answer
"is anything happening?" / "how often does pass-through succeed?"
at a glance.

Nine atomic counters added to MySQL_Passthrough_Auth_Cache, each
incremented at a single well-defined call site:

  probes_attempted          -- a real probe was about to run
                               (past all eligibility gates, past
                               inflight cap, past rate limits)
  probes_ok                 -- probe succeeded, cache populated
  probes_failed_credentials -- probe got 1045/1698/1130 from backend
  probes_failed_transport   -- probe got 2xxx / unknown errno
  lockouts_user             -- per-user lockout fired
  lockouts_ip               -- per-IP lockout fired
  inflight_cap_rejects      -- max_inflight_probes saturated
  cache_hits                -- PPHR_verify_password used the cache
  cache_invalidations       -- backend ER 1045 evicted a cached entry

Plus two current-state gauges (read on demand from existing methods):
  inflight_probes  -- current GloMyPTAuthCache->inflight()
  cache_entries    -- current GloMyPTAuthCache->size()
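
As an illustration of where each probe-path counter fires, the gate
ordering visible in the session handler can be sketched as below. This
is a simplified stand-in, not the ProxySQL code: Counters, Attempt,
bump and run_attempt are hypothetical names, and the real handler may
take return paths that are not modelled here.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for the probe-path counters; field names follow
// the metric names listed in the commit message.
struct Counters {
    std::atomic<uint64_t> probes_attempted{0};
    std::atomic<uint64_t> probes_ok{0};
    std::atomic<uint64_t> probes_failed_credentials{0};
    std::atomic<uint64_t> probes_failed_transport{0};
    std::atomic<uint64_t> lockouts_user{0};
    std::atomic<uint64_t> lockouts_ip{0};
    std::atomic<uint64_t> inflight_cap_rejects{0};
};

// Hypothetical description of one authentication attempt.
struct Attempt {
    bool user_locked_out;
    bool ip_locked_out;
    bool inflight_slot_available;
    unsigned probe_errno;  // 0 means the backend accepted the probe
};

inline void bump(std::atomic<uint64_t>& c) {
    c.fetch_add(1, std::memory_order_relaxed);
}

// Returns true when the probe succeeded. Each early return bumps exactly
// one counter, so every attempt is accounted for exactly once per gate.
bool run_attempt(Counters& c, const Attempt& a) {
    if (a.user_locked_out) { bump(c.lockouts_user); return false; }
    if (a.ip_locked_out) { bump(c.lockouts_ip); return false; }
    if (!a.inflight_slot_available) { bump(c.inflight_cap_rejects); return false; }

    // Past the gates: a real probe runs from here on.
    bump(c.probes_attempted);
    if (a.probe_errno == 0) { bump(c.probes_ok); return true; }

    // 1045 / 1698 / 1130 count as credential failures; anything else
    // (2xxx, unknown errno) counts as a transport failure.
    const bool credential_failure =
        a.probe_errno == 1045 || a.probe_errno == 1698 || a.probe_errno == 1130;
    bump(credential_failure ? c.probes_failed_credentials
                            : c.probes_failed_transport);
    return false;
}
```

In this sketch a rejected attempt never touches probes_attempted, which
is why that counter can be read as "real probes sent to a backend".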

Exposed via a new admin virtual table:

  stats_mysql_passthrough_auth_metrics(
      metric_name VARCHAR NOT NULL PRIMARY KEY,
      metric_value BIGINT NOT NULL
  )

stats___mysql_passthrough_auth_metrics is registered alongside the
existing stats___mysql_passthrough_auth_cache; both refresh from
GloMyPTAuthCache on SELECT.
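
For the "how often does pass-through succeed?" question, a consumer of
the table could derive a ratio from two of its rows. The sketch below
assumes the rows have already been fetched into (name, value) pairs;
MetricRow, find_metric and probe_success_ratio are hypothetical names,
not part of ProxySQL.

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// One row of the table: (metric_name, metric_value).
using MetricRow = std::pair<std::string, uint64_t>;

// Looks up a metric by name; returns false when the row is absent.
bool find_metric(const std::vector<MetricRow>& rows,
                 const std::string& name, uint64_t& out) {
    for (const auto& row : rows) {
        if (row.first == name) { out = row.second; return true; }
    }
    return false;
}

// Writes probes_ok / probes_attempted into `ratio`.
// Returns false when either row is missing or no probe has run yet,
// so the caller never sees a division by zero.
bool probe_success_ratio(const std::vector<MetricRow>& rows, double& ratio) {
    uint64_t attempted = 0;
    uint64_t ok = 0;
    if (!find_metric(rows, "probes_attempted", attempted)) return false;
    if (!find_metric(rows, "probes_ok", ok)) return false;
    if (attempted == 0) return false;
    ratio = static_cast<double>(ok) / static_cast<double>(attempted);
    return true;
}
```

Probes still in flight have been counted as attempted but not yet as
ok or failed, so the ratio can read slightly low under load.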

All counter ops use std::memory_order_relaxed -- stats are
advisory and don't need to synchronize with the increments.
Counters are monotonic since process start; reset only by process
restart. Phase 2 may add a Prometheus exposer alongside this; for
now the admin table is the canonical surface for monitoring
integrations.
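
Because the counters only reset on process restart, a monitoring
integration that polls the table would typically work with deltas
between two polls. A minimal sketch, assuming the poller keeps the
previous sample itself; counter_delta and per_second_rate are
hypothetical helper names:

```cpp
#include <cstdint>

// Increase of a monotonic counter between two polls.
// A current value below the previous one is taken to mean the process
// restarted and the counter began again from zero, so the current value
// is itself the increase since the restart.
uint64_t counter_delta(uint64_t previous, uint64_t current) {
    return current >= previous ? current - previous : current;
}

// Average events per second over the polling interval.
// Returns 0 for a non-positive interval rather than dividing by zero.
double per_second_rate(uint64_t previous, uint64_t current, double interval_s) {
    if (interval_s <= 0.0) return 0.0;
    return static_cast<double>(counter_delta(previous, current)) / interval_s;
}
```

This applies to the nine counters only; the two gauges (inflight_probes,
cache_entries) are point-in-time values and should be read as-is.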

Discovered by the production-readiness subagent during the second
deep review of PR #5810. Listed there as a blocker for production
GA because operational visibility into pass-through misbehavior
was effectively zero.
docs/passthrough-auth-spec
Rene Cannao 1 month ago
parent c73d1be692
commit 8dfe921ad4

@@ -43,6 +43,28 @@ class MySQL_Passthrough_Auth_Cache {
mutable pthread_rwlock_t lock;
std::unordered_map<std::string, entry_t> entries;
std::atomic<int> inflight_probes;
/**
* @brief Atomic counters for operational observability (spec §7.4 follow-up).
*
* Exposed via @c stats_mysql_passthrough_auth_metrics. Each
* counter increments at exactly one well-defined call site (see the
* corresponding @c bump_* methods below): most in
* @c handler_again___status_AUTHENTICATING_BACKEND_FOR_CLIENT,
* @c stat_cache_hits in @c PPHR_verify_password, and
* @c stat_cache_invalidations in
* @c handler_again___status_CONNECTING_SERVER. Monotonic
* since process start; reset only by process restart.
*
* Naming mirrors the existing @c stats_mysql_global pattern of
* "what happened" snake-case-counters; no special suffixes.
*/
std::atomic<uint64_t> stat_probes_attempted;
std::atomic<uint64_t> stat_probes_ok;
std::atomic<uint64_t> stat_probes_failed_credentials;
std::atomic<uint64_t> stat_probes_failed_transport;
std::atomic<uint64_t> stat_lockouts_user;
std::atomic<uint64_t> stat_lockouts_ip;
std::atomic<uint64_t> stat_inflight_cap_rejects;
std::atomic<uint64_t> stat_cache_hits;
std::atomic<uint64_t> stat_cache_invalidations;
// Sliding-window failure counters (spec §7.2). Per-username and
// per-source-IP. Mutated only behind failure_lock — a separate
// mutex from `lock` since these are write-mostly and accessed on
@@ -118,6 +140,37 @@ class MySQL_Passthrough_Auth_Cache {
bool would_lockout_ip(const std::string& ip, int max_failures, uint32_t window_s) const;
void record_failure(const std::string& username, const std::string& ip);
/**
* @brief Observability counters (B7 follow-up).
*
* Each @c bump_* method increments the corresponding atomic at its
* single call site: in @c MySQL_Session.cpp for all but
* @c bump_cache_hits, which is called from
* @c MySQL_Protocol::PPHR_verify_password. The
* @c metrics_snapshot helper returns the current values for the
* @c stats_mysql_passthrough_auth_metrics virtual table.
*/
void bump_probes_attempted();
void bump_probes_ok();
void bump_probes_failed_credentials();
void bump_probes_failed_transport();
void bump_lockouts_user();
void bump_lockouts_ip();
void bump_inflight_cap_rejects();
void bump_cache_hits();
void bump_cache_invalidations();
/**
* @brief Snapshot of metric counters + current-state gauges.
*
* Returns a vector of (name, value) pairs ordered for stable JSON /
* stats-table output. Values are read with relaxed memory ordering
* since stats are advisory, not synchronizing.
*/
struct metric_kv {
std::string name;
uint64_t value;
};
std::vector<metric_kv> metrics_snapshot() const;
/**
* @brief Check whether @p username matches the configured allowlist
* regex (spec §7.1, mysql-passthrough_auth_username_pattern).

@@ -197,6 +197,8 @@
#define STATS_SQLITE_TABLE_MYSQL_PASSTHROUGH_AUTH_CACHE "CREATE TABLE stats_mysql_passthrough_auth_cache (username VARCHAR NOT NULL PRIMARY KEY , learned_at BIGINT NOT NULL , age_s INT NOT NULL , hostgroup_probed INT NOT NULL)"
#define STATS_SQLITE_TABLE_MYSQL_PASSTHROUGH_AUTH_METRICS "CREATE TABLE stats_mysql_passthrough_auth_metrics (metric_name VARCHAR NOT NULL PRIMARY KEY , metric_value BIGINT NOT NULL)"
#define STATS_SQLITE_TABLE_TLS_CERTIFICATES "CREATE TABLE stats_tls_certificates (cert_type VARCHAR NOT NULL PRIMARY KEY , file_path VARCHAR NOT NULL , subject_cn VARCHAR , issuer_cn VARCHAR , serial_number VARCHAR , not_before VARCHAR , not_after VARCHAR , days_until_expiry INT , sha256_fingerprint VARCHAR , loaded_at INT NOT NULL DEFAULT 0)"
#ifdef DEBUG

@@ -804,6 +804,7 @@ class ProxySQL_Admin {
void stats___mysql_gtid_executed();
void stats___mysql_client_host_cache(bool reset);
void stats___mysql_passthrough_auth_cache();
void stats___mysql_passthrough_auth_metrics();
void stats___tls_certificates();
void stats___proxysql_global();

@@ -880,6 +880,7 @@ bool ProxySQL_Admin::init(const bootstrap_info_t& bootstrap_info) {
insert_into_tables_defs(tables_defs_stats,"stats_mysql_client_host_cache", STATS_SQLITE_TABLE_MYSQL_CLIENT_HOST_CACHE);
insert_into_tables_defs(tables_defs_stats,"stats_mysql_client_host_cache_reset", STATS_SQLITE_TABLE_MYSQL_CLIENT_HOST_CACHE_RESET);
insert_into_tables_defs(tables_defs_stats,"stats_mysql_passthrough_auth_cache", STATS_SQLITE_TABLE_MYSQL_PASSTHROUGH_AUTH_CACHE);
insert_into_tables_defs(tables_defs_stats,"stats_mysql_passthrough_auth_metrics", STATS_SQLITE_TABLE_MYSQL_PASSTHROUGH_AUTH_METRICS);
insert_into_tables_defs(tables_defs_stats,"stats_mysql_query_events", ADMIN_SQLITE_TABLE_STATS_MYSQL_QUERY_EVENTS);
insert_into_tables_defs(tables_defs_stats,"stats_tls_certificates", STATS_SQLITE_TABLE_TLS_CERTIFICATES);
insert_into_tables_defs(tables_defs_stats,"stats_proxysql_global", STATS_SQLITE_TABLE_GLOBAL);

@@ -7,7 +7,17 @@
#include <cstdio>
MySQL_Passthrough_Auth_Cache::MySQL_Passthrough_Auth_Cache()
: inflight_probes(0),
stat_probes_attempted(0),
stat_probes_ok(0),
stat_probes_failed_credentials(0),
stat_probes_failed_transport(0),
stat_lockouts_user(0),
stat_lockouts_ip(0),
stat_inflight_cap_rejects(0),
stat_cache_hits(0),
stat_cache_invalidations(0),
compiled_pattern(NULL) {
pthread_rwlock_init(&lock, NULL);
pthread_mutex_init(&failure_lock, NULL);
pthread_rwlock_init(&pattern_lock, NULL);
@@ -330,6 +340,62 @@ bool MySQL_Passthrough_Auth_Cache::username_allowed(
return ok;
}
void MySQL_Passthrough_Auth_Cache::bump_probes_attempted() {
stat_probes_attempted.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_probes_ok() {
stat_probes_ok.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_probes_failed_credentials() {
stat_probes_failed_credentials.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_probes_failed_transport() {
stat_probes_failed_transport.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_lockouts_user() {
stat_lockouts_user.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_lockouts_ip() {
stat_lockouts_ip.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_inflight_cap_rejects() {
stat_inflight_cap_rejects.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_cache_hits() {
stat_cache_hits.fetch_add(1, std::memory_order_relaxed);
}
void MySQL_Passthrough_Auth_Cache::bump_cache_invalidations() {
stat_cache_invalidations.fetch_add(1, std::memory_order_relaxed);
}
std::vector<MySQL_Passthrough_Auth_Cache::metric_kv>
MySQL_Passthrough_Auth_Cache::metrics_snapshot() const {
/**
* @brief Order matters here: this is the order @c
* stats_mysql_passthrough_auth_metrics returns to admin clients.
* Counters first (monotonic-since-startup), gauges last
* (current-state). All values are read with relaxed memory
* ordering -- stats are advisory and don't need to synchronize
* with the increments.
*/
std::vector<metric_kv> out;
out.reserve(11);
out.push_back({ "probes_attempted", stat_probes_attempted.load(std::memory_order_relaxed) });
out.push_back({ "probes_ok", stat_probes_ok.load(std::memory_order_relaxed) });
out.push_back({ "probes_failed_credentials", stat_probes_failed_credentials.load(std::memory_order_relaxed) });
out.push_back({ "probes_failed_transport", stat_probes_failed_transport.load(std::memory_order_relaxed) });
out.push_back({ "lockouts_user", stat_lockouts_user.load(std::memory_order_relaxed) });
out.push_back({ "lockouts_ip", stat_lockouts_ip.load(std::memory_order_relaxed) });
out.push_back({ "inflight_cap_rejects", stat_inflight_cap_rejects.load(std::memory_order_relaxed) });
out.push_back({ "cache_hits", stat_cache_hits.load(std::memory_order_relaxed) });
out.push_back({ "cache_invalidations", stat_cache_invalidations.load(std::memory_order_relaxed) });
/* Current-state gauges. Use the public accessors so the locking
* lives in one place. */
out.push_back({ "inflight_probes", static_cast<uint64_t>(inflight()) });
out.push_back({ "cache_entries", static_cast<uint64_t>(size()) });
return out;
}
void MySQL_Passthrough_Auth_Cache::print_version() {
fprintf(stderr, "MySQL_Passthrough_Auth_Cache rev. " MYSQL_PASSTHROUGH_AUTH_CACHE_VERSION "\n");
}

@@ -2649,6 +2649,7 @@ bool MySQL_Protocol::PPHR_verify_password(MyProt_tmp_auth_vars& vars1, account_d
: 0;
if (GloMyPTAuthCache->lookup(
std::string((const char*)vars1.user), cleartext, ttl_s)) {
GloMyPTAuthCache->bump_cache_hits();
if (vars1.password) { free(vars1.password); }
vars1.password = strdup(cleartext.c_str());
/**

@@ -1805,11 +1805,13 @@ int MySQL_Session::handler_again___status_AUTHENTICATING_BACKEND_FOR_CLIENT() {
if (GloMyPTAuthCache->would_lockout_user(user_key,
mysql_thread___passthrough_auth_max_failures_per_user,
mysql_thread___passthrough_auth_failure_window_s)) {
GloMyPTAuthCache->bump_lockouts_user();
return fail_session("per-user lockout");
}
if (GloMyPTAuthCache->would_lockout_ip(ip_key,
mysql_thread___passthrough_auth_max_failures_per_ip,
mysql_thread___passthrough_auth_failure_window_s)) {
GloMyPTAuthCache->bump_lockouts_ip();
return fail_session("per-ip lockout");
}
@@ -1820,8 +1822,12 @@ int MySQL_Session::handler_again___status_AUTHENTICATING_BACKEND_FOR_CLIENT() {
// every return path below (success and failure).
if (!GloMyPTAuthCache->try_acquire_inflight(
mysql_thread___passthrough_auth_max_inflight_probes)) {
GloMyPTAuthCache->bump_inflight_cap_rejects();
return fail_session("inflight probe cap reached");
}
/* Past the gates; this attempt will run a real probe. Bump
* probes_attempted exactly once per actual probe. */
GloMyPTAuthCache->bump_probes_attempted();
struct InflightGuard {
MySQL_Passthrough_Auth_Cache* cache;
~InflightGuard() { if (cache) cache->release_inflight(); }
@@ -1935,8 +1941,10 @@ int MySQL_Session::handler_again___status_AUTHENTICATING_BACKEND_FOR_CLIENT() {
|| probe_errno == ER_HOST_NOT_PRIVILEGED); /* 1130 */
if (credential_failure) {
GloMyPTAuthCache->record_failure(user_key, ip_key);
GloMyPTAuthCache->bump_probes_failed_credentials();
return fail_session("backend rejected probe (credentials)");
}
GloMyPTAuthCache->bump_probes_failed_transport();
return fail_session("backend probe transport failure");
}
@@ -1960,6 +1968,8 @@ int MySQL_Session::handler_again___status_AUTHENTICATING_BACKEND_FOR_CLIENT() {
client_myds->myprot.generate_pkt_OK(true, NULL, NULL, _pid, 0, 0, 2, 0, NULL);
client_myds->DSS = STATE_CLIENT_AUTH_OK;
GloMyPTAuthCache->bump_probes_ok();
/**
* @brief Audit log -- success path.
*
@@ -3441,8 +3451,11 @@ bool MySQL_Session::handler_again___status_CONNECTING_SERVER(int *_rc) {
&& client_myds && client_myds->myconn
&& client_myds->myconn->userinfo
&& client_myds->myconn->userinfo->username) {
const bool was_present = GloMyPTAuthCache->evict(std::string(
(const char*)client_myds->myconn->userinfo->username));
if (was_present) {
GloMyPTAuthCache->bump_cache_invalidations();
}
}
break;
default:

@@ -1287,6 +1287,7 @@ bool ProxySQL_Admin::GenericRefreshStatistics(const char *query_no_space, unsign
bool stats_mysql_client_host_cache=false;
bool stats_mysql_client_host_cache_reset=false;
bool stats_mysql_passthrough_auth_cache=false;
bool stats_mysql_passthrough_auth_metrics=false;
bool stats_pgsql_client_host_cache = false;
bool stats_pgsql_client_host_cache_reset = false;
bool stats_tls_certificates=false;
@@ -1469,6 +1470,8 @@ bool ProxySQL_Admin::GenericRefreshStatistics(const char *query_no_space, unsign
{ stats_mysql_client_host_cache_reset=true; refresh=true; }
if (strstr(query_no_space,"stats_mysql_passthrough_auth_cache"))
{ stats_mysql_passthrough_auth_cache=true; refresh=true; }
if (strstr(query_no_space,"stats_mysql_passthrough_auth_metrics"))
{ stats_mysql_passthrough_auth_metrics=true; refresh=true; }
if (strstr(query_no_space, "stats_pgsql_client_host_cache"))
{ stats_pgsql_client_host_cache = true; refresh = true; }
if (strstr(query_no_space, "stats_pgsql_client_host_cache_reset"))
@@ -1737,6 +1740,9 @@ bool ProxySQL_Admin::GenericRefreshStatistics(const char *query_no_space, unsign
if (stats_mysql_passthrough_auth_cache) {
stats___mysql_passthrough_auth_cache();
}
if (stats_mysql_passthrough_auth_metrics) {
stats___mysql_passthrough_auth_metrics();
}
if (stats_pgsql_client_host_cache) {
stats___pgsql_client_host_cache(false);
}

@@ -2022,6 +2022,41 @@ void ProxySQL_Admin::stats___mysql_passthrough_auth_cache() {
statsdb->execute("COMMIT");
}
/**
* @brief Populate stats_mysql_passthrough_auth_metrics from
* @c GloMyPTAuthCache->metrics_snapshot().
*
* The snapshot returns a vector of (name, value) pairs in stable order
* (counters first, gauges last). Each pair becomes one row. The
* counters are monotonic-since-startup; gauges (inflight_probes,
* cache_entries) reflect the current state at query time.
*/
void ProxySQL_Admin::stats___mysql_passthrough_auth_metrics() {
if (!GloMyPTAuthCache) return;
const auto snap = GloMyPTAuthCache->metrics_snapshot();
statsdb->execute("BEGIN");
statsdb->execute("DELETE FROM stats_mysql_passthrough_auth_metrics");
const char *query = (char*)"INSERT INTO stats_mysql_passthrough_auth_metrics VALUES (?1, ?2)";
auto [rc1, statement_unique] = statsdb->prepare_v2(query);
int rc = rc1;
sqlite3_stmt *statement = statement_unique.get();
ASSERT_SQLITE_OK(rc, statsdb);
for (const auto &m : snap) {
rc=(*proxy_sqlite3_bind_text)(statement, 1, m.name.c_str(), -1, SQLITE_TRANSIENT); ASSERT_SQLITE_OK(rc, statsdb);
rc=(*proxy_sqlite3_bind_int64)(statement, 2, static_cast<int64_t>(m.value)); ASSERT_SQLITE_OK(rc, statsdb);
SAFE_SQLITE3_STEP2(statement);
rc=(*proxy_sqlite3_clear_bindings)(statement);
rc=(*proxy_sqlite3_reset)(statement);
}
statsdb->execute("COMMIT");
}
void ProxySQL_Admin::stats___pgsql_client_host_cache(bool reset) {
if (!GloPTH) return;
