mirror of https://github.com/sysown/proxysql
check_logs_for_command() in pgsql-copy_from_test-t did a single-shot call
to get_matching_lines() right after the SQL that was expected to cause the
"Switching to Fast Forward" / "Switching back to Normal" log line. Because
ProxySQL writes that line asynchronously with respect to the query's reply,
the scan occasionally reached EOF before the producer had flushed the line,
turning the assertion into a flake.

The failure that originally surfaced this was legacy-g1, test 85/327 of
pgsql-copy_from_test-t: 1 of 236 assertions failed (not ok 126 - "Switching
back to Normal mode"). It did not reproduce in three targeted reruns, which
is consistent with a timing race rather than a logic bug.

Two changes:

1. Add wait_for_log_match() in test/tap/tap/utils.{h,cpp}: retry
   get_matching_lines() up to a timeout, clearing both eofbit and failbit
   between iterations so successive getline() calls pick up lines appended
   since the last scan. (Clearing only failbit, as get_matching_lines()
   does internally, is not enough: once a stream has hit EOF, its sticky
   eofbit keeps getline() from reading new bytes.) Defaults: 2000 ms total
   budget, 100 ms between polls, matching the idiom already used in
   admin_set_credentials_logging-t.

2. Rewrite check_logs_for_command() in pgsql-copy_from_test-t.cpp to call
   the new helper. Positive assertions still return as soon as the line
   appears (no added latency in the common case). Negative assertions
   (check_logs_for_command(...) == false) now wait up to the full timeout
   to confirm absence, closing the symmetrical race where a line would
   have arrived a few hundred ms after the single-shot scan.
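
For illustration only, the retry-and-clear idea in change 1 can be sketched
as below. This is not the code added to utils.cpp: scan_new_lines() and
wait_for_line() are hypothetical stand-ins for get_matching_lines() and
wait_for_log_match(), and their signatures are assumptions. Only the
2000 ms / 100 ms defaults are taken from the description above.

```cpp
#include <chrono>
#include <fstream>
#include <regex>
#include <string>
#include <thread>

// Hypothetical single-shot scan: reads from the stream's current position
// to EOF and reports whether any line read in this pass matches `re`.
bool scan_new_lines(std::ifstream& log, const std::regex& re) {
    std::string line;
    bool found = false;
    while (std::getline(log, line)) {
        if (std::regex_search(line, re)) found = true;
    }
    return found;  // the stream now has eofbit and failbit set
}

// Hypothetical polling wrapper: rescans until a match is seen or the
// time budget is spent. Returns true as soon as a match is found.
bool wait_for_line(std::ifstream& log, const std::string& pattern,
                   int timeout_ms = 2000, int poll_ms = 100) {
    const std::regex re(pattern);
    const auto deadline = std::chrono::steady_clock::now() +
                          std::chrono::milliseconds(timeout_ms);
    for (;;) {
        if (scan_new_lines(log, re)) return true;
        // clear() with no arguments resets eofbit and failbit together,
        // so the next getline() attempts a fresh read from the file.
        log.clear();
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(poll_ms));
    }
}
```

In this sketch a negative check costs the full timeout, which mirrors the
trade-off described in change 2. One limitation of the sketch: a line that
is only partially flushed when a scan runs is consumed in two pieces, so a
pattern spanning the split would be missed.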

Verified: 3/3 reruns of pgsql-copy_from_test-t in legacy-g1 post-fix all
pass 236/236, runtime ~40 s per run (single-shot version was slightly
faster because it short-circuited on negative assertions without waiting).
v3.0-slim-dbdeployer-images
parent 6322d58376
commit 2b1a4262b1