Cache pooler_check_query response per pool by vadv · Pull Request #254 · ozontech/pg_doorman

vadv · 2026-05-15T18:07:29Z

Summary

Replaces the hardcoded local reply for general.pooler_check_query with a
per-pool response cache. The first matching SimpleQuery in each pool's
lifetime is forwarded to PostgreSQL; every subsequent matching probe is
answered from cache without touching the backend. This unblocks non-empty
keepalive queries that JDBC pools default to — select 1 from WildFly,
HikariCP and most JDBC drivers now returns the real row instead of an
empty response.

Behavior changes

The default ; is no longer special-cased. The first probe per pool
performs one PostgreSQL round-trip and the response (empty for ;) is
cached. If PostgreSQL is unreachable at that moment, the probing client
sees a failure instead of an unconditional OK. The previous hardcoded
reply reported the pooler healthy even when PostgreSQL was down.
Cache is per pool, keyed by the query string. A RELOAD that changes
pooler_check_query invalidates the cache on the next ping. A reload
that keeps the same value keeps the cached response. ErrorResponse
from backend is forwarded to the client and never cached.

Operator contract

pooler_check_query must be stable: same input → same bytes, no side
effects. Safe: ;, select 1, select 'pg_doorman', select version().
Unsafe (cache will freeze the response): select now(),
select pg_is_in_recovery() (failover blindness),
select count(*) from <table>, UPDATE/INSERT/DELETE/CALL/DO.

New metrics

pg_doorman_pooler_check_query_backend_total — probes forwarded to
PostgreSQL (cache miss).
pg_doorman_pooler_check_query_cache_total — probes served from cache.

Ratio cache_total / (cache_total + backend_total) is the hit rate.

Test plan

17 unit tests: BufferingWriter, has_error_response frame parser,
CheckQueryCache get/set semantics.
5 BDD scenarios (make test-bdd TAGS=@pooler-check-query-cache):
default ; cache, custom select 1 cache, RELOAD invalidation,
backend ErrorResponse never cached, large response (10 KB) drained
via the recv-loop with pool staying in sync.
cargo fmt --all clean.
cargo clippy --lib --tests -- --deny warnings clean.
Pre-existing flaky test_prometheus_server_basic reproduces on
master too; unrelated.
make generate re-emitted pg_doorman.toml/yaml and EN reference
docs from the updated fields.yaml; library reference tests in sync.

Before, only ";" worked as pooler_check_query: pg_doorman always replied with a hardcoded EmptyQueryResponse + ReadyForQuery, regardless of the configured value. Setting "select 1" — the keep-alive query that Wildfly, HikariCP and many JDBC pools default to — produced the same empty response, which is semantically wrong for a non-empty query. Now pg_doorman forwards the first matching SimpleQuery to PostgreSQL, caches the response per pool, and serves subsequent matching probes from cache without touching the backend. The cache validates against the current general.pooler_check_query value on every lookup, so a RELOAD that changes the value triggers a fresh backend probe on the next ping without an explicit invalidation hook. ErrorResponse from the backend is never cached: the next probe re-fetches. The operator's responsibility: pooler_check_query must be a stable, side-effect-free SimpleQuery. Same input must produce the same output every time. Safe choices include ";" (default), "select 1", "select 'pg_doorman'", "select version()". Unsafe choices include "select now()", "select pg_is_in_recovery()" (changes on failover), or any UPDATE/DELETE/CALL — their first-call result would be cached and replayed indefinitely. Default ";" is no longer special-cased: it goes through the same cache flow. On a cold pool the first ";" now performs one backend round-trip and caches PostgreSQL's EmptyQueryResponse. Load balancers pinging through pg_doorman will surface a backend-unreachable condition as a probe failure on the very first probe of a fresh pool, instead of an unconditional OK from a hardcoded reply. Two new counter metrics expose the cache hit rate: - pg_doorman_pooler_check_query_backend_total (cache miss → backend probe) - pg_doorman_pooler_check_query_cache_total (cache hit → served from memory)

Operator-facing documentation for the cache shipped in the previous commit: - Stability contract: the query must be deterministic and side-effect free. - Safe values: ";", "select 1", "select 'pg_doorman'", "select version()". - Unsafe values with explicit cache-freeze failure modes: select now() / clock_timestamp() (frozen timestamp), pg_is_in_recovery() (failover blindness), counting queries (frozen count), UPDATE/INSERT/DELETE/CALL/DO (one-shot side effect cached as success forever). - Cold-pool behavior change called out: the first probe per pool now performs one PostgreSQL round-trip, even for the default ";". - Two new counter metrics with the hit-rate formula. - 3.10.0 changelog entry. pg_doorman.toml and pg_doorman.yaml regenerated from the updated fields.yaml so the library reference tests stay in sync.

The field was declared as Option<Vec<u8>> on the config struct, defaulted to None, and read by poller_check_query_request_bytes_vec via an if-let-Some shortcut. Nothing ever wrote a Some value into it, so the shortcut was unreachable and the field was permanently dead. Removed the field, the unreachable branch, and its entry in the fields.yaml structural-allowlist for the config validator.

…lity Multi-discipline review flagged three correctness issues and a hot-path regression: - Stale-client cache poisoning across RELOAD. Before, the byte-match used a snapshot captured at client connect, while the cache key came from the live config in the handler. A RELOAD that races with an in-flight probe could write a cache entry for one query under the key of another. Now the byte-match and the cache key read the same process-wide snapshot, swapped atomically when the config reloads. Bytes from a stale config no longer match the active probe; they fall through to the normal SQL path instead. - The handler re-resolved the pool through the global pool map, which could pick a different generation than the client's authenticated pool after a RELOAD or dynamic-pool swap. The authenticated pool is now threaded through. - Backend lifecycle was incomplete. recv/send/cleanup errors did not mark the backend bad before returning it to the pool, and the cache accepted any response without an ErrorResponse, including non-idle states like an open transaction or a COPY in progress. Errors now mark the backend bad before propagating, and the cache only accepts responses that end in ReadyForQuery('I')dle. Hot path: the cache reader switched from load_full to a guarded read, dropping an atomic refcount bump per warm hit. The full Config clone per probe is gone — the new snapshot already carries the query string. Cleanup: removed the per-client request_bytes vec and its initialisers, the unused poller_check_query_request_bytes_vec helper, and a config field that was declared but never written. Docs: the two probe counters are now in the canonical prometheus.md reference (EN regenerated, RU hand-edited). Tests: - 9 new unit tests (snapshot encoding, idle-RFQ detector). - 1 new BDD scenario: stale probe bytes after RELOAD bypass the cache path and reach the backend as regular SQL; the new probe value drives a fresh backend hit and populates a fresh cache entry.

@rust-1

The BDD matrix in .github/workflows/bdd-tests.yml filters scenarios by @rust-1..4 / @go / @nodejs / etc. tags. The cache feature shipped with only @pooler-check-query-cache, so every CI job silently skipped it while reporting green. Adding the @rust @rust-2 tags routes the six scenarios into the existing Rust part 2 job, the least-loaded Rust shard in the current matrix.

@rust-1

The BDD matrix in .github/workflows/bdd-tests.yml dispatches scenarios by matrix tag (@rust-1..4, @go, @python, @nodejs, etc.). Sixteen feature files carried only their own private tags (@config-test, @connection, @hba, @prometheus, ...) — they passed `make test-bdd TAGS=@foo` locally but every CI job silently skipped them while the suite reported green. The lapse was caught when feat/pooler-check-query-cache landed on the same trap. Added @rust @rust-N to each, distributing across the three less-loaded shards (rust-1 +5, rust-2 +5, rust-4 +6) and leaving rust-3 (sleep-heavy) untouched. Resulting balance: 8 / 9 / 20 / 14. `bench.feature` already runs via .github/workflows/bench.yml in release mode, so it stays on @bench alone.

The matrix wiring in the previous commit pulled sixteen orphan features into CI. Two of them masked existing bugs and two have pre-existing product issues outside this PR's scope. Fixed: - src/app/config.rs: when -t / --test-config exits on parse error, the message now always goes to stderr. Previously the branch for non-terminal stdin used `log::error!`, but the logger is not yet initialized at that point, so CI captured an empty stream. The scenario "Test invalid configuration file with -t flag" now sees the expected "Config parse error:" prefix. - tests/go/hba/hba_trust_test.go: the HBA reject error message no longer prefixes the peer with "IP address" and now includes the source port. The Go test for HBA deny was pinned to the older string and would never match the current output. Loosened the substring assertion to the stable portion ("to <user>@<db> (TLS: false) is not permitted by HBA configuration") which still proves the path is taken. Deferred (matrix tag reverted, scenarios silently skipped again, TODO issue notes saved in MEMORY/project_orphan_bdd_features.md): - pool-size-show-pools.feature "Dynamic pool" scenario: auth_query passthrough returns pool_size=1 instead of the configured default_pool_size=3. Product bug in dynamic-pool size resolution. - rust-deferred-begin-bug.feature: all four scenarios skip because a step definition the file relies on is no longer registered in the test binary. Needs a separate investigation of the extended-session helper.

pool-size-show-pools.feature "Dynamic pool" scenario: the YAML used auth_query.default_pool_size, a key that was renamed to auth_query.pool_size in commit e383a7a but never propagated here. With the new value the SHOW POOLS column matches the configured 3. Matrix tag @rust-4 restored. rust-deferred-begin-bug.feature: implemented the missing "session should receive ParseComplete/BindComplete/CommandComplete 'tag'/ReadyForQuery 'status'" step bindings as a thin helper over the already-captured session message log. Also renamed "we create extended session" to "we create session" — extended-protocol sessions are the same regular sessions, the Parse/Bind/Execute/Sync steps already exist. The deferred-begin scenarios still fail one assertion: after Parse "BEGIN" + Bind + Execute + Sync the server backend is allocated (sv_active=1), and the test wants 0. That is a real product gap — pg_doorman only short-circuits standalone BEGIN over SimpleQuery (is_standalone_begin in client/transaction.rs), not over the extended protocol. Implementing extended-BEGIN deferral is its own change. The matrix tag stays off until the gap is closed; the new step bindings are kept because they are useful to whichever PR picks this up.

dmitrivasilyev added 8 commits May 15, 2026 20:50

vadv merged commit 0884c63 into master May 16, 2026
51 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache pooler_check_query response per pool#254

Cache pooler_check_query response per pool#254
vadv merged 8 commits into
masterfrom
feat/pooler-check-query-cache

vadv commented May 15, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

vadv commented May 15, 2026

Summary

Behavior changes

Operator contract

New metrics

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant