Skip to content

Cache pooler_check_query response per pool#254

Merged
vadv merged 8 commits into
masterfrom
feat/pooler-check-query-cache
May 16, 2026
Merged

Cache pooler_check_query response per pool#254
vadv merged 8 commits into
masterfrom
feat/pooler-check-query-cache

Conversation

@vadv
Copy link
Copy Markdown
Collaborator

@vadv vadv commented May 15, 2026

Summary

Replaces the hardcoded local reply for general.pooler_check_query with a
per-pool response cache. The first matching SimpleQuery in each pool's
lifetime is forwarded to PostgreSQL; every subsequent matching probe is
answered from cache without touching the backend. This unblocks non-empty
keepalive queries that JDBC pools default to — select 1 from WildFly,
HikariCP and most JDBC drivers now returns the real row instead of an
empty response.

Behavior changes

  • The default ; is no longer special-cased. The first probe per pool
    performs one PostgreSQL round-trip and the response (empty for ;) is
    cached. If PostgreSQL is unreachable at that moment, the probing client
    sees a failure instead of an unconditional OK. The previous hardcoded
    reply reported the pooler healthy even when PostgreSQL was down.
  • Cache is per pool, keyed by the query string. A RELOAD that changes
    pooler_check_query invalidates the cache on the next ping. A reload
    that keeps the same value keeps the cached response. ErrorResponse
    from backend is forwarded to the client and never cached.

Operator contract

pooler_check_query must be stable: same input → same bytes, no side
effects. Safe: ;, select 1, select 'pg_doorman', select version().
Unsafe (cache will freeze the response): select now(),
select pg_is_in_recovery() (failover blindness),
select count(*) from <table>, UPDATE/INSERT/DELETE/CALL/DO.

New metrics

  • pg_doorman_pooler_check_query_backend_total — probes forwarded to
    PostgreSQL (cache miss).
  • pg_doorman_pooler_check_query_cache_total — probes served from cache.

Ratio cache_total / (cache_total + backend_total) is the hit rate.

Test plan

  • 17 unit tests: BufferingWriter, has_error_response frame parser,
    CheckQueryCache get/set semantics.
  • 5 BDD scenarios (make test-bdd TAGS=@pooler-check-query-cache):
    default ; cache, custom select 1 cache, RELOAD invalidation,
    backend ErrorResponse never cached, large response (10 KB) drained
    via the recv-loop with pool staying in sync.
  • cargo fmt --all clean.
  • cargo clippy --lib --tests -- --deny warnings clean.
  • Pre-existing flaky test_prometheus_server_basic reproduces on
    master too; unrelated.
  • make generate re-emitted pg_doorman.toml/yaml and EN reference
    docs from the updated fields.yaml; library reference tests in sync.

dmitrivasilyev added 8 commits May 15, 2026 20:50
Before, only ";" worked as pooler_check_query: pg_doorman always replied
with a hardcoded EmptyQueryResponse + ReadyForQuery, regardless of the
configured value. Setting "select 1" — the keep-alive query that Wildfly,
HikariCP and many JDBC pools default to — produced the same empty response,
which is semantically wrong for a non-empty query.

Now pg_doorman forwards the first matching SimpleQuery to PostgreSQL,
caches the response per pool, and serves subsequent matching probes from
cache without touching the backend. The cache validates against the
current general.pooler_check_query value on every lookup, so a RELOAD
that changes the value triggers a fresh backend probe on the next ping
without an explicit invalidation hook. ErrorResponse from the backend
is never cached: the next probe re-fetches.

The operator's responsibility: pooler_check_query must be a stable,
side-effect-free SimpleQuery. Same input must produce the same output
every time. Safe choices include ";" (default), "select 1",
"select 'pg_doorman'", "select version()". Unsafe choices include
"select now()", "select pg_is_in_recovery()" (changes on failover), or
any UPDATE/DELETE/CALL — their first-call result would be cached and
replayed indefinitely.

Default ";" is no longer special-cased: it goes through the same cache
flow. On a cold pool the first ";" now performs one backend round-trip
and caches PostgreSQL's EmptyQueryResponse. Load balancers pinging
through pg_doorman will surface a backend-unreachable condition as a
probe failure on the very first probe of a fresh pool, instead of an
unconditional OK from a hardcoded reply.

Two new counter metrics expose the cache hit rate:
- pg_doorman_pooler_check_query_backend_total (cache miss → backend probe)
- pg_doorman_pooler_check_query_cache_total (cache hit → served from memory)
Operator-facing documentation for the cache shipped in the previous
commit:

- Stability contract: the query must be deterministic and side-effect
  free.
- Safe values: ";", "select 1", "select 'pg_doorman'",
  "select version()".
- Unsafe values with explicit cache-freeze failure modes:
  select now() / clock_timestamp() (frozen timestamp),
  pg_is_in_recovery() (failover blindness), counting queries
  (frozen count), UPDATE/INSERT/DELETE/CALL/DO (one-shot side
  effect cached as success forever).
- Cold-pool behavior change called out: the first probe per pool
  now performs one PostgreSQL round-trip, even for the default ";".
- Two new counter metrics with the hit-rate formula.
- 3.10.0 changelog entry.

pg_doorman.toml and pg_doorman.yaml regenerated from the updated
fields.yaml so the library reference tests stay in sync.
The field was declared as Option<Vec<u8>> on the config struct, defaulted
to None, and read by poller_check_query_request_bytes_vec via an
if-let-Some shortcut. Nothing ever wrote a Some value into it, so the
shortcut was unreachable and the field was permanently dead.

Removed the field, the unreachable branch, and its entry in the
fields.yaml structural-allowlist for the config validator.
…lity

Multi-discipline review flagged three correctness issues and a hot-path
regression:

- Stale-client cache poisoning across RELOAD. Before, the byte-match used
  a snapshot captured at client connect, while the cache key came from the
  live config in the handler. A RELOAD that races with an in-flight probe
  could write a cache entry for one query under the key of another. Now
  the byte-match and the cache key read the same process-wide snapshot,
  swapped atomically when the config reloads. Bytes from a stale config
  no longer match the active probe; they fall through to the normal SQL
  path instead.

- The handler re-resolved the pool through the global pool map, which
  could pick a different generation than the client's authenticated pool
  after a RELOAD or dynamic-pool swap. The authenticated pool is now
  threaded through.

- Backend lifecycle was incomplete. recv/send/cleanup errors did not mark
  the backend bad before returning it to the pool, and the cache accepted
  any response without an ErrorResponse, including non-idle states like
  an open transaction or a COPY in progress. Errors now mark the backend
  bad before propagating, and the cache only accepts responses that end
  in ReadyForQuery('I')dle.

Hot path: the cache reader switched from load_full to a guarded read,
dropping an atomic refcount bump per warm hit. The full Config clone
per probe is gone — the new snapshot already carries the query string.

Cleanup: removed the per-client request_bytes vec and its initialisers,
the unused poller_check_query_request_bytes_vec helper, and a config
field that was declared but never written.

Docs: the two probe counters are now in the canonical prometheus.md
reference (EN regenerated, RU hand-edited).

Tests:
- 9 new unit tests (snapshot encoding, idle-RFQ detector).
- 1 new BDD scenario: stale probe bytes after RELOAD bypass the cache
  path and reach the backend as regular SQL; the new probe value drives
  a fresh backend hit and populates a fresh cache entry.
The BDD matrix in .github/workflows/bdd-tests.yml filters scenarios by
@rust-1..4 / @go / @nodejs / etc. tags. The cache feature shipped with
only @pooler-check-query-cache, so every CI job silently skipped it
while reporting green. Adding the @rust @rust-2 tags routes the six
scenarios into the existing Rust part 2 job, the least-loaded Rust
shard in the current matrix.
The BDD matrix in .github/workflows/bdd-tests.yml dispatches scenarios by
matrix tag (@rust-1..4, @go, @python, @nodejs, etc.). Sixteen feature
files carried only their own private tags (@config-test, @connection,
@hba, @prometheus, ...) — they passed `make test-bdd TAGS=@foo` locally
but every CI job silently skipped them while the suite reported green.
The lapse was caught when feat/pooler-check-query-cache landed on the
same trap.

Added @rust @rust-N to each, distributing across the three less-loaded
shards (rust-1 +5, rust-2 +5, rust-4 +6) and leaving rust-3
(sleep-heavy) untouched. Resulting balance: 8 / 9 / 20 / 14.

`bench.feature` already runs via .github/workflows/bench.yml in release
mode, so it stays on @bench alone.
The matrix wiring in the previous commit pulled sixteen orphan features
into CI. Two of them masked existing bugs and two have pre-existing
product issues outside this PR's scope.

Fixed:

- src/app/config.rs: when -t / --test-config exits on parse error, the
  message now always goes to stderr. Previously the branch for
  non-terminal stdin used `log::error!`, but the logger is not yet
  initialized at that point, so CI captured an empty stream. The
  scenario "Test invalid configuration file with -t flag" now sees the
  expected "Config parse error:" prefix.

- tests/go/hba/hba_trust_test.go: the HBA reject error message no longer
  prefixes the peer with "IP address" and now includes the source port.
  The Go test for HBA deny was pinned to the older string and would
  never match the current output. Loosened the substring assertion to
  the stable portion ("to <user>@<db> (TLS: false) is not permitted by
  HBA configuration") which still proves the path is taken.

Deferred (matrix tag reverted, scenarios silently skipped again, TODO
issue notes saved in MEMORY/project_orphan_bdd_features.md):

- pool-size-show-pools.feature "Dynamic pool" scenario: auth_query
  passthrough returns pool_size=1 instead of the configured
  default_pool_size=3. Product bug in dynamic-pool size resolution.

- rust-deferred-begin-bug.feature: all four scenarios skip because a
  step definition the file relies on is no longer registered in the
  test binary. Needs a separate investigation of the extended-session
  helper.
pool-size-show-pools.feature "Dynamic pool" scenario: the YAML used
auth_query.default_pool_size, a key that was renamed to auth_query.pool_size
in commit e383a7a but never propagated here. With the new value the
SHOW POOLS column matches the configured 3. Matrix tag @rust-4 restored.

rust-deferred-begin-bug.feature: implemented the missing
"session should receive ParseComplete/BindComplete/CommandComplete
'tag'/ReadyForQuery 'status'" step bindings as a thin helper over the
already-captured session message log. Also renamed
"we create extended session" to "we create session" — extended-protocol
sessions are the same regular sessions, the Parse/Bind/Execute/Sync
steps already exist.

The deferred-begin scenarios still fail one assertion: after
Parse "BEGIN" + Bind + Execute + Sync the server backend is allocated
(sv_active=1), and the test wants 0. That is a real product gap —
pg_doorman only short-circuits standalone BEGIN over SimpleQuery
(is_standalone_begin in client/transaction.rs), not over the extended
protocol. Implementing extended-BEGIN deferral is its own change. The
matrix tag stays off until the gap is closed; the new step bindings
are kept because they are useful to whichever PR picks this up.
@vadv vadv merged commit 0884c63 into master May 16, 2026
51 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant