60 changes: 60 additions & 0 deletions documentation/en/src/changelog.md
@@ -1,5 +1,65 @@
# Changelog

### 3.10.0

Per-pool response cache for `general.pooler_check_query`. The first
matching SimpleQuery in each pool's lifetime is forwarded to PostgreSQL;
every subsequent matching probe is answered from the cache without
touching the backend.

#### Behavior change for cold pools

Before this release, pg_doorman answered any `pooler_check_query` match
locally with a hardcoded empty result. The default `;` came back instantly
without ever talking to PostgreSQL, and a non-empty value such as `select 1`
returned an empty response that did not match what a real PostgreSQL would
have produced.

The first probe per pool now does one PostgreSQL round-trip and captures
the real response. If PostgreSQL is unreachable at that moment, the
probing client sees a probe failure instead of an unconditional OK; the
earlier hardcode reported the pooler as healthy even when PostgreSQL was
down. Typical JDBC keepalive queries such as `select 1` (WildFly, HikariCP)
and `select 'pg_doorman'` now return the expected row.

#### Cache lifecycle

The cache is per pool and keyed by the query string. A `RELOAD` that
changes `pooler_check_query` invalidates the cache on the next probe; the
new value triggers one fresh backend probe and is then served from cache
until the value changes again. A reload that keeps the same value keeps
the cached response. `ErrorResponse` from the backend is forwarded to
the client unchanged and is never cached, so the next probe retries
against PostgreSQL.
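
The lifecycle rules above can be sketched as a single-slot cache. This is a minimal illustration under stated assumptions, not the pg_doorman implementation: `CheckQueryCache`, `BackendReply`, `Served`, and the `forward` callback are hypothetical names that stand in for the real pool and backend plumbing.

```rust
/// Hypothetical outcome of forwarding one probe to PostgreSQL.
#[derive(Debug, Clone, PartialEq)]
pub enum BackendReply {
    /// Bytes of a complete, successful response.
    Ok(Vec<u8>),
    /// Bytes of an ErrorResponse: forwarded to the client, never cached.
    Error(Vec<u8>),
}

/// Where the bytes handed to the probing client came from.
#[derive(Debug, PartialEq)]
pub enum Served {
    FromCache(Vec<u8>),
    FromBackend(Vec<u8>),
}

/// One slot per pool, keyed by the query string the response was captured for.
#[derive(Default)]
pub struct CheckQueryCache {
    entry: Option<(String, Vec<u8>)>,
}

impl CheckQueryCache {
    /// `configured_query` is the current `pooler_check_query` value;
    /// `forward` performs one backend round-trip (assumed, supplied by the caller).
    pub fn probe<F>(&mut self, configured_query: &str, forward: F) -> Served
    where
        F: FnOnce(&str) -> BackendReply,
    {
        if let Some((key, bytes)) = &self.entry {
            if key.as_str() == configured_query {
                // Same value as when the entry was captured: serve the cached bytes.
                return Served::FromCache(bytes.clone());
            }
        }
        // Cold pool, or the value changed since the entry was captured:
        // drop any stale entry and do one fresh backend probe.
        self.entry = None;
        match forward(configured_query) {
            BackendReply::Ok(bytes) => {
                self.entry = Some((configured_query.to_string(), bytes.clone()));
                Served::FromBackend(bytes)
            }
            // Errors pass through unchanged and leave the slot empty,
            // so the next probe retries against the backend.
            BackendReply::Error(bytes) => Served::FromBackend(bytes),
        }
    }
}
```

Comparing the stored key on every probe is one way to get "invalidate on the next probe" without a separate reload hook; whether pg_doorman does it this way is not stated in the changelog.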

#### Operator contract

`pooler_check_query` must be stable: the same input must produce the
same bytes, with no side effects. Safe values: `;`, `select 1`,
`select 'pg_doorman'`, `select version()`.

Unsafe values that the cache will silently freeze:

- `select now()`, `select clock_timestamp()` — the cached timestamp
never advances.
- `select pg_is_in_recovery()` — a failover flips the role on
PostgreSQL but the cached response still reports the old role.
- `select count(*) from <table>` — the cached count is whatever the
first probe observed.
- `UPDATE`, `INSERT`, `DELETE`, `CALL`, `DO` — the side effect runs
once and the success response is cached forever.
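
Because the cache freezes these values silently, an operator may want a pre-deployment check on the configured string. The sketch below is a hypothetical lint, not something pg_doorman is documented to do: `unstable_reason` and its pattern lists are assumptions, and plain substring matching can both miss unstable queries and flag harmless ones.

```rust
/// Returns a reason if `query` matches one of the unsafe patterns listed
/// above, or `None` if nothing suspicious was found.
/// Hypothetical helper: a heuristic on the text, not a SQL parser.
pub fn unstable_reason(query: &str) -> Option<&'static str> {
    let q = query.trim().to_ascii_lowercase();

    // Statements whose side effect would run once and then be cached.
    const WRITE_KEYWORDS: [&str; 5] = ["update", "insert", "delete", "call", "do"];
    if let Some(first) = q.split_whitespace().next() {
        if WRITE_KEYWORDS.iter().any(|k| *k == first) {
            return Some("side effect runs once and the success response is cached");
        }
    }

    // Calls whose result changes over time or with server state.
    const VOLATILE_CALLS: [(&str, &str); 4] = [
        ("now(", "cached timestamp never advances"),
        ("clock_timestamp(", "cached timestamp never advances"),
        ("pg_is_in_recovery(", "cached role survives a failover"),
        ("count(", "cached count is whatever the first probe observed"),
    ];
    VOLATILE_CALLS
        .iter()
        .find(|(needle, _)| q.contains(*needle))
        .map(|(_, reason)| *reason)
}
```

With this sketch, the four safe values from the contract return `None`, and each unsafe example in the list returns a reason.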

#### New metrics

- `pg_doorman_pooler_check_query_backend_total` — counter, increments
on each probe forwarded to PostgreSQL (cache miss or
RELOAD-induced re-probe).
- `pg_doorman_pooler_check_query_cache_total` — counter, increments
on each probe served from the cache.

The ratio `cache_total / (cache_total + backend_total)` is the cache
hit rate.
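
As a worked instance of the ratio, here is a small helper that computes it from two scraped counter values. The function name and the `Option` return are assumptions for illustration; the changelog does not say how the ratio behaves before the first probe, so the sketch returns `None` when both counters are zero.

```rust
/// Cache hit rate from the two counters:
/// cache_total / (cache_total + backend_total).
/// Returns `None` when no probe has been counted yet (or the sum overflows),
/// which avoids dividing by zero.
pub fn cache_hit_rate(cache_total: u64, backend_total: u64) -> Option<f64> {
    let total = cache_total.checked_add(backend_total)?;
    if total == 0 {
        return None;
    }
    Some(cache_total as f64 / total as f64)
}
```

For example, 3 probes served from cache and 1 forwarded to PostgreSQL give a hit rate of 0.75.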

### 3.9.1

Web admin console refresh and a follow-up pass on `startup_parameters`.
37 changes: 34 additions & 3 deletions documentation/ru/src/reference/general.md
@@ -712,8 +712,39 @@ hostnossl all all 192.168.1.0/24 trust

### pooler_check_query

When a client sends exactly this query as a SimpleQuery, pg_doorman responds immediately
without forwarding it to PostgreSQL. Useful for health checks from load balancers (HAProxy, pgbouncer-style `SELECT 1`)
or application-level keepalive probes. The default value `;` (an empty statement) is the lightest possible check.
When a client sends exactly this query as a SimpleQuery, pg_doorman serves it through a
per-pool response cache. The first matching probe in each pool's lifetime is sent
to PostgreSQL, and the full response is stored. Subsequent matching probes are answered from the cache
without touching the backend.

The cache is keyed by the query string. A `RELOAD` with a different `pooler_check_query` value
invalidates the cache on the next probe; the new value triggers one fresh backend probe
and is then served from the cache until the value changes again. A `RELOAD` with the same value
does not reset the cache. An `ErrorResponse` from the backend is passed to the client unchanged and is not cached,
so the next probe goes to PostgreSQL again.

Cold-pool behavior has changed: the first probe in each pool now makes one
round-trip to PostgreSQL even for the default value `;`. If PostgreSQL is unavailable at that
moment, the probing client sees a probe error instead of an unconditional OK. The previous hardcoded
local answer reported the pooler as healthy even when PostgreSQL was down, and for non-empty
values such as `select 1` it returned an empty response.

**Operator contract.** The query must be deterministic: the same input
must produce the same bytes, with no side effects. Safe values:
`;`, `select 1`, `select 'pg_doorman'`, `select version()`.

Unsafe values that the cache will silently freeze:

- `select now()`, `select clock_timestamp()`: the cached timestamp stops advancing.
- `select pg_is_in_recovery()`: a failover changes the role on PostgreSQL, but the cached response
still reports the previous role.
- `select count(*) from <table>`: the cached count stays at whatever the first probe saw.
- `UPDATE`, `INSERT`, `DELETE`, `CALL`, `DO`: the side effect runs once, and the success
response is cached forever.

The cache hit rate is exported as two unlabeled counters:
`pg_doorman_pooler_check_query_backend_total` (probes sent to PostgreSQL) and
`pg_doorman_pooler_check_query_cache_total` (probes served from the cache). The ratio
`cache_total / (cache_total + backend_total)` is the hit rate.

Default: `";"`.
2 changes: 2 additions & 0 deletions documentation/ru/src/reference/prometheus.md
@@ -136,6 +136,8 @@ Query interner общий для процесса. У этих метрик не
| `pg_doorman_query_interner_evictions_total` | Counter by `kind` and `reason` (`gc_passive` or `ttl_expired`). Named entries are removed when no cache outside the interner still holds them; anonymous entries are removed after the idle TTL. |
| `pg_doorman_query_interner_synthetic_misses_total` | Counter of synthetic SQLSTATE `26000` responses for anonymous prepared statements whose state was no longer available when a later `Bind` or `Describe` referenced it. Check client Anonymous LRU evictions, WARN logs, `RESET INTERNER`, and TTL evictions before increasing `query_interner_anon_idle_ttl_seconds`. |
| `pg_doorman_query_interner_gc_duration_seconds` | Histogram of one interner GC sweep (named and anonymous combined), in seconds. Helps detect when a large interner makes the sweep noticeable. |
| `pg_doorman_pooler_check_query_backend_total` | Counter of `pooler_check_query` probes sent to PostgreSQL (cache miss or re-probe after RELOAD). The value should be stable after warmup; a continuously rising rate means the per-pool cache is not retaining its entry. |
| `pg_doorman_pooler_check_query_cache_total` | Counter of `pooler_check_query` probes served from the per-pool response cache without touching the backend. Hit rate = `cache_total / (cache_total + backend_total)`. |

### Server TLS metrics

6 changes: 4 additions & 2 deletions pg_doorman.toml
@@ -235,8 +235,10 @@ sync_server_parameters = false
# Default: 1048576 (1048576 bytes)
message_size_to_be_stream = 1048576

# Query intercepted by pg_doorman and answered locally (never reaches PostgreSQL).
# Used by load balancers and monitoring to check if the pooler is alive.
# SimpleQuery used by load balancers and monitoring as a liveness probe.
# The first match per pool is forwarded to PostgreSQL; the response is cached
# and reused for every subsequent match without touching the backend.
# The query must be stable (same input → same output) and side-effect free.
# Default: ";"
pooler_check_query = ";"

6 changes: 4 additions & 2 deletions pg_doorman.yaml
@@ -275,8 +275,10 @@ general:
# Default: "1MB" (1048576 bytes)
message_size_to_be_stream: "1MB"

# Query intercepted by pg_doorman and answered locally (never reaches PostgreSQL).
# Used by load balancers and monitoring to check if the pooler is alive.
# SimpleQuery used by load balancers and monitoring as a liveness probe.
# The first match per pool is forwarded to PostgreSQL; the response is cached
# and reused for every subsequent match without touching the backend.
# The query must be stable (same input → same output) and side-effect free.
# Default: ";"
pooler_check_query: ";"

14 changes: 4 additions & 10 deletions src/app/config.rs
@@ -1,6 +1,3 @@
use log::error;
use std::io::{self, IsTerminal, Write};

use crate::config::{get_config, Config};
use tokio::runtime::Builder;

@@ -15,13 +12,10 @@ pub fn init_config(args: &Args) -> Result<Config, Box<dyn std::error::Error>> {
match crate::config::parse(args.config_file.as_str()).await {
Ok(_) => (),
Err(err) => {
let stdin = io::stdin();
if stdin.is_terminal() {
eprintln!("Config parse error: {err}");
io::stdout().flush().unwrap();
} else {
error!("Config parse error: {err:?}");
}
// Always write to stderr — the logger has not been
// initialized yet, so `log::error!` is swallowed on
// non-terminal stdin (CI, supervisor).
eprintln!("Config parse error: {err}");
std::process::exit(exitcode::CONFIG);
}
};
12 changes: 6 additions & 6 deletions src/app/generate/annotated.rs
@@ -2149,14 +2149,15 @@ mod tests {
if !trimmed.starts_with("pub ") || !trimmed.contains(':') {
return None;
}
// Skip pub fn, pub mod, pub struct, pub enum, pub use, pub type, pub const
// Skip pub fn, pub mod, pub struct, pub enum, pub use, pub type, pub const, pub static
if trimmed.starts_with("pub fn ")
|| trimmed.starts_with("pub mod ")
|| trimmed.starts_with("pub struct ")
|| trimmed.starts_with("pub enum ")
|| trimmed.starts_with("pub use ")
|| trimmed.starts_with("pub type ")
|| trimmed.starts_with("pub const ")
|| trimmed.starts_with("pub static ")
{
return None;
}
@@ -2252,11 +2253,10 @@ mod tests {

// Structural/internal fields that don't have their own fields.yaml entry
let structural_fields: &[&str] = &[
"users", // nested sub-section
"pools", // top-level section
"path", // internal runtime field
"pooler_check_query_request_bytes", // derived from pooler_check_query
"auth_query", // nested struct, checked via "auth_query" section
"users", // nested sub-section
"pools", // top-level section
"path", // internal runtime field
"auth_query", // nested struct, checked via "auth_query" section
];

// AuthQueryConfig pub fields live in pool.rs alongside Pool pub fields.
4 changes: 3 additions & 1 deletion src/app/generate/docs.rs
@@ -508,7 +508,9 @@ fn write_prometheus_metrics_section(out: &mut String) {
let _ = writeln!(out, "| `pg_doorman_query_interner_bytes` | Gauge by `kind` (`named` or `anonymous`). Total bytes of interned query text. Refreshed once per GC sweep. |");
let _ = writeln!(out, "| `pg_doorman_query_interner_evictions_total` | Counter by `kind` and `reason` (`gc_passive` or `ttl_expired`). Named entries are removed when no cache outside the interner still holds them; anonymous entries are removed after the idle TTL. |");
let _ = writeln!(out, "| `pg_doorman_query_interner_synthetic_misses_total` | Counter of synthetic SQLSTATE `26000` responses for anonymous prepared statements whose state was no longer available when a later `Bind` or `Describe` referenced it. Check client Anonymous LRU evictions, WARN logs, `RESET INTERNER`, and TTL evictions before increasing `query_interner_anon_idle_ttl_seconds`. |");
let _ = writeln!(out, "| `pg_doorman_query_interner_gc_duration_seconds` | Histogram of one interner GC sweep (named and anonymous combined), in seconds. Use this to detect large interners that make sweep time visible. |\n");
let _ = writeln!(out, "| `pg_doorman_query_interner_gc_duration_seconds` | Histogram of one interner GC sweep (named and anonymous combined), in seconds. Use this to detect large interners that make sweep time visible. |");
let _ = writeln!(out, "| `pg_doorman_pooler_check_query_backend_total` | Counter of `pooler_check_query` probes forwarded to PostgreSQL (cache miss or RELOAD-induced re-probe). Steady-state value should be flat after warmup; a continuously rising rate means the per-pool cache is not retaining its entry. |");
let _ = writeln!(out, "| `pg_doorman_pooler_check_query_cache_total` | Counter of `pooler_check_query` probes answered from the per-pool response cache without touching the backend. Hit rate = `cache_total / (cache_total + backend_total)`. |\n");

// Grafana Dashboard
let _ = writeln!(out, "## Grafana Dashboard\n");
50 changes: 42 additions & 8 deletions src/app/generate/fields.yaml
@@ -758,15 +758,49 @@ fields:
pooler_check_query:
config:
en: |
Query intercepted by pg_doorman and answered locally (never reaches PostgreSQL).
Used by load balancers and monitoring to check if the pooler is alive.
ru: |
Запрос, перехватываемый pg_doorman и отвечаемый локально (не доходит до PostgreSQL).
Используется балансировщиками и мониторингом для проверки живости пулера.
SimpleQuery used by load balancers and monitoring as a liveness probe.
The first match per pool is forwarded to PostgreSQL; the response is cached
and reused for every subsequent match without touching the backend.
The query must be stable (same input → same output) and side-effect free.
ru: |
SimpleQuery, который балансировщики и мониторинг используют как проверку живости.
Первое совпадение в пуле отправляется в PostgreSQL; ответ кешируется и переиспользуется
для всех последующих совпадений без обращения к бэкенду.
Запрос должен быть детерминированным и без побочных эффектов.
doc: |
When a client sends this exact query as a SimpleQuery, pg_doorman responds immediately
without forwarding it to PostgreSQL. Useful for health checks from load balancers (HAProxy, pgbouncer-style `SELECT 1`)
or application-level keepalive probes. The default `;` (empty statement) is the lightest possible check.
When a client sends this exact query as a SimpleQuery, pg_doorman serves it through a per-pool
response cache. The first matching probe in each pool's lifetime is forwarded to PostgreSQL and
the full response is captured. Subsequent matching probes are answered from the cache without
touching the backend.

The cache is keyed by the query string. A `RELOAD` that changes `pooler_check_query` invalidates
the cache on the next probe; the new value triggers one fresh backend probe and is then served
from cache until the value changes again. A reload that keeps the same value keeps the cached
response. `ErrorResponse` from the backend is forwarded to the client unchanged and is never
cached, so the next probe retries against PostgreSQL.

Cold-pool behavior changed: the first probe per pool now does one PostgreSQL round-trip even
for the default `;`. If PostgreSQL is unreachable at that moment, the probing client sees a
probe failure instead of an unconditional OK. The earlier hardcoded local answer reported the
pooler as healthy even when PostgreSQL was down, and made non-empty values such as `select 1`
return an empty response.

**Operator contract.** The query must be stable: the same input must always produce the same
bytes, with no side effects. Safe values: `;`, `select 1`, `select 'pg_doorman'`, `select version()`.

Unsafe values that the cache will silently freeze:

- `select now()`, `select clock_timestamp()` — the cached timestamp never advances.
- `select pg_is_in_recovery()` — a failover flips the role on PostgreSQL but the cached response
still reports the old role.
- `select count(*) from <table>` — the cached count is whatever the first probe observed.
- `UPDATE`, `INSERT`, `DELETE`, `CALL`, `DO` — the side effect runs once and the success
response is cached forever.

Cache hit rate is exported as two counters without labels:
`pg_doorman_pooler_check_query_backend_total` (probes forwarded to PostgreSQL) and
`pg_doorman_pooler_check_query_cache_total` (probes served from cache). The ratio
`cache_total / (cache_total + backend_total)` is the hit rate.
default: '";"'

prepared_statements:
2 changes: 0 additions & 2 deletions src/client/core.rs
@@ -585,8 +585,6 @@ pub struct Client<S, T> {

pub(crate) client_last_messages_in_tx: PooledBuffer,

pub(crate) pooler_check_query_request_vec: Vec<u8>,

/// Pending BEGIN message for deferred connection optimization.
/// When client sends standalone "begin;", we synthesize response
/// and defer actual BEGIN until next query arrives.
2 changes: 0 additions & 2 deletions src/client/migration.rs
@@ -526,7 +526,6 @@ pub async fn reconstruct_client(
prepared,
client_last_messages_in_tx: PooledBuffer::new(),
max_memory_usage: config.general.max_memory_usage.as_bytes(),
pooler_check_query_request_vec: config.general.poller_check_query_request_bytes_vec(),
client_pending_begin: None,
#[cfg(unix)]
raw_fd,
@@ -633,7 +632,6 @@ pub async fn reconstruct_tls_client(
prepared,
client_last_messages_in_tx: PooledBuffer::new(),
max_memory_usage: config.general.max_memory_usage.as_bytes(),
pooler_check_query_request_vec: config.general.poller_check_query_request_bytes_vec(),
client_pending_begin: None,
#[cfg(unix)]
raw_fd,