Description
This is a question about whether the behavior we observe on k8s is the expected one, or whether there is a misconfiguration on our side.
- We're using pgpool in front of a zalando-patroni-managed Postgres cluster. The cluster consists of a leader (exposed through the k8s service "postgres-db-pg-cluster") and two replicas (exposed through the k8s service "postgres-db-pg-cluster-repl"). This question applies even if there is a single replica pod.
- We're pretty much using the configuration described in this repo; the backend layout we have in mind is sketched after this list.
- Sometimes incidents occur and replication breaks between the leader pod and the replica(s), which become unhealthy. As a result, postgres-db-pg-cluster-repl no longer has any endpoints (the replica pods fail their liveness probes).
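For context, here is roughly the backend mapping we have in mind (a minimal sketch only; the hostnames match the service names above, while ports, weights, and flags are assumptions and not copied from this repo's actual manifests):

```
# pgpool.conf (sketch) - backend 0 targets the leader service,
# backend 1 targets the replica service; values are illustrative
backend_hostname0 = 'postgres-db-pg-cluster'
backend_port0     = 5432
backend_weight0   = 1
backend_flag0     = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'

backend_hostname1 = 'postgres-db-pg-cluster-repl'
backend_port1     = 5432
backend_weight1   = 1
backend_flag1     = 'DISALLOW_TO_FAILOVER'
```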
Observed behavior: while the replicas are down, all Postgres clients report errors connecting to Postgres (through pgpool), despite the leader Postgres pod being healthy.
Expected behavior: given that the Postgres leader is still up and running, I was expecting pgpool to accept client connections and use the remaining healthy backend.
How to reproduce it: this situation can be simulated on any similar k8s set-up by tampering with the replica service selector so that the service has no endpoints (this is equivalent to the replica pods being unhealthy and therefore excluded from the replica service), for example as shown below.
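For example, a quick way to do this (assuming a strategic merge patch on the Service; the label key below is made up and the change should be reverted afterwards):

```shell
# Add a bogus label requirement to the replica Service selector so that
# no pod matches it and the Service loses all its endpoints.
kubectl patch service postgres-db-pg-cluster-repl \
  -p '{"spec":{"selector":{"simulate-outage":"yes"}}}'

# Confirm the Service now has no endpoints, then retry client connections
# through pgpool to observe the errors described above.
kubectl get endpoints postgres-db-pg-cluster-repl
```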
Remark:
The expected behavior can be achieved if I set these two parameters to the values below:
backend_flag1 = 'ALLOW_TO_FAILOVER'
failover_on_backend_error = 'on'
but in that case, when the replicas are back to a healthy state, the corresponding backend is not reattached, even with auto_failback set to on (though that is perhaps a separate story/question about sr_check, application_name, etc.; the settings I have in mind are sketched below).
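For reference, these are the settings I believe are involved in the re-attach part (a sketch only, based on my reading of the docs; the user and application_name values are placeholders, not our actual configuration):

```
# pgpool.conf (sketch) - auto_failback depends on streaming replication
# checks and on the application_name reported in pg_stat_replication
auto_failback             = on
sr_check_period           = 10
sr_check_user             = 'pgpool'      # placeholder
sr_check_password         = ''            # placeholder (e.g. taken from pool_passwd)
backend_application_name1 = 'postgres-db-pg-cluster-1'  # placeholder; must match the replica's application_name
```

My understanding is that matching backend_application_name against one specific replica behind a shared replica service is part of the difficulty alluded to above, but that can be a follow-up question.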
I'd appreciate your help and clarification on this.