What happens when temporarily losing replicas  #30

@aminebt

This is a question about whether the behavior we observe in k8s is expected, or whether there is a misconfiguration on our side.

  • We're using pgpool in front of a Postgres cluster managed by Zalando's Patroni. The cluster consists of a leader (exposed through the k8s service "postgres-db-pg-cluster") and two replicas (exposed through the k8s service "postgres-db-pg-cluster-repl"). The question applies even with a single replica pod.
  • We're pretty much using the configuration described in this repo.
  • Sometimes incidents occur and replication between the leader pod and the replica(s) breaks, so the replicas become unhealthy. As a result, postgres-db-pg-cluster-repl no longer has any endpoints (the replica pods fail their liveness probes).
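
To make the setup above concrete, the backend section of our pgpool.conf looks roughly like the sketch below. The two hostnames are the services named above; the ports, weights, flags and the failover_on_backend_error value are assumptions about the repo's defaults, not copied from our deployment.

```
# pgpool.conf (sketch; only the two hostnames come from our setup,
# every other value here is an assumed default)

# Backend 0: the leader, reached through its k8s service
backend_hostname0 = 'postgres-db-pg-cluster'
backend_port0 = 5432
backend_weight0 = 1
backend_flag0 = 'ALWAYS_PRIMARY|DISALLOW_TO_FAILOVER'

# Backend 1: the replicas, reached through the replica k8s service
backend_hostname1 = 'postgres-db-pg-cluster-repl'
backend_port1 = 5432
backend_weight1 = 1
backend_flag1 = 'DISALLOW_TO_FAILOVER'

# With this off, pgpool does not detach a backend on connection errors
failover_on_backend_error = 'off'
```

With settings like these, pgpool keeps treating backend 1 as attached even when its service has no endpoints, which would be consistent with the errors described below.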

Observed behavior: while the replicas are down, all Postgres clients report errors connecting to Postgres (through pgpool), even though the leader Postgres pod is healthy.
Expected behavior: since the Postgres leader is still up and running, I expected pgpool to keep accepting client connections and to use the remaining healthy backend.
How to reproduce: on any similar k8s setup, change the replica service selector so that the service has no endpoints. This is equivalent to the replica pods being unhealthy and therefore excluded from the replica service.

Remark:
The expected behavior can be achieved if I set these two parameters to the values below:
backend_flag1 = 'ALLOW_TO_FAILOVER'
failover_on_backend_error = 'on'
However, in that case the corresponding backend is not reattached once the replicas are healthy again, even if auto_failback is set to on (though that is perhaps a separate question about sr_check, application_name, etc.).
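
For reference, here is a sketch of the settings that, as far as I understand the Pgpool-II documentation, are involved in automatic reattachment: auto_failback relies on the streaming replication check and on backend_application_name matching the standby's application_name. The interval, the check user and the application name below are hypothetical placeholders, not values from our deployment.

```
# pgpool.conf (sketch; placeholder values are marked as such)

# The two settings from the remark above
backend_flag1 = 'ALLOW_TO_FAILOVER'
failover_on_backend_error = 'on'

# Reattach a detached standby once it is streaming again
auto_failback = 'on'
auto_failback_interval = 60            # seconds between failbacks (placeholder)

# auto_failback depends on the streaming replication check being enabled
sr_check_period = 10                   # seconds, must be greater than 0 (placeholder)
sr_check_user = 'postgres'             # placeholder user

# Should match the application_name the standby reports in pg_stat_replication
backend_application_name1 = 'replica1' # hypothetical name
```

One open point with this sketch: backend 1 is a k8s service that may front several replica pods, each presumably reporting its own application_name, so a single backend_application_name1 value may not match all of them.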

I would appreciate your help and clarification on this.
