Implement replication lag detection for automatic replica traffic management #235
Co-authored-by: christoudias <[email protected]>
@levkk please consider increasing the priority of this feature; it is extremely important for using read balancing in production.
@@ -12,6 +12,10 @@ read_write_strategy = "aggressive"
prepared_statements_limit = 500
# client_idle_timeout = 5_000

# Replication lag detection settings
# replication_lag_check_interval = 10_000 # Check every 10 seconds (default)
# max_replication_lag_bytes = 1048576 # Ban replicas lagging by more than 1MB (default)
This threshold is too low and can lead to frequent bans of the replica. I think it should be at least 10MB or even 100MB by default.
This PR implements replication lag detection for pgdog, enabling automatic traffic management based on replica lag status. When replicas fall behind, they are automatically excluded from traffic until they catch up.
Features
🔄 Automatic Lag Detection
pg_stat_replication on the primary server
pg_current_wal_flush_lsn()
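As a rough illustration of how a byte lag can be derived from the two values named above, here is a minimal Rust sketch. It is not taken from this PR: the function names `parse_lsn` and `lag_bytes` are hypothetical, and it assumes the primary's flush LSN and a replica's replay LSN have already been fetched as text in PostgreSQL's `hi/lo` hexadecimal form.

```rust
/// Parse a PostgreSQL LSN written as two hexadecimal halves ("hi/lo")
/// into a single 64-bit WAL byte position.
/// Hypothetical helper for illustration; not part of pgdog.
fn parse_lsn(text: &str) -> Option<u64> {
    let (hi, lo) = text.trim().split_once('/')?;
    let hi = u32::from_str_radix(hi, 16).ok()?;
    let lo = u32::from_str_radix(lo, 16).ok()?;
    Some(((hi as u64) << 32) | lo as u64)
}

/// Replication lag in bytes: how far the replica's replay position
/// trails the primary's flush position. Returns 0 if the replica
/// appears to be ahead, and None if either LSN fails to parse.
fn lag_bytes(primary_flush_lsn: &str, replica_replay_lsn: &str) -> Option<u64> {
    let primary = parse_lsn(primary_flush_lsn)?;
    let replica = parse_lsn(replica_replay_lsn)?;
    Some(primary.saturating_sub(replica))
}
```

With these helpers, `lag_bytes("0/200000", "0/100000")` yields `Some(1048576)`, which is exactly the 1MB value used for `max_replication_lag_bytes` in the config snippet.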
🚦 Traffic Management
⚙️ Configuration
🏗️ Implementation Details
Example Usage
With a 3-replica setup, if replica-2 starts lagging:
Traffic automatically shifts to healthy replicas. Once replica-2 catches up, it's automatically re-enabled.
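The behaviour described here can be sketched in Rust as a simple filter over replicas. This is an assumption-laden illustration, not the PR's code: the `Replica` struct and `healthy_replicas` function are hypothetical, and the lag values are made up for the example. Only the threshold semantics (ban when lag exceeds `max_replication_lag_bytes`) come from the config comment.

```rust
/// Hypothetical view of a replica and its most recently measured lag.
#[derive(Debug, Clone, PartialEq)]
struct Replica {
    name: String,
    lag_bytes: u64,
}

/// Return the names of replicas that may receive read traffic.
/// A replica is excluded only while its lag is strictly greater than
/// the threshold, so it becomes eligible again as soon as it catches up.
fn healthy_replicas(replicas: &[Replica], max_lag_bytes: u64) -> Vec<&str> {
    replicas
        .iter()
        .filter(|r| r.lag_bytes <= max_lag_bytes)
        .map(|r| r.name.as_str())
        .collect()
}
```

For three replicas where only replica-2 exceeds a 1048576-byte threshold, the function returns replica-1 and replica-3; once replica-2's measured lag drops back under the threshold, the next call includes it again.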
Testing
Added comprehensive unit tests covering:
Fixes #215.