Open
Labels
toygres: Toygres test application
Description
Concept
The instance actor (a long-running orchestration that manages the instance lifecycle) monitors the health of the primary and automatically triggers failover when it detects a problem.
Health Monitoring
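use std::time::Duration;

/// Settings for the periodic health check of the primary.
pub struct HealthMonitorConfig {
    pub check_interval: Duration,    // time between health checks
    pub failure_threshold: u32,      // consecutive failures before failover
    pub check_timeout: Duration,     // upper bound on a single check
    pub max_eligible_lag_bytes: u64, // lag limit for failover candidates
}

/// Outcome of a single health check.
pub struct HealthCheck {
    pub connectivity: bool,
    pub query_responsive: bool,
    pub replication_healthy: bool,
    pub disk_space_ok: bool,
}
Instance Actor Integration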
- Use durable timers for health check scheduling
- Track consecutive failures
- When the failure threshold is exceeded, select the best replica and trigger automatic failover
- If no replica is eligible, send a critical alert
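The failure-tracking step could be sketched roughly as follows. This is only an illustration, not the planned implementation: `FailureTracker`, `Decision`, and `is_healthy` are hypothetical names, the sketch assumes a check counts as failed if any of its four probes fails, and it assumes failover triggers once the consecutive count reaches `failure_threshold`. Durable-timer scheduling is left out because it depends on the orchestration framework.

```rust
/// Result of one health probe; fields mirror `HealthCheck` from the issue.
#[derive(Debug, Clone, Copy)]
pub struct HealthCheck {
    pub connectivity: bool,
    pub query_responsive: bool,
    pub replication_healthy: bool,
    pub disk_space_ok: bool,
}

impl HealthCheck {
    /// Assumption: a check passes only if every individual probe passes.
    pub fn is_healthy(&self) -> bool {
        self.connectivity
            && self.query_responsive
            && self.replication_healthy
            && self.disk_space_ok
    }
}

/// Hypothetical outcome of recording one check.
#[derive(Debug, PartialEq, Eq)]
pub enum Decision {
    Healthy,
    Degraded { consecutive_failures: u32 },
    TriggerFailover,
}

/// Hypothetical helper that counts consecutive failed checks.
pub struct FailureTracker {
    failure_threshold: u32,
    consecutive_failures: u32,
}

impl FailureTracker {
    pub fn new(failure_threshold: u32) -> Self {
        Self { failure_threshold, consecutive_failures: 0 }
    }

    /// Records one check: a healthy check resets the streak, a failed
    /// check extends it and may cross the threshold.
    pub fn record(&mut self, check: &HealthCheck) -> Decision {
        if check.is_healthy() {
            self.consecutive_failures = 0;
            return Decision::Healthy;
        }
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        // Assumption: reaching the threshold is enough to trigger failover.
        if self.consecutive_failures >= self.failure_threshold {
            Decision::TriggerFailover
        } else {
            Decision::Degraded { consecutive_failures: self.consecutive_failures }
        }
    }
}
```

Resetting the counter on any healthy check means an intermittent blip never accumulates toward the threshold; only an unbroken run of failures does.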
Failover Target Selection
Select replica with:
- State == Streaming
- Replication lag within threshold
- Minimum lag among eligible replicas
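As a minimal sketch of the selection rule above: the `Replica` and `ReplicaState` types and the `select_failover_target` function are hypothetical and only illustrate the three criteria. The sketch assumes lag is measured in bytes and compared against `max_eligible_lag_bytes` from `HealthMonitorConfig`.

```rust
/// Hypothetical replica states; only `Streaming` is named in the issue.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReplicaState {
    Streaming,
    CatchingUp,
    Disconnected,
}

/// Hypothetical view of a replica as seen by the instance actor.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Replica {
    pub name: String,
    pub state: ReplicaState,
    pub lag_bytes: u64,
}

/// Returns the streaming replica with the smallest lag that is still
/// within `max_eligible_lag_bytes`, or `None` if no replica qualifies.
pub fn select_failover_target(
    replicas: &[Replica],
    max_eligible_lag_bytes: u64,
) -> Option<&Replica> {
    replicas
        .iter()
        .filter(|r| r.state == ReplicaState::Streaming)
        .filter(|r| r.lag_bytes <= max_eligible_lag_bytes)
        .min_by_key(|r| r.lag_bytes)
}
```

A `None` result corresponds to the "no eligible replica" case, where the actor sends a critical alert instead of failing over.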
Safeguards
- Cooldown period between automatic failovers
- Manual override to disable automatic failover
- Quorum check to prevent false positives from network partition
- Notification to on-call before/during failover
- Audit log of all automatic failover decisions
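The first three safeguards can be thought of as a gate evaluated before any automatic failover. The sketch below is an assumption-laden illustration: `Safeguards`, `Gate`, and `evaluate` are hypothetical names, quorum is assumed to mean a strict majority of independent observers reporting the primary as down, and `std::time::Instant` stands in for whatever clock the orchestration framework provides. Notification and audit logging are not shown.

```rust
use std::time::{Duration, Instant};

/// Hypothetical result of the pre-failover safeguard checks.
#[derive(Debug, PartialEq, Eq)]
pub enum Gate {
    Proceed,
    Disabled,
    CoolingDown,
    NoQuorum,
}

/// Hypothetical safeguard state kept by the instance actor.
pub struct Safeguards {
    /// Manual override: `false` disables automatic failover.
    pub auto_failover_enabled: bool,
    /// Minimum time between two automatic failovers.
    pub cooldown: Duration,
    /// When the previous automatic failover happened, if any.
    pub last_failover: Option<Instant>,
}

impl Safeguards {
    /// Decides whether an automatic failover may proceed at `now`.
    pub fn evaluate(
        &self,
        now: Instant,
        observers_reporting_down: usize,
        total_observers: usize,
    ) -> Gate {
        if !self.auto_failover_enabled {
            return Gate::Disabled;
        }
        if let Some(last) = self.last_failover {
            if now.saturating_duration_since(last) < self.cooldown {
                return Gate::CoolingDown;
            }
        }
        // Assumption: a strict majority must agree that the primary is down,
        // so a single partitioned observer cannot trigger failover alone.
        if observers_reporting_down * 2 <= total_observers {
            return Gate::NoQuorum;
        }
        Gate::Proceed
    }
}
```

Checking the manual override first keeps the operator's decision authoritative regardless of cooldown or quorum state.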
See: proposals/toygres-improvements.md