[toygres] Automatic failover monitoring via instance actor #26

## Repository

[affandar/toygres](https://github.com/affandar/toygres)

## Concept

The instance actor (a long-running orchestration that manages the instance lifecycle) monitors the health of the primary and automatically triggers failover once consecutive health check failures exceed a configured threshold.

## Health Monitoring

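```rust
use std::time::Duration;

/// Tuning knobs for primary health monitoring.
pub struct HealthMonitorConfig {
    /// Time between health checks.
    pub check_interval: Duration,
    /// Consecutive failed checks tolerated before failover is considered.
    pub failure_threshold: u32,
    /// Maximum time a single check may take.
    pub check_timeout: Duration,
    /// Maximum replication lag (bytes) for a replica to be a failover target.
    pub max_eligible_lag_bytes: u64,
}

/// Result of one health check against the primary.
pub struct HealthCheck {
    pub connectivity: bool,
    pub query_responsive: bool,
    pub replication_healthy: bool,
    pub disk_space_ok: bool,
}
```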

## Instance Actor Integration

- Use durable timers for health check scheduling
- Track consecutive failures
- When threshold exceeded, select best replica and trigger automatic failover
- If no eligible replica, send critical alert
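
The failure-tracking steps above could be sketched roughly as follows. `FailureTracker` and `MonitorAction` are hypothetical names, not existing toygres types. The sketch assumes that reaching `failure_threshold` is enough to trigger failover (the issue says "exceeded", so the exact comparison is open), and it leaves out the durable timer because that depends on the orchestration framework.

```rust
/// Outcome of feeding one health check result into the tracker.
/// Hypothetical type, for illustration only.
#[derive(Debug, PartialEq)]
pub enum MonitorAction {
    /// Primary is healthy, or failures are still below the threshold.
    Continue,
    /// Threshold reached and a replica is eligible: fail over to it.
    Failover { target: String },
    /// Threshold reached but no replica is eligible: send a critical alert.
    CriticalAlert,
}

/// Hypothetical tracker for consecutive failed health checks.
#[derive(Default)]
pub struct FailureTracker {
    consecutive_failures: u32,
}

impl FailureTracker {
    /// Record one check result and decide what the actor should do next.
    /// `best_replica` is the already-selected failover target, if any.
    pub fn record(
        &mut self,
        healthy: bool,
        failure_threshold: u32,
        best_replica: Option<String>,
    ) -> MonitorAction {
        if healthy {
            // Any successful check resets the streak.
            self.consecutive_failures = 0;
            return MonitorAction::Continue;
        }
        self.consecutive_failures = self.consecutive_failures.saturating_add(1);
        // Assumption: the threshold is inclusive (>= triggers failover).
        if self.consecutive_failures < failure_threshold {
            return MonitorAction::Continue;
        }
        match best_replica {
            Some(target) => MonitorAction::Failover { target },
            None => MonitorAction::CriticalAlert,
        }
    }
}
```

In the instance actor, `record` would be called once per timer tick, with the counter kept in orchestration state so it survives replays.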

## Failover Target Selection

Among replicas that meet both eligibility criteria, select the one with the least lag:
- State == Streaming
- Replication lag within threshold (`max_eligible_lag_bytes`)
- Minimum lag among eligible replicas
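
A minimal sketch of this selection rule, assuming hypothetical `Replica` and `ReplicaState` types. Only the `Streaming` state comes from this issue; the other variants, and the inclusive lag comparison, are placeholders.

```rust
/// Hypothetical replica states; only `Streaming` is named in the issue.
#[derive(Debug, Clone, PartialEq)]
pub enum ReplicaState {
    Streaming,
    CatchingUp,
    Disconnected,
}

/// Hypothetical view of a replica as seen by the instance actor.
#[derive(Debug, Clone, PartialEq)]
pub struct Replica {
    pub name: String,
    pub state: ReplicaState,
    pub lag_bytes: u64,
}

/// Pick the streaming replica with the least lag, ignoring any replica
/// whose lag is above `max_eligible_lag_bytes`. Returns `None` when no
/// replica is eligible. Ties go to the first replica in the slice.
pub fn select_failover_target(
    replicas: &[Replica],
    max_eligible_lag_bytes: u64,
) -> Option<&Replica> {
    replicas
        .iter()
        .filter(|r| r.state == ReplicaState::Streaming)
        // Assumption: "within threshold" is inclusive.
        .filter(|r| r.lag_bytes <= max_eligible_lag_bytes)
        .min_by_key(|r| r.lag_bytes)
}
```

A `None` result corresponds to the "no eligible replica" case, where the actor sends a critical alert instead of failing over.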

## Safeguards

- Cooldown period between automatic failovers
- Manual override to disable automatic failover
- Quorum check to prevent false positives from network partition
- Notification to on-call before/during failover
- Audit log of all automatic failover decisions
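
The first three safeguards could be combined into a single gate that runs before any automatic failover, sketched below. `FailoverGuard` and `FailoverDecision` are hypothetical names. The quorum rule shown (a strict majority of independent observers must report the primary as down) is an assumption, since the issue does not define quorum. Notification and audit logging are left out.

```rust
use std::time::{Duration, Instant};

/// Result of the pre-failover safety check. Hypothetical type.
#[derive(Debug, PartialEq)]
pub enum FailoverDecision {
    Proceed,
    BlockedManualOverride,
    BlockedCooldown,
    BlockedNoQuorum,
}

/// Hypothetical guard state kept by the instance actor.
pub struct FailoverGuard {
    /// Manual override: false disables automatic failover.
    pub auto_failover_enabled: bool,
    /// Minimum time between two automatic failovers.
    pub cooldown: Duration,
    /// When the last automatic failover happened, if ever.
    pub last_failover: Option<Instant>,
}

impl FailoverGuard {
    pub fn evaluate(
        &self,
        now: Instant,
        observers_reporting_down: usize,
        total_observers: usize,
    ) -> FailoverDecision {
        if !self.auto_failover_enabled {
            return FailoverDecision::BlockedManualOverride;
        }
        if let Some(last) = self.last_failover {
            if now.saturating_duration_since(last) < self.cooldown {
                return FailoverDecision::BlockedCooldown;
            }
        }
        // Assumption: quorum means a strict majority of observers agree
        // that the primary is down, which guards against one partitioned
        // observer causing a false positive.
        if observers_reporting_down * 2 <= total_observers {
            return FailoverDecision::BlockedNoQuorum;
        }
        FailoverDecision::Proceed
    }
}
```

Inside a durable orchestration, `now` would likely come from the orchestration's deterministic clock rather than `Instant::now()`, so that replays reach the same decision. Every `FailoverDecision`, including the blocked ones, is a natural candidate for the audit log.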

See: `proposals/toygres-improvements.md`
