[toygres] Manual failover support #25

@affandar

Repository

affandar/toygres

Concept

Allow operators to manually trigger failover from primary to a designated replica. Useful for:

  • Planned maintenance on primary
  • Testing disaster recovery procedures
  • Upgrading primary with minimal downtime

API

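// Requires `use std::time::Duration;`
client.failover(FailoverRequest {
    instance_id: "pg-prod-1".into(),
    target_replica_id: "pg-prod-1-replica-1".into(), // replica to promote
    options: FailoverOptions {
        wait_for_sync: true,                    // pause writes and wait for the replica to catch up first
        sync_timeout: Duration::from_secs(300), // 5 minutes
        old_primary_action: OldPrimaryAction::ConvertToReplica, // see OldPrimaryAction below
    },
}).await?;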

pub enum OldPrimaryAction {
    ConvertToReplica,  // Convert old primary to replica of new primary
    StopAndRetain,     // Stop old primary but keep data
    Terminate,         // Terminate old primary completely
}
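
To make the shape of the request concrete, here is a minimal sketch of the types the call above implies. The field names come from the example; the derives, the `Default` values (copied from the example call), and the `validate` helper are assumptions for illustration, not part of the proposal.

```rust
use std::time::Duration;

/// What to do with the old primary once the replica has been promoted.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OldPrimaryAction {
    ConvertToReplica, // Convert old primary to replica of new primary
    StopAndRetain,    // Stop old primary but keep data
    Terminate,        // Terminate old primary completely
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FailoverOptions {
    pub wait_for_sync: bool,
    pub sync_timeout: Duration,
    pub old_primary_action: OldPrimaryAction,
}

// Assumption: the defaults simply mirror the values used in the example call.
impl Default for FailoverOptions {
    fn default() -> Self {
        Self {
            wait_for_sync: true,
            sync_timeout: Duration::from_secs(300),
            old_primary_action: OldPrimaryAction::ConvertToReplica,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FailoverRequest {
    pub instance_id: String,
    pub target_replica_id: String,
    pub options: FailoverOptions,
}

impl FailoverRequest {
    /// Hypothetical pre-flight check on the request itself (no I/O).
    pub fn validate(&self) -> Result<(), String> {
        if self.instance_id.is_empty() {
            return Err("instance_id must not be empty".to_string());
        }
        if self.target_replica_id.is_empty() {
            return Err("target_replica_id must not be empty".to_string());
        }
        if self.target_replica_id == self.instance_id {
            return Err("target replica must differ from the instance".to_string());
        }
        if self.options.wait_for_sync && self.options.sync_timeout.is_zero() {
            return Err("sync_timeout must be non-zero when wait_for_sync is set".to_string());
        }
        Ok(())
    }
}
```

Rejecting malformed requests before any orchestration starts keeps the failure cheap: nothing has been paused or fenced yet.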

Orchestration Flow

  1. Validate target replica exists and is healthy
  2. If wait_for_sync: pause writes, wait for replica to catch up
  3. Fence primary (prevent writes)
  4. Promote replica to primary
  5. Handle old primary per OldPrimaryAction
  6. Update DNS / connection routing
  7. Update instance metadata
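
The seven steps above can be sketched as a single function over a control-plane trait. This is a synchronous, std-only illustration of the ordering, not the proposed implementation: the `ControlPlane` trait, `FailoverPlan`, `FailoverError`, and the `Recorder` test double are all hypothetical names, and resuming writes after a sync timeout is an assumption, since the issue does not specify rollback behavior.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OldPrimaryAction { ConvertToReplica, StopAndRetain, Terminate }

/// Hypothetical flattened form of the request, borrowed for brevity.
pub struct FailoverPlan<'a> {
    pub primary: &'a str,
    pub target_replica: &'a str,
    pub wait_for_sync: bool,
    pub sync_timeout: Duration,
    pub old_primary_action: OldPrimaryAction,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FailoverError { ReplicaUnhealthy, SyncTimedOut }

/// Hypothetical control-plane operations, roughly one per step in the flow.
pub trait ControlPlane {
    fn replica_is_healthy(&mut self, replica: &str) -> bool;
    fn pause_writes(&mut self, primary: &str);
    fn resume_writes(&mut self, primary: &str);
    /// Returns false if the replica did not catch up within `timeout`.
    fn wait_for_catch_up(&mut self, replica: &str, timeout: Duration) -> bool;
    fn fence(&mut self, primary: &str);
    fn promote(&mut self, replica: &str);
    fn handle_old_primary(&mut self, primary: &str, action: OldPrimaryAction);
    fn update_routing(&mut self, new_primary: &str);
    fn update_metadata(&mut self, new_primary: &str);
}

pub fn run_failover<C: ControlPlane>(
    cp: &mut C,
    plan: &FailoverPlan<'_>,
) -> Result<(), FailoverError> {
    // Step 1: validate the target replica.
    if !cp.replica_is_healthy(plan.target_replica) {
        return Err(FailoverError::ReplicaUnhealthy);
    }
    // Step 2: optionally pause writes and wait for the replica to catch up.
    if plan.wait_for_sync {
        cp.pause_writes(plan.primary);
        if !cp.wait_for_catch_up(plan.target_replica, plan.sync_timeout) {
            // Assumption: abort and resume writes; the issue leaves this open.
            cp.resume_writes(plan.primary);
            return Err(FailoverError::SyncTimedOut);
        }
    }
    cp.fence(plan.primary);                                       // Step 3
    cp.promote(plan.target_replica);                              // Step 4
    cp.handle_old_primary(plan.primary, plan.old_primary_action); // Step 5
    cp.update_routing(plan.target_replica);                       // Step 6
    cp.update_metadata(plan.target_replica);                      // Step 7
    Ok(())
}

/// Test double that records the order in which steps are invoked.
pub struct Recorder {
    pub healthy: bool,
    pub catches_up: bool,
    pub steps: Vec<&'static str>,
}

impl ControlPlane for Recorder {
    fn replica_is_healthy(&mut self, _: &str) -> bool { self.steps.push("validate"); self.healthy }
    fn pause_writes(&mut self, _: &str) { self.steps.push("pause_writes"); }
    fn resume_writes(&mut self, _: &str) { self.steps.push("resume_writes"); }
    fn wait_for_catch_up(&mut self, _: &str, _: Duration) -> bool {
        self.steps.push("wait_for_catch_up");
        self.catches_up
    }
    fn fence(&mut self, _: &str) { self.steps.push("fence"); }
    fn promote(&mut self, _: &str) { self.steps.push("promote"); }
    fn handle_old_primary(&mut self, _: &str, _: OldPrimaryAction) { self.steps.push("handle_old_primary"); }
    fn update_routing(&mut self, _: &str) { self.steps.push("update_routing"); }
    fn update_metadata(&mut self, _: &str) { self.steps.push("update_metadata"); }
}
```

Putting the side effects behind a trait lets the step ordering be checked with a recording double, without a running database. A real orchestration would also need a failure policy for steps 3 through 7, which this sketch does not attempt.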

See: proposals/toygres-improvements.md

Metadata

Labels

toygres (Toygres test application)
