Task: Implement automatic rollback mechanism on health check failures
Description
Implement a comprehensive automatic rollback system that monitors application health during and after deployments, automatically reverting to the previous stable version when health checks fail. This critical safety mechanism protects production environments from broken deployments by continuously validating application health using configurable health check strategies (HTTP endpoints, TCP connections, custom scripts) and orchestrating seamless rollbacks when issues are detected.
Modern deployment strategies (rolling updates, blue-green, canary) require intelligent health validation to ensure applications remain available. This task builds the foundation for production-grade deployment safety by implementing:
- Multi-Strategy Health Checking: HTTP endpoint validation, TCP port checks, custom script execution, container status monitoring
- Configurable Health Policies: Define success criteria (status codes, response times, consecutive successes), failure thresholds (max retries, timeout durations)
- Automated Rollback Orchestration: Trigger rollback on health check failures, coordinate state restoration across deployment strategies, preserve previous deployment artifacts
- Health Check Persistence: Store health check results in database, track health history for post-deployment analysis, generate health trend reports
- Real-Time Notifications: Alert administrators via WebSocket, email, and Slack when health checks fail and rollbacks execute
- Deployment History: Maintain comprehensive deployment and rollback audit trail with state snapshots
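The threshold-based policy described above (consecutive successes to declare health, consecutive failures to trigger rollback) can be sketched as a small state tracker. This is only an illustration under stated assumptions: the class name `HealthThresholdTracker`, its method `record()`, and the `'healthy'`/`'unhealthy'`/`'pending'` labels are hypothetical and not part of Coolify.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not an existing Coolify class): counts consecutive
 * check results and compares them against the configured thresholds.
 */
final class HealthThresholdTracker
{
    private int $consecutiveSuccesses = 0;
    private int $consecutiveFailures = 0;

    public function __construct(
        private readonly int $successThreshold = 1, // consecutive successes needed
        private readonly int $failureThreshold = 3, // consecutive failures before rollback
    ) {
    }

    /**
     * Record one check result and return the resulting verdict:
     * 'healthy', 'unhealthy' (rollback candidate), or 'pending'.
     */
    public function record(bool $passed): string
    {
        if ($passed) {
            $this->consecutiveSuccesses++;
            $this->consecutiveFailures = 0; // a success breaks the failure streak
        } else {
            $this->consecutiveFailures++;
            $this->consecutiveSuccesses = 0; // a failure breaks the success streak
        }

        if ($this->consecutiveFailures >= $this->failureThreshold) {
            return 'unhealthy';
        }

        if ($this->consecutiveSuccesses >= $this->successThreshold) {
            return 'healthy';
        }

        return 'pending';
    }
}
```

A caller would feed each check result into `record()` and start a rollback only when it returns `'unhealthy'`, so a single transient failure does not revert a deployment.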
Integration with Existing Coolify Architecture:
- Extends ApplicationDeploymentJob with health check validation phases
- Integrates with the existing Server SSH execution infrastructure via the ExecuteRemoteCommand trait
- Uses Coolify's existing notification system for health check failure alerts
- Leverages Docker container inspection for health status validation
- Coordinates with proxy configuration updates (Nginx/Traefik) for traffic management
Integration with Enterprise Deployment System:
- Works with EnhancedDeploymentService (Task 32) for strategy-aware rollbacks
- Coordinates with CapacityManager (Task 26) for resource state restoration
- Uses health check data in deployment decision-making algorithms
- Integrates with resource monitoring for correlation between health and resource usage
Why this task is critical: Automatic rollback is the safety net that prevents catastrophic production failures. Without health-based rollbacks, broken deployments can take applications offline for extended periods while administrators manually diagnose and fix issues. Automated rollback restores the previous version as soon as the failure threshold is reached, without waiting for human intervention, minimizing downtime and customer impact. This turns deployments from high-risk operations requiring human supervision into reliable automated processes that self-correct when problems occur.
Acceptance Criteria
Core Functionality
Rollback Orchestration
Deployment Strategy Integration
Configuration & Policy
Persistence & Reporting
Notifications & Alerts
Error Handling & Edge Cases
Technical Details
File Paths
Service Layer (NEW):
app/Services/Enterprise/Deployment/HealthCheckService.php - Health check execution and validation
app/Services/Enterprise/Deployment/RollbackOrchestrator.php - Rollback coordination across strategies
app/Contracts/HealthCheckServiceInterface.php - Health check service interface
app/Contracts/RollbackOrchestratorInterface.php - Rollback orchestrator interface
Models (NEW):
app/Models/Enterprise/HealthCheckConfig.php - Health check configuration per application
app/Models/Enterprise/HealthCheckResult.php - Health check execution results
app/Models/Enterprise/DeploymentHistory.php - Deployment and rollback audit trail
app/Models/Enterprise/DeploymentSnapshot.php - State snapshots for rollback restoration
Jobs (ENHANCE EXISTING):
app/Jobs/ApplicationDeploymentJob.php - Enhance with health check validation phase
app/Jobs/HealthCheckMonitorJob.php - NEW: Scheduled health check monitoring post-deployment
Actions (NEW):
app/Actions/Deployment/ExecuteHealthCheck.php - Execute individual health check
app/Actions/Deployment/ExecuteRollback.php - Execute rollback for single deployment
app/Actions/Deployment/CreateDeploymentSnapshot.php - Capture deployment state before changes
app/Actions/Deployment/RestoreDeploymentSnapshot.php - Restore previous deployment state
Database Migrations:
database/migrations/2025_01_XX_create_health_check_configs_table.php
database/migrations/2025_01_XX_create_health_check_results_table.php
database/migrations/2025_01_XX_create_deployment_histories_table.php
database/migrations/2025_01_XX_create_deployment_snapshots_table.php
Tests:
tests/Unit/Enterprise/Deployment/HealthCheckServiceTest.php
tests/Unit/Enterprise/Deployment/RollbackOrchestratorTest.php
tests/Feature/Enterprise/Deployment/AutomaticRollbackTest.php
tests/Feature/Enterprise/Deployment/HealthCheckExecutionTest.php
Database Schema
health_check_configs table:
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('health_check_configs', function (Blueprint $table) {
            $table->id();
            $table->foreignId('application_id')->constrained()->cascadeOnDelete();
            $table->string('name')->nullable(); // User-defined name for this health check
            $table->enum('type', ['http', 'tcp', 'script', 'docker_container'])->default('http');

            // HTTP health check configuration
            $table->string('http_endpoint')->nullable(); // e.g., /health, /api/status
            $table->string('http_method')->default('GET'); // GET, POST, HEAD
            $table->json('http_expected_status_codes')->nullable(); // [200, 204]
            $table->integer('http_timeout_seconds')->default(10);
            $table->text('http_expected_body_contains')->nullable(); // Optional body validation
            $table->json('http_headers')->nullable(); // Custom headers

            // TCP health check configuration
            $table->integer('tcp_port')->nullable();
            $table->integer('tcp_timeout_seconds')->default(5);

            // Script health check configuration
            $table->text('script_command')->nullable(); // Shell command to execute
            $table->integer('script_timeout_seconds')->default(30);
            $table->integer('script_expected_exit_code')->default(0);

            // Docker container health check
            $table->boolean('use_docker_health_status')->default(false);

            // Health check policy
            $table->integer('success_threshold')->default(1); // Consecutive successes needed
            $table->integer('failure_threshold')->default(3); // Consecutive failures before rollback
            $table->integer('check_interval_seconds')->default(10); // Time between checks
            $table->integer('initial_delay_seconds')->default(30); // Wait before first check
            $table->integer('monitoring_duration_seconds')->default(300); // How long to monitor (5 min default)

            // Rollback policy
            $table->boolean('auto_rollback_enabled')->default(true);
            $table->boolean('notify_on_failure')->default(true);
            $table->json('notification_channels')->nullable(); // ['email', 'slack', 'discord']

            $table->boolean('is_active')->default(true);
            $table->timestamps();

            $table->index(['application_id', 'is_active']);
        });
    }

    public function down(): void
    {
        Schema::dropIfExists('health_check_configs');
    }
};
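To show how the HTTP columns of this table might be applied to a single response, here is a minimal sketch of a pass/fail decision. The function `httpCheckPasses()` and its parameters are hypothetical; the array keys mirror the column names in the migration above, and the fallback to `[200]` when no status codes are configured is an assumption, since the schema leaves that column nullable without specifying a default behavior.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decide whether one HTTP response satisfies a
 * health_check_configs row. $config uses the migration's column names,
 * with the JSON status code column already decoded to an array.
 */
function httpCheckPasses(array $config, int $statusCode, string $body, float $elapsedSeconds): bool
{
    // Assumption: an unset code list means "expect 200 only".
    $expectedCodes = $config['http_expected_status_codes'] ?? [200];
    $timeoutSeconds = $config['http_timeout_seconds'] ?? 10;
    $needle = $config['http_expected_body_contains'] ?? null;

    // A response slower than the configured timeout counts as a failure.
    if ($elapsedSeconds > $timeoutSeconds) {
        return false;
    }

    // The status code must be one of the expected codes.
    if (!in_array($statusCode, $expectedCodes, true)) {
        return false;
    }

    // Optional body validation: only applied when a non-empty needle is set.
    if ($needle !== null && $needle !== '' && !str_contains($body, $needle)) {
        return false;
    }

    return true;
}
```

Keeping this decision in a pure function, separate from the code that performs the request, makes the policy easy to unit test without network access.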
health_check_results table:
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::create('health_check_results', function (Blueprint $table) {
            $table->id();
            $table->foreignId('health_check_config_id')->constrained()->cascadeOnDelete();
            $table->foreignId('deployment_id')->nullable()->constrained('application_deployments')->nullOnDelete();
            $table->foreignId('application_id')->constrained()->cascadeOnDelete();
            $table->foreignId('server_id')->nullable()->constrained()->nullOnDelete();
            $table->enum('status', ['success', 'failure', 'timeout', 'error'])->index();