feat: resilient background job retry & monitoring (bounty #130) #936

Open
alexanderxfgl-bit wants to merge 1 commit into rohitdash08:main from alexanderxfgl-bit:bounty/130-final

Conversation

@alexanderxfgl-bit

Bounty Claim: $250 — Issue #130

Implementation

Production-ready background job queue with retry, exponential backoff, and dead-letter queue support.

Backend Changes

Model (app/models.py):

  • JobStatus enum: pending → running → succeeded/failed → retrying → dead
  • BackgroundJob model with: task_type, payload (JSON), status, attempts, max_attempts, last_error, result, next_retry_at, scheduled_for
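A minimal sketch of how the status enum and model fields listed above might fit together, written with plain dataclasses rather than the project's actual ORM. The transition table, the `can_transition` helper, and the default `max_attempts = 3` are assumptions made for illustration; they are not taken from `app/models.py`.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Any, Optional


class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    RETRYING = "retrying"
    DEAD = "dead"


# Assumed transition table: read off the arrow diagram, plus the manual
# retry path (failed/dead -> pending). The real model may differ.
ALLOWED_TRANSITIONS = {
    JobStatus.PENDING: {JobStatus.RUNNING},
    JobStatus.RUNNING: {JobStatus.SUCCEEDED, JobStatus.FAILED},
    JobStatus.FAILED: {JobStatus.RETRYING, JobStatus.DEAD, JobStatus.PENDING},
    JobStatus.RETRYING: {JobStatus.RUNNING, JobStatus.DEAD},
    JobStatus.DEAD: {JobStatus.PENDING},
    JobStatus.SUCCEEDED: set(),
}


def can_transition(current: JobStatus, target: JobStatus) -> bool:
    """Return True if the assumed table allows moving current -> target."""
    return target in ALLOWED_TRANSITIONS[current]


@dataclass
class BackgroundJob:
    task_type: str
    payload: Optional[dict] = None          # stored as JSON in the PR
    status: JobStatus = JobStatus.PENDING
    attempts: int = 0
    max_attempts: int = 3                   # assumed default, not from the PR
    last_error: Optional[str] = None
    result: Optional[Any] = None
    next_retry_at: Optional[datetime] = None
    scheduled_for: Optional[datetime] = None
```

Keeping the allowed transitions in one table makes it easy to reject invalid moves (for example, restarting a succeeded job) in a single place.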

Service (app/services/jobs.py):

  • enqueue_job() — create new jobs with optional scheduling
  • mark_running() / mark_succeeded() / mark_failed() — lifecycle transitions
  • _compute_backoff() — exponential backoff (30s base, 2x multiplier, 1hr cap)
  • get_jobs_due_for_retry() / get_pending_jobs() — query helpers for workers
  • mark_pending() — manual retry of dead/failed jobs
  • Dead-letter queue: jobs exceeding max_attempts move to DEAD status
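The backoff parameters above (30s base, 2x multiplier, 1hr cap) could be computed roughly as follows. This is a sketch, not the code in `app/services/jobs.py`: the function names are hypothetical, the first retry is assumed to wait exactly the base delay, and a job is assumed to go dead once `attempts` reaches `max_attempts`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

BASE_DELAY_S = 30      # base delay from the PR description
MULTIPLIER = 2         # 2x multiplier
MAX_DELAY_S = 3600     # 1 hour cap


def compute_backoff(attempts: int) -> int:
    """Seconds to wait before the next retry after `attempts` failures.

    Assumption: the first failure (attempts == 1) waits the base delay.
    """
    exponent = max(attempts - 1, 0)
    return min(BASE_DELAY_S * MULTIPLIER ** exponent, MAX_DELAY_S)


def next_state_after_failure(
    attempts: int, max_attempts: int, now: datetime
) -> Tuple[str, Optional[datetime]]:
    """Return (status, next_retry_at) once a failure has been recorded.

    Assumption: `attempts` already includes the attempt that just failed,
    and the job is dead-lettered when it reaches `max_attempts`.
    """
    if attempts >= max_attempts:
        return "dead", None
    return "retrying", now + timedelta(seconds=compute_backoff(attempts))


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
delays = [compute_backoff(n) for n in range(1, 9)]
# delays == [30, 60, 120, 240, 480, 960, 1920, 3600]
```

Under these assumptions the cap is reached on the eighth failure, since 30 × 2^7 = 3840 seconds exceeds one hour.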

Routes (app/routes/jobs.py):

  • POST /jobs — enqueue job with validation
  • GET /jobs — list with pagination + status/task_type filters
  • GET /jobs/<id> — get job details
  • POST /jobs/<id>/retry — manual retry of failed/dead jobs
  • DELETE /jobs/<id> — remove job
  • GET /jobs/dead — dead-letter queue listing
  • GET /jobs/stats — job statistics (total/pending/running/succeeded/failed/dead)
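The logic behind the stats, retry, and list endpoints can be sketched independently of the web framework. The helpers below are hypothetical and use plain status strings; the `page` and `per_page` parameter names are assumptions, since the PR description does not name the pagination parameters.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence

STATS_KEYS = ("pending", "running", "succeeded", "failed", "dead")
RETRYABLE = {"failed", "dead"}  # manual retry applies to failed/dead jobs


def job_stats(statuses: Iterable[str]) -> Dict[str, int]:
    """Count jobs per status, in the shape described for GET /jobs/stats."""
    counts = Counter(statuses)
    stats = {"total": sum(counts.values())}
    stats.update({key: counts.get(key, 0) for key in STATS_KEYS})
    return stats


def can_retry(status: str) -> bool:
    """Only failed or dead jobs may be retried manually."""
    return status in RETRYABLE


def paginate(items: Sequence, page: int = 1, per_page: int = 20) -> List:
    """Return one 1-indexed page of items (parameter names are assumed)."""
    start = (max(page, 1) - 1) * per_page
    return list(items[start:start + per_page])


stats = job_stats(["pending", "failed", "dead", "dead", "succeeded"])
```

In this sketch a retry request for a pending job would be rejected because `can_retry("pending")` is false, which matches the retry validation covered by the tests.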

Schema (app/db/schema.sql):

  • background_jobs table with job_status enum type
  • Optimized partial indexes for retry and pending queries
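To illustrate what partial indexes for the retry and pending queries might look like, the sketch below uses Python's built-in `sqlite3` as a stand-in for PostgreSQL; both support `CREATE INDEX ... WHERE`. The index names and indexed columns are assumptions, and the `job_status` enum type is replaced by a plain text column because SQLite has no enum types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE background_jobs (
        id            INTEGER PRIMARY KEY,
        task_type     TEXT NOT NULL,
        status        TEXT NOT NULL DEFAULT 'pending',
        next_retry_at TEXT,
        scheduled_for TEXT
    );

    -- Hypothetical partial index: only rows waiting for a retry.
    CREATE INDEX idx_jobs_retry_due
        ON background_jobs (next_retry_at)
        WHERE status = 'retrying';

    -- Hypothetical partial index: only rows that have not started yet.
    CREATE INDEX idx_jobs_pending
        ON background_jobs (scheduled_for)
        WHERE status = 'pending';
    """
)

# List the indexes that now exist on the table.
index_names = sorted(
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'background_jobs'"
    )
)
```

Because each index covers only the rows in one status, it stays small even when most of the table consists of succeeded jobs.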

Tests

11 tests covering:

  • CRUD operations (create, list, get, delete)
  • Input validation (missing task_type, invalid types)
  • Scheduled job creation
  • Failed job retry workflow
  • Retry validation (pending jobs cannot be retried)
  • Job statistics
  • Filtering by task_type
  • Dead-letter queue listing

Closes #130

…#130)

Implements a production-ready background job queue with:
- BackgroundJob model with status tracking (pending/running/succeeded/failed/retrying/dead)
- Exponential backoff retry logic (30s base, 2x multiplier, 1hr cap)
- Dead-letter queue for permanently failed jobs
- REST API: POST/GET/DELETE /jobs, GET /jobs/stats, GET /jobs/dead, POST /jobs/<id>/retry
- Scheduled job execution via scheduled_for field
- PostgreSQL schema with optimized indexes for retry/pending queries
- Comprehensive test suite (11 tests covering CRUD, retry, DLQ, validation)

Closes rohitdash08#130

Development

Successfully merging this pull request may close these issues.

Resilient background job retry & monitoring