Add periodic cleanup for orphaned Celery pidbox queues #3085
majamassarini wants to merge 1 commit into packit:main
Conversation
Force-pushed: 98dfdbf to 69a55e2
Code Review
This pull request implements a maintenance task to clean up orphaned Celery pidbox reply queues in Redis by assigning a TTL to keys without one. The changes include a centralized Redis configuration utility, a new Prometheus metric for monitoring total Redis keys, and corresponding unit tests. Review feedback identifies a typo in a Redis environment variable and suggests using Redis pipelines to optimize the cleanup process by batching network operations.
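The pipeline optimization the review suggests could look roughly like the sketch below: instead of issuing a TTL check and an EXPIRE per key, each batch of scanned keys costs one round-trip to read all TTLs and one to expire the orphans. This is a hypothetical illustration assuming a redis-py style client; the function names, batch size, and TTL value are not taken from the PR.

```python
# Hypothetical sketch of the reviewer's pipeline suggestion.
# Assumes a redis-py style client (scan_iter, pipeline, ttl, expire).

def expire_orphans_pipelined(redis_client, match="*.reply.celery.pidbox",
                             ttl=3600, batch_size=500):
    """Expire pidbox reply queues that have no TTL, in batches."""
    expired = 0
    batch = []
    for key in redis_client.scan_iter(match=match, count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            expired += _expire_batch(redis_client, batch, ttl)
            batch = []
    if batch:
        expired += _expire_batch(redis_client, batch, ttl)
    return expired


def _expire_batch(redis_client, keys, ttl):
    # One round-trip to read every TTL in the batch...
    with redis_client.pipeline(transaction=False) as pipe:
        for key in keys:
            pipe.ttl(key)
        ttls = pipe.execute()
    # ...and one round-trip to expire only keys with no TTL (-1).
    orphans = [k for k, t in zip(keys, ttls) if t == -1]
    with redis_client.pipeline(transaction=False) as pipe:
        for key in orphans:
            pipe.expire(key, ttl)
        pipe.execute()
    return len(orphans)
```

With ~1,700 orphaned keys, this reduces the network cost from thousands of round-trips to a handful, at the price of slightly more code.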
Force-pushed: dc8a923 to 14d3b83
Build succeeded. ✔️ pre-commit SUCCESS in 1m 48s
Problem: Celery workers create pidbox (control) reply queues for worker management commands (inspect, ping, stats, etc.). These queues accumulate when workers crash or restart improperly, leading to:
- 1,693+ orphaned *.reply.celery.pidbox keys in production
- Keys with no TTL (TTL = -1) that persist indefinitely

Root cause: Celery's Redis transport does not provide a native way to set a TTL on pidbox reply queues when they are created. These queues are internal implementation details of Celery's broadcast/control mechanism, and there is no configuration option to expire them automatically.

Solution: heartbeat cleanup task. Since we cannot tell Celery to set a TTL on pidbox messages natively, we implement a periodic heartbeat task that:
- Runs nightly at 12:30 AM via Celery beat
- Scans for *.reply.celery.pidbox keys without a TTL
- Sets a 1-hour expiration on orphaned queues
- Tracks the total number of Redis keys via Prometheus for monitoring

Related to: packit/deployment#701
Should fix: packit#2983

Assisted-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Assisted-By: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
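The scan-and-expire step of the heartbeat task described above can be sketched as follows. This is a minimal illustration assuming a redis-py style client; the function name, constant names, and the 1-hour TTL constant mirror the description but are not the PR's actual code.

```python
# Minimal sketch of the nightly cleanup described in the commit message.
# Assumes a redis-py style client; names are illustrative.

PIDBOX_PATTERN = "*.reply.celery.pidbox"
ORPHAN_TTL_SECONDS = 3600  # 1-hour expiration, per the PR description


def cleanup_pidbox_queues(redis_client, pattern=PIDBOX_PATTERN,
                          ttl=ORPHAN_TTL_SECONDS):
    """Set a TTL on pidbox reply queues that have none.

    Intended to run from a Celery beat schedule (e.g. nightly at
    12:30 AM); returns the number of keys that were given a TTL.
    """
    expired = 0
    # SCAN iterates incrementally, so a large keyspace is not blocked.
    for key in redis_client.scan_iter(match=pattern):
        # TTL of -1 means the key exists but never expires.
        if redis_client.ttl(key) == -1:
            redis_client.expire(key, ttl)
            expired += 1
    return expired
```

Orphaned queues are only expired, not deleted outright, so a reply that is still being awaited by a live worker survives for another hour before Redis reclaims it.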
Force-pushed: 14d3b83 to d63d7dc
Build succeeded. ✔️ pre-commit SUCCESS in 1m 50s