Add configurable delay for sandbox cleanup after automation runs complete #33
jpshackelford wants to merge 3 commits into
…lete

This implements issue #31 by adding a SANDBOX_CLEANUP_DELAY_MINS setting that allows sandboxes to remain available for inspection after automation runs complete.

Changes:
- Add sandbox_cleanup_delay_mins config setting (default: 60 minutes)
- Add cleanup_at column to AutomationRun model for scheduling cleanup
- Update router.py complete_run to set cleanup_at instead of cleaning up immediately
- Update watchdog.py to use delayed cleanup via cleanup_at
- Add cleanup_pending_sandboxes function to process cleanup after the delay
- Create database migration for the cleanup_at column
- Update AutomationRunResponse schema to include the cleanup_at field
- Add comprehensive tests for delayed cleanup functionality

When sandbox_cleanup_delay_mins is 0, cleanup happens immediately (legacy behavior). When it is greater than 0, cleanup is scheduled for that many minutes after the run completes.

The SDK's OpenHandsCloudWorkspace does not directly control sandbox deletion: it calls the automation service's callback endpoint, and the automation service decides whether to clean up based on the run's keep_alive flag and the cleanup delay.

Closes #31
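The scheduling rule described in this commit message can be summarized as a small decision function. This is a minimal sketch, not the PR's code: `load_cleanup_delay_mins`, `compute_cleanup_at`, and `on_run_complete` are hypothetical names, the real service reads the setting through `ServiceSettings` rather than `os.environ` directly (the variable name `AUTOMATION_SANDBOX_CLEANUP_DELAY_MINS` is taken from the PR summary), and both encoding "delete now" as `None` and treating `keep_alive` as "never clean up" are assumptions.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Mapping, Optional, Tuple


def load_cleanup_delay_mins(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical stand-in for ServiceSettings: read the delay, default 60."""
    source = os.environ if env is None else env
    return int(source.get("AUTOMATION_SANDBOX_CLEANUP_DELAY_MINS", "60"))


def compute_cleanup_at(completed_at: datetime, delay_mins: int) -> Optional[datetime]:
    """Return the deletion deadline, or None to mean "delete immediately"."""
    if delay_mins == 0:
        return None  # legacy behavior: no deferred cleanup
    return completed_at + timedelta(minutes=delay_mins)


def on_run_complete(
    completed_at: datetime, keep_alive: bool, delay_mins: int
) -> Tuple[str, Optional[datetime]]:
    """Decide what the completion callback should do with the sandbox."""
    if keep_alive:
        return ("keep", None)  # assumption: keep_alive runs are never cleaned up
    cleanup_at = compute_cleanup_at(completed_at, delay_mins)
    if cleanup_at is None:
        return ("delete_now", None)
    return ("schedule", cleanup_at)


if __name__ == "__main__":
    done = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    print(on_run_complete(done, keep_alive=False, delay_mins=load_cleanup_delay_mins({})))
```

Keeping the decision in a pure function makes the delay = 0 and `keep_alive` cases easy to unit-test without a database or a running sandbox.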
🚀 Deploy Preview PR Created/Updated

A deploy preview has been created/updated for this PR.

Deploy PR: https://github.com/OpenHands/deploy/pull/3638

Once the deploy PR's CI passes, the automation service will be deployed to the feature environment.
- Fix unbound api_key variable in watchdog.py (can't clean up without an API key)
- Add assertions for sandbox_id and completed_at type narrowing
- Fix test patch targets: use automation.config.get_settings instead of automation.router.get_settings
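The first two fixes in this commit follow a common Python pattern: return early when a required credential is missing, and use `assert` to narrow `Optional` attributes before using them. A minimal sketch under assumptions: `RunRecord` and `plan_cleanup` are hypothetical stand-ins, not the actual `watchdog.py` code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Tuple


@dataclass
class RunRecord:
    """Hypothetical stand-in for an AutomationRun row with nullable columns."""
    sandbox_id: Optional[str]
    completed_at: Optional[datetime]


def plan_cleanup(
    run: RunRecord, api_key: Optional[str], delay_mins: int
) -> Optional[Tuple[str, datetime]]:
    """Return (sandbox_id, deadline), or None if the sandbox cannot be deleted."""
    # Without an API key there is no way to call the delete endpoint, so bail
    # out before any code path could touch an unset credential.
    if api_key is None:
        return None
    # These asserts narrow Optional[...] to concrete types for mypy/pyright and
    # fail loudly if a run reaches this point in an unexpected state.
    assert run.sandbox_id is not None, "completed run should have a sandbox"
    assert run.completed_at is not None, "completed run should have completed_at"
    return run.sandbox_id, run.completed_at + timedelta(minutes=delay_mins)
```

Note that `assert` statements are stripped when Python runs with `-O`, so they document and check invariants during development rather than replacing real validation.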
Updated implementation ready to merge 🟢

This PR was written against an older repo structure ( Changes ported to current
| File | Change |
|---|---|
| `openhands/automation/config.py` | `sandbox_cleanup_delay_mins: int = 60` on `ServiceSettings` |
| `openhands/automation/models.py` | `cleanup_at` column on `AutomationRun` |
| `openhands/automation/router.py` | `complete_run` sets `cleanup_at` for deferred deletion (or deletes immediately if delay = 0) |
| `openhands/automation/watchdog.py` | `_compute_cleanup_at`, `cleanup_pending_sandboxes`, updated `_verify_and_mark_run` and loop |
| `migrations/versions/009_add_cleanup_at.py` | New migration (current main is at 008) |
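The watchdog side of the table above amounts to a periodic scan for runs whose `cleanup_at` deadline has passed. The sketch below illustrates that idea with an in-memory SQLite database from the standard library; it is not the PR's `cleanup_pending_sandboxes` or its migration. The table layout, the index name, storing `cleanup_at` as ISO-8601 UTC text, and skipping `keep_alive` runs are all assumptions, and `delete_sandbox` is a hypothetical callback standing in for the real sandbox API.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Callable, List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical minimal table with only the columns this sketch needs.
    conn.execute(
        "CREATE TABLE automation_runs ("
        " id INTEGER PRIMARY KEY,"
        " sandbox_id TEXT NOT NULL,"
        " keep_alive INTEGER NOT NULL DEFAULT 0)"
    )
    # Analogue of the migration: a nullable cleanup_at column plus an index,
    # so the scan below can find due runs without reading every row.
    conn.execute("ALTER TABLE automation_runs ADD COLUMN cleanup_at TEXT")
    conn.execute(
        "CREATE INDEX ix_automation_runs_cleanup_at ON automation_runs (cleanup_at)"
    )


def cleanup_pending(
    conn: sqlite3.Connection,
    now: datetime,
    delete_sandbox: Callable[[str], None],
) -> List[int]:
    """Delete sandboxes whose cleanup_at has passed; return the affected run ids."""
    # Timestamps are stored as ISO-8601 UTC strings, so string comparison
    # matches chronological order.
    rows = conn.execute(
        "SELECT id, sandbox_id FROM automation_runs"
        " WHERE cleanup_at IS NOT NULL AND cleanup_at <= ? AND keep_alive = 0"
        " ORDER BY id",
        (now.astimezone(timezone.utc).isoformat(),),
    ).fetchall()
    cleaned: List[int] = []
    for run_id, sandbox_id in rows:
        delete_sandbox(sandbox_id)
        # Clear the deadline so the next watchdog pass does not pick this run again.
        conn.execute(
            "UPDATE automation_runs SET cleanup_at = NULL WHERE id = ?", (run_id,)
        )
        cleaned.append(run_id)
    conn.commit()
    return cleaned
```

Clearing `cleanup_at` after each delete keeps repeated scans from reprocessing the same run; a production version would also need a policy for delete calls that fail partway through a batch.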
Test results
684 passed, 0 failed, 32 skipped
(200 errors because Docker/testcontainers are not available in the CI sandbox; these errors are pre-existing)
To apply
Someone with push access to this repo (e.g. @robamesbury or @rohitMalhotra123) can apply the patch:
```bash
git checkout -b feat/sandbox-cleanup-delay main
git apply automation-cleanup-delay.patch
git push origin feat/sandbox-cleanup-delay
```

Or ping @rohitMalhotra123, who greenlit this PR; it is ready to merge once pushed.
Linear: APP-2287
This comment was created by an AI agent (OpenHands) on behalf of Rajiv Shah.
Resolve package relocation conflicts and port delayed sandbox cleanup to the new backend-based watchdog flow.

Co-authored-by: openhands <openhands@all-hands.dev>
Summary
This PR implements a configurable sandbox cleanup delay, as requested in issue #31. Sandboxes can now remain available for inspection after automation runs complete, making it easier to debug automation failures.
Changes

Configuration
- Add `AUTOMATION_SANDBOX_CLEANUP_DELAY_MINS` environment variable (default: 60 minutes)

Database
- Add `cleanup_at` column to the `automation_runs` table (nullable DateTime with index)
- Migration `003_add_cleanup_at.py`

Code Changes
- `automation/config.py`: add `sandbox_cleanup_delay_mins` setting
- `automation/models.py`: add `cleanup_at` field to the `AutomationRun` model
- `automation/router.py`: update `complete_run` to set `cleanup_at` instead of cleaning up immediately when delay > 0
- `automation/watchdog.py`:
  - add `_compute_cleanup_at` helper function
  - update `_verify_and_mark_run` to set `cleanup_at` instead of cleaning up immediately
  - add `cleanup_pending_sandboxes` function that processes runs past their cleanup deadline
  - update `watchdog_loop` to call the cleanup scanner each interval
- `automation/schemas.py`: add `cleanup_at` field to `AutomationRunResponse`

Tests
- `_compute_cleanup_at`
- `cleanup_pending_sandboxes` function
- `complete_run` endpoint cleanup behavior

SDK Behavior Note
Regarding the question in the issue comments:
Answer: The SDK's `OpenHandsCloudWorkspace` does not directly delete the sandbox. When the context manager exits, it calls the automation service's callback endpoint (`/runs/{run_id}/complete`), and the automation service decides whether to clean up based on:
- the `keep_alive` flag (set on the `AutomationRun` model)
- the `sandbox_cleanup_delay_mins` setting

So the SDK's `keep_alive` parameter in `OpenHandsCloudWorkspace` is not directly related to sandbox deletion control; that is handled entirely by the automation service.

Example Usage
Testing
The unit tests pass with the new behavior. Integration tests require Docker and were not run in this environment.
Closes #31
This PR was created by an AI assistant (OpenHands) on behalf of a user.