-
Notifications
You must be signed in to change notification settings - Fork 2
Description
Problem
The renew_work_item_lock provider contract specifies that:
Ok(ExecutionState::Running)→ lock was extended, worker should continueOk(ExecutionState::Terminal { status })→ lock was NOT extended, worker should trigger cancellationOk(ExecutionState::Missing)→ lock was NOT extended, worker should trigger cancellationErr(ProviderError)→ only for invalid/expired lock tokens
However, the current provider validation tests only verify that the correct ExecutionState is returned — they do not verify that the lock was actually NOT extended for Terminal/Missing states.
Current Gap
In duroxide-pg-opt, all 77 provider validation tests were passing even when our renew_work_item_lock implementation was incorrectly extending the lock for all states (Running, Terminal, and Missing). We only discovered the bug through code review.
The existing tests like test_renew_returns_terminal_when_orchestration_completed verify the return value but don't check that the lock duration wasn't extended.
Proposed Solution
Add a new validation test (or extend existing ones) that:
- Fetches a work item with a short lock timeout (e.g., 2 seconds)
- Marks the orchestration as terminal (Completed/Failed)
- Calls
renew_work_item_lock- should returnTerminalwithout extending - Waits for original lock timeout to elapse
- Attempts to fetch the work item again - should succeed (proving lock was NOT extended)
- Verify the lock would have been extended if this was a Running state (control test)
Example pseudocode:
#[tokio::test]
async fn test_renew_terminal_does_not_extend_lock() {
let factory = TestProviderFactory::new();
let provider = factory.create_provider().await;
let lock_timeout = Duration::from_secs(2);
// Setup: Create orchestration, enqueue activity, fetch work item
let (work_item, lock_token, _) = provider
.fetch_work_item(lock_timeout, Duration::ZERO)
.await?
.unwrap();
// Mark orchestration as terminal
complete_orchestration(&provider, &work_item.instance).await;
// Call renew - should return Terminal but NOT extend lock
let state = provider
.renew_work_item_lock(&lock_token, Duration::from_secs(30))
.await?;
assert!(matches!(state, ExecutionState::Terminal { .. }));
// Wait for ORIGINAL lock timeout (2s) - not the 30s renewal
tokio::time::sleep(lock_timeout + Duration::from_millis(100)).await;
// If lock was NOT extended, another fetch should succeed now
let result = provider
.fetch_work_item(lock_timeout, Duration::ZERO)
.await?;
assert!(result.is_some(), "Lock should have expired at original timeout, not been extended");
}Impact
Without this test, provider implementations can incorrectly extend locks for Terminal/Missing states, which could cause:
- Activities running after orchestration cancellation (wasting resources)
- Delayed cleanup of work items for terminated orchestrations
- Subtle race conditions in cancellation flows