Skip to content

Provider validation: add test to verify renew_work_item_lock does NOT extend lock for Terminal/Missing states #36

@affandar

Description

@affandar

Problem

The renew_work_item_lock provider contract specifies that:

  • Ok(ExecutionState::Running) → lock was extended, worker should continue
  • Ok(ExecutionState::Terminal { status }) → lock was NOT extended, worker should trigger cancellation
  • Ok(ExecutionState::Missing) → lock was NOT extended, worker should trigger cancellation
  • Err(ProviderError) → only for invalid/expired lock tokens

However, the current provider validation tests only verify that the correct ExecutionState is returned — they do not verify that the lock was actually NOT extended for Terminal/Missing states.

Current Gap

In duroxide-pg-opt, all 77 provider validation tests were passing even when our renew_work_item_lock implementation was incorrectly extending the lock for all states (Running, Terminal, and Missing). We only discovered the bug through code review.

The existing tests like test_renew_returns_terminal_when_orchestration_completed verify the return value but don't check that the lock duration wasn't extended.

Proposed Solution

Add a new validation test (or extend existing ones) that:

  1. Fetches a work item with a short lock timeout (e.g., 2 seconds)
  2. Marks the orchestration as terminal (Completed/Failed)
  3. Calls renew_work_item_lock - should return Terminal without extending
  4. Waits for original lock timeout to elapse
  5. Attempts to fetch the work item again - should succeed (proving lock was NOT extended)
  6. Verify the lock would have been extended if this was a Running state (control test)

Example pseudocode:

#[tokio::test]
async fn test_renew_terminal_does_not_extend_lock() {
    let factory = TestProviderFactory::new();
    let provider = factory.create_provider().await;
    let lock_timeout = Duration::from_secs(2);
    
    // Setup: Create orchestration, enqueue activity, fetch work item
    let (work_item, lock_token, _) = provider
        .fetch_work_item(lock_timeout, Duration::ZERO)
        .await?
        .unwrap();
    
    // Mark orchestration as terminal
    complete_orchestration(&provider, &work_item.instance).await;
    
    // Call renew - should return Terminal but NOT extend lock
    let state = provider
        .renew_work_item_lock(&lock_token, Duration::from_secs(30))
        .await?;
    assert!(matches!(state, ExecutionState::Terminal { .. }));
    
    // Wait for ORIGINAL lock timeout (2s) - not the 30s renewal
    tokio::time::sleep(lock_timeout + Duration::from_millis(100)).await;
    
    // If lock was NOT extended, another fetch should succeed now
    let result = provider
        .fetch_work_item(lock_timeout, Duration::ZERO)
        .await?;
    assert!(result.is_some(), "Lock should have expired at original timeout, not been extended");
}

Impact

Without this test, provider implementations can incorrectly extend locks for Terminal/Missing states, which could cause:

  • Activities running after orchestration cancellation (wasting resources)
  • Delayed cleanup of work items for terminated orchestrations
  • Subtle race conditions in cancellation flows

Metadata

Metadata

Assignees

No one assigned

    Labels

    duroxide-pgIssues reported by duroxide-pg-opt provider

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions