@torosent torosent commented Dec 2, 2025

Summary

This PR fixes an issue where successful activities and orchestrations were incorrectly reporting HTTP 500 response codes in Application Insights telemetry when Distributed Tracing v2 was enabled.

Fixes: Azure/azure-functions-durable-extension#3199

Problem

When an exception is thrown within an activity function but caught and handled via try-catch (for example, logging it then continuing), the System.Diagnostics.Activity span's status was being left in an Error state. The telemetry module (WebJobsTelemetryModule/DurableTelemetryModule) translates ActivityStatusCode.Error into HTTP 500, resulting in misleading telemetry where:

  • Response Code = 500
  • Succeeded = True

This was confusing for users monitoring their applications, as the telemetry falsely indicated failures for operations that actually succeeded.
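
A minimal sketch of how this state can arise, using only System.Diagnostics.Activity. The names StaleStatusDemo, RunUserCode, and StatusAfterSuccess are hypothetical and are not part of DurableTask.Core. The sketch assumes that some instrumentation marks the span as Error when the exception is caught, since catching an exception does not change the span status by itself. Note that the .NET enum member is spelled ActivityStatusCode.Ok.

```csharp
using System;
using System.Diagnostics;

public static class StaleStatusDemo
{
    // Hypothetical stand-in for user activity code that recovers from an exception.
    public static string RunUserCode(Activity span)
    {
        try
        {
            throw new InvalidOperationException("transient failure");
        }
        catch (InvalidOperationException ex)
        {
            // Assumed instrumentation: the handled error is recorded on the span.
            span.SetStatus(ActivityStatusCode.Error, ex.Message);
        }

        // The function recovers and returns a normal result.
        return "ok";
    }

    // Returns the final span status, with or without the reset on success.
    public static ActivityStatusCode StatusAfterSuccess(bool resetOnSuccess)
    {
        using var span = new Activity("hypothetical-activity-span");
        span.Start();

        string result = RunUserCode(span);

        if (resetOnSuccess && result == "ok")
        {
            // The kind of reset this PR describes: success overrides the stale Error.
            span.SetStatus(ActivityStatusCode.Ok);
        }

        span.Stop();
        return span.Status;
    }
}
```

Without the reset, StatusAfterSuccess(false) returns Error even though the user code returned normally, which is the mismatch that surfaces as Response Code = 500 with Succeeded = True.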

Root Cause

The dispatchers in DurableTask.Core were setting ActivityStatusCode.Error when exceptions occurred (which is correct), but they never reset the status to OK when the activity/orchestration completed successfully. This meant any intermediate error status from:

  1. Caught and handled exceptions in user code
  2. Custom instrumentation that sets error status
  3. Resilience pipelines (retry logic, etc.)

...would persist to the final telemetry even when the overall operation succeeded.

Solution

The fix ensures that all dispatcher paths explicitly set ActivityStatusCode.Ok for successful completions, overriding any stale error status from intermediate exception handling.

Changes

  1. TaskActivityDispatcher.cs: Before stopping the trace activity, explicitly set ActivityStatusCode.Ok when the response is a TaskCompletedEvent (successful completion).

  2. TaskOrchestrationDispatcher.cs: Added a centralized SetOrchestrationActivityStatus() helper that sets the appropriate status based on orchestration outcome:

    • Completed / ContinuedAsNew → ActivityStatusCode.Ok
    • Failed / Terminated → ActivityStatusCode.Error

  3. TraceHelper.cs: Updated EndActivitiesForProcessingEntityInvocation() to set explicit Ok/Error status for entity operation spans.
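
The outcome-to-status mapping in item 2 can be sketched as follows. This is an illustration under stated assumptions, not the helper added in the PR: SketchOutcome is a local stand-in for the orchestration status type, OrchestrationStatusSketch.Apply is a hypothetical name, and leaving the status untouched for outcomes the PR does not mention is an assumption.

```csharp
#nullable enable
using System.Diagnostics;

// Local stand-in for the orchestration outcomes named in this PR (hypothetical type).
public enum SketchOutcome { Running, Completed, ContinuedAsNew, Failed, Terminated }

public static class OrchestrationStatusSketch
{
    // Maps a final orchestration outcome onto the span status in one place.
    public static void Apply(Activity? span, SketchOutcome outcome, string? failureReason = null)
    {
        if (span is null)
        {
            return; // No span was started, for example when tracing is disabled.
        }

        switch (outcome)
        {
            case SketchOutcome.Completed:
            case SketchOutcome.ContinuedAsNew:
                // Success overrides any Error left behind by handled exceptions.
                span.SetStatus(ActivityStatusCode.Ok);
                break;

            case SketchOutcome.Failed:
            case SketchOutcome.Terminated:
                span.SetStatus(ActivityStatusCode.Error, failureReason);
                break;

            default:
                // Assumption: other outcomes leave the existing status as-is.
                break;
        }
    }
}
```

Centralizing the mapping in one method means every completion path applies the same rule, which is the stated motivation for the helper.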

Why This Is The Right Solution

  1. Follows OpenTelemetry semantics: The final span status should reflect the actual outcome of the operation, not intermediate states. In the OpenTelemetry specification, Ok is the explicit status for a successfully completed operation and takes precedence over a previously set Error.

  2. Minimal and targeted: The fix only modifies the final status assignment at completion time, preserving all existing error-path behavior.

  3. Preserves true failures: Activities/orchestrations that actually fail (unhandled exceptions, explicit failures) still correctly report Error status, because the status is reset to Ok only for TaskCompletedEvent responses and for orchestrations that end as Completed or ContinuedAsNew.

  4. Covers all dispatcher types: The fix addresses activities, orchestrations, and entities - all paths that generate telemetry spans.

Testing

Added regression tests that simulate the exact issue scenario:

  • ActivityAndOrchestrationSpansResetStatuses: Integration test that:

    1. Creates an orchestration and activity that deliberately set Error status during execution
    2. Verifies both complete successfully
    3. Asserts that the captured trace spans have Status == Ok
  • TraceHelperTests: Unit tests for entity invocation status handling:

    • EndActivitiesForEntityInvocationResetsSuccessfulStatus: Verifies OK status on success
    • EndActivitiesForEntityInvocationMarksFailures: Verifies Error status on failure
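
The capture-and-assert pattern these tests describe can be sketched with an ActivityListener. This is not the test code from the PR: the source name, span name, and the SpanCaptureSketch class are hypothetical, and the sketch sets the statuses directly instead of running a real orchestration.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Diagnostics;

public static class SpanCaptureSketch
{
    // Starts one span, sets Error then Ok on it, and returns the statuses seen at stop time.
    public static List<ActivityStatusCode> CaptureStatuses()
    {
        const string SourceName = "Hypothetical.Sketch.Source";
        var statuses = new List<ActivityStatusCode>();

        using var source = new ActivitySource(SourceName);
        using var listener = new ActivityListener
        {
            // Only listen to the sketch's own source.
            ShouldListenTo = s => s.Name == SourceName,
            // Sample everything so StartActivity returns a real Activity.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                ActivitySamplingResult.AllData,
            // Record the status each span carries when it stops.
            ActivityStopped = activity => statuses.Add(activity.Status),
        };
        ActivitySource.AddActivityListener(listener);

        using (Activity? span = source.StartActivity("hypothetical-activity"))
        {
            // Intermediate error, as set by user code or instrumentation.
            span?.SetStatus(ActivityStatusCode.Error, "handled and recovered");
            // Final reset on successful completion.
            span?.SetStatus(ActivityStatusCode.Ok);
        }

        return statuses;
    }
}
```

A regression test built this way would fail if the final reset were removed, because the stopped span would still carry Error.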

All 94 tests pass.

Checklist

  • Changes are covered by unit/integration tests
  • Code follows existing patterns in the codebase
  • No breaking changes to public APIs
  • Tested locally with dotnet test

When an exception is caught and handled within user code (not causing
the activity/orchestration to fail), the Activity span status was being
left in an Error state. This caused Application Insights telemetry to
incorrectly report Response Code 500 even though Succeeded was True.

The fix ensures dispatchers explicitly set ActivityStatusCode.Ok for
successful completions, overriding any stale error status from
intermediate exception handling or custom instrumentation.

Changes:
- TaskActivityDispatcher: Set OK status before stopping span on success
- TaskOrchestrationDispatcher: Add centralized status logic for
  orchestration completions (OK for Completed/ContinuedAsNew,
  Error for Failed/Terminated)
- TraceHelper: Entity invocation spans now get explicit OK/Error status

Fixes Azure/azure-functions-durable-extension#3199
@torosent torosent marked this pull request as ready for review December 3, 2025 00:50
@torosent torosent requested a review from bachuv December 3, 2025 00:50
Copilot AI review requested due to automatic review settings December 4, 2025 20:26
Copilot AI left a comment

Pull request overview

This PR fixes a telemetry issue where successful activities and orchestrations incorrectly reported HTTP 500 response codes in Application Insights when Distributed Tracing v2 was enabled and intermediate exceptions were caught and handled. The fix ensures that ActivityStatusCode.Ok is explicitly set for successful completions, overriding any stale error status from intermediate exception handling.

Key Changes:

  • Added explicit status setting to ActivityStatusCode.Ok for successful completions in all dispatcher paths
  • Introduced centralized status handling logic for orchestrations via SetOrchestrationActivityStatus() helper
  • Added comprehensive tests to verify status reset behavior for activities, orchestrations, and entities

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/DurableTask.Core.Tests/DispatcherMiddlewareTests.cs | Adds integration test ActivityAndOrchestrationSpansResetStatuses with helper classes to verify that activity/orchestration spans correctly report Ok status when they succeed despite setting Error status during execution |
| src/DurableTask.Core/Tracing/TraceHelper.cs | Updates entity invocation handling to explicitly set ActivityStatusCode.Ok for successful operations and ActivityStatusCode.Error for failures |
| src/DurableTask.Core/TaskOrchestrationDispatcher.cs | Introduces SetOrchestrationActivityStatus() helper that sets appropriate status based on orchestration outcome (Ok for Completed/ContinuedAsNew, Error for Failed/Terminated); removes duplicate status-setting code |
| src/DurableTask.Core/TaskActivityDispatcher.cs | Adds explicit ActivityStatusCode.Ok status setting for TaskCompletedEvent to ensure successful executions override prior error statuses from custom instrumentation |
| Test/DurableTask.Core.Tests/TraceHelperTests.cs | New unit test file with tests for entity invocation status handling: EndActivitiesForEntityInvocationResetsSuccessfulStatus and EndActivitiesForEntityInvocationMarksFailures |
