
Add Aspire startup OTEL profiling harness#16775

Merged
davidfowl merged 5 commits into main from otel-startup-tracing on May 8, 2026


@davidfowl (Contributor) commented May 5, 2026

Description

Adds opt-in startup/profiling telemetry for Aspire using dedicated OTEL ActivitySources, keeping profiling diagnostics separate from reported/customer telemetry.

This change:

  • Adds dedicated CLI and Hosting ProfilingTelemetry helpers for high-cardinality startup/profiling spans.
  • Propagates profiling session and W3C trace context across CLI, AppHost, DCP, guest AppHosts, npm/guest runtime commands, aspire-managed processes, and AppHost server helper processes.
  • Keeps profiling telemetry out of AspireCliTelemetry reported telemetry.
  • Supports profiling outside run mode so publish/deploy-style AppHost operations can be profiled too.
  • Uses the AppHost assembly informational version for the OTEL service version.
  • Adds span tags/events for process IDs, exit codes, command metadata, backchannel connection points, resource waits, DCP object creation, DCP resource observations, Kubernetes API retries/timeouts, and emitted MSBuild binlogs.
  • Documents the profiling env vars and still propagates the startup-named env vars for DCP compatibility.
  • Adds a startup OTEL harness and shared C# validator:
    • eng/scripts/verify-startup-otel.sh
    • eng/scripts/verify-startup-otel.ps1
    • tools/StartupOtelValidator/ValidateStartupOtelExport.cs
  • Updates the startup performance skill with collection and validation guidance.
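The cross-process propagation in the list above relies on environment variables. As a rough sketch of the W3C Trace Context part only: the parent writes a `traceparent` value (`version-traceid-parentid-flags`) for its current span into the child's environment, and the child parses it to parent its own spans. The `TRACEPARENT`/`TRACESTATE` variable names follow the OpenTelemetry convention for environment-variable carriers and are an assumption about what Aspire uses; `ASPIRE_PROFILING_SESSION_ID` and all helper names are hypothetical.

```python
import os
import re
import secrets
import subprocess
import sys

# W3C Trace Context: "00-<32 hex trace id>-<16 hex parent span id>-<2 hex flags>".
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")

# Hypothetical name for the profiling session variable (not taken from the PR).
SESSION_ENV = "ASPIRE_PROFILING_SESSION_ID"


def make_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    """Format a version-00 traceparent for the caller's current span."""
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value):
    """Return (trace_id, parent_span_id, flags), or None if the value is unusable."""
    match = TRACEPARENT_RE.match((value or "").strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero ids are invalid per the W3C spec.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        return None
    return trace_id, span_id, flags


def child_env(base, session_id, trace_id, current_span_id, tracestate=None):
    """Copy the parent environment and add the profiling correlation variables."""
    env = dict(base)
    env[SESSION_ENV] = session_id
    env["TRACEPARENT"] = make_traceparent(trace_id, current_span_id)
    if tracestate:
        env["TRACESTATE"] = tracestate
    return env


if __name__ == "__main__":
    trace_id, span_id = secrets.token_hex(16), secrets.token_hex(8)
    env = child_env(os.environ, secrets.token_hex(16), trace_id, span_id)
    # The child sees the same trace id and can use our span id as its parent.
    out = subprocess.run(
        [sys.executable, "-c", "import os; print(os.environ['TRACEPARENT'])"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    print(parse_traceparent(out) == (trace_id, span_id, "01"))
```

Passing a fresh `env` mapping per child, rather than mutating `os.environ`, keeps concurrently spawned children (dotnet, npm, guest runtimes) from seeing each other's parent span ids.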

Related DCP dependency: microsoft/dcp#136

Reviewer validation

To validate from a PR build:

eng/scripts/get-aspire-cli-pr.sh 16775

Then run the startup OTEL harness from this repo:

./eng/scripts/verify-startup-otel.sh --collect-dotnet-binlogs

For a local already-built tree, this faster form can be used:

./eng/scripts/verify-startup-otel.sh --skip-build --collect-dotnet-binlogs

The harness starts an Aspire app with profiling enabled, exports dashboard traces, and validates that one profiling session contains correlated CLI, AppHost/Hosting, DCP resource creation/observation, resource wait, and binlog metadata spans. Use --require-dcp-spans when validating with a DCP build that emits dcp.startup spans.
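The correlation check the harness performs can be pictured roughly as follows. This is a hedged sketch, not the C# validator in tools/StartupOtelValidator: it assumes the dashboard export uses the OTLP/JSON layout (`resourceSpans` → `scopeSpans` → `spans`, hex `traceId`, `service.name` resource attribute) and that profiling spans carry the session id in a span attribute whose key (`aspire.profiling.session_id`) and the service names used below are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical span attribute carrying the profiling session id.
SESSION_KEY = "aspire.profiling.session_id"


def _string_attr(attributes, key):
    """Read a string value from an OTLP/JSON attribute list."""
    for attr in attributes or []:
        if attr.get("key") == key:
            return attr.get("value", {}).get("stringValue")
    return None


def iter_spans(export):
    """Yield (service.name, span) pairs from an OTLP/JSON trace export."""
    for resource_spans in export.get("resourceSpans", []):
        resource = resource_spans.get("resource", {})
        service = _string_attr(resource.get("attributes"), "service.name")
        for scope_spans in resource_spans.get("scopeSpans", []):
            for span in scope_spans.get("spans", []):
                yield service, span


def validate(export, required_services, require_dcp_spans=False):
    """Return a list of problems; an empty list means the export passed."""
    by_session = defaultdict(list)
    for service, span in iter_spans(export):
        session = _string_attr(span.get("attributes"), SESSION_KEY)
        if session:
            by_session[session].append((service, span))
    if len(by_session) != 1:
        return [f"expected one profiling session, found {len(by_session)}"]

    [(session, spans)] = by_session.items()
    problems = []
    trace_ids = {span.get("traceId") for _, span in spans}
    if len(trace_ids) != 1:
        problems.append(f"session {session} is split across {len(trace_ids)} traces")
    seen = {service for service, _ in spans}
    for missing in sorted(set(required_services) - seen):
        problems.append(f"no profiling spans from {missing!r}")
    if require_dcp_spans and not any(
        span.get("name", "").startswith("dcp.startup") for _, span in spans
    ):
        problems.append("no dcp.startup spans")
    return problems


def validate_file(path, required_services, require_dcp_spans=False):
    """Load an exported JSON file and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate(json.load(f), required_services, require_dcp_spans)
```

For example, `validate_file("traces.json", ["aspire-cli", "apphost"], require_dcp_spans=True)` mirrors the `--require-dcp-spans` option: without a DCP build that emits `dcp.startup` spans, that last check is the one expected to fail.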

Most recent successful local capture after rebasing:

  • Summary: artifacts/tmp/startup-otel-harness/20260505-094204/summary.json
  • Profiling session: 3e85dcfdc5d2432497f9225e632e601a
  • Trace ID: 64a102f0dbffb22b73a91daed78d1cfa
  • Correlated spans: 116
  • Binlogs collected: 4

Validation

  • dotnet build src/Aspire.Cli/Aspire.Cli.csproj /p:SkipNativeBuild=true /p:ContinuousIntegrationBuild=true
  • dotnet build src/Aspire.Cli/Aspire.Cli.csproj /p:SkipNativeBuild=true
  • dotnet build src/Aspire.Hosting/Aspire.Hosting.csproj /p:SkipNativeBuild=true
  • dotnet test --project tests/Aspire.Cli.Tests/Aspire.Cli.Tests.csproj --no-launch-profile -- --filter-class "*.ProfilingTelemetryTests" --filter-class "*.ProfilingTelemetryContextTests" --filter-class "*.GuestRuntimeTests" --filter-class "*.NpmRunnerTests" --filter-class "*.DotNetCliRunnerTests" --filter-class "*.AppHostServerSessionTests" --filter-class "*.GuestAppHostProjectTests" --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"
  • ./eng/scripts/verify-startup-otel.sh --skip-build --collect-dotnet-binlogs
  • Previous CI failures were unrelated Docker/kind CLI E2E infrastructure hangs and cleared on rerun; CI for the feedback-fix commit is running after push.

Fixes # (issue)

Checklist

  • Is this feature complete?
    • Yes. Ready to ship.
    • No. Follow-up changes expected.
  • Are you including unit tests for the changes and scenario tests if relevant?
    • Yes
    • No
  • Did you add public API?
    • Yes
      • If yes, did you have an API Review for it?
        • Yes
        • No
      • Did you add <remarks /> and <code /> elements on your triple slash comments?
        • Yes
        • No
    • No
  • Does the change make any security assumptions or guarantees?
    • Yes
      • If yes, have you done a threat model and had a security review?
        • Yes
        • No
    • No
  • Does the change require an update in our Aspire docs?

@github-actions (Bot) commented May 5, 2026

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16775

Or run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16775"


Copilot AI left a comment


Pull request overview

Adds an opt-in OpenTelemetry-based startup profiling harness for Aspire and introduces dedicated “profiling-only” telemetry plumbing across the CLI and Hosting layers, with a shared validator to confirm span correlation (CLI ↔ AppHost ↔ DCP) from dashboard exports.

Changes:

  • Adds a new PowerShell-free startup OTEL harness (verify-startup-otel.sh) plus a shared C# export validator (StartupOtelValidator), and updates the PowerShell harness to match.
  • Introduces dedicated ProfilingTelemetry/ProfilingTelemetryContext types and profiling ActivitySources for CLI and Hosting to keep profiling data separate from reported telemetry.
  • Expands profiling coverage to CLI-spawned child processes (dotnet, npm, guest runtimes, AppHost server sessions) and adds optional MSBuild binlog + dotnet-trace capture.
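Keeping profiling spans out of reported telemetry follows from subscribing by source name: in .NET, `ActivitySource.StartActivity` returns null when no listener is interested in that source, and an OpenTelemetry `TracerProvider` only receives sources registered through `AddSource`. The toy model below mimics that routing in Python, assuming one reported source and one profiling source whose pipeline is only built when profiling is enabled; the source names and classes are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical source names; the PR does not list the real ActivitySource names.
REPORTED_SOURCE = "Aspire.Cli.Reported"
PROFILING_SOURCE = "Aspire.Cli.Profiling"


@dataclass
class Span:
    source: str
    name: str
    tags: dict = field(default_factory=dict)


@dataclass
class Pipeline:
    """An exporter pipeline that only accepts spans from the sources it subscribed to."""
    name: str
    sources: frozenset
    exported: list = field(default_factory=list)


def build_pipelines(profiling_enabled: bool) -> list:
    pipelines = [Pipeline("reported", frozenset({REPORTED_SOURCE}))]
    if profiling_enabled:
        # Profiling gets its own pipeline; it never shares one with reported telemetry.
        pipelines.append(Pipeline("profiling", frozenset({PROFILING_SOURCE})))
    return pipelines


def emit(pipelines: list, span: Span) -> bool:
    """Deliver a span to subscribed pipelines; False means nobody listened (like a null Activity)."""
    accepted = False
    for pipeline in pipelines:
        if span.source in pipeline.sources:
            pipeline.exported.append(span)
            accepted = True
    return accepted
```

Because high-cardinality tags such as process ids only ever appear on spans from the profiling source, they cannot leak into the reported pipeline even when profiling is on.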
Summary per file:

  • tools/StartupOtelValidator/ValidateStartupOtelExport.cs: Shared validator that reads dashboard OTLP JSON export and enforces correlated profiling span expectations.
  • eng/scripts/verify-startup-otel.sh: New bash startup harness: starts dashboard + AppHost, exports telemetry, runs validator, optional trace/binlog capture.
  • eng/scripts/verify-startup-otel.ps1: PowerShell harness updated to use the shared validator and align with the new OTEL workflow.
  • .github/skills/startup-perf/SKILL.md: Updates skill docs to describe the OTEL harness workflow and artifacts.
  • src/Shared/KnownConfigNames.cs: Adds new profiling env var names and preserves legacy startup-profiling keys for compatibility.
  • src/Aspire.Hosting/DistributedApplicationBuilder.cs: Conditionally wires a profiling-only OTLP exporter pipeline when profiling is enabled.
  • src/Aspire.Hosting/Diagnostics/ProfilingTelemetry.cs: New Hosting profiling ActivitySource, span names/tags/events, and helpers for DCP/resource lifecycle profiling.
  • src/Aspire.Hosting/Orchestrator/ApplicationOrchestrator.cs: Adds profiling span around resource “before start wait” phase with error tagging.
  • src/Aspire.Hosting/ApplicationModel/ResourceNotificationService.cs: Adds profiling spans/events for resource dependency waits, plus richer wait-condition tagging.
  • src/Aspire.Hosting/Dcp/DcpExecutor.cs: Adds profiling spans across DCP orchestration phases and error tagging.
  • src/Aspire.Hosting/Dcp/DcpResourceWatcher.cs: Emits profiling span for “DCP resource observed” events (from watch/reconcile).
  • src/Aspire.Hosting/Dcp/KubernetesService.cs: Adds profiling spans + retry/timeout events around Kubernetes API calls and kubeconfig init.
  • src/Aspire.Hosting/Aspire.Hosting.csproj: Adds OpenTelemetry hosting + OTLP exporter package references for profiling pipeline support.
  • src/Aspire.Cli/Telemetry/TelemetryServiceCollectionExtensions.cs: Registers CLI ProfilingTelemetry in DI.
  • src/Aspire.Cli/Telemetry/TelemetryManager.cs: Adds profiling ActivitySource to the DEBUG diagnostic provider only when profiling is enabled.
  • src/Aspire.Cli/Telemetry/ProfilingTelemetryContext.cs: New cross-process profiling correlation context (session id + W3C context) with legacy key bridging.
  • src/Aspire.Cli/Telemetry/ProfilingTelemetry.cs: New CLI profiling ActivitySource with span/tag/event helpers for CLI orchestration + child process lifetimes.
  • src/Aspire.Cli/Telemetry/AspireCliTelemetry.cs: Adds overloads to start reported/diagnostic activities with explicit parent context.
  • src/Aspire.Cli/Commands/RunCommand.cs: Wraps key aspire run phases in profiling spans and propagates profiling context to AppHost env.
  • src/Aspire.Cli/Commands/AppHostLauncher.cs: Adds profiling around detached child spawn/backchannel wait and propagates profiling env vars.
  • src/Aspire.Cli/Backchannel/AppHostCliBackchannel.cs: Adds profiling spans/events for backchannel connect and dashboard URL RPC.
  • src/Aspire.Cli/Backchannel/AppHostAuxiliaryBackchannel.cs: Adds optional profiling events for auxiliary backchannel dashboard URL resolution.
  • src/Aspire.Cli/Backchannel/AuxiliaryBackchannelMonitor.cs: Passes profiling telemetry into auxiliary backchannel creation for consistent instrumentation.
  • src/Aspire.Cli/Backchannel/AppHostConnectionResolver.cs: Adds optional profiling telemetry plumb-through when connecting via sockets.
  • src/Aspire.Cli/Projects/DotNetAppHostProject.cs: Replaces diagnostic spans with profiling spans across run phases (isolated mode, certs, build, run lifetime).
  • src/Aspire.Cli/Projects/GuestAppHostProject.cs: Plumbs ProfilingTelemetry into guest apphost workflows and guest runtime creation.
  • src/Aspire.Cli/Projects/GuestRuntime.cs: Adds profiling for guest runtime init/dependency install/exec command launches with exit-code error tagging.
  • src/Aspire.Cli/Projects/AppHostServerSession.cs: Adds profiling for AppHost server process lifetime, pid/exit code tagging, and DI plumbing.
  • src/Aspire.Cli/DotNet/IProcessExecution.cs: Adds ProcessId to support correlating spawned process spans.
  • src/Aspire.Cli/DotNet/ProcessExecution.cs: Implements ProcessId pass-through from underlying Process.
  • src/Aspire.Cli/DotNet/DotNetCliRunner.cs: Adds profiling spans/events for dotnet invocations; optional MSBuild binlog injection and output counters.
  • src/Aspire.Cli/Npm/NpmRunner.cs: Adds profiling span around npm command execution; switches to Directory.CreateTempSubdirectory.
  • src/Aspire.Cli/Program.cs: Minor changes around main reported activity initialization.
  • tests/Aspire.Cli.Tests/Utils/CliTestHelper.cs: Registers ProfilingTelemetry in CLI test DI container.
  • tests/Aspire.Cli.Tests/TestServices/TestProcessExecutionFactory.cs: Updates test runner helpers for new DotNetCliRunner ctor and process id support.
  • tests/Aspire.Cli.Tests/Telemetry/ProfilingTelemetryTests.cs: Adds unit tests for CLI profiling activity source behavior.
  • tests/Aspire.Cli.Tests/Telemetry/ProfilingTelemetryContextTests.cs: Adds unit tests for profiling context creation, env propagation, and legacy key bridging.
  • tests/Aspire.Cli.Tests/Telemetry/AspireCliTelemetryTests.cs: Adds tests for new explicit-parent-context activity creation overloads.
  • tests/Aspire.Cli.Tests/DotNet/DotNetCliRunnerTests.cs: Adds tests validating binlog argument behavior.
  • tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs: Adds tests for detached child env propagation of profiling context.
  • tests/Aspire.Cli.Tests/Projects/GuestAppHostProjectTests.cs: Updates guest project construction for required ProfilingTelemetry.
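The DotNetCliRunner entry mentions optional MSBuild binlog injection, and DotNetCliRunnerTests validate the binlog argument behavior. A minimal sketch of one way such injection can work, assuming binlogs are only added to MSBuild-driven verbs, are skipped when the caller already passed a binary logger switch, and are placed before any `--` so they are not forwarded to the application. `-bl:<path>` is MSBuild's real binary logger switch; the verb list (which leaves out `run`) and the function names are assumptions.

```python
import os

# Assumption: only these dotnet verbs run MSBuild and should get a binlog.
MSBUILD_VERBS = {"build", "restore", "publish", "pack", "msbuild"}
# Spellings of MSBuild's binary logger switch (matched case-insensitively).
BINLOG_PREFIXES = ("-bl", "/bl", "-binarylogger", "/binarylogger")


def has_binlog_switch(args):
    return any(arg.lower().startswith(BINLOG_PREFIXES) for arg in args)


def with_binlog(args, binlog_dir, name):
    """Return (args, binlog_path); binlog_path is None when nothing was added."""
    if not args or args[0] not in MSBUILD_VERBS or has_binlog_switch(args):
        return list(args), None
    path = os.path.join(binlog_dir, f"{name}.binlog")
    switch = f"-bl:{path}"
    if "--" in args:
        # Keep the switch on the MSBuild side of the application-argument separator.
        cut = args.index("--")
        return [*args[:cut], switch, *args[cut:]], path
    return [*args, switch], path
```

Returning the path alongside the arguments lets the caller record it as span metadata, which is how a harness can later count the binlogs it collected.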

Copilot's findings

  • Files reviewed: 41/41 changed files
  • Comments generated: 5

Comment thread src/Aspire.Cli/Program.cs (outdated)
Comment thread eng/scripts/verify-startup-otel.sh
Comment thread tests/Aspire.Cli.Tests/TestServices/TestProcessExecutionFactory.cs
Comment thread tests/Aspire.Cli.Tests/TestServices/TestProcessExecutionFactory.cs
Comment thread tests/Aspire.Cli.Tests/Projects/GuestAppHostProjectTests.cs (outdated)
davidfowl force-pushed the otel-startup-tracing branch from 31b1f36 to 598a283 on May 7, 2026 02:49
Comment thread src/Aspire.Cli/Telemetry/ProfilingTelemetryContext.cs (outdated)
Comment thread src/Shared/KnownConfigNames.cs
Comment thread src/Shared/KnownConfigNames.cs (outdated)
davidfowl and others added 4 commits May 7, 2026 08:08
Add opt-in profiling telemetry for the Aspire CLI, AppHost, and DCP startup paths, including cross-process propagation, process/binlog span metadata, and startup verification scripts.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the threaded profiling context object and use Activity baggage plus process-boundary helpers to carry the profiling session through CLI spans and child processes. Seed the ambient Activity ancestor chain so sibling profiling spans reuse the same session without adding profiling tags to reported telemetry.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explain why profiling session baggage is written to ambient Activity ancestors while profiling tags remain limited to profiling spans.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review feedback by using the standard traceparent/tracestate names for profiling trace context propagation and removing incorrect legacy wording from the startup trace-context comments.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
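The second and third commits describe carrying the profiling session as Activity baggage seeded onto the ambient ancestor chain instead of as tags. .NET's `Activity.GetBaggageItem` already searches the parent chain, so once an ancestor holds the session, every later descendant (including siblings of the first profiling span) resolves the same value, and reported spans never gain a profiling tag. The model below is a hedged Python sketch of that idea; the key and helper names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

# Hypothetical baggage/tag key for the profiling session.
SESSION_KEY = "aspire.profiling.session_id"


@dataclass
class Activity:
    name: str
    parent: Optional[Activity] = None
    baggage: dict = field(default_factory=dict)
    tags: dict = field(default_factory=dict)

    def get_baggage_item(self, key):
        # Like .NET's Activity.GetBaggageItem: check this activity, then its ancestors.
        node = self
        while node is not None:
            if key in node.baggage:
                return node.baggage[key]
            node = node.parent
        return None


def seed_session(activity, session_id):
    """Write the session into the baggage (not the tags) of the whole ancestor chain."""
    node = activity
    while node is not None:
        node.baggage.setdefault(SESSION_KEY, session_id)
        node = node.parent


def start_profiling_span(name, parent, new_session_id):
    """Start a profiling span, reusing any session already visible from the parent."""
    session = parent.get_baggage_item(SESSION_KEY) if parent is not None else None
    if session is None:
        session = new_session_id
        if parent is not None:
            seed_session(parent, session)
    span = Activity(name, parent)
    if parent is None:
        span.baggage[SESSION_KEY] = session
    # Only the profiling span itself carries the session as a tag.
    span.tags[SESSION_KEY] = session
    return span
```

Seeding only the new profiling span would leave its siblings unable to see the session; seeding the ancestors is what lets `dotnet build` and the later AppHost launch under the same `aspire run` root share one id.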
davidfowl force-pushed the otel-startup-tracing branch from e695c15 to 282e55a on May 7, 2026 15:08
@karolz-ms (Contributor) left a comment

Reviewed DCP-related parts, looks good.

I think you should consider ripping out AspireEventSource as part of this work and rewriting the startup-perf skill to use OTEL instead of EventSource.

EDIT: just noticed you did, awesome 👍
You have my blessing to delete AspireEventSource too 😄

Comment thread src/Aspire.Hosting/Dcp/DcpResourceWatcher.cs
Explain why DCP resource observations are represented as short child activities from annotated trace context instead of events on the current activity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
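The commit above represents DCP resource observations as short child activities parented from trace context annotated on the resource, instead of as events on the current activity. One plausible reading: the watcher loop's current activity is unrelated to the trace that created a given resource, so an event there would land in the wrong trace, while a child span built from the resource's stored context joins the right one. The sketch assumes the annotation holds a W3C `traceparent`; the annotation key, span name, attribute names, and record type are hypothetical.

```python
import re
import secrets
import time
from dataclasses import dataclass
from typing import Optional

# Hypothetical annotation key under which the creating trace context is stored.
TRACEPARENT_ANNOTATION = "aspire.dev/traceparent"
_TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-[0-9a-f]{2}$")


@dataclass
class SpanRecord:
    name: str
    trace_id: str
    span_id: str
    parent_span_id: str
    start_ns: int
    end_ns: int
    attributes: dict


def observation_span(resource_name: str, annotations: dict, state: str,
                     now_ns: Optional[int] = None) -> Optional[SpanRecord]:
    """Build a zero-length child span for one watch observation, or None if untraced."""
    match = _TRACEPARENT_RE.match(annotations.get(TRACEPARENT_ANNOTATION, ""))
    if match is None:
        return None
    trace_id, parent_span_id = match.groups()
    ts = time.time_ns() if now_ns is None else now_ns
    return SpanRecord(
        name="dcp.resource.observed",  # hypothetical span name
        trace_id=trace_id,              # same trace as the resource's creation
        span_id=secrets.token_hex(8),   # fresh 8-byte span id
        parent_span_id=parent_span_id,  # parented under the annotated span
        start_ns=ts,
        end_ns=ts,
        attributes={"resource.name": resource_name, "resource.state": state},
    )
```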
@github-actions (Bot) commented May 7, 2026

🎬 CLI E2E Test Recordings: 77 recordings uploaded (commit de3edc3)

Recorded tests:

  • AddPackageInteractiveWhileAppHostRunningDetached
  • AddPackageWhileAppHostRunningDetached
  • AgentCommands_AllHelpOutputs_AreCorrect
  • AgentInitCommand_DefaultSelection_InstallsSkillOnly
  • AgentInitCommand_MigratesDeprecatedConfig
  • AspireAddPackageVersionToDirectoryPackagesProps
  • AspireInitSingleFileAppHostRunsViaDotnetRunAppHost
  • AspireUpdateRemovesAppHostPackageVersionFromDirectoryPackagesProps
  • Banner_DisplayedOnFirstRun
  • Banner_DisplayedWithExplicitFlag
  • Banner_NotDisplayedWithNoLogoFlag
  • CertificatesClean_RemovesCertificates
  • CertificatesTrust_WithNoCert_CreatesAndTrustsCertificate
  • CertificatesTrust_WithUntrustedCert_TrustsCertificate
  • ConfigSetGet_CreatesNestedJsonFormat
  • CreateAndRunAspireStarterProject
  • CreateAndRunAspireStarterProjectWithBundle
  • CreateAndRunEmptyAppHostProject
  • CreateAndRunJavaEmptyAppHostProject
  • CreateAndRunJsReactProject
  • CreateAndRunPythonReactProject
  • CreateAndRunTypeScriptEmptyAppHostProject
  • CreateAndRunTypeScriptStarterProject
  • CreateJavaAppHostWithViteApp
  • CreateTypeScriptAppHostWithViteApp_UsesConfiguredToolchain
  • DashboardRunWithOtelTracesReturnsNoTraces
  • DeployK8sBasicApiService
  • DeployK8sWithGarnet
  • DeployK8sWithMongoDB
  • DeployK8sWithMySql
  • DeployK8sWithPostgres
  • DeployK8sWithRabbitMQ
  • DeployK8sWithRedis
  • DeployK8sWithSqlServer
  • DeployK8sWithValkey
  • DeployTypeScriptAppToKubernetes
  • DescribeCommandResolvesReplicaNames
  • DescribeCommandShowsRunningResources
  • DetachFormatJsonProducesValidJson
  • DetachFormatJsonProducesValidJsonWhenRestartingExistingInstance
  • DoListStepsShowsPipelineSteps
  • DocsCommand_RendersInteractiveMarkdownFromLocalSource
  • DoctorCommand_DetectsDeprecatedAgentConfig
  • DoctorCommand_TypeScriptAppHostReportsMissingConfiguredToolchain
  • DoctorCommand_WithSslCertDir_ShowsTrusted
  • DoctorCommand_WithoutSslCertDir_ShowsPartiallyTrusted
  • GlobalMigration_HandlesCommentsAndTrailingCommas
  • GlobalMigration_HandlesMalformedLegacyJson
  • GlobalMigration_PreservesAllValueTypes
  • GlobalMigration_SkipsWhenNewConfigExists
  • GlobalSettings_MigratedFromLegacyFormat
  • InitTypeScriptAppHost_AugmentsExistingViteRepoAtRoot
  • InteractiveCSharpInitCreatesExpectedFiles
  • InvalidAppHostPathWithComments_IsHealedOnRun
  • LatestCliCanStartStableChannelAppHost
  • LatestCliCanStartStableChannelTypeScriptAppHost
  • LegacySettingsMigration_AdjustsRelativeAppHostPath
  • LogsCommandShowsResourceLogs
  • OtelLogsReturnsStructuredLogsFromStarterAppCore
  • PsCommandListsRunningAppHost
  • PsFormatJsonOutputsOnlyJsonToStdout
  • PublishWithConfigureEnvFileUpdatesEnvOutput
  • PublishWithDockerComposeServiceCallbackSucceeds
  • PublishWithoutOutputPathUsesAppHostDirectoryDefault
  • RestoreGeneratesSdkFiles
  • RestoreGeneratesSdkFiles_WithConfiguredToolchain
  • RestoreRefreshesGeneratedSdkAfterAddingIntegration
  • RestoreSupportsConfigOnlyHelperPackageAndCrossPackageTypes
  • RunFromParentDirectory_UsesExistingConfigNearAppHost
  • SecretCrudOnDotNetAppHost
  • SecretCrudOnTypeScriptAppHost
  • StagingChannel_ConfigureAndVerifySettings_ThenSwitchChannels
  • StartAndWaitForTypeScriptSqlServerAppHostWithNativeAssets
  • StopAllAppHostsFromAppHostDirectory
  • StopNonInteractiveSingleAppHost
  • StopWithNoRunningAppHostExitsSuccessfully
  • UnAwaitedChainsCompileWithAutoResolvePromises

📹 Recordings uploaded automatically from CI run #25523689230

davidfowl merged commit 8d254db into main on May 8, 2026 (289 checks passed)
github-actions (Bot) added this to the 13.4 milestone on May 8, 2026
@aspire-repo-bot (Contributor) commented:
✅ No documentation update needed.

This PR adds an internal startup OTEL profiling harness and developer tooling infrastructure — no user-facing API, configuration, or behavior changes were introduced. The PR author explicitly marked that no docs update is required.


Labels

None yet

Projects

None yet


4 participants