Add Aspire startup OTEL profiling harness #16775
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 16775
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 16775"
Pull request overview
Adds an opt-in OpenTelemetry-based startup profiling harness for Aspire and introduces dedicated “profiling-only” telemetry plumbing across the CLI and Hosting layers, with a shared validator to confirm span correlation (CLI ↔ AppHost ↔ DCP) from dashboard exports.
Changes:
- Adds a new PowerShell-free startup OTEL harness (`verify-startup-otel.sh`) plus a shared C# export validator (`StartupOtelValidator`), and updates the PowerShell harness to match.
- Introduces dedicated `ProfilingTelemetry`/`ProfilingTelemetryContext` types and profiling `ActivitySource`s for CLI and Hosting to keep profiling data separate from reported telemetry.
- Expands profiling coverage to CLI-spawned child processes (dotnet, npm, guest runtimes, AppHost server sessions) and adds optional MSBuild binlog + dotnet-trace capture.
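The span-correlation check the validator performs can be illustrated with a small shell sketch. This is not the `StartupOtelValidator` implementation: it assumes a simplified input of one span per line in the form `<session-id> <source>` instead of the dashboard's OTLP JSON export, and the function name `has_correlated_session` and the source labels `cli`, `hosting`, and `dcp` are hypothetical.

```shell
# Sketch only: reads "<session-id> <source>" lines on stdin (a stand-in for
# the real OTLP JSON export) and prints every session that has at least one
# span from each required source. Exits non-zero when no session qualifies.
has_correlated_session() {
  awk -v required="$1" '
    BEGIN { n = split(required, req, ",") }
    { seen[$1, $2] = 1; sessions[$1] = 1 }
    END {
      found = 0
      for (s in sessions) {
        ok = 1
        for (i = 1; i <= n; i++) {
          if (!((s, req[i]) in seen)) ok = 0
        }
        if (ok) { print s; found = 1 }
      }
      exit (found ? 0 : 1)
    }'
}

# Hypothetical sample data: session s1 is fully correlated, s2 is not.
spans='s1 cli
s1 hosting
s1 dcp
s2 cli'

correlated="$(printf '%s\n' "$spans" | has_correlated_session 'cli,hosting,dcp')"
echo "correlated session: $correlated"
```

Requiring all sources inside a single session, rather than anywhere in the export, is what distinguishes a correlated startup from several unrelated runs that happen to share a dashboard.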
Summary per file:
| File | Description |
|---|---|
| tools/StartupOtelValidator/ValidateStartupOtelExport.cs | Shared validator that reads dashboard OTLP JSON export and enforces correlated profiling span expectations. |
| eng/scripts/verify-startup-otel.sh | New bash startup harness: starts dashboard + AppHost, exports telemetry, runs validator, optional trace/binlog capture. |
| eng/scripts/verify-startup-otel.ps1 | PowerShell harness updated to use the shared validator and align with the new OTEL workflow. |
| .github/skills/startup-perf/SKILL.md | Updates skill docs to describe the OTEL harness workflow and artifacts. |
| src/Shared/KnownConfigNames.cs | Adds new profiling env var names and preserves legacy startup-profiling keys for compatibility. |
| src/Aspire.Hosting/DistributedApplicationBuilder.cs | Conditionally wires a profiling-only OTLP exporter pipeline when profiling is enabled. |
| src/Aspire.Hosting/Diagnostics/ProfilingTelemetry.cs | New Hosting profiling ActivitySource, span names/tags/events, and helpers for DCP/resource lifecycle profiling. |
| src/Aspire.Hosting/Orchestrator/ApplicationOrchestrator.cs | Adds profiling span around resource “before start wait” phase with error tagging. |
| src/Aspire.Hosting/ApplicationModel/ResourceNotificationService.cs | Adds profiling spans/events for resource dependency waits, plus richer wait-condition tagging. |
| src/Aspire.Hosting/Dcp/DcpExecutor.cs | Adds profiling spans across DCP orchestration phases and error tagging. |
| src/Aspire.Hosting/Dcp/DcpResourceWatcher.cs | Emits profiling span for “DCP resource observed” events (from watch/reconcile). |
| src/Aspire.Hosting/Dcp/KubernetesService.cs | Adds profiling spans + retry/timeout events around Kubernetes API calls and kubeconfig init. |
| src/Aspire.Hosting/Aspire.Hosting.csproj | Adds OpenTelemetry hosting + OTLP exporter package references for profiling pipeline support. |
| src/Aspire.Cli/Telemetry/TelemetryServiceCollectionExtensions.cs | Registers CLI ProfilingTelemetry in DI. |
| src/Aspire.Cli/Telemetry/TelemetryManager.cs | Adds profiling ActivitySource to the DEBUG diagnostic provider only when profiling is enabled. |
| src/Aspire.Cli/Telemetry/ProfilingTelemetryContext.cs | New cross-process profiling correlation context (session id + W3C context) with legacy key bridging. |
| src/Aspire.Cli/Telemetry/ProfilingTelemetry.cs | New CLI profiling ActivitySource with span/tag/event helpers for CLI orchestration + child process lifetimes. |
| src/Aspire.Cli/Telemetry/AspireCliTelemetry.cs | Adds overloads to start reported/diagnostic activities with explicit parent context. |
| src/Aspire.Cli/Commands/RunCommand.cs | Wraps key aspire run phases in profiling spans and propagates profiling context to AppHost env. |
| src/Aspire.Cli/Commands/AppHostLauncher.cs | Adds profiling around detached child spawn/backchannel wait and propagates profiling env vars. |
| src/Aspire.Cli/Backchannel/AppHostCliBackchannel.cs | Adds profiling spans/events for backchannel connect and dashboard URL RPC. |
| src/Aspire.Cli/Backchannel/AppHostAuxiliaryBackchannel.cs | Adds optional profiling events for auxiliary backchannel dashboard URL resolution. |
| src/Aspire.Cli/Backchannel/AuxiliaryBackchannelMonitor.cs | Passes profiling telemetry into auxiliary backchannel creation for consistent instrumentation. |
| src/Aspire.Cli/Backchannel/AppHostConnectionResolver.cs | Adds optional profiling telemetry plumb-through when connecting via sockets. |
| src/Aspire.Cli/Projects/DotNetAppHostProject.cs | Replaces diagnostic spans with profiling spans across run phases (isolated mode, certs, build, run lifetime). |
| src/Aspire.Cli/Projects/GuestAppHostProject.cs | Plumbs ProfilingTelemetry into guest apphost workflows and guest runtime creation. |
| src/Aspire.Cli/Projects/GuestRuntime.cs | Adds profiling for guest runtime init/dependency install/exec command launches with exit-code error tagging. |
| src/Aspire.Cli/Projects/AppHostServerSession.cs | Adds profiling for AppHost server process lifetime, pid/exit code tagging, and DI plumbing. |
| src/Aspire.Cli/DotNet/IProcessExecution.cs | Adds ProcessId to support correlating spawned process spans. |
| src/Aspire.Cli/DotNet/ProcessExecution.cs | Implements ProcessId pass-through from underlying Process. |
| src/Aspire.Cli/DotNet/DotNetCliRunner.cs | Adds profiling spans/events for dotnet invocations; optional MSBuild binlog injection and output counters. |
| src/Aspire.Cli/Npm/NpmRunner.cs | Adds profiling span around npm command execution; switches to Directory.CreateTempSubdirectory. |
| src/Aspire.Cli/Program.cs | Minor changes around main reported activity initialization. |
| tests/Aspire.Cli.Tests/Utils/CliTestHelper.cs | Registers ProfilingTelemetry in CLI test DI container. |
| tests/Aspire.Cli.Tests/TestServices/TestProcessExecutionFactory.cs | Updates test runner helpers for new DotNetCliRunner ctor and process id support. |
| tests/Aspire.Cli.Tests/Telemetry/ProfilingTelemetryTests.cs | Adds unit tests for CLI profiling activity source behavior. |
| tests/Aspire.Cli.Tests/Telemetry/ProfilingTelemetryContextTests.cs | Adds unit tests for profiling context creation, env propagation, and legacy key bridging. |
| tests/Aspire.Cli.Tests/Telemetry/AspireCliTelemetryTests.cs | Adds tests for new explicit-parent-context activity creation overloads. |
| tests/Aspire.Cli.Tests/DotNet/DotNetCliRunnerTests.cs | Adds tests validating binlog argument behavior. |
| tests/Aspire.Cli.Tests/Commands/RunCommandTests.cs | Adds tests for detached child env propagation of profiling context. |
| tests/Aspire.Cli.Tests/Projects/GuestAppHostProjectTests.cs | Updates guest project construction for required ProfilingTelemetry. |
Copilot's findings
- Files reviewed: 41/41 changed files
- Comments generated: 5
Add opt-in profiling telemetry for the Aspire CLI, AppHost, and DCP startup paths, including cross-process propagation, process/binlog span metadata, and startup verification scripts. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the threaded profiling context object and use Activity baggage plus process-boundary helpers to carry the profiling session through CLI spans and child processes. Seed the ambient Activity ancestor chain so sibling profiling spans reuse the same session without adding profiling tags to reported telemetry. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explain why profiling session baggage is written to ambient Activity ancestors while profiling tags remain limited to profiling spans. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Address review feedback by using the standard traceparent/tracestate names for profiling trace context propagation and removing incorrect legacy wording from the startup trace-context comments. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
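As a rough illustration of the propagation these commits describe, the sketch below passes a W3C `traceparent` value to a child process through its environment. The variable name `TRACEPARENT` and the helper names are assumptions made for this example; the PR text only states that the standard traceparent/tracestate names are used, not how the CLI sets them.

```shell
# Sketch only: TRACEPARENT and the helper names are illustrative assumptions.

# Print N random bytes as lowercase hex.
rand_hex() {
  od -An -N"$1" -tx1 /dev/urandom | tr -d ' \n'
}

# Build a version-00 traceparent: 16-byte trace id, 8-byte parent span id,
# and the "sampled" flag (01).
new_traceparent() {
  printf '00-%s-%s-01\n' "$(rand_hex 16)" "$(rand_hex 8)"
}

# Print the trace id of a traceparent value; return 1 if it is malformed.
trace_id_of() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-f]{2}-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$' || return 1
  printf '%s\n' "$1" | cut -d- -f2
}

parent_tp="$(new_traceparent)"

# The child process receives the parent's context only via its environment.
child_trace_id="$(TRACEPARENT="$parent_tp" sh -c 'printf "%s\n" "$TRACEPARENT" | cut -d- -f2')"

echo "parent trace id: $(trace_id_of "$parent_tp")"
echo "child trace id:  $child_trace_id"
```

Because both processes report the same trace id, spans they emit can be joined into one trace by whatever backend collects them.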
Reviewed DCP-related parts, looks good.
I think you should consider ripping out AspireEventSource as part of this work and re-writing startup-perf skill to use OTEL instead of EventSource.
EDIT: just noticed you did, awesome 👍
You have my blessing to delete AspireEventSource too 😄
Explain why DCP resource observations are represented as short child activities from annotated trace context instead of events on the current activity. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🎬 CLI E2E Test Recordings: 77 recordings uploaded
📹 Recordings uploaded automatically from CI run #25523689230
✅ No documentation update needed. This PR adds an internal startup OTEL profiling harness and developer tooling infrastructure; no user-facing API, configuration, or behavior changes were introduced. The PR author explicitly marked that no docs update is required.
Description
Adds opt-in startup/profiling telemetry for Aspire using dedicated OTEL ActivitySources, keeping profiling diagnostics separate from reported/customer telemetry.
This change:
- Adds `ProfilingTelemetry` helpers for high-cardinality startup/profiling spans.
- Keeps profiling spans separate from `AspireCliTelemetry` reported telemetry.
- `eng/scripts/verify-startup-otel.sh`
- `eng/scripts/verify-startup-otel.ps1`
- `tools/StartupOtelValidator/ValidateStartupOtelExport.cs`

Related DCP dependency: microsoft/dcp#136
Reviewer validation
To validate from a PR build:
Then run the startup OTEL harness from this repo:
For a local already-built tree, this faster form can be used:
The harness starts an Aspire app with profiling enabled, exports dashboard traces, and validates that one profiling session contains correlated CLI, AppHost/Hosting, DCP resource creation/observation, resource wait, and binlog metadata spans. Use `--require-dcp-spans` when validating with a DCP build that emits `dcp.startup` spans.
Latest local successful capture after rebasing:
- `artifacts/tmp/startup-otel-harness/20260505-094204/summary.json`
- `3e85dcfdc5d2432497f9225e632e601a64a102f0dbffb22b73a91daed78d1cfa`
Validation
- `dotnet build src/Aspire.Cli/Aspire.Cli.csproj /p:SkipNativeBuild=true /p:ContinuousIntegrationBuild=true`
- `dotnet build src/Aspire.Cli/Aspire.Cli.csproj /p:SkipNativeBuild=true`
- `dotnet build src/Aspire.Hosting/Aspire.Hosting.csproj /p:SkipNativeBuild=true`
- `dotnet test --project tests/Aspire.Cli.Tests/Aspire.Cli.Tests.csproj --no-launch-profile -- --filter-class "*.ProfilingTelemetryTests" --filter-class "*.ProfilingTelemetryContextTests" --filter-class "*.GuestRuntimeTests" --filter-class "*.NpmRunnerTests" --filter-class "*.DotNetCliRunnerTests" --filter-class "*.AppHostServerSessionTests" --filter-class "*.GuestAppHostProjectTests" --filter-not-trait "quarantined=true" --filter-not-trait "outerloop=true"`
- `./eng/scripts/verify-startup-otel.sh --skip-build --collect-dotnet-binlogs`

Fixes # (issue)
Checklist
- `<remarks />` and `<code />` elements on your triple slash comments?
- `aspire.dev` issue: