Adding support for capturing NetFx call stacks #4591
base: main
From the contributing docs:
Given this is a significant feature, it should be discussed in an issue before it can be accepted. Changing this PR to draft so that you can reference it (e.g. as a PoC) in the issue. PS. Please sign the CLA 😉
…gs done via OutputDebugString
…ry-dotnet-instrumentation into NetFX-Stack-Capture
…ing function with detailed comments
The approach looks good to me. There is some commented-out code that could use some cleanup. There should also be a test implemented for the netfx side of things.
Thank you for the feedback. Integration tests are under development. Sure - there may be some dead code or commented code, I will clean it up. |
@eftiquar, tests are almost in place. When #4631 is merged, you should be able just to remove conditional compilation from For now, all of them are red on .NET Fx on your branch locally compiled.
Thanks @Kielek I have merged your changes into my branch. I will verify the tests. |
…e reported if allocation sampling is enabled on Net FX
Stack Capture Engine: Design Overview
Introduction
The Stack Capture Engine enables safe, deadlock-free continuous profiling of .NET Framework applications by capturing managed call stacks from running threads. This document describes the core design principles that make this possible, focusing on two critical mechanisms: the Canary Thread pattern for runtime safety detection and RTL-based Stack Seeding for accurate context preparation. The .NET Framework runtime cannot be suspended; therefore, the stack capture implementation must employ the safety mechanisms outlined below to prevent application deadlocks.
1. The Canary Thread Pattern
1.1 The Core Problem
The CLR's DoStackSnapshot API can cause deadlocks when called during unsafe runtime states, such as during garbage collection, JIT compilation, or when critical runtime locks are held. Traditional approaches that directly suspend and snapshot threads risk:
CORPROF_E_STACKSNAPSHOT_UNSAFE errors
1.2 The Canary Solution
Rather than blindly attempting to capture stacks, the engine uses a dedicated canary thread as a safety sentinel. This is a known, controlled thread created by the application specifically for profiling purposes, identified by a configurable name prefix (default: "OpenTelemetry Continuous").
How It Works:
Before capturing stacks from any production threads, the engine performs a safety probe using the canary thread:
DoStackSnapshot called on the canary itself (tests CLR profiling API safety)
If all probe operations complete successfully within the timeout (default 250 ms), the runtime is considered safe for capturing production thread stacks. If the probe fails or times out, the engine skips the current capture cycle and waits for the next interval.
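The timeout behaviour described above can be sketched in portable C++. This is only an illustration, not the PR's code: `ProbeIsSafe` is a hypothetical helper, the probe itself is passed in as a callable (in the engine it would be the DoStackSnapshot call against the canary), and how the real engine dispatches and times the probe is an assumption here.

```cpp
#include <chrono>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// Hypothetical helper: runs `probe` on a helper thread and reports whether it
// returned true within `timeout`. A probe that hangs or throws counts as unsafe.
bool ProbeIsSafe(std::function<bool()> probe,
                 std::chrono::milliseconds timeout = std::chrono::milliseconds(250)) {
    auto done = std::make_shared<std::promise<bool>>();
    std::future<bool> result = done->get_future();

    // Detached so that a hung probe cannot block the sampling thread.
    // The shared_ptr keeps the promise alive for as long as the probe runs.
    std::thread([done, probe = std::move(probe)] {
        try {
            done->set_value(probe());
        } catch (...) {
            done->set_value(false);
        }
    }).detach();

    if (result.wait_for(timeout) != std::future_status::ready) {
        return false;  // timed out: skip this capture cycle
    }
    return result.get();
}
```

A caller would skip the capture cycle whenever `ProbeIsSafe` returns false and try again at the next interval.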
Key Safety Properties:
…__try/__except) to catch access violations gracefully
1.3 Canary Thread Lifecycle
The engine tracks all managed threads through profiler callbacks (ThreadNameChanged, ThreadAssignedToOSThread). When a thread with the canary name prefix is detected:
If the canary thread is destroyed, the engine clears its registration and waits for a new canary to be designated before resuming captures.
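A minimal sketch of this bookkeeping, under stated assumptions: `CanaryRegistry`, `ThreadId`, and the method names are hypothetical, and the methods are meant to be called from the profiler's ThreadNameChanged and ThreadDestroyed callbacks. What the engine actually records when it detects the canary is not shown in this description, so the sketch stores only the thread ID.

```cpp
#include <cstdint>
#include <mutex>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

using ThreadId = std::uintptr_t;  // stand-in for the profiler's ThreadID type

// Hypothetical registry that remembers which managed thread is the canary.
class CanaryRegistry {
public:
    explicit CanaryRegistry(std::wstring prefix = L"OpenTelemetry Continuous")
        : prefix_(std::move(prefix)) {}

    // Call from ThreadNameChanged: register the thread if its name starts
    // with the configured prefix.
    void OnThreadNameChanged(ThreadId id, std::wstring_view name) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (name.substr(0, prefix_.size()) == prefix_) {
            canary_ = id;
        }
    }

    // Call from ThreadDestroyed: clear the registration if the canary died.
    void OnThreadDestroyed(ThreadId id) {
        std::lock_guard<std::mutex> lock(mutex_);
        if (canary_ == id) {
            canary_.reset();
        }
    }

    // Empty while no canary is registered; captures should be skipped then.
    std::optional<ThreadId> Canary() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return canary_;
    }

private:
    mutable std::mutex mutex_;
    std::wstring prefix_;
    std::optional<ThreadId> canary_;
};
```

The mutex is there because profiler callbacks can arrive on arbitrary threads while the sampler reads the registration.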
2. RTL-Based Stack Seeding
2.1 The Context Preparation Challenge
The CLR's DoStackSnapshot API requires a valid starting context pointing to managed code. However, when a thread is suspended for profiling, its instruction pointer (RIP on x64) may be in:
…Sleep, WaitForSingleObject)
Passing a context pointing to native code causes DoStackSnapshot to fail or return incomplete stacks. The engine must walk the native stack frames to locate the first managed frame before invoking DoStackSnapshot.
2.2 The RTL Function Solution
Windows provides low-level Runtime Library (RTL) functions for exception handling and stack unwinding:
RtlLookupFunctionEntry: Retrieves unwind metadata for a given instruction pointer
RtlVirtualUnwind: Simulates stack unwinding using function metadata, updating the context to the caller's frame
These functions are the same primitives the CLR uses internally for exception handling, making them reliable and accurate.
The Seeding Algorithm:
…GetFunctionFromIP
…RtlLookupFunctionEntry to get unwind metadata for the current RIP
…RtlVirtualUnwind to update the context to the caller frame
…[RSP]) and adjust RSP
…GetFunctionFromIP to check if we've reached managed code
…DoStackSnapshot
2.3 Critical Design Details
Function Begin Address vs. Current RIP:
The CLR's metadata associates managed functions with their entry points (begin addresses), not arbitrary instruction pointers mid-function. When unwind metadata is available, the engine uses imageBase + runtimeFunction->BeginAddress for managed detection, not the current RIP. This ensures reliable GetFunctionFromIP lookups.
Leaf Function Handling:
Leaf functions (functions with no stack frame) lack unwind metadata. The engine detects this case and manually pops the return address from the stack:
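A minimal sketch of that pop, assuming a reduced context that holds only RIP and RSP. `FrameContext` and `UnwindLeafFrame` are hypothetical names, not the PR's code, and the sketch assumes the memory at RSP is readable, whereas the engine is described as guarding such reads with SEH.

```cpp
#include <cstdint>
#include <cstring>

// Minimal stand-in for the two CONTEXT registers the walk cares about.
struct FrameContext {
    std::uint64_t Rip;
    std::uint64_t Rsp;
};

// For a leaf frame the return address sits at [RSP]:
// load it into RIP, then pop it by advancing RSP past it.
inline void UnwindLeafFrame(FrameContext& ctx) {
    std::uint64_t returnAddress = 0;
    const void* top = reinterpret_cast<const void*>(static_cast<std::uintptr_t>(ctx.Rsp));
    std::memcpy(&returnAddress, top, sizeof returnAddress);
    ctx.Rip = returnAddress;
    ctx.Rsp += sizeof returnAddress;
}
```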
Stack Corruption Detection:
The engine tracks the stack pointer (RSP) across frames. If RSP fails to grow (or moves backward), the stack is considered corrupted and the walk terminates to prevent crashes.
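The seeding walk and the stack-pointer check can be sketched together. To keep the example runnable without Windows or the CLR, the three operations are injected as hypothetical hooks: `lookupBegin` stands in for RtlLookupFunctionEntry plus the imageBase + BeginAddress computation, `unwind` for RtlVirtualUnwind (or the leaf pop), and `isManaged` for a GetFunctionFromIP check. `WalkContext`, `WalkHooks`, `SeedToManagedFrame`, and the `maxFrames` bound are all assumptions made for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <optional>

struct WalkContext {
    std::uint64_t Rip;
    std::uint64_t Rsp;
};

// Hypothetical hooks standing in for the Windows and CLR calls.
struct WalkHooks {
    // Begin address of the function containing `rip`; empty for a leaf function.
    std::function<std::optional<std::uint64_t>(std::uint64_t rip)> lookupBegin;
    // Moves `ctx` to the caller's frame; returns false if unwinding fails.
    std::function<bool(WalkContext& ctx)> unwind;
    // True if `address` belongs to managed code.
    std::function<bool(std::uint64_t address)> isManaged;
};

// Walks toward callers until a managed frame is found. Returns the context to
// seed the snapshot with, or nothing if the walk fails or the stack looks corrupt.
std::optional<WalkContext> SeedToManagedFrame(WalkContext ctx, const WalkHooks& hooks,
                                              int maxFrames = 64) {
    for (int i = 0; i < maxFrames; ++i) {
        // Prefer the function's begin address for the managed check;
        // fall back to the raw RIP when no unwind metadata exists.
        const std::optional<std::uint64_t> begin = hooks.lookupBegin(ctx.Rip);
        if (hooks.isManaged(begin.value_or(ctx.Rip))) {
            return ctx;
        }

        const std::uint64_t previousRsp = ctx.Rsp;
        if (!hooks.unwind(ctx)) {
            return std::nullopt;
        }
        // Each unwound frame must move RSP to a higher address;
        // anything else is treated as corruption and ends the walk.
        if (ctx.Rsp <= previousRsp) {
            return std::nullopt;
        }
    }
    return std::nullopt;  // frame budget exhausted without reaching managed code
}
```

Injecting the hooks also makes the walk easy to exercise against a synthetic stack.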
SEH Protection:
All memory reads and RTL function calls are wrapped in Structured Exception Handling. If an access violation occurs (invalid memory, corrupted unwind metadata), the operation fails gracefully without crashing the application.
2.4 Why This Matters
Without accurate seeding:
With RTL-based seeding:
3. How They Work Together
3.1 The Capture Flow
For each profiling cycle:
…DoStackSnapshot with the seeded context
3.2 Layered Safety
The design provides defense in depth:
Each layer can independently fail without bringing down the application. The engine simply logs the failure and skips to the next capture cycle.
4. Platform & Technology Constraints
4.1 Windows x64 Only
The RTL functions (RtlLookupFunctionEntry, RtlVirtualUnwind) are Windows-specific APIs. The engine is currently limited to: