
[cdac] x86: implement IGCInfoDecoder.EnumerateLiveSlots; unblock GCRoots stackref tests#129547

Open
max-charlamb wants to merge 23 commits into dotnet:main from max-charlamb:cdac-x86-enumlive-slots

Conversation

@max-charlamb max-charlamb commented Jun 17, 2026

Member

Note

This PR was authored with assistance from GitHub Copilot.

Summary

Builds on the partial x86 IGCInfo support added in #129456 by porting the remaining decoder pieces required for GC-root scanning on x86, so that IStackWalk.WalkStackReferences returns live frame slots on x86 cDAC. Unblocks the two [SkipOnArch("x86", "GCInfo decoder does not support x86")] markers on cdac/tests/DumpTests/StackReferenceDumpTests.cs.

The x86 GC info uses the legacy bit-packed InfoHdr byte-stream encoding (src/coreclr/vm/gc_unwind_x86.inl, src/coreclr/inc/gcdecoder.cpp) instead of the modern GcInfoDecoder shared by other architectures, so the implementation lives entirely on the existing x86 decoder under Contracts/GCInfo/X86/.

Changes

  • GCInfo.cs (x86 decoder) — decode untracked-locals and VarPtr-tracked-lifetimes tables (previously skipped) into two new lazy internal properties (UntrackedSlots, VarPtrLifetimes).
  • IGCInfoDecoder.GetInterruptibleRanges — flip from NotSupportedException to a real implementation. Fully-interruptible methods report one range covering the post-prolog body (minus epilogs); partially-interruptible methods emit each call site as a single-byte range. Consumed by StackWalk_1.WalkStackReferences for the catch-handler PC override (x86 now uses the funclet EH model, see Enable new exception handling on win-x86 #115957).
  • IGCInfoDecoder.EnumerateLiveSlots — flip from NotSupportedException to a real implementation mirroring EnumGcRefsX86:
    • Early-returns empty on IsParentOfFuncletStackFrame, in prolog/epilog, or for aborted-execution at a non-safe-point in non-interruptible code.
    • Emits untracked frame locals (suppressed on filter funclets via SuppressUntrackedSlots).
    • Emits VarPtr-tracked lifetimes whose [BeginOffset, EndOffset) covers the queried offset (evaluated at instructionOffset - 1 on non-active frames; EBP-frame offsets are stored negated).
    • Walks Transitions up to the queried offset to accumulate live registers and pushed pointer args.
    • For partially-interruptible code, emits the register set + pointer args of the matching GcTransitionCall (huge 0xFB encoding uses explicit per-pointer offsets; tiny/small/medium/large use a uint32 bitmap walked low-to-high).
    • Honors GcSlotEnumerationOptions.ReportFPBasedSlotsOnly as a post-filter that drops register slots and non-frame-relative stack slots, mirroring GCInfoDecoder.ReportSlot.
  • GetSizeOfStackParameterArea: return 0 on x86 (no separate outgoing-argument scratch area; per-offset transitions report pushed args directly).
  • GCTransition.cs — fix GcTransitionRegister and GcTransitionPointer constructors to actually store the isThis/iptr parameters they accept. The properties were defaulting to false, silently dropping every 0xBF interior-pointer prefix and every 0xBC this-pointer prefix in the byte stream.
  • GCArgTable.cs
    • Fix GetTransitionsEbpFrame to declare cumulative curOffs outside its outer loop (matches the encoder in gcdumpx86.cpp). The local was being reset every iteration, causing all partial-interrupt EBP-frame call-site transitions to be emitted at small per-iteration deltas instead of cumulative method offsets.
    • In GetTransitionsNoEbp (partial-interruptible ESP-frame walker), emit StackDepthTransition with a negative delta at call sites so the consumer matches native scanArgRegTable's stackDepth -= callArgCnt. Unreachable on current x86 codegen (all post-funclet x86 methods are EBP frames), but the bug was real.
  • EnumerateLiveSlots consumer fixes:
    • ApplyRegisterTransition skips ESP-only push/pop (depth tracking only — no fake pointer entry in pushedPtrs).
    • ApplyPointerTransition honors GcTransitionPointer.IsPtr=false for non-pointer arg pushes (encoding 0xB0..0xB7).
    • Pushed pointer args are recorded by push-index and emitted as positive SP-relative offsets at emit time (addr = ESP_call + (finalDepth - 1 - pushIndex) * sizeof(DWORD)).
    • On non-leaf frames, register-liveness events at offset > instructionOffset - 1 are skipped (regOffset), mirroring native's curOffsRegs.
    • Callee-trashed scratch (EAX/ECX/EDX) is filtered out of the result on non-active frames.
  • Helpers: IsCodeOffsetInProlog/IsCodeOffsetInEpilog, RegMaskToRegisterNumber (single-bit RegMask → x86 ModRM register number).
  • Tests — remove the two [SkipOnArch("x86", ...)] markers on the GCRoots StackReferenceDumpTests.
  • docs/design/datacontracts/GCInfo.md — document the x86 EnumerateLiveSlots and GetInterruptibleRanges behavior end-to-end in a dedicated x86 specifics section.
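
The VarPtr lifetime rule in the list above (half-open range, `instructionOffset - 1` on non-active frames, negated offsets on EBP frames) can be sketched roughly as follows. This is an illustrative model only: the struct, field names, and function name are hypothetical and are not taken from the cDAC sources.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of one decoded VarPtr entry; names are illustrative only.
struct VarPtrLifetime {
    uint32_t varOffs;     // encoded (unsigned) stack offset
    uint32_t beginOffset; // first code offset at which the local is live
    uint32_t endOffset;   // one past the last live code offset
};

// Returns the frame-relative offsets of the locals live at instructionOffset.
std::vector<int32_t> LiveVarPtrOffsets(const std::vector<VarPtrLifetime>& lifetimes,
                                       uint32_t instructionOffset,
                                       bool isActiveFrame,
                                       bool isEbpFrame) {
    // On a non-active frame the offset is a return address, so test the call itself.
    assert(isActiveFrame || instructionOffset > 0);
    const uint32_t queryOffset = isActiveFrame ? instructionOffset : instructionOffset - 1;

    std::vector<int32_t> live;
    for (const VarPtrLifetime& lt : lifetimes) {
        if (queryOffset < lt.beginOffset || queryOffset >= lt.endOffset)
            continue; // outside the half-open range [begin, end)
        const int32_t offs = static_cast<int32_t>(lt.varOffs);
        live.push_back(isEbpFrame ? -offs : offs); // EBP-frame offsets are stored negated
    }
    return live;
}
```

With lifetimes `[10, 20)` and `[20, 30)`, a query at offset 20 selects the second entry on an active frame and the first entry on a non-active frame, which is the distinction the `- 1` adjustment exists to capture.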

Validation

  • All 2525 cDAC unit tests pass (no x64 regression).

  • The two unblocked GCRoots_* x86 dump tests pass.

  • cDAC GC stress suite (Windows x86 Checked, 9 debuggees, every managed allocation triggers cDAC vs. native comparison): 803,556 frames matched / 0 mismatched / 134 known-NIE (deferred transition-frame markers for PromoteCallerStack, which is tracked separately). Match progression while debugging this PR locally:

    | Stage | Mismatched | Match % |
    |---|---|---|
    | Pre-fix baseline | 27,702 | 68% |
    | After pushed-arg / regOffset / VarPtr fixes | 1,742 | 98% |
    | After GcArgTable.curOffs scope fix | 30 | 99.97% |
    | After GcTransition ctor iptr/isThis fix | 14 | 99.984% |
    | After IsParentOfFuncletStackFrame + ArgMask fixes | 0 | 100% |

Out of scope (deferred follow-ups)

  • info.thisPtrResult reporting for synchronized methods on the !willContinueExecution path. The regular live-register report covers willContinueExecution, which is what stress exercises.
  • IPtrMask (0xF0) interior-pointer bitmaps for pushed args — only used in the partial-interruptible ESP-frame walker, not exercised by current x86 codegen (all post-funclet x86 methods are EBP frames).
  • Porting TransitionFrame.PromoteCallerStack (the source of the 134 KNOWN_NIE entries in the stress suite) — separate cDAC work.

References

…ots stackref tests

Builds on the partial x86 IGCInfo support added in dotnet#129456 by porting the
remaining decoder pieces required for GC-root scanning on x86, so that
`IStackWalk.WalkStackReferences` returns live frame slots on x86 cDAC.

The x86 GC info uses the legacy bit-packed `InfoHdr` byte-stream encoding
(`src/coreclr/vm/gc_unwind_x86.inl`, `src/coreclr/inc/gcdecoder.cpp`)
instead of the modern `GcInfoDecoder` shared by other architectures, so
the implementation lives entirely on the existing `X86GCInfo` decoder
under `Contracts/GCInfo/X86/`.

Changes
-------

* `X86GCInfo`: add `UntrackedSlots` lazy property +
  `DecodeUntrackedSlots()` -- delta-decoded signed varints with the
  double-align-frame rebase from `gc_unwind_x86.inl:3467`.
* `X86GCInfo`: add `VarPtrLifetimes` lazy property +
  `DecodeVarPtrLifetimes()` -- triplets of (varOffs, begOffs delta,
  endOffs delta) for EBP-frame tracked locals.
* Two new public record types `UntrackedSlot` and `VarPtrLifetime`
  capture the decoded entries.
* `IsCodeOffsetInProlog` / `IsCodeOffsetInEpilog` helpers
  (offset-parameterised, so EnumerateLiveSlots can answer for any
  instruction offset without re-constructing X86GCInfo).
* `RegMaskToRegisterNumber` helper maps the single-bit `RegMask`
  flags-enum values to the x86 ModRM register numbers used by
  `X86Context.TryReadRegister` and `LiveSlot.RegisterNumber`.
* Implement `IGCInfoDecoder.EnumerateLiveSlots(uint offset, options)`:
  early-return empty in prolog/epilog (or aborted+non-interruptible),
  emit untracked locals (suppressed for filter funclets), emit VarPtr
  lifetimes covering `offset`, walk `Transitions` up to `offset`
  accumulating live registers + pushed pointer args, and emit a
  partially-interruptible `GcTransitionCall` exactly at `offset`.
* Flip `IGCInfoDecoder.GetSizeOfStackParameterArea` from
  `NotSupportedException` to `return 0` for x86 -- x86 has no
  separate outgoing-argument scratch area; per-offset transitions
  report pushed args directly, so the GcScanner scratch-area filter is
  a no-op (correct).
* Remove the `[SkipOnArch("x86", "GCInfo decoder does not support
  x86")]` markers on `GCRoots_WalkStackReferences_FindsRefs` and
  `GCRoots_RefsPointToValidObjects`.
* `DumpTests.targets`: add optional `DebuggeeFilter=<Name>` to
  restrict `GenerateAllDumps` to a single debuggee. Useful for
  iterative local x86 work where some other debuggee's publish may
  fail.
* `docs/design/datacontracts/GCInfo.md`: enumerate which
  `IGCInfoDecoder` APIs are wired up on x86.

Out of scope (deferred)
-----------------------

* `GetInterruptibleRanges` for x86 -- the only consumer is the
  catch-handler PC override in `StackWalk_1`; no x86-relevant
  scenarios today.
* "this"-pointer special-case reporting for synchronized methods
  (VarPtr 0x2 bit currently masked out).
* IPtrMask interior-pointer bitmaps for pushed args (uses the simpler
  per-push `Iptr` flag).
* Funclet handling beyond the existing `IsParentOfFuncletStackFrame`
  caller-side early-skip.
* Finer `IsActiveFrame` register filter precision.

Validation
----------

* All 2525 cDAC unit tests pass.
* The two unblocked `GCRoots_*` tests pass against a freshly
  generated x86 GCRoots dump.
* Broader `DumpTests` x86 sweep: 34 pass / 46 fail / 830 skip --
  net +2 vs. before this change (the two GCRoots tests), zero
  regressions. The 46 pre-existing failures are all unrelated to
  GCInfo (`ThreadDumpTests` / `ComWrappersDumpTests` /
  `RuntimeInfoDumpTests` / `WorkstationGCDumpTests` and similar).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The intro paragraph was getting wordy with the per-API status; pull that
content out into a new "x86 specifics" section at the end of the file
with a table covering supported/not-implemented APIs and a deferred-edges
list. Intro now just notes that x86 is partially supported and links to
the section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR extends the cDAC x86 GCInfo decoder so stack-walk GC root scanning can enumerate live frame slots on x86, and enables the previously x86-skipped GCRoots WalkStackReferences dump tests.

Changes:

  • Implements IGCInfoDecoder.EnumerateLiveSlots for x86, plus lazy decoding for untracked locals and VarPtr lifetimes.
  • Enables x86 execution for the two GCRoots StackReferenceDumpTests by removing the x86 skip.
  • Adds an MSBuild DebuggeeFilter option to limit dump generation to a single debuggee and updates GCInfo contract documentation.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 8 comments.

| File | Description |
|---|---|
| src/native/managed/cdac/tests/DumpTests/StackReferenceDumpTests.cs | Removes x86 skips so GCRoots stackref dump tests run on x86. |
| src/native/managed/cdac/tests/DumpTests/DumpTests.targets | Adds DebuggeeFilter support to limit debuggee csproj discovery/build. |
| src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs | Adds lazy decoding for untracked/VarPtr tables and implements x86 live-slot enumeration. |
| docs/design/datacontracts/GCInfo.md | Updates the x86 support statement in the GCInfo contract doc. |

Context: x86 has used the funclet EH model since PR dotnet#115957 / dotnet#122872
(I had previously assumed otherwise and documented this API as
intentionally not implemented). Catch funclet unwinding does call into
`IGCInfo.GetInterruptibleRanges` via the parent-frame PC override
path in `StackWalk_1.WalkStackReferences`, so throwing
`NotSupportedException` is a real correctness gap (silently swallowed
by the per-frame try/catch and producing missed parent-frame GC roots).

Implementation
--------------

Match the semantics of the native x86 walker (`EnumGcRefsX86` in
`gc_unwind_x86.inl`):

* Fully interruptible (`Header.Interruptible == true`): emit one range
  per gap between the prolog end and each epilog start, plus a final
  range from the last epilog end to `MethodSize`. This mirrors the
  native walker's prolog/epilog short-circuit return at line 3091.
* Partially interruptible: walk `Transitions` and emit a single-byte
  `(offset, offset + 1)` range for each `GcTransitionCall`. Per the
  native partial-interrupt encoding doc (`gc_unwind_x86.inl:1066+`),
  call sites are the only GC-safe points in this mode -- the
  intervening `GcTransitionRegister` / `GcTransitionPointer` /
  `StackDepthTransition` / `IPtrMask` / `CalleeSavedRegister`
  events are all bookkeeping, not safe points.
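
As a rough illustration of the two modes described above, the range computation might look like the sketch below. The function names and the `Range` alias are hypothetical, and the sketch assumes the epilog list is sorted by start offset; the real decoder's data structures are not shown in this message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<uint32_t, uint32_t>; // half-open [start, end)

// Fully interruptible: everything after the prolog except the epilogs.
// Assumes epilogs are [start, end) pairs sorted by start offset.
std::vector<Range> FullyInterruptibleRanges(uint32_t prologEnd,
                                            const std::vector<Range>& epilogs,
                                            uint32_t methodSize) {
    assert(prologEnd <= methodSize);
    std::vector<Range> ranges;
    uint32_t cursor = prologEnd;
    for (const Range& epilog : epilogs) {
        if (epilog.first > cursor)
            ranges.emplace_back(cursor, epilog.first); // gap before this epilog
        cursor = std::max(cursor, epilog.second);      // resume after the epilog
    }
    if (cursor < methodSize)
        ranges.emplace_back(cursor, methodSize);       // tail after the last epilog
    return ranges;
}

// Partially interruptible: each call site is a one-byte safe point.
std::vector<Range> PartiallyInterruptibleRanges(const std::vector<uint32_t>& callSiteOffsets) {
    std::vector<Range> ranges;
    for (uint32_t offset : callSiteOffsets)
        ranges.emplace_back(offset, offset + 1);
    return ranges;
}
```

When the method ends with an epilog, the tail range is empty and is simply not emitted.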

Docs
----

Update `docs/design/datacontracts/GCInfo.md` x86 specifics:

* Move `GetInterruptibleRanges` from "Not implemented" to
  "Implemented" in the supported APIs table.
* Remove the false claim that x86 has no funclets / no x86-relevant
  scenarios for this API.
* Reference PR dotnet#115957 (Enable new exception handling on win-x86) for
  context on the EH model.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 13:54
Pure file rename (git mv) -- the class was renamed `GCInfo` -> `X86GCInfo`
in dotnet#129456 to avoid collision with the empty IGCInfo fallback struct, but
the file kept the old name. Bring the filename in line.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (7)

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:638

  • EnumerateTransitionLiveSlots emits pushed pointer-arg slots using the raw offsets stored in pushedPtrs (SpOffset: pushed.Key). Those offsets are tracked in a pre-push coordinate system (negative, relative to the SP value before the outstanding pushes), but GcScanner interprets SpOffset as an offset from the current SP_REL base. As-is, this will compute incorrect stack addresses for pushed args (and the comment above claims the offsets are positive). Translate the offsets to be relative to the current SP before emitting LiveSlots.

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:602

  • activeCallSite is currently captured for GcTransitionCall entries even in fully-interruptible methods. That makes activeCallSite non-null at call offsets and can change downstream behavior (e.g., it can force register reporting even when IsActiveFrame is false). GcTransitionCall should only be used as the authoritative live-state source for partially-interruptible methods.

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:631

  • Register reporting is currently guarded by if (options.IsActiveFrame || activeCallSite is not null). Even after restricting activeCallSite to partially-interruptible methods, this still reports the accumulated liveRegs set at call sites, which can double-report registers alongside the activeCallSite.CallRegisters emission and can re-enable reporting when IsActiveFrame is false. Consider keeping accumulated register emission strictly tied to IsActiveFrame and skipping it when an activeCallSite is present.

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:76

  • This adds new public surface area (UntrackedSlots) on X86GCInfo. Since the new data is only consumed internally within the x86 decoder, consider making this internal (or private) to avoid expanding the public API surface and the associated API-approval requirements for externally-consumable contracts assemblies.

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:83

  • This adds new public surface area (VarPtrLifetimes) on X86GCInfo. If it’s not intended for external consumption, making it internal helps avoid locking in an API contract for implementation details (and avoids needing API approval for these additions).

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:750

  • UntrackedSlot is introduced as a public type but is only used as an implementation detail within X86GCInfo. Consider making it internal to avoid committing to a public API shape for decoder internals (and to avoid needing API approval). If you do this, ensure UntrackedSlots isn’t public either (it can’t return an internal type).

src/native/managed/cdac/Microsoft.Diagnostics.DataContractReader.Contracts/Contracts/GCInfo/X86/GCInfo.cs:765

  • VarPtrLifetime is introduced as a public type but is only used as an implementation detail within X86GCInfo. Consider making it internal to avoid expanding public API surface for decoder internals (and to avoid needing API approval). If you do this, ensure VarPtrLifetimes isn’t public either.

Max Charlamb and others added 4 commits June 18, 2026 11:02
…ed bits

I had been masking out the 0x2 bit for VarPtr lifetimes, claiming it
meant "this" pointer not pinned for tracked locals. That's wrong on
modern x86: the native VarPtr loop in gc_unwind_x86.inl:3610-3613
explicitly says

    // First  Bit : byref
    // Second Bit : pinned
    // Both bits are valid
    flags |= lowBits;

The this_OFFSET_FLAG = 0x2 interpretation in gcinfo.h was scoped to the
legacy JIT32_ENCODER on x86 without funclets, which has been gone since
dotnet#115957 enabled funclet EH on win-x86 (and dotnet#122872 removed the rest).

Pass LowBits straight through into LiveSlot.GcFlags for VarPtr entries,
matching what we already do for untracked slots. Update the
VarPtrLifetime LowBits xmldoc to drop the wrong "this"-pointer note.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
GcTransitionRegister and GcTransitionPointer constructors took `isThis`
and `iptr` parameters but silently dropped them on the floor. The
default-false properties survived past every `0xBF` interior-pointer
prefix, every `0xBC` this-pointer prefix, and every byref CallRegister,
so the LIVE state walker's iptr accumulator was always 0. Stress tests
on Windows x86 saw ~30 register-resident interior-pointer mismatches per
debuggee in CoreLib code (e.g. EventSource.DefineEventPipeEvents) where
RT reported Reg=EDI Flags=0x1 (interior) and cDAC reported Flags=0x0.

Add the missing `IsThis = isThis; Iptr = iptr;` assignments. Drops the
BasicAlloc x86 stress mismatch count from 30 to 0 and the full-suite
total from 316 to 14 (99.984% match).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Native gcdumpx86.cpp:361 declares `curOffs = 0` once *outside* the
partial-interrupt EBP-frame walk loop and accumulates code-offset
deltas across iterations. cDAC's `GetTransitionsEbpFrame` declared it
*inside* the `while (true)` loop body, so it was reset to 0 every
iteration -- causing all partial-interrupt EBP-frame call-site
transitions to be emitted at small per-iteration deltas instead of
true cumulative method offsets.

Effect: callee-saved register liveness reported by RT on real call
sites in CoreLib R2R'd code (where the JIT typically keeps GC refs
in EBX/ESI/EDI across calls) was missed by cDAC's EnumerateLiveSlots,
because activeCallSite never matched at the queried offset. Stress
tests on Windows x86 saw ~3K such under-reports per BasicAlloc run
(1480 EDI + 1284 EBX + 476 ESI), concentrated in
System.Diagnostics.Tracing.EventSource.* methods.

Move `uint curOffs = 0;` outside the loop to mirror native, dropping
the BasicAlloc x86 stress mismatch count from 1742 to 30 (98.0% ->
99.97% match).
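
The scoping issue can be shown with a small stand-alone model. The function below is a hypothetical simplification (it takes already-decoded deltas rather than walking the byte stream), intended only to show why the accumulator has to live outside the loop.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Turns the per-entry code-offset deltas of a call-site table into method offsets.
std::vector<uint32_t> CallSiteOffsets(const std::vector<uint32_t>& deltas) {
    std::vector<uint32_t> offsets;
    uint32_t curOffs = 0; // declared once, outside the loop, so it accumulates
    for (uint32_t delta : deltas) {
        // Declaring curOffs inside this loop body instead would reset it on
        // every iteration and yield the raw deltas rather than cumulative offsets.
        curOffs += delta;
        assert(curOffs >= delta); // guard against wrap-around in this sketch
        offsets.push_back(curOffs);
    }
    return offsets;
}
```

For deltas `3, 5, 10` the sketch yields offsets `3, 8, 18`; with the accumulator reset per iteration the result would be `3, 5, 10`, so a lookup for the call site at offset 18 would never match.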

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Five related correctness fixes in X86GCInfo's `EnumerateLiveSlots` and
`EnumerateTransitionLiveSlots`, mirroring native EnumGcRefsX86 in
gc_unwind_x86.inl. These were uncovered by running the cDAC GC stress
tests against Windows x86 corerun and comparing cDAC's slot output
against the runtime's at every managed allocation.

1. **Pushed-pointer-arg address calculation** -- Native at
   gc_unwind_x86.inl:3262 addresses bit `i` of the args mask at
   `pPendingArgFirst - i*sizeof(DWORD)`, where pPendingArgFirst is the
   FIRST-pushed-arg address (highest among pushed args). cDAC was
   storing pushed pointers keyed by `-depthSlots*4` and emitting them
   as SP_REL slots with the negative offset, resolving to addresses
   *below* the call-site SP -- impossible for live args. Switched to
   storing by push-index (depth-at-push-time, 0-indexed) and
   translating to positive SP-relative offset
   `(finalDepth - 1 - pushIndex) * 4` at emit time.

2. **ESP-only push/pop semantics** -- GcArgTable encodes ESP push/pop
   bytes as `GcTransitionRegister` with `RegMask.ESP` and Action.PUSH/
   POP, but those mean "stack-depth tracking" (non-pointer args), not
   "pointer push". cDAC was treating them as pointer pushes (adding
   phantom entries to pushedPtrs) and was filtering them through the
   regOffset gate (which is for register-liveness LIVE/DEAD events
   only, not arg-stream events). Now ESP-only pushes only advance
   depth, and the regOffset gate fires only for `Action.LIVE`/
   `Action.DEAD`.

3. **Register-state offset on non-leaf frames** -- Native EnumGcRefsX86
   uses `curOffsRegs = curOffs - 1` for non-active stack frames
   because register liveness can change across calls
   (gc_unwind_x86.inl:3149+). Mirror that: register-liveness LIVE/DEAD
   events at offset > regOffset are skipped on non-active frames.

4. **VarPtr lifetimes on EBP frames** -- Native at
   gc_unwind_x86.inl:3567-3573 negates the encoded VarPtr stack offset
   for EBP frames (the encoded value is positive but means
   EBP-relative-negative for locals). cDAC was leaving it positive,
   reporting locals at the wrong addresses. Also use `curOffs-1` for
   the lifetime-range check on non-active frames, mirroring
   gc_unwind_x86.inl:3540-3548 (a variable could be dead at the
   return address if the call was the last instruction of a try and
   the return jumps to a catch handler).

5. **Callee-trashed register filtering** -- On non-active frames the
   callee will have overwritten EAX/ECX/EDX, so any GC refs they held
   at the call site are stale. Native gates these via the
   `ActiveStackFrame` flag at gc_unwind_x86.inl:3189-3199; mirror that
   in EnumerateTransitionLiveSlots.

Combined effect on Windows x86 stress (BasicAlloc, Checked):
  Before:  59,514 matched / 27,702 mismatched (68%)
  After:    87,778 matched /     0 mismatched (100%)

Across the full 9-debuggee suite: 43,771 PASS / 14 FAIL (99.97%).
The remaining 14 are filter-funclet / state-context edge cases tracked
separately.
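
The push-index translation from item 1 reduces to a one-line formula. The helper below is a hypothetical sketch of it, assuming 4-byte stack slots as on x86; the name and signature are not from the cDAC sources.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t kSlotSize = 4; // sizeof(DWORD) on x86

// SP-relative byte offset of the argument pushed at position pushIndex
// (0 = first push), once finalDepth slots are outstanding at the call site.
uint32_t PushedArgSpOffset(uint32_t pushIndex, uint32_t finalDepth) {
    assert(pushIndex < finalDepth);
    return (finalDepth - 1 - pushIndex) * kSlotSize;
}
```

With three outstanding pushes, the first-pushed argument sits at `SP+8` (the highest address) and the last-pushed one at `SP+0`, so every emitted offset is non-negative.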

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 19:09

Copilot AI left a comment

Pull request overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

…rgMask call sites

Two follow-up correctness fixes for residual stress-test mismatches.

1. **ParentOfFuncletStackFrame early-return** -- Native EnumGcRefsX86
   (gc_unwind_x86.inl:3039) returns without reporting any GC refs when
   the stack walker has flagged the frame as a parent whose locals are
   already being reported via a funclet (e.g. a catch handler). cDAC's
   `StackWalk_1` does set `GcSlotEnumerationOptions.IsParentOfFuncletStackFrame`
   correctly via `gcFrame.ShouldParentToFuncletSkipReportingGCReferences`,
   but `X86GCInfo.EnumerateLiveSlots` was ignoring it and emitting the
   parent's slots a second time. This caused stress mismatches in
   exception-handling scenarios where a catch funclet appears alongside
   its parent (e.g. `Program.NestedExceptionScenario`): cDAC reported 9
   refs on the parent frame while RT correctly reported 0.

2. **Call-site `ArgMask` iteration** -- For partially-interruptible call
   sites, GcArgTable populates either `GcTransitionCall.PtrArgs` (huge
   `0xFB` encoding with explicit per-pointer offsets) or
   `GcTransitionCall.ArgMask` / `IArgs` (tiny / small / medium / large
   encodings with a bitmap of pushed-pointer slots). cDAC was only
   iterating `PtrArgs`, silently dropping every call site that used the
   bitmap form. Native scanArgRegTable (gc_unwind_x86.inl:3373-3402)
   walks the bitmap low-to-high with `argAddr = ESP + i*sizeof(DWORD)`;
   mirror that. Affected `System.Diagnostics.Tracing.ManifestBuilder.
   CreateManifestString` and similar large CoreLib methods where the
   missing slot was always at `SP+0` (the lowest-bit pushed pointer).
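
A minimal sketch of the low-to-high bitmap walk from item 2 follows. It models only the address arithmetic (`ESP + i*sizeof(DWORD)`); the function name is hypothetical, and interior-pointer handling is deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr uint32_t kSlotSize = 4; // sizeof(DWORD) on x86

// Returns the SP-relative byte offsets of the pushed pointer args named by argMask.
std::vector<uint32_t> ArgMaskSpOffsets(uint32_t argMask) {
    std::vector<uint32_t> offsets;
    for (uint32_t i = 0; argMask != 0; ++i, argMask >>= 1) {
        if (argMask & 1u)
            offsets.push_back(i * kSlotSize); // argAddr = ESP + i * sizeof(DWORD)
    }
    assert(offsets.size() <= 32); // a 32-bit mask names at most 32 slots
    return offsets;
}
```

Bit 0 maps to `SP+0`, which is the slot the message above reports as missing when only the explicit-offset form was iterated.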

Combined with the prior commits in this branch, x86 cDAC GC root
walking now reports identical results to the runtime across all 9
stress debuggees:

  Pre-session baseline: 87,294 frames / 27,702 mismatched (68% match)
  This commit:         803,556 frames /      0 mismatched (100%)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service

Contributor

Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag
See info in area-owners.md if you want to be subscribed.

…LiveSlots

Tighten comments in `X86GCInfo.cs` and `GCArgTable.cs`:
- Replace `gc_unwind_x86.inl:NNNN` line-number citations with file +
  function-name references (line numbers churn). Cite `EnumGcRefsX86`,
  `scanArgRegTableI`, `scanArgRegTable`, etc.
- Compact the per-block "why" comments to a single concise sentence,
  removing mechanical "we do X because native does X" duplication.
- Tag the regOffset/curOffsRegs note in `EnumerateTransitionLiveSlots`
  to point at the equivalent native field rather than line numbers.

Document the now-functional `EnumerateLiveSlots` in
`docs/design/datacontracts/GCInfo.md`. Update the Supported APIs table
to reflect that it is implemented and stress-validated, and replace the
stale Deferred edges list (now-resolved items dropped, remaining items
reframed as "not exercised by current x86 codegen / not stressed").

No behavioral changes; full stress suite still reports 0 mismatches.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 19:42
Reverts commit cb82b21 to keep the file name as `GCInfo.cs`. Even
though the class inside is `X86GCInfo` (renamed in dotnet#129456 to avoid
colliding with the empty `Contracts.GCInfo` IGCInfo fallback struct),
the file name churn shows up as a delete+add in the PR diff, which is
harder to review. The C# class name does not need to match the file
name.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 1 comment.

…re now implemented

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 20:03
…ction

Drop the per-API x86 callouts in the IGCInfoHandle interface comments
and the intro cross-reference. All x86 detail now lives in the
dedicated 'x86 specifics' section, minimizing the diff footprint
outside that section.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

All APIs are now implemented on x86; the table was just noise. Distill
the two truly x86-specific behavior notes (GetSizeOfStackParameterArea
returns 0, GetInterruptibleRanges encoding) into a short bullet list in
the intro of the x86 specifics section. The EnumerateLiveSlots
behavior subsection is unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 20:12
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Was added as a local convenience to restrict GenerateAllDumps to a
single debuggee during iterative x86 work. Not needed in the shipping
infrastructure.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

- Narrow visibility of decoder-internal types and properties:
  `UntrackedSlot`, `VarPtrLifetime`, `UntrackedSlots`, `VarPtrLifetimes`
  are only consumed by `EnumerateLiveSlots`, so make them `internal`
  (was `public`) to avoid expanding the contracts assembly surface.
- Replace hardcoded `* 4` with `_target.PointerSize` in the pushed-arg
  emit and the ArgMask bitmap iteration -- consistent with the rest of
  the decoder.
- Fix stale VarPtr comment that claimed `0x2` was "this NOT pinned".
  Modern x86 JIT uses `0x2` for pinned (matching `LiveSlot.GcFlags`);
  the legacy `this` interpretation only applied to JIT32_ENCODER which
  is no longer in use. The encoded bits already passed through
  correctly -- this is a comment fix only.
- Implement `GcSlotEnumerationOptions.ReportFPBasedSlotsOnly` on x86 as
  a post-filter that drops register slots and non-frame-relative stack
  slots, matching `GCInfoDecoder.ReportSlot` semantics on other arches.
  Update GCInfo.md to mention the filter under the x86 specifics
  section.
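
The `ReportFPBasedSlotsOnly` post-filter in the last bullet could be modeled as below. The `Slot` struct and function name are hypothetical stand-ins; the real `LiveSlot` shape is not shown in this message, so treat the fields as assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <vector>

// Hypothetical slot model; field names are assumptions for illustration.
struct Slot {
    bool isRegister;      // value lives in a register
    bool isFrameRelative; // stack slot addressed off the frame pointer (EBP)
    int32_t offset;       // stack offset; unused for register slots
};

// Post-filter: keep only frame-pointer-relative stack slots.
std::vector<Slot> KeepFpBasedSlots(const std::vector<Slot>& slots) {
    std::vector<Slot> kept;
    std::copy_if(slots.begin(), slots.end(), std::back_inserter(kept),
                 [](const Slot& s) {
                     assert(!(s.isRegister && s.isFrameRelative)); // exclusive in this model
                     return !s.isRegister && s.isFrameRelative;
                 });
    return kept;
}
```

Running the filter after enumeration, rather than threading the flag through every emit site, keeps the enumeration logic unchanged when the option is off.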

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 18, 2026 20:22

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Max Charlamb and others added 2 commits June 18, 2026 16:28
…errupt EBP-less call sites

Native scanArgRegTable (gc_unwind_x86.inl) decrements stackDepth by
callArgCnt at every call encoding in the EBP-less partial-interrupt
walker. cDAC was emitting the delta as positive, which the consumer
(EnumerateLiveSlots) was then *adding* to depthSlots -- the opposite
direction. Negate at the four StackDepthTransition emission sites in
GetTransitionsNoEbp so the consumer's `depthSlots += delta` matches
native semantics.

In practice this path is unreachable on current x86 codegen because all
post-funclet x86 methods use EBP frames (PR dotnet#115957), and the cDAC GC
stress suite still reports 0 mismatches across all 9 debuggees -- but
the bug was real and the fix is mechanical.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
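
The sign convention in the commit above can be sketched as follows. This is an illustrative C++ model under stated assumptions: `StackDepthTransition`, `EmitPush`, `EmitCall`, and `ApplyTransitions` are hypothetical stand-ins for the emission sites and the consumer, not the real cDAC or native types.

```cpp
#include <cassert>
#include <vector>

enum class TransitionKind { Push, Call };

// Hypothetical transition record: a signed change in stack depth, in slots.
struct StackDepthTransition {
    TransitionKind kind;
    int deltaSlots;
};

// A push grows the tracked depth by one slot.
inline StackDepthTransition EmitPush() {
    return {TransitionKind::Push, +1};
}

// A call consumes callArgCnt pushed arguments, so the delta is emitted
// negated; the consumer can then add it unconditionally.
inline StackDepthTransition EmitCall(unsigned callArgCnt) {
    return {TransitionKind::Call, -static_cast<int>(callArgCnt)};
}

// Consumer side: depthSlots += delta, mirroring the commit message.
inline int ApplyTransitions(const std::vector<StackDepthTransition>& transitions) {
    int depthSlots = 0;
    for (const StackDepthTransition& t : transitions) {
        depthSlots += t.deltaSlots;
        assert(depthSlots >= 0 && "stack depth must not go negative");
    }
    return depthSlots;
}
```

With the delta emitted as positive instead, three pushes followed by a two-argument call would leave a depth of 5 rather than 1.
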
… pushes

GcArgTable's GetTransitionsFullyInterruptible emits non-pointer arg
pushes (encoding 0xB0..0xB7) as GcTransitionPointer with IsPtr=false.
EnumerateLiveSlots was unconditionally recording every PUSH into
pushedPtrs, which would incorrectly report non-pointer slots as GC
roots. Skip the pushedPtrs entry when IsPtr is false (still advance
depthSlots so positional offsets stay correct), matching the
RegMask.ESP path in ApplyRegisterTransition.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
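
A minimal C++ sketch of the rule in the commit above: a non-pointer push still occupies a stack slot, but it must not be recorded as a GC root. `PushTransition`, `WalkState`, and `ApplyPush` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <vector>

// Hypothetical decoded push; isPtr is false for non-pointer arg pushes.
struct PushTransition {
    bool isPtr;
};

// Hypothetical walker state: current depth plus the slot indexes that
// hold pushed GC pointers.
struct WalkState {
    int depthSlots = 0;
    std::vector<int> pushedPtrs;
};

void ApplyPush(WalkState& state, const PushTransition& push) {
    // Record the slot only when it holds a GC pointer...
    if (push.isPtr)
        state.pushedPtrs.push_back(state.depthSlots);
    // ...but always advance the depth so later offsets stay positional.
    state.depthSlots += 1;
}
```

Skipping the depth increment along with the record would shift every later pushed pointer by one slot.
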
Copilot AI review requested due to automatic review settings June 18, 2026 20:32
@max-charlamb max-charlamb marked this pull request as ready for review June 18, 2026 20:38

Copilot AI left a comment

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.

Max Charlamb and others added 3 commits June 19, 2026 12:09
…est platforms

The x86 EnumerateLiveSlots / GetInterruptibleRanges implementation in
this PR is validated locally to match EnumGcRefsX86 across all 9 stress
debuggees (0 mismatches, 803K frames). Enable CI coverage by including
windows_x86 in cdacStressPlatforms, matching the existing windows_x86
entries in cdacDumpPlatforms and cdacXPlatDumpPlatforms.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…by pushedSize

Native EnumGcRefsX86 computes `argBase = ESP + pushedSize` for ESP frames
when iterating the untracked locals table (and VarPtr if present), so an
encoded `stkOffs` resolves to `ptrAddr = ESP + pushedSize + stkOffs`.
The SP-relative offset cDAC needs to emit is therefore `pushedSize +
stkOffs`, not bare `stkOffs`.

Compute the pushed-arg size at the queried instruction offset and add it
to non-EBP-relative untracked / VarPtr slot offsets. EBP-frame slots are
already FRAMEREG-relative and need no adjustment.

Refactor `CalculatePushedArgSize()` to delegate to a new
`CalculatePushedArgSizeAt(uint codeOffset)` so both the existing
`RelativeOffset`-bound `PushedArgSize` property and the new
`EnumerateLiveSlots(instructionOffset, ...)` consumer share the walk.

Unreachable on current x86 codegen (all post-funclet x86 methods use
EBP frames, see dotnet#115957), but the bug was real and would have produced
wrong addresses on any future ESP-frame methods.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
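
The offset adjustment in the commit above can be sketched as below, assuming a simplified model in which the pushed-arg depth at the queried instruction is already known. `SlotLocation` and `ResolveSlot` are hypothetical names, not the cDAC API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result: an offset plus the base it is relative to.
struct SlotLocation {
    bool    frameRelative;  // true: relative to EBP, false: relative to ESP
    int32_t offset;
};

// For ESP frames the encoded stkOffs is relative to argBase = ESP + pushedSize,
// so the SP-relative offset is pushedSize + stkOffs. EBP-frame slots are
// already frame-relative and are returned unchanged.
SlotLocation ResolveSlot(bool ebpFrame, int32_t stkOffs,
                         uint32_t pushedSlots, uint32_t pointerSize) {
    if (ebpFrame)
        return {true, stkOffs};
    int32_t pushedSize = static_cast<int32_t>(pushedSlots * pointerSize);
    return {false, pushedSize + stkOffs};
}
```

For example, with two 4-byte arguments pushed, an encoded `stkOffs` of 8 resolves to `ESP + 16`.
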
Copilot AI review requested due to automatic review settings June 19, 2026 17:14

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 5 comments.

Comment on lines +79 to +84
/// <summary>
/// The frame variable lifetime (VarPtr) table, per-offset-range tracked GC variables.
/// Decoded lazily on first access. Empty for non-EBP frames (only EBP frames track variables this way).
/// </summary>
internal ImmutableArray<VarPtrLifetime> VarPtrLifetimes => _varPtrLifetimes.Value;
private readonly Lazy<ImmutableArray<VarPtrLifetime>> _varPtrLifetimes;

Comment on lines +531 to +535
// (2) VarPtr-tracked frame locals -- live when the lifetime-check offset is within [Begin, End).
// On non-active frames EnumGcRefsX86 evaluates lifetimes at curOffs-1: a variable can be dead
// at the return address (call was last instruction of a try, return jumps to a catch handler).
// Only EBP frames produce entries; the table is empty for ESP frames.
{
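
The lifetime check described in the comment above can be sketched in C++ as follows. `VarPtrLifetime` here is a hypothetical plain struct and `IsLive` a hypothetical helper; the guard for `curOffs == 0` is an assumption added to keep the sketch free of unsigned underflow.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical lifetime record: the variable is live in [begin, end).
struct VarPtrLifetime {
    uint32_t begin;
    uint32_t end;
    int32_t  frameOffset;  // EBP-relative location of the variable
};

// On non-active frames the offset is a return address, so the check is made
// one byte earlier, inside the call instruction that is still executing.
bool IsLive(const VarPtrLifetime& lifetime, uint32_t curOffs, bool isActiveFrame) {
    uint32_t checkOffs = curOffs;
    if (!isActiveFrame && curOffs > 0)  // curOffs > 0 guard is an assumption
        checkOffs = curOffs - 1;
    return checkOffs >= lifetime.begin && checkOffs < lifetime.end;
}
```

A variable whose lifetime ends exactly at the return address is therefore still reported for a caller frame, but not for the active frame at the same offset.
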

Comment on lines +450 to +465
// Body minus prolog minus all epilogs. Epilogs are stored as code offsets
// (start of each epilog); each spans `EpilogSize` bytes.
uint cursor = Header.PrologSize;
uint methodSize = MethodSize;
List<InterruptibleRange> ranges = [];
foreach (int epilogStart in Header.Epilogs.OrderBy(e => e))
{
uint eStart = (uint)epilogStart;
uint eEnd = eStart + Header.EpilogSize;
if (eStart > cursor)
ranges.Add(new InterruptibleRange(cursor, eStart));
cursor = Math.Max(cursor, eEnd);
}
if (cursor < methodSize)
ranges.Add(new InterruptibleRange(cursor, methodSize));
return ranges;

Comment on lines +224 to 230
private uint CalculatePushedArgSizeAt(uint codeOffset)
{
int depth = 0;
foreach (int offset in Transitions.Keys.OrderBy(i => i))
{
-           if (offset > RelativeOffset)
+           if (offset > codeOffset)
break; // calculate only to current offset

Comment on lines 264 to 268
}
}
}

return (uint)(depth * _target.PointerSize);

Member

Eh, I'm not sure this is necessary.

@steveisok steveisok self-requested a review June 19, 2026 19:31
Native DynamicHelperFrame::GcScanRoots_Impl (frames.cpp) uses
`offsetof(ArgumentRegisters, ECX)` for ObjectArg and
`offsetof(ArgumentRegisters, EDX)` for ObjectArg2 on x86, where the
ArgumentRegisters struct is declared in REVERSED calling-convention
order via ENUM_ARGUMENT_REGISTERS_BACKWARD (cgencpu.h: { EDX, ECX } so
EDX is at offset 0 and ECX at offset +PointerSize). On every other
architecture the struct lays the registers out in forward call order
(first arg at offset 0, second at offset +PointerSize), and the
non-x86 native path adds `sizeof(TADDR)` for ObjectArg2.

cDAC's ScanDynamicHelperFrame assumed the uniform non-x86 layout, so on
x86 it reported ObjectArg at EDX's location and ObjectArg2 at ECX's
location -- the addresses came out 4 bytes off from the runtime, which
surfaced as flaky stress-test mismatches once we added windows_x86 to
cdacStressPlatforms (the previous cDAC stress matrix excluded x86, so
this code path was never exercised against the live runtime).

Detect TARGET_X86 via RuntimeInfo.GetTargetArchitecture and swap the
two offsets to match native semantics. Verified across 5 BasicAlloc
stress runs: 0 mismatches each.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
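
The layout difference in the commit above can be illustrated with a small C++ sketch. `ArgumentRegistersX86` models only the reversed `{ EDX, ECX }` order stated in the commit message, and `ObjectArgOffsets` and `GetObjectArgOffsets` are hypothetical names, not runtime or cDAC APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed x86 layout, per the commit message: EDX first, then ECX.
struct ArgumentRegistersX86 {
    uint32_t EDX;
    uint32_t ECX;
};

// Byte offsets of the two object arguments within ArgumentRegisters.
struct ObjectArgOffsets {
    size_t objectArg;
    size_t objectArg2;
};

ObjectArgOffsets GetObjectArgOffsets(bool isX86, size_t pointerSize) {
    if (isX86) {
        // x86: ObjectArg lives in ECX, ObjectArg2 in EDX (reversed layout).
        return {offsetof(ArgumentRegistersX86, ECX),
                offsetof(ArgumentRegistersX86, EDX)};
    }
    // Other architectures: first argument at offset 0, second one pointer later.
    return {0, pointerSize};
}
```

Assuming the uniform layout on x86 swaps the two results, which matches the 4-byte discrepancy described above.
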