ILGPU V2.0: Faster ILGPUC Compile Times + NuGet Packaging #1585
Open
m4rs-mt wants to merge 14 commits into
The frontend dominates compile time at ~95 ms / iteration on a
trivial `a[i] = b[i] * c[i]` kernel (~90 % of total). Breakdown:
DisassembleMethods walk : ~66 ms wall (1720 methods, only 9 ms is real disassembly; rest is lock contention)
LoadDebugSymbols (PDB)  : ~28 ms wall (5 streams opened + parsed every iter)
Sequence-point attach   : 0.05 ms
Both disassembly and PDB load are backend-independent and identical
across every kernel compiled by one KernelCompiler instance. None of
that work was being shared.
Co-Authored-By: Claude <noreply@anthropic.com>
c7098ae to e1c1c51
e1c1c51 to c65779e
A follow-on to the V2.0 stack focused on two largely-independent concerns: (1) cutting frontend compile time on the new AOT pipeline and (2) shipping ILGPUC as a real NuGet package with an end-to-end test harness that proves it.
A note on authorship
This PR was done with AI-assistant pair programming (Claude via Claude Code). Every real commit carries a `Co-Authored-By: Claude` trailer.
What's in this PR
1. Compile-time performance (`Src/ILGPUC/Frontend/`)
Profiling a trivial `a[i] = b[i] * c[i]` kernel showed the frontend dominating compile time at ~95 ms / iteration (~90% of total). Breakdown:
- `DisassembleMethods` walk: ~66 ms wall (1720 methods, only ~9 ms of which was real disassembly; the rest was lock contention).
- `LoadDebugSymbols` (PDB): ~28 ms wall (5 streams opened + parsed every iter).
Both are backend-independent and identical across every kernel compiled by one `KernelCompiler` instance, but none of that work was being shared. Two commits address this:
- Cached disassembly + PDB load across kernel compiles: a new `ILFrontendCache` shares disassembled methods and PDB streams across kernel compiles within a single `KernelCompiler`.
- Lazy frontend disassembly with codegen-time intrinsic safety net: replaces the eager whole-assembly walk with on-demand disassembly. A codegen-time safety net makes sure intrinsic-bound calls are still resolved correctly even when the surrounding method was never disassembled.
`KernelLibraryAttribute` (`Src/ILGPU/KernelLibraryAttribute.cs`) lets library assemblies opt in to having their kernel-relevant methods discovered without forcing a full assembly walk in user code.
2. Compile-time perf regression coverage (`Src/ILGPUC.Tests/PerfTests/`, `Src/ILGPUC.Tests/IRTests/`)
A new test layer keeps the wins above from regressing:
- `PerfTests/PerfTestBase.cs`, `PerfTests/CompilePerfRegressionTests.cs`: perf budget assertions over representative kernels.
- `PerfTests/CompileBenchFacts.cs`: checked-in profiling fact (xUnit `[Fact]`) so the breakdown above can be re-measured on demand.
- `Kernels/PerfRegressionKernels.cs`: the kernel zoo the perf tests compile against.
- `IRTests/DeepCallStackIRTests.cs` + `Kernels/DeepCallStackKernels.cs`: exercise the lazy walk with deep call graphs and add depth + negative assertions so a future regression can't quietly fall back to eager disassembly.
- `Framework/CompilationTestBase.cs`, `Framework/MsBuildRunner.cs`: small framework additions to support the new layers.
3. NuGet packaging for ILGPUC (`Src/ILGPUC/`, `Src/scripts/pack-ilgpuc.sh`)
ILGPUC is now packaged as a NuGet with R2R-compiled native binaries:
- `Src/ILGPUC/ILGPUC.csproj`: packs the AOT-built compiler binaries per RID and wires up MSBuild integration props/targets.
- `Src/ILGPUC/PACKAGE.md`: the README that ships in the package.
- `Src/ILGPUC/build/ILGPU.Kernels.targets`: consumer-side MSBuild integration so referencing the package is enough to drive kernel compilation.
- `Src/scripts/pack-ilgpuc.sh`: pack script that runs the R2R build + nupkg assembly across RIDs.
`ILGPU` is published as a transitive NuGet dependency of `ILGPUC`, so a consumer only references one package.
4. End-to-end NuGet consumer harness (`EndToEndTest/`, `Samples/LocalNuGetConsumer/`)
The previous `Src/ILGPUC.Tests/IntegrationTests/NuGetIntegrationTests.cs` (and its `NuGetHello` template scaffolding) was a transient, in-process test. It's replaced by a persistent, on-disk consumer project that exercises the real toolchain end-to-end:
- `EndToEndTest/HelloKernel/`: a standalone consumer project (`.csproj` + `Program.cs`) that pulls the packed NuGets, compiles a kernel, and runs it.
- `EndToEndTest/run.sh`: driver script: pack ILGPUC locally, restore against the local feed, build, run.
- `EndToEndTest/NuGet.config.template`, `EndToEndTest/README.md`: template feed config + docs.
- `Samples/LocalNuGetConsumer/`: a user-facing sample mirroring the same flow (with its own `pack-local.sh` and README), so consumers can copy the pattern.
5. `KernelLibrary` sample (`Samples/KernelLibraryAttribute/`)
A new sample showing how to ship a kernel-helper library decorated with `[KernelLibrary]` and consume it from a downstream project (`MyKernelLib/` + `Consumer/`).
6. CI (`.github/workflows/ci.yml`)
- `e2e-test` job that runs `EndToEndTest/run.sh` against the freshly packed NuGets.
- […] `e2e-test` succeeding, so a packaging regression fails fast before fanning out across GPU runners.
Scope boundary
- […] `ILGPUC.csproj` packaging picks up their existing R2R outputs.
- […] `KernelLibraryAttribute` and the `EndToEndTest/` + `Samples/LocalNuGetConsumer/` directories.
- `NuGetIntegrationTests.cs` and its `NuGetHello` templates are deleted outright; their coverage moves to `EndToEndTest/`.
Depends on
PR #1584 (and the rest of the V2.0 stack: #1355, #1576, #1577, #1578, #1579, #1580).
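For readers who want a concrete picture of the caching and lazy-disassembly idea from item 1 of the description, here is a minimal sketch. It is not the `ILFrontendCache` implementation from this PR: `FrontendCacheSketch`, `DisassembledMethod`, the string method keys, and the disassembly callback are all hypothetical names chosen for illustration. It only shows the general pattern of disassembling a method on first request and sharing the result across later kernel compiles that go through the same cache instance.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for a disassembled method body; not an ILGPU type.
public sealed record DisassembledMethod(string Name, int InstructionCount);

// Hypothetical sketch: one instance would be shared by every kernel compile
// that runs through the same compiler object.
public sealed class FrontendCacheSketch
{
    private readonly ConcurrentDictionary<string, Lazy<DisassembledMethod>> _methods = new();
    private readonly Func<string, DisassembledMethod> _disassemble;
    private int _disassemblyCount;

    public FrontendCacheSketch(Func<string, DisassembledMethod> disassemble) =>
        _disassemble = disassemble;

    // Number of times the expensive disassembly callback actually ran.
    public int DisassemblyCount => Volatile.Read(ref _disassemblyCount);

    // Disassembles a method only on first request; later requests reuse the
    // cached result. Lazy<T> ensures the callback runs once per key even if
    // several threads ask for the same method at the same time.
    public DisassembledMethod GetOrDisassemble(string methodKey) =>
        _methods.GetOrAdd(
            methodKey,
            key => new Lazy<DisassembledMethod>(() =>
            {
                Interlocked.Increment(ref _disassemblyCount);
                return _disassemble(key);
            })).Value;
}

public static class Program
{
    public static void Main()
    {
        // Fake disassembler, used only to make the sketch runnable.
        var cache = new FrontendCacheSketch(
            name => new DisassembledMethod(name, name.Length));

        // Two "kernel compiles" that both reach the same helper method.
        cache.GetOrDisassemble("Helpers.Multiply");
        cache.GetOrDisassemble("Helpers.Multiply");
        cache.GetOrDisassemble("Helpers.Add");

        Console.WriteLine(cache.DisassemblyCount); // prints 2
    }
}
```

Methods that no kernel ever reaches are never passed to the callback, which is the point of replacing an eager whole-assembly walk with on-demand lookup. Storing `Lazy<T>` values rather than results directly is one way to keep concurrent callers from repeating the same disassembly.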
Known limitations