ILGPU V2.0: Faster ILGPUC Compile Times + NuGet Packaging by m4rs-mt · Pull Request #1585 · m4rs-mt/ILGPU

m4rs-mt · 2026-04-30T22:52:05Z

A follow-on to the V2.0 stack focused on two largely independent concerns: (1) cutting frontend compile time on the new AOT pipeline and (2) shipping ILGPUC as a real NuGet package with an end-to-end test harness that proves it.

A note on authorship

This PR was done with AI-assistant pair programming (Claude via Claude Code). Every real commit carries a Co-Authored-By: Claude trailer.

What's in this PR

1. Compile-time performance (`Src/ILGPUC/Frontend/`)

Profiling a trivial `a[i] = b[i] * c[i]` kernel showed the frontend dominating compile time at ~95 ms / iteration (~90% of total). Breakdown:

DisassembleMethods walk: ~66 ms wall (1720 methods, only ~9 ms of which was real disassembly — the rest was lock contention).
LoadDebugSymbols (PDB): ~28 ms wall (5 streams opened + parsed every iter).
Sequence-point attach: 0.05 ms.

Both disassembly and PDB loading are backend-independent and identical across every kernel compiled by one KernelCompiler instance, but none of that work was being shared. Two commits address this:

Cached disassembly + PDB load across kernel compiles — a new ILFrontendCache shares disassembled methods and PDB streams across kernel compiles within a single KernelCompiler.
Lazy frontend disassembly with codegen-time intrinsic safety net — replaces the eager whole-assembly walk with on-demand disassembly. A codegen-time safety net makes sure intrinsic-bound calls are still resolved correctly even when the surrounding method was never disassembled.
New KernelLibraryAttribute (Src/ILGPU/KernelLibraryAttribute.cs) lets library assemblies opt in to having their kernel-relevant methods discovered without forcing a full assembly walk in user code.
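The two ideas above (share disassembly results across kernel compiles, and disassemble only what a kernel actually reaches) can be illustrated with a minimal sketch. This is not the actual ILFrontendCache implementation: `FrontendCacheSketch`, `DisassembledMethod`, and the `calleesOf` callback are hypothetical names invented for illustration, and methods are identified by plain strings rather than real metadata handles.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical stand-in for a disassembled method body; not an ILGPU type.
public sealed record DisassembledMethod(string Name, int InstructionCount);

// Illustrative cache (assumption, not ILGPU code): each method is disassembled
// at most once, and only when something asks for it.
public sealed class FrontendCacheSketch
{
    private readonly ConcurrentDictionary<string, Lazy<DisassembledMethod>> cache = new();
    private readonly Func<string, DisassembledMethod> disassemble;
    private int disassemblyCount;

    public FrontendCacheSketch(Func<string, DisassembledMethod> disassemble) =>
        this.disassemble = disassemble;

    // Number of times the underlying disassembler actually ran.
    public int DisassemblyCount => Volatile.Read(ref disassemblyCount);

    // Returns the cached result, or disassembles the method on first request.
    // Lazy<T> ensures concurrent callers share one disassembly per method.
    public DisassembledMethod GetOrDisassemble(string methodName) =>
        cache.GetOrAdd(
            methodName,
            name => new Lazy<DisassembledMethod>(
                () =>
                {
                    Interlocked.Increment(ref disassemblyCount);
                    return disassemble(name);
                },
                LazyThreadSafetyMode.ExecutionAndPublication)).Value;

    // Visits only the methods reachable from the entry point instead of
    // walking every method in the assembly. Returns the number visited.
    public int DisassembleReachable(
        string entryPoint,
        Func<string, IEnumerable<string>> calleesOf)
    {
        var visited = new HashSet<string>();
        var pending = new Stack<string>();
        pending.Push(entryPoint);
        while (pending.Count > 0)
        {
            var current = pending.Pop();
            if (!visited.Add(current))
                continue; // already handled during this walk
            GetOrDisassemble(current);
            foreach (var callee in calleesOf(current))
                pending.Push(callee);
        }
        return visited.Count;
    }
}
```

In this sketch, compiling a second kernel through the same cache instance reuses every method already disassembled for the first, and methods that no kernel reaches are never disassembled at all.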

2. Compile-time perf regression coverage (`Src/ILGPUC.Tests/PerfTests/`, `Src/ILGPUC.Tests/IRTests/`)

A new test layer keeps the wins above from regressing:

PerfTests/PerfTestBase.cs, PerfTests/CompilePerfRegressionTests.cs — perf budget assertions over representative kernels.
PerfTests/CompileBenchFacts.cs — checked-in profiling fact (xUnit [Fact]) so the breakdown above can be re-measured on demand.
Kernels/PerfRegressionKernels.cs — the kernel zoo the perf tests compile against.
IRTests/DeepCallStackIRTests.cs + Kernels/DeepCallStackKernels.cs — exercise the lazy walk with deep call graphs and add depth + negative assertions so a future regression can't quietly fall back to eager disassembly.
Framework/CompilationTestBase.cs, Framework/MsBuildRunner.cs — small framework additions to support the new layers.

3. NuGet packaging for ILGPUC (`Src/ILGPUC/`, `Src/scripts/pack-ilgpuc.sh`)

ILGPUC is now packaged as a NuGet package with R2R-compiled native binaries:

Src/ILGPUC/ILGPUC.csproj — packs the AOT-built compiler binaries per RID and wires up MSBuild integration props/targets.
Src/ILGPUC/PACKAGE.md — the README that ships in the package.
Src/ILGPUC/build/ILGPU.Kernels.targets — consumer-side MSBuild integration so referencing the package is enough to drive kernel compilation.
Src/scripts/pack-ilgpuc.sh — pack script that runs the R2R build + nupkg assembly across RIDs.
ILGPU is published as a transitive NuGet dependency of ILGPUC, so a consumer only references one package.

4. End-to-end NuGet consumer harness (`EndToEndTest/`, `Samples/LocalNuGetConsumer/`)

The previous Src/ILGPUC.Tests/IntegrationTests/NuGetIntegrationTests.cs (and its NuGetHello template scaffolding) was a transient, in-process test. It's replaced by a persistent, on-disk consumer project that exercises the real toolchain end-to-end:

EndToEndTest/HelloKernel/ — a standalone consumer project (.csproj + Program.cs) that pulls the packed NuGets, compiles a kernel, and runs it.
EndToEndTest/run.sh — driver script: pack ILGPUC locally, restore against the local feed, build, run.
EndToEndTest/NuGet.config.template, EndToEndTest/README.md — template feed config + docs.
Samples/LocalNuGetConsumer/ — a user-facing sample mirroring the same flow (with its own pack-local.sh and README), so consumers can copy the pattern.

5. `KernelLibrary` sample (`Samples/KernelLibraryAttribute/`)

A new sample showing how to ship a kernel-helper library decorated with [KernelLibrary] and consume it from a downstream project (MyKernelLib/ + Consumer/).

6. CI (`.github/workflows/ci.yml`)

New e2e-test job that runs EndToEndTest/run.sh against the freshly packed NuGets.
The four GPU compile jobs (CUDA / ROCm / OpenCL / Metal) are now gated on e2e-test succeeding, so a packaging regression fails fast before fanning out across GPU runners.

Scope boundary

No changes to backend code generation behavior — backends are touched only insofar as the ILGPUC.csproj packaging picks up their existing R2R outputs.
No new public surface beyond KernelLibraryAttribute and the EndToEndTest/ + Samples/LocalNuGetConsumer/ directories.
The replaced NuGetIntegrationTests.cs and its NuGetHello templates are deleted outright; their coverage moves to EndToEndTest/.

Depends on

PR #1584 (and the rest of the V2.0 stack: #1355, #1576, #1577, #1578, #1579, #1580).

Known limitations

The R2R native binaries shipped in the NuGet are per-RID; consumers on a RID that isn't packed will fall back to a managed-only path (or fail to restore, depending on the project's RID settings).


m4rs-mt and others added 13 commits April 30, 2026 23:44

PRs #1355, #1576, #1577, #1578, #1579, #1580, #1584.

84344be

Lazy frontend disassembly with codegen-time intrinsic safety net.

a0bb738

Co-Authored-By: Claude <noreply@anthropic.com>

Improved lazy walk with deep-call-stack tests.

6a3ec39

Co-Authored-By: Claude <noreply@anthropic.com>

Strengthened assertions with depth + negative checks.

527d1ff

Co-Authored-By: Claude <noreply@anthropic.com>

Fixed CS0419 cref warning, scope helper kernels to internal.

09773f1

Co-Authored-By: Claude <noreply@anthropic.com>

Added compile-time perf regression test layer.

03ef287

Co-Authored-By: Claude <noreply@anthropic.com>

Added CompileBench profiling fact to ILGPUC.Tests.

b5adbb9

Co-Authored-By: Claude <noreply@anthropic.com>

Added [KernelLibrary] sample.

08dec60

Co-Authored-By: Claude <noreply@anthropic.com>

Added NuGet packaging for ILGPUC with R2R native binaries.

a94a343

Co-Authored-By: Claude <noreply@anthropic.com>

Made ILGPU a transitive NuGet dependency of ILGPUC.

72a52e9

Co-Authored-By: Claude <noreply@anthropic.com>

Replaced NuGetIntegrationTests with persistent EndToEndTest/

94aafaf

Co-Authored-By: Claude <noreply@anthropic.com>

Gated the four GPU compile jobs on a new e2e-test job.

1ea8255

Co-Authored-By: Claude <noreply@anthropic.com>

m4rs-mt added this to the v2.0 milestone Apr 30, 2026

m4rs-mt force-pushed the faster_compile_time branch from c7098ae to e1c1c51 on April 30, 2026 22:54

Added LocalNuGetConsumer sample.

c65779e

Co-Authored-By: Claude <noreply@anthropic.com>

m4rs-mt force-pushed the faster_compile_time branch from e1c1c51 to c65779e on April 30, 2026 23:02

m4rs-mt marked this pull request as ready for review May 1, 2026 10:07

m4rs-mt mentioned this pull request May 3, 2026

ILGPU V2.0: Added GPU execution tests after the compile-only stage #1586

Open

m4rs-mt wants to merge 14 commits into master from faster_compile_time
