Conversation

@gmarkall
Contributor

- Add support for cache-hinted load and store operations (NVIDIA#587)
- Add more thirdparty tests (NVIDIA#586)
- Add sphinx-lint to pre-commit and fix errors (NVIDIA#597)
- Add DWARF variant part support for polymorphic variables in CUDA debug info (NVIDIA#544)
- chore: clean up dead workaround for unavailable `lru_cache` (NVIDIA#598)
- chore(docs): format types docs (NVIDIA#596)
- refactor: decouple `Context` from `Stream` and `Event` objects (NVIDIA#579)
- Fix freezing in of constant arrays with negative strides (NVIDIA#589)
- Update tests to accept variants of generated PTX (NVIDIA#585)
- refactor: replace device functionality with `cuda.core` APIs (NVIDIA#581)
- Move frontend tests to `cudapy` namespace (NVIDIA#558)
- Generalize the concurrency group for main merges (NVIDIA#582)
- ci: move pre-commit checks to pre commit action (NVIDIA#577)
- chore(pixi): set up doc builds; remove most `build-conda` dependencies (NVIDIA#574)
- ci: ensure that python version in ci matches matrix (NVIDIA#575)
- Fix the `cuda.is_supported_version()` API (NVIDIA#571)
- Fix checks on main (NVIDIA#576)
- feat: add `math.nextafter` (NVIDIA#543)
- ci: replace conda testing with pixi (NVIDIA#554)
- [CI] Run PR workflow on merge to main (NVIDIA#572)
- Propose Alternative Module Path for `ext_types` and Maintain `numba.cuda.types.bfloat16` Import API (NVIDIA#569)
- test: enable fail-on-warn and clean up resulting failures (NVIDIA#529)
- [Refactor][NFC] Vendor-in compiler_lock for future CUDA-specific changes (NVIDIA#565)
- Fix registration with Numba, vendor MakeFunctionToJITFunction tests (NVIDIA#566)
- [Refactor][NFC][Cleanups] Update imports to upstream numba to use the numba.cuda modules (NVIDIA#561)
- test: refactor process-based tests to use concurrent futures in order to simplify tests (NVIDIA#550)
- test: revert back to ipc futures that await each iteration (NVIDIA#564)
- chore(deps): move to self-contained pixi.toml to avoid mixed-pypi-pixi environments (NVIDIA#551)
- [Refactor][NFC] Vendor-in errors for future CUDA-specific changes (NVIDIA#534)
- Remove dependencies on target_extension for CUDA target (NVIDIA#555)
- Relax the pinning to `cuda-core` to allow it floating across minor releases (NVIDIA#559)
- [WIP] Port numpy reduction tests to CUDA (NVIDIA#523)
- ci: add timeout to avoid blocking the job queue (NVIDIA#556)
- Handle `cuda.core.Stream` in driver operations (NVIDIA#401)
- feat: add support for `math.exp2` (NVIDIA#541)
- Vendor in types and datamodel for CUDA-specific changes (NVIDIA#533)
- refactor: cleanup device constructor (NVIDIA#548)
- bench: add cupy to array constructor kernel launch benchmarks (NVIDIA#547)
- perf: cache dimension computations (NVIDIA#542)
- perf: remove duplicated size computation (NVIDIA#537)
- chore(perf): add torch to benchmark (NVIDIA#539)
- test: speed up ipc tests by ~6.5x (NVIDIA#527)
- perf: speed up kernel launch (NVIDIA#510)
- perf: remove context threading in various pointer abstractions (NVIDIA#536)
- perf: reduce the number of `__cuda_array_interface__` accesses (NVIDIA#538)
- refactor: remove unnecessary custom map and set implementations (NVIDIA#530)
- [Refactor][NFC] Vendor-in vectorize decorators for future CUDA-specific changes (NVIDIA#513)
- test: add benchmarks for kernel launch for reproducibility (NVIDIA#528)
- test(pixi): update pixi testing command to work with the new `testing` directory (NVIDIA#522)
- refactor: fully remove `USE_NV_BINDING` (NVIDIA#525)
- Draft: Vendor in the IR module (NVIDIA#439)
- pyproject.toml: add search path for Pyrefly (NVIDIA#524)
- Vendor in numba.core.typing for CUDA-specific changes (NVIDIA#473)
- Use numba.config when available, otherwise use numba.cuda.config (NVIDIA#497)
- [MNT] Drop NUMBA_CUDA_USE_NVIDIA_BINDING; always use cuda.core and cuda.bindings as fallback (NVIDIA#479)
- Vendor in dispatcher, entrypoints, pretty_annotate for CUDA-specific changes (NVIDIA#502)
- build: allow parallelization of nvcc testing builds (NVIDIA#521)
- chore(dev-deps): add pixi (NVIDIA#505)
- Vendor the imputils module for CUDA refactoring (NVIDIA#448)
- Don't use `MemoryLeakMixin` for tests that don't use NRT (NVIDIA#519)
- Switch back to stable cuDF release in thirdparty tests (NVIDIA#518)
- Updating .gitignore with binaries in the `testing` folder (NVIDIA#516)
- Remove some unnecessary uses of ContextResettingTestCase (NVIDIA#507)
- Vendor in _helperlib cext for CUDA-specific changes (NVIDIA#512)
- Vendor in typeconv for future CUDA-specific changes (NVIDIA#499)
- [Refactor][NFC] Vendor-in numba.cpython modules for future CUDA-specific changes (NVIDIA#493)
- [Refactor][NFC] Vendor-in numba.np modules for future CUDA-specific changes (NVIDIA#494)
- Make the CUDA target the default for CUDA overload decorators (NVIDIA#511)
- Remove C extension loading hacks (NVIDIA#506)
- Ensure NUMBA can manipulate memory from CUDA graphs before the graph is launched (NVIDIA#437)
- [Refactor][NFC] Vendor-in core Numba analysis utils for CUDA-specific changes (NVIDIA#433)
- Fix Bf16 Test OB Error (NVIDIA#509)
- Vendor in components from numba.core.runtime for CUDA-specific changes (NVIDIA#498)
- [Refactor] Vendor in _dispatcher, _devicearray, mviewbuf C extension for CUDA-specific customization (NVIDIA#373)
- [MNT] Managed UM memset fallback and skip CUDA IPC tests on WSL2 (NVIDIA#488)
- Improve debug value range coverage (NVIDIA#461)
- Add `compile_all` API (NVIDIA#484)
- Vendor in core.registry for CUDA-specific changes (NVIDIA#485)
- [Refactor][NFC] Vendor in numba.misc for CUDA-specific changes (NVIDIA#457)
- Vendor in optional, boxing for CUDA-specific changes, fix dangling imports (NVIDIA#476)
- [test] Remove dependency on cpu_target (NVIDIA#490)
- Change dangling imports of numba.core.lowering to numba.cuda.lowering (NVIDIA#475)
- [test] Use numpy's tolerance for float16 (NVIDIA#491)
- [Refactor][NFC] Vendor-in numba.extending for future CUDA-specific changes (NVIDIA#466)
- [Refactor][NFC] Vendor-in more cpython registries for future CUDA-specific changes (NVIDIA#478)
@copy-pr-bot

copy-pr-bot bot commented Nov 20, 2025

Auto-sync is disabled for ready-for-review pull requests in this repository. Workflows must be run manually.
@gmarkall
Contributor Author

/ok to test

@greptile-apps

greptile-apps bot commented Nov 20, 2025

Greptile Overview

Greptile Summary

This PR bumps the package version from 0.20.0 to 0.21.0, marking a new minor release. The change is a single-line update to the numba_cuda/VERSION file, which is the single source of truth for versioning in this project.

The version is dynamically read by the build system through numba_cuda/_version.py using importlib.resources, and referenced in both setup.py and pyproject.toml. This ensures consistent version information across the package.
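As a rough illustration of the dynamic-versioning pattern described above, the sketch below reads a plain-text VERSION file from inside a package via `importlib.resources`. The package name `demo_pkg` and the helper `read_version` are illustrative stand-ins, not the actual numba_cuda code; here a throwaway package is created on the fly so the snippet is self-contained.

```python
import pathlib
import sys
import tempfile
from importlib.resources import files

# Build a throwaway package with a VERSION data file to mimic the layout
# (numba_cuda keeps its version in numba_cuda/VERSION in the same way).
tmp = tempfile.mkdtemp()
pkg = pathlib.Path(tmp) / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "VERSION").write_text("0.21.0\n")
sys.path.insert(0, tmp)


def read_version(package: str) -> str:
    # files() returns a Traversable rooted at the package's directory;
    # read_text() loads the VERSION file's contents as a string.
    return files(package).joinpath("VERSION").read_text().strip()


print(read_version("demo_pkg"))  # 0.21.0
```

Because the build system reads the file at build time, bumping the release is a one-line change to VERSION with no other edits required.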

According to the PR description, this release consolidates numerous features and improvements including:

  • Cache-hinted load/store operations
  • Enhanced third-party library testing
  • DWARF debug info improvements
  • Performance optimizations for kernel launches
  • Extensive refactoring and vendoring of Numba core modules
  • CI/CD improvements

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • Single-file version string update from 0.20.0 to 0.21.0 following semantic versioning. No logic changes, no potential for runtime errors. Version is properly read dynamically by build system with no hardcoded references elsewhere in codebase.
  • No files require special attention

Important Files Changed

File Analysis

Filename Score Overview
numba_cuda/VERSION 5/5 Version bumped from 0.20.0 to 0.21.0 for new release

Sequence Diagram

```mermaid
sequenceDiagram
    participant Dev as Developer
    participant PR as PR #602
    participant File as numba_cuda/VERSION
    participant Build as Build System
    participant Package as Package Distribution

    Dev->>File: Update version string
    File-->>File: Change "0.20.0" to "0.21.0"
    Dev->>PR: Commit version bump
    PR->>Build: Trigger build process
    Build->>File: Read VERSION file
    File-->>Build: Return "0.21.0"
    Build->>Package: Set package version
    Note over Build,Package: setup.py and pyproject.toml<br/>read version dynamically<br/>from VERSION file
    Package-->>Package: Create distribution<br/>with version 0.21.0
```

@greptile-apps greptile-apps bot left a comment

1 file reviewed, no comments

Contributor

@ashermancinelli ashermancinelli left a comment

This is a monster of a release, thanks for all your work Graham!

@gmarkall
Contributor Author

@ashermancinelli Thanks for the review!

thanks for all your work Graham!

It's everybody else's (yourself included, of course)! I just review things and push buttons 🙂

@gmarkall gmarkall merged commit d08e8a9 into NVIDIA:main Nov 20, 2025
71 checks passed
@rparolin
Contributor

Great to see! Congrats to the team for all their hard work.
@gmarkall @ashermancinelli @atmnp @VijayKandiah

@isVoid
Contributor

isVoid commented Nov 20, 2025

Thanks! I'll start updating Numbast's dependency on Numba-CUDA.
NVIDIA/numbast#239

@gmarkall
Contributor Author

Thanks! I'll start updating Numbast's dependency on Numba-CUDA.

You might want to wait until the wheels publishing is fixed - I'm looking at this today.

4 participants