Releases: NVIDIA/KAI-Scheduler
v0.4.8
Fixed
- Queue order function now takes into account potential victims, resulting in better reclaim scenarios.
Changed
- Cached the GetDeservedShare and GetFairShare functions in the scheduler's PodGroupInfo to improve performance.
- Added cache to the binder resource reservation client.
- More caching and improvements to the PodGroupInfo class.
- Pod labels are now updated concurrently in the background after a scheduling decision.
v0.5.0
What's Changed
- Fix typo in priority docs by @Sovietaced in #108
- Remove repeated active departments queue creation by @bgedik in #113
- Fix filename for stalegangeviction by @ArmedGuy in #115
- [Bugfix] Prep for queue comparison changes by @itsomri in #114
- Add/amend test logging to use .TestTopologyBasic.Name by @ArmedGuy in #116
- Made resource reservation parameters configurable by @romanbaron in #106
- Add Changelog to track changes by @itsomri in #117
- Changed to regular ubuntu docker image in tests by @romanbaron in #123
- Cluster autoscaler adjustment for GPU sharing pods by @romanbaron in #119
- added contributing, maintainer and owners files by @romanbaron in #74
New Contributors
- @Sovietaced made their first contribution in #108
- @bgedik made their first contribution in #113
- @ArmedGuy made their first contribution in #115
Full Changelog: v0.4.7...v0.5.0
v0.4.7
What's Changed
- fix: snapshot tool cache.Run call by @enoodle in #102
- Docs hotfix: Update and rename pytorch-elasitc.yaml to pytorch-elastic.yaml by @EkinKarabulut in #105
- Adding Issue Templates for bug & feature/enhancement requests by @EkinKarabulut in #103
- fix: gpu resource device count calculation by @enoodle in #107
Full Changelog: v0.4.6...v0.4.7
v0.4.6
What's Changed
- Initialize metrics namespace on scheduler run by @romanbaron in #100
Full Changelog: v0.4.5...v0.4.6
v0.4.5
v0.4.4
What's Changed
- Removed unneeded runai references in the code by @romanbaron in #83
- Bump k8s.io/kubernetes from 1.32.1 to 1.32.4 by @dependabot in #85
- Making scheduler metrics subsystem configurable by @romanbaron in #86
Full Changelog: v0.4.3...v0.4.4
v0.4.3
What's Changed
- Fixed package version expression resolution by @romanbaron in #80
Full Changelog: v0.4.2...v0.4.3
v0.4.2
What's Changed
- Add delayedLauncherCreationPolicy handling to the pod-grouper by @davidLif in #75
- Bump golang.org/x/net from 0.37.0 to 0.38.0 by @dependabot in #76
- docs: specify legacy v1 version of kubeflow-training-operator by @tgasla in #71
- Add a simple DRA example by @guptaNswati in #59
- BUG fix - handle pod grouper top owner climbing RBAC error by @davidLif in #77
- Changed nvcr to ghcr in all workflows, scripts and docs by @romanbaron in #78
New Contributors
- @dependabot made their first contribution in #76
Full Changelog: v0.4.1...v0.4.2
v0.4.1
What's Changed
- Adding support channels to README.md by @EkinKarabulut in #64
- storing CI artifacts on ghcr.io by @romanbaron in #58
- Main- Bug fix - when syncing the inFlight PG status, override only if the t… by @davidLif in #70
- fix(status-updater): memory leak fix - remove unused inFlight PGs by @enoodle in #73
New Contributors
- @EkinKarabulut made their first contribution in #64
Full Changelog: v0.4.0...v0.4.1