Releases: NVIDIA/KAI-Scheduler
v0.6.2
What's Changed
- Fix typo, formatting and grammar tweaks in MPS docs by @jacobtomlinson in #240
- ignore EINVAL when flushing logs by @slaupster in #238
New Contributors
- @jacobtomlinson made their first contribution in #240
v0.6.0 introduces two breaking changes. To migrate from a previous release, follow the instructions here:
https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/migrationguides
Full Changelog: v0.6.1...v0.6.2
v0.6.1
What's Changed
- Updated changelog to reflect v0.6.0 changes by @romanbaron in #241
- Changed queue-label-key default to kai.scheduler/queue in Queue Contr… by @romanbaron in #243
v0.6.0 introduces two breaking changes. To migrate from a previous release, follow the instructions here:
https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/migrationguides
Full Changelog: v0.6.0...v0.6.1
v0.6.0
What's Changed
- docs: improve snapshot tool doc by @enoodle in #157
- [Refactor] Reclaimable api improvements by @itsomri in #148
- add pr coverage report by @enoodle in #154
- scheduler: Add LastStartTimestamp to PodGroup by @ArmedGuy in #153
- Add min-runtime configuration to queues by @ArmedGuy in #155
- chore: coverage update will open pr by @enoodle in #159
- chore: add missing token in action by @enoodle in #160
- Document GPU Sharing with MPS by @omer-dayan in #158
- fix coverage pr reports and badge generation by @enoodle in #166
- fix update coverage badge by @enoodle in #171
- Run scenario filters on the no potential victims scenario by @davidLif in #164
- Update PodGrouper docs to match the latest implementation by @romanbaron in #177
- Prep changelog for v0.5 version branch by @itsomri in #174
- refactor out reclaimerinfo from API by @ArmedGuy in #172
- Update README.md by @ronendar in #175
- Pre creating binding request, delete any pending status updates for t… by @davidLif in #178
- Don't add nodepool label for empty nodepool by @itsomri in #176
- Scheduler and PodGrouper use configurable nodepool label key by @romanbaron in #179
- Update CONTRIBUTING.md with coverage suggestion by @enoodle in #180
- Updating README.md with biweekly meeting details by @EkinKarabulut in #181
- Use more peek and Fix for the implementation of popNextJob instead of… by @davidLif in #152
- Renamed internally used runai names by @romanbaron in #189
- [Refactor] Encapsulate reclaimer info in proportion plugin by @itsomri in #192
- deploy with snapshot plugin enabled by @enoodle in #203
- Keep updated pod-groups data in a separate syncmap to allow better cleanup by @davidLif in #199
- Changed PodGroup comparison and removed notToUpdateAnnotations by @romanbaron in #202
- binder cdi flag added by @christophemacabiau in #209
- remove redundant replicas value for binder by @slaupster in #200
- Changing Slack channel link to the kai-scheduler channel by @EkinKarabulut in #211
- Fix pod group status sync unitests by @davidLif in #212
- Roman/podgroup controller by @romanbaron in #215
- Removed runai-job-id and runai/job-id annotations from pods and p… by @romanbaron in #206
- Easy runai name renames by @romanbaron in #218
- Scheduler status update unitest improvement by @davidLif in #220
- Refactor scenario validators by @itsomri in #191
- Default priority class per workload type, read from configmap by @natasharomm in #216
- Made node role label keys configurable by @romanbaron in #217
- Add "local build" mode to running the e2e over kind by @davidLif in #219
- minruntime Plugin by @ArmedGuy in #162
- add queue controller by @enoodle in #214
- Do not create error events for successfully scheduled podGroups by @davidLif in #229
- add queue controller tests by @enoodle in #231
- Allow the pod-grouper ray plugin to generate a pod-group for 0 workers in a rayCluster by @davidLif in #230
- Fix default_status_updater so Annotation updates are properly applied by @ArmedGuy in #234
- Unitest fix - TestDefaultStatusUpdater_RecordJobStatusEvent by @davidLif in #232
- Fix flaky consolidation e2e test: wait only for undeleted jobs by @itsomri in #235
- Vendor neutrality migration guide by @romanbaron in #224
- propagate scheduler service namespace for leader election in HA by @iris-shain-runai in #236
New Contributors
- @ronendar made their first contribution in #175
- @christophemacabiau made their first contribution in #209
- @slaupster made their first contribution in #200
- @natasharomm made their first contribution in #216
- @iris-shain-runai made their first contribution in #236
⚠️ Migration guides:
This version introduces two breaking changes. To migrate from a previous release, follow the instructions here:
https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/migrationguides
Full Changelog: v0.5.1...v0.6.0
v0.5.5
v0.5.4
v0.4.10
v0.5.3
v0.4.9
What's Changed
- cherry-pick to v0.4: Changed to regular ubuntu docker image in tests (#123) by @enoodle in #188
- cherrypick v0.4 - Pre creating binding request, delete any pending status updates for t… by @davidLif in #186
- Don't add nodepool label for empty nodepool by @itsomri in #183
Full Changelog: v0.4.8...v0.4.9
v0.5.2
What's Changed
- docs: improve snapshot tool doc by @enoodle in #157
- [Refactor] Reclaimable api improvements by @itsomri in #148
- add pr coverage report by @enoodle in #154
- scheduler: Add LastStartTimestamp to PodGroup by @ArmedGuy in #153
- Add min-runtime configuration to queues by @ArmedGuy in #155
- chore: coverage update will open pr by @enoodle in #159
- chore: add missing token in action by @enoodle in #160
- Document GPU Sharing with MPS by @omer-dayan in #158
- fix coverage pr reports and badge generation by @enoodle in #166
- fix update coverage badge by @enoodle in #171
- Run scenario filters on the no potential victims scenario by @davidLif in #164
- Update PodGrouper docs to match the latest implementation by @romanbaron in #177
- Prep changelog for v0.5 version branch by @itsomri in #174
- Don't add nodepool label for empty nodepool by @itsomri in #182
Full Changelog: v0.5.1...v0.5.2
v0.5.1
What's Changed
- Scheduling gates by @itsomri in #122
- [Bugfix] Reclaim victim queue order by @itsomri in #62
- make nodeSelector, affinity and tolerations configurable by @gflatters in #127
- Replace run.ai string with kai.scheduler by @omer-dayan in #131
- Made scheduling queue label key configurable by @romanbaron in #129
- Cache GetDeservedShare and GetFairShare by @davidLif in #139
- refactor(binder): add cache to the resource reservation client by @enoodle in #144
- Remove requirement for Worker when using PyTorchJob by @Phlip79 in #149
- PodGroup info caching for some of the results by @davidLif in #138
- refactor(scheduler): patch pod labels concurrently by @enoodle in #147
- [Design] Minimum runtime before preemptions and reclaims by @ArmedGuy in #126
- scheduler: bugfix: Make pod_scenario_builder build scenarios for rest of elastic job by @ArmedGuy in #132
New Contributors
- @gflatters made their first contribution in #127
- @Phlip79 made their first contribution in #149
Full Changelog: v0.5.0...v0.5.1