* fix: GPU memory calculation using incorrect unit conversion (#417)
* fix: GPU memory calculation using incorrect unit conversion
Fixes #416
The GPU memory validation was incorrectly converting MiB to MB and back,
causing the scheduler to use inflated memory values (48300 MB instead of
46068 MiB) when checking shared GPU allocations. This allowed pods to be
scheduled beyond the actual GPU memory capacity.
Changes:
- Use original MiB value from nvidia.com/gpu.memory node label directly
- Remove unnecessary MiB->MB->MiB conversion that caused precision loss
- Maintain the flooring logic for memory alignment but in MiB units
* Remove unused convertMibToMb function and explanatory comments
* Remove unused resource import after convertMibToMb cleanup
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
* test: update GPU memory unit conversion test expectations
Fix test expectations after removing MiB to MB conversion.
Tests now correctly expect 4000 MiB instead of 4200 MB.
Fixes #416
* Format code with go fmt
---------
Co-authored-by: Ngô Quang Hòa <[email protected]>
Co-authored-by: Claude <[email protected]>
* Fix: incorrect scheduling decision and calculation when using MIG (#422)
* fix mig scheduling issue
* clean up the code
---------
Co-authored-by: bmao <[email protected]>
* update changelog
* fix proportion test
* fix concurrency test in v0.4
---------
Co-authored-by: Hoa Ngo <[email protected]>
Co-authored-by: Ngô Quang Hòa <[email protected]>
Co-authored-by: Claude <[email protected]>
Co-authored-by: Billy Mao <[email protected]>
Co-authored-by: bmao <[email protected]>
- return gpuMemoryInMb - (gpuMemoryInMb % 100), true // Floor the memory count to make sure its divided by 100 so there will not be 2 jobs that get same bytes
+ return gpuMemoryLabelValue - (gpuMemoryLabelValue % 100), true // Floor the memory count to make sure its divided by 100 so there will not be 2 jobs that get same bytes