@enoodle (Collaborator) commented Aug 25, 2025

No description provided.

eveningcafe and others added 5 commits August 25, 2025 09:31
* fix: GPU memory calculation using incorrect unit conversion

Fixes #416

The GPU memory validation was incorrectly converting MiB to MB and back,
causing the scheduler to use inflated memory values (48300 MB instead of
46068 MiB) when checking shared GPU allocations. This allowed pods to be
scheduled beyond the actual GPU memory capacity.

Changes:
- Use original MiB value from nvidia.com/gpu.memory node label directly
- Remove unnecessary MiB->MB->MiB conversion that caused precision loss
- Maintain the flooring logic for memory alignment but in MiB units
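The inflation described above can be reproduced with a minimal Go sketch. This is not the scheduler's code: `mibToMB`, `floorTo`, and the flooring step of 100 are hypothetical names and values chosen for illustration. Only the 46068 MiB label value and the 48300 figure come from the commit message.

```go
package main

import "fmt"

// mibToMB converts mebibytes (2^20 bytes) to decimal megabytes (10^6 bytes),
// truncating any fractional megabyte. Hypothetical helper, not the removed
// convertMibToMb function.
func mibToMB(mib int64) int64 {
	return mib * 1024 * 1024 / (1000 * 1000)
}

// floorTo rounds value down to the nearest multiple of step.
// The step of 100 used in main is an assumption made for this sketch.
func floorTo(value, step int64) int64 {
	return value - value%step
}

func main() {
	// Value of the nvidia.com/gpu.memory node label, in MiB.
	const labelMiB int64 = 46068

	// Buggy path: the value is converted to MB, then treated as MiB later on.
	inflated := floorTo(mibToMB(labelMiB), 100)

	// Fixed path: the label value stays in MiB from start to finish.
	correct := floorTo(labelMiB, 100)

	fmt.Println("inflated:", inflated, "correct:", correct)
}
```

Under these assumptions the buggy path yields 48300, the inflated figure quoted in the commit message, while the MiB-only path never exceeds the 46068 MiB the node actually reports.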

* Remove unused convertMibToMb function and explanatory comments

* Remove unused resource import after convertMibToMb cleanup

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* test: update GPU memory unit conversion test expectations

Fix test expectations after removing MiB to MB conversion.
Tests now correctly expect 4000 MiB instead of 4200 MB.

Fixes #416

* Format code with go fmt

---------

Co-authored-by: Ngô Quang Hòa <[email protected]>
Co-authored-by: Claude <[email protected]>
* fix mig scheduling issue

* clean up the code

---------

Co-authored-by: bmao <[email protected]>
@enoodle force-pushed the erez/v0.4/cherry-pick-mig-and-momory-fixes branch from d9f78f2 to 47df9b7 on August 26, 2025 10:16
@enoodle merged commit dacb465 into v0.4 on Aug 26, 2025
3 checks passed
@enoodle deleted the erez/v0.4/cherry-pick-mig-and-momory-fixes branch on August 26, 2025 10:53
5 participants