fix: retain terminating pod in cache to prevent premature eviction #1719
maishivamhoo123 wants to merge 2 commits into Project-HAMi:master
Signed-off-by: maishivamhoo123 <maishivamhoo@gmail.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: maishivamhoo123. It still needs approval from an approver in each of the affected files.
Code Review
This pull request addresses issue #1368 by preventing terminating pods from being prematurely removed from the scheduler's cache. It introduces a new IsPodTerminating utility function and includes tests to verify the behavior. Feedback highlights that while the early return preserves the cache entry, it leaves the cached pod object in a stale state because the DeletionTimestamp is not updated, which may cause issues for logic expecting current pod metadata.
Codecov Report: All modified and coverable lines are covered by tests.
Signed-off-by: maishivamhoo123 <maishivamhoo@gmail.com>
@archlitchi @Shouren @FouoF and @team, could you please review this PR?
@maishivamhoo123 could you tell me how this PR fixes issue #1368? That issue seems to be about a pod in the pending state not being retried for scheduling until around 5 minutes later, even after another pod has terminated and released some resources.
@archlitchi @Shouren Thank you for the review. Before this PR After this PR Pod A starts terminating and receives a DeletionTimestamp.
The Fix:
This PR updates the scheduler to retain pods in the device cache while they are in the Terminating state. The cache will now only evict the pod once the Kubelet fully reports it as terminated (reaching a Succeeded or Failed phase).
Fixes #1368
Verification / Testing Performed
make test (Tests for scheduler and util passed).
scheduler_test.go.
make verify.