Fix RayJob preemption and waitForPodsReady when using MultiKueue #6788
Conversation
/ok-to-test
/cc @mszadkow
Force-pushed from a6f433a to c5b53ea
/retest
/retest
Eric, so if I understand correctly, the issue is with waitForPodsReady enablement on the management cluster? I think this is something we likely didn't take into consideration, so it is likely a bug. Before we dive into the fix, what is the current behaviour: is waitForPodsReady ignored, or does it evict, with the workload ending up in a bad state? Also, is it a problem only if the workload was already admitted to one of the worker clusters, or also when it stays pending on all clusters? However, this seems to go beyond just RayJob, so I would prefer to look for a more generic solution.
@mimowo the problem arises when the manager cluster evicts a workload. When I wrote the PR I thought this could happen with either preemption or waitForPodsReady. However, the problem this PR is intended to fix also arises when the manager preempts a workload, so it's still useful, I think. In this case, the preempted workload gets the eviction condition but never becomes inactive on the manager cluster, so it keeps reserving quota there.
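For context, a minimal sketch (with an assumed timeout value, not taken from this PR) of what enabling waitForPodsReady in the management cluster's Kueue configuration looks like:

```yaml
# Manager-cluster Kueue Configuration (sketch; only the relevant stanza shown).
apiVersion: config.kueue.x-k8s.io/v1beta1
kind: Configuration
waitForPodsReady:
  enable: true
  # Illustrative timeout; when it elapses without the pods becoming ready,
  # Kueue evicts the workload, which is the eviction path discussed above.
  timeout: 10m
```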
What type of PR is this?
/kind bug
What this PR does / why we need it:
It seems like Kueue assumes that when a workload becomes suspended it will shut down and then become inactive. For RayJobs, this means `.spec.suspend` is set to `true`, the KubeRay operator deletes the cluster and sets `.status.jobDeploymentStatus` to `Suspended`, and then Kueue removes the reservation for the workload. However, when using MultiKueue, the KubeRay operator isn't in the loop in the manager cluster. This means the workload never becomes inactive after being preempted (given the definition of IsActive here) and so continues reserving quota in the cluster.

The same thing happens anywhere the workload is evicted in the manager cluster; the cases I'm aware of are preemption and waitForPodsReady, but there may be others.
Special notes for your reviewer:
I didn't add unit tests yet because I'm not very confident in this approach. I just wanted to check whether this is the right approach; if it is, I'll add unit tests.
Does this PR introduce a user-facing change?