
Conversation

@eric-higgins-ai (Contributor) commented Sep 11, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

It seems like Kueue assumes that when a workload becomes suspended it will shut down and then become inactive. For RayJobs, this means .spec.suspend is set to true, the KubeRay operator deletes the cluster and sets .status.jobDeploymentStatus to Suspended, and then Kueue removes the reservation for the workload. However, when using MultiKueue, the KubeRay operator doesn't run in the manager cluster. This means the workload never becomes inactive after being preempted (given the definition of IsActive here) and so continues to reserve quota in the cluster.
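
For illustration, an IsActive-style check along these lines would explain the behaviour. This is a simplified sketch against the KubeRay v1 API types, not Kueue's actual implementation:

```go
package main

import (
	rayv1 "github.com/ray-project/kuberay/ray-operator/apis/ray/v1"
)

// isActive treats a RayJob as active until the KubeRay operator has
// reacted to .spec.suspend and moved .status.jobDeploymentStatus to
// Suspended. On a MultiKueue manager cluster the operator never runs,
// so a preempted RayJob never passes this check and keeps its quota
// reservation.
func isActive(job *rayv1.RayJob) bool {
	return job.Status.JobDeploymentStatus != rayv1.JobDeploymentStatusSuspended
}
```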

The same thing happens anywhere the workload is evicted in the manager cluster; the cases I'm aware of are preemption and waitForPodsReady, but there may be others.

Special notes for your reviewer:

I didn't add unit tests yet because I'm not very confident in this approach. I wanted to check whether this is the right approach first; if it is, I'll add unit tests.

Does this PR introduce a user-facing change?

Fixed preemption and waitForPodsReady for RayJobs when using MultiKueue.

@k8s-ci-robot added the release-note, kind/bug, and cncf-cla: yes labels on Sep 11, 2025
@k8s-ci-robot commented:

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: eric-higgins-ai
Once this PR has been reviewed and has the lgtm label, please assign gabesaba for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot added the needs-ok-to-test label on Sep 11, 2025
@k8s-ci-robot commented:

Hi @eric-higgins-ai. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot added the size/S label on Sep 11, 2025
netlify bot commented Sep 11, 2025

Deploy Preview for kubernetes-sigs-kueue canceled.

🔨 Latest commit: 4409273
🔍 Latest deploy log: https://app.netlify.com/projects/kubernetes-sigs-kueue/deploys/68c2573ad4d7240008044400

@mbobrovskyi (Contributor) commented:

/ok-to-test

@k8s-ci-robot added the ok-to-test label and removed the needs-ok-to-test label on Sep 11, 2025
@mbobrovskyi (Contributor) commented:

/cc @mszadkow

@eric-higgins-ai force-pushed the fix-multikueue-manager-preemption branch from a6f433a to c5b53ea on September 11, 2025 02:21
@eric-higgins-ai (Contributor, Author) commented:

/retest

@eric-higgins-ai (Contributor, Author) commented:

/retest

@mimowo (Contributor) commented Sep 11, 2025

Eric, so if I understand it correctly, the issue is with waitForPodsReady enablement on the management cluster? I think this is something we didn't take into consideration, so it's likely a bug.

Before we dive into the fix, what is the current behaviour: is waitForPodsReady ignored, or does it evict, with the workload ending up in a bad state?

Also, is it a problem only when the workload was already admitted to one of the worker clusters, or also when it stays pending on all clusters?

However, this seems to go beyond just RayJob, so I would prefer to look for a more generic solution.

@eric-higgins-ai (Contributor, Author) commented:

@mimowo the problem arises when the manager cluster evicts a workload. When I wrote the PR I thought this could happen with either preemption or waitForPodsReady (there are probably other cases, but these are the ones we hit in our usage of Kueue), but I just realized it's not actually correct to specify waitForPodsReady on the manager cluster. It's possible that the manager cluster admits a workload and dispatches it to a worker cluster, but the worker doesn't have sufficient quota, so the workload remains in the queue there. If waitForPodsReady is set on the manager, the workload will be returned to the queue after the timeout, which is undesirable for us.

However, the problem this PR is intended to fix also arises when the manager preempts a workload, so I think it's still useful. In this case, the preempted workload gets the Preempted and Evicted conditions, and .spec.suspend gets set on the RayJob. It gets stuck at this point, though: QuotaReserved stays true, the workload is never deleted on the worker cluster, and the "preemptor" workload never gets admitted.
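
To make the stuck state concrete, a check like the sketch below could detect it. This is a hypothetical illustration using the generic apimachinery condition helpers, with the condition type strings named above written as literals (in Kueue they are constants in the API package):

```go
package main

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// isStuckAfterManagerEviction reports whether a workload is in the state
// described above: the manager has marked it Evicted (here, via preemption),
// but QuotaReserved is still true because nothing on the manager cluster
// ever transitions the suspended RayJob to an inactive state.
func isStuckAfterManagerEviction(conds []metav1.Condition) bool {
	return meta.IsStatusConditionTrue(conds, "Evicted") &&
		meta.IsStatusConditionTrue(conds, "QuotaReserved")
}
```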
