fix: resolve NVIDIADriver stuck in NotReady on nodeSelector changes with OnDelete strategy #1868
Conversation
…ith OnDelete strategy Signed-off-by: Karthik Vetrivel <[email protected]>
```go
if hash, ok := pod.Labels["controller-revision-hash"]; !ok || hash != dsRevisionHash {
	// Pods have outdated revision - verify they're on nodes matching current nodeSelector
	reqLogger.V(consts.LogLevelInfo).Info("Pods have outdated revision, verifying node placement")
	return s.verifyNodePlacement(ctx, ds, ownedPods, reqLogger)
}
```
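For context, the core of such a placement check is matching a pod's node against the DaemonSet's nodeSelector. A minimal, hypothetical sketch (pure functions and plain maps, standing in for the client-go types the real `verifyNodePlacement` would use; it also ignores taints/tolerations):

```go
package main

import "fmt"

// nodeMatchesSelector reports whether a node's labels satisfy a
// nodeSelector: every selector key must be present with the same value.
// Illustrative helper only, not the operator's actual implementation.
func nodeMatchesSelector(nodeLabels, nodeSelector map[string]string) bool {
	for k, v := range nodeSelector {
		if nodeLabels[k] != v {
			return false
		}
	}
	return true
}

func main() {
	node := map[string]string{"gpu": "true", "zone": "us-east"}
	fmt.Println(nodeMatchesSelector(node, map[string]string{"gpu": "true"}))  // true
	fmt.Println(nodeMatchesSelector(node, map[string]string{"gpu": "false"})) // false
}
```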
Is node placement the actual thing we care about for "ready" status? If so, can we just check node placement regardless of the revision hash on the pods?
I don't think we necessarily need to know if a pod was updated in a level-triggered reconciliation. We just need to periodically check if the final condition is true.
This is a great point. I was originally looking for ways to tell whether a pod is updated or not, but that's not strictly required. I will look into updating this.
I'm wondering if this also applies to the UpdatedNumberScheduled check. I'm thinking the entire status check can be reduced to:

- DesiredNumberScheduled == NumberAvailable, AND
- Pods are placed on the correct nodes (or, more precisely: each node selected by the nodeSelector has a pod scheduled on it)
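The reduced readiness condition proposed above can be sketched as a small predicate. All names here are illustrative, not the operator's actual API:

```go
package main

import "fmt"

// daemonSetReady sketches the proposed reduced readiness check:
// ready iff all desired pods are available AND every node selected
// by the nodeSelector hosts one of the DaemonSet's pods.
// Hypothetical signature; the real check would consume DaemonSet
// status and node/pod objects from the API server.
func daemonSetReady(desired, available int32, selectedNodes []string, podOnNode map[string]bool) bool {
	if desired != available {
		return false
	}
	for _, n := range selectedNodes {
		if !podOnNode[n] {
			return false
		}
	}
	return true
}

func main() {
	nodes := []string{"node-a", "node-b"}
	pods := map[string]bool{"node-a": true, "node-b": true}
	fmt.Println(daemonSetReady(2, 2, nodes, pods)) // true
	fmt.Println(daemonSetReady(2, 1, nodes, pods)) // false: a pod is unavailable
	fmt.Println(daemonSetReady(2, 2, []string{"node-a", "node-c"}, pods)) // false: node-c has no pod
}
```

Note that this predicate is purely level-triggered: it inspects only the current state, never whether a pod "was updated", which is the point of the comment above.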
cc @tariq1890 @cdesiniotis to validate if my assumptions are correct.
One issue with this change: if the driver image spec is updated but the pods are still on the correct nodes, the DaemonSet is marked as Ready, whereas previously it was marked as not ready. The only way around this I can see is to distinguish placement changes (nodeSelector, taint tolerations) from workload changes (image, command/args, env variables). If a placement change was made but pod placement under the new policy is the same as under the prior one, we keep the DaemonSet marked Ready; for all workload changes, we mark the DaemonSet not ready.
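The placement-vs-workload distinction described above could be implemented by comparing the relevant spec fields of the old and new CR. A hypothetical sketch with a pared-down spec (field names are assumptions, not the NVIDIADriver CRD's actual schema):

```go
package main

import (
	"fmt"
	"reflect"
)

// spec is a hypothetical, pared-down driver spec used only to
// illustrate classifying an edit as placement-only vs workload.
type spec struct {
	NodeSelector map[string]string // placement
	Tolerations  []string          // placement
	Image        string            // workload
	Args         []string          // workload
}

// isWorkloadChange reports whether an edit touched workload fields;
// placement-only edits leave image and args untouched, so pods would
// not need recreation on matching nodes.
func isWorkloadChange(oldSpec, newSpec spec) bool {
	return oldSpec.Image != newSpec.Image ||
		!reflect.DeepEqual(oldSpec.Args, newSpec.Args)
}

func main() {
	base := spec{NodeSelector: map[string]string{"gpu": "true"}, Image: "driver:1.0"}

	placementEdit := base
	placementEdit.NodeSelector = map[string]string{"accelerator": "nvidia"}
	fmt.Println(isWorkloadChange(base, placementEdit)) // false

	workloadEdit := base
	workloadEdit.Image = "driver:2.0"
	fmt.Println(isWorkloadChange(base, workloadEdit)) // true
}
```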
I'm closing this PR. After discussion, it seems like this requires a larger change to nodeSelector / NVIDIADriver CR. |
Solves #1661.
Problem
When editing the `nodeSelector` field of an NVIDIADriver CR, the resource enters a permanent NotReady state if the change doesn't result in pod updates (e.g., replacing equivalent labels). This causes infinite reconciliation loops.

Root Cause

The readiness check required `UpdatedNumberScheduled == NumberAvailable`, but with the OnDelete update strategy, pods are never auto-updated even when they are already on the correct nodes.

New Logic Flow Diagram
How It Fixes the Bug
Before (Buggy Behavior)
After (Fixed Behavior)
The fix recognizes that for OnDelete strategy, outdated pod revisions are acceptable if pods are already on nodes matching the current nodeSelector. This indicates the change was nodeSelector-only and doesn't require pod recreation.