OCPBUGS-61756: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP #2261

shajmakh · 2025-10-09T12:23:08Z

A paused MCP should prevent creating or updating the corresponding
config map for the specified node group. Until now, the code did not
consider paused MCPs, so it created or updated the config map anyway,
which caused a mismatch between the configuration in the config
map and the one reflected on the NRTs.
In this commit, we modify the kubeletconfig controller to skip
updating RTE config maps that belong to paused MCPs.
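
A minimal sketch of the skip described above, not the actual controller code. `MachineConfigPool` here is a hypothetical, trimmed-down stand-in for the real MCP type, and `shouldSyncConfigMap` is an illustrative helper name that does not exist in the repository.

```go
package main

import "fmt"

// MachineConfigPool is a hypothetical stand-in for the real MCP type;
// only the fields needed for this illustration are present.
type MachineConfigPool struct {
	Name   string
	Paused bool
}

// shouldSyncConfigMap reports whether the RTE config map of the node group
// backed by mcp may be created or updated. A paused MCP means the nodes still
// run the old kubelet configuration, so the config map must stay as it is.
func shouldSyncConfigMap(mcp MachineConfigPool) bool {
	return !mcp.Paused
}

func main() {
	pools := []MachineConfigPool{
		{Name: "worker"},
		{Name: "worker-cnf", Paused: true},
	}
	for _, mcp := range pools {
		if !shouldSyncConfigMap(mcp) {
			fmt.Printf("skipping config map sync for paused MCP %q\n", mcp.Name)
			continue
		}
		fmt.Printf("syncing config map for MCP %q\n", mcp.Name)
	}
}
```

Keeping the check in a single predicate means the reconcile loop has one place to decide whether a node group's config map may be touched.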

openshift-ci · 2025-10-09T12:23:23Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shajmakh

Needs approval from an approver in each of these files:

~~OWNERS~~ [shajmakh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

shajmakh · 2025-10-14T09:30:43Z

/hold

Tal-or

Good change overall.

If the MCP is paused, will the final update to the config map be triggered once the MCP is unpaused?

internal/reconcile/event.go

internal/reconcile/step.go

shajmakh · 2025-11-04T16:17:21Z

/retest

shajmakh · 2025-11-04T16:21:01Z

relies on #2324

shajmakh · 2025-11-05T07:43:44Z

depends on #2426

openshift-ci-robot · 2025-11-05T12:10:14Z

@shajmakh: This pull request references Jira Issue OCPBUGS-61756, which is invalid:

expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

A paused MCP should prevent creating or updating the corresponding
config map for the specified node group. Until now, the code did not
consider paused MCPs, so it created or updated the config map anyway,
which caused a mismatch between the configuration in the config
map and the one reflected on the NRTs.
In this commit, we modify the kubeletconfig controller to skip
updating RTE config maps that belong to paused MCPs.


…itions The NUMAResourcesScheduler object status condition never reports ObservedGeneration in any status conditions, unlike the scheduler object. As with the scheduler change, let's expose it in all the relevant conditions, doing some minimal internal refactoring along the way. Note that we can't test this change using controller tests due to both the unique requirements of the operator and the way we (mis)use the controller tests; we should add checks in (existing?) e2e tests, but we postpone this to a later change. Signed-off-by: Francesco Romani <[email protected]>

The old version of the function updated the condition in place without considering whether it is a base condition or not. The difference between base and non-base conditions is that the status of one base condition affects all the others, so the rest need an update too. The new version splits the conditions into base and non-base ones and updates all conditions accordingly using the upstream SetStatusCondition, which also takes care to avoid noisy updates. ref: https://github.com/kubernetes/apimachinery/blob/master/pkg/api/meta/conditions.go Note that this piece is part of a larger enhancement that will soon be used in the NRO controller to update the conditions there. Signed-off-by: Shereen Haj <[email protected]>
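
The base/non-base split can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this commit: `Condition` and `setCondition` are simplified local stand-ins for `metav1.Condition` and the upstream `SetStatusCondition`, the list of base types is hypothetical, and the rule "setting one base condition to True resets the other base conditions to False" is an assumed reading of how base conditions affect each other.

```go
package main

import "fmt"

// Condition is a simplified stand-in for metav1.Condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
	Reason string
}

// setCondition adds or updates cond and reports whether anything changed,
// so callers can avoid issuing a status update when nothing differs.
func setCondition(conds *[]Condition, cond Condition) bool {
	for i := range *conds {
		if (*conds)[i].Type != cond.Type {
			continue
		}
		if (*conds)[i] == cond {
			return false
		}
		(*conds)[i] = cond
		return true
	}
	*conds = append(*conds, cond)
	return true
}

// baseTypes is a hypothetical list of base condition types.
var baseTypes = []string{"Available", "Progressing", "Degraded"}

func isBase(condType string) bool {
	for _, t := range baseTypes {
		if t == condType {
			return true
		}
	}
	return false
}

// updateConditions applies cond. A base condition set to True also resets the
// other base conditions to False (assumed rule); a non-base condition only
// touches its own entry.
func updateConditions(conds *[]Condition, cond Condition) bool {
	changed := setCondition(conds, cond)
	if !isBase(cond.Type) || cond.Status != "True" {
		return changed
	}
	for _, t := range baseTypes {
		if t == cond.Type {
			continue
		}
		if setCondition(conds, Condition{Type: t, Status: "False"}) {
			changed = true
		}
	}
	return changed
}

func main() {
	var conds []Condition
	fmt.Println(updateConditions(&conds, Condition{Type: "Degraded", Status: "True", Reason: "Failed"}))
	fmt.Println(updateConditions(&conds, Condition{Type: "Degraded", Status: "True", Reason: "Failed"}))
	fmt.Println(len(conds))
}
```

The boolean return value is what lets a caller skip the status update when a reconcile pass produced no real change.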

So far we had only base conditions for the NRO object, but we want to have a more flexible interface to interact with while supporting non-base conditions, just like we have for NRS controller. In this commit: 1. preserve the consistency of updating status conditions for numaresources CRs (operator and scheduler). 2. minimize Status.Update calls on degraded condition updates 3. keep using `conditioninfo` to maintain related commit modifications as much as possible, refactor later (switch to metav1 conditions) Signed-off-by: Shereen Haj <[email protected]>

A paused MCP should prevent updating the corresponding config map for the specified node group. Until now, the code did not consider paused MCPs, which led to creating/updating the config map with the newest kubeletconfig CR updates and caused a mismatch between the configuration in the config map and the one reflected on the NRTs. In this commit, we modify the kubeletconfig controller to handle paused MCPs such that it skips updating existing RTE config maps; for new node groups whose MCP is paused, the controller fetches the old machineConfig (from before the pause) and creates the RTE config map based on the kubeletconfig data decoded from it. Signed-off-by: Shereen Haj <[email protected]>
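
The three cases in this commit message can be summarized as a small decision function. This is only a sketch: `Action`, its values, and `decideAction` are hypothetical names introduced for illustration, and fetching and decoding the old machineConfig is left out entirely.

```go
package main

import "fmt"

// Action is a hypothetical enum naming what the controller does with the
// RTE config map of one node group.
type Action int

const (
	// ActionSync creates or updates the config map from the current kubeletconfig CR.
	ActionSync Action = iota
	// ActionSkip leaves the existing config map untouched.
	ActionSkip
	// ActionCreateFromOld creates the config map from the kubelet configuration
	// decoded from the machineConfig that predates the pause.
	ActionCreateFromOld
)

func (a Action) String() string {
	switch a {
	case ActionSync:
		return "sync"
	case ActionSkip:
		return "skip"
	case ActionCreateFromOld:
		return "create-from-old-machineconfig"
	}
	return "unknown"
}

// decideAction maps the MCP pause state and the presence of the config map
// to the action described in the commit message.
func decideAction(mcpPaused, configMapExists bool) Action {
	switch {
	case !mcpPaused:
		return ActionSync
	case configMapExists:
		return ActionSkip
	default:
		return ActionCreateFromOld
	}
}

func main() {
	fmt.Println(decideAction(false, true))
	fmt.Println(decideAction(true, true))
	fmt.Println(decideAction(true, false))
}
```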

The situation is mainly mitigated in the kubeletconfig controller; however, we have a bug in the NRO controller such that it puts the NRO CR in progressing state because a paused MCP is not in updated condition. This commit skips checking whether a paused MCP is up to date, and introduces a new status condition to report paused MCPs, if any exist. Signed-off-by: Shereen Haj <[email protected]>
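
As a rough illustration of the NRO controller side, the sketch below excludes paused MCPs from the "is everything updated" check and collects their names for reporting. `PoolStatus` and `classifyPools` are hypothetical names, and the way the paused pools are printed is a placeholder for the new status condition, whose actual type and reason are not shown in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// PoolStatus is a hypothetical summary of one MCP as seen by the controller.
type PoolStatus struct {
	Name    string
	Paused  bool
	Updated bool
}

// classifyPools returns the pools that still wait for an update and,
// separately, the paused pools. Paused pools never count as pending,
// so they cannot hold the CR in progressing state.
func classifyPools(pools []PoolStatus) (pending, paused []string) {
	for _, p := range pools {
		switch {
		case p.Paused:
			paused = append(paused, p.Name)
		case !p.Updated:
			pending = append(pending, p.Name)
		}
	}
	return pending, paused
}

func main() {
	pools := []PoolStatus{
		{Name: "worker", Updated: true},
		{Name: "worker-cnf", Paused: true},
	}
	pending, paused := classifyPools(pools)
	fmt.Println("progressing:", len(pending) > 0)
	if len(paused) > 0 {
		// Placeholder for the dedicated status condition.
		fmt.Println("paused MCPs:", strings.Join(paused, ", "))
	}
}
```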

openshift-ci · 2025-11-05T14:32:26Z

@shajmakh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/ci-must-gather-e2e | `e133797` | link | true | `/test ci-must-gather-e2e` |
| ci/prow/ci-e2e-compact | `e133797` | link | false | `/test ci-e2e-compact` |
| ci/prow/ci-e2e | `e133797` | link | true | `/test ci-e2e` |


shajmakh · 2025-11-06T07:51:23Z

pkg/objectstate/rte/machineconfigpool.go


-func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc) {
+func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc, sets.Set[string]) {
+	pausedMCPs := sets.New[string]()


we may want to call this "mcpsToSkip" to include cases where the MCP is empty (solves https://issues.redhat.com/browse/OCPBUGS-52859)
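
A small sketch of what a broader "mcpsToSkip" set could look like. Everything here is an assumption for illustration: `Pool` is a hypothetical type, "empty" is read as an MCP with zero machines, and a plain map stands in for `sets.Set[string]` to keep the example free of external dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// Pool is a hypothetical, trimmed-down view of an MCP.
type Pool struct {
	Name         string
	Paused       bool
	MachineCount int
}

// mcpsToSkip collects the names of pools the caller should not wait on:
// paused pools and (assumed meaning of "empty") pools without machines.
func mcpsToSkip(pools []Pool) map[string]struct{} {
	skip := make(map[string]struct{})
	for _, p := range pools {
		if p.Paused || p.MachineCount == 0 {
			skip[p.Name] = struct{}{}
		}
	}
	return skip
}

func main() {
	skip := mcpsToSkip([]Pool{
		{Name: "worker", MachineCount: 3},
		{Name: "worker-cnf", Paused: true, MachineCount: 2},
		{Name: "infra", MachineCount: 0},
	})
	names := make([]string, 0, len(skip))
	for name := range skip {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random; sort for stable output
	fmt.Println(names)
}
```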

openshift-ci bot requested review from Tal-or and swatisehgal October 9, 2025 12:23

shajmakh changed the title ~~ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP~~ WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Oct 9, 2025

openshift-ci bot added the approved and do-not-merge/work-in-progress labels Oct 9, 2025

shajmakh force-pushed the kubeletconfig-fix branch from 92eaeed to 80910ba October 13, 2025 09:08

shajmakh changed the title ~~WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP~~ ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Oct 13, 2025

openshift-ci bot removed the do-not-merge/work-in-progress label Oct 13, 2025

shajmakh force-pushed the kubeletconfig-fix branch from 80910ba to 29846b9 October 13, 2025 09:12

openshift-ci bot added the do-not-merge/hold label Oct 14, 2025

Tal-or reviewed Oct 15, 2025

internal/reconcile/event.go (outdated)

internal/reconcile/step.go

shajmakh force-pushed the kubeletconfig-fix branch from 29846b9 to 1e86d9a November 4, 2025 16:14

shajmakh changed the title ~~ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP~~ WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Nov 4, 2025

openshift-ci bot added the do-not-merge/work-in-progress label Nov 4, 2025

shajmakh force-pushed the kubeletconfig-fix branch 4 times, most recently from b413287 to e133797 November 5, 2025 12:08

shajmakh changed the title ~~WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP~~ OCPBUGS-61756: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Nov 5, 2025

openshift-ci bot removed the do-not-merge/work-in-progress label Nov 5, 2025

ffromani and others added 4 commits November 5, 2025 14:11
