@shajmakh
Member

@shajmakh shajmakh commented Oct 9, 2025

A paused MCP should prevent updating or creating the corresponding
config map for the specified node group. Until now, the code did not
consider paused MCPs, so the config map was created or updated anyway,
which caused a mismatch between the configuration in the config
map and the one reflected on the NRTs.
In this commit, we modify the kubeletconfig controller to skip
updating RTE config maps that belong to paused MCPs.

@openshift-ci openshift-ci bot requested review from Tal-or and swatisehgal October 9, 2025 12:23
@shajmakh shajmakh changed the title ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Oct 9, 2025
@openshift-ci
Contributor

openshift-ci bot commented Oct 9, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: shajmakh

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. labels Oct 9, 2025
@shajmakh shajmakh changed the title WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Oct 13, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 13, 2025
@shajmakh
Member Author

/hold

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 14, 2025
Collaborator

@Tal-or Tal-or left a comment

Good change overall.

If the MCP is paused, will the final update to the ConfigMap be triggered when the MCP is unpaused again?

@shajmakh
Member Author

shajmakh commented Nov 4, 2025

/retest

@shajmakh shajmakh changed the title ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Nov 4, 2025
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 4, 2025
@shajmakh
Member Author

shajmakh commented Nov 4, 2025

relies on #2324

@shajmakh
Member Author

shajmakh commented Nov 5, 2025

depends on #2426

@shajmakh shajmakh force-pushed the kubeletconfig-fix branch 4 times, most recently from b413287 to e133797 Compare November 5, 2025 12:08
@shajmakh shajmakh changed the title WIP: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP OCPBUGS-61756: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP Nov 5, 2025
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2025
@openshift-ci-robot

@shajmakh: This pull request references Jira Issue OCPBUGS-61756, which is invalid:

  • expected the bug to target the "4.21.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

ffromani and others added 4 commits November 5, 2025 14:11
…itions

The NUMAResourcesScheduler object status condition never reports
ObservedGeneration in any status conditions, unlike the scheduler
object.

As with the scheduler change, let's expose it in all the relevant
conditions, doing some minimal internal refactoring along the way.

Note that we can't test this change using controller tests due to both the
unique requirements of the operator and the way we (mis)use the
controller tests; we should add checks in (existing?) e2e tests, but we
postpone this to a later change.

Signed-off-by: Francesco Romani <[email protected]>
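
To illustrate the commit message above, here is a minimal sketch of stamping the observed generation on every status condition. It is not the PR's code: `Condition` is a simplified local stand-in for `metav1.Condition` so the example compiles without external modules, and `stampObservedGeneration` is a hypothetical helper name.

```go
package main

import "fmt"

// Condition is a simplified, local stand-in for metav1.Condition,
// kept here only so the sketch compiles without external modules.
type Condition struct {
	Type               string
	Status             string
	ObservedGeneration int64
}

// stampObservedGeneration is a hypothetical helper: it returns a copy of
// conds in which every condition carries the generation of the object
// that the controller observed while computing the status.
func stampObservedGeneration(conds []Condition, generation int64) []Condition {
	out := make([]Condition, len(conds))
	for i, c := range conds {
		c.ObservedGeneration = generation
		out[i] = c
	}
	return out
}

func main() {
	conds := []Condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "False"},
	}
	for _, c := range stampObservedGeneration(conds, 7) {
		fmt.Printf("%s=%s observedGeneration=%d\n", c.Type, c.Status, c.ObservedGeneration)
	}
}
```

A reader of the status can then compare each condition's observed generation with the object's `metadata.generation` to tell whether the condition reflects the latest spec.
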
The old version of the function updated the condition in place
without considering whether it is a base condition or not.
The difference between base and non-base conditions is that the status of one
base condition affects all the others, so the rest need an update too.

The new version splits the conditions into base and
non-base ones and updates all conditions accordingly using the upstream SetStatusCondition,
which also takes care to avoid noisy updates.
ref:
https://github.com/kubernetes/apimachinery/blob/master/pkg/api/meta/conditions.go

Note that this piece is part of a larger enhancement that will soon
be used in the NRO controller to update the conditions there.

Signed-off-by: Shereen Haj <[email protected]>
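
The base versus non-base split described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the repository's implementation: the list of base types, the rule that a base condition turning `True` resets its siblings to `False`, and the names `setCondition` and `updateConditions` are all hypothetical. `setCondition` only imitates the "report whether anything changed" behavior of the upstream helper using local types.

```go
package main

import "fmt"

// Condition is a simplified local stand-in for a status condition.
type Condition struct {
	Type   string
	Status string // "True" or "False"
}

// baseTypes is an assumed list of base condition types for this sketch.
var baseTypes = []string{"Available", "Progressing", "Degraded"}

func isBase(t string) bool {
	for _, b := range baseTypes {
		if b == t {
			return true
		}
	}
	return false
}

// setCondition updates or appends nc and reports whether conds changed,
// so that callers can skip a status update when nothing moved.
func setCondition(conds *[]Condition, nc Condition) bool {
	for i := range *conds {
		if (*conds)[i].Type != nc.Type {
			continue
		}
		if (*conds)[i] == nc {
			return false
		}
		(*conds)[i] = nc
		return true
	}
	*conds = append(*conds, nc)
	return true
}

// updateConditions applies nc. When nc is a base condition set to "True",
// the other base conditions are reset to "False" (an assumption of this
// sketch); a non-base condition only touches itself.
func updateConditions(conds *[]Condition, nc Condition) bool {
	changed := setCondition(conds, nc)
	if !isBase(nc.Type) || nc.Status != "True" {
		return changed
	}
	for _, t := range baseTypes {
		if t == nc.Type {
			continue
		}
		if setCondition(conds, Condition{Type: t, Status: "False"}) {
			changed = true
		}
	}
	return changed
}

func main() {
	conds := []Condition{
		{Type: "Available", Status: "True"},
		{Type: "Progressing", Status: "False"},
		{Type: "Degraded", Status: "False"},
	}
	fmt.Println(updateConditions(&conds, Condition{Type: "Degraded", Status: "True"}), conds)
	fmt.Println(updateConditions(&conds, Condition{Type: "Degraded", Status: "True"}), conds)
}
```

Returning a `changed` flag is what lets a controller avoid issuing a status update when a reconcile loop recomputes the same conditions.
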
So far we had only base conditions for the NRO object, but we want to
have a more flexible interface to interact with while supporting
non-base conditions, just like we have for NRS controller.
In this commit:
1. preserve the consistency of updating status conditions for
   numaresources CRs (operator and scheduler).
2. minimize Status.Update calls on degraded condition updates
3. keep using `conditioninfo` to maintain related commit modifications
   as much as possible, refactor later (switch to metav1 conditions)

Signed-off-by: Shereen Haj <[email protected]>
A paused MCP should prevent updating the corresponding
config map for the specified node group. Until now, the code did not
consider paused MCPs, which led to creating/updating the config map
with the newest kubeletconfig CR updates, which in turn caused a mismatch
between the configuration in the config map and the one reflected on the NRTs.
In this commit, we modify the kubeletconfig controller to handle paused
MCPs so that it skips updating existing RTE config maps; for new node
groups whose MCP is paused, the controller fetches the old
machineConfig (from before the pause) and creates the RTE config map based on the
decoded kubeletconfig data from it.

Signed-off-by: Shereen Haj <[email protected]>
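
The three cases described in this commit message can be summarized as a small decision function. This is only a sketch of the described behavior, not the controller's code; the `action` type, its constant names, and `decideConfigMapAction` are hypothetical.

```go
package main

import "fmt"

// action names the outcome the kubeletconfig controller would pick for
// one node group's RTE config map. The names are hypothetical.
type action string

const (
	actionSync               action = "sync-from-current-kubeletconfig"
	actionSkip               action = "skip-update"
	actionCreateFromPrePause action = "create-from-pre-pause-machineconfig"
)

// decideConfigMapAction mirrors the cases in the commit message:
//   - MCP not paused: reconcile the config map as usual;
//   - MCP paused and config map exists: leave it untouched;
//   - MCP paused and no config map yet (new node group): build it from
//     the machineConfig that was in effect before the pause.
func decideConfigMapAction(mcpPaused, configMapExists bool) action {
	switch {
	case !mcpPaused:
		return actionSync
	case configMapExists:
		return actionSkip
	default:
		return actionCreateFromPrePause
	}
}

func main() {
	for _, paused := range []bool{false, true} {
		for _, exists := range []bool{false, true} {
			fmt.Printf("paused=%v exists=%v -> %s\n", paused, exists, decideConfigMapAction(paused, exists))
		}
	}
}
```

Keeping the decision separate from the I/O (fetching the machineConfig, writing the config map) makes each branch easy to cover in unit tests.
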
The situation is mainly mitigated in the kubeletconfig controller;
however, we have a bug in the NRO controller: it puts the NRO CR
in the progressing state because a paused MCP is not in the updated condition.
This commit skips checking whether a paused MCP is up to date and
introduces a new status condition to report paused MCPs, if any exist.

Signed-off-by: Shereen Haj <[email protected]>
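
A rough sketch of the NRO-side behavior described above: paused pools are excluded from the "is everything updated?" check and reported separately. The `MCP` struct is a simplified stand-in for a MachineConfigPool, and `evaluatePools` is a hypothetical helper; how the new status condition is named and populated in the PR is not shown here.

```go
package main

import "fmt"

// MCP is a simplified stand-in for a MachineConfigPool; only the fields
// this sketch needs are modeled.
type MCP struct {
	Name    string
	Paused  bool
	Updated bool
}

// evaluatePools is a hypothetical helper. It reports whether any unpaused
// pool is still waiting to be updated, and returns the names of the
// paused pools so they can be surfaced in a dedicated status condition.
func evaluatePools(pools []MCP) (progressing bool, paused []string) {
	for _, p := range pools {
		if p.Paused {
			// A paused pool is never expected to reach "updated", so it
			// must not keep the CR in the progressing state.
			paused = append(paused, p.Name)
			continue
		}
		if !p.Updated {
			progressing = true
		}
	}
	return progressing, paused
}

func main() {
	pools := []MCP{
		{Name: "worker", Updated: true},
		{Name: "infra", Paused: true, Updated: false},
	}
	progressing, paused := evaluatePools(pools)
	fmt.Printf("progressing=%v paused=%v\n", progressing, paused)
}
```
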
@openshift-ci
Contributor

openshift-ci bot commented Nov 5, 2025

@shajmakh: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/ci-must-gather-e2e | e133797 | link | true | /test ci-must-gather-e2e |
| ci/prow/ci-e2e-compact | e133797 | link | false | /test ci-e2e-compact |
| ci/prow/ci-e2e | e133797 | link | true | /test ci-e2e |

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


-func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc) {
+func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc, sets.Set[string]) {
+	pausedMCPs := sets.New[string]()
Member Author

we may want to call this "mcpsToSkip" to include cases where the MCP is empty (solves https://issues.redhat.com/browse/OCPBUGS-52859)
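
As a sketch of what the suggested rename could cover, the following collects both paused and empty pools into one skip set. It uses a plain map instead of `sets.Set[string]` to stay dependency-free; the `MCP` struct is a simplified stand-in, treating `MachineCount == 0` as "empty" is an assumption of this example, and recording a reason per pool is an illustrative choice rather than something the PR does.

```go
package main

import "fmt"

// MCP is a simplified stand-in for a MachineConfigPool.
type MCP struct {
	Name         string
	Paused       bool
	MachineCount int
}

// mcpsToSkip returns the pools that should not be waited on, keyed by
// name, with a short reason. Treating MachineCount == 0 as "empty" is an
// assumption made for this sketch.
func mcpsToSkip(pools []MCP) map[string]string {
	skip := make(map[string]string)
	for _, p := range pools {
		switch {
		case p.Paused:
			skip[p.Name] = "paused"
		case p.MachineCount == 0:
			skip[p.Name] = "empty"
		}
	}
	return skip
}

func main() {
	pools := []MCP{
		{Name: "worker", MachineCount: 3},
		{Name: "infra", Paused: true, MachineCount: 2},
		{Name: "spare", MachineCount: 0},
	}
	fmt.Println(mcpsToSkip(pools))
}
```
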


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.


4 participants