-
Notifications
You must be signed in to change notification settings - Fork 21
OCPBUGS-61756: ctrl: kubeletConfig: avoid updating ConfigMap on Paused MCP #2261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: shajmakh The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
92eaeed to
80910ba
Compare
80910ba to
29846b9
Compare
|
/hold |
Tal-or
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good change overall.
In case the MCP is paused, the final update to the configmap will be triggered when the MCP changes back to unpause?
29846b9 to
1e86d9a
Compare
|
/retest |
|
relies on #2324 |
|
depends on #2426 |
b413287 to
e133797
Compare
|
@shajmakh: This pull request references Jira Issue OCPBUGS-61756, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
…itions The NUMAResourcesScheduler object status condition never reports ObservedGeneration in any status conditions, unlike the scheduler object. Likewise the scheduler change, let's expose it in all the relevant conditions, doing some minimal internal refactoring along the way. Note that we can't test this change using controller tests due both the unique requirements of the operator and the way we (mis)use the controller tests; we should add checks in (existing?) e2e tests, but we postpone this to a later change. Signed-off-by: Francesco Romani <[email protected]>
The old version of the function was updating the condition inplace without considering whether the it's a base condition or not. The difference in base and non-base condition is that the status of one base condition affect all others, thus the rest would need an update too. The new version enhances this and splits the conditions into base and non-base ones and update all conditions accordingly using u/s SetStatusCondition, which also cares to avoid noisy updates. ref: https://github.com/kubernetes/apimachinery/blob/master/pkg/api/meta/conditions.go note that this peice is part of a larger enhancement that will soon be used in the NRO-controller to update the conditions there. Signed-off-by: Shereen Haj <[email protected]>
So far we had only base conditions for the NRO object, but we want to have a more flexible interface to interact with while supporting non-base conditions, just like we have for NRS controller. In this commit: 1. preserve the consistency of updating status conditions for numaresources CRs (operator and scheduler). 2. minimize Status.Update calls on degraded condition updates 3. keep using `conditioninfo` to maintain related commit modifications as much as possible, refactor later (switch to metav1 conditions) Signed-off-by: Shereen Haj <[email protected]>
Having a paused MCP should prevent updating the corresponding config map for the specified node group. So far, the code wasn't considering the case of paused MCPs, which lead to creating/updating the config map to the newest kubeletconfig CR updates,a thing that caused a mismatch between the configuration in the config map vs the one reflected on the NRTs. In this commit, we modify the kubeletconfig controller to handle paused MCPs such that it skips updating existing RTE config maps; and for new node groups whose MCP is paused, the controller will fetch the old machineConfig (before the pause) and creates RTE config map based on the decoded kubeletconfig data from it. Signed-off-by: Shereen Haj <[email protected]>
The situtaion is mainly mitigated in the kubeletconfig controller, however we have a bug in the NRO controller such that it puts the NRO CR in progressing state because paused MCP is not in updated condition. This commit ignores checking whether a paused MCP is up-to-date, and introduces a new status condition to report paused MCPs if exist. Signed-off-by: Shereen Haj <[email protected]>
|
@shajmakh: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
|
||
| func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc) { | ||
| func (em *ExistingManifests) MachineConfigsState(mf Manifests) ([]objectstate.ObjectState, MCPWaitForUpdatedFunc, sets.Set[string]) { | ||
| pausedMCPs := sets.New[string]() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we may want to call this "mcpsToSkip" to include cases where the MCP is empty (solves https://issues.redhat.com/browse/OCPBUGS-52859)
Having a paused MCP should prevent updating/creating the corresponding
config map for the specified node group. So far, the code wasn't
considering the case of paused MCPs, which leads to creating/updating the config map
a thing that caused a mismatch between the configuration in the config
map and the one reflected on the NRTs.
In this commit, we modify the kubeletconfig controller to skip on
updating RTE config maps that belong to paused MCPs.