-
Notifications
You must be signed in to change notification settings - Fork 69
OCPBUGS-59251: Default cloud.conf if no configmap is found #404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Skipping CI for Draft Pull Request. |
@nrb: This pull request references Jira Issue OCPBUGS-59251, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/jira refresh |
@nrb: This pull request references Jira Issue OCPBUGS-59251, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
This is not currently passing units, hence draft status. |
case configv1.AzurePlatformType, | ||
configv1.GCPPlatformType, | ||
configv1.VSpherePlatformType, | ||
configv1.IBMCloudPlatformType, | ||
configv1.PowerVSPlatformType, | ||
configv1.OpenStackPlatformType, | ||
configv1.NutanixPlatformType: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does removing this mean that we will now sync a metal platform configmap? Or does the sync controller just not run at all on metal?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does run on metal, and removing this caused their tests to fail. I've added it back and simply moved AWS's entry up. This way we're not changing behavior on platforms that relied on it before.
|
||
// If there was no relevant data in the configmap, make a default entry for it. | ||
cloudConfCm.Data[defaultConfigKey] = "" | ||
|
||
return cloudConfCm, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about this one. If the user has specced a custom key and that custom key isn't there, we probably still want to error
I assume our fallback detects an that neither the managed or unmanaged confs exist, so creates a blank entry which exists this function on L185? So this change shouldn't be needed?
OpenShift clusters prior to v4.14 did not explicitly need a cloud.conf file for AWS. In v.4.19, they do. This change makes sure that there is a minimally usable default cloud configuration present. Signed-off-by: Nolan Brubaker <[email protected]>
if len(source) == 0 { | ||
return nil, fmt.Errorf("empty INI file") | ||
source = defaultConfig |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm torn on this - should we expand the CloudConfigTransformer
interface to provide the defaults? Or is it enough for each platform to set their minimum requirements during parsing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect each platform can set their minimum requirements? Can they each handle an empty file? Is that something we need to add testing for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or is it enough for each platform to set their minimum requirements during parsing?
I think for now, this is likely fine
@nrb: This pull request references Jira Issue OCPBUGS-59251, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@nrb: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
@@ -188,40 +190,51 @@ func (r *CloudConfigReconciler) isCloudConfigSyncNeeded(platformStatus *configv1 | |||
return false, fmt.Errorf("platformStatus is required") | |||
} | |||
switch platformStatus.Type { | |||
case configv1.AzurePlatformType, | |||
case configv1.AWSPlatformType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's with the reordering here? Not sure I'm following this..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is not just a reordering, the AWSPlatform now coincides in behaviour with all the other ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dam's correct - AWS is now unconditionally trying to sync. Any platform not specifically called out in the list, like bare metal, will not sync.
Expect(cl.Status().Update(ctx, infraResource.DeepCopy())).To(Succeed()) | ||
|
||
_, err := reconciler.Reconcile(context.TODO(), ctrl.Request{}) | ||
Expect(err).To(Not(BeNil())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we wanna expect a specific error - namely "key %s specified in infra resource does not exist in source configmap %s",
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Left a couple of thoughts about some logic conditions
@@ -188,40 +190,51 @@ func (r *CloudConfigReconciler) isCloudConfigSyncNeeded(platformStatus *configv1 | |||
return false, fmt.Errorf("platformStatus is required") | |||
} | |||
switch platformStatus.Type { | |||
case configv1.AzurePlatformType, | |||
case configv1.AWSPlatformType, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is not just a reordering, the AWSPlatform now coincides in behaviour with all the other ones
openshiftUnmanagedCMKey := client.ObjectKey{ | ||
Name: infra.Spec.CloudConfig.Name, | ||
Namespace: OpenshiftConfigNamespace, | ||
} | ||
if err := r.Get(ctx, openshiftUnmanagedCMKey, sourceCM); err != nil { | ||
if err := r.Get(ctx, openshiftUnmanagedCMKey, sourceCM); errors.IsNotFound(err) { | ||
klog.Warningf("unmanaged cloud-config is not found, falling back to default cloud config.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unmanaged cloud-config
It looks like above this is referred to as the infrastructure config IIUC
Here we refer to it as unmanaged cloud-config
What is the least surprising name we can use in the logs to refer to this?
Also judging from the code above the default cloud config
is an empty one right?
Should we bring that up in the log?
if err := r.Get(ctx, openshiftUnmanagedCMKey, sourceCM); err != nil { | ||
if err := r.Get(ctx, openshiftUnmanagedCMKey, sourceCM); errors.IsNotFound(err) { | ||
klog.Warningf("unmanaged cloud-config is not found, falling back to default cloud config.") | ||
} else if err != nil { | ||
klog.Errorf("unable to get cloud-config for sync") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like we swallow the error here (I know we were already doing that), does it make sense to propagate it?
klog.Errorf("unable to get cloud-config for sync") | ||
if err := r.setDegradedCondition(ctx); err != nil { | ||
return ctrl.Result{}, fmt.Errorf("failed to set conditions for cloud config controller: %v", err) | ||
} | ||
return ctrl.Result{}, err |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also with this change it now means that if we enter this else if err != nil
branch there was an error in getting the "unmanaged" cloud-config and that error is not the IsNotFound one.
So we should keep retrying to get it (likely it is a transient error e.g. network hiccup or similar) rather than just continuing with the "default" and empty cloud-config, as my understanding is that we should only be using the empty one when one doesn't exist.
func (r *CloudConfigReconciler) prepareSourceConfigMap(source *corev1.ConfigMap, infra *configv1.Infrastructure) (*corev1.ConfigMap, error) { | ||
cloudConfCm := source.DeepCopy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can source ever be nil? Should we check it here again for good measure in case the code changes as this is implied?
// If there was no data in the configmap and the user didn't specify their own make a default entry for it. | ||
cloudConfCm.Data[defaultConfigKey] = "" | ||
|
||
return cloudConfCm, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like it is possible to end up here even if the user specified an infraConfigKey
and the len(cloudConfCm.Data) == 0
, so either we need to fix up the comment at line 234, or we need to account for this extra case in the logic.
4.20's [1,2] are still Post as the potential fix is discussed. And once the fix lands in 4.20/dev branches, it would still need backporting to the 4.19 branch. [1]: https://issues.redhat.com/browse/OCPBUGS-59251 [2]: openshift/cluster-cloud-controller-manager-operator#404
Prior to 4.14, OCP's installer did not provide any cloud configuration in the
Infrastructure
resource. This was ok because nothing relied on it.However in 4.19, we introduced a hard requirement for the cloud configuration on AWS.
This change allows us to gracefully handle missing information in
Infrastructure
by providing a minimal defaultcloud.conf
on AWS.