OCPBUGS-65893: CORS-4055: configure AWS SDK v2 clients with common config #10112
Conversation
The SDK v2 introduces a new built-in client rate limiter [0], whose default settings break the install process under heavy load. For example, we have seen rate limit issues when running in CI with a high number of AWS resources, especially IAM. Thus, we explicitly disable the rate limiter in the common client config, defined in pkg/asset/installconfig/aws/sessionv2.go. Any v2 client will need to use this config.
References
[0] https://docs.aws.amazon.com/sdk-for-go/v2/developer-guide/configure-retries-timeouts.html
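For readers unfamiliar with the SDK v2 retry options, here is a minimal sketch of how a client-side rate limiter can be disabled when loading a shared config, following the approach described in the retry guide [0]. The function name `loadConfigWithoutRateLimit` is hypothetical, and the real code in pkg/asset/installconfig/aws/sessionv2.go may be structured differently.

```go
package main

import (
	"context"
	"log"

	"github.com/aws/aws-sdk-go-v2/aws"
	"github.com/aws/aws-sdk-go-v2/aws/ratelimit"
	"github.com/aws/aws-sdk-go-v2/aws/retry"
	"github.com/aws/aws-sdk-go-v2/config"
)

// loadConfigWithoutRateLimit is a hypothetical helper (not the installer's
// actual function). It loads the default AWS config and swaps in a standard
// retryer whose client-side rate limiter is a no-op.
func loadConfigWithoutRateLimit(ctx context.Context) (aws.Config, error) {
	return config.LoadDefaultConfig(ctx,
		config.WithRetryer(func() aws.Retryer {
			return retry.NewStandard(func(o *retry.StandardOptions) {
				// ratelimit.None never refuses a retry token.
				o.RateLimiter = ratelimit.None
			})
		}),
	)
}

func main() {
	cfg, err := loadConfigWithoutRateLimit(context.Background())
	if err != nil {
		log.Fatalf("loading AWS config: %v", err)
	}
	// Service clients built from cfg inherit the retryer above.
	_ = cfg
}
```

Service clients created from the returned config would pick up the same retryer, which is one way to keep the setting in a single common place.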
@tthvo: This pull request references CORS-4055, which is a valid Jira issue.
/cc @barbacbd @yunjiang29 @gpei
barbacbd
left a comment
/approve
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: barbacbd
Sample analysis: In ci/prow/e2e-aws-ovn-edge-zones, we can see that the AWS API is under heavy load. The CCO is failing with the IAM rate limiter, but the installer was able to complete its install. These errors are hard to detect and are mostly visible during high merge traffic near the freeze window... I am surprised it didn't happen in August during 4.20.
/retest
Firing more tests to stress CI a bit more 😅
Okayy, if we compare the e2e runs here against other PRs... 👇 For example:
Here in this PR, the fix helped all jobs successfully pass the step. Ideally, we should enable the rate limiter some day in the future with well-tuned parameters. For now, this is to preserve the behaviour of SDK v1.
@tthvo: The following tests failed, say
/verified by @tthvo
@sadasu: This PR has been marked as verified.
/lgtm
/skip
/jira refresh
@tthvo: This pull request references Jira Issue OCPBUGS-65893, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug.
The bug has been updated to refer to the pull request using the external bug tracker.
@patrickdillon: This pull request references Jira Issue OCPBUGS-65893, which is valid. 3 validation(s) were run on this bug.
/override ci/prow/e2e-aws-ovn ci/prow/e2e-aws-ovn-edge-zones-manifest-validation
@patrickdillon: Overrode contexts on behalf of patrickdillon: ci/prow/e2e-aws-ovn, ci/prow/e2e-aws-ovn-edge-zones-manifest-validation
@tthvo: Jira Issue Verification Checks: Jira Issue OCPBUGS-65893
Jira Issue OCPBUGS-65893 has been moved to the MODIFIED state and will move to the VERIFIED state when the change is available in an accepted nightly payload. 🕓
Description
The SDK v2 introduces a new built-in client rate limiter [0], whose default settings break the install process under heavy load. For example, we have seen rate limit issues when running in CI with a high number of AWS resources, especially IAM.
Thus, we explicitly disable the rate limiter in the common client config, defined in pkg/asset/installconfig/aws/sessionv2.go. Any v2 client will need to use this config.
Background
While attempting to migrate the destroy code, we previously ran into rate limiting on IAM API calls. We reverted that change.
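To illustrate why a client-side limiter can fail calls under sustained throttling, here is a simplified, standalone model of a retry token bucket compared with a disabled limiter. It is not the SDK's implementation: the types `retryBucket` and `noLimit`, the helper `retriesAllowed`, and the numbers (500 tokens, 5 per retry) are assumptions chosen only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// limiter decides whether one more retry attempt may be made.
type limiter interface {
	acquire(cost int) error
}

var errQuotaExceeded = errors.New("retry quota exceeded")

// retryBucket is a hypothetical, simplified retry quota: each retry spends
// tokens, and retries are refused once too few tokens remain.
type retryBucket struct {
	tokens int
}

func (b *retryBucket) acquire(cost int) error {
	if b.tokens < cost {
		return errQuotaExceeded
	}
	b.tokens -= cost
	return nil
}

// noLimit models a disabled rate limiter: it never refuses a retry.
type noLimit struct{}

func (noLimit) acquire(int) error { return nil }

// retriesAllowed counts how many of n throttled calls are allowed to retry.
func retriesAllowed(l limiter, n, cost int) int {
	allowed := 0
	for i := 0; i < n; i++ {
		if l.acquire(cost) == nil {
			allowed++
		}
	}
	return allowed
}

func main() {
	// Illustrative numbers only: 200 throttled calls, 5 tokens per retry.
	fmt.Println(retriesAllowed(&retryBucket{tokens: 500}, 200, 5)) // prints 100
	fmt.Println(retriesAllowed(noLimit{}, 200, 5))                 // prints 200
}
```

In this model, half of the throttled calls fail on the client without being retried once the bucket is drained, whereas the disabled limiter leaves the retry decision to the retryer's attempt limit and backoff.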
However, there are other IAM calls within the install path to getOrCreate IAM roles and instance profiles. Thus, we need to make sure the IAM v2 client disables the rate limiter. Otherwise, we will run into errors such as: