add resource limits and requests for pipelines staging #7262
Conversation
components/pipeline-service/staging/base/main-pipeline-service-configuration.yaml
As I mentioned in the comment, most of these values feel a bit low to me. I checked the actual usage over the last 2 days, and the proposed limits seem to be around 2x the normal load. Was that your target? I think we could be a bit more "generous", especially on the CPU side.
As a side note, some pods are hardly used or not used at all, for instance pipelines-console-plugin, tekton-triggers-core-interceptors, and tkn-cli-serve, and there could be more. A better approach to conserving resources might be to remove the unused pods entirely. That should be a separate change, though, and it should go through the architecture committee for approval.
- name: tekton-chains-controller
  resources:
    limits:
      cpu: 500m
This seems low. I see some peaks on one of the clusters going above 800m, so 1000m would probably be a better value.
In general, most of the values feel a bit conservative; we should be able to withstand peaks of 2x, 3x, or even 5x above the normal load.
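For illustration, a minimal sketch of the kind of entry being discussed, in the same list format as the snippet above. The requests and memory figures are hypothetical placeholders, and the CPU limit simply follows the 1000m suggestion; these are not the values that were merged.

```yaml
# Hypothetical sketch only; not the values merged in this PR.
- name: tekton-chains-controller
  resources:
    requests:
      cpu: 100m       # placeholder, roughly the steady-state usage
      memory: 256Mi   # placeholder
    limits:
      cpu: 1000m      # above the ~800m peaks mentioned above
      memory: 512Mi   # placeholder
```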
Updated. My reasoning here was generally ~1.5-2x the highest spike I saw, rounded to a friendly number. I must have missed this one when looking at CPU, though. I've revised the numbers now.
The branch was force-pushed from 71a401c to 81a5c19.
It looks like you did not run the generate script after the latest updates, so the deploy.yaml files are out of sync with the base.
The branch was force-pushed from 81a5c19 to fe4708c.
D'oh. Thanks for catching that. Fixed.
/test konflux-e2e-v416-optional
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. Approval requirements bypassed by manually added approval. This pull-request has been approved by: aThorp96, enarha. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Merged commit 5d3dcda into redhat-appstudio:main.
Values were calculated based on the resources consumed by pods over the past 7 days in stone-prd-rh01, stone-prod-p01, and stone-prod-p02, as seen in this Grafana query:
max by(deployment) (label_replace(container_memory_working_set_bytes{namespace="openshift-pipelines", container=""}, "deployment", "$1", "pod", "([-a-z]+)-.*"))
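That query covers memory (working set). A similar query could be used for CPU peaks; this sketch is not from the PR, and it assumes the same pod-to-deployment label mapping and the standard cAdvisor counter container_cpu_usage_seconds_total:

```
max by(deployment) (label_replace(rate(container_cpu_usage_seconds_total{namespace="openshift-pipelines", container=""}[5m]), "deployment", "$1", "pod", "([-a-z]+)-.*"))
```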
Screenshot of RH01:
