Big mono repo performance improvements #5469
-
Hey @UrosSimovic thanks for sharing your changes to Flux. I think this solution is tailor-made for a very specific repo structure and Flux configuration, which I don't think belongs in upstream. At first glance I see some issues with this approach:
- It is very common for a Flux Kustomization to have …
- Flux source-controller is no longer stateless. The cache must be stored in a persistent volume to survive restarts. Without a PVC, after a restart, source-controller removes the artifact from the status, which would make all Kustomizations reconcile once the new artifact is available.
-
An alternative solution that would not require modifying source-controller or kustomize-controller would be using the External Artifact API. You would develop a source-transformer controller that watches the GitRepo pulling the monorepo. The transformer would calculate the digests of dirs and create/update ExternalArtifacts used by Flux Kustomizations. When a dir changes, only the Flux Kustomizations referencing the ExternalArtifact produced from that dir would reconcile.
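To make the transformer idea concrete, here is a minimal sketch of the comparison it would perform on each new revision of the monorepo. The `externalArtifact` struct, its fields, and the artifact URL below are illustrative stand-ins, not the actual Flux ExternalArtifact API:

```go
package main

import "fmt"

// externalArtifact is a simplified stand-in for the Flux ExternalArtifact object the
// transformer would create/update per app directory; the real CRD fields may differ.
type externalArtifact struct {
	Name     string // e.g. the app directory, apps/<name>
	Revision string // digest of the directory contents
	URL      string // where the per-dir tarball would be served from (hypothetical)
}

// reconcileDirs compares freshly computed per-dir digests against what is already
// recorded and returns only the objects that need an update; only the Kustomizations
// referencing those ExternalArtifacts would then reconcile.
func reconcileDirs(digests map[string]string, existing map[string]externalArtifact) []externalArtifact {
	var updates []externalArtifact
	for dir, digest := range digests {
		if ea, ok := existing[dir]; ok && ea.Revision == digest {
			continue // unchanged dir: leave the ExternalArtifact and its consumers alone
		}
		updates = append(updates, externalArtifact{
			Name:     dir,
			Revision: digest,
			URL:      fmt.Sprintf("http://source-transformer/artifacts/%s.tar.gz", dir),
		})
	}
	return updates
}

func main() {
	digests := map[string]string{"apps/foo": "sha256:aaa", "apps/bar": "sha256:bbb"}
	existing := map[string]externalArtifact{
		"apps/foo": {Name: "apps/foo", Revision: "sha256:aaa"},
		"apps/bar": {Name: "apps/bar", Revision: "sha256:000"},
	}
	for _, ea := range reconcileDirs(digests, existing) {
		fmt.Println("would update ExternalArtifact for", ea.Name, "to", ea.Revision)
	}
}
```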
-
I'd like to start a discussion about big monorepo performance issues, how we overcame them, and whether the solution makes sense to you.
The setup
Our 10 big production, staging, and sandbox k8s clusters in multiple regions and clouds are all tied to a single git repo (a single `gitsource` resource). That monorepo contains everything a cluster needs: namespaces, infra-related resources (various policies, RBAC, Istio), as well as all the apps, i.e. whatever developers send through the app repos' CI/CD (there is an API sitting between the app CI and the monorepo).
In every cluster there is a single `gitsource` resource, as well as a root `kustomization` resource that points to a cluster-specific directory in that monorepo (`cluster/<name>/...`). It syncs cluster-wide resources, namespaces, and then per-app `kustomization` resources that point to a different part of the monorepo (`apps/<name>/...`), since apps can be deployed to multiple clusters (e.g. system apps, apps deployed to multiple regions, etc.).
We are talking 1500+ apps here (or more precisely, app environments).
The problem
A single change to the monorepo (e.g. a dev deploys an app, usually just a container image tag change) causes the `gitsource` to be updated (in our case via webhook). And when the `gitsource` is updated, all the `kustomization`s referring to it are reconciled as well.
So a single deployment reconciles all our `kustomization`s referring to that single git source, just for a single app change. That makes it slow, consumes a lot of CPU (btw, we use sharding in the kustomization controller), not to mention that it hammers the kube APIs in all our clusters. Checking the git log from yesterday, there were 1000+ commits to the monorepo (one commit is one deployment); syncing all the kustomizations for each of them would not work, or we'd need a lot of CPU power.
Yes, maybe that is not the ideal setup, and app manifests should be deployed via their own app gitsources, etc. But the thing is we generate a lot more k8s resource manifests that we abstract away from devs, mostly Istio resources, which as you may know are hard to configure and can affect the whole cluster, so we simply can't leave their configuration to devs.
Nevertheless, that is the setup we started with, and it has grown to the point where we needed to do something.
The solution - directory change-set
I'd like to show you a solution we implemented and currently run in production, and it works really well. I'd like to know if it makes sense to you, discuss it, and maybe add it to the upstream code. The changes affect the source (git) and kustomization controllers.
On source-controller start, a tar archive of the git source (in our case the monorepo) is created as usual, but we also compute hash sums of all the directories (by hashing all the files in the corresponding dir) and save them alongside the `.tar.gz` on disk.
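Conceptually, the per-directory hashing looks roughly like the sketch below. This is a simplified illustration, not the actual patch; it assumes SHA-256 over relative file paths plus contents, and attributes every file to all of its ancestor directories so that a deep change also marks `apps/<name>` as changed:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashDirs walks the checked-out repo and returns a map of
// directory (relative to root) -> digest of all files underneath it.
func hashDirs(root string) (map[string]string, error) {
	files := map[string][]string{} // dir -> files that contribute to its digest
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		rel, relErr := filepath.Rel(root, p)
		if relErr != nil {
			return relErr
		}
		// Attribute the file to every ancestor directory, so a change deep in
		// apps/foo/overlays/prod also changes the digest of apps/foo.
		for dir := filepath.Dir(rel); dir != "."; dir = filepath.Dir(dir) {
			files[dir] = append(files[dir], rel)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}

	sums := make(map[string]string, len(files))
	for dir, list := range files {
		sort.Strings(list) // stable order so the digest is deterministic
		h := sha256.New()
		for _, rel := range list {
			f, err := os.Open(filepath.Join(root, rel))
			if err != nil {
				return nil, err
			}
			io.WriteString(h, rel) // include the path so renames change the digest
			if _, err := io.Copy(h, f); err != nil {
				f.Close()
				return nil, err
			}
			f.Close()
		}
		sums[dir] = hex.EncodeToString(h.Sum(nil))
	}
	return sums, nil
}

func main() {
	sums, err := hashDirs("./clone") // e.g. the checked-out monorepo
	if err != nil {
		panic(err)
	}
	fmt.Println(len(sums), "directories hashed")
}
```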
In the next iteration the same is done, but now we can compare the new directory hashes with the previous ones to determine all the directories that changed (we tweaked garbage collection to leave at least the N-1 files on disk so the previous hashes survive). We save that directory change-set to the git source status. As we didn't want to change the git source CRD, we simply put it under the artifact status metadata (`map[string]string`), in `.status.artifact.metadata["fluxcd.io/changeset"]`, by joining all the dirs that changed.
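For illustration, computing the change-set from two hash snapshots and joining it into that metadata value is conceptually the following (a simplified sketch with made-up digests, not the actual patch):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// changedDirs returns every directory whose digest differs between the previous
// and the current snapshot, including dirs that appeared or disappeared.
func changedDirs(prev, curr map[string]string) []string {
	set := map[string]struct{}{}
	for dir, sum := range curr {
		if prev[dir] != sum {
			set[dir] = struct{}{}
		}
	}
	for dir := range prev {
		if _, ok := curr[dir]; !ok {
			set[dir] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for dir := range set {
		out = append(out, dir)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up digests for two consecutive reconciliations.
	prev := map[string]string{"apps/foo": "aaa", "apps/bar": "bbb"}
	curr := map[string]string{"apps/foo": "aaa", "apps/bar": "ccc"}

	// Joined value that would end up in the artifact status metadata,
	// e.g. under the "fluxcd.io/changeset" key.
	changeset := strings.Join(changedDirs(prev, curr), ",")
	fmt.Println(changeset) // apps/bar
}
```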
Now that change-set is used by the kustomization controller to reconcile only the kustomizations whose configured path matches a dir from the change-set. With that, everything is very efficient and fast, with low average CPU consumption.
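On the kustomize-controller side the check is essentially a path match against that change-set; here is a simplified sketch of the idea (function name and matching rules are illustrative, not the exact patch):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pathAffected reports whether a Kustomization with spec.path set to specPath
// should reconcile, i.e. whether its path is inside (or contains) any changed dir.
func pathAffected(specPath, changeset string) bool {
	if changeset == "" {
		return true // no change-set available: fall back to reconciling everything
	}
	spec := path.Clean(strings.TrimPrefix(specPath, "./"))
	if spec == "." {
		return true // path points at the repo root, so any change affects it
	}
	for _, dir := range strings.Split(changeset, ",") {
		dir = path.Clean(dir)
		if spec == dir ||
			strings.HasPrefix(dir+"/", spec+"/") || // a changed dir lives under spec.path
			strings.HasPrefix(spec+"/", dir+"/") { // spec.path lives under a changed dir
			return true
		}
	}
	return false
}

func main() {
	changeset := "apps/foo,cluster/prod-eu"
	fmt.Println(pathAffected("./apps/foo", changeset)) // true: its dir changed
	fmt.Println(pathAffected("./apps/bar", changeset)) // false: reconciliation can be skipped
}
```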
Please see the code with the implementation. It is not ideal and lacks comments, tests, etc., but I guess you get the point from the whole thread of what we want to achieve here. The changes are based on version 1.4 of the controllers.
Now my question is: would something like this be considered useful for a broader audience (I think yes, maybe as an opt-in)? If so, let's then discuss the implementation. We are willing to prepare proper PRs with everything necessary, but only after we agree on everything.
Thanks and have a nice day!