Big mono repo performance improvements #5469
-
Hey @UrosSimovic thanks for sharing your changes to Flux. I think this solution is tailor-made for a very specific repo structure and Flux configuration, which I don't think belongs in upstream. At first glance I see some issues with this approach:
- It is very common for a Flux Kustomization to have …
- Flux source-controller is no longer stateless. The cache must be stored in a persistent volume to survive restarts. Without a PVC, after a restart, source-controller removes the artifact from the status, which would make all Kustomizations reconcile once the new artifact is available.
-
An alternative solution that would not require modifying source-controller or kustomize-controller would be using the External Artifact API. You would develop a source-transformer controller that watches the GitRepo pulling the monorepo. The transformer would calculate the digests of dirs and create/update ExternalArtifacts used by Flux Kustomizations. When a dir changes, only the Flux Kustomizations referencing the ExternalArtifact produced from that dir would reconcile.
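To make the transformer idea concrete, here is a minimal sketch of the comparison it would perform on each new revision of the monorepo. The `externalArtifact` struct, its fields, and the artifact URL below are illustrative stand-ins, not the actual Flux ExternalArtifact API:

```go
package main

import "fmt"

// externalArtifact is a simplified stand-in for the Flux ExternalArtifact object the
// transformer would create/update per app directory; the real CRD fields may differ.
type externalArtifact struct {
	Name     string // e.g. the app directory, apps/<name>
	Revision string // digest of the directory contents
	URL      string // where the per-dir tarball would be served from (hypothetical)
}

// reconcileDirs compares freshly computed per-dir digests against what is already
// recorded and returns only the objects that need an update; only the Kustomizations
// referencing those ExternalArtifacts would then reconcile.
func reconcileDirs(digests map[string]string, existing map[string]externalArtifact) []externalArtifact {
	var updates []externalArtifact
	for dir, digest := range digests {
		if ea, ok := existing[dir]; ok && ea.Revision == digest {
			continue // unchanged dir: leave the ExternalArtifact and its consumers alone
		}
		updates = append(updates, externalArtifact{
			Name:     dir,
			Revision: digest,
			URL:      fmt.Sprintf("http://source-transformer/artifacts/%s.tar.gz", dir),
		})
	}
	return updates
}

func main() {
	digests := map[string]string{"apps/foo": "sha256:aaa", "apps/bar": "sha256:bbb"}
	existing := map[string]externalArtifact{
		"apps/foo": {Name: "apps/foo", Revision: "sha256:aaa"},
		"apps/bar": {Name: "apps/bar", Revision: "sha256:000"},
	}
	for _, ea := range reconcileDirs(digests, existing) {
		fmt.Println("would update ExternalArtifact for", ea.Name, "to", ea.Revision)
	}
}
```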
-
I'd like to start a discussion about big monorepo performance issues, how we overcame them, and whether the solution makes sense to you.
The setup
Our 10 big production, staging, and sandbox k8s clusters in multiple regions and clouds are all tied to a single git repo (a single `gitsource` resource). That monorepo contains everything a cluster needs: namespaces, infra-related resources (various policies, RBAC, Istio), as well as all the apps, i.e. whatever developers send through the app repos' CI/CD (there is an API sitting between the app CI and the monorepo).
In every cluster there is a single `gitsource` resource, as well as a root `kustomization` resource that points to a cluster-specific directory in that monorepo (`cluster/<name>/...`). It syncs cluster-wide resources, namespaces, and then per-app `kustomization` resources that point to a different part of the monorepo (`apps/<name>/...`), since apps can be deployed to multiple clusters (e.g. system apps, apps deployed to multiple regions, etc.).
We are talking 1500+ apps here (or more precisely, app environments).
The problem
A single change to the monorepo (e.g. a dev deploys an app, usually just a container image tag change) causes the `gitsource` to be updated (in our case via webhook). And when the `gitsource` is updated, all the `kustomization`s referring to it are reconciled as well.
So a single deployment reconciles all our `kustomization`s referring to that single git source, just for a single app change. That makes it slow, consumes a lot of CPU (btw, we use sharding in the kustomization controller), not to mention that it hammers the kube APIs in all our clusters. Checking the git log from yesterday, there were 1000+ commits to the monorepo (one commit is one deployment); syncing all the kustomizations for each of them would not work, or we'd need a lot of CPU power.
Yes, maybe that is not the ideal setup, and app manifests should be deployed via their own app gitsources, etc. But the thing is we generate a lot more k8s resource manifests that we abstract away from devs, mostly Istio resources, which as you may know are hard to configure and can affect the whole cluster, so we simply can't leave their configuration to devs.
Nevertheless, that is the setup we started with, and it has grown to the point where we needed to do something.
The solution - directory change-set
I'd like to show you a solution we implemented and currently run in production, and it works really well. I'd like to know if it makes sense to you, discuss it, and maybe add it to the upstream code. The changes affect the source (git) and kustomization controllers.
On source-controller start, a tar archive of the git source (in our case the monorepo) is created as usual, but we also compute hash sums of all the directories (by hashing all the files in the corresponding dir) and save them alongside the `.tar.gz` on disk.
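Conceptually, the per-directory hashing looks roughly like the sketch below. This is a simplified illustration, not the actual patch; it assumes SHA-256 over relative file paths plus contents, and attributes every file to all of its ancestor directories so that a deep change also marks `apps/<name>` as changed:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
)

// hashDirs walks the checked-out repo and returns a map of
// directory (relative to root) -> digest of all files underneath it.
func hashDirs(root string) (map[string]string, error) {
	files := map[string][]string{} // dir -> files that contribute to its digest
	err := filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		rel, relErr := filepath.Rel(root, p)
		if relErr != nil {
			return relErr
		}
		// Attribute the file to every ancestor directory, so a change deep in
		// apps/foo/overlays/prod also changes the digest of apps/foo.
		for dir := filepath.Dir(rel); dir != "."; dir = filepath.Dir(dir) {
			files[dir] = append(files[dir], rel)
		}
		return nil
	})
	if err != nil {
		return nil, err
	}

	sums := make(map[string]string, len(files))
	for dir, list := range files {
		sort.Strings(list) // stable order so the digest is deterministic
		h := sha256.New()
		for _, rel := range list {
			f, err := os.Open(filepath.Join(root, rel))
			if err != nil {
				return nil, err
			}
			io.WriteString(h, rel) // include the path so renames change the digest
			if _, err := io.Copy(h, f); err != nil {
				f.Close()
				return nil, err
			}
			f.Close()
		}
		sums[dir] = hex.EncodeToString(h.Sum(nil))
	}
	return sums, nil
}

func main() {
	sums, err := hashDirs("./clone") // e.g. the checked-out monorepo
	if err != nil {
		panic(err)
	}
	fmt.Println(len(sums), "directories hashed")
}
```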
In the next iteration the same is done, but now we can compare the new directory hashes with the previous ones to determine all the directories that changed (we tweaked garbage collection to leave at least the N-1 files on disk so the previous hashes survive). We save that directory change-set to the git source status. As we didn't want to change the git source CRD, we simply put it under the artifact status metadata (`map[string]string`), in `.status.artifact.metadata["fluxcd.io/changeset"]`, by joining all the dirs that changed.
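For illustration, computing the change-set from two hash snapshots and joining it into that metadata value is conceptually the following (a simplified sketch with made-up digests, not the actual patch):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// changedDirs returns every directory whose digest differs between the previous
// and the current snapshot, including dirs that appeared or disappeared.
func changedDirs(prev, curr map[string]string) []string {
	set := map[string]struct{}{}
	for dir, sum := range curr {
		if prev[dir] != sum {
			set[dir] = struct{}{}
		}
	}
	for dir := range prev {
		if _, ok := curr[dir]; !ok {
			set[dir] = struct{}{}
		}
	}
	out := make([]string, 0, len(set))
	for dir := range set {
		out = append(out, dir)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Made-up digests for two consecutive reconciliations.
	prev := map[string]string{"apps/foo": "aaa", "apps/bar": "bbb"}
	curr := map[string]string{"apps/foo": "aaa", "apps/bar": "ccc"}

	// Joined value that would end up in the artifact status metadata,
	// e.g. under the "fluxcd.io/changeset" key.
	changeset := strings.Join(changedDirs(prev, curr), ",")
	fmt.Println(changeset) // apps/bar
}
```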
Now that change-set is used by the kustomization controller to reconcile only the kustomizations whose configured path matches a dir from the change-set. With that, everything is very efficient and fast, with low average CPU consumption.
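On the kustomize-controller side the check is essentially a path match against that change-set; here is a simplified sketch of the idea (function name and matching rules are illustrative, not the exact patch):

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// pathAffected reports whether a Kustomization with spec.path set to specPath
// should reconcile, i.e. whether its path is inside (or contains) any changed dir.
func pathAffected(specPath, changeset string) bool {
	if changeset == "" {
		return true // no change-set available: fall back to reconciling everything
	}
	spec := path.Clean(strings.TrimPrefix(specPath, "./"))
	if spec == "." {
		return true // path points at the repo root, so any change affects it
	}
	for _, dir := range strings.Split(changeset, ",") {
		dir = path.Clean(dir)
		if spec == dir ||
			strings.HasPrefix(dir+"/", spec+"/") || // a changed dir lives under spec.path
			strings.HasPrefix(spec+"/", dir+"/") { // spec.path lives under a changed dir
			return true
		}
	}
	return false
}

func main() {
	changeset := "apps/foo,cluster/prod-eu"
	fmt.Println(pathAffected("./apps/foo", changeset)) // true: its dir changed
	fmt.Println(pathAffected("./apps/bar", changeset)) // false: reconciliation can be skipped
}
```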
Please see the code with the implementation. It is not ideal and lacks comments, tests, etc., but I guess you get the point from the whole thread of what we want to achieve here. The changes are based on version 1.4 of the controllers.
Now my question is: would something like this be considered useful for a broader audience (I think yes, maybe as an opt-in)? If so, let's then discuss the implementation. We are willing to prepare proper PRs with everything necessary, but only after we agree on everything.
Thanks and have a nice day!