@theMagicalKarp commented Mar 7, 2025

Describe the bug

I manage hundreds of Kubernetes clusters, all configured with Flux from a single GitOps repository. We use flux_bootstrap_git to manage the Flux installation on each cluster. On average, this repository receives a new commit every minute during daytime hours.

This high commit frequency causes problems when updating the flux_bootstrap_git resource. Specifically, whenever the Flux Terraform provider attempts to push a commit to our GitOps repository, it almost always times out with the following error:

│ failed to push manifests: failed to push to remote: command error on
│ refs/heads/main: cannot lock ref 'refs/heads/main': is at
│ da2267035aa50139f41df052947da4e85202c0f0 but expected
│ 71853c6197a6a7f222db0f1978c7cb232b87c5ee

To mitigate this, we’ve increased the timeouts, which has helped to some extent. However, we’ve observed that on every retry, the Terraform provider performs a full clone of the entire repository. Because the repository has over 300,000 commits, each clone is slow, and new commits often land on the branch within the retry window, so the next push is rejected as well.

A potential improvement would be to modify the CloneRepository method, func (prd *providerResourceData) CloneRepository(ctx context.Context) (*gogit.Client, error), in internal/provider/provider_resource_data.go to use a shallow clone. Here’s the proposed change:

func (prd *providerResourceData) CloneRepository(ctx context.Context) (*gogit.Client, error) {
	tmpDir, err := manifestgen.MkdirTempAbs("", "flux-bootstrap-")
	if err != nil {
		return nil, fmt.Errorf("could not create temporary working directory for git repository: %w", err)
	}
	gitClient, err := prd.GetGitClient(tmpDir)
	if err != nil {
		return nil, fmt.Errorf("could not create git client: %w", err)
	}
	// TODO: Need to conditionally clone here. If repository is empty this will fail.
	_, err = gitClient.Clone(ctx, prd.GetRepositoryURL().String(), repository.CloneConfig{
		CheckoutStrategy: repository.CheckoutStrategy{
			Branch: prd.git.Branch.ValueString(),
		},
		// Proposed change: fetch only the tip of the branch instead of its full history.
		ShallowClone: true,
	})
	if err != nil {
		return nil, fmt.Errorf("could not clone git repository: %w", err)
	}
	return gitClient, nil
}

Local testing showed that this change reduces the time required to clone the repository, which should decrease the likelihood of timeouts when applying our Terraform configuration.

Original issue: #735

How has this been tested?

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testmacos
?   	github.com/fluxcd/terraform-provider-flux	[no test files]
?   	github.com/fluxcd/terraform-provider-flux/internal/framework/types	[no test files]
?   	github.com/fluxcd/terraform-provider-flux/internal/framework/validators	[no test files]
=== RUN   TestAccBootstrapGit_Drift
=== PAUSE TestAccBootstrapGit_Drift
=== CONT  TestAccBootstrapGit_Drift
--- PASS: TestAccBootstrapGit_Drift (132.63s)
PASS
ok  	github.com/fluxcd/terraform-provider-flux/internal/provider	133.669s
testing: warning: no tests to run
PASS
ok  	github.com/fluxcd/terraform-provider-flux/internal/utils	(cached) [no tests to run]

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Documentation

  • I have updated the documentation (if required) with make docs

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have signed-off my commits with git commit -s

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritise this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

This helps ensure that the terraform provider can clone bigger repos in a
reasonable amount of time.

Signed-off-by: Robert Sheehy <[email protected]>