
Gzip config #79

Draft
zricethezav wants to merge 3 commits into main from gzip-config


@zricethezav
Member

Removes ~300 KB from the binary but adds ~600 µs to startup. Worth it? Probably, since 0.6 ms is imperceptible to a human.

@greptile-apps
Contributor

greptile-apps bot commented Apr 2, 2026

Greptile Summary

This PR compresses the embedded default config (betterleaks.toml) with gzip, reducing the binary size by ~300 KB at the cost of a one-time ~600 µs decompression on first use. The config generator is updated to write a .toml.gz file, config.go now embeds the compressed bytes and lazily decompresses them via sync.Once, and all call sites are updated from config.DefaultConfig (var) to config.DefaultConfig() (func).
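Here is a minimal sketch of that lazy-decompression pattern. It assumes DefaultConfig() returns the TOML as a string. Only defaultCfgZipped appears in the PR's snippet; defaultCfgOnce, defaultCfg, decompressions and mustGzip are hypothetical names. To keep the sketch runnable on its own, the compressed bytes are built at init time rather than coming from a //go:embed directive on betterleaks.toml.gz.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"sync"
)

// In the real package this would presumably be:
//
//	//go:embed betterleaks.toml.gz
//	var defaultCfgZipped []byte
//
// Here it is generated at init time so the sketch is self-contained.
var defaultCfgZipped = mustGzip("title = \"example\"\n")

var (
	defaultCfgOnce sync.Once
	defaultCfg     string
	decompressions int // instrumentation for this sketch only
)

// DefaultConfig decompresses the embedded config on its first call and
// returns the cached string on every later call. sync.Once makes this safe
// to call from multiple goroutines.
func DefaultConfig() string {
	defaultCfgOnce.Do(func() {
		decompressions++
		gz, err := gzip.NewReader(bytes.NewReader(defaultCfgZipped))
		if err != nil {
			panic(err)
		}
		// Reading to io.EOF is what verifies the gzip CRC32/ISIZE footer.
		b, err := io.ReadAll(gz)
		if err != nil {
			panic(fmt.Errorf("gzip read: %w", err))
		}
		if err := gz.Close(); err != nil {
			panic(fmt.Errorf("gzip close: %w", err))
		}
		defaultCfg = string(b)
	})
	return defaultCfg
}

// mustGzip compresses s; it stands in for the generator's output file.
func mustGzip(s string) []byte {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write([]byte(s)); err != nil {
		panic(err)
	}
	if err := gw.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	fmt.Print(DefaultConfig())
	fmt.Print(DefaultConfig())
	fmt.Println("decompressions:", decompressions) // 1
}
```

The decompression cost is paid only on the first call that actually needs the default config. Code paths that never call DefaultConfig() pay nothing.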

  • The lazy-decompression design using sync.Once is correct and thread-safe.
  • In cmd/generate/config/main.go, defer gw.Close() silently discards the error from the gzip writer's Close(), which writes the GZIP footer/checksum. A failure here would produce a silently corrupt .toml.gz without any non-zero exit code. Additionally, because logging.Fatal() calls os.Exit, the deferred gw.Close() is never reached on the error path at all — the file would be left incomplete on disk.
  • In config/config.go, defer gz.Close() on the reader discards Close()'s error. The CRC32 checksum itself is verified by the final Read() that reaches io.EOF, not by Close(), so this is lower risk, but it is still worth handling explicitly.
  • All other call-site updates (cmd/root.go, detect/detect.go) are clean and correct.

Confidence Score: 3/5

  • Safe to merge for runtime behavior, but the generator has a silent-corruption risk that could produce a bad .toml.gz if re-run after a template error.
  • The runtime path (decompressing the embedded .gz at startup) is correct. The risk lives entirely in the code-generator: defer gw.Close() both silently discards errors and is skipped entirely on Fatal/os.Exit paths, meaning a failed generation run could leave a corrupt .gz on disk with no visible signal. This doesn't affect the current committed binary, but it's a latent hazard for the next time the generator is run.
  • cmd/generate/config/main.go — gzip writer close error handling needs to be explicit.

Important Files Changed

Filename: Overview
cmd/generate/config/main.go: Generator now writes a gzip-compressed TOML file; defer gw.Close() silently discards errors and won't run if logging.Fatal() (os.Exit) is called.
config/config.go: DefaultConfig changed from embedded string var to a lazily-decompressed function backed by sync.Once; gz.Close() error is silently ignored.
cmd/root.go: All call sites updated from config.DefaultConfig (var) to config.DefaultConfig() (func); no issues.
detect/detect.go: Single call site updated from config.DefaultConfig to config.DefaultConfig(); no issues.
config/betterleaks.toml.gz: New binary artifact: gzip-compressed version of betterleaks.toml; replaces the plain-text embed.
config/betterleaks.toml: Plain-text TOML config kept as a reference/source file but no longer embedded directly into the binary.

Reviews (1): Last reviewed commit: "rm replace in go.mod"

Comment on lines +419 to +420
gw := gzip.NewWriter(f)
defer gw.Close()
Contributor


P1 gw.Close() error silently discarded

gzip.Writer.Close() flushes any buffered data and writes the GZIP footer (including the CRC32 checksum). If this call fails, the resulting .gz file will be silently truncated or corrupt — but the generator will exit with code 0, making the problem invisible until a downstream consumer tries to decompress it.
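To see why the footer matters, the sketch below compresses the same string twice: once with gw.Close() and once without. The second case approximates what is left behind when the process exits before a deferred Close() runs. It uses only the standard compress/gzip package; gzipBytes and gunzip are hypothetical helpers. Only the closed stream decompresses. The unclosed one fails when the reader runs out of input, typically with io.ErrUnexpectedEOF.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// gzipBytes compresses s. When closeWriter is false it skips gw.Close(),
// so buffered data, the final deflate block and the footer are never written.
func gzipBytes(s string, closeWriter bool) ([]byte, error) {
	var buf bytes.Buffer
	gw := gzip.NewWriter(&buf)
	if _, err := gw.Write([]byte(s)); err != nil {
		return nil, err
	}
	if closeWriter {
		if err := gw.Close(); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// gunzip reads the whole stream; a missing final block or footer surfaces here.
func gunzip(b []byte) (string, error) {
	gz, err := gzip.NewReader(bytes.NewReader(b))
	if err != nil {
		return "", err
	}
	out, err := io.ReadAll(gz)
	if cerr := gz.Close(); err == nil {
		err = cerr
	}
	return string(out), err
}

func main() {
	const cfg = "title = \"example\"\n"
	for _, closeWriter := range []bool{true, false} {
		b, err := gzipBytes(cfg, closeWriter)
		if err != nil {
			panic(err)
		}
		_, err = gunzip(b)
		fmt.Printf("closeWriter=%v: err=%v unexpectedEOF=%v\n",
			closeWriter, err, errors.Is(err, io.ErrUnexpectedEOF))
	}
}
```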

The fix is to check the error explicitly instead of deferring:

	gw := gzip.NewWriter(f)

	cfg := base.CreateGlobalConfig()
	cfg.Rules = ruleLookUp
	for _, allowlist := range cfg.Allowlists {
		slices.Sort(allowlist.Commits)
		slices.Sort(allowlist.StopWords)
	}
	// Render the TOML template straight into the gzip stream.
	if err = tmpl.Execute(gw, cfg); err != nil {
		logging.Fatal().Err(err).Msg("could not execute template")
	}
	// Close flushes buffered data and writes the GZIP footer, so its error
	// must be checked rather than deferred and dropped.
	if err = gw.Close(); err != nil {
		logging.Fatal().Err(err).Msg("could not close gzip writer")
	}

Note: deferred calls are skipped when logging.Fatal() calls os.Exit, so the current defer gw.Close() never fires on the error path anyway. With the explicit close above, the generator exits non-zero on every failure, including a failed Close(). A template failure still leaves an incomplete file on disk.

Comment thread config/config.go
	gz, err := gzip.NewReader(bytes.NewReader(defaultCfgZipped))
	if err != nil {
		panic(err)
	}
Contributor


P2 gz.Close() error ignored

For a gzip.Reader, the CRC32 checksum and the ISIZE field in the GZIP footer are verified by the final Read() that reaches io.EOF, not by Close(). The compress/gzip docs note that the reader must be fully consumed for the checksum to be checked. Because io.ReadAll reads to io.EOF, footer corruption or a truncated file already surfaces as an error there, so this is low risk.

Close() still returns any error the decompressor hit, and checking it costs nothing. It is better to handle it explicitly than to discard it via defer:

		if err := gz.Close(); err != nil {
			panic(fmt.Errorf("gzip close: %w", err))
		}

@zricethezav
Member Author

@twpayne I'm trying to reduce some of the betterleaks binary bloat here. Is 0.6ms acceptable for your use case?

@twpayne
Contributor

twpayne commented Apr 2, 2026

As long as the 0.6ms is only paid when the functionality is used (and not on every startup), that's fine.

As an addition/alternative, I wonder if https://github.com/lemire/constmap would be a good fit here? It claims to be fast and use less memory, and you can pre-compute the map ahead of time (e.g. as go:generate output that you then embed).

@zricethezav
Member Author

> As long as the 0.6ms is only paid when the functionality is used (and not on every startup), that's fine.

It's only paid when the default config is used at startup, i.e. the first time DefaultConfig() is called.

zricethezav marked this pull request as draft on April 2, 2026 at 20:57