- How I Built the World’s Fastest Whitespace Trimmer
- lefthook-driven workflow; editor auto-trim usually good enough.
- Recent surge of generated/AI-written code sneaking in trailing spaces and missing final newlines.
- Existing solutions (bash script, Python pre-commit hook, npm package, other Rust tool) would work but feel unsatisfying.
- Reading Bun’s performance blog post rekindled the idea: build something absurdly fast for fun and for the blog.
- DocSpring focus: useful open source tools/blog posts for developers; no interest in marketing fluff.
- Personal curiosity about algorithms/performance despite limited formal CS background.
- Renamify boilerplate: copied repo to `trim-trailing-whitespace`, used Renamify to rewrite identifiers.
- Initial cleanup: delete MCP server, VS Code extension, legacy docs, all original Rust code; keep Taskfiles, lefthook, CI scaffolding.
- Bloom filter cache: auto-sized, tuned on DocSpring repo; handle false positives via resizes.
- Where to store cache? Options: temp dir, `.git`, OS cache dirs.
- Compiling `.gitignore`: convert patterns into compact machine-friendly representation; rebuild on mtime changes/additions.
- Per-file delta detection: subdivide files into segments; can we avoid reading every byte each run?
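
To make the Bloom filter idea concrete, here is a minimal std-only sketch. The `BloomFilter` type, its method names, and the double-hashing scheme are assumptions for illustration, not the tool's actual design, and the notes do not say what the filter is keyed on. It only shows the data structure and the textbook sizing formulas that "auto-sized" presumably refers to.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// A tiny Bloom filter (hypothetical type): each item sets `num_hashes`
/// bit positions derived by double hashing.
pub struct BloomFilter {
    bits: Vec<u64>,
    num_bits: u64,
    num_hashes: u32,
}

impl BloomFilter {
    /// Sizes the filter for `expected_items` at roughly `false_positive_rate`,
    /// using m = -n ln(p) / (ln 2)^2 bits and k = (m / n) ln 2 hashes.
    pub fn with_capacity(expected_items: usize, false_positive_rate: f64) -> Self {
        let n = expected_items.max(1) as f64;
        let ln2 = std::f64::consts::LN_2;
        let m = (-n * false_positive_rate.ln() / (ln2 * ln2)).ceil().max(64.0);
        let k = ((m / n) * ln2).round().max(1.0);
        let num_bits = m as u64;
        BloomFilter {
            bits: vec![0; ((num_bits + 63) / 64) as usize],
            num_bits,
            num_hashes: k as u32,
        }
    }

    /// Two base hashes; the second is salted and forced odd so the stride is never zero.
    fn hash_pair<T: Hash + ?Sized>(item: &T) -> (u64, u64) {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let mut h2 = DefaultHasher::new();
        0x9e37_79b9_7f4a_7c15u64.hash(&mut h2);
        item.hash(&mut h2);
        (h1.finish(), h2.finish() | 1)
    }

    fn bit_index(&self, a: u64, b: u64, i: u32) -> u64 {
        a.wrapping_add(u64::from(i).wrapping_mul(b)) % self.num_bits
    }

    pub fn insert<T: Hash + ?Sized>(&mut self, item: &T) {
        let (a, b) = Self::hash_pair(item);
        for i in 0..self.num_hashes {
            let idx = self.bit_index(a, b, i);
            self.bits[(idx / 64) as usize] |= 1u64 << (idx % 64);
        }
    }

    /// `false` means definitely absent; `true` means present or a false positive.
    pub fn might_contain<T: Hash + ?Sized>(&self, item: &T) -> bool {
        let (a, b) = Self::hash_pair(item);
        (0..self.num_hashes).all(|i| {
            let idx = self.bit_index(a, b, i);
            self.bits[(idx / 64) as usize] & (1u64 << (idx % 64)) != 0
        })
    }
}
```

One caveat: `DefaultHasher` output is not guaranteed stable across Rust releases, so a filter persisted to disk would need a fixed hash function instead.
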
- Quick research (ChatGPT): deleting arbitrary bytes without rewriting is mostly impossible; file rewrite is still best.
- Deliverables: Rust core crate + CLI only, whitespace transformations only.
- Respect Git’s ignore configuration (`.gitignore`, `.git/info/exclude`, global excludes).
- No IDE/MCP integrations; project is a one-off experiment tied to the blog post.
- Performance-first mindset: compile-time feature flags to toggle optimisations, benchmarking pipeline to show improvements.
- `whitespace-core`: transcode/trim logic, line ending normalization, tab/space conversion, SIMD implementations.
- `whitespace-cli`: Clap-based binary, recursive walker, binary detection, `--check` mode, feature toggles.
- Cache design:
  - Cache paths per OS; repo ID via blake3-128 of canonical root + volume info.
  - Files: `keys.bin` (sorted u128 keys), `meta.bin` (header, OS journal checkpoints, repo stats), `lock` guard.
  - Key formula: blake3-128(path, dev, ino, size, mtime_ns, ctime_ns, mode).
  - Racy guard: rescan when `mtime_sec == index_write_time_sec`.
  - Change detection: macOS FSEvents, Windows USN Journal, Linux fallback walk.
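
The cache lookup and racy guard above could be sketched roughly like this. The function names are hypothetical, the key is treated as an opaque `u128` (computing it with blake3 is left out to keep the sketch dependency-free), and it assumes that a key's presence in the sorted list means the file is unchanged since it was last processed.

```rust
/// Looks up a file key in the sorted key list (the in-memory form of
/// `keys.bin` in this sketch) with a binary search.
pub fn is_cached(sorted_keys: &[u128], key: u128) -> bool {
    sorted_keys.binary_search(&key).is_ok()
}

/// Racy guard: a file whose mtime falls in the same second the index was
/// written may have changed after it was recorded, so its cache entry
/// cannot be trusted.
pub fn is_racy(mtime_sec: i64, index_write_time_sec: i64) -> bool {
    mtime_sec == index_write_time_sec
}

/// A file is rescanned when it is racy or when its key is not in the cache.
pub fn needs_rescan(
    sorted_keys: &[u128],
    key: u128,
    mtime_sec: i64,
    index_write_time_sec: i64,
) -> bool {
    is_racy(mtime_sec, index_write_time_sec) || !is_cached(sorted_keys, key)
}
```

Because size, mtime, and ctime are folded into the key itself, any change to a file produces a different key, which is why a plain membership test is enough outside the racy window.
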
- Build naive baseline implementation first (byte-by-byte, no cache, always rewrite).
- Add SIMD and other optimisations behind cargo features.
- Introduce compile-time feature matrix (baseline → simd → parallel walk → cache → mmap) for benchmarks.
- Use hyperfine to benchmark each tier on `/Users/ndbroadbent/code/docspring` (cold vs warm cache).
- Document results with charts comparing stages (baseline vs optimised).
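
For reference, the naive byte-by-byte baseline might look something like the sketch below. The function names are made up for illustration, and the exact rules (spaces, tabs, and stray `\r` count as trailing whitespace; CRLF endings are preserved; a missing final newline is added; trailing blank lines are left alone) are assumptions rather than the tool's confirmed behaviour.

```rust
/// Naive baseline: walks the input one byte at a time, drops trailing
/// spaces/tabs on every line, keeps CRLF vs LF as found, and appends a
/// final newline if non-empty output lacks one.
pub fn trim_trailing_whitespace(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + 1);
    // Length of `out` up to the last byte known to be worth keeping.
    let mut keep = 0usize;
    for &b in input {
        match b {
            b'\n' => {
                // Remember whether this line ended in CRLF before truncating.
                let had_cr = out.last() == Some(&b'\r');
                out.truncate(keep);
                if had_cr {
                    out.push(b'\r');
                }
                out.push(b'\n');
                keep = out.len();
            }
            // Possibly-trailing whitespace: push it, but do not advance `keep`.
            b' ' | b'\t' | b'\r' => out.push(b),
            _ => {
                out.push(b);
                keep = out.len();
            }
        }
    }
    // Drop whitespace left over on an unterminated last line.
    out.truncate(keep);
    if !out.is_empty() && !out.ends_with(b"\n") {
        out.push(b'\n');
    }
    out
}

/// What a `--check`-style mode could report: true when the file would change.
pub fn needs_trim(input: &[u8]) -> bool {
    trim_trailing_whitespace(input).as_slice() != input
}
```

The baseline in the plan always rewrites the file; `needs_trim` is only there to show how the same routine could back a check-only mode.
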
- Introduction + why the tool exists.
- Scaffolding and cleaning the repo.
- Exploring performance ideas in the car-side brain dump (Bloom filters, gitignore compilation, cache placement).
- Lessons from talking to AI (file rewrite reality, SIMD suggestions, other micro-optimisations).
- Designing the definitive cache (keys, meta, OS journals, racy guard).
- Naive baseline implementation and first measurements.
- Iterative optimisations with feature flags + benchmark plots.
- Integrating with lefthook and everyday workflow.
- Final thoughts (was it worth it? fun factor? a “what’s next” section is optional; keep it minimal until the work is done).
- Insert references/links: Bun performance blog, API client post, lefthook, DocSpring.
- Collect benchmark data with hyperfine once implementations land.
- Generate diagrams for cache structure (`keys.bin`, `meta.bin`).
- Keep tone conversational, emphasise experimentation over production dogma.