Multiple improvements: seekable tests, zstdless CLI, streaming dict example, seqBench hardening, DiB early termination #4617

Open
BhavyaBibra wants to merge 1 commit into facebook:dev from BhavyaBibra:contrib/improvements-batch

@BhavyaBibra
Summary

A batch of improvements across tests, examples, documentation, and contrib tools. Each change is independent and addresses existing TODOs or gaps in the codebase.

Changes

1. Seekable format: add unit tests & fix FIXME (contrib/seekable_format/tests/seekable_tests.c)

  • Added 4 new test cases:
    • Zero-byte round trip: compress and decompress empty input
    • Checksum-enabled round trip: exercises the checksumFlag=1 path
    • Out-of-bounds offset: decompressing at an offset past the end of the data returns an error or zero
    • Invalid frame index: getFrameCompressedSize() etc. return errors for bad indices
  • Replaced /* Github issue #FIXME */ with a descriptive regression-test comment (a variation of issue #2335, "ZSTD_seekable_decompress() can hang")
  • Removed /* TODO: Add more tests */

2. Improve zstdless script (programs/zstdless)

  • Added --help / -h with usage message
  • Added --version / -V to print zstd version
  • Added ZSTDLESS_FLAGS environment variable for passing custom flags to zstd
  • Removed the TODO comment

3. Add streaming dictionary compression example (examples/)

  • New streaming_dictionary_compression.c demonstrating ZSTD_CCtx_loadDictionary() + ZSTD_compressStream2()
  • Fills the gap between existing dictionary_compression and streaming_compression examples
  • Updated Makefile and README.md

4. Document memset engineering decisions (lib/compress/zstd_compress.c)

  • Replaced two /* TODO: avoid memset? */ comments (LDM hash table and bucket offsets) with documented analysis:
    • LDM is only enabled with --long, so it is rarely on the default path
    • Generation counter approach would add branch overhead to every LDM lookup
    • memset cost is dominated by table allocation
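
The trade-off described above can be sketched on a toy table. This is an illustration only, not zstd's LDM data structure: plainTable, genTable, and every function name below are hypothetical. Strategy A clears the table with memset on reset; strategy B makes reset O(1) by bumping a generation counter, at the price of an extra compare-and-branch in each lookup and a second array of tags.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TABLE_LOG  4
#define TABLE_SIZE (1u << TABLE_LOG)

/* Strategy A: plain entries, reset with memset (offset 0 means "empty"). */
typedef struct { uint32_t offset[TABLE_SIZE]; } plainTable;

static void plain_reset(plainTable* t) { memset(t, 0, sizeof(*t)); }

static uint32_t plain_lookup(const plainTable* t, uint32_t slot)
{
    assert(slot < TABLE_SIZE);
    return t->offset[slot];                 /* no validity check needed */
}

/* Strategy B: each entry records the generation in which it was written. */
typedef struct {
    uint32_t offset[TABLE_SIZE];
    uint32_t gen[TABLE_SIZE];
    uint32_t currentGen;
} genTable;

static void gen_init(genTable* t)
{
    memset(t, 0, sizeof(*t));
    t->currentGen = 1;                      /* tag 0 never matches */
}

/* O(1) reset; this sketch ignores counter wraparound. */
static void gen_reset(genTable* t) { t->currentGen++; }

static void gen_insert(genTable* t, uint32_t slot, uint32_t offset)
{
    assert(slot < TABLE_SIZE);
    t->offset[slot] = offset;
    t->gen[slot] = t->currentGen;
}

static uint32_t gen_lookup(const genTable* t, uint32_t slot)
{
    assert(slot < TABLE_SIZE);
    /* Extra compare-and-branch paid on every lookup */
    return (t->gen[slot] == t->currentGen) ? t->offset[slot] : 0;
}
```

In this toy version the generation tags also double the table's memory footprint, which is a property of the sketch and not a claim about any real implementation.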

5. Early termination in DiB_fileStats() (programs/dibio.c)

  • Added an early break once totalSizeToLoad >= MAX_SAMPLES_SIZE (2 GB)
  • Avoids unnecessary stat() syscalls when training dictionaries on large file sets
  • Resolves /* TODO: there is opportunity to stop DiB_fileStats() early */
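
The early break can be sketched in isolation. This is a hypothetical reduction, not the code in programs/dibio.c: sum_sizes_capped, the cappedStats struct, and the in-memory size array (standing in for per-file stat() results) are assumptions made for illustration, with cap playing the role of MAX_SAMPLES_SIZE.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the file-stats loop: sizes[] plays the role
 * of per-file stat() results, cap the role of the sample-size limit. */
typedef struct {
    unsigned long long totalSize;  /* bytes accumulated before stopping */
    size_t filesVisited;           /* entries actually examined */
} cappedStats;

static cappedStats sum_sizes_capped(const unsigned long long* sizes,
                                    size_t nbFiles,
                                    unsigned long long cap)
{
    cappedStats st = { 0, 0 };
    size_t n;
    assert(sizes != NULL || nbFiles == 0);
    for (n = 0; n < nbFiles; n++) {
        /* Budget reached: skip the remaining files entirely */
        if (st.totalSize >= cap) break;
        st.totalSize += sizes[n];
        st.filesVisited++;
    }
    return st;
}
```

With five 400-byte entries and a cap of 1000, the loop stops after the third entry, so the last two are never examined. In the real function each skipped entry corresponds to a stat() call that is avoided.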

6. Harden contrib/seqBench (contrib/seqBench/seqBench.c)

  • Added fopen() error check (previously segfaulted on missing files)
  • Added error checks for all malloc() calls
  • Added fread() return value validation
  • Added proper resource cleanup via goto cleanup
  • Improved usage message with description
  • Added compression ratio output and decompression size validation
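
The hardening pattern listed above (checked fopen(), malloc(), and fread(), with a single goto cleanup exit) might look roughly like this. It is a sketch under assumptions, not the code in seqBench.c: load_file and its signature are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: reads a whole file into a malloc'd buffer.
 * Returns 0 on success, nonzero on failure.
 * On success the caller owns *outBuf and must free() it. */
static int load_file(const char* path, void** outBuf, size_t* outSize)
{
    int ret = 1;
    FILE* f = NULL;
    void* buf = NULL;
    long fileSize;

    assert(path != NULL && outBuf != NULL && outSize != NULL);

    f = fopen(path, "rb");
    if (f == NULL) { perror(path); goto cleanup; }

    if (fseek(f, 0, SEEK_END) != 0) goto cleanup;
    fileSize = ftell(f);
    if (fileSize < 0) goto cleanup;
    if (fseek(f, 0, SEEK_SET) != 0) goto cleanup;

    /* Allocate at least 1 byte so an empty file is not mistaken for failure */
    buf = malloc(fileSize > 0 ? (size_t)fileSize : 1);
    if (buf == NULL) goto cleanup;

    /* A short read is treated as an error */
    if (fread(buf, 1, (size_t)fileSize, f) != (size_t)fileSize) goto cleanup;

    *outBuf = buf;
    *outSize = (size_t)fileSize;
    buf = NULL;   /* ownership transferred to the caller */
    ret = 0;

cleanup:
    free(buf);                      /* free(NULL) is a no-op */
    if (f != NULL) fclose(f);
    return ret;
}
```

Routing every failure through one label keeps the release logic in a single place, so adding another resource later only requires one more line in the cleanup section.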

Testing

  • make -C lib libzstd.a
  • make -C contrib/seekable_format/tests test — all 9 tests pass ✅
  • make -C examples all — all 10 examples build ✅
  • make -C programs zstd — full binary builds with dibio.c changes ✅
  • seqBench compiles successfully ✅
  • zstdless --help prints usage ✅

@meta-cla meta-cla bot added the CLA Signed label Mar 9, 2026