Implement ZStandard Stream, Encoder, Decoder #119575
base: main
Tagging subscribers to this area: @dotnet/area-system-io-compression
This should be upstreamed. You can link the PR in the -version.txt file (see libunwind-version.txt). We've avoided adding patch files like these to the repo in the past.
Draft pull request was automatically closed after 30 days of inactivity. Please let us know if you'd like to reopen it.
Pull Request Overview
This PR implements ZStandard stream, encoder, and decoder functionality for .NET by integrating the zstd native library. The changes primarily consist of adding the external zstd library source files and build infrastructure to support compression and decompression capabilities.
Key Changes:
- Integration of zstd native library source code (version metadata suggests 0.9.0)
- Build system configuration files for Meson and CMake
- Single-file library generation scripts and examples
- Test infrastructure for the zstd implementation
Reviewed Changes
Copilot reviewed 142 out of 237 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/native/external/zstd/lib/common/fse_decompress.c | FSE (Finite State Entropy) decompression implementation |
| src/native/external/zstd/lib/common/fse.h | FSE codec public API and data structures |
| src/native/external/zstd/lib/common/error_private.h | Internal error handling macros and definitions |
| src/native/external/zstd/lib/common/error_private.c | Error code to string mapping implementation |
| src/native/external/zstd/lib/common/entropy_common.c | Common entropy encoding/decoding functions |
| src/native/external/zstd/lib/common/debug.h | Debug logging and assertion macros |
| src/native/external/zstd/lib/common/debug.c | Debug level global variable definition |
| src/native/external/zstd/lib/common/cpu.h | CPU feature detection (BMI2, AVX, etc.) |
| src/native/external/zstd/lib/common/compiler.h | Compiler-specific macros and attributes |
| src/native/external/zstd/lib/common/bitstream.h | Bitstream encoding/decoding utilities |
| src/native/external/zstd/lib/common/bits.h | Bit manipulation helper functions |
| src/native/external/zstd/lib/common/allocations.h | Custom memory allocation wrappers |
| src/native/external/zstd/lib/README.md | Documentation for library structure and build options |
| src/native/external/zstd/lib/Makefile | Build configuration for the zstd library |
| src/native/external/zstd/lib/BUCK | Buck build system configuration |
| src/native/external/zstd/lib/.gitignore | Git ignore rules for build artifacts |
| src/native/external/zstd/build/single_file_libs/* | Scripts and examples for single-file library generation |
| src/native/external/zstd/build/meson/* | Meson build system configuration files |
| src/native/external/zstd/build/cmake/tests/.gitignore | Git ignore rules for CMake test artifacts |
I think this PR is ready for the first round of reviews.
nuint result = Interop.Zstd.ZSTD_decompressStream(_context!, ref output, ref input);

if (Interop.Zstd.ZSTD_isError(result) != 0)
Suggested change: replace `if (Interop.Zstd.ZSTD_isError(result) != 0)` with `if (ZstandardUtils.IsError(result))` (same in other places, given that you added a helper for it).
if (_context.IsInvalid)
throw new IOException(SR.ZstandardDecoder_Create);
}
Nit: The use of braced `if` blocks vs. single-statement `if`s is inconsistent across this project.
public int TargetBlockSize { get { throw null; } set { } }
public int Window { get { throw null; } set { } }
}
public partial class ZstandardDecoder : System.IDisposable
ZstandardDecoder and ZstandardEncoder look like they should be sealed. Was this just an API review oversight when switching these to classes?
(IntPtr)destPtr, (nuint)destination.Length,
(IntPtr)sourcePtr, (nuint)source.Length);

if (Interop.Zstd.ZSTD_isError(result) != 0)
Suggested change: replace `if (Interop.Zstd.ZSTD_isError(result) != 0)` with `if (ZstandardUtils.IsError(result))`.
/// <param name="source">The compressed data to decompress.</param>
/// <param name="destination">The buffer to write the decompressed data to.</param>
/// <param name="bytesWritten">The number of bytes written to the destination.</param>
/// <returns>True if decompression was successful; otherwise, false.</returns>
We like Try methods like this to return false for only a single type of failure, generally insufficient capacity in the destination buffer. That way the caller can have logic along the lines of:

    while (!ZstandardDecoder.TryDecompress(source, dest.AvailableSpan, out int bytesWritten))
    {
        dest.Grow();
    }

It looks like the current implementation also returns false for invalid source inputs (e.g. empty input)? Should we be throwing instead?
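A minimal sketch of that contract, assuming a toy run-length format in place of zstd: `TryDecompress` and `DecompressAll` below are hypothetical stand-ins, not the API in this PR. `TryDecompress` returns false only when the destination is too small and throws `InvalidDataException` for malformed input, so the caller's grow-and-retry loop cannot spin forever on bad data.

```csharp
using System;
using System.IO;

// Hypothetical toy decompressor (NOT zstd): each (count, value) byte pair
// expands to `count` copies of `value`. Follows the Try contract above:
// false means only "destination too small"; malformed input throws.
static bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten)
{
    if (source.IsEmpty || source.Length % 2 != 0)
        throw new InvalidDataException("Malformed input.");

    bytesWritten = 0;
    for (int i = 0; i < source.Length; i += 2)
    {
        int count = source[i];
        if (bytesWritten + count > destination.Length)
        {
            bytesWritten = 0;
            return false; // the only reason this method returns false
        }
        destination.Slice(bytesWritten, count).Fill(source[i + 1]);
        bytesWritten += count;
    }
    return true;
}

// Hypothetical caller: double the buffer until the output fits.
static byte[] DecompressAll(ReadOnlySpan<byte> source)
{
    byte[] buffer = new byte[4];
    int written;
    while (!TryDecompress(source, buffer, out written))
    {
        Array.Resize(ref buffer, buffer.Length * 2);
    }
    return buffer.AsSpan(0, written).ToArray();
}

byte[] result = DecompressAll(new byte[] { 10, (byte)'a', 3, (byte)'b' });
Console.WriteLine(result.Length); // 13
```

Had the toy `TryDecompress` also returned false for malformed input, the same loop would keep doubling the buffer until allocation failed.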
{
public SafeZstdCDictHandle() : base(IntPtr.Zero, true) { }

internal GCHandle _pinnedData;
Nit: consider using `GCHandle<byte[]>`/`PinnedGCHandle<byte[]>` instead (newer, slightly cheaper types).
throw new ArgumentException(SR.ZstandardDictionary_EmptyBuffer, nameof(samples));
}

if (sampleLengths.Length < 5)
Is this minimum something we're picking, or is it imposed by the underlying library? It feels rather arbitrary.
/// Recommended maximum dictionary size is 100KB, and that the size of the training data
/// should be approximately 100 times the size of the resulting dictionary.
/// </remarks>
public static ZstandardDictionary Train(ReadOnlySpan<byte> samples, ReadOnlySpan<long> sampleLengths, int maxDictionarySize)
Should sampleLengths be ints instead?
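One argument for `int`: `samples` is a `ReadOnlySpan<byte>`, whose `Length` is an `int`, so no single sample can exceed `int.MaxValue` bytes. zstd's dictionary trainer (e.g. `ZDICT_trainFromBuffer`) takes sample sizes as a `const size_t*`, so the lengths need widening to `nuint` either way. A hypothetical sketch of that validation and widening, assuming the lengths must exactly cover `samples`; the helper name is illustrative and not from this PR:

```csharp
using System;

// Hypothetical helper (not part of this PR): checks that the per-sample lengths
// exactly cover the concatenated `samples` buffer, then widens each length to
// nuint, the C# counterpart of the native size_t.
static nuint[] ToNativeSampleSizes(ReadOnlySpan<byte> samples, ReadOnlySpan<int> sampleLengths)
{
    var sizes = new nuint[sampleLengths.Length];
    long total = 0; // long, so summing many int lengths cannot overflow

    for (int i = 0; i < sampleLengths.Length; i++)
    {
        if (sampleLengths[i] < 0)
            throw new ArgumentOutOfRangeException(nameof(sampleLengths), "Sample lengths must be non-negative.");

        total += sampleLengths[i];
        sizes[i] = (nuint)sampleLengths[i];
    }

    if (total != samples.Length)
        throw new ArgumentException("Sample lengths must add up to samples.Length.", nameof(sampleLengths));

    return sizes;
}

nuint[] nativeSizes = ToNativeSampleSizes(new byte[] { 1, 2, 3, 4, 5, 6 }, new[] { 1, 2, 3 });
Console.WriteLine(nativeSizes.Length); // 3
```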
<value>ZstandardStream.BaseStream returned more bytes than requested in Read.</value>
</data>
<data name="ZstandardStream_Compress_InvalidData" xml:space="preserve">
<value>Encoder ran into invalid data.</value>
What does "invalid data" mean when encoding?
private int _bufferOffset;
private int _bufferCount;
private bool _nonEmptyInput;
Could ArrayBuffer simplify managing these?
Implements #59591
Ignore the native code in src/native/external/zstd (a verbatim copy of the latest zstd release, included in the very first commit).
Some patches are applied to the vendored zstd sources; they are listed in src/native/external. They should no longer be necessary once we update to the next release (the maintainers haven't committed to a release date, but hopefully they will release something early next year).
The rest of the implementation mirrors the approach taken for Brotli.
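For reference, the Brotli types this implementation is modeled on expose one-shot static helpers alongside the streaming encoder and decoder. The sketch below uses only existing `System.IO.Compression` Brotli APIs (`BrotliEncoder.GetMaxCompressedLength`, `BrotliEncoder.TryCompress`, `BrotliDecoder.TryDecompress`); whether the Zstandard types end up with exactly the same shape is up to API review, so treat it as an illustration of the pattern rather than of this PR's API.

```csharp
using System;
using System.IO.Compression;

byte[] input = new byte[1000];
Array.Fill(input, (byte)'z'); // highly repetitive, so it compresses well

// One-shot compression: size the destination with the worst-case bound.
byte[] compressed = new byte[BrotliEncoder.GetMaxCompressedLength(input.Length)];
if (!BrotliEncoder.TryCompress(input, compressed, out int compressedLength))
    throw new InvalidOperationException("Compression failed.");

// One-shot decompression into a buffer of the known original size.
byte[] roundTrip = new byte[input.Length];
if (!BrotliDecoder.TryDecompress(compressed.AsSpan(0, compressedLength), roundTrip, out int decompressedLength))
    throw new InvalidOperationException("Decompression failed.");

Console.WriteLine(decompressedLength == input.Length && roundTrip.AsSpan().SequenceEqual(input)); // True
```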