@rzikm (Member) commented Sep 11, 2025

Implements #59591

Ignore the native code in src/native/external/zstd (a verbatim copy of the latest zstd release, included in the very first commit).

Some patches are applied to the vendored zstd sources; they are listed in src/native/external. None of them should be necessary once we update to the next release (the maintainers didn't commit to a release date, but hopefully they will release something early next year).

The rest of the implementation mirrors the approach used for Brotli.

@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-io-compression
See info in area-owners.md if you want to be subscribed.

Member

This should be upstreamed. You can link the PR in the -version.txt file (see libunwind-version.txt). We've avoided adding patch files like these to the repo in the past.

@dotnet-policy-service
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@rzikm rzikm marked this pull request as ready for review November 13, 2025 14:43
Copilot AI review requested due to automatic review settings November 13, 2025 14:43
Contributor

Copilot AI left a comment

Pull Request Overview

This PR implements Zstandard stream, encoder, and decoder functionality for .NET by integrating the zstd native library. The changes primarily consist of adding the external zstd library source files and build infrastructure to support compression and decompression capabilities.

Key Changes:

  • Integration of zstd native library source code (version metadata suggests 0.9.0)
  • Build system configuration files for Meson and CMake
  • Single-file library generation scripts and examples
  • Test infrastructure for the zstd implementation

Reviewed Changes

Copilot reviewed 142 out of 237 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| src/native/external/zstd/lib/common/fse_decompress.c | FSE (Finite State Entropy) decompression implementation |
| src/native/external/zstd/lib/common/fse.h | FSE codec public API and data structures |
| src/native/external/zstd/lib/common/error_private.h | Internal error handling macros and definitions |
| src/native/external/zstd/lib/common/error_private.c | Error code to string mapping implementation |
| src/native/external/zstd/lib/common/entropy_common.c | Common entropy encoding/decoding functions |
| src/native/external/zstd/lib/common/debug.h | Debug logging and assertion macros |
| src/native/external/zstd/lib/common/debug.c | Debug level global variable definition |
| src/native/external/zstd/lib/common/cpu.h | CPU feature detection (BMI2, AVX, etc.) |
| src/native/external/zstd/lib/common/compiler.h | Compiler-specific macros and attributes |
| src/native/external/zstd/lib/common/bitstream.h | Bitstream encoding/decoding utilities |
| src/native/external/zstd/lib/common/bits.h | Bit manipulation helper functions |
| src/native/external/zstd/lib/common/allocations.h | Custom memory allocation wrappers |
| src/native/external/zstd/lib/README.md | Documentation for library structure and build options |
| src/native/external/zstd/lib/Makefile | Build configuration for the zstd library |
| src/native/external/zstd/lib/BUCK | Buck build system configuration |
| src/native/external/zstd/lib/.gitignore | Git ignore rules for build artifacts |
| src/native/external/zstd/build/single_file_libs/* | Scripts and examples for single-file library generation |
| src/native/external/zstd/build/meson/* | Meson build system configuration files |
| src/native/external/zstd/build/cmake/tests/.gitignore | Git ignore rules for CMake test artifacts |

@rzikm (Member, Author) commented Nov 13, 2025

I think this PR is ready for the first round of reviews.


nuint result = Interop.Zstd.ZSTD_decompressStream(_context!, ref output, ref input);

if (Interop.Zstd.ZSTD_isError(result) != 0)
Member

Suggested change
- if (Interop.Zstd.ZSTD_isError(result) != 0)
+ if (ZstandardUtils.IsError(result))

(same in other places given that you added a helper for it)

Comment on lines +109 to +111
if (_context.IsInvalid)
throw new IOException(SR.ZstandardDecoder_Create);
}
Member

Nit: The if blocks vs. single statements are inconsistent around this project

public int TargetBlockSize { get { throw null; } set { } }
public int Window { get { throw null; } set { } }
}
public partial class ZstandardDecoder : System.IDisposable
Member

ZstandardDecoder and ZstandardEncoder look like they should be sealed - was this just an API review oversight when switching these to class?

(IntPtr)destPtr, (nuint)destination.Length,
(IntPtr)sourcePtr, (nuint)source.Length);

if (Interop.Zstd.ZSTD_isError(result) != 0)
Member

Suggested change
- if (Interop.Zstd.ZSTD_isError(result) != 0)
+ if (ZstandardUtils.IsError(result))

/// <param name="source">The compressed data to decompress.</param>
/// <param name="destination">The buffer to write the decompressed data to.</param>
/// <param name="bytesWritten">The number of bytes written to the destination.</param>
/// <returns>True if decompression was successful; otherwise, false.</returns>
Member

We like Try methods like this to return false for only a single type of failure, generally the destination buffer not having enough capacity.
This way the caller can have logic along the lines of

while (!ZstandardDecoder.TryDecompress(source, dest.AvailableSpan, out int bytesWritten))
{
    dest.Grow();
}

It looks like the current impl is also returning false for invalid source inputs (e.g. empty)?
Should we be throwing instead?
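
To make the contract discussed here concrete, the following is a minimal sketch and not the PR's implementation. It uses a hypothetical toy run-length format (count/value byte pairs) in place of zstd, and the names `ToyRleDecoder` and `DecompressToArray` are invented for illustration. The point is only the shape: `false` means "destination too small", while malformed input throws.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for a real decoder: a toy run-length format made of
// (count, value) byte pairs. It exists only to illustrate the Try contract.
public static class ToyRleDecoder
{
    public static bool TryDecompress(ReadOnlySpan<byte> source, Span<byte> destination, out int bytesWritten)
    {
        bytesWritten = 0;

        // Malformed input cannot be fixed by retrying with a bigger buffer, so throw.
        if (source.IsEmpty || source.Length % 2 != 0)
        {
            throw new InvalidDataException("Source is not a valid toy RLE payload.");
        }

        // Compute the decoded size up front so nothing is written on failure.
        int required = 0;
        for (int i = 0; i < source.Length; i += 2)
        {
            required += source[i];
        }

        // The only reason to return false: the destination is too small.
        if (required > destination.Length)
        {
            return false;
        }

        for (int i = 0; i < source.Length; i += 2)
        {
            destination.Slice(bytesWritten, source[i]).Fill(source[i + 1]);
            bytesWritten += source[i];
        }

        return true;
    }

    // Caller-side loop: safe only because false has exactly one meaning.
    public static byte[] DecompressToArray(ReadOnlySpan<byte> source)
    {
        byte[] buffer = new byte[4];
        int bytesWritten;
        while (!TryDecompress(source, buffer, out bytesWritten))
        {
            buffer = new byte[buffer.Length * 2];
        }

        return buffer.AsSpan(0, bytesWritten).ToArray();
    }
}
```

If `false` could also mean "invalid input", the grow-and-retry loop in `DecompressToArray` would keep doubling the buffer until it ran out of memory instead of reporting the real problem.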

{
public SafeZstdCDictHandle() : base(IntPtr.Zero, true) { }

internal GCHandle _pinnedData;
Member

Nit: consider using GCHandle<byte[]>/PinnedGCHandle<byte[]> instead (slightly cheaper newer types)

throw new ArgumentException(SR.ZstandardDictionary_EmptyBuffer, nameof(samples));
}

if (sampleLengths.Length < 5)
Member

@MihaZupan commented Nov 13, 2025

Is this minimum something we're picking or the underlying lib?
It feels rather arbitrary.

/// Recommended maximum dictionary size is 100KB, and that the size of the training data
/// should be approximately 100 times the size of the resulting dictionary.
/// </remarks>
public static ZstandardDictionary Train(ReadOnlySpan<byte> samples, ReadOnlySpan<long> sampleLengths, int maxDictionarySize)
Member

Should sampleLengths be ints instead?
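
For context on how the two parameters relate, here is a rough sketch. It assumes `samples` holds all samples back to back in one buffer and `sampleLengths` gives each sample's size in order, which is the layout the native `ZDICT_trainFromBuffer` expects. `SampleLayout.Flatten` is a hypothetical helper, not part of the PR, and it keeps `long` lengths only to match the signature under review.

```csharp
using System;
using System.Collections.Generic;

public static class SampleLayout
{
    // Hypothetical helper, not part of the PR: packs individual samples into one
    // contiguous buffer plus a parallel array of per-sample lengths.
    public static (byte[] Samples, long[] SampleLengths) Flatten(IReadOnlyList<byte[]> samples)
    {
        long total = 0;
        foreach (byte[] sample in samples)
        {
            total += sample.Length;
        }

        byte[] buffer = new byte[total];
        long[] lengths = new long[samples.Count];
        int offset = 0;

        for (int i = 0; i < samples.Count; i++)
        {
            // Copy each sample directly after the previous one and record its size.
            samples[i].CopyTo(buffer, offset);
            lengths[i] = samples[i].Length;
            offset += samples[i].Length;
        }

        return (buffer, lengths);
    }
}
```

Each entry in `SampleLengths` comes from an array length here, so in this sketch every value fits in an `int`.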

<value>ZstandardStream.BaseStream returned more bytes than requested in Read.</value>
</data>
<data name="ZstandardStream_Compress_InvalidData" xml:space="preserve">
<value>Encoder ran into invalid data.</value>
Member

What does "invalid data" mean when encoding?

Comment on lines +14 to +16
private int _bufferOffset;
private int _bufferCount;
private bool _nonEmptyInput;
Member

Could ArrayBuffer simplify managing these?
