Description
Opening an existing zip in ZipArchiveMode.Update and adding new entries produces a corrupt archive when any existing entry has the data descriptor flag (bit 3 / 0x0008) set in its general purpose bit flag. The new entry's local file header overwrites the data descriptor of the last unchanged entry because the internal offset calculation does not account for the data descriptor bytes that follow compressed data.
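To check whether an archive is affected, you can inspect the general purpose bit flag of each entry. A minimal sketch in Python (used here only because the ZIP format is language-agnostic; the function name is illustrative):

```python
import io
import zipfile

def entries_with_data_descriptor(zip_bytes: bytes) -> list[str]:
    """Names of entries whose general purpose bit flag has bit 3 (0x0008) set.
    Any such entry makes the archive vulnerable to corruption on update."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return [info.filename for info in zf.infolist()
                if info.flag_bits & 0x0008]
```

An archive written to a seekable stream normally has no such entries, so this returns an empty list for it.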
Reproduction Steps
// dotnet add package SharpZipLib
using System;
using System.IO;
using System.IO.Compression;
using ICSharpCode.SharpZipLib.Zip;
// Step 1: Create a zip on a non-seekable stream (forces bit 3 / data descriptor)
var ms = new MemoryStream();
using (var wrapper = new ForwardOnlyWriteStream(ms))
using (var create = new ZipArchive(wrapper, ZipArchiveMode.Create, leaveOpen: true))
{
var e = create.CreateEntry("original.txt");
using var s = e.Open();
s.Write("Hello world"u8);
}
// Step 2: Open in Update mode and add a new entry
ms.Seek(0, SeekOrigin.Begin);
using (var update = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
{
var e = update.CreateEntry("added.txt");
using var s = e.Open();
s.Write("New content"u8);
}
// Step 3: Verify with SharpZipLib — throws on .NET 10
ms.Seek(0, SeekOrigin.Begin);
using (var zipIn = new ZipInputStream(ms))
{
ZipEntry entry;
while ((entry = zipIn.GetNextEntry()) != null)
{
using var reader = new StreamReader(zipIn, leaveOpen: true);
Console.WriteLine($"{entry.Name}: {reader.ReadToEnd()}");
}
}
// Minimal non-seekable wrapper that forces data descriptor usage
sealed class ForwardOnlyWriteStream(Stream inner) : Stream
{
public override bool CanRead => false;
public override bool CanSeek => false;
public override bool CanWrite => true;
public override long Length => inner.Length;
public override long Position { get => inner.Position; set => throw new NotSupportedException(); }
public override void Flush() => inner.Flush();
public override int Read(byte[] b, int o, int c) => throw new NotSupportedException();
public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
public override void SetLength(long v) => inner.SetLength(v);
public override void Write(byte[] b, int o, int c) => inner.Write(b, o, c);
public override void Write(ReadOnlySpan<byte> b) => inner.Write(b);
}

Expected behavior
All entries in the output zip are readable. The verify step prints:
original.txt: Hello world
added.txt: New content
Actual behavior
SharpZipLib's ZipInputStream throws or returns garbled data for original.txt. The first existing entry's data descriptor is overwritten by the new entry's local file header. Any strict zip reader that trusts the data descriptor flag (bit 3) in the local file header will fail.
Regression?
Yes. This works correctly on .NET 8. The regression was introduced in .NET 10 by PR #102704 ("Reduce memory usage when updating ZipArchives"), which added a ChangeState-based selective rewriting strategy that does not account for data descriptor bytes when computing where to append new entries.
Known Workarounds
Avoid ZipArchiveMode.Update when adding entries to an existing archive that may contain data descriptors. Instead, read all existing entries into memory and rewrite the archive from scratch using ZipArchiveMode.Create.
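The same rewrite-from-scratch approach, sketched in Python with the standard zipfile module as a language-agnostic analogue (the .NET workaround uses ZipArchiveMode.Create the same way; the function name is illustrative):

```python
import io
import zipfile

def rebuild_with_new_entry(src_bytes: bytes, name: str, payload: str) -> bytes:
    """Safe alternative to in-place update: copy every existing entry into a
    fresh archive, then append the new entry."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(src_bytes)) as src, \
         zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            # Re-compress under the new writer; this drops any data
            # descriptors, since the output stream is seekable.
            dst.writestr(info.filename, src.read(info))
        dst.writestr(name, payload)
    return out.getvalue()
```

Rewriting costs more I/O than a true in-place update, but it never touches existing entry bytes in place, so there is no offset arithmetic to get wrong.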
Configuration
- .NET version: .NET 10
- SharpZipLib version: 1.4.2
- OS: Any (not OS-specific)
- Architecture: Any (not architecture-specific)
- Specific to configuration: No — reproducible on all platforms whenever the source zip contains entries with the data descriptor flag set (bit 3). This commonly occurs with zips created on non-seekable streams or by external tools (Java ZipOutputStream, Azure blob storage SDKs, etc.).
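The non-seekable-stream behavior is not specific to .NET; Python's zipfile does the same, which makes for a quick demonstration of how bit 3 ends up set (the ForwardOnly wrapper mirrors the ForwardOnlyWriteStream in the repro):

```python
import io
import struct
import zipfile

class ForwardOnly(io.RawIOBase):
    """Write-only, non-seekable stream. tell()/seek() raise, which makes
    zipfile fall back to writing data descriptors (flag bit 3)."""
    def __init__(self, inner):
        self.inner = inner
    def writable(self):
        return True
    def write(self, b):
        return self.inner.write(b)

buf = io.BytesIO()
with zipfile.ZipFile(ForwardOnly(buf), "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("original.txt", "Hello world")

# First local file header: signature (4), version needed (2), flags (2).
sig, _ver, flags = struct.unpack_from("<IHH", buf.getvalue(), 0)
assert sig == 0x04034B50  # "PK\x03\x04"
print(f"bit 3 set: {bool(flags & 0x0008)}")
```

Any archive produced this way, by any tool, is enough to trigger the bug when later opened in ZipArchiveMode.Update.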
Other information
Root cause: WriteFileCalculateOffsets in ZipArchive.cs computes the end of an unchanged entry as entry.GetOffsetOfCompressedData() + entry.CompressedLength. CompressedLength is the compressed data size from the central directory and does not include the 16–24 byte data descriptor that follows compressed data when bit 3 is set. A secondary issue exists in WriteLocalFileHeaderAndDataIfNeeded: the metadata-only rewrite path seeks past _compressedSize without skipping the data descriptor, misaligning subsequent entries.
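The missing term can be sketched as simple arithmetic (names here are illustrative, not the actual ZipArchive internals). Per APPNOTE 4.3.9 the data descriptor is CRC-32 plus the two sizes, with an optional "PK\x07\x08" signature, so 12–24 bytes depending on signature and Zip64; .NET writes the signature, hence 16 or 24 in practice:

```python
def descriptor_size(signature: bool = True, zip64: bool = False) -> int:
    """Bytes in a trailing data descriptor: optional 4-byte signature,
    4-byte CRC-32, then compressed + uncompressed sizes (4 or 8 bytes each)."""
    return (4 if signature else 0) + 4 + 2 * (8 if zip64 else 4)

def end_of_entry(data_offset: int, compressed_len: int, bit3: bool,
                 signature: bool = True, zip64: bool = False) -> int:
    """Offset of the first byte after an entry's on-disk bytes. The regression
    omits the descriptor term, so new entries land 12-24 bytes too early."""
    end = data_offset + compressed_len
    if bit3:
        end += descriptor_size(signature, zip64)
    return end
```

With bit 3 clear, `data_offset + compressed_len` is already correct, which is why archives without data descriptors update fine.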
Related issues / PRs:
- Reduce memory usage when updating ZipArchives #102704 — PR that introduced the selective rewriting optimization (regression source)
- Update of zip file with DataDescriptor using ZipArchive results in corrupted file #26256 — Original data descriptor / Update mode bug (fixed in .NET 3.0 via corefx#37601)
- [.NET 10 Preview 1] System.IO.Compression.ZipArchive produces subtly incorrect zip headers #112017 — Separate .NET 10 zip header mismatch issue (fixed before RTM)
I would be happy to take this on and get it moving, pending approval/review from other developers.