
ZipArchive Update mode corrupts archive when adding entries to zip with data descriptors (bit 3 set) #126344

@bwinsley

Description


Opening an existing zip in ZipArchiveMode.Update and adding new entries produces a corrupt archive when any existing entry has the data descriptor flag (bit 3 / 0x0008) set in its general purpose bit flag. The new entry's local file header overwrites the data descriptor of the last unchanged entry because the internal offset calculation does not account for the data descriptor bytes that follow compressed data.
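For context, when bit 3 is set the compressed data is immediately followed by a data descriptor (ZIP APPNOTE section 4.3.9). A sketch of the on-disk layout — the leading signature is technically optional, though most writers, including .NET's ZipArchive, emit it:

```csharp
// On-disk layout of one entry when general purpose bit 3 (0x0008) is set:
//
//   local file header          (CRC/size fields are zero)
//   compressed data            CompressedLength bytes
//   data descriptor:
//     signature  0x08074B50    4 bytes (optional, but written by .NET)
//     CRC-32                   4 bytes
//     compressed size          4 bytes (8 if Zip64)
//     uncompressed size        4 bytes (8 if Zip64)
//
// The true end of the entry is offsetOfCompressedData + CompressedLength
// plus 12-24 descriptor bytes; the buggy offset calculation omits that
// last term, so the next local file header lands on top of the descriptor.
```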

Reproduction Steps

// dotnet add package SharpZipLib
using System.IO;
using System.IO.Compression;
using System.Text;
using ICSharpCode.SharpZipLib.Zip;

// Step 1: Create a zip on a non-seekable stream (forces bit 3 / data descriptor)
var ms = new MemoryStream();
using (var wrapper = new ForwardOnlyWriteStream(ms))
using (var create = new ZipArchive(wrapper, ZipArchiveMode.Create, leaveOpen: true))
{
    var e = create.CreateEntry("original.txt");
    using var s = e.Open();
    s.Write("Hello world"u8);
}

// Step 2: Open in Update mode and add a new entry
ms.Seek(0, SeekOrigin.Begin);
using (var update = new ZipArchive(ms, ZipArchiveMode.Update, leaveOpen: true))
{
    var e = update.CreateEntry("added.txt");
    using var s = e.Open();
    s.Write("New content"u8);
}

// Step 3: Verify with SharpZipLib — throws on .NET 10
ms.Seek(0, SeekOrigin.Begin);
using (var zipIn = new ZipInputStream(ms))
{
    ZipEntry entry;
    while ((entry = zipIn.GetNextEntry()) != null)
    {
        using var reader = new StreamReader(zipIn, leaveOpen: true);
        Console.WriteLine($"{entry.Name}: {reader.ReadToEnd()}");
    }
}

// Minimal non-seekable wrapper that forces data descriptor usage
sealed class ForwardOnlyWriteStream(Stream inner) : Stream
{
    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => inner.Length;
    public override long Position { get => inner.Position; set => throw new NotSupportedException(); }
    public override void Flush() => inner.Flush();
    public override int Read(byte[] b, int o, int c) => throw new NotSupportedException();
    public override long Seek(long o, SeekOrigin s) => throw new NotSupportedException();
    public override void SetLength(long v) => inner.SetLength(v);
    public override void Write(byte[] b, int o, int c) => inner.Write(b, o, c);
    public override void Write(ReadOnlySpan<byte> b) => inner.Write(b);
}

Expected behavior

All entries in the output zip are readable. The verify step prints:

original.txt: Hello world
added.txt: New content

Actual behavior

SharpZipLib's ZipInputStream throws or returns garbled data for original.txt. The first existing entry's data descriptor is overwritten by the new entry's local file header. Any strict zip reader that trusts the data descriptor flag (bit 3) in the local file header will fail.

Regression?

Yes. This works correctly on .NET 8. The regression was introduced in .NET 10 by PR #102704 ("Reduce memory usage when updating ZipArchives"), which added a ChangeState-based selective rewriting strategy that does not account for data descriptor bytes when computing where to append new entries.

Known Workarounds

Avoid ZipArchiveMode.Update when adding entries to an existing archive that may contain data descriptors. Instead, read all existing entries into memory and rewrite the archive from scratch using ZipArchiveMode.Create.
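A sketch of that workaround (entry and method names here are illustrative; this rebuilds the archive with Create mode, which writes ordinary seekable-stream headers with no data descriptors, so the result is safe to reopen in Update mode later):

```csharp
using System.IO;
using System.IO.Compression;

// Workaround sketch: instead of ZipArchiveMode.Update, copy every existing
// entry into a fresh archive and append the new entry there.
static MemoryStream RebuildWithNewEntry(Stream source, string newName, byte[] newContent)
{
    var output = new MemoryStream();
    using (var src = new ZipArchive(source, ZipArchiveMode.Read, leaveOpen: true))
    using (var dst = new ZipArchive(output, ZipArchiveMode.Create, leaveOpen: true))
    {
        // Re-copy existing entries; Create mode on a seekable MemoryStream
        // writes CRC/sizes in the local header, so bit 3 stays clear.
        foreach (var entry in src.Entries)
        {
            var copy = dst.CreateEntry(entry.FullName);
            using var from = entry.Open();
            using var to = copy.Open();
            from.CopyTo(to);
        }
        using var added = dst.CreateEntry(newName).Open();
        added.Write(newContent);
    }
    output.Position = 0;
    return output;
}
```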

Configuration

  • .NET version: .NET 10
  • SharpZipLib version: 1.4.2
  • OS: Any (not OS-specific)
  • Architecture: Any (not architecture-specific)
  • Specific to configuration: No — reproducible on all platforms whenever the source zip contains entries with the data descriptor flag set (bit 3). This commonly occurs with zips created on non-seekable streams or by external tools (Java ZipOutputStream, Azure blob storage SDKs, etc.).

Other information

Root cause: WriteFileCalculateOffsets in ZipArchive.cs computes the end of an unchanged entry as entry.GetOffsetOfCompressedData() + entry.CompressedLength. CompressedLength is the compressed data size from the central directory and does not include the 12–24 byte data descriptor (CRC-32 plus two size fields of 4 bytes each, or 8 each for Zip64, optionally preceded by the 4-byte 0x08074B50 signature) that follows the compressed data when bit 3 is set. A secondary issue exists in WriteLocalFileHeaderAndDataIfNeeded: the metadata-only rewrite path seeks past _compressedSize without skipping the data descriptor, misaligning all subsequent entries.
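A fix would presumably add the descriptor length whenever bit 3 is set. A rough sketch, using the internal member names mentioned above (the field names and the Zip64 check here are illustrative, not the actual System.IO.Compression source):

```csharp
// Hypothetical sketch of the corrected end-of-entry calculation —
// not the actual runtime implementation.
private long GetEndOfCompressedData(ZipArchiveEntry entry)
{
    long end = entry.GetOffsetOfCompressedData() + entry.CompressedLength;
    if ((entry._generalPurposeBitFlag & 0x0008) != 0) // data descriptor present
    {
        // 4-byte signature (always written by .NET) + 4-byte CRC-32,
        // then two size fields: 4 bytes each, or 8 each for Zip64 entries.
        bool zip64 = entry.CompressedLength > uint.MaxValue || entry.Length > uint.MaxValue;
        end += 4 + 4 + (zip64 ? 16 : 8);
    }
    return end;
}
```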

Related issues / PRs:

I would be happy to work on a fix for this, pending review and approval from other developers.
