ZipDiff

A differential fuzzer for ZIP parsers.

This is the source code for the USENIX Security '25 paper My ZIP isn’t your ZIP: Identifying and Exploiting Semantic Gaps Between ZIP Parsers.

Permanent link and Docker image files: https://doi.org/10.5281/zenodo.15526863

Environment

  • Linux
  • Rust (tested on 1.86; any version should work as long as the code compiles successfully)
  • Docker and Docker Compose plugin
  • Python 3 with numpy and matplotlib to generate tables and figures
  • The full fuzzing process is resource-intensive, as it runs many ZIP parsers in parallel. It is recommended to have at least 128 GB of RAM and 300 GB of disk space. While it can also run on systems with less RAM, you may see significant performance degradation, primarily due to uncached disk I/O, since the unzipped outputs can be quite large.
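Before a long fuzzing run, it may help to confirm that the machine meets the recommended resources. A quick check with standard Linux tools:

```shell
# Show available RAM in GiB and free disk space in the current directory;
# the recommendation above is at least 128 GB of RAM and 300 GB of disk.
free -g
df -h .
```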

The exact environment used by the authors:

  • Ubuntu 23.10 with Linux 6.5.0-44
  • Rust 1.86.0
  • Docker 27.1.1 with Docker Compose 2.33.1
  • Python 3.13.3 with numpy 2.3.0 and matplotlib 3.10.3
  • CPU: Intel(R) Xeon(R) Gold 6330 CPU @ 2.00GHz with 112 logical CPUs
  • Memory and storage: 944G RAM + 44T disk (less than 1T was used)

File Structure

  • parsers
    • Subdirectories: Source files used to build the Docker images of the tested parsers. Each Docker image corresponds to one tested ZIP parser.
    • parsers.json: Information about the parsers.
  • zip-diff: Rust code
    • The library crate: A helper ZIP library.
    • The fuzz binary crate: The differential fuzzer ZipDiff.
    • The construction binary crate: Construction of ambiguous ZIP files corresponding to the types and variants described in the paper.
    • The count binary crate: Count the types of ambiguities between each parser pair.
  • tools:
    • prepare.sh: Copy common scripts (unzip-all.sh, parallel-unzip-all.sh, testcase.sh) into the parser subdirectories (into their Docker build contexts) and generate the docker-compose.yml config file.
    • run-parsers.sh: Test the parsers against specified ZIP files (for manual testing).
    • ablation-study.sh: Reproduce the ablation study in the paper.
    • fuzz-stats.py: Draw the ablation study graph and summarize the stats.
    • inconsistency-table.py: Generate the parser inconsistency LaTeX table.
    • parsers-to-table.py: Retrieve GitHub stargazer counts and generate the LaTeX parser list.
  • constructions: This directory is used to place the constructed ambiguous ZIP files. The inconsistency-types.json file is generated by the count component and records the list of inconsistency types between each pair of parsers.

Preparation

  • Build ZIP parser Docker images:

    tools/prepare.sh
    cd parsers
    sudo docker compose build

    Alternatively, if you want to save some time or make sure the versions match the evaluation in the paper, you can load the images from files on Zenodo:

    for i in *.tar.bz2; do
        docker load -i "$i"
    done
  • Build the Rust binaries:

    cd zip-diff
    cargo build --release

Minimal Working Example

You can check that the parsers are working by running them on a simple ZIP file (assuming the zip command is installed):

pushd /tmp
echo test > test.txt
zip -0 test.zip test.txt
popd
tools/run-parsers.sh /tmp/test.zip
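If the zip command is unavailable, an equivalent archive with a single stored (uncompressed) entry, mirroring `zip -0`, can be created with Python's standard zipfile module:

```python
import zipfile

# Equivalent of `zip -0 test.zip test.txt`: one entry, stored uncompressed.
with zipfile.ZipFile("/tmp/test.zip", "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("test.txt", "test\n")

# Confirm the entry is stored, i.e. not compressed.
with zipfile.ZipFile("/tmp/test.zip") as zf:
    assert zf.getinfo("test.txt").compress_type == zipfile.ZIP_STORED
```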

If everything goes well, you will see logs from Docker Compose and the parsers, and the results will be available at evaluation/results/tmp/test.zip:

01-infozip
└── test.txt
02-7zip
└── test.txt
……
50-swift-zipfoundation
└── test.txt

You can verify that all parsers successfully extracted test.txt from the ZIP archive.

A short two-minute fuzzing run can be used to check that the fuzzer is working: sudo target/release/fuzz -b 10 -s 120. This runs the fuzzer for two minutes with only ten samples per batch. The fuzzer prints a log line for each iteration; the log text should contain ok: 50, indicating that all parsers are working fine. The results will be available at evaluation/stats.json, evaluation/samples and evaluation/results.

Running the Fuzzer

cd zip-diff
sudo target/release/fuzz

Root permission is required here because the outputs are written inside Docker and are owned by root. Sometimes the outputs have incorrect permission bits and cannot be read by regular users, even when the user is the file owner.

By default, the fuzzer will run indefinitely and the results will be stored at evaluation/stats.json, evaluation/samples, and evaluation/results.

The fuzzer can be terminated at any time with Ctrl+C. You can also tell the fuzzer to stop after a specific duration by setting the -s, --stop-after-seconds option.

The fuzzer does not automatically clear data from previous executions, so files from different runs may be mixed together. You should either remove the leftover files if they are no longer needed, or specify different --samples-dir, --results-dir, and --stats-file locations. The ZIP file samples generated by the fuzzer are stored in --samples-dir, and the corresponding parser outputs are stored in --results-dir. You can inspect the outputs to see that the parsers produce inconsistent results for the same input samples.
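One way to keep runs separate, using the option names listed above, is to point each run at a per-run directory. A sketch (the command is printed here rather than executed, since a real run requires the built fuzzer and root permission):

```shell
# Derive a per-run location so outputs from different runs never mix.
run_dir="evaluation/run-$(date +%Y%m%d-%H%M%S)"
echo sudo target/release/fuzz \
  --samples-dir "$run_dir/samples" \
  --results-dir "$run_dir/results" \
  --stats-file "$run_dir/stats.json"
```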

The -b, --batch-size option can be reduced when there is not enough RAM or disk space.

Reproducing the ablation study

  1. Run sudo tools/ablation-study.sh. It will run five 24-hour fuzzing sessions for each of the three setups, for a total of 15 days.
  2. Run python3 tools/fuzz-stats.py evaluation/stats/* to draw the graph at inconsistent-pair-cdf.pdf (Figure 4 in the paper).

The full results took around 100 GB of disk space for the authors. At runtime it may temporarily take another ~500 GB of disk space. You can lower the $BATCH_SIZE in ablation-study.sh to reduce the required amount of RAM and disk space.
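The batch size can be lowered by editing the assignment in the script before running it. A sketch, assuming $BATCH_SIZE is assigned on its own line in tools/ablation-study.sh (demonstrated on a throwaway copy so the sed pattern can be sanity-checked):

```shell
# Create a stand-in script, rewrite its BATCH_SIZE, and verify the result;
# apply the same sed to tools/ablation-study.sh in the real repository.
demo=$(mktemp)
printf '#!/bin/sh\nBATCH_SIZE=100\n' > "$demo"
sed -i 's/^BATCH_SIZE=.*/BATCH_SIZE=10/' "$demo"
grep BATCH_SIZE "$demo"   # prints: BATCH_SIZE=10
rm -f "$demo"
```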

Testing the constructed ambiguous ZIP files

cd zip-diff
target/release/construction
sudo target/release/count

The construction crate provides constructions of the ZIP parsing ambiguities described in the paper Section 5.2.

The count step summarizes the number of inconsistencies between each pair of ZIP parsers. It took about 40 minutes for the authors.

The inconsistency details are stored at constructions/inconsistency-types.json. You can run tools/inconsistency-table.py to generate the LaTeX table (Table 4 in the paper).
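As a rough illustration of how such a summary could be consumed, the sketch below counts inconsistency types per parser pair. The real schema of constructions/inconsistency-types.json is not documented in this README, so the sample structure (a mapping from parser pair to a list of type names) and the type names themselves are assumptions; adapt the loading code to the actual file.

```python
import json

# Hypothetical sample mimicking an assumed schema of
# constructions/inconsistency-types.json; the pair keys reuse parser
# names from this README, the type names are made up.
sample = json.loads("""
{
    "01-infozip vs 02-7zip": ["type-a", "type-b"],
    "01-infozip vs 50-swift-zipfoundation": ["type-a"]
}
""")

# Print pairs ordered by how many inconsistency types they exhibit.
for pair, types in sorted(sample.items(), key=lambda kv: len(kv[1]), reverse=True):
    print(f"{pair}: {len(types)} inconsistency type(s)")
```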
