This repository provides scripts and tools for evaluating the performance of decompilation processes using both traditional decompilers and large language models (LLMs). It is used in the paper "DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios".
LLVM 18, installed from the LLVM Debian/Ubuntu nightly packages (https://apt.llvm.org/)
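One way to install it, using the official helper script from apt.llvm.org:

```bash
# Install LLVM/Clang 18 via the official llvm.sh helper from apt.llvm.org
wget https://apt.llvm.org/llvm.sh
chmod +x llvm.sh
sudo ./llvm.sh 18
```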
To begin, clone the oss-fuzz project.
```bash
git clone https://github.com/google/oss-fuzz.git
```

Then we modify the base-builder Dockerfile to include `bear` and `clang-extract` to support function extraction.
```bash
# Download the prebuilt clang-extract
wget 'https://cloud.vul337.team:8443/public.php/dav/files/br9qNTzwnmGgagF/clang-extract.tar.gz' -O oss-fuzz/infra/base-images/base-builder/clang-extract.tar.gz

# Add bear and clang-extract to the base-builder Dockerfile
cd oss-fuzz
git checkout 4bca88f3a369679336485181961db305161fe240
git apply ../oss-fuzz-patch/*.diff
```

Then we load the Docker images.
```bash
curl https://cloud.vul337.team:8443/public.php/dav/files/br9qNTzwnmGgagF/base-runner.tar.gz -o - | docker load
curl https://cloud.vul337.team:8443/public.php/dav/files/br9qNTzwnmGgagF/base-builder.tar.gz -o - | docker load
```

It is worth noting that although we pin the oss-fuzz commit and load the Docker images we provide, the fuzzer build scripts always pull the latest commit of each project. Therefore, the extracted functions may vary slightly from those used in our paper, and some projects may fail to build.
Then we compile the dummy library for linking with the fuzzer.
```bash
docker run -it --rm -w /work -v $(pwd):/work gcr.io/oss-fuzz-base/base-builder bash -c "clang dummy.c -o libfunction.so -O2 -fPIC -shared && clang ld.c -o ld.so -shared -fPIC -O2"
```

This repository includes a patched `llvm-cov` binary. It is identical to the `llvm-cov` bundled with the official LLVM apt source, except for a binary-level patch that prevents formatting counters in the output. The patch is shown below:
```asm
.text:00000000000A15D8 cmp     r13d, 3
.text:00000000000A15DC nop                     ; Keypatch modified this from:
.text:00000000000A15DC                         ;   jg short loc_A1608
.text:00000000000A15DC                         ; Keypatch padded NOP to next boundary: 1 bytes
.text:00000000000A15DD nop
.text:00000000000A15DE lea     rax, [rbx+10h]
.text:00000000000A15E2 mov     [rbx], rax
.text:00000000000A15E5 mov     rcx, [rsp+88h+src]
```
The default configuration file is located at `config.yaml`, containing:

- `oss_fuzz_path`: The path to the `oss-fuzz` project.
- `decompilers`: A list of decompilers to be evaluated.
- `opts`: A list of optimization levels to be evaluated.

Many scripts accept a `--config` parameter to specify the configuration file.
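A minimal `config.yaml` sketch (the keys follow the description above; the values are illustrative, not the shipped defaults):

```bash
# Write an example config.yaml; adjust paths and decompiler names to your setup.
cat > config.yaml <<'EOF'
oss_fuzz_path: ./oss-fuzz
decompilers:
  - hexrays
  - ghidra
opts:
  - O0
  - O1
  - O2
  - O3
  - Os
EOF
```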
```bash
python extract_functions.py
```

Optionally, extract only a few selected projects with 96 workers:

```bash
python3 extract_functions.py --worker-count 96 --project file,libprotobuf-mutator
```

First, the script executes the fuzzers to collect covered functions, including their names and corresponding files. Coverage information is recorded in `{oss_fuzz_path}/build/stats/{project}/{fuzzer}_result.json`.
For each function covered by the fuzzer, the script uses `clang` and `clang-extract` to extract the function together with its external dependencies from each project, storing the results in `{oss_fuzz_path}/functions/{project}`.
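To spot-check the collected coverage, you can inspect one of the result files. The JSON schema is not documented here, and the project and fuzzer names below are placeholders, so treat this as illustrative:

```bash
# Pretty-print the first part of a coverage result file.
# "file" and "magic_fuzzer" are placeholder project/fuzzer names.
jq '.' oss-fuzz/build/stats/file/magic_fuzzer_result.json | head -n 20
```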
To compile the extracted functions, ensure that LLVM and Clang are installed on your system.
Specify the libclang library path in `LIBCLANG_PATH`, for example `export LIBCLANG_PATH=/usr/lib/llvm-16/lib/libclang-16.so.1`, adjusting it to match your installation.
Set `oss_fuzz_path` in `config.yaml` and the desired output path, then execute the following commands:
```bash
export LIBCLANG_PATH="/usr/lib/llvm-18/lib/libclang-18.so.1"
export dataset_path=path/to/the/dataset
python compile_ossfuzz.py --output $dataset_path
```

This script organizes all functions into a dataset in the `datasets` format. It compiles these functions using `clang`, applying optimization levels from O0 to Os.
The resulting binaries are stored in `$dataset_path/binary`.
The dataset containing the metadata is located at `$dataset_path/compiled_ds`. The metadata includes the function name, the prologue for the function (macros and structure definitions), the address of the target function to be decompiled, and the path to the binary file.
The ground-truth dataset for evaluation is stored in `$dataset_path/eval`. It contains the function name, the prologue for the function (macros and structure definitions), and the original source code. Its columns are a subset of the columns in the `compiled_ds` dataset.
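As a quick sanity check, you can open the metadata dataset with the Hugging Face `datasets` library (a sketch that assumes the dataset was written with `save_to_disk`; this is not one of the repository's scripts):

```bash
python - <<EOF
# Load the compiled metadata dataset and print its schema (columns and row count).
from datasets import load_from_disk
ds = load_from_disk("$dataset_path/compiled_ds")
print(ds)
EOF
```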
This section outlines the scripts used for decompilation, utilizing both traditional decompilers and large language models (LLMs).
We use a dedicated `decompiler-service` to perform scalable decompilation; it runs as a server that clients query for decompiled code.
```bash
cd decompiler-service
pip install -r requirements.txt
```

Then provide the necessary binaries and licenses for the decompilers. For decompilers such as Hex-Rays, BinaryNinja, and Dewolf, you need a license for the respective decompiler. Refer to `decompiler-service/README.md` for more information.
Build the decompiler images with the following command:
```bash
enabled_decompilers="--with-angr --with-ghidra --with-recstudio --with-reko --with-retdec --with-binja --with-dewolf --with-hexrays --with-mlm"
python manage.py $enabled_decompilers build
```

To start the decompiler service, run:

```bash
python manage.py $enabled_decompilers start
```

We use a dedicated client named `declient` to interact with the decompiler-service. Install the client with:
```bash
pip install -e ./decompiler-service/src/declient
```

To warm up the decompiler service (necessary each time the service is restarted), run:

```bash
python decompiler-service/scripts/test_decompile_async.py
```

This should return a successful response from the decompiler-service; the result is stored in `./my_task_queue.json`.
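If you want to confirm the warmup succeeded, you can pretty-print the stored result (its exact schema is not documented here, so treat this as illustrative):

```bash
# Show the beginning of the stored warmup result.
jq '.' ./my_task_queue.json | head -n 20
```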
To obtain decompiled code from traditional decompilers (make sure the decompiler-service is running and warmed up), execute:

```bash
# use hexrays to decompile
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_hexrays --decompilers hexrays

# use ghidra to decompile
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_ghidra --decompilers ghidra

# or use both hexrays and ghidra to decompile simultaneously
python decompile.py --base-dataset-path $dataset_path --output $dataset_path/decompiled_ds_ghidra_hexrays --decompilers ghidra,hexrays
```

- `dataset`: Path to the dataset from the previous compilation step; it should contain `compiled_ds` and `binary`.
- `output`: Path where the decompiled code dataset will be stored.
This script interfaces with a server hosting six traditional decompilers, such as Hex-Rays, to request decompiled code asynchronously.
To refine the decompiled code with an LLM, run:

```bash
python refine.py --dataset $dataset_path/decompiled_ds_hexrays --model gpt-4o-mini --output-file $dataset_path/gpt-4o-mini.jsonl --concurrency 30
```

Then merge the LLM-refined output and the decompiler outputs into a single dataset:

```bash
python merge.py --base-dataset-path $dataset_path/ --decompiled-datasets $dataset_path/gpt-4o-mini.jsonl $dataset_path/decompiled_ds_ghidra/ $dataset_path/decompiled_ds_hexrays/ --output $dataset_path/decompiled_ds
```

This section describes the evaluation of decompiled code.
Before evaluation, integrate all decompiler outputs, including those from LLMs, into a single dataset saved at `./decompiled_ds_all`. Then, execute:
```bash
python evaluate_rsr.py --decompiled-dataset $dataset_path/decompiled_ds --decompilers hexrays
```

Enable the debug parameter to print error messages for specific data. This script recompiles the specified decompiler outputs in Docker, applies fixes, and reports success rates across different optimization levels. Successfully compiled functions are stored as shared libraries in `{oss_fuzz_path}/build/challenges` for further evaluation.
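For example, to see why specific functions fail to recompile (assuming the flag is spelled `--debug`; check `python evaluate_rsr.py --help` for the exact name):

```bash
# Same evaluation, but with error messages printed for failing functions.
python evaluate_rsr.py --decompiled-dataset $dataset_path/decompiled_ds --decompilers hexrays --debug
```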
To assess coverage differences before and after replacing with decompiled code, run:
```bash
python evaluate_cer.py --dataset $dataset_path/decompiled_ds
```

This script generates coverage reports for each function by linking the reference (base) shared object and the decompiled function's shared object separately.
Finally, evaluate code quality. Before running, set the model endpoint (`OPENAI_BASE_URL`) and API key (`OPENAI_API_KEY`) as environment variables.
```bash
python code_quality.py --run --model your_model --dataset ./decompiled_ds_all --output your_output_path
```

This script conducts an LLM arena evaluation across 12 dimensions, computing Elo scores to assess code quality. The output path will contain all scoring information in PKL files. Use the `rate` parameter instead of `run` to calculate Elo scores for different aspects and overall performance.
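Once the runs are complete, aggregate the stored scores. The `--rate` spelling below mirrors `--run` above and is an assumption; consult the script's `--help` for the exact flag:

```bash
# Compute per-aspect and overall Elo scores from the PKL files produced by --run.
python code_quality.py --rate --model your_model --dataset ./decompiled_ds_all --output your_output_path
```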
If this work is helpful for your research, please consider citing the following BibTeX entry.
```bibtex
@misc{gao2025decompilebenchcomprehensivebenchmarkevaluating,
      title={DecompileBench: A Comprehensive Benchmark for Evaluating Decompilers in Real-World Scenarios},
      author={Zeyu Gao and Yuxin Cui and Hao Wang and Siliang Qin and Yuanda Wang and Bolun Zhang and Chao Zhang},
      year={2025},
      eprint={2505.11340},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2505.11340},
}
```