DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion
Qingcheng Zhao*,1,† · Xiang Zhang*,✉,2 · Haiyang Xu2 · Zeyuan Chen2 · Jianwen Xie3 · Yuan Gao4 · Zhuowen Tu2
1ShanghaiTech University · 2UC San Diego · 3Lambda, Inc. · 4Stanford University
ICCV 2025
* equal contribution ✉ corresponding author
† Project done while Qingcheng Zhao interned at UC San Diego.
Project Page | Paper | arXiv
We provide a pre-built Docker image at zx1239856/depr based on PyTorch 2.7.1 and CUDA 12.6. You can also build the image locally:
docker build -f Dockerfile . -t depr
Alternatively, you can install the dependencies using the commands listed in the Dockerfile.
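If you use the pre-built image, one possible way to start a container with GPU access and the repository mounted is shown below (the mount path /workspace/DepR is only an example):
docker run --gpus all -it --rm -v $(pwd):/workspace/DepR -w /workspace/DepR zx1239856/depr bash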
Please download the processed 3D-FRONT dataset from https://huggingface.co/datasets/zx1239856/DepR-3D-FRONT and extract the downloaded files into datasets/front3d_pifu/data (an example download command is given after the tree below). The resulting folder structure should look like:
data/
|-- metadata/ (Scene metadata)
|   |-- 0.jsonl
|   |-- ...
|-- pickled_data/ (Raw data processed by InstPIFu)
|   |-- test/
|   |   |-- rendertask3000.pkl
|   |   |-- ...
|-- sdf_layout/ (GT layouts)
|   |-- 10000.npy
|   |-- ...
|-- 3D-FUTURE-watertight/ (GT meshes, required for evaluation)
|   |-- 0004ae9a-1d27-4dbd-8416-879e9de1de8d/
|   |   |-- raw_watertight.obj
|   |   |-- ...
|-- instpifu_mask/ (Instance masks provided by InstPIFu)
|-- panoptic/ (Panoptic segmentation maps we rendered)
|-- img/ (Optional, can be extracted from pickled data)
|-- depth/depth_pro/ (Optional)
`-- grounded_sam/ (Optional)
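As a rough sketch, the dataset can be fetched with the huggingface_hub CLI (assuming it is installed; any archives in the repository still need to be extracted so the tree matches the layout above):
huggingface-cli download zx1239856/DepR-3D-FRONT --repo-type dataset --local-dir datasets/front3d_pifu/data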
Alternatively, you may generate the depth and segmentation inputs yourself following the instructions below.
Generate Segmentation
Please place the Grounded SAM weights in checkpoint/grounded_sam; an example download sketch follows the listing below.
grounded_sam/
|-- GroundingDINO_SwinB.py
|-- groundingdino_swinb_cogcoor.pth
|-- groundingdino_swint_ogc.pth
`-- sam_vit_h_4b8939.pth
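A minimal sketch for fetching these weights (the SAM ViT-H URL is the standard public checkpoint; the GroundingDINO checkpoints and the GroundingDINO_SwinB.py config can be obtained from the IDEA-Research/GroundingDINO repository and its releases):
mkdir -p checkpoint/grounded_sam
# Segment Anything ViT-H checkpoint
wget -P checkpoint/grounded_sam https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
# Place groundingdino_swinb_cogcoor.pth, groundingdino_swint_ogc.pth, and GroundingDINO_SwinB.py
# (from the GroundingDINO repository) alongside it.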
python -m scripts.run_grounded_sam
Generate Depth
Please put Depth Pro weights in checkpoint/.
python -m scripts.run_depth_pro --output depth_pro
Please download our weights from https://huggingface.co/zx1239856/DepR and put everything in the checkpoint folder.
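For example, with the huggingface_hub CLI (assuming it is installed; the file names inside the repository may differ):
huggingface-cli download zx1239856/DepR --local-dir checkpoint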
We provide a demo.ipynb notebook that demonstrates inference on real-world images.
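Assuming Jupyter is available in your environment (it is not necessarily bundled with the Docker image), the notebook can be opened with:
jupyter lab demo.ipynb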
Object-level Evaluation
You may change 8 to the actual number of GPUs as needed.
bash launch.sh 8 all
(Optional) Guided Sampling
bash launch.sh 8 all --guided
Scene-level Evaluation
# Generate shapes
bash launch.sh 8 sample --metadata datasets/front3d_pifu/meta/test_scene.jsonl --use-sam
# Layout optim
bash launch.sh 8 scene --use-sam
# Prepare GT scene
python -m scripts.build_gt --out-dir output/gt
# Calculate scene-level CD/F1
accelerate launch --num_processes=8 --multi_gpu -m scripts.eval_scene --gt-pcd-dir output/gt/pcds --pred-dir output/infer/sam_3dproj_attn_dino_c9_augdep_augmask_nocfg_model_0074999/ --save-dir output/evaluation/results --method depr
This repository is released under the CC-BY-SA 4.0 license.
Our framework utilizes pre-trained models including Grounded-Segment-Anything, Depth Pro, and DINOv2.
Our code is built upon diffusers, Uni-3D, and BlockFusion.
We use physically based renderings of 3D-FRONT scenes provided by InstPIFu. Additionally, we rendered panoptic segmentation maps ourselves.
We thank all the authors for open-sourcing their code and datasets and for their great contributions to the community.
If you find our work useful, please consider citing:
@InProceedings{Zhao_2025_ICCV_DepR,
author = {Zhao, Qingcheng and Zhang, Xiang and Xu, Haiyang and Chen, Zeyuan and Xie, Jianwen and Gao, Yuan and Tu, Zhuowen},
title = {DepR: Depth Guided Single-view Scene Reconstruction with Instance-level Diffusion},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {5722-5733}
}