enoky/Depth-Anything-3-GUI

DA3 GUI Setup Instructions

# run these commands one at a time
git clone https://github.com/enoky/Depth-Anything-3-GUI.git
cd Depth-Anything-3-GUI
python -m venv venv
venv\Scripts\activate  # Windows; on Linux/macOS use: source venv/bin/activate
pip install -r requirements.txt
pip install gsplat # might not be required for da3mono model
pip install -U xformers --index-url https://download.pytorch.org/whl/cu129
pip install torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu129
pip install -e .

# To start the GUI (make sure the venv is activated):
python gui.py

Depth Anything 3: Recovering the Visual Space from Any Views

Haotong Lin* · Sili Chen* · Jun Hao Liew* · Donny Y. Chen* · Zhenyu Li · Guang Shi · Jiashi Feng · Bingyi Kang*†

†Project lead · *Equal contribution

Paper PDF · Project Page

This work presents Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 is built on two key insights:

  • 💎 A single plain transformer (e.g., a vanilla DINO encoder) is sufficient as a backbone, with no architectural specialization;
  • ✨ A singular depth-ray representation obviates the need for complex multi-task learning.

🏆 DA3 significantly outperforms DA2 for monocular depth estimation, and VGGT for multi-view depth estimation and pose estimation. All models are trained exclusively on public academic datasets.


📰 News

  • 2025-11-14: 🎉 Paper, project page, code and models are all released.

✨ Highlights

🏆 Model Zoo

We release three series of models, each tailored for specific use cases in visual geometry.

  • 🌟 DA3 Main Series (DA3-Giant, DA3-Large, DA3-Base, DA3-Small): These are our flagship foundation models, trained with a unified depth-ray representation. By varying the input configuration, a single model can perform a wide range of tasks (see the sketch after this list):

    • 🌊 Monocular Depth Estimation: Predicts a depth map from a single RGB image.
    • 🌊 Multi-View Depth Estimation: Generates consistent depth maps from multiple images for high-quality fusion.
    • 🎯 Pose-Conditioned Depth Estimation: Achieves superior depth consistency when camera poses are provided as input.
    • 📷 Camera Pose Estimation: Estimates camera extrinsics and intrinsics from one or more images.
    • 🟡 3D Gaussian Estimation: Directly predicts 3D Gaussians, enabling high-fidelity novel view synthesis.
  • 📐 DA3 Metric Series (DA3Metric-Large): A specialized model fine-tuned for metric depth estimation in monocular settings, ideal for applications requiring real-world scale.

  • 🔍 DA3 Monocular Series (DA3Mono-Large): A dedicated model for high-quality relative monocular depth estimation. Unlike disparity-based models (e.g., Depth Anything 2), it directly predicts depth, resulting in superior geometric accuracy.

🔗 Leveraging these models, we also developed a nested series (DA3Nested-Giant-Large), which combines an any-view giant model with a metric model to reconstruct visual geometry at real-world metric scale.
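
To make "one model, many tasks" concrete, here is a minimal sketch built on the inference API shown in the Basic Usage section below. The image paths are placeholders, and the commented-out keyword arguments for pose conditioning are an assumption for illustration, not a confirmed part of the API:

import torch
from depth_anything_3.api import DepthAnything3

model = DepthAnything3.from_pretrained("depth-anything/DA3-LARGE")
model = model.to(device=torch.device("cuda"))

# Monocular depth: a single image in, a single depth map out.
mono = model.inference(["view0.png"])

# Multi-view depth + camera pose estimation: pass several images of one scene.
multi = model.inference(["view0.png", "view1.png", "view2.png"])
print(multi.depth.shape, multi.extrinsics.shape)

# Pose-conditioned depth (hypothetical keyword names -- check the actual
# signature): supplying known cameras should improve depth consistency.
# cond = model.inference(images, extrinsics=known_w2c, intrinsics=known_K)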

🛠️ Codebase Features

Our repository is designed to be a powerful and user-friendly toolkit for both practical application and future research.

  • 🎨 Interactive Web UI & Gallery: Visualize model outputs and compare results with an easy-to-use Gradio-based web interface.
  • Flexible Command-Line Interface (CLI): Powerful and scriptable CLI for batch processing and integration into custom workflows.
  • 💾 Multiple Export Formats: Save your results in various formats (glb, npz, ply, depth images, 3DGS videos, etc.) to connect seamlessly with other tools.
  • 🔧 Extensible and Modular Design: The codebase is structured to facilitate future research and the integration of new models or functionalities.

🚀 Quick Start

📦 Installation

pip install -e .          # Basic
pip install -e ".[gs]"    # Gaussian estimation and rendering
pip install -e ".[app]"   # Gradio web UI; requires Python >= 3.10
pip install -e ".[all]"   # All of the above

For detailed model information, please refer to the Model Cards section below.

💻 Basic Usage

import glob, os, torch
from depth_anything_3.api import DepthAnything3
device = torch.device("cuda")
model = DepthAnything3.from_pretrained("depth-anything/DA3NESTED-GIANT-LARGE")
model = model.to(device=device)
example_path = "assets/examples/SOH"
images = sorted(glob.glob(os.path.join(example_path, "*.png")))
prediction = model.inference(images)
# prediction.processed_images : [N, H, W, 3] uint8   array
print(prediction.processed_images.shape)
# prediction.depth            : [N, H, W]    float32 array
print(prediction.depth.shape)  
# prediction.conf             : [N, H, W]    float32 array
print(prediction.conf.shape)  
# prediction.extrinsics       : [N, 3, 4]    float32 array # opencv w2c or colmap format
print(prediction.extrinsics.shape)
# prediction.intrinsics       : [N, 3, 3]    float32 array
print(prediction.intrinsics.shape)
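
Continuing the snippet above, here is a rough sketch of two common post-processing steps. The 16-bit PNG normalization scheme is our own choice for illustration, while the camera-center formula follows from the OpenCV world-to-camera convention noted in the comments above:

import numpy as np
from PIL import Image

# Save the first view's depth as a 16-bit PNG, normalized to its own range.
depth = prediction.depth[0]
d16 = ((depth - depth.min()) / (depth.max() - depth.min()) * 65535).astype(np.uint16)
Image.fromarray(d16).save("depth_view0.png")

# extrinsics are OpenCV w2c [R | t], so camera centers in world space are C = -R^T t.
R = prediction.extrinsics[:, :, :3]         # [N, 3, 3] rotations
t = prediction.extrinsics[:, :, 3]          # [N, 3] translations
centers = -np.einsum("nij,ni->nj", R, t)    # [N, 3] camera centers
print(centers)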

For the CLI, first point the tool at a model and an output gallery:

# MODEL_DIR can be a Hugging Face repository or a local directory.
# If you encounter network issues, consider a mirror: export HF_ENDPOINT=https://hf-mirror.com
# Alternatively, you can download the model directly from Hugging Face.
export MODEL_DIR=depth-anything/DA3NESTED-GIANT-LARGE
export GALLERY_DIR=workspace/gallery
mkdir -p $GALLERY_DIR

# CLI auto mode with backend reuse
da3 backend --model-dir ${MODEL_DIR} --gallery-dir ${GALLERY_DIR} # Cache the model on the GPU for reuse
da3 auto assets/examples/SOH \
    --export-format glb \
    --export-dir ${GALLERY_DIR}/TEST_BACKEND/SOH \
    --use-backend

# CLI video processing with feature visualization
da3 video assets/examples/robot_unitree.mp4 \
    --fps 15 \
    --use-backend \
    --export-dir ${GALLERY_DIR}/TEST_BACKEND/robo \
    --export-format glb-feat_vis \
    --feat-vis-fps 15 \
    --process-res-method lower_bound_resize \
    --export-feat "11,21,31"

# CLI auto mode without backend reuse
da3 auto assets/examples/SOH \
    --export-format glb \
    --export-dir ${GALLERY_DIR}/TEST_CLI/SOH \
    --model-dir ${MODEL_DIR}

The model architecture is defined in DepthAnything3Net and specified with a YAML config file located at src/depth_anything_3/configs. Input and output processing are handled by DepthAnything3. To customize the model architecture, create a new config file (e.g., path/to/new/config) as follows:

__object__:
  path: depth_anything_3.model.da3
  name: DepthAnything3Net
  args: as_params

net:
  __object__:
    path: depth_anything_3.model.dinov2.dinov2
    name: DinoV2
    args: as_params

  name: vitb
  out_layers: [5, 7, 9, 11]
  alt_start: 4
  qknorm_start: 4
  rope_start: 4
  cat_token: True

head:
  __object__:
    path: depth_anything_3.model.dualdpt
    name: DualDPT
    args: as_params

  dim_in: &head_dim_in 1536
  output_dim: 2
  features: &head_features 128
  out_channels: &head_out_channels [96, 192, 384, 768]

Then, the model can be created with the following code snippet.

from depth_anything_3.cfg import create_object, load_config

model = create_object(load_config("path/to/new/config"))


🗂️ Model Cards

Generally, you should observe that DA3-LARGE achieves comparable results to VGGT.

🗃️ Model Name            📏 Params   📄 License
Nested
  DA3NESTED-GIANT-LARGE   1.40B       CC BY-NC 4.0
Any-view Model
  DA3-GIANT               1.15B       CC BY-NC 4.0
  DA3-LARGE               0.35B       CC BY-NC 4.0
  DA3-BASE                0.12B       Apache 2.0
  DA3-SMALL               0.08B       Apache 2.0
Monocular Metric Depth
  DA3METRIC-LARGE         0.35B       Apache 2.0
Monocular Depth
  DA3MONO-LARGE           0.35B       Apache 2.0

📝 Citations

If you find Depth Anything 3 useful in your research or projects, please cite our work:

@article{depthanything3,
  title={Depth Anything 3: Recovering the visual space from any views},
  author={Haotong Lin and Sili Chen and Jun Hao Liew and Donny Y. Chen and Zhenyu Li and Guang Shi and Jiashi Feng and Bingyi Kang},
  journal={arXiv preprint arXiv:2511.10647},
  year={2025}
}
