Fork of FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects

This is a fork of the FoundationPose repository.

Contributors: Bowen Wen, Wei Yang, Jan Kautz, Stan Birchfield

We present FoundationPose, a unified foundation model for 6D object pose estimation and tracking, supporting both model-based and model-free setups. Our approach can be instantly applied at test-time to a novel object without fine-tuning, as long as its CAD model is given, or a small number of reference images are captured. We bridge the gap between these two setups with a neural implicit representation that allows for effective novel view synthesis, keeping the downstream pose estimation modules invariant under the same unified framework. Strong generalizability is achieved via large-scale synthetic training, aided by a large language model (LLM), a novel transformer-based architecture, and a contrastive learning formulation. Extensive evaluation on multiple public datasets involving challenging scenarios and objects indicates that our unified approach outperforms existing methods specialized for each task by a large margin. In addition, it even achieves comparable results to instance-level methods despite the reduced assumptions.

🤖 For the ROS version, see Isaac ROS Pose Estimation, which offers fast TensorRT (TRT) inference and a C++ speedup.

🥇 No. 1 on the world-wide BOP leaderboard (as of 2024/03) for model-based novel object pose estimation.

Demos

Robotic Applications:

robot_mustard.mp4

AR Applications:

ar_maze_c.mp4

Results on YCB-Video dataset:

ycbv_tracking_c.mp4

Bibtex

@InProceedings{foundationposewen2024,
author        = {Bowen Wen and Wei Yang and Jan Kautz and Stan Birchfield},
title         = {{FoundationPose}: Unified 6D Pose Estimation and Tracking of Novel Objects},
booktitle     = {CVPR},
year          = {2024},
}

If you find the model-free setup useful, please also consider citing:

@InProceedings{bundlesdfwen2023,
author        = {Bowen Wen and Jonathan Tremblay and Valts Blukis and Stephen Tyree and Thomas M\"{u}ller and Alex Evans and Dieter Fox and Jan Kautz and Stan Birchfield},
title         = {{BundleSDF}: {N}eural 6-{DoF} Tracking and {3D} Reconstruction of Unknown Objects},
booktitle     = {CVPR},
year          = {2023},
}

TensorRT & Onnx Inference

Follow the instructions for the data and model file structure in the Data prepare section below.
Download the ONNX weights from the FoundationPose NGC Catalog and rename them as follows (weights folder -> downloaded file -> new file name):

2024-01-11-20-02-45 -> score_model.onnx -> model_best.onnx

2023-10-28-18-33-37 -> refine_model.onnx -> model_best.onnx
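
The renaming above could be scripted roughly as follows. This is a minimal sketch, assuming both ONNX files were downloaded into a single folder; the function name place_onnx_weights and its download_dir argument are hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path

# Weights folder name -> downloaded ONNX file name (taken from the list above).
ONNX_FILES = {
    "2024-01-11-20-02-45": "score_model.onnx",
    "2023-10-28-18-33-37": "refine_model.onnx",
}


def place_onnx_weights(download_dir, weights_dir="weights"):
    """Copy each downloaded ONNX file to <weights_dir>/<folder>/model_best.onnx.

    download_dir is an assumed location holding the two downloaded files.
    Returns the list of destination paths that were written.
    """
    placed = []
    for folder_name, onnx_name in ONNX_FILES.items():
        src = Path(download_dir) / onnx_name
        if not src.is_file():
            raise FileNotFoundError(f"expected downloaded file not found: {src}")
        dst = Path(weights_dir) / folder_name / "model_best.onnx"
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy rather than move, so the download stays intact
        placed.append(dst)
    return placed
```

Copying instead of moving keeps the original downloads available in case the step has to be repeated.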

Data prepare

Download all network weights from here and put them under the folder weights/. For the refiner, you will need 2023-10-28-18-33-37. For the scorer, you will need 2024-01-11-20-02-45.
Download the demo data and extract it under the folder demo_data/
[Optional] Download our large-scale training data: "FoundationPose Dataset"
[Optional] Download our preprocessed reference views here to run the model-free few-shot version.

Docker Installation (recommended)

docker build --network host -f docker/dockerfile -t foundationpose .
bash docker/run_container.sh

# inside container
cd FoundationPose
bash build_all.sh

cd ..
git clone https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
python3 setup.py install
  
# converting to tensorrt
# refine_model
cd ../FoundationPose
cd weights/2023-10-28-18-33-37
trtexec --onnx=./model_best.onnx --saveEngine=./model_best.plan --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:252x160x160x6,input2:252x160x160x6 --maxShapes=input1:252x160x160x6,input2:252x160x160x6

# score_model
cd ../2024-01-11-20-02-45
trtexec --onnx=./model_best.onnx --saveEngine=./model_best.plan --fp16 --minShapes=input1:1x160x160x6,input2:1x160x160x6 --optShapes=input1:252x160x160x6,input2:252x160x160x6 --maxShapes=input1:252x160x160x6,input2:252x160x160x6

# back to root dir
cd ../../
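
The two trtexec calls above differ only in the --fp16 flag, so they can be generated from one helper. The following is a sketch only: the helper name trtexec_command and its parameters are hypothetical, while the flags, input names, and shapes are copied from the commands above.

```python
import shlex


def trtexec_command(onnx_path, engine_path, fp16=False, max_batch=252, size=160, channels=6):
    """Build the trtexec argument list used above for one model.

    Both inputs (input1, input2) share the same dynamic-batch shape:
    batch ranges from 1 to max_batch, and the optimal batch equals max_batch.
    """
    def shapes(batch):
        dims = f"{batch}x{size}x{size}x{channels}"
        return f"input1:{dims},input2:{dims}"

    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # only the score model is built with this flag above
    cmd += [
        f"--minShapes={shapes(1)}",
        f"--optShapes={shapes(max_batch)}",
        f"--maxShapes={shapes(max_batch)}",
    ]
    return cmd


if __name__ == "__main__":
    # Print the score-model command; pass the list to subprocess.run to execute it.
    print(shlex.join(trtexec_command("./model_best.onnx", "./model_best.plan", fp16=True)))
```

Returning a list rather than a string lets the caller hand it directly to subprocess.run without shell quoting concerns.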

Run model-based demo

The paths have been set in argparse by default. If you need to change the scene, you can pass the args accordingly. By running on the demo data, you should be able to see the robot manipulating the mustard bottle. Pose estimation is conducted on the first frame, then it automatically switches to tracking mode for the rest of the video. The resulting visualizations will be saved to the debug_dir specified in the argparse. (Note that the first run may be slower due to online compilation.)

# pytorch
python run_demo.py

# onnx
python run_demo.py --use_onnx

# tensorrt
python run_demo.py --use_tensorrt
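
The estimate-then-track behaviour described above can be pictured with the following control-flow sketch. It is not the code in run_demo.py: run_sequence, estimate, and track are hypothetical names, and the two callables stand in for whatever pose estimator and tracker are used.

```python
def run_sequence(frames, estimate, track):
    """Estimate the pose on the first frame, then track on every later frame.

    estimate(frame) -> pose          : full pose estimation (hypothetical callable)
    track(frame, prev_pose) -> pose  : update starting from the previous pose (hypothetical callable)
    """
    poses = []
    for index, frame in enumerate(frames):
        if index == 0:
            pose = estimate(frame)          # first frame: global estimation
        else:
            pose = track(frame, poses[-1])  # later frames: refine from the last pose
        poses.append(pose)
    return poses


if __name__ == "__main__":
    # Toy stand-ins: numbers play the role of frames and poses.
    print(run_sequence([1, 2, 3], estimate=lambda f: f * 10, track=lambda f, prev: prev + f))
```

Tracking reuses the previous pose as its starting point, which is why only the first frame needs the full estimation step.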

Feel free to try other objects, such as the driller (no retraining needed), by changing the paths in argparse.

Run on public datasets (LINEMOD, YCB-Video)

First download the LINEMOD dataset and the YCB-Video dataset.

To run the model-based version on each of these two datasets, set the paths according to where you downloaded them. The results will be saved to the debug folder.

python run_linemod.py --linemod_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/LINEMOD --use_reconstructed_mesh 0

python run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video --use_reconstructed_mesh 0

To run the model-free few-shot version, you first need to train the Neural Object Field. ref_view_dir is the location of the reference views downloaded in the "Data prepare" section above. Set the dataset flag to the dataset you are interested in.

python bundlesdf/run_nerf.py --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16 --dataset ycbv

Then run a command similar to the model-based version, with a few small modifications. Here we use YCB-Video as an example:

python run_ycb_video.py --ycbv_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video --use_reconstructed_mesh 1 --ref_view_dir /mnt/9a72c439-d0a7-45e8-8d20-d7a235d02763/DATASET/YCB_Video/bowen_addon/ref_views_16

Troubleshooting

For more recent GPUs such as the 4090, refer to this.
For setting up on Windows, refer to this.
If you are getting unreasonable results, check this and this.

Try the following commands if the installation above gives errors later on:

pip install onnxruntime-gpu --extra-index-url https://aiinfra.pkgs.visualstudio.com/PublicPackages/_packaging/onnxruntime-cuda-12/pypi/simple/
apt install ./nv-tensorrt-local-repo-ubuntu2204-10.11.0-cuda-12.9_1.0-1_amd64.deb

git clone https://github.com/onnx/onnx-tensorrt.git
cd onnx-tensorrt
pip install tensorrt

Training data download

Our training data include scenes using 3D assets from GSO and Objaverse, rendered with high-quality photo-realism and large domain randomization. Each data point includes RGB, depth, object pose, camera pose, instance segmentation, and 2D bounding box. [Google Drive].

To parse the camera params, including extrinsics and intrinsics:

import json
import numpy as np

# base_dir must be set to the directory that contains the camera_params/ folder
# Fixed transform between the OpenGL and OpenCV camera frames (flips the y and z axes)
glcam_in_cvcam = np.array([[1,0,0,0],
                        [0,-1,0,0],
                        [0,0,-1,0],
                        [0,0,0,1]]).astype(float)
# Load the params first; everything below reads from camera_params
with open(f'{base_dir}/camera_params/camera_params_000000.json','r') as ff:
  camera_params = json.load(ff)
W, H = camera_params["renderProductResolution"]

# Extrinsics
world_in_glcam = np.array(camera_params['cameraViewTransform']).reshape(4,4).T
cam_in_world = np.linalg.inv(world_in_glcam)@glcam_in_cvcam
world_in_cam = np.linalg.inv(cam_in_world)

# Intrinsics (focal lengths and principal point in pixels)
focal_length = camera_params["cameraFocalLength"]
horiz_aperture = camera_params["cameraAperture"][0]
vert_aperture = H / W * horiz_aperture
focal_y = H * focal_length / vert_aperture
focal_x = W * focal_length / horiz_aperture
center_y = H * 0.5
center_x = W * 0.5

fx, fy, cx, cy = focal_x, focal_y, center_x, center_y
K = np.eye(3)
K[0,0] = fx
K[1,1] = fy
K[0,2] = cx
K[1,2] = cy
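
As a worked example of the intrinsics computation above, the sketch below wraps the same formulas in a function and projects one 3D point to pixel coordinates. The function names intrinsics_from_params and project are hypothetical, and the parameter values are synthetic numbers chosen for easy arithmetic, not values from the dataset. The point is assumed to be expressed in the OpenCV-style camera frame (z pointing forward).

```python
import numpy as np


def intrinsics_from_params(camera_params):
    """Build the 3x3 pinhole matrix K from the same fields used above."""
    W, H = camera_params["renderProductResolution"]
    focal_length = camera_params["cameraFocalLength"]
    horiz_aperture = camera_params["cameraAperture"][0]
    vert_aperture = H / W * horiz_aperture     # derived from the image aspect ratio
    fx = W * focal_length / horiz_aperture     # focal length in pixels along x
    fy = H * focal_length / vert_aperture      # focal length in pixels along y
    return np.array([[fx, 0.0, W * 0.5],
                     [0.0, fy, H * 0.5],
                     [0.0, 0.0, 1.0]])


def project(K, point_cam):
    """Project a 3D point in the camera frame to pixel coordinates (u, v)."""
    uvw = K @ np.asarray(point_cam, dtype=float)
    return uvw[:2] / uvw[2]


if __name__ == "__main__":
    # Synthetic parameters (assumed values, for illustration only).
    params = {
        "renderProductResolution": [640, 480],
        "cameraFocalLength": 20.0,
        "cameraAperture": [20.0, 15.0],
    }
    K = intrinsics_from_params(params)
    print(K)
    print(project(K, [0.1, 0.05, 1.0]))
```

Because the vertical aperture is derived from the aspect ratio, fx and fy come out equal under these formulas, which corresponds to square pixels.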

Notes

Due to legal restrictions on Stable Diffusion, which is trained on the LAION dataset, we are not able to release the diffusion-based texture-augmented data, nor the pretrained weights that use it. We thus release the version trained without diffusion-augmented data. Slight performance degradation is expected.

Acknowledgement

We would like to thank Jeff Smith for helping with the code release; the NVIDIA Isaac Sim and Omniverse teams for their support on synthetic data generation; and Tianshi Cao for the valuable discussions. Finally, we are also grateful for the positive feedback and constructive suggestions from the reviewers and AC at CVPR.

License

Contact

For questions, please contact Bowen Wen.
