Toyota Motor Europe NV/SA and its affiliates retain all intellectual property and proprietary rights in and to this software, related documentation and any modifications thereto. Any use, reproduction, disclosure or distribution of this software and related documentation without an express license agreement from Toyota Motor Europe NV/SA is strictly prohibited.
Repository providing the source code for the paper
**Robotic Task Ambiguity Resolution via Natural Language Interaction**
by Eugenio Chisari, Jan Ole von Hartz, Fabien Despinoy, and Abhinav Valada.
Please cite the paper as follows:

```bibtex
@article{chisari2025robotic,
  title={Robotic Task Ambiguity Resolution via Natural Language Interaction},
  author={Chisari, Eugenio and von Hartz, Jan Ole and Despinoy, Fabien and Valada, Abhinav},
  journal={IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025}
}
```
```
conda create --name ambres_env python=3.10
conda activate ambres_env
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install -e .
bash bash/download_weights.sh
pip install git+https://github.com/facebookresearch/sam2.git
```

Download the dataset from this link. Then extract it under the folder `~/datasets/`.
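The extraction step can be scripted. A minimal sketch, assuming the archive was saved under `~/Downloads/` with the placeholder name `ambres_dataset.tar.gz` (substitute the actual filename served by the download link):

```python
# Sketch: extract the downloaded dataset archive under ~/datasets/.
# The archive name and download location below are placeholders --
# use whatever file the download link actually provides.
import tarfile
from pathlib import Path

datasets_dir = Path.home() / "datasets"
datasets_dir.mkdir(parents=True, exist_ok=True)

archive = Path.home() / "Downloads" / "ambres_dataset.tar.gz"  # placeholder name
if archive.exists():
    with tarfile.open(archive) as tar:
        tar.extractall(datasets_dir)
    print(f"extracted to {datasets_dir}")
else:
    print(f"archive not found at {archive} -- download it first")
```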
To download the pre-trained checkpoints, run

```
bash bash/download_ckpts.sh
```

Note that inference will require a GPU with at least 20 GB of RAM. To reproduce all results reported in Table I of the paper, run the following evaluations:
```
python scripts/evaluate_ambres.py --env real --model_type prompt
python scripts/evaluate_ambres.py --env sim --model_type prompt
python scripts/evaluate_ambres.py --env real --model_type finetune
python scripts/evaluate_ambres.py --env sim --model_type finetune
python scripts/evaluate_knowno.py --env real
python scripts/evaluate_knowno.py --env sim
```

Note that training will require GPUs with at least 46 GB of RAM. To start a training run, use the following command:
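Given the memory requirements above (20 GB for inference, 46 GB for training), it can help to check the visible GPUs before launching. A small sketch that queries `nvidia-smi` if it is installed and degrades gracefully if it is not:

```python
# Sketch: report the total memory (in GiB) of each visible NVIDIA GPU.
# Returns an empty list on machines without nvidia-smi.
import shutil
import subprocess

def gpu_memory_gib() -> list[float]:
    if shutil.which("nvidia-smi") is None:
        return []
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    # one line per GPU, total memory reported in MiB
    return [int(line) / 1024 for line in out.split() if line]

print(gpu_memory_gib() or "no NVIDIA GPU detected")
```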
```
deepspeed --include localhost:0,1 scripts/train.py --env real
```

- The `--include localhost:0,1` flag limits training to GPUs 0 and 1. Leave it out if you wish to use all available GPUs. See this doc for more information.
- The `--env` flag takes either `sim` or `real` and determines which dataset the model is trained on: the simplified simulated images or the real-world images.
- If someone else is already running distributed training on the same machine, you will need to change the master port:

  ```
  deepspeed --master_port 12344 --include localhost:0,1 scripts/train.py
  ```

- Note that to evaluate your own models, you will have to change the checkpoint name in the `CKPT` class in `ambiguity_resolution/ambres/__init__.py`.
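Rather than hard-coding an alternative port (e.g. 12344) and hoping it is free, you can ask the OS for an unused one and pass the printed value to `--master_port`. A minimal sketch using only the standard library:

```python
# Sketch: ask the OS for a free TCP port to pass to deepspeed's --master_port.
import socket

def find_free_port() -> int:
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0: the OS assigns an unused port
        return s.getsockname()[1]

print(find_free_port())
```

Note the port is only guaranteed free at the moment of the check; pass it to `deepspeed` promptly before another process can claim it.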
See the LICENSE file for details about the license under which this code is made available.