Mitty

Mitty: Diffusion-based Human-to-Robot Video Generation
Yiren Song, Cheng Liu, Weijia Mao, and Mike Zheng Shou
Show Lab, National University of Singapore


🔧 Environment & Installation

1. Create environment

conda create -n mitty python=3.10 -y
conda activate mitty

2. Install dependencies

pip install -r requirements.txt

📦 HuggingFace Models & Datasets

1. Pretrained model

The fine-tuned Mitty models will be available at:

  • Model:
    • https://huggingface.co/showlab/Mitty_Model

2. Dataset

The paired human–robot dataset will be released as a HuggingFace dataset:

  • Dataset:
    • https://huggingface.co/datasets/showlab/Mitty_Dataset
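
Once the repositories above are published, they could be fetched with the `huggingface_hub` client. The snippet below is a minimal sketch, not part of this codebase: the function name `download_mitty_assets` and the `checkpoints/` and `dataset/` target directories are illustrative assumptions, and it requires `pip install huggingface_hub`.

```python
MODEL_REPO = "showlab/Mitty_Model"
DATASET_REPO = "showlab/Mitty_Dataset"


def download_mitty_assets(model_dir="checkpoints/Mitty_Model", dataset_dir="dataset"):
    """Download the model and dataset snapshots and return their local paths.

    The target directories are assumptions for illustration; point them
    wherever your training and inference scripts expect the files.
    """
    # Imported lazily so this module can be loaded without huggingface_hub.
    from huggingface_hub import snapshot_download

    model_path = snapshot_download(repo_id=MODEL_REPO, local_dir=model_dir)
    dataset_path = snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",  # dataset repos need an explicit repo_type
        local_dir=dataset_dir,
    )
    return model_path, dataset_path
```

The call fails with a "repository not found" error until the corresponding repository is actually available.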

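A recommended directory layout is shown below. Each human video has a robot video with the same file name, and the .txt file next to a human video holds its text prompt: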

dataset/
  ├── human/
  │   ├── xxx_00001.mp4
  │   ├── xxx_00001.txt   # prompt
  │   └── ...
  └── robot/
      ├── xxx_00001.mp4
      └── ...
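
Before training, it can help to check that every human video has a matching robot video and prompt. The following is a minimal sketch that assumes the layout above, with pairs matched by identical file names; `collect_pairs` is a hypothetical helper, not a function provided by this repository.

```python
from pathlib import Path


def collect_pairs(root):
    """Return (human_video, robot_video, prompt) triples matched by file name."""
    root = Path(root)
    pairs = []
    for human_video in sorted((root / "human").glob("*.mp4")):
        robot_video = root / "robot" / human_video.name
        prompt_file = human_video.with_suffix(".txt")
        if not robot_video.exists() or not prompt_file.exists():
            # Assumption: incomplete samples are skipped rather than treated as errors.
            continue
        prompt = prompt_file.read_text(encoding="utf-8").strip()
        pairs.append((human_video, robot_video, prompt))
    return pairs
```

Calling `collect_pairs("dataset")` and comparing the length of the result with the number of files in `dataset/human/` gives a quick signal of missing robot videos or prompts.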

🚀 Training & 🎬 Inference

We provide simple shell scripts to launch training and inference.

1. Training

Edit _scripts/train.sh to set your dataset paths, output directory, and training hyperparameters.
Then run:

_scripts/train.sh

This will start training Mitty on the paired human–robot dataset.

2. Inference

Edit _scripts/inference.sh to point to your trained checkpoint (or the released pretrained model) and specify the input human video / prompt and output directory.
Then run:

_scripts/inference.sh

This will generate corresponding robot videos from the human inputs using the Mitty model.


📚 Citation

If you use this codebase or the released models / dataset in your research, please cite the Mitty paper.

@article{mitty2025,
    title  = {Mitty: Diffusion-based Human-to-Robot Video Generation},
    author = {Yiren Song and Cheng Liu and Weijia Mao and Mike Zheng Shou},
    year   = {2025},
}

About

Official code implementation of "Mitty: Diffusion-based Human-to-Robot Video Generation"