*Equal contribution. †Corresponding author.
• 2025.12: 🔥 Our paper, training code, and project page are released.
TL;DR: We propose WorldWander, an in-context learning framework for translating between egocentric and exocentric worlds in video generation. We also release EgoExo-8K, a large-scale dataset containing synchronized egocentric–exocentric triplets. The teaser is shown below:

Video diffusion models have recently achieved remarkable progress in realism and controllability. However, achieving seamless video translation across different perspectives, such as first-person (egocentric) and third-person (exocentric), remains underexplored. Bridging these perspectives is crucial for filmmaking, embodied AI, and world models.
Motivated by this, we present WorldWander, an in-context learning framework tailored for translating between egocentric and exocentric worlds in video generation. Building upon advanced video diffusion transformers, WorldWander integrates (i) In-Context Perspective Alignment and (ii) Collaborative Position Encoding to efficiently model cross-view synchronization.
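To convey the intuition of in-context cross-view conditioning only, here is an illustrative sketch (all names and shapes are assumptions for exposition, not the actual WorldWander implementation): tokens of the given view and of the view to be generated share one sequence inside the diffusion transformer, while shared frame indices plus a view index keep the two perspectives synchronized.

```python
# Conceptual sketch of in-context cross-view conditioning (illustrative only;
# tensor names and shapes are assumptions, not the actual WorldWander code).
import torch

B, F, N, D = 1, 8, 16, 64                  # batch, frames, tokens per frame, channels
source_tokens = torch.randn(B, F, N, D)    # tokens of the given view (e.g., exocentric)
target_tokens = torch.randn(B, F, N, D)    # noisy tokens of the view to generate (e.g., egocentric)

# In-context conditioning: both views live in one token sequence for the transformer.
context = torch.cat([source_tokens, target_tokens], dim=2).flatten(1, 2)  # (B, F*2N, D)

# Shared frame indices keep the two views aligned in time; a view index tells them apart.
frame_ids = torch.arange(F).repeat_interleave(2 * N).expand(B, -1)        # (B, F*2N)
view_ids = torch.tensor([0] * N + [1] * N).repeat(F).expand(B, -1)        # (B, F*2N)
```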
The overall framework is shown below:

To further support our task, we curate EgoExo-8K, a large-scale dataset containing synchronized egocentric–exocentric triplets from both synthetic and real-world scenarios.
We show some examples below:

git clone https://github.com/showlab/WorldWander.git
# Option 1: installation with requirements.txt
conda create -n WorldWander python=3.10
conda activate WorldWander
pip install -r requirements.txt
# Option 2: installation with environment.yml
conda env create -f environment.yml
conda activate WorldWander
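After installation, a quick sanity check can confirm that PyTorch sees your GPU (a minimal sketch; it assumes requirements.txt installs a CUDA-enabled PyTorch build):

```python
# Minimal environment check (assumes a CUDA-enabled PyTorch from requirements.txt).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```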
WorldWander is trained on the wan2.2-TI2V-5B model using 4 H200 GPUs, with a batch size of 4 per GPU. To make it easier to get started, we provide the following checkpoints for different tasks:
| Models | Links | Configs |
|---|---|---|
| wan2.2-TI2V-5B_three2one_synthetic | 🤗 Huggingface | configs/wan2-2_lora_three2one_synthetic.yaml |
| wan2.2-TI2V-5B_one2three_synthetic | 🤗 Huggingface | configs/wan2-2_lora_one2three_synthetic.yaml |
| wan2.2-TI2V-5B_three2one_realworld | 🤗 Huggingface | configs/wan2-2_lora_three2one_realworld.yaml |
| wan2.2-TI2V-5B_one2three_realworld | 🤗 Huggingface | configs/wan2-2_lora_one2three_realworld.yaml |
You can download any of the checkpoints above and specify the corresponding config file for inference. For convenience, we provide the following example script:
bash scripts/inference_wan2.sh
Note that the ckpt_path parameter needs to be updated to the path of the checkpoint you downloaded.
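For example, a checkpoint can be fetched programmatically with huggingface_hub (a minimal sketch; the repo ID and local directory below are placeholders, so substitute the actual Hugging Face link from the table above):

```python
# Minimal download sketch using huggingface_hub (pip install huggingface_hub).
# NOTE: the repo_id below is a placeholder; replace it with the repo linked in the table above.
from huggingface_hub import snapshot_download

ckpt_dir = snapshot_download(
    repo_id="your-org/wan2.2-TI2V-5B_three2one_synthetic",   # placeholder repo ID
    local_dir="./checkpoints/wan2.2-TI2V-5B_three2one_synthetic",
)
print("Checkpoint downloaded to:", ckpt_dir)
# Point ckpt_path in scripts/inference_wan2.sh to this directory.
```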
We recommend running inference on a GPU with at least 80 GB of VRAM to avoid out-of-memory errors.
You can also train on your custom dataset. To do so, first adjust first_video_root, third_video_root, ref_image_root, and the other parameters in the corresponding config file. You may also need to modify the CustomTrainDataset class in dataset/custom_dataset.py to match the attributes of your own dataset, as sketched below.
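As a rough reference, the sketch below shows the kind of triplet interface such a dataset class is expected to expose (an illustrative assumption, not the actual CustomTrainDataset implementation; adapt the file layout and decoding to your data):

```python
# Illustrative sketch of a paired ego/exo dataset; not the actual CustomTrainDataset.
import os
from torch.utils.data import Dataset

class MyEgoExoDataset(Dataset):  # hypothetical class name for a custom dataset
    def __init__(self, first_video_root, third_video_root, ref_image_root):
        # Assumes matching basenames across the three roots, e.g. 0001.mp4 / 0001.mp4 / 0001.png.
        self.first_video_root = first_video_root
        self.third_video_root = third_video_root
        self.ref_image_root = ref_image_root
        self.names = sorted(os.path.splitext(f)[0] for f in os.listdir(first_video_root))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, idx):
        name = self.names[idx]
        # Return file paths here; replace with your own video/image decoding as needed.
        return {
            "first_video": os.path.join(self.first_video_root, name + ".mp4"),
            "third_video": os.path.join(self.third_video_root, name + ".mp4"),
            "ref_image": os.path.join(self.ref_image_root, name + ".png"),
        }
```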
For convenience, we have also provided the following training script:
bash scripts/train_wan2.sh
🙏 This codebase borrows parts from DiffSynth-Studio and Wan2.2. Many thanks to them for their open-source contributions. I also want to thank my co-first author for his trust and support, and to anonymously thank the senior who taught me PyTorch Lightning, enabling me to build the training code from scratch on my own.
👋 If you find this code useful for your research, we would appreciate it if you could cite:
@article{song2025worldwander,
title={WorldWander: Bridging Egocentric and Exocentric Worlds in Video Generation},
author={Song, Quanjian and Song, Yiren and Peng, Kelly and Gao, Yuan and Shou, Mike Zheng},
journal={arXiv preprint arXiv:2511.22098},
year={2025}
}