Zhenyi Liao, Qingsong Xie, Yanhao Zhang, Zijian Kong, Haonan Lu, Zhenyu Yang, Zhijie Deng
- 🚀 [04/02/2025] We release VSI-100k on Hugging Face.
- 🚀 [04/02/2025] We release our paper on arXiv.
🔔 We incorporate GRPO training for improved visual-spatial reasoning, using our carefully curated VSI-100k dataset.
🔔 With GRPO training, our vsGRPO-2B outperforms GPT-4o, and the vsGRPO-7B demonstrates performance comparable to the best open-source model, LLaVA-Video-Next-72B.
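At the heart of GRPO is a critic-free, group-relative advantage: several responses are sampled per prompt, and each response's reward is normalized against its group's mean and standard deviation. A minimal sketch of that normalization step (illustrative only; function name and reward values are hypothetical, not the project's actual training code):

```python
# Minimal sketch of GRPO's group-relative advantage estimate.
# Rewards here are assumed scalars, e.g. from a verifier that checks
# a model answer against the ground-truth label.
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its group's mean and std,
    as in GRPO's critic-free advantage estimate."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Four sampled responses: two correct (reward 1), two wrong (reward 0).
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
# Correct responses get positive advantage, wrong ones negative;
# advantages within a group sum to zero.
```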
To combat data scarcity, we build VSI-100k. Specifically, using the ScanNet 3D annotation information, we construct approximately 100k question-answer pairs.
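To give a flavor of how spatial QA pairs can be derived from 3D annotations, here is a hypothetical sketch that turns ScanNet-style object labels and bounding-box centers into an object-distance question; the actual VSI-100k construction pipeline may differ:

```python
# Hypothetical illustration: build a spatial QA pair from ScanNet-style
# 3D annotations (object label + bounding-box center). Not the actual
# VSI-100k pipeline.
import math

def distance_question(obj_a, obj_b):
    """Build an object-distance QA pair from two annotated objects,
    each given as a (label, (x, y, z) center) tuple."""
    (label_a, center_a), (label_b, center_b) = obj_a, obj_b
    dist = math.dist(center_a, center_b)  # Euclidean distance in meters
    question = f"How far apart are the {label_a} and the {label_b} (in meters)?"
    answer = f"{dist:.1f}"
    return question, answer

q, a = distance_question(("chair", (1.0, 0.0, 0.0)),
                         ("table", (4.0, 4.0, 0.0)))
# q: "How far apart are the chair and the table (in meters)?", a: "5.0"
```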
Our vsGRPO-2B outperforms GPT-4o, and the vsGRPO-7B demonstrates performance comparable to the best open-source model, LLaVA-Video-Next-72B.
If you find our work and the dataset useful, please cite:
@article{liao2025improved,
title={Improved Visual-Spatial Reasoning via R1-Zero-Like Training},
author={Liao, Zhenyi and Xie, Qingsong and Zhang, Yanhao and Kong, Zijian and Lu, Haonan and Yang, Zhenyu and Deng, Zhijie},
journal={arXiv preprint arXiv:2504.00883},
year={2025}
}
Usage and License Notices: The data and code are intended and licensed for research use only.
License: Attribution-NonCommercial 4.0 International. Use should also abide by the OpenAI terms of use: https://openai.com/policies/terms-of-use
We sincerely thank the R1-V and ScanNet projects, on which our project is built. We also thank trl, Qwen2-VL, and vllm for their open-source contributions.