
zhijie-group/R1-Zero-VSI


Improved Visual-Spatial Reasoning via R1-Zero-Like Training

Zhenyi Liao, Qingsong Xie, Yanhao Zhang, Zijian Kong, Haonan Lu, Zhenyu Yang, Zhijie Deng

📅 News

  • 🚀 [04/02/2025] We release VSI-100k on Hugging Face.
  • 🚀 [04/02/2025] We release our paper on arXiv.

🌞 Highlights

🔔 We apply GRPO (Group Relative Policy Optimization) training to improve visual-spatial reasoning, using the carefully curated VSI-100k dataset.

🔔 With GRPO training, our vsGRPO-2B outperforms GPT-4o, and our vsGRPO-7B achieves performance comparable to the best open-source model, LLaVA-NeXT-Video-72B.
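R1-Zero-like training scores sampled answers with simple rules instead of a learned reward model, and GRPO converts the rewards of a group of completions for the same prompt into relative advantages. The sketch below illustrates both ideas under stated assumptions: the `<think>…</think><answer>…</answer>` template (assumed to follow the R1-V convention), the exact-match accuracy reward, and the helper names `format_reward`, `accuracy_reward`, and `group_relative_advantages` are hypothetical and not taken from this repository. The actual reward design, especially for numerical answers, may differ.

```python
import re
import statistics

# Assumed R1-V-style output template: reasoning in <think>, final answer in <answer>.
TEMPLATE = re.compile(r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL)


def format_reward(completion: str) -> float:
    """Hypothetical format reward: 1.0 if the completion follows the template, else 0.0."""
    return 1.0 if TEMPLATE.match(completion.strip()) else 0.0


def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Hypothetical accuracy reward: case-insensitive exact match on the <answer> content."""
    match = TEMPLATE.match(completion.strip())
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == ground_truth.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward against its own group's mean and std."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of four sampled completions (illustrative strings, not model outputs).
completions = [
    "<think>The chair is on the left of the table.</think><answer>left</answer>",
    "<think>It looks like the right side.</think><answer>right</answer>",
    "left",  # correct word but no template, so it earns no reward at all
    "<think>Left side.</think><answer>Left</answer>",
]
rewards = [format_reward(c) + accuracy_reward(c, "left") for c in completions]
advantages = group_relative_advantages(rewards)
print(rewards)  # [2.0, 1.0, 0.0, 2.0]
print([round(a, 3) for a in advantages])
```

Because rewards are normalized within the group, a completion is only rewarded relative to its siblings: if every sample for a prompt earns the same reward, all advantages are zero and that prompt contributes no policy-gradient signal.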

🤗 VSI-100k

To address data scarcity, we build VSI-100k: using the 3D annotations from ScanNet, we construct approximately 100k question-answer pairs.
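As a rough illustration of how per-object 3D annotations can be turned into templated question-answer pairs, consider the sketch below. The `SceneObject` record, the `build_qa_pairs` helper, the question templates, and the choice of question types (counting, distance, size) are hypothetical assumptions for illustration; they are not the actual VSI-100k pipeline or ScanNet's annotation format.

```python
import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical simplified object record (not ScanNet's on-disk format); units are meters."""
    label: str
    center: tuple[float, float, float]
    size: tuple[float, float, float]  # axis-aligned bounding-box extents


def build_qa_pairs(objects: list[SceneObject]) -> list[tuple[str, str]]:
    """Generate illustrative counting, distance, and size question-answer pairs for one scene."""
    counts = Counter(o.label for o in objects)
    qa: list[tuple[str, str]] = []

    # Object counting: one question per category present in the scene.
    for label, n in sorted(counts.items()):
        qa.append((f"How many {label}(s) are in this room?", str(n)))

    # Distance and size questions only use objects whose label is unique,
    # so the referring expression ("the table") is unambiguous.
    unique = [o for o in objects if counts[o.label] == 1]
    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            # Simplification: center-to-center distance, not closest-point distance.
            d = math.dist(a.center, b.center)
            qa.append((f"What is the distance between the {a.label} and the {b.label} (in meters)?", f"{d:.1f}"))
    for o in unique:
        longest_cm = round(max(o.size) * 100)
        qa.append((f"What is the length of the longest dimension of the {o.label} (in centimeters)?", str(longest_cm)))
    return qa


scene = [
    SceneObject("chair", (0.0, 0.0, 0.0), (0.5, 0.5, 1.0)),
    SceneObject("chair", (1.0, 0.0, 0.0), (0.5, 0.5, 1.0)),
    SceneObject("table", (0.0, 3.0, 0.0), (1.2, 0.8, 0.75)),
    SceneObject("bed", (4.0, 3.0, 0.0), (2.0, 1.6, 0.5)),
]
for question, answer in build_qa_pairs(scene):
    print(question, "->", answer)
```

Restricting distance and size questions to uniquely labeled objects avoids ambiguous references; a real pipeline would also have to handle details such as closest-point distances and answer tolerances for numerical questions.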

🔍 Experiments

Our vsGRPO-2B outperforms GPT-4o, and our vsGRPO-7B achieves performance comparable to the best open-source model, LLaVA-NeXT-Video-72B.

✒️ Citation

If you find our work and the dataset useful, please cite:

@article{liao2025improved,
  title={Improved visual-spatial reasoning via r1-zero-like training},
  author={Liao, Zhenyi and Xie, Qingsong and Zhang, Yanhao and Kong, Zijian and Lu, Haonan and Yang, Zhenyu and Deng, Zhijie},
  journal={arXiv preprint arXiv:2504.00883},
  year={2025}
}

📄 License

Usage and License Notices: The data and code are intended and licensed for research use only. They are released under the Attribution-NonCommercial 4.0 International license (CC BY-NC 4.0), and their use should also abide by OpenAI's terms of use: https://openai.com/policies/terms-of-use

Acknowledgement

We sincerely thank the R1-V and ScanNet projects, on which this project builds. We also thank trl, Qwen2-VL, and vllm for their open-source contributions.
