📄 [Paper] | 🤗 [Hugging Face Math checkpoints] | 🤗 [Hugging Face Code checkpoints] | 💻 [Code] | 📊 [Log]
- Logs
- Training Scripts
Follow the environment setup instructions of NVIDIA/Megatron-LM.
All training data used in this work are publicly available. Please refer to the paper for details. We sincerely thank the contributors who made these datasets publicly accessible.
Follow the environment setup instructions of volcengine/verl.
We use lm-evaluation-harness.
- For code tasks, we use commit `82a9936`.
- For all other evaluations, we use commit `1044db9`.
```shell
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness/
git checkout 1044db9
pip install -e .
pip install vllm
```

You can reproduce the results by running `scripts/lm-evaluation-harness/math_eval.sh`.
For code evaluation, use `scripts/lm-evaluation-harness/code_eval.sh`.
For task loss evaluation, please follow the README in the `taskloss-eval` directory: `taskloss-eval/README.md`
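Task loss here refers to the model's average next-token cross-entropy on task data; see the paper and the `taskloss-eval` README for the exact setup. As a rough illustration of the metric only (not code from this repository, and using hypothetical logits), it can be sketched in plain Python:

```python
import math

def token_nll(logits, target):
    """Negative log-likelihood of one target token under raw logits,
    computed with a numerically stable log-softmax."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return lse - logits[target]

def task_loss(logit_rows, targets):
    """Mean cross-entropy over (logits, target-token) pairs."""
    losses = [token_nll(row, t) for row, t in zip(logit_rows, targets)]
    return sum(losses) / len(losses)

# Uniform logits over two classes give a loss of ln(2) per token.
print(task_loss([[0.0, 0.0], [0.0, 0.0]], [0, 1]))  # ~0.6931
```

In a real evaluation the logits would come from the model's forward pass over the task corpus; this sketch only pins down the quantity being averaged.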
For test-time compute, please refer to the following script: `evaluate_gsm8k.sh`
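Test-time-compute evaluations of this kind (repeated sampling, as in ScalingIntelligence/large_language_monkeys) are commonly scored with the unbiased pass@k estimator of Chen et al. (2021). The following is a generic sketch of that estimator, not code taken from this repository:

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k: the probability that a random size-k subset of
    n generated samples contains at least one correct answer, given
    that c of the n samples are correct.
    pass@k = 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must include a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # ~0.3, equal to raw accuracy when k = 1
```

Averaging this quantity over problems gives the pass@k curve as a function of the sampling budget k.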
We would like to express our sincere gratitude to the developers and maintainers of the following open-source libraries. Their contributions and the fact that these codebases are publicly available have been essential for conducting this research.
- NVIDIA/Megatron-LM
- volcengine/verl
- EleutherAI/lm-evaluation-harness
- ScalingIntelligence/large_language_monkeys
```bibtex
@article{nakamura2025optimalsparsitymixtureofexpertslanguage,
  title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
  author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
  year={2025},
  eprint={2508.18672},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2508.18672},
}
```