rioyokotalab/optimal-sparsity

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

📄 [Paper] | 🤗 [Hugging Face Math checkpoints] | 🤗 [Hugging Face Code checkpoints] | 💻 [Code] | 📊 [Log]

TODOs

  • Logs
  • Training Scripts

Quickstart

Pre-training

Follow the environment setup instructions of NVIDIA/Megatron-LM.

All training data used in this work are publicly available. Please refer to the paper for details. We sincerely thank the contributors who made these datasets publicly accessible.

Post-training

Follow the environment setup instructions of volcengine/verl.

Evaluation

  • We use lm-evaluation-harness.

    • For code tasks, we use commit 82a9936.
    • For other evaluations, we use commit 1044db9.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness/
git checkout 1044db9
pip install -e .
pip install vllm
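Because the code and math evaluations pin different harness commits, one convenient way to keep both available at once is `git worktree`, which materializes each pinned commit in its own directory. This is a sketch, not part of the official scripts; the worktree paths are illustrative:

```shell
# Clone once, then create one worktree per pinned commit.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness/

# Worktree for math and other evaluations (commit 1044db9)
git worktree add ../lm-eval-math 1044db9

# Worktree for code evaluations (commit 82a9936)
git worktree add ../lm-eval-code 82a9936

# Editable-install whichever tree you are evaluating with.
pip install -e ../lm-eval-math
pip install vllm
```

Note that `pip install -e` points at a single tree at a time, so re-run it against the other worktree before switching between math and code evaluations.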

You can reproduce the math results by running scripts/lm-evaluation-harness/math_eval.sh. For code evaluation, use scripts/lm-evaluation-harness/code_eval.sh.
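The scripts above wrap the harness CLI. If you want to run a single task directly, a typical `lm_eval` invocation with the vLLM backend looks like the following; the checkpoint path and task name are placeholders, and the official scripts set the exact tasks and arguments used in the paper:

```shell
# Hypothetical direct invocation of the lm-evaluation-harness CLI;
# see scripts/lm-evaluation-harness/ for the exact settings used.
lm_eval \
  --model vllm \
  --model_args pretrained=/path/to/checkpoint,tensor_parallel_size=1 \
  --tasks gsm8k \
  --batch_size auto \
  --output_path results/
```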

Acknowledgement

We would like to express our sincere gratitude to the developers and maintainers of the open-source libraries this work builds on, including NVIDIA/Megatron-LM, volcengine/verl, and EleutherAI/lm-evaluation-harness. Their publicly available codebases were essential for conducting this research.

Citation

@misc{nakamura2025optimalsparsitymixtureofexpertslanguage,
      title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
      author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
      year={2025},
      eprint={2508.18672},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.18672},
}

About

Official implementation of "Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks"
