rioyokotalab/optimal-sparsity

Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks

📄 [Paper] | 🤗 [Hugging Face Math checkpoints] | 🤗 [Hugging Face Code checkpoints] | 💻 [Code] | 📊 [Log]

TODOs

  • Logs
  • Training Scripts

Quickstart

Pre-training

Follow the environment setup instructions of NVIDIA/Megatron-LM.

All training data used in this work are publicly available. Please refer to the paper for details. We sincerely thank the contributors who made these datasets publicly accessible.

Post-training

Follow the environment setup instructions of volcengine/verl.

Evaluation

  • We use lm-evaluation-harness.

    • For code tasks, we use commit 82a9936.
    • For other evaluations, we use commit 1044db9.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness/
git checkout 1044db9
pip install -e .
pip install vllm
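Because the code and math evaluations pin different harness commits, one convenient way to keep both available at once is `git worktree`, which materializes each pinned commit in its own directory. This is a sketch, not part of the official scripts; the worktree paths are illustrative:

```shell
# Clone once, then create one worktree per pinned commit.
git clone https://github.com/EleutherAI/lm-evaluation-harness
cd lm-evaluation-harness/

# Worktree for math and other evaluations (commit 1044db9)
git worktree add ../lm-eval-math 1044db9

# Worktree for code evaluations (commit 82a9936)
git worktree add ../lm-eval-code 82a9936

# Editable-install whichever tree you are evaluating with.
pip install -e ../lm-eval-math
pip install vllm
```

Note that `pip install -e` points at a single tree at a time, so re-run it against the other worktree before switching between math and code evaluations.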

You can reproduce the math results by running scripts/lm-evaluation-harness/math_eval.sh. For code evaluation, use scripts/lm-evaluation-harness/code_eval.sh.
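The scripts above wrap the harness CLI. If you want to run a single task directly, a typical `lm_eval` invocation with the vLLM backend looks like the following; the checkpoint path and task name are placeholders, and the official scripts set the exact tasks and arguments used in the paper:

```shell
# Hypothetical direct invocation of the lm-evaluation-harness CLI;
# see scripts/lm-evaluation-harness/ for the exact settings used.
lm_eval \
  --model vllm \
  --model_args pretrained=/path/to/checkpoint,tensor_parallel_size=1 \
  --tasks gsm8k \
  --batch_size auto \
  --output_path results/
```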

Acknowledgement

We would like to express our sincere gratitude to the developers and maintainers of the open-source libraries this work builds on, including NVIDIA/Megatron-LM, volcengine/verl, and EleutherAI/lm-evaluation-harness. Their publicly available codebases were essential for conducting this research.

Citation

@misc{nakamura2025optimalsparsitymixtureofexpertslanguage,
      title={Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks},
      author={Taishi Nakamura and Satoki Ishikawa and Masaki Kawamura and Takumi Okamoto and Daisuke Nohara and Jun Suzuki and Rio Yokota},
      year={2025},
      eprint={2508.18672},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2508.18672},
}

About

Official implementation of "Optimal Sparsity of Mixture-of-Experts Language Models for Reasoning Tasks"
