This release expands SimuMax from a pure estimator into a more complete, workflow-friendly platform. It introduces a new end-user application, adds strategy search capabilities, and provides a new system-config generation pipeline with compute/communication efficiency modeling. In addition, it improves compatibility with Megatron-LM 0.14 (notably for MoE) and enhances communication modeling for hybrid parallel setups.
Highlights
- NEW! SimuMax App (User Application):
  - Added a user-facing application to SimuMax to improve usability and streamline common workflows.
- NEW! Strategy Search:
  - Introduced strategy search support to help users explore and identify better parallelization and execution strategies automatically.
- NEW! System Config Pipeline:
  - Added a pipeline that generates system configuration files, including characterization of compute efficiency and communication efficiency, enabling more realistic system-level modeling.
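To illustrate what a strategy search does in general, the following is a minimal sketch, not SimuMax's actual API or algorithm. It enumerates every tensor/pipeline/data-parallel split of a fixed GPU count and picks the one with the lowest cost. The names `Strategy`, `enumerate_strategies`, and `search`, the intra-node TP constraint, and the caller-supplied `cost` function are all assumptions made for this example.

```python
from itertools import product
from typing import Callable, Iterator, NamedTuple


class Strategy(NamedTuple):
    """Hypothetical container for one parallelization candidate."""
    tp: int  # tensor-parallel size
    pp: int  # pipeline-parallel size
    dp: int  # data-parallel size


def divisors(n: int) -> list[int]:
    """Return all positive divisors of n in ascending order."""
    return [d for d in range(1, n + 1) if n % d == 0]


def enumerate_strategies(world_size: int, gpus_per_node: int) -> Iterator[Strategy]:
    """Yield every (tp, pp, dp) with tp * pp * dp == world_size.

    Assumption for this sketch: TP must fit inside a single node.
    """
    for tp, pp in product(divisors(world_size), repeat=2):
        if tp > gpus_per_node or world_size % (tp * pp) != 0:
            continue
        yield Strategy(tp, pp, world_size // (tp * pp))


def search(world_size: int, gpus_per_node: int,
           cost: Callable[[Strategy], float]) -> Strategy:
    """Return the candidate with the lowest cost under the given cost function."""
    return min(enumerate_strategies(world_size, gpus_per_node), key=cost)
```

In a real workflow the `cost` callable would be backed by an estimator's predicted step time or memory footprint; here it is left to the caller so the sketch stays independent of any particular model.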
Compatibility & Modeling Improvements
- Megatron-LM 0.14 Support (MoE Updates):
  - Added support for Megatron-LM v0.14.
  - Updated MoE communication behavior: router probabilities are transferred via a separate all-to-all, which:
    - introduces a small additional communication cost,
    - but reduces GPU memory usage.
- Improved Bandwidth Contention Modeling (Hybrid Parallelism):
  - For cases using EP/TP + DP simultaneously, added modeling of inter-node bandwidth contention caused by multiple DP groups competing for network bandwidth.
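The effect of bandwidth contention can be sketched with a simple model. This is an illustrative approximation, not the formula SimuMax uses: it assumes that DP groups communicating over the same inter-node link at the same time split its bandwidth evenly, and it uses the standard ring all-reduce cost of `2 * (n - 1) / n * bytes / bandwidth`. The function names and parameters are hypothetical.

```python
def contended_bandwidth(link_bandwidth: float, concurrent_groups: int) -> float:
    """Per-group bandwidth when several DP groups share one inter-node link.

    Assumption for this sketch: bandwidth is split evenly among the groups.
    """
    if concurrent_groups < 1:
        raise ValueError("concurrent_groups must be at least 1")
    return link_bandwidth / concurrent_groups


def ring_allreduce_seconds(message_bytes: float, group_size: int,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only ring all-reduce time (latency terms are ignored)."""
    if group_size <= 1:
        return 0.0  # a single rank has nothing to reduce
    return 2 * (group_size - 1) / group_size * message_bytes / bandwidth_bytes_per_s


# Example: 1 GB of gradients, DP group of 4 ranks, 100 GB/s inter-node link.
uncontended = ring_allreduce_seconds(1e9, 4, contended_bandwidth(100e9, 1))
contended = ring_allreduce_seconds(1e9, 4, contended_bandwidth(100e9, 4))
print(f"uncontended: {uncontended:.4f} s, contended (4 groups): {contended:.4f} s")
```

Under the even-split assumption, four DP groups sharing a link make each group's all-reduce roughly four times slower than an uncontended estimate would predict, which is why ignoring contention can noticeably understate DP communication time in hybrid setups.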