v1.1

sherry-huang-997 released this 08 Jan 13:35 · 1 commit to main since this release · 6f4cb1f

This release expands SimuMax from a pure estimator into a more complete, workflow-friendly platform. It introduces a new end-user application, adds strategy search capabilities, and provides a new system-config generation pipeline with compute/communication efficiency modeling. In addition, it improves compatibility with Megatron-LM 0.14 (notably for MoE) and enhances communication modeling for hybrid parallel setups.

Highlights

  • NEW! SimuMax App (User Application):

    • Added a user-facing application to SimuMax to improve usability and streamline common workflows.
  • NEW! Strategy Search:

    • Introduced strategy search support to help users explore and identify better parallelization and execution strategies automatically.
  • NEW! System Config Pipeline:

    • Added a pipeline to generate system configuration files, including computing efficiency and communication efficiency characterization, enabling more realistic system-level modeling.
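
To make the idea behind strategy search concrete, here is a minimal sketch of what such a search can look like: enumerate the tensor/pipeline/data-parallel layouts that fit a given GPU count, then rank them with a cost estimator. This is not SimuMax's API. The function names `enumerate_strategies` and `search`, their parameters, and the divisibility constraints are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Strategy = Tuple[int, int, int]  # (tp, pp, dp), hypothetical representation


def enumerate_strategies(
    world_size: int,
    gpus_per_node: int = 8,
    max_tp: int = 8,
    num_layers: int = 32,
) -> List[Strategy]:
    """List (tp, pp, dp) layouts with tp * pp * dp == world_size.

    Assumed constraints (illustrative only):
      - tp divides gpus_per_node, so a TP group stays inside one node
      - pp divides num_layers, so pipeline stages are evenly sized
    """
    strategies = []
    for tp in range(1, max_tp + 1):
        if world_size % tp or gpus_per_node % tp:
            continue
        rest = world_size // tp
        for pp in range(1, rest + 1):
            if rest % pp or num_layers % pp:
                continue
            strategies.append((tp, pp, rest // pp))
    return strategies


def search(
    world_size: int,
    estimate_step_time: Callable[[Strategy], float],
    **constraints: int,
) -> Strategy:
    """Return the candidate with the lowest estimated step time."""
    candidates = enumerate_strategies(world_size, **constraints)
    if not candidates:
        raise ValueError("no valid strategy for this world size")
    return min(candidates, key=estimate_step_time)
```

In practice `estimate_step_time` would be backed by a performance estimator; any callable that maps a `(tp, pp, dp)` tuple to a number works with this sketch, for example `search(16, my_estimator, num_layers=32)`.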

Compatibility & Modeling Improvements

  • Megatron-LM 0.14 Support (MoE Updates):

    • Added support for Megatron-LM v0.14.

    • Updated MoE communication behavior: router probabilities are transferred via a separate all-to-all, which:

      • introduces a small additional communication cost,
      • but reduces GPU memory usage.
  • Improved Bandwidth Contention Modeling (Hybrid Parallelism):

    • When EP/TP and DP are used together, SimuMax now models the inter-node bandwidth contention that arises when multiple DP groups compete for the same network bandwidth.
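
The two modeling changes above can be illustrated with a back-of-the-envelope sketch. It is not SimuMax's implementation. It assumes that the separate all-to-all carries one probability per routed token copy, that DP groups sharing an inter-node link each get an equal share of its bandwidth, and that gradients are reduced with a ring all-reduce. All function names and default values are hypothetical.

```python
def router_prob_alltoall_bytes(num_tokens: int, top_k: int, prob_bytes: int = 4) -> int:
    """Payload of the extra all-to-all, assuming one probability per routed token copy."""
    return num_tokens * top_k * prob_bytes


def token_alltoall_bytes(
    num_tokens: int, top_k: int, hidden_size: int, elem_bytes: int = 2
) -> int:
    """Payload of the hidden-state all-to-all, shown for comparison."""
    return num_tokens * top_k * hidden_size * elem_bytes


def shared_dp_bandwidth(inter_node_gbytes_per_s: float, concurrent_dp_groups: int) -> float:
    """Fair-share bandwidth per DP group (an assumed contention model)."""
    if concurrent_dp_groups < 1:
        raise ValueError("concurrent_dp_groups must be >= 1")
    return inter_node_gbytes_per_s / concurrent_dp_groups


def ring_allreduce_time_s(
    message_bytes: float, dp_size: int, bandwidth_gbytes_per_s: float
) -> float:
    """Bandwidth term of a ring all-reduce: 2 * (n - 1) / n * bytes / bandwidth."""
    if dp_size <= 1:
        return 0.0
    volume = 2 * (dp_size - 1) / dp_size * message_bytes
    return volume / (bandwidth_gbytes_per_s * 1e9)
```

Under these assumptions, the probability payload is smaller than the hidden-state payload by a factor of `hidden_size * elem_bytes / prob_bytes`, which is consistent with the "small additional communication cost" noted above. Passing the result of `shared_dp_bandwidth` into `ring_allreduce_time_s` shows how the estimated gradient-reduction time grows with the number of DP groups sharing a link.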