
Feature Request: Support (Huawei) Pangu Pro 72B MoE Model #14486

Open
@Downtown-Case

Description

Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Support for a new model from Huawei: https://huggingface.co/IntervitensInc/pangu-pro-moe-model

https://gitcode.com/ascend-tribe/pangu-pro-moe-model

Motivation

It appears to be designed for evenly balanced multi-device inference:

We proposed a new type of Mixture of Grouped Experts (MoGE), which groups experts in the expert selection stage and constrains tokens to activate an equal number of experts in each group, thereby achieving natural load balancing across devices. Based on the MoGE architecture, we built the Pangu Pro MoE model with 72B total parameters and 16B activated parameters:

  • MoGE configuration: 4 shared experts, 64 routed experts divided into 8 groups, with 1 expert activated per group
  • Pre-training: 15T tokens

So each token activates a consistent 1 routed expert per device across 8 devices, which is neat.
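For anyone unfamiliar with grouped routing, here is a minimal sketch of what that constraint looks like; the function name, the softmax-after-selection choice, and the router output layout are assumptions for illustration, not the model's actual gating code.

```python
import numpy as np

def grouped_top1_route(router_logits, n_groups=8, experts_per_group=8):
    """Hypothetical sketch of MoGE-style grouped routing: the routed
    experts are split into equal groups and every token activates
    exactly one expert in each group, so per-group (and hence
    per-device) load is identical by construction."""
    tokens, n_experts = router_logits.shape
    assert n_experts == n_groups * experts_per_group
    # View logits as (tokens, groups, experts_in_group).
    grouped = router_logits.reshape(tokens, n_groups, experts_per_group)
    # Top-1 within each group, independently per group.
    local_idx = grouped.argmax(axis=-1)                                   # (tokens, groups)
    picked = np.take_along_axis(grouped, local_idx[..., None], axis=-1)[..., 0]
    # Normalize the selected scores per token (softmax over the 8 picks).
    weights = np.exp(picked - picked.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Convert group-local indices to global expert ids (0..63).
    global_idx = local_idx + np.arange(n_groups) * experts_per_group
    return global_idx, weights

# Example with the card's config: 64 routed experts, 8 groups, top-1 per group.
logits = np.random.randn(4, 64)          # 4 tokens, 64 routed experts
idx, w = grouped_top1_route(logits)      # idx.shape == (4, 8): one expert per group
```

If each of the 8 groups sits on its own device, every token dispatches exactly one routed expert to every device, which is the load-balancing property the card describes (the 4 shared experts are always active and can be replicated or sharded separately).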

New models just won't stop coming!
