Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
Support for a new model from Huawei: https://huggingface.co/IntervitensInc/pangu-pro-moe-model
https://gitcode.com/ascend-tribe/pangu-pro-moe-model
Motivation
It seems to be optimized for balanced multi-device inference:
We proposed a new type of Mixture of Grouped Experts (MoGE), which groups experts in the expert selection stage and constrains tokens to activate an equal number of experts in each group, thereby achieving natural load balancing between devices. Based on the MoGE architecture, we built a Pangu Pro MoE model with a total parameter size of 72B and an activation parameter size of 16B:
- MoGE configuration: 4 shared experts; 64 routed experts divided into 8 groups, with 1 expert activated per group
- Pre-training: 15T tokens
So a consistent 1 expert per device across 8 devices, which is neat.
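
For anyone skimming the routing description above, here is a minimal sketch of how grouped top-1 routing of this kind could look. This is purely illustrative and assumes made-up names and shapes (e.g. `moge_route`, a `(n_tokens, 64)` logit matrix); it is not taken from the Pangu Pro MoE code or from any existing llama.cpp implementation.

```python
# Illustrative sketch of MoGE-style grouped routing: 64 routed experts split
# into 8 groups, exactly 1 expert activated per group, plus 4 shared experts
# that every token always uses. All names/shapes here are assumptions.
import numpy as np

N_EXPERTS = 64      # routed experts
N_GROUPS = 8        # one group per device in the balanced setup
GROUP_SIZE = N_EXPERTS // N_GROUPS
N_SHARED = 4        # shared experts, always active for every token

def moge_route(router_logits: np.ndarray) -> np.ndarray:
    """Pick one routed expert per group for each token.

    router_logits: (n_tokens, N_EXPERTS) scores from the gating network.
    Returns an array of shape (n_tokens, N_GROUPS) holding the chosen
    expert id in each group, so every token activates exactly N_GROUPS
    routed experts -- one per group, hence one per device.
    """
    n_tokens = router_logits.shape[0]
    grouped = router_logits.reshape(n_tokens, N_GROUPS, GROUP_SIZE)
    best_in_group = grouped.argmax(axis=-1)            # (n_tokens, N_GROUPS)
    group_offsets = np.arange(N_GROUPS) * GROUP_SIZE   # map back to global expert ids
    return best_in_group + group_offsets

# Example: route 2 tokens; each row prints 8 expert ids, one per group.
logits = np.random.randn(2, N_EXPERTS)
print(moge_route(logits))
```

Because every token always lands on exactly one expert per group, the per-device load is uniform by construction, which is the load-balancing property the model card highlights.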
New models just won't stop coming!