DeepSeek-V3 routing bias doesn't get updated?

### Bug report

I see where the routing bias is declared [here](https://github.com/AI-Hypercomputer/maxtext/blob/07205e24f770501214735cf80ff6358d6c831915/src/MaxText/layers/moe.py#L196-L199), but I haven't been able to find where it gets updated. [This](https://github.com/AI-Hypercomputer/maxtext/blob/07205e24f770501214735cf80ff6358d6c831915/src/MaxText/layers/moe.py#L156) describes the bias as "learnable", which as far as I can tell is inaccurate. Since the bias is only added to the scores that feed top-k, it only affects which experts are selected; the selection indices are not differentiable, so the bias never receives a gradient and the optimizer can't train it.
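To illustrate the gradient point, here is a minimal JAX sketch. It is not the MaxText code: `gate_weights` and `loss` are hypothetical names, and it assumes the bias is added only to the top-k input while the gate weights are gathered from the unbiased scores.

```python
import jax
import jax.numpy as jnp


def gate_weights(bias, scores, k=2):
    # Assumption: the bias only influences WHICH experts are selected.
    _, top_k_indices = jax.lax.top_k(scores + bias, k)
    # The gate weights are gathered from the unbiased scores.
    return jnp.take_along_axis(scores, top_k_indices, axis=-1)


def loss(bias, scores):
    # Stand-in for the training loss; any function of the gate weights works.
    return jnp.sum(gate_weights(bias, scores) ** 2)


# 8 tokens, 4 experts, sigmoid affinity scores.
scores = jax.nn.sigmoid(jax.random.normal(jax.random.PRNGKey(0), (8, 4)))
bias = jnp.zeros((4,))

# The only path from bias to the loss goes through integer indices,
# so the gradient with respect to the bias is all zeros.
bias_grad = jax.grad(loss)(bias, scores)
print(bias_grad)
```

If the real layer follows this pattern, a gradient-based optimizer would leave the bias at its initial value no matter how unbalanced the expert load is.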

I would have expected to see the number of tokens processed by each expert stored in the intermediates, and then the bias updated in `train_step` based on those counts. Is there another way it's getting updated that I missed?
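Roughly, I was expecting something like the sketch below. It is only an assumption about how this could look, not existing MaxText code: `count_tokens_per_expert`, `update_routing_bias`, and `update_rate` are hypothetical names, and the sign-based rule is my reading of the auxiliary-loss-free balancing described for DeepSeek-V3 (lower the bias of overloaded experts, raise it for underloaded ones).

```python
import jax
import jax.numpy as jnp


def count_tokens_per_expert(top_k_indices, num_experts):
    # top_k_indices: [tokens, k] integer expert ids chosen by the router.
    one_hot = jax.nn.one_hot(top_k_indices, num_experts)  # [tokens, k, experts]
    return one_hot.sum(axis=(0, 1))


def update_routing_bias(bias, tokens_per_expert, update_rate=1e-3):
    # Hypothetical rule: step the bias by a fixed amount toward balance.
    mean_load = tokens_per_expert.mean()
    return bias + update_rate * jnp.sign(mean_load - tokens_per_expert)


# Toy routing result: 4 tokens, top-2 routing, 4 experts; expert 0 is overloaded.
top_k_indices = jnp.array([[0, 1], [0, 2], [0, 1], [0, 3]])
counts = count_tokens_per_expert(top_k_indices, num_experts=4)
new_bias = update_routing_bias(jnp.zeros((4,)), counts)
print(counts, new_bias)
```

In this toy case expert 0 receives 4 of the 8 assignments, so its bias goes down, while experts 2 and 3 (1 assignment each) go up. The update sits outside the gradient path, which is why I'd expect it in `train_step` rather than in the optimizer.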

### Logs/Output

_No response_

### Environment Information

_No response_

### Additional Context

Essential AI

DeepSeek-V3 routing bias doesn't get updated? #2494
