
DeepSeek-V3 routing bias doesn't get updated? #2494

@philip-essential


Bug report

I see where the routing bias is declared here, but I haven't been able to find where it gets updated. This describes the bias as "learnable", which as far as I can tell is inaccurate. Since the bias is only added to the scores that go into top-k, and only the resulting expert indices are used downstream, it never receives a gradient.

I would have expected to see the number of tokens processed by each expert stored in the intermediates, and then in train_step update the bias based on that. Is there another way it's getting updated that I missed?
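To make the expectation concrete, here is a minimal JAX sketch of that flow. It is not the repository's code: `route_with_bias`, `update_routing_bias`, and `update_rate` are hypothetical names, and the sign-based update rule is an assumption based on the auxiliary-loss-free balancing scheme described for DeepSeek-V3.

```python
import jax
import jax.numpy as jnp


def route_with_bias(scores, bias, k):
    """Pick top-k experts from biased scores; gate with the unbiased scores.

    scores: [tokens, num_experts] router affinities (hypothetical input).
    bias:   [num_experts] routing bias, used for selection only.
    """
    # The bias only influences which experts are chosen.
    _, expert_idx = jax.lax.top_k(scores + bias, k)  # [tokens, k]
    # Gate weights come from the original scores, so no gradient reaches bias.
    gates = jnp.take_along_axis(scores, expert_idx, axis=-1)
    gates = gates / jnp.sum(gates, axis=-1, keepdims=True)
    # Tokens routed to each expert; the caller could store this in intermediates.
    load = jnp.bincount(expert_idx.reshape(-1), length=scores.shape[-1])
    return expert_idx, gates, load


def update_routing_bias(bias, load, update_rate=1e-3):
    """Assumed rule: lower the bias of overloaded experts, raise underloaded ones."""
    mean_load = jnp.mean(load.astype(jnp.float32))
    return bias + update_rate * jnp.sign(mean_load - load)


# Example: three tokens, four experts, top-2 routing.
scores = jnp.array([[0.9, 0.1, 0.5, 0.2],
                    [0.8, 0.2, 0.6, 0.1],
                    [0.7, 0.3, 0.4, 0.2]])
bias = jnp.zeros(4)
_, _, load = route_with_bias(scores, bias, k=2)
new_bias = update_routing_bias(bias, load)  # applied outside the gradient step
```

In this sketch the update would run in `train_step` after the forward pass, separately from the optimizer, since `jax.grad` with respect to `bias` is zero.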

Logs/Output

No response

Environment Information

No response

Additional Context

Essential AI
