Open
Labels
bug: Something isn't working
Description
Bug report
I see where the routing bias is declared here, but I haven't been able to find where it gets updated. This describes the bias as "learnable", which as far as I can tell is inaccurate: the bias is only used in the input to top-k, and since the expert selection that top-k produces is not differentiable, the bias never receives a gradient.
I would have expected the number of tokens processed by each expert to be stored in the intermediates, and the bias to then be updated from those counts in train_step. Is there another way it's getting updated that I missed?
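To make the expectation concrete, here is a minimal JAX sketch of the kind of update I had in mind. It is not taken from this repository: the function names `route`, `tokens_per_expert`, and `update_bias`, the `update_rate` value, and the sign-based update rule are all assumptions for illustration. It also shows why the bias gets a zero gradient when it only feeds top-k.

```python
import jax
import jax.numpy as jnp


def route(scores, bias, k):
    """Pick k experts per token. Hypothetical helper, not the repo's API.

    scores: [tokens, experts] router scores; bias: [experts].
    """
    # The bias only influences WHICH experts are selected.
    _, idx = jax.lax.top_k(scores + bias, k)
    # Gate weights come from the unbiased scores, so the bias never
    # appears on a differentiable path.
    weights = jnp.take_along_axis(scores, idx, axis=-1)
    return idx, weights


def tokens_per_expert(idx, num_experts):
    """Count how many token slots were routed to each expert."""
    return jnp.bincount(idx.reshape(-1), length=num_experts)


def update_bias(bias, counts, update_rate=1e-3):
    """Assumed rule: nudge overloaded experts down, underloaded experts up."""
    mean_load = counts.mean()
    return bias + update_rate * jnp.sign(mean_load - counts)


def bias_gradient(scores, bias, k):
    """Gradient of the summed gate weights with respect to the bias."""
    def loss(b):
        _, weights = route(scores, b, k)
        return weights.sum()
    return jax.grad(loss)(bias)
```

In a setup like this, `train_step` would read the per-expert counts back (for example from the intermediates), call something like `update_bias`, and write the new bias into the parameters outside of the optimizer, since `bias_gradient` returns all zeros.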
Logs/Output
No response
Environment Information
No response
Additional Context
Essential AI