@finbarrtimbers finbarrtimbers commented Sep 19, 2025

Originally, we didn't scale the loss by the number of tokens. We then added token-count tracking to our mean loss implementation. Since the mean loss implementation now tracks the number of tokens correctly, I have removed the (incorrect) sum loss implementation.

We found this problem during the prior summer, and @hamishivi added reduce_loss="sum" as a cheap workaround. The mean loss path now handles it much more cleanly.
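To illustrate the difference between the two reductions, here is a minimal sketch in plain Python. It is not the code from this PR: the function names `sum_loss` and `token_mean_loss` and the example loss values are hypothetical, and the real implementation operates on tensors rather than lists. It only shows why an unscaled sum grows with the number of tokens, while a mean that tracks the token count does not.

```python
def sum_loss(micro_batches):
    """Unscaled sum of every per-token loss (hypothetical helper)."""
    return sum(sum(batch) for batch in micro_batches)


def token_mean_loss(micro_batches):
    """Sum of per-token losses divided by the tracked token count (hypothetical helper)."""
    total_loss = 0.0
    total_tokens = 0
    for batch in micro_batches:
        total_loss += sum(batch)
        total_tokens += len(batch)
    return total_loss / total_tokens


# Made-up per-token losses for two micro-batches of different lengths.
micro_batches = [[1.0, 1.0, 1.0, 1.0], [3.0]]

# Repeating the same data doubles the token count without changing the per-token loss.
doubled = micro_batches + micro_batches

print(sum_loss(micro_batches), sum_loss(doubled))
print(token_mean_loss(micro_batches), token_mean_loss(doubled))
```

In this sketch the summed loss doubles when the same data is repeated, whereas the token-normalized mean stays the same, which is the scaling behavior the description above refers to.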

Fixes #995.

@finbarrtimbers finbarrtimbers changed the base branch from main to refactor-tests September 19, 2025 17:00
@finbarrtimbers finbarrtimbers changed the base branch from refactor-tests to main September 19, 2025 17:01
@finbarrtimbers finbarrtimbers marked this pull request as ready for review September 19, 2025 17:09

@hamishivi hamishivi left a comment

thanks!

@finbarrtimbers finbarrtimbers added this pull request to the merge queue Sep 19, 2025
Merged via the queue into main with commit bb98dbc Sep 19, 2025
3 checks passed
@finbarrtimbers finbarrtimbers deleted the scale-reduce-loss branch September 19, 2025 18:09
@finbarrtimbers finbarrtimbers restored the scale-reduce-loss branch September 19, 2025 18:09
@finbarrtimbers finbarrtimbers deleted the scale-reduce-loss branch September 19, 2025 19:30
Development

Successfully merging this pull request may close these issues.

reduce_sum == "sum" missing scaling factor
2 participants