Add Support for Frequency Penalties in On Device Sampling #523

Draft: wants to merge 17 commits into base: main

quic-sanising
Contributor

✨ Add Frequency Penalty Support to On Device Sampling

This PR adds support for the frequency_penalty parameter in On Device Sampling for QEffForCausalLM models. This parameter adjusts token selection based on how often tokens have already appeared in the generated output:

  • Positive values discourage repetition and promote diversity.
  • Negative values encourage repetition.
  • Zero disables the penalty.
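The three cases above can be illustrated with a minimal sketch. It assumes the common additive formulation, in which each logit is reduced by `frequency_penalty` times the number of times that token has already been generated. The function name `apply_frequency_penalty` and the plain-list representation are hypothetical and chosen for illustration only; they are not the code in this PR.

```python
from collections import Counter


def apply_frequency_penalty(logits, generated_token_ids, frequency_penalty):
    """Return a new list of logits with an additive frequency penalty applied.

    Assumption: logit[token] -= frequency_penalty * count(token), where count
    is the number of occurrences of the token in the generated output so far.
    """
    counts = Counter(generated_token_ids)
    return [
        logit - frequency_penalty * counts.get(token_id, 0)
        for token_id, logit in enumerate(logits)
    ]


# Toy vocabulary of four tokens; token 0 was generated twice, token 2 once.
logits = [2.0, 1.0, 0.5, 0.0]
generated = [0, 0, 2]

discouraged = apply_frequency_penalty(logits, generated, 0.5)   # positive value
encouraged = apply_frequency_penalty(logits, generated, -0.5)   # negative value
unchanged = apply_frequency_penalty(logits, generated, 0.0)     # zero disables
```

With a positive value, tokens 0 and 2 lose score in proportion to their counts. With a negative value they gain score. With zero the logits are returned as they were.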

The implementation tracks token frequencies directly on the QAIC device in scratch buffers, with the aim of keeping overhead low and preserving throughput. The feature works with the existing include_sampler=True workflow and can be combined with the other supported strategies, such as the repetition and presence penalties.
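As a rough host-side analogy for the frequency tracking described above, the sketch below keeps one running count buffer per batch row and updates it after every decode step. The class `FrequencyTracker` and its methods are hypothetical names. The real implementation runs on the QAIC device, and its buffer layout is not shown in this description.

```python
class FrequencyTracker:
    """Hypothetical per-sequence token counter, standing in for a scratch buffer."""

    def __init__(self, batch_size, vocab_size):
        # One count per (sequence, token) pair, all starting at zero.
        self.counts = [[0] * vocab_size for _ in range(batch_size)]

    def update(self, sampled_token_ids):
        # Record the token sampled for each sequence in this decode step.
        for row, token_id in zip(self.counts, sampled_token_ids):
            row[token_id] += 1

    def penalize(self, logits, frequency_penalties):
        # Assumed additive form: logit -= penalty * count, with one penalty
        # value per sequence in the batch.
        return [
            [logit - penalty * count for logit, count in zip(logit_row, count_row)]
            for logit_row, count_row, penalty in zip(
                logits, self.counts, frequency_penalties
            )
        ]


tracker = FrequencyTracker(batch_size=2, vocab_size=3)
tracker.update([1, 2])  # step 1: sequence 0 sampled token 1, sequence 1 sampled token 2
tracker.update([1, 0])  # step 2: sequence 0 sampled token 1 again, sequence 1 sampled token 0

step_logits = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
penalized = tracker.penalize(step_logits, frequency_penalties=[0.5, 0.0])
```

Keeping the counts as running state means each step only increments one entry per sequence, so the generated history does not have to be rescanned.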

quic-sanising and others added 17 commits June 18, 2025 13:38
Signed-off-by: quic-sanising <[email protected]>
Signed-off-by: sanising <[email protected]>
Contributor Author

quic-sanising commented Jul 24, 2025

Depends on PR #463.
