Feature request
This is a sampling method, already present in other LLM inference backends, that aims to simplify the truncation process and help compensate for the flaws/failings of Top P & Top K.
Min P.
What Min P does is simple: it sets a minimum probability that a token must reach to be considered during sampling. However, this is not a hard limit; the minimum scales with the top token's probability. So a Min P value of 0.1 means the base requirement is 10% of the top token's probability: if the top token is at 25%, only tokens with at least 2.5% probability are considered.
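To make the scaling concrete, here is a minimal sketch of the filtering step in PyTorch. The function name and signature are illustrative only, not an existing transformers API:

```python
import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.05) -> torch.Tensor:
    """Mask out tokens whose probability is below min_p * p(top token).

    `logits` is assumed to be a 1-D tensor of next-token logits.
    """
    probs = torch.softmax(logits, dim=-1)
    threshold = min_p * probs.max()   # scaled cutoff, e.g. 0.1 * 0.25 = 0.025
    keep = probs >= threshold
    return logits.masked_fill(~keep, float("-inf"))
```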
This method subjectively seems to improve results across the board with no noticeable downside, and has been merged into the following FOSS LLM backends:
- llama.cpp
- vllm
- text-generation-webui (through both the HF loaders and llama-cpp-python)
I would suggest a default of 0.05.
Motivation
I noticed certain 'flaws' in the popular Top P sampling method:
- When the model does not have sufficient confidence/concentration on the next token candidate(s), it's possible for the sampler to consider many tokens that are highly unlikely compared to the few choices it has confidence in.
- Top K, as a supplement to Top P, caps the number of 'low confidence' tokens outright, but this often comes at the cost of token choice diversity (and the cutoff is often arbitrary).
- In addition to this, Top P can sometimes cut reasonable tokens. What if there is a 90.1% probability token followed by a 9% probability token? A Top P value of 0.90 would completely gloss over the 9% token in this instance (see the sketch after this list).
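A small worked example of that case (the numbers are illustrative only), showing how Top P at 0.90 keeps just the single top candidate while a Min P of 0.05 keeps both strong ones:

```python
import torch

# Illustrative distribution: one dominant token, one reasonable token,
# and the remaining mass spread thin across noise tokens.
probs = torch.tensor([0.901, 0.090] + [0.0009] * 10)

# Top P (0.90): the cumulative sum already exceeds 0.90 at the first token,
# so the 9% token is cut.
sorted_probs, _ = probs.sort(descending=True)
cutoff = (sorted_probs.cumsum(dim=-1) >= 0.90).nonzero()[0].item() + 1
print(sorted_probs[:cutoff])   # tensor([0.9010]) -- only one candidate survives

# Min P (0.05): threshold = 0.05 * 0.901 ≈ 0.045, so both strong tokens
# survive while the 0.09% noise tokens are dropped.
keep = probs >= 0.05 * probs.max()
print(probs[keep])             # tensor([0.9010, 0.0900])
```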
For this reason I made Min P, which seems to have been positively received across the board.
Your contribution
I may consider making a PR for this.