Commit bc4be79 (parent: d28f622)

docs: add note about filtered token id map in gpu sampling

notes/llama.cpp/gpu-sampling.md (+30 −0)
And this could then be used as follows:

```
-ngl 99 \
-v
```

### Filtered token id map

We should probably rename the following to:
```c++
std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_logits;
std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_filtered_token_ids;
std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_tokens;
std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_probs;
```
The filtered token id map is used when a backend sampler reduces/sorts the
logits and we need to be able to map back to the original token id. For
example, top-k sampling will select the top k logits and sort them. The
distribution (dist) backend sampler computes an index into the logits, which
may have been filtered. If this is the case then the index is no longer an
index into the model's vocabulary but rather an index into the filtered
logits array.
For example:
- Full vocab has 32000 tokens
- Top-k filters to k=40 tokens with vocab IDs: [15234, 892, 25631, ...]
- Dist sampler picks index 2
- Need filtered_ids[2] = 25631 to get the actual token

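The lookup in the last step above can be sketched in plain C++. The `map_filtered_index` helper and the `std::vector` stand-in for the filtered-ids tensor are illustrative only, not llama.cpp API:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using llama_token = std::int32_t;

// Map a dist-sampler index in [0, k) back to the vocab token id it refers to.
// `filtered_ids` holds the vocab ids kept by a top-k style filter, in the
// same order as the filtered logits.
llama_token map_filtered_index(const std::vector<llama_token> &filtered_ids,
                               std::size_t sampled_idx) {
    // The sampled index is only meaningful relative to the filtered array,
    // so the real token id must be looked up, never used directly.
    assert(sampled_idx < filtered_ids.size());
    return filtered_ids[sampled_idx];
}
```

With `filtered_ids = {15234, 892, 25631}` and a sampled index of 2, this returns token id 25631, matching the example above.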
We need this map both for the dist sampler and for the case where a sampler
like top-k filters the logits and we need to pass them on to the CPU sampler
chain:

1. Backend dist sampler:
   Maps the sampled index [0, k) → actual vocab token ID using
   `ggml_get_rows(filtered_ids, idx)`.

2. CPU sampler chain:
   Uses sampled_ids[i] to associate each filtered logit with its corresponding
   vocabulary token ID, so CPU samplers can work with the correct token IDs.
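The CPU-side association in point 2 could look roughly like the following sketch. Plain `std::vector` stand-ins replace the per-sequence tensors, and the `pair_filtered_logits` helper is hypothetical, not part of llama.cpp:

```cpp
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

using llama_seq_id = std::int32_t;
using llama_token  = std::int32_t;

// Pair each filtered logit with the vocab token id it belongs to, for one
// sequence, so downstream CPU samplers see (token id, logit) pairs instead of
// raw [0, k) indices into the filtered array.
std::vector<std::pair<llama_token, float>> pair_filtered_logits(
        const std::unordered_map<llama_seq_id, std::vector<llama_token>> &filtered_token_ids,
        const std::unordered_map<llama_seq_id, std::vector<float>> &filtered_logits,
        llama_seq_id seq) {
    const auto &ids    = filtered_token_ids.at(seq);
    const auto &logits = filtered_logits.at(seq);
    std::vector<std::pair<llama_token, float>> out;
    out.reserve(ids.size());
    for (std::size_t i = 0; i < ids.size(); ++i) {
        // ids[i] maps the i-th filtered logit back to its vocabulary token id.
        out.emplace_back(ids[i], logits[i]);
    }
    return out;
}
```

The per-sequence `std::unordered_map` keying mirrors the proposed `t_backend_*` members, which hold one tensor per `llama_seq_id`.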
