@@ -510,3 +510,33 @@ And this could then be use as follows:
510510 -ngl 99 \
511511 -v
512512```
513+
514+ ### Filtered tokens id map
515+ We should probably rename the following to:
516+ ``` c++
517+ std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_logits;
518+ std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_filtered_token_ids;
519+ std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_tokens;
520+ std::unordered_map<llama_seq_id, ggml_tensor*> t_backend_sampled_probs;
521+ ```
522+ The filtered token ids are needed when a backend sampler reduces or reorders
523+ the logits and we have to map back to the original token ids. For example,
524+ top-k sampling selects the top k logits and sorts them. The distribution (dist)
525+ backend sampler computes an index into the logits, which may have been filtered.
526+ If so, the index is no longer an index into the model's vocabulary but rather
527+ an index into the filtered logits array.
528+ For example:
529+ - Full vocab has 32000 tokens
530+ - Top-k filters to k=40 tokens with vocab IDs: [15234, 892, 25631, ...]
531+ - Dist sampler picks index 2
532+ - Need filtered_ids[2] = 25631 to get the actual token
533+
534+ We need this map both for the dist sampler and for the case where a sampler
535+ like top-k filters the logits and the result is passed on to the CPU sampler chain:
536+
537+ 1. Backend dist sampler:
538+ Maps the sampled index in [0, k) → actual vocab token ID using ggml_get_rows(filtered_ids, idx)
539+
540+ 2. CPU sampler chain:
541+ Uses sampled_ids[i] to associate each filtered logit with its corresponding
542+ vocabulary token ID, so CPU samplers can work with the correct token IDs.
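
The two uses of the filtered token id map described in the notes above can be sketched on the CPU with plain `std::vector` in place of `ggml` tensors. This is only an illustration under assumptions: `filtered_logits`, `top_k_filter`, `to_vocab_id`, and `to_candidates` are hypothetical names and are not part of llama.cpp or ggml; in the backend, the lookup in `to_vocab_id` would be done with `ggml_get_rows` as noted in item 1.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical stand-in for the two tensors a top-k backend sampler would
// produce: the k surviving logits (largest first) and, for each of them,
// the vocabulary token id it originally belonged to.
struct filtered_logits {
    std::vector<float>   logits;       // size k
    std::vector<int32_t> filtered_ids; // size k, filtered_ids[i] is the vocab id of logits[i]
};

// Keep the k largest logits, sorted in descending order, and record the
// vocabulary id of each one so the mapping back to the vocab is not lost.
filtered_logits top_k_filter(const std::vector<float> & vocab_logits, size_t k) {
    k = std::min(k, vocab_logits.size());

    std::vector<int32_t> ids(vocab_logits.size());
    std::iota(ids.begin(), ids.end(), 0);

    std::partial_sort(ids.begin(), ids.begin() + static_cast<std::ptrdiff_t>(k), ids.end(),
        [&](int32_t a, int32_t b) { return vocab_logits[a] > vocab_logits[b]; });

    filtered_logits out;
    for (size_t i = 0; i < k; ++i) {
        out.logits.push_back(vocab_logits[ids[i]]);
        out.filtered_ids.push_back(ids[i]);
    }
    return out;
}

// Use 1 (dist sampler): the sampled index is an index in [0, k) into the
// filtered logits, so it has to be translated to a vocabulary token id.
int32_t to_vocab_id(const filtered_logits & f, size_t sampled_idx) {
    return f.filtered_ids.at(sampled_idx);
}

// Use 2 (CPU sampler chain): pair every filtered logit with its vocabulary
// token id so that CPU samplers see (token id, logit) candidates.
std::vector<std::pair<int32_t, float>> to_candidates(const filtered_logits & f) {
    assert(f.logits.size() == f.filtered_ids.size());

    std::vector<std::pair<int32_t, float>> candidates;
    for (size_t i = 0; i < f.logits.size(); ++i) {
        candidates.emplace_back(f.filtered_ids[i], f.logits[i]);
    }
    return candidates;
}
```

With the numbers from the example above, `to_vocab_id(f, 2)` plays the role of `filtered_ids[2]`: the dist sampler's index 2 would resolve to token 25631 rather than being misread as vocabulary token 2.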