
fix: keep sampling args per token #2603

Draft
samsja wants to merge 1 commit into feat/top-k-p-sampling from fix/per-token-sampling-args



@samsja (Member) commented May 23, 2026

Summary

  • Keep rollout sampling args per token by carrying completion_temperatures, completion_top_ks, and completion_top_ps through TrainingSample / MicroBatch.
  • Allow packed microbatches to concatenate samples with different sampling args while preserving per-token replay values in the trainer.
  • Add exact top-k replay support to the chunked LM head path, including the Gemma softcapped variant; chunked heads still reject top_p < 1.0.
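
As a rough illustration of the first two bullets, the sketch below broadcasts each rollout's sampling args to one value per completion token and then concatenates samples into a packed sequence. `ToySample`, `make_sample`, and `pack` are hypothetical stand-ins, not the actual `TrainingSample` / `MicroBatch` code; only the three `completion_*` field names come from this PR.

```python
from dataclasses import dataclass


@dataclass
class ToySample:
    """Hypothetical, simplified stand-in for TrainingSample."""
    completion_ids: list[int]
    completion_temperatures: list[float]
    completion_top_ks: list[int]
    completion_top_ps: list[float]


def make_sample(completion_ids, temperature, top_k, top_p):
    # Broadcast one rollout's scalar sampling args to one value per completion token.
    n = len(completion_ids)
    return ToySample(list(completion_ids), [temperature] * n, [top_k] * n, [top_p] * n)


def pack(samples):
    # Concatenate samples into one packed sequence. The per-token lists stay
    # aligned with the token ids, so samples with different sampling args can
    # share a microbatch without collapsing to a single scalar.
    packed = ToySample([], [], [], [])
    for s in samples:
        packed.completion_ids += s.completion_ids
        packed.completion_temperatures += s.completion_temperatures
        packed.completion_top_ks += s.completion_top_ks
        packed.completion_top_ps += s.completion_top_ps
    return packed


a = make_sample([11, 12, 13], temperature=1.0, top_k=20, top_p=1.0)
b = make_sample([21, 22], temperature=0.7, top_k=50, top_p=0.9)
packed = pack([a, b])
print(packed.completion_temperatures)
```

With per-token lists, the trainer can look up the replay value at each position instead of assuming one temperature or top-k for the whole packed sequence.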

Stacked on #2601.

Verification

  • uv run pytest tests/unit/orchestrator/test_batch.py tests/unit/orchestrator/test_trajectories.py tests/unit/orchestrator/test_sft_trajectories.py tests/unit/orchestrator/test_teacher_logprobs.py tests/unit/train/rl/test_packer.py tests/unit/train/rl/test_fused_lm_head.py -q
  • uv run ruff check src/prime_rl/transport/types.py src/prime_rl/orchestrator/trajectories.py src/prime_rl/trainer/batch.py src/prime_rl/trainer/rl/data.py src/prime_rl/trainer/rl/loss.py src/prime_rl/trainer/rl/train.py src/prime_rl/trainer/rl/packer.py src/prime_rl/trainer/models/layers/lm_head.py src/prime_rl/trainer/models/layers/lm_head_gemma.py tests/unit/orchestrator/test_batch.py tests/unit/orchestrator/test_trajectories.py tests/unit/orchestrator/test_teacher_logprobs.py tests/unit/train/rl/test_packer.py tests/unit/train/rl/test_fused_lm_head.py
  • git diff --check
  • Direct chunked top-k sanity probe against full-logits replay
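
A probe of this kind could look roughly like the sketch below, which compares a chunked top-k log-prob against a full-logits reference for a single token position. This is not the code used in the PR: it is pure Python rather than the fused LM head, both function names are hypothetical, and the tie handling (keep every logit greater than or equal to the k-th largest) is an assumption.

```python
import heapq
import math
import random


def full_topk_logprob(logits, target, temperature, top_k):
    # Reference: scale all logits, keep those >= the k-th largest, renormalize.
    scaled = [x / temperature for x in logits]
    threshold = sorted(scaled, reverse=True)[top_k - 1]
    if scaled[target] < threshold:
        return -math.inf  # target token was filtered out by top-k
    kept = [x for x in scaled if x >= threshold]
    m = max(kept)
    return scaled[target] - (m + math.log(sum(math.exp(x - m) for x in kept)))


def chunked_topk_logprob(logits, target, temperature, top_k, chunk_size):
    # Pass 1: keep only the running k largest scaled logits across vocab chunks.
    best = []
    for start in range(0, len(logits), chunk_size):
        chunk = [x / temperature for x in logits[start:start + chunk_size]]
        best = heapq.nlargest(top_k, best + chunk)
    m, threshold = best[0], best[-1]
    target_logit = logits[target] / temperature
    if target_logit < threshold:
        return -math.inf
    # Pass 2: accumulate the softmax denominator over kept logits, chunk by chunk.
    total = 0.0
    for start in range(0, len(logits), chunk_size):
        for x in logits[start:start + chunk_size]:
            if x / temperature >= threshold:
                total += math.exp(x / temperature - m)
    return target_logit - (m + math.log(total))


rng = random.Random(0)
logits = [rng.gauss(0.0, 2.0) for _ in range(1000)]
target = max(range(len(logits)), key=logits.__getitem__)  # always inside top-k
ref = full_topk_logprob(logits, target, temperature=0.7, top_k=50)
chunked = chunked_topk_logprob(logits, target, temperature=0.7, top_k=50, chunk_size=128)
print(math.isclose(ref, chunked, rel_tol=1e-9, abs_tol=1e-12))
```

The two-pass structure is one way to stay exact without materializing the full scaled logits at once; it assumes `1 <= top_k <= len(logits)`.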

@samsja changed the title from "Fix/per token sampling args" to "fix: keep sampling args per token" on May 23, 2026
@samsja changed the base branch from main to feat/top-k-p-sampling on May 23, 2026 00:12