R3 delta replay picks (no configs)#2648
Open
samsja wants to merge 3 commits into
Open
Conversation
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9a99ff5. Configure here.
Squashed from origin/r3-delta (tip 5c94833, which extends the earlier 3799bda with 'Support branched routed expert deltas' for cases where the routed-experts payload diverges across siblings in a group). Adapts delta replay to main's deferred routed-experts chunk concat: first step starts at 0; extended steps use prefix_len - 1; row 0 fills the boundary, remaining rows append as the new suffix. Bumps router wheel pin to local-path. Bumps deps/verifiers gitlink to d39cc5876. Co-Authored-By: S1ro1 <matej.sirovatka@gmail.com>
The first-match-wins loop over active_samples picks the wrong sample when one active prefix is a strict prefix of another. This can happen after a compaction/rollback step whose prompt is shorter than an existing sample's prefix and whose completion re-generates the same tokens and extends past them: the new sample's prefix then starts with the older sample's prefix, and any later step that extends the new sample also satisfies the slice check against the older one. When that happens, extend_sample folds the newer sample's generated tokens into the older sample as user-input tokens (mask=False, logprob=0) and leaves the newer sample stale -- a silent Exact-Prefix invariant violation. Switch to longest-match: strictly more specific, never worse than first-match when only one prefix matches. Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit 0e239d1)
When more than one active prefix matches a step's prompt, log a warning with the example id, step index, set of matching prefix lengths, total active prefixes, and the prompt length. Longest-match still picks the correct extension; the warning just surfaces the rare ambiguous case so it's debuggable if it starts showing up in real rollouts (e.g. from compaction/rollback turns). Co-authored-by: Cursor <cursoragent@cursor.com> (cherry picked from commit ca38614)
9a99ff5 to
fc9fbaf
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Rebased version of #2647 on top of main, with unrelated changes removed.
Commits
Removed from the original PR
configs/debug/qwen3_30b_a3b_pd_*_router_replay*.tomldebug configs.Handle token export mkdir racescommit — unrelated concern, belongs in its own PR.skills/configs/SKILL.mdRLM-SWE note — unrelated.rlm-swe/wordleworkspace entries inpyproject.toml— unrelated.uv.lockchurn from the above (reverted to main).Dependencies
deps/verifiersbumped tod39cc5876(as in R3 delta replay picks #2647).vllm-routerpin switched to a local-path wheel that contains the router-sidestartfield plumbing (counterpart to Merge routed experts deltas with start offsets router#37). This needs to flip back to a release URL once router#37 is merged and a wheel is published.Note
High Risk
Changes how training samples merge tokens, masks, and MoE routing tensors; wrong prefix or delta stitching would silently corrupt RL training data, though behavior is heavily covered by new unit tests.
Overview
Adds routed-experts delta replay end-to-end: compact payloads now carry a
startindex (fromrouted_experts_prompt_startat inference capture through orchestrator merge), so multi-step and branched rollouts can stitch partial vLLM expert tensors instead of mis-aligning rows.interleave_rolloutis updated to extend the longest matching active token prefix (fixes compaction/rollback where a shorter prefix wrongly absorbed a longer sample’s completions) and to warn when multiple prefixes match. New-branch steps can replay routed-expert chunks from a matching prior prefix state whenstart > 0.Dependency:
vllm-routeris wired from a local wheel underthird_party/router/dist/instead of a release URL.Unit tests cover longest-prefix matching, multi-step/branch expert alignment, and the
startfield on serialized experts.Reviewed by Cursor Bugbot for commit fc9fbaf. Bugbot is set up for automated code reviews on this repo. Configure here.