Skip to content

fix(orchestrator): match longest active prefix in interleave_rollout#2634

Merged
S1ro1 merged 1 commit into
r3-deltafrom
fix/interleave-longest-prefix
May 27, 2026
Merged

fix(orchestrator): match longest active prefix in interleave_rollout#2634
S1ro1 merged 1 commit into
r3-deltafrom
fix/interleave-longest-prefix

Conversation

@mikasenghaas
Copy link
Copy Markdown
Member

@mikasenghaas mikasenghaas commented May 25, 2026

Summary

  • Fix interleave_rollout's prefix-matching loop to pick the longest matching active prefix instead of the first one (break-on-first).
  • Warn when more than one active prefix matches the same step, with the example id, step index, matching prefix lengths, total active prefixes, and step prompt length — surfaces the rare ambiguous case so it's auditable in real rollouts.
  • Add a regression test that constructs a 4-step trajectory exposing the bug.

Targeted at #2632 (r3-delta).

Bug

The loop at src/prime_rl/orchestrator/trajectories.py walked active_samples in creation order and took the first prefix that step_prompt_ids started with. The slice check step_prompt_ids[:len(P)] == P is not mutually exclusive across active samples: if one active prefix is a strict prefix of another, both satisfy the check and the loop returns the older/shorter one.

This is reachable from a compaction/rollback turn whose prompt is shorter than an existing sample's prefix and whose completion re-generates the same tokens and then keeps going. Walking through it:

  • step 0: prompt=[1,2], completion=[3,4] -> sample A, P_A=[1,2,3,4]
  • step 1 (extends A): prompt=[1,2,3,4,5], completion=[6] -> P_A=[1,2,3,4,5,6]
  • step 2 (rollback): prompt=[1,2], completion=[3,4,5,6,7]. The prompt is shorter than P_A so the slice check is false-by-length -> new sample B, P_B=[1,2,3,4,5,6,7]. Note P_B starts with P_A by construction.
  • step 3 (extends B): prompt=[1,2,3,4,5,6,7,8], completion=[9].
    • Loop hits P_A=[1,2,3,4,5,6] first: step_prompt[:6] == P_A -> match, break. Wrong — P_B is the actual extension.

When the bug triggers, extend_sample appends the leftover tokens (B's generated tokens [7] plus step 3's new prompt [8]) to A's completion_ids with completion_mask=False and completion_logprobs=0.0. Token 7 was generated by the model in step 2 with a real logprob, but it ends up filed in A as user input. B is left stale and never receives step 3.

This is a silent violation of the Exact-Prefix invariant documented in docs/trajectories.md — the very class of corruption interleaving is supposed to refuse by starting a new sample.

Fix

Track the longest matching prefix instead of breaking on the first match. Longest-match is strictly better:

  • When only one prefix matches, both strategies pick it.
  • When two match (i.e. shorter is a prefix of longer), the longer is the unambiguously more specific extension.
matched_idx = None
matched_len = -1
matching_prefix_lens: list[int] = []
for idx, (prefix_tokens, _, _) in enumerate(active_samples):
    pl = len(prefix_tokens)
    if step_prompt_ids[:pl] == prefix_tokens:
        matching_prefix_lens.append(pl)
        if pl > matched_len:
            matched_idx = idx
            matched_len = pl

if len(matching_prefix_lens) > 1:
    logger.warning(
        f"Ambiguous prefix match at step {step_idx} for example {output['example_id']}: "
        f"{len(matching_prefix_lens)} of {len(active_samples)} active prefixes match "
        f"(lens={sorted(matching_prefix_lens)}, step_prompt_len={len(step_prompt_ids)}). "
        f"Extending the longest (len={matched_len})."
    )

Example warning produced by the regression test:

Ambiguous prefix match at step 3 for example 2: 2 of 2 active prefixes match (lens=[6, 7], step_prompt_len=8). Extending the longest (len=7).

Verification

  • New regression test (test_interleave_rollout_picks_longest_matching_prefix) constructs the 4-step trajectory above and asserts:
    • sample A is [3, 4, 5, 6] with mask [T, T, F, T] (steps 0+1 only),
    • sample B is [3, 4, 5, 6, 7, 8, 9] with mask [T, T, T, T, T, F, T] (steps 2+3 merged, token 7 stays a generated token).
  • Confirmed it fails on unpatched r3-delta: sample A gets [3, 4, 5, 6, 7, 8, ...] (B's generated 7 folded in as user input).
  • uv run pytest tests/unit/orchestrator/test_trajectories.py -q: 22 passed.
  • uv run ruff check src/prime_rl/orchestrator/trajectories.py tests/unit/orchestrator/test_trajectories.py: clean.

Note

Medium Risk
Changes how rollout steps are merged into training samples on the orchestrator path; wrong behavior previously corrupted masks/logprobs silently, but impact is limited to rare multi-prefix cases.

Overview
Fixes interleave_rollout so when a trajectory step matches more than one active token prefix, it extends the longest match instead of stopping at the first. That matters after compaction/rollback, where one sample’s prefix can be a strict prefix of another’s—first-match could merge later steps into the wrong sample and treat regenerated tokens as non-trainable prompt tokens.

When multiple prefixes match, the orchestrator now logs a warning (example id, step, matching lengths) so ambiguous cases are visible in production rollouts. A regression test covers the rollback + overlapping-prefix scenario and asserts correct completion_ids, masks, and logprobs per sample.

Reviewed by Cursor Bugbot for commit b62f65e. Bugbot is set up for automated code reviews on this repo. Configure here.

@S1ro1 S1ro1 force-pushed the fix/interleave-longest-prefix branch from ca38614 to b62f65e Compare May 27, 2026 23:47
@S1ro1 S1ro1 marked this pull request as ready for review May 27, 2026 23:49
@S1ro1 S1ro1 merged commit b62f65e into r3-delta May 27, 2026
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants