Add early-finalize continuation for truncated reasoning rollouts #475

Open
taivu1998 wants to merge 3 commits into rllm-org:main from taivu1998:tdv/issue-267-early-finalize


@taivu1998

Contributor

Summary

This PR adds opt-in support for early-finalizing truncated long-form generations so we can reserve answer budget inside a single completion window.

Closes #267.

What changed

  • add a workflow-level early-finalize helper that:
    • reserves a configurable tail budget from max_tokens
    • runs an initial generation with the reduced budget
    • if that generation stops due to length, optionally appends a synthetic suffix only when reasoning was cut off inside an unfinished <think>...</think> block
    • continues generation from token input for the reserved tail budget
  • add token-in/token-out continuation support to the rollout stack and timing layer for the Verl path
  • wire the new helper into the standard workflow implementations and the FinQA workflow
  • add Step.response_mask so synthetic suffix tokens remain in completion_ids for continuity while being excluded from Verl loss
  • update the Verl transform to preserve the explicit step-level response mask instead of always assuming an all-ones loss mask
  • add config knobs for rllm.early_finalize
  • add focused tests for early-finalize behavior and response-mask propagation
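The control flow in the list above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Generation`, `GenerateFn`, `early_finalize`, and `needs_suffix` are hypothetical names, the `"length"` finish reason follows the common OpenAI-style convention, and charging the synthetic suffix against the reserved tail budget is an assumption made so the total never exceeds `max_tokens`.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Generation:
    """Hypothetical container for the output of one generation call."""
    token_ids: List[int]
    finish_reason: str  # assumed values: "stop" or "length"


# Hypothetical signature: (input token ids, max new tokens) -> Generation
GenerateFn = Callable[[Sequence[int], int], Generation]


def early_finalize(
    generate: GenerateFn,
    prompt_ids: Sequence[int],
    max_tokens: int,
    tail_budget: int,
    suffix_ids: Sequence[int],
    needs_suffix: Callable[[Sequence[int]], bool],
) -> Tuple[List[int], List[int]]:
    """Return (completion_ids, response_mask) for a single rollout."""
    if not 0 < tail_budget < max_tokens:
        raise ValueError("tail_budget must be in (0, max_tokens)")

    # 1. Initial generation with the reduced budget.
    first = generate(prompt_ids, max_tokens - tail_budget)
    completion = list(first.token_ids)
    mask = [1] * len(completion)
    if first.finish_reason != "length":
        return completion, mask  # finished naturally, nothing to do

    # 2. Optionally append the synthetic suffix. It stays in the
    #    completion for continuity but is masked out of the loss.
    if needs_suffix(completion):
        completion += list(suffix_ids)
        mask += [0] * len(suffix_ids)

    # 3. Continue from token input with whatever budget remains
    #    (assumption: the suffix counts against the tail budget).
    remaining = max_tokens - len(completion)
    if remaining > 0:
        tail = generate(list(prompt_ids) + completion, remaining)
        completion += tail.token_ids
        mask += [1] * len(tail.token_ids)
    return completion, mask
```

With this shape, a downstream transform can use the returned mask as the per-token loss mask directly rather than assuming all ones.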

Design notes

  • the feature is opt-in and defaults to disabled
  • v1 stays tightly scoped to the workflow/Verl path that the issue targets
  • synthetic answer forcing is narrow by design: for non-thinking truncated outputs we simply continue from the partial completion without injecting a prefix
  • existing prompt-length guards remain unchanged; this only handles the case where a single completion runs out of response budget
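To illustrate the narrow forcing rule in the design notes, the check below decides whether a truncated completion ended inside an unfinished `<think>` block. It is a sketch that works on decoded text; the function name `ends_inside_think` is hypothetical, and the PR may perform this check differently (for example on token ids, or accounting for chat templates that open `<think>` in the prompt).

```python
def ends_inside_think(
    text: str,
    open_tag: str = "<think>",
    close_tag: str = "</think>",
) -> bool:
    """True if the last open_tag in `text` has no close_tag after it."""
    last_open = text.rfind(open_tag)
    if last_open == -1:
        # No thinking block at all: continue without a synthetic suffix.
        return False
    # Look for a closing tag only after the most recent opening tag.
    return text.find(close_tag, last_open + len(open_tag)) == -1
```

Only when this returns true would the synthetic suffix be appended; otherwise the continuation simply resumes from the partial completion.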

Testing

  • python -m pytest tests/engine/test_early_finalize.py tests/unified_trainer/test_verl_transform.py tests/rewards/test_math_reward.py -q
  • ruff check rllm/engine/rollout/rollout_engine.py rllm/engine/rollout/verl_engine.py rllm/workflows/early_finalize.py rllm/workflows/timing_mixin.py rllm/workflows/single_turn_workflow.py rllm/workflows/multi_turn_workflow.py rllm/workflows/cumulative_workflow.py rllm/agents/agent.py rllm/experimental/verl/transform.py rllm/experimental/verl/__init__.py projects/finqa/train_finqa.py tests/engine/test_early_finalize.py tests/unified_trainer/test_verl_transform.py
  • python -m py_compile rllm/engine/rollout/rollout_engine.py rllm/engine/rollout/verl_engine.py rllm/workflows/early_finalize.py rllm/workflows/timing_mixin.py rllm/workflows/single_turn_workflow.py rllm/workflows/multi_turn_workflow.py rllm/workflows/cumulative_workflow.py rllm/agents/agent.py rllm/experimental/verl/transform.py rllm/experimental/verl/__init__.py projects/finqa/train_finqa.py

@kylemontgomery1

Collaborator

I lean towards this being implemented at the workflow level (e.g., SingleTurnWorkflowWithEarlyFinalize) instead of a global feature in rLLM. @listar2000 What do you think?

@listar2000

Collaborator

Will take a look. Thanks @kylemontgomery1 for pointing me to this.

@taivu1998 force-pushed the tdv/issue-267-early-finalize branch from cc26d63 to 1f5ce35 on April 19, 2026 15:56
@taivu1998

Contributor Author

Hi @kylemontgomery1, @listar2000, could you help review again? Thanks!

Development

Successfully merging this pull request may close these issues.

[new feature request] support for truncating early if running out of context

3 participants