feat(orchestrator): validate renderer auto-resolution at config time (#2537)#2540
Merged
Conversation
246b575 to
ac97830
Compare
Closes #2537. When `use_renderer=True` with `renderer.name='auto'` and `model.name` isn't in `MODEL_RENDERER_MAP`, `create_renderer` silently falls back to `DefaultRenderer`. That fallback (a) doesn't fix the position-dependent chat-template bug the renderer client exists to solve, and (b) rejects envs that pass tools (rollout dies with "RendererPool does not support tools") unless `renderer.tool_parser` is set. Today this only surfaces mid-rollout. Add a config-time `@model_validator(mode="after")` on `OrchestratorConfig` that rejects this combination at parse time, so `--dry-run` reports it. Lazy-imports `MODEL_RENDERER_MAP` so the slim `prime-rl-configs` package still parses configs when `renderers` isn't installed. Sweep 25 existing configs to opt into a renderer explicitly: - 20 PI-vendored / fine-tuned / R1-distilled configs get `[orchestrator.renderer] name = "default"` (the right choice — their templates are customized, so `apply_chat_template` is correct; forcing a model-specific renderer would emit canonical tokens that don't match the vendored template). R1 distills also get `reasoning_parser = "think"`. - 5 configs whose model is template-identical (md5-confirmed) to an already-mapped sibling get the model-specific renderer name explicitly: GLM-5-FP8 → `glm-5`, Qwen3-30B-A3B-Thinking-2507 → `qwen3`, PrimeIntellect/MiniMax-M2.5-bf16 → `minimax-m2`. Those three models will also be added to MODEL_RENDERER_MAP upstream (PrimeIntellect-ai/renderers#50). Once that lands and the submodule bumps, the explicit names in those 5 configs become redundant and can be removed in a follow-up — but the PR here is self-contained and doesn't depend on the renderers PR landing first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ac97830 to
72760e5
Compare
…RL configs
Replace PI mirror id in the two public RL configs that previously had
the model name in MODEL_RENDERER_MAP-incompatible form:
- gsm8k/rl.toml: PrimeIntellect/Qwen3-0.6B -> Qwen/Qwen3-0.6B
- ci/integration/alphabet_sort.toml: same
PI's Qwen3-0.6B is a same-weights mirror that ships a patched chat
template. The main behavioral effect of that patch is 'always emit
prior assistant <think> blocks' — the renderer ignores the model's
template anyway, so we set renderer.preserve_all_thinking = true on
the qwen3 renderer to reproduce that behavior with the upstream id.
SFT configs that reference PrimeIntellect/Qwen3-{0.6B,1.7B} are left
untouched: SFT defaults use_renderer=false, so HF apply_chat_template
runs directly against the model's template — swapping the id there
would silently change SFT tokenization (PI emits empty <think></think>
for every assistant message; upstream Qwen does not).
48c25cc to
52b1202
Compare
…r.name blocks
Bumps both deps submodules to pick up the three PRs that just merged:
- renderers#48 — pre-flight overflow check with auto-discovery
(`OverlongPromptError`, optional `max_prompt_len`, `/v1/models` lookup
cached by `(base_url, model)`)
- renderers#50 — adds `Qwen/Qwen3-30B-A3B-{Instruct,Thinking}-2507` and
`zai-org/GLM-5-FP8` to `MODEL_RENDERER_MAP`
- verifiers#1408 — translates `renderers.OverlongPromptError` →
`verifiers.errors.OverlongPromptError` in the renderer client
deps/renderers: 17d0584f → 8704f9d (tagged renderers-v0.1.8.dev4)
deps/verifiers: dd89b5e9 → 58b119fa
With renderers#50 in the bumped submodule, four of the five explicit
`[orchestrator.renderer]` blocks added earlier in this PR are no longer
needed — the auto-resolver picks the same renderer:
- examples/multinode/rl.toml → auto resolves to qwen3
- examples/qwen30b_math/rl.toml → auto resolves to qwen3
- examples/qwen30b_swe/rl.toml → auto resolves to qwen3
- examples/glm5_pd_disag/rl.toml → auto resolves to glm-5
examples/minimax_m2.5_swe/rl.toml keeps its explicit
`renderer.name = "minimax-m2"` because the PI-mirror id
`PrimeIntellect/MiniMax-M2.5-bf16` is deliberately not in the upstream
map (renderers#50 dropped PI-mirror ids on purpose).
samsja
reviewed
May 18, 2026
The early-return at the top of validate_renderer_auto_resolves already skips the import unless use_renderer=True AND renderer.name='auto'. If the user reached that path, they intend to actually use renderers at runtime — an ImportError here is a real configuration problem, not something to silently bypass. The slim install CI step exercises configs with explicit renderer.name and never hits this branch.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit aecf703. Configure here.
PrimeIntellect/Qwen3-1.7B-Wordle-SFT ships the standard PI Qwen3 template
patch — chat_template.jinja is byte-identical (md5 c574d57…) to
PrimeIntellect/Qwen3-0.6B base. That patch's effective behavior ("always
emit prior <think>") is exactly what the qwen3 renderer reproduces with
preserve_all_thinking = true, so switch from DefaultRenderer to the
token-aware qwen3 renderer (same pattern as gsm8k/alphabet_sort after
the upstream model swap).
Leaving examples/reverse_text and configs/multi_reverse_text on
DefaultRenderer — their chat template diverges from PI base in
non-think-related ways (missing tool_calls gating, custom default
system prompt) so DefaultRenderer.apply_chat_template is the safer
match to SFT-time tokenization.
OrchestratorConfig's validate_renderer_auto_resolves validator imports renderers.base.MODEL_RENDERER_MAP. With use_renderer=True and renderer.name='auto' as defaults, any plain OrchestratorConfig() call hits that import — so renderers needs to be a hard dep, not optional. The import is still inside the validator body, so the slim install CI's "no heavy deps in sys.modules at import time" check keeps passing: - step 1 (just imports modules) — never runs validation, renderers stays out - step 2 (parses examples/Intellect-3.1/rl.toml with renderer.name='default') — validator early-returns before the import fires
samsja
approved these changes
May 18, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Summary
OrchestratorConfig@model_validator(mode="after")that rejectsuse_renderer=True+renderer.name='auto'whenmodel.nameisn't inMODEL_RENDERER_MAP, so--dry-runcatches it (closes Surface unsupported renderer config errors earlier #2537).PrimeIntellect/Qwen3-0.6B→Qwen/Qwen3-0.6Bin the two public RL configs that referenced it (gsm8k, alphabet_sort) and setsrenderer.preserve_all_thinking = trueto match PI's template behavior under the auto-resolvedqwen3renderer.MODEL_RENDERER_MAP), and verifiers#1408 (renderer-clientOverlongPromptErrortranslation).Why config-time and not orchestrator startup?
The SFT path already rejects this at startup (
src/prime_rl/trainer/sft/train.py:172-179). Putting the same guard at orchestrator startup wouldn't fire on--dry-run. Moving to a config validator catches it incli(RLConfig)— i.e. on dry-run and on every config load — without touching the runtime code path.What the error looks like
Exits non-zero;
Dry run completenever prints. The full list of mapped models is omitted from the error (40+ lines was too noisy); users can greprenderers.base.MODEL_RENDERER_MAPfor it.How I classified each unmapped config
After md5-hashing
chat_template.jinjafor each unmapped model against its closest mapped sibling, plus diffing PI mirror templates against upstream Qwen / MiniMax to confirm same-weights-only:PrimeIntellect/Qwen3-0.6B→Qwen/Qwen3-0.6B, addrenderer.preserve_all_thinking = truegsm8k/rl,ci/integration/alphabet_sortrenderer.name = "qwen3"+preserve_all_thinking = true— chat template byte-identical (md5c574d57…) toPrimeIntellect/Qwen3-0.6Bbaseexamples/wordle/rl.toml(PrimeIntellect/Qwen3-1.7B-Wordle-SFT)renderer.name = "default"— PI / Qwen finetune with custom chat templatemulti_reverse_text,ci/integration/reverse_text/{start,resume},ci/integration/reverse_text_lora/{start,resume},ci/integration/reverse_text_moe/start,ci/integration/reverse_text_multi_run,examples/{reverse_text, Intellect-3.1}renderer.name = "default" + reasoning_parser = "think"— DeepSeek-R1 distillacereason_math/{stage1,stage2},ci/nightly/acereason_math,deepscaler/{stage1,stage2,stage3},hendrycks_math/sanity,examples/hendrycks_sanityrenderer.name— template-identical (md5-confirmed) to a mapped sibling. After the submodule bump for renderers#50, four are auto-resolved and reverted; only the PI-mirror entry stays explicit.examples/minimax_m2.5_swe(→minimax-m2,PrimeIntellect/MiniMax-M2.5-bf16deliberately not in upstream map)Why we're not swapping all PI mirrors
PrimeIntellect/*re-uploads of upstream models mostly exist to carry chat-template patches that the hand-coded renderers ignore anyway. For the two RL configs that use the PI Qwen3 mirror, the swap to upstream +preserve_all_thinking = truereproduces PI's "always emit prior<think>" template patch under the qwen3 renderer.SFT configs that reference
PrimeIntellect/Qwen3-{0.6B,1.7B}are intentionally left untouched — SFT defaultsuse_renderer = false, so HFapply_chat_templateruns directly against the model's template; swapping the model id would silently change SFT tokenization (PI emits empty<think></think>for every assistant message; upstream Qwen does not).Genuine PI finetunes with custom chat templates (
Qwen3-0.6B-Reverse-Text-SFT,Reverse-Text-SFT,INTELLECT-3*,INTELLECT-3-Base) keep the PI id and userenderer.name = "default"soDefaultRendererruns the model's own (patched)apply_chat_template. Exception:Qwen3-1.7B-Wordle-SFTships the standard PI template patch (byte-identical toPrimeIntellect/Qwen3-0.6Bbase), so it gets the sameqwen3renderer +preserve_all_thinking = truetreatment as the two RL configs that swapped to upstream Qwen3.The one exception is
PrimeIntellect/MiniMax-M2.5-bf16: identical weights toMiniMaxAI/MiniMax-M2.5modulo a bf16 dtype cast that prime-rl's trainer requires, and a byte-identical chat template. Keep the PI id (we need the bf16 weights), setrenderer.name = "minimax-m2"explicitly. The corresponding PI-mirror entry was deliberately removed from PR #50 (no PI ids in the upstream renderer map).Slim configs package
prime-rl-configsnow listsrenderers>=0.1.8.dev4as a hard dependency. The validator'sfrom renderers.base import MODEL_RENDERER_MAPlives inside the function body and only fires onrenderer.name='auto'configs, so the slim install CI's "no heavy deps insys.modulesat import time" check still passes —renderers(and its transitivetransformers) stay out ofsys.modulesuntil validation actually needs them. Verified locally against both slim CI steps and Bugbot's failure mode (defaultOrchestratorConfig()now resolves instead of raisingImportError).Submodule bumps
deps/renderers:17d0584f→8704f9d(renderers-v0.1.8.dev4)deps/verifiers:dd89b5e9→58b119faOverlongPromptErrortranslation).Tests
tests/unit/test_configs.pycover: reject unmapped, accept mapped, explicitrenderer.namebypass,use_renderer=falsebypass, explicitrenderer.name="default"opt-in.test_load_configscases pass after the sweep.tests/unit/train/test_runs.py+tests/unit/train/rl/test_packer.pycases pass after addinguse_renderer = Falseto test fixtures that use placeholder"test-model"(now correctly rejected by the validator).tests/unit/sweep (excluding pre-existing unrelated CUDA/Laguna failures): 240 passed.Note
Medium Risk
Medium risk: introduces a new config-time validation that can hard-fail previously working orchestrator configs that relied on
renderer.name='auto'silently falling back toDefaultRenderer, and adds a newrenderersdependency toprime-rl-configs.Overview
Adds a new
OrchestratorConfigvalidator that rejectsuse_renderer=true+renderer.name='auto'when the configured model isn’t inrenderers.base.MODEL_RENDERER_MAP, surfacing misconfiguration during config load /--dry-runinstead of silently falling back toDefaultRenderer.Updates public
configs/andexamples/to explicitly opt into an appropriate renderer (name="default"or a model-specific renderer, plusreasoning_parser="think"/preserve_all_thinkingwhere needed), and adjusts unit tests/fixtures to either use a mapped model or setuse_renderer=false. Also addsrenderers>=0.1.8.dev4toprime-rl-configsdependencies and updates the lockfile accordingly.Reviewed by Cursor Bugbot for commit 30653f4. Bugbot is set up for automated code reviews on this repo. Configure here.