Skip to content

feat(orchestrator): validate renderer auto-resolution at config time (#2537)#2540

Merged
hallerite merged 7 commits into
mainfrom
feat/validate-renderer-auto-resolves
May 18, 2026
Merged

feat(orchestrator): validate renderer auto-resolution at config time (#2537)#2540
hallerite merged 7 commits into
mainfrom
feat/validate-renderer-auto-resolves

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 18, 2026

Summary

  • Adds an OrchestratorConfig @model_validator(mode="after") that rejects use_renderer=True + renderer.name='auto' when model.name isn't in MODEL_RENDERER_MAP, so --dry-run catches it (closes Surface unsupported renderer config errors earlier #2537).
  • Sweeps the public configs to opt into the right renderer explicitly.
  • Swaps PrimeIntellect/Qwen3-0.6BQwen/Qwen3-0.6B in the two public RL configs that referenced it (gsm8k, alphabet_sort) and sets renderer.preserve_all_thinking = true to match PI's template behavior under the auto-resolved qwen3 renderer.
  • Includes submodule bumps to pick up the three companion PRs (now merged): renderers#48 (pre-flight overflow check with auto-discovery), renderers#50 (Qwen3-30B-A3B-{Instruct,Thinking}-2507 and GLM-5-FP8 added to MODEL_RENDERER_MAP), and verifiers#1408 (renderer-client OverlongPromptError translation).

Why config-time and not orchestrator startup?

The SFT path already rejects this at startup (src/prime_rl/trainer/sft/train.py:172-179). Putting the same guard at orchestrator startup wouldn't fire on --dry-run. Moving to a config validator catches it in cli(RLConfig) — i.e. on dry-run and on every config load — without touching the runtime code path.

What the error looks like

Value error, orchestrator.use_renderer=True with renderer.name='auto' but
'PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT' is not in
renderers.base.MODEL_RENDERER_MAP, so it would silently fall back to
DefaultRenderer. Pick one:
(a) [orchestrator.renderer] name='default' — for fine-tunes / vendored
    mirrors with custom chat templates (DefaultRenderer calls
    apply_chat_template); pair with tool_parser=<name> if the env uses tools.
(b) [orchestrator.renderer] name=<model-specific renderer> — if <model> is
    template-identical to a mapped family (and ideally also add it upstream
    to renderers.base.MODEL_RENDERER_MAP).
(c) orchestrator.use_renderer=false — opt out of the renderer client entirely.

Exits non-zero; Dry run complete never prints. The full list of mapped models is omitted from the error (40+ lines was too noisy); users can grep renderers.base.MODEL_RENDERER_MAP for it.

How I classified each unmapped config

After md5-hashing chat_template.jinja for each unmapped model against its closest mapped sibling, plus diffing PI mirror templates against upstream Qwen / MiniMax to confirm same-weights-only:

Action Configs
Swap RL configs: PrimeIntellect/Qwen3-0.6BQwen/Qwen3-0.6B, add renderer.preserve_all_thinking = true gsm8k/rl, ci/integration/alphabet_sort
Keep PI finetune id, set renderer.name = "qwen3" + preserve_all_thinking = true — chat template byte-identical (md5 c574d57…) to PrimeIntellect/Qwen3-0.6B base examples/wordle/rl.toml (PrimeIntellect/Qwen3-1.7B-Wordle-SFT)
renderer.name = "default" — PI / Qwen finetune with custom chat template 9 configs — multi_reverse_text, ci/integration/reverse_text/{start,resume}, ci/integration/reverse_text_lora/{start,resume}, ci/integration/reverse_text_moe/start, ci/integration/reverse_text_multi_run, examples/{reverse_text, Intellect-3.1}
renderer.name = "default" + reasoning_parser = "think" — DeepSeek-R1 distill 8 configs — acereason_math/{stage1,stage2}, ci/nightly/acereason_math, deepscaler/{stage1,stage2,stage3}, hendrycks_math/sanity, examples/hendrycks_sanity
Model-specific renderer.name — template-identical (md5-confirmed) to a mapped sibling. After the submodule bump for renderers#50, four are auto-resolved and reverted; only the PI-mirror entry stays explicit. 1 config — examples/minimax_m2.5_swe (→ minimax-m2, PrimeIntellect/MiniMax-M2.5-bf16 deliberately not in upstream map)

Why we're not swapping all PI mirrors

PrimeIntellect/* re-uploads of upstream models mostly exist to carry chat-template patches that the hand-coded renderers ignore anyway. For the two RL configs that use the PI Qwen3 mirror, the swap to upstream + preserve_all_thinking = true reproduces PI's "always emit prior <think>" template patch under the qwen3 renderer.

SFT configs that reference PrimeIntellect/Qwen3-{0.6B,1.7B} are intentionally left untouched — SFT defaults use_renderer = false, so HF apply_chat_template runs directly against the model's template; swapping the model id would silently change SFT tokenization (PI emits empty <think></think> for every assistant message; upstream Qwen does not).

Genuine PI finetunes with custom chat templates (Qwen3-0.6B-Reverse-Text-SFT, Reverse-Text-SFT, INTELLECT-3*, INTELLECT-3-Base) keep the PI id and use renderer.name = "default" so DefaultRenderer runs the model's own (patched) apply_chat_template. Exception: Qwen3-1.7B-Wordle-SFT ships the standard PI template patch (byte-identical to PrimeIntellect/Qwen3-0.6B base), so it gets the same qwen3 renderer + preserve_all_thinking = true treatment as the two RL configs that swapped to upstream Qwen3.

The one exception is PrimeIntellect/MiniMax-M2.5-bf16: identical weights to MiniMaxAI/MiniMax-M2.5 modulo a bf16 dtype cast that prime-rl's trainer requires, and a byte-identical chat template. Keep the PI id (we need the bf16 weights), set renderer.name = "minimax-m2" explicitly. The corresponding PI-mirror entry was deliberately removed from PR #50 (no PI ids in the upstream renderer map).

Slim configs package

prime-rl-configs now lists renderers>=0.1.8.dev4 as a hard dependency. The validator's from renderers.base import MODEL_RENDERER_MAP lives inside the function body and only fires on renderer.name='auto' configs, so the slim install CI's "no heavy deps in sys.modules at import time" check still passes — renderers (and its transitive transformers) stay out of sys.modules until validation actually needs them. Verified locally against both slim CI steps and Bugbot's failure mode (default OrchestratorConfig() now resolves instead of raising ImportError).

Submodule bumps

  • deps/renderers: 17d0584f8704f9d (renderers-v0.1.8.dev4)
    • Includes renderers#48 (overflow check) and renderers#50 (map additions).
  • deps/verifiers: dd89b5e958b119fa
    • Includes verifiers#1408 (OverlongPromptError translation).

Tests

  • 5 new unit tests in tests/unit/test_configs.py cover: reject unmapped, accept mapped, explicit renderer.name bypass, use_renderer=false bypass, explicit renderer.name="default" opt-in.
  • All 78 parametrized test_load_configs cases pass after the sweep.
  • All 13 tests/unit/train/test_runs.py + tests/unit/train/rl/test_packer.py cases pass after adding use_renderer = False to test fixtures that use placeholder "test-model" (now correctly rejected by the validator).
  • Broad tests/unit/ sweep (excluding pre-existing unrelated CUDA/Laguna failures): 240 passed.

Note

Medium Risk
Medium risk: introduces a new config-time validation that can hard-fail previously working orchestrator configs that relied on renderer.name='auto' silently falling back to DefaultRenderer, and adds a new renderers dependency to prime-rl-configs.

Overview
Adds a new OrchestratorConfig validator that rejects use_renderer=true + renderer.name='auto' when the configured model isn’t in renderers.base.MODEL_RENDERER_MAP, surfacing misconfiguration during config load / --dry-run instead of silently falling back to DefaultRenderer.

Updates public configs/ and examples/ to explicitly opt into an appropriate renderer (name="default" or a model-specific renderer, plus reasoning_parser="think" / preserve_all_thinking where needed), and adjusts unit tests/fixtures to either use a mapped model or set use_renderer=false. Also adds renderers>=0.1.8.dev4 to prime-rl-configs dependencies and updates the lockfile accordingly.

Reviewed by Cursor Bugbot for commit 30653f4. Bugbot is set up for automated code reviews on this repo. Configure here.

@hallerite hallerite force-pushed the feat/validate-renderer-auto-resolves branch from 246b575 to ac97830 Compare May 18, 2026 13:37
Closes #2537.

When `use_renderer=True` with `renderer.name='auto'` and `model.name`
isn't in `MODEL_RENDERER_MAP`, `create_renderer` silently falls back
to `DefaultRenderer`. That fallback (a) doesn't fix the
position-dependent chat-template bug the renderer client exists to
solve, and (b) rejects envs that pass tools (rollout dies with
"RendererPool does not support tools") unless `renderer.tool_parser`
is set. Today this only surfaces mid-rollout.

Add a config-time `@model_validator(mode="after")` on
`OrchestratorConfig` that rejects this combination at parse time, so
`--dry-run` reports it. Lazy-imports `MODEL_RENDERER_MAP` so the slim
`prime-rl-configs` package still parses configs when `renderers`
isn't installed.

Sweep 25 existing configs to opt into a renderer explicitly:
- 20 PI-vendored / fine-tuned / R1-distilled configs get
  `[orchestrator.renderer] name = "default"` (the right choice — their
  templates are customized, so `apply_chat_template` is correct;
  forcing a model-specific renderer would emit canonical tokens that
  don't match the vendored template). R1 distills also get
  `reasoning_parser = "think"`.
- 5 configs whose model is template-identical (md5-confirmed) to an
  already-mapped sibling get the model-specific renderer name
  explicitly: GLM-5-FP8 → `glm-5`, Qwen3-30B-A3B-Thinking-2507 →
  `qwen3`, PrimeIntellect/MiniMax-M2.5-bf16 → `minimax-m2`.

Those three models will also be added to MODEL_RENDERER_MAP upstream
(PrimeIntellect-ai/renderers#50). Once that lands and the submodule
bumps, the explicit names in those 5 configs become redundant and
can be removed in a follow-up — but the PR here is self-contained
and doesn't depend on the renderers PR landing first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…RL configs

Replace PI mirror id in the two public RL configs that previously had
the model name in MODEL_RENDERER_MAP-incompatible form:

- gsm8k/rl.toml:  PrimeIntellect/Qwen3-0.6B -> Qwen/Qwen3-0.6B
- ci/integration/alphabet_sort.toml: same

PI's Qwen3-0.6B is a same-weights mirror that ships a patched chat
template. The main behavioral effect of that patch is 'always emit
prior assistant <think> blocks' — the renderer ignores the model's
template anyway, so we set renderer.preserve_all_thinking = true on
the qwen3 renderer to reproduce that behavior with the upstream id.

SFT configs that reference PrimeIntellect/Qwen3-{0.6B,1.7B} are left
untouched: SFT defaults use_renderer=false, so HF apply_chat_template
runs directly against the model's template — swapping the id there
would silently change SFT tokenization (PI emits empty <think></think>
for every assistant message; upstream Qwen does not).
@hallerite hallerite force-pushed the feat/validate-renderer-auto-resolves branch from 48c25cc to 52b1202 Compare May 18, 2026 17:26
…r.name blocks

Bumps both deps submodules to pick up the three PRs that just merged:

- renderers#48 — pre-flight overflow check with auto-discovery
  (`OverlongPromptError`, optional `max_prompt_len`, `/v1/models` lookup
  cached by `(base_url, model)`)
- renderers#50 — adds `Qwen/Qwen3-30B-A3B-{Instruct,Thinking}-2507` and
  `zai-org/GLM-5-FP8` to `MODEL_RENDERER_MAP`
- verifiers#1408 — translates `renderers.OverlongPromptError` →
  `verifiers.errors.OverlongPromptError` in the renderer client

deps/renderers: 17d0584f → 8704f9d (tagged renderers-v0.1.8.dev4)
deps/verifiers: dd89b5e9 → 58b119fa

With renderers#50 in the bumped submodule, four of the five explicit
`[orchestrator.renderer]` blocks added earlier in this PR are no longer
needed — the auto-resolver picks the same renderer:

- examples/multinode/rl.toml      → auto resolves to qwen3
- examples/qwen30b_math/rl.toml   → auto resolves to qwen3
- examples/qwen30b_swe/rl.toml    → auto resolves to qwen3
- examples/glm5_pd_disag/rl.toml  → auto resolves to glm-5

examples/minimax_m2.5_swe/rl.toml keeps its explicit
`renderer.name = "minimax-m2"` because the PI-mirror id
`PrimeIntellect/MiniMax-M2.5-bf16` is deliberately not in the upstream
map (renderers#50 dropped PI-mirror ids on purpose).
@hallerite hallerite marked this pull request as ready for review May 18, 2026 22:11
Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py
Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated
The early-return at the top of validate_renderer_auto_resolves already
skips the import unless use_renderer=True AND renderer.name='auto'. If
the user reached that path, they intend to actually use renderers at
runtime — an ImportError here is a real configuration problem, not
something to silently bypass. The slim install CI step exercises configs
with explicit renderer.name and never hits this branch.
Comment thread examples/wordle/rl.toml Outdated
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit aecf703. Configure here.

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py
hallerite added 2 commits May 18, 2026 22:30
PrimeIntellect/Qwen3-1.7B-Wordle-SFT ships the standard PI Qwen3 template
patch — chat_template.jinja is byte-identical (md5 c574d57…) to
PrimeIntellect/Qwen3-0.6B base. That patch's effective behavior ("always
emit prior <think>") is exactly what the qwen3 renderer reproduces with
preserve_all_thinking = true, so switch from DefaultRenderer to the
token-aware qwen3 renderer (same pattern as gsm8k/alphabet_sort after
the upstream model swap).

Leaving examples/reverse_text and configs/multi_reverse_text on
DefaultRenderer — their chat template diverges from PI base in
non-think-related ways (missing tool_calls gating, custom default
system prompt) so DefaultRenderer.apply_chat_template is the safer
match to SFT-time tokenization.
OrchestratorConfig's validate_renderer_auto_resolves validator imports
renderers.base.MODEL_RENDERER_MAP. With use_renderer=True and
renderer.name='auto' as defaults, any plain OrchestratorConfig() call
hits that import — so renderers needs to be a hard dep, not optional.

The import is still inside the validator body, so the slim install CI's
"no heavy deps in sys.modules at import time" check keeps passing:
- step 1 (just imports modules) — never runs validation, renderers stays out
- step 2 (parses examples/Intellect-3.1/rl.toml with renderer.name='default')
  — validator early-returns before the import fires
@hallerite hallerite merged commit 6591943 into main May 18, 2026
14 of 16 checks passed
@hallerite hallerite deleted the feat/validate-renderer-auto-resolves branch May 18, 2026 23:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Surface unsupported renderer config errors earlier

3 participants