feat(orchestrator): validate renderer auto-resolution at config time (#2537) by hallerite · Pull Request #2540 · PrimeIntellect-ai/prime-rl

hallerite · 2026-05-18T13:25:44Z

Summary

- Adds an `OrchestratorConfig` `@model_validator(mode="after")` that rejects `use_renderer=True` + `renderer.name='auto'` when `model.name` isn't in `MODEL_RENDERER_MAP`, so `--dry-run` catches it (closes #2537, "Surface unsupported renderer config errors earlier").
- Sweeps the public configs to opt into the right renderer explicitly.
- Swaps `PrimeIntellect/Qwen3-0.6B` → `Qwen/Qwen3-0.6B` in the two public RL configs that referenced it (`gsm8k`, `alphabet_sort`) and sets `renderer.preserve_all_thinking = true` to match PI's template behavior under the auto-resolved `qwen3` renderer.
- Includes submodule bumps to pick up the three companion PRs (now merged): renderers#48 (pre-flight overflow check with auto-discovery), renderers#50 (`Qwen3-30B-A3B-{Instruct,Thinking}-2507` and `GLM-5-FP8` added to `MODEL_RENDERER_MAP`), and verifiers#1408 (renderer-client `OverlongPromptError` translation).

Why config-time and not orchestrator startup?

The SFT path already rejects this at startup (src/prime_rl/trainer/sft/train.py:172-179). Putting the same guard at orchestrator startup wouldn't fire on --dry-run. Moving to a config validator catches it in cli(RLConfig) — i.e. on dry-run and on every config load — without touching the runtime code path.
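A minimal sketch of the idea, not the code in this PR: a stdlib dataclass whose `__post_init__` stands in for pydantic's `@model_validator(mode="after")`. `HYPOTHETICAL_RENDERER_MAP`, `OrchestratorConfigSketch`, `dry_run`, and the flattened field names are assumptions for illustration; the real validator looks the model up in `renderers.base.MODEL_RENDERER_MAP`.

```python
from dataclasses import dataclass

# Hypothetical stand-in for renderers.base.MODEL_RENDERER_MAP (contents assumed).
HYPOTHETICAL_RENDERER_MAP = {"Qwen/Qwen3-0.6B": "qwen3"}


@dataclass
class OrchestratorConfigSketch:
    model_name: str
    use_renderer: bool = True
    renderer_name: str = "auto"

    def __post_init__(self) -> None:
        # Early return: only the auto-resolution path needs the lookup.
        if not self.use_renderer or self.renderer_name != "auto":
            return
        if self.model_name not in HYPOTHETICAL_RENDERER_MAP:
            raise ValueError(
                "use_renderer=True with renderer.name='auto' but "
                f"{self.model_name!r} is not in the renderer map"
            )


def dry_run(**fields) -> str:
    """Build the config only; construction alone triggers the check."""
    try:
        OrchestratorConfigSketch(**fields)
    except ValueError as exc:
        return f"rejected: {exc}"
    return "Dry run complete"
```

Because the check runs when the config object is constructed, any code path that loads the config (including a dry run that never starts the orchestrator) sees the failure.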

What the error looks like

Value error, orchestrator.use_renderer=True with renderer.name='auto' but
'PrimeIntellect/Qwen3-0.6B-Reverse-Text-SFT' is not in
renderers.base.MODEL_RENDERER_MAP, so it would silently fall back to
DefaultRenderer. Pick one:
(a) [orchestrator.renderer] name='default' — for fine-tunes / vendored
    mirrors with custom chat templates (DefaultRenderer calls
    apply_chat_template); pair with tool_parser=<name> if the env uses tools.
(b) [orchestrator.renderer] name=<model-specific renderer> — if <model> is
    template-identical to a mapped family (and ideally also add it upstream
    to renderers.base.MODEL_RENDERER_MAP).
(c) orchestrator.use_renderer=false — opt out of the renderer client entirely.

The process exits non-zero and `Dry run complete` never prints. The full list of mapped models is omitted from the error (40+ lines was too noisy); users can grep `renderers.base.MODEL_RENDERER_MAP` for it.

How I classified each unmapped config

The classification below comes from md5-hashing `chat_template.jinja` for each unmapped model against its closest mapped sibling, plus diffing PI mirror templates against upstream Qwen / MiniMax to confirm same-weights-only:

| Action | Configs |
| --- | --- |
| Swap RL configs: `PrimeIntellect/Qwen3-0.6B` → `Qwen/Qwen3-0.6B`, add `renderer.preserve_all_thinking = true` | `gsm8k/rl`, `ci/integration/alphabet_sort` |
| Keep PI finetune id, set `renderer.name = "qwen3"` + `preserve_all_thinking = true`: chat template byte-identical (md5 `c574d57…`) to `PrimeIntellect/Qwen3-0.6B` base | `examples/wordle/rl.toml` (`PrimeIntellect/Qwen3-1.7B-Wordle-SFT`) |
| `renderer.name = "default"`: PI / Qwen finetune with custom chat template | 9 configs: `multi_reverse_text`, `ci/integration/reverse_text/{start,resume}`, `ci/integration/reverse_text_lora/{start,resume}`, `ci/integration/reverse_text_moe/start`, `ci/integration/reverse_text_multi_run`, `examples/{reverse_text, Intellect-3.1}` |
| `renderer.name = "default"` + `reasoning_parser = "think"`: DeepSeek-R1 distill | 8 configs: `acereason_math/{stage1,stage2}`, `ci/nightly/acereason_math`, `deepscaler/{stage1,stage2,stage3}`, `hendrycks_math/sanity`, `examples/hendrycks_sanity` |
| Model-specific `renderer.name`: template-identical (md5-confirmed) to a mapped sibling. After the submodule bump for renderers#50, four are auto-resolved and reverted; only the PI-mirror entry stays explicit. | 1 config: `examples/minimax_m2.5_swe` (→ `minimax-m2`, `PrimeIntellect/MiniMax-M2.5-bf16` deliberately not in upstream map) |
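The md5 comparison can be sketched as follows. The helper names (`md5_hex`, `templates_match`) are hypothetical, and the paths are assumed to point at locally downloaded `chat_template.jinja` files; this is not the script used for the PR.

```python
import hashlib
from pathlib import Path


def md5_hex(data: bytes) -> str:
    """Hex md5 digest of raw bytes (used for comparison, not for security)."""
    return hashlib.md5(data).hexdigest()


def templates_match(path_a: Path, path_b: Path) -> bool:
    """True when two template files are byte-identical according to md5."""
    return md5_hex(path_a.read_bytes()) == md5_hex(path_b.read_bytes())
```

Hashing raw bytes rather than decoded text means even a trailing-newline difference counts as a mismatch, which is the conservative choice when deciding whether two models can share a renderer.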

Why we're not swapping all PI mirrors

PrimeIntellect/* re-uploads of upstream models mostly exist to carry chat-template patches that the hand-coded renderers ignore anyway. For the two RL configs that use the PI Qwen3 mirror, the swap to upstream + preserve_all_thinking = true reproduces PI's "always emit prior <think>" template patch under the qwen3 renderer.
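To make the "always emit prior `<think>`" behavior concrete, here is a toy illustration. It is not the `qwen3` renderer and does not produce real chat-template output: `toy_render`, the `reasoning` field, and the assumption that earlier assistant turns lose their thinking by default are all made up for this sketch.

```python
from typing import TypedDict


class Message(TypedDict, total=False):
    role: str
    content: str
    reasoning: str  # hypothetical field holding the thinking text


def toy_render(messages: list[Message], preserve_all_thinking: bool) -> str:
    """Toy renderer: only models which assistant turns keep their thinking."""
    last_assistant = max(
        (i for i, m in enumerate(messages) if m["role"] == "assistant"),
        default=-1,
    )
    parts = []
    for i, m in enumerate(messages):
        text = m["content"]
        # Assumed default: only the final assistant turn keeps its thinking.
        keep = preserve_all_thinking or i == last_assistant
        if m["role"] == "assistant" and keep and m.get("reasoning"):
            text = f"<think>{m['reasoning']}</think>{text}"
        parts.append(f"{m['role']}: {text}")
    return "\n".join(parts)
```

With `preserve_all_thinking=True`, every assistant turn in a multi-turn rollout keeps its thinking block, which is the effect the PI template patch is described as having.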

SFT configs that reference PrimeIntellect/Qwen3-{0.6B,1.7B} are intentionally left untouched — SFT defaults use_renderer = false, so HF apply_chat_template runs directly against the model's template; swapping the model id would silently change SFT tokenization (PI emits empty <think></think> for every assistant message; upstream Qwen does not).

Genuine PI finetunes with custom chat templates (Qwen3-0.6B-Reverse-Text-SFT, Reverse-Text-SFT, INTELLECT-3*, INTELLECT-3-Base) keep the PI id and use renderer.name = "default" so DefaultRenderer runs the model's own (patched) apply_chat_template. Exception: Qwen3-1.7B-Wordle-SFT ships the standard PI template patch (byte-identical to PrimeIntellect/Qwen3-0.6B base), so it gets the same qwen3 renderer + preserve_all_thinking = true treatment as the two RL configs that swapped to upstream Qwen3.

The one exception is `PrimeIntellect/MiniMax-M2.5-bf16`: identical weights to `MiniMaxAI/MiniMax-M2.5` modulo a bf16 dtype cast that prime-rl's trainer requires, and a byte-identical chat template. We keep the PI id (we need the bf16 weights) and set `renderer.name = "minimax-m2"` explicitly. The corresponding PI-mirror entry was deliberately removed from renderers#50 (no PI ids in the upstream renderer map).

Slim configs package

prime-rl-configs now lists renderers>=0.1.8.dev4 as a hard dependency. The validator's from renderers.base import MODEL_RENDERER_MAP lives inside the function body and only fires on renderer.name='auto' configs, so the slim install CI's "no heavy deps in sys.modules at import time" check still passes — renderers (and its transitive transformers) stay out of sys.modules until validation actually needs them. Verified locally against both slim CI steps and Bugbot's failure mode (default OrchestratorConfig() now resolves instead of raising ImportError).
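The lazy-import pattern can be sketched with a stdlib module standing in for the heavy dependency. `colorsys` is chosen only because it is small and rarely imported implicitly (an assumption), and `validate` / `heavy_loaded` are hypothetical names, not functions from prime-rl.

```python
import sys

# Stand-in for a heavy dependency such as renderers (assumption).
HEAVY_MODULE = "colorsys"


def validate(renderer_name: str) -> bool:
    """Return True if the deferred import ran, False on the early return."""
    if renderer_name != "auto":
        return False  # early return: the import below never executes
    import colorsys  # noqa: F401  (deferred until this branch is reached)
    return True


def heavy_loaded() -> bool:
    """Report whether the stand-in module is present in sys.modules."""
    return HEAVY_MODULE in sys.modules
```

Importing the module that defines `validate` does not pull in the stand-in dependency; only a call that reaches the `auto` branch does.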

Submodule bumps

deps/renderers: 17d0584f → 8704f9d (renderers-v0.1.8.dev4)
- Includes renderers#48 (overflow check) and renderers#50 (map additions).
deps/verifiers: dd89b5e9 → 58b119fa
- Includes verifiers#1408 (OverlongPromptError translation).
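The error translation described for verifiers#1408 follows a common boundary pattern, sketched below with stand-in classes. Neither exception class here is the real one from `renderers` or `verifiers`, and `render_prompt` / `client_render` are hypothetical names used only for illustration.

```python
class RenderersOverlongPromptError(Exception):
    """Stand-in for renderers.OverlongPromptError."""


class VerifiersOverlongPromptError(Exception):
    """Stand-in for verifiers.errors.OverlongPromptError."""


def render_prompt(token_count: int, max_prompt_len: int) -> int:
    """Hypothetical pre-flight check: refuse prompts over the limit."""
    if token_count > max_prompt_len:
        raise RenderersOverlongPromptError(
            f"prompt has {token_count} tokens, limit is {max_prompt_len}"
        )
    return token_count


def client_render(token_count: int, max_prompt_len: int) -> int:
    """Translate the library-level error at the client boundary."""
    try:
        return render_prompt(token_count, max_prompt_len)
    except RenderersOverlongPromptError as exc:
        # `from exc` keeps the original error available as __cause__.
        raise VerifiersOverlongPromptError(str(exc)) from exc
```

Callers then only need to handle the client-side exception type, while the original error stays attached for debugging.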

Tests

- 5 new unit tests in `tests/unit/test_configs.py` cover: reject unmapped, accept mapped, explicit `renderer.name` bypass, `use_renderer=false` bypass, explicit `renderer.name="default"` opt-in.
- All 78 parametrized `test_load_configs` cases pass after the sweep.
- All 13 `tests/unit/train/test_runs.py` + `tests/unit/train/rl/test_packer.py` cases pass after adding `use_renderer = False` to test fixtures that use placeholder `"test-model"` (now correctly rejected by the validator).
- Broad `tests/unit/` sweep (excluding pre-existing unrelated CUDA/Laguna failures): 240 passed.

Note

Medium Risk
Medium risk: introduces a new config-time validation that can hard-fail previously working orchestrator configs that relied on renderer.name='auto' silently falling back to DefaultRenderer, and adds a new renderers dependency to prime-rl-configs.

Overview
Adds a new OrchestratorConfig validator that rejects use_renderer=true + renderer.name='auto' when the configured model isn’t in renderers.base.MODEL_RENDERER_MAP, surfacing misconfiguration during config load / --dry-run instead of silently falling back to DefaultRenderer.

Updates public configs/ and examples/ to explicitly opt into an appropriate renderer (name="default" or a model-specific renderer, plus reasoning_parser="think" / preserve_all_thinking where needed), and adjusts unit tests/fixtures to either use a mapped model or set use_renderer=false. Also adds renderers>=0.1.8.dev4 to prime-rl-configs dependencies and updates the lockfile accordingly.

Reviewed by Cursor Bugbot for commit 30653f4.

Closes #2537.

When `use_renderer=True` with `renderer.name='auto'` and `model.name` isn't in `MODEL_RENDERER_MAP`, `create_renderer` silently falls back to `DefaultRenderer`. That fallback (a) doesn't fix the position-dependent chat-template bug the renderer client exists to solve, and (b) rejects envs that pass tools (rollout dies with "RendererPool does not support tools") unless `renderer.tool_parser` is set. Today this only surfaces mid-rollout.

Add a config-time `@model_validator(mode="after")` on `OrchestratorConfig` that rejects this combination at parse time, so `--dry-run` reports it. Lazy-imports `MODEL_RENDERER_MAP` so the slim `prime-rl-configs` package still parses configs when `renderers` isn't installed.

Sweep 25 existing configs to opt into a renderer explicitly:
- 20 PI-vendored / fine-tuned / R1-distilled configs get `[orchestrator.renderer] name = "default"` (the right choice: their templates are customized, so `apply_chat_template` is correct; forcing a model-specific renderer would emit canonical tokens that don't match the vendored template). R1 distills also get `reasoning_parser = "think"`.
- 5 configs whose model is template-identical (md5-confirmed) to an already-mapped sibling get the model-specific renderer name explicitly: GLM-5-FP8 → `glm-5`, Qwen3-30B-A3B-Thinking-2507 → `qwen3`, PrimeIntellect/MiniMax-M2.5-bf16 → `minimax-m2`.

Those three models will also be added to `MODEL_RENDERER_MAP` upstream (PrimeIntellect-ai/renderers#50). Once that lands and the submodule bumps, the explicit names in those 5 configs become redundant and can be removed in a follow-up, but the PR here is self-contained and doesn't depend on the renderers PR landing first.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…RL configs

Replace PI mirror id in the two public RL configs that previously had the model name in MODEL_RENDERER_MAP-incompatible form:
- gsm8k/rl.toml: PrimeIntellect/Qwen3-0.6B -> Qwen/Qwen3-0.6B
- ci/integration/alphabet_sort.toml: same

PI's Qwen3-0.6B is a same-weights mirror that ships a patched chat template. The main behavioral effect of that patch is 'always emit prior assistant <think> blocks'. The renderer ignores the model's template anyway, so we set renderer.preserve_all_thinking = true on the qwen3 renderer to reproduce that behavior with the upstream id.

SFT configs that reference PrimeIntellect/Qwen3-{0.6B,1.7B} are left untouched: SFT defaults use_renderer=false, so HF apply_chat_template runs directly against the model's template, and swapping the id there would silently change SFT tokenization (PI emits empty <think></think> for every assistant message; upstream Qwen does not).

…r.name blocks

Bumps both deps submodules to pick up the three PRs that just merged:
- renderers#48: pre-flight overflow check with auto-discovery (`OverlongPromptError`, optional `max_prompt_len`, `/v1/models` lookup cached by `(base_url, model)`)
- renderers#50: adds `Qwen/Qwen3-30B-A3B-{Instruct,Thinking}-2507` and `zai-org/GLM-5-FP8` to `MODEL_RENDERER_MAP`
- verifiers#1408: translates `renderers.OverlongPromptError` → `verifiers.errors.OverlongPromptError` in the renderer client

deps/renderers: 17d0584f → 8704f9d (tagged renderers-v0.1.8.dev4)
deps/verifiers: dd89b5e9 → 58b119fa

With renderers#50 in the bumped submodule, four of the five explicit `[orchestrator.renderer]` blocks added earlier in this PR are no longer needed, because the auto-resolver picks the same renderer:
- examples/multinode/rl.toml → auto resolves to qwen3
- examples/qwen30b_math/rl.toml → auto resolves to qwen3
- examples/qwen30b_swe/rl.toml → auto resolves to qwen3
- examples/glm5_pd_disag/rl.toml → auto resolves to glm-5

examples/minimax_m2.5_swe/rl.toml keeps its explicit `renderer.name = "minimax-m2"` because the PI-mirror id `PrimeIntellect/MiniMax-M2.5-bf16` is deliberately not in the upstream map (renderers#50 dropped PI-mirror ids on purpose).
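The "lookup cached by `(base_url, model)`" idea from renderers#48 can be illustrated with a small sketch. This is not the renderers implementation: `cached_max_prompt_len`, `fake_fetch`, and the returned limit of 4096 are hypothetical, and the network call is replaced by an injected function so the example runs offline.

```python
from typing import Callable

# Cache of discovered prompt limits, keyed by (base_url, model).
_limit_cache: dict[tuple[str, str], int] = {}


def cached_max_prompt_len(
    base_url: str, model: str, fetch: Callable[[str, str], int]
) -> int:
    """Look up the limit once per (base_url, model) pair, then reuse it."""
    key = (base_url, model)
    if key not in _limit_cache:
        _limit_cache[key] = fetch(base_url, model)  # e.g. a /v1/models call
    return _limit_cache[key]


calls: list[tuple[str, str]] = []


def fake_fetch(base_url: str, model: str) -> int:
    """Hypothetical fetcher that records calls instead of using the network."""
    calls.append((base_url, model))
    return 4096
```

Keying on the pair rather than on the model alone means the same model served from two endpoints with different context limits is looked up separately.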

The early-return at the top of validate_renderer_auto_resolves already skips the import unless use_renderer=True AND renderer.name='auto'. If the user reached that path, they intend to actually use renderers at runtime — an ImportError here is a real configuration problem, not something to silently bypass. The slim install CI step exercises configs with explicit renderer.name and never hits this branch.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Reviewed by Cursor Bugbot for commit aecf703.

PrimeIntellect/Qwen3-1.7B-Wordle-SFT ships the standard PI Qwen3 template patch: chat_template.jinja is byte-identical (md5 c574d57…) to PrimeIntellect/Qwen3-0.6B base. That patch's effective behavior ("always emit prior <think>") is exactly what the qwen3 renderer reproduces with preserve_all_thinking = true, so switch from DefaultRenderer to the token-aware qwen3 renderer (same pattern as gsm8k/alphabet_sort after the upstream model swap).

Leaving examples/reverse_text and configs/multi_reverse_text on DefaultRenderer: their chat template diverges from PI base in non-think-related ways (missing tool_calls gating, custom default system prompt), so DefaultRenderer.apply_chat_template is the safer match to SFT-time tokenization.

OrchestratorConfig's validate_renderer_auto_resolves validator imports renderers.base.MODEL_RENDERER_MAP. With use_renderer=True and renderer.name='auto' as defaults, any plain OrchestratorConfig() call hits that import, so renderers needs to be a hard dep, not optional.

The import is still inside the validator body, so the slim install CI's "no heavy deps in sys.modules at import time" check keeps passing:
- step 1 (just imports modules): never runs validation, renderers stays out
- step 2 (parses examples/Intellect-3.1/rl.toml with renderer.name='default'): validator early-returns before the import fires

hallerite force-pushed the feat/validate-renderer-auto-resolves branch from 246b575 to ac97830 Compare May 18, 2026 13:37

hallerite force-pushed the feat/validate-renderer-auto-resolves branch from ac97830 to 72760e5 Compare May 18, 2026 13:59

hallerite mentioned this pull request May 18, 2026

feat(map): add Qwen3-30B-A3B-{Instruct,Thinking}-2507, GLM-5-FP8 PrimeIntellect-ai/renderers#50

Merged

hallerite force-pushed the feat/validate-renderer-auto-resolves branch from 48c25cc to 52b1202 Compare May 18, 2026 17:26

hallerite marked this pull request as ready for review May 18, 2026 22:11

Merge branch 'main' into feat/validate-renderer-auto-resolves

f348c23

cursor Bot reviewed May 18, 2026

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py

samsja reviewed May 18, 2026

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated

mikasenghaas reviewed May 18, 2026

Comment thread examples/wordle/rl.toml Outdated

cursor Bot reviewed May 18, 2026

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py

hallerite added 2 commits May 18, 2026 22:30

samsja approved these changes May 18, 2026

hallerite merged commit 6591943 into main May 18, 2026
14 of 16 checks passed

hallerite deleted the feat/validate-renderer-auto-resolves branch May 18, 2026 23:06

hallerite mentioned this pull request May 18, 2026

Overlong prompt errors leak from renderers into orchestrator #2535

Closed

hallerite merged 7 commits into main from feat/validate-renderer-auto-resolves

3 participants
