Consume the typed RendererConfig surface by hallerite · Pull Request #2635 · PrimeIntellect-ai/prime-rl

hallerite · 2026-05-25T22:04:25Z

Summary

Switches the orchestrator and SFT trainer to consume the typed
renderers.RendererConfig discriminated union (PrimeIntellect-ai/renderers#60)
and the matching vf.ClientConfig.renderer_config field
(PrimeIntellect-ai/verifiers#1467) — both now merged.

OrchestratorConfig.renderer: RendererConfig | None — a typed discriminated union (default AutoRendererConfig), None opts into MITO. prime-rl's own duplicate RendererConfig in configs.shared is removed; the renderers package is the single source of truth.
Fields are flat under [orchestrator.renderer], dispatched on the name discriminator. No .settings nesting (dropped after review).
renderer_pool_size → pool_size, moved to the orchestrator root.
setup_inference_pool / setup_clients / Static+ElasticInferencePool collapse the multi-arg renderer surface to a single renderer_config arg.
A model_serializer writes renderer = "None" so MITO configs round-trip through the saved orch.toml.

TOML before / after

Before:

[orchestrator]
use_renderer = true

[orchestrator.renderer]
name = "qwen3"

After (typed; or omit the block entirely for AutoRendererConfig):

[orchestrator.renderer]
name = "qwen3"
enable_thinking = false

MITO mode:

renderer = "None"

Bogus combinations (e.g. add_vision_id under name = "qwen3") raise pydantic.ValidationError at config-load time instead of silently no-op'ing.

Dependencies (all landed)

Typed renderer configs renderers#60 (typed configs) — merged
Consume the typed RendererConfig surface verifiers#1467 (typed ClientConfig.renderer_config) — merged
deps/verifiers bumped to f9c68eb2 (verifiers 0.1.15.dev11), which also includes PoC: add Qwen3 VL support #1463 (drop silent tool_call status filter) and Bump verifiers #1468 (pipe chat_template_kwargs into the typed renderer config)
Supersedes Use sampling chat template kwargs for renderer RL #2605 (which wired chat_template_kwargs via sampling.extra_body)

Merged latest main (docs revamp #2602 + wandb-metrics-by-role #2642); resolved the docs/multimodal.md modify/delete conflict and fixed a stale use_renderer = true reference in docs/training.md.

Validation

All 39 RL-shaped configs/*.toml load cleanly against the new RLConfig schema; grep confirms zero stale shapes (.settings, use_renderer, renderer_pool_size) remain in-tree.
MITO round-trip verified: TOML renderer = "None" → cfg.renderer is None → model_dump → renderer = "None" → reload → None.
End-to-end smoke on Qwen3.6-35B-A3B + mini-browse-apps (renderer/TITO route, no [renderer] block → AutoRendererConfig → Qwen36Renderer): 32 rollouts / 686 assistant turns, reward 0.64, renderer route exclusively (0 chat-completions). enable_thinking=false propagated correctly — 0 turns with reasoning content.
pytest tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/test_configs.py — pass (the one failing case is an untracked local example config using the old shape, not in this PR).

Note

Medium Risk
Changes how rollout tokenization is selected (TITO vs MITO) and config validation for VLMs and auto-renderer resolution; misconfigured TOML could fail at load time or pick the wrong client until upstream renderers/verifiers deps align.

Overview
Replaces the orchestrator.use_renderer boolean with orchestrator.renderer: a typed renderers.RendererConfig discriminated union (default AutoRendererConfig) when using the renderer/TITO path, or None for MITO. The duplicate flat RendererConfig in configs.shared is removed in favor of the renderers package; create_renderer and verifiers clients now take a single renderer_config instead of scattered **name / parsers / preserve_*` fields.

orchestrator.pool_size moves to the orchestrator root (no longer under [orchestrator.renderer]). Validators and docs are updated accordingly (VLM requires a non-None renderer, SFT forces renderer=None, pool_size only when a renderer is set). A custom serializer writes renderer = "None" so MITO configs round-trip through saved orch.toml. Sample configs, multimodal docs, prime-rl-configs renderers pin, and unit tests follow the new shape.

^{Reviewed by Cursor Bugbot for commit 310a2fa. Bugbot is set up for automated code reviews on this repo. Configure here.}

Reshapes the orchestrator / SFT renderer plumbing around ``renderers.RendererConfig`` (the typed discriminated union) instead of flat strings + booleans. - ``configs.shared.RendererConfig`` now composes ``renderers.RendererConfig`` as a single ``settings`` field; ``pool_size`` stays as a separate prime-rl-side concern. The flat ``name`` / ``tool_parser`` / ``reasoning_parser`` / ``preserve_*`` fields are gone. - ``setup_inference_pool`` / ``setup_clients`` / ``StaticInferencePool`` / ``ElasticInferencePool`` collapse their six-arg renderer surface to a single ``renderer_config: RendererConfig | None`` plus the orthogonal ``renderer_model_name`` / ``renderer_pool_size`` fields. - Orchestrator + SFT trainer call sites pass the typed config straight through to ``create_renderer(tokenizer, config.renderer.settings)``. - Cross-env / unmapped-model validators rewritten to inspect ``renderer.settings`` (discriminated-union shape) instead of the flat field names. - Tests updated to construct typed configs (``Qwen3VLRendererConfig()``, etc.) and assert against the new call shapes. Pins ``deps/renderers`` and the ``prime-rl-configs`` ``renderers`` dep to the renderers PR's tip (``4c9099debbc9bc4fcc3eb6a118bbfa5c364ec395``) until it merges and a dev release ships. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The typed-config refactor landed on renderers main as ``c86c50b`` (tag ``renderers-v0.1.8.dev27``). Switching: - ``deps/renderers`` submodule pointer → ``c86c50b`` - ``prime-rl-configs`` git URL pin → ``@renderers-v0.1.8.dev27`` Same commit, more readable, and stable to anyone reading the diff after the feature branch is deleted. PyPI dev release for the same version is pending a publisher-config fix; once it lands the git URL pin can collapse to a plain ``renderers>=0.1.8.dev27`` PyPI pin.

The new RendererConfig accepts only `settings` (typed `renderers.RendererConfig` discriminated union) and `pool_size`, so flat `name`/`reasoning_parser`/`preserve_*` keys under `[orchestrator.renderer]` now fail pydantic validation. Move those keys under `[orchestrator.renderer.settings]`. `pool_size` stays in the outer block. Two configs that only set `preserve_all_thinking` gain an explicit `name = "auto"` since the discriminator is required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Per Mika's review: drop prime-rl's RendererConfig wrapper that nested the renderers' discriminated union under a ``.settings`` field. The wrapper bought a parallel ``pool_size`` attribute but cost a clunky two-level TOML layout and made every call site dereference ``.settings``. Use ``renderers.RendererConfig`` (the typed discriminated union) as the field type directly. Pool size moves up one level to ``orchestrator.renderer_pool_size``. ``use_renderer: bool`` is gone — ``renderer: RendererConfig | None`` with ``None`` for MITO replaces it (consistent with how SFT, optimizer, and other optional sub-configs express absence). TOML callers spell MITO with ``renderer = "None"``, which ``prime_pydantic_config.BaseConfig._none_str_to_none`` coerces back to Python None. Net effect on TOMLs is a wash with the previous commit: 26 in-repo configs lose the ``.settings`` indirection they gained yesterday, and one (multimodal) splits ``pool_size`` out of the renderer block. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The slim ``prime-rl-configs`` wheel (installed standalone, outside the workspace) now resolves ``renderers`` from PyPI instead of the git tag URL it used while the dev release was waiting on PyPI trusted-publisher config. The full ``prime-rl`` workspace still pulls renderers from the ``deps/renderers`` submodule, so this only affects downstream slim consumers. Re-locks to refresh the lockfile against the same workspace state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The slim ``prime-rl-configs`` install path (configs only, no GPU deps) asserts that importing the config classes doesn't pull in ``transformers``. ``from renderers import RendererConfig`` at the top of ``orchestrator.py`` / ``sft.py`` broke that: ``renderers/__init__`` eagerly imports every concrete renderer module (DefaultRenderer, Qwen3Renderer, …), each of which imports ``transformers.tokenization_utils.PreTrainedTokenizer``. Defer the renderers import via: - ``TYPE_CHECKING``-guarded ``from renderers import RendererConfig`` so static type checkers still see the typed discriminated union. - ``renderer: Any`` field annotation at runtime, so no resolution happens at class definition. - ``model_validator(mode="before") _resolve_renderer`` that lazy-imports ``renderers``, fills the orchestrator default (``AutoRendererConfig()``), and coerces dict input through the typed union via ``TypeAdapter``. SFT's default stays ``None``. End-user semantics unchanged; class import is now transformers-free. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

dev28 lazy-loads concrete renderer classes via PEP 562 ``__getattr__`` (renderers#64), so ``from renderers import RendererConfig`` no longer drags ``transformers`` into ``sys.modules``. The slim-install CI gate is satisfied without the prior workaround. Drop the ``Any`` annotation and the ``_resolve_renderer`` model-validator that lazy-resolved the discriminated union from ``OrchestratorConfig`` and ``SFTConfig``. The fields are typed ``RendererConfig | None`` again, and pydantic does the full discriminator validation at class definition. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

- deps/renderers c86c50b -> 2ec28a8 — picks up the PEP 562 ``__getattr__`` so the workspace install path also lazy-loads concrete renderer classes. - deps/verifiers 04651be -> d107442 — adds the typed ``ClientConfig.renderer_config`` field that ``setup_clients`` forwards. Fixes the unit-test ``test_setup_clients_assigns_renderer_and_dp_rank_headers`` AttributeError on ``renderer_config``. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Two cleanups on the typed renderer surface: 1. Per Samsja: drop the ``RendererConfig as RendererSettings`` alias in ``client.py`` / ``elastic.py``. The alias was a workaround for the former local ``RendererConfig`` wrapper class; with that gone, we just import ``RendererConfig`` from renderers directly. 2. Per Cursor bugbot: ``OrchestratorConfig.renderer = None`` (MITO) was silently dropped by ``model_dump(exclude_none=True)``, so any saved ``control/orch.toml`` would reload as the default ``AutoRendererConfig()`` (TITO). Add a wrapping ``model_serializer`` that re-injects ``renderer = "None"`` (string) when the field is ``None``, matching what ``BaseConfig._none_str_to_none`` accepts on reload. MITO and TITO now round-trip through TOML. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 2072afe. Configure here.}

cursor · 2026-05-26T00:20:26Z


-    use_renderer: bool = True
-    """Use the renderer-backed TITO client (client-side tokenization via the ``renderers`` package, served by ``/v1/generate``). When True, the ``[orchestrator.renderer]`` block (name / tool_parser / reasoning_parser / pool_size) applies. Default for both text-only and VLM rollouts; VLMs require it. False falls back to MITO (``openai_chat_completions``)."""
-


Missing breaking config changelog

Medium Severity

This PR removes orchestrator.use_renderer, moves renderer pool sizing to orchestrator.renderer_pool_size, and replaces the flat [orchestrator.renderer] schema with the typed renderers.RendererConfig union (MITO via renderer = "None"). Those are breaking config changes, but CHANGELOG.md was not updated with a migration entry.

^{Triggered by project rule: BugBot Instructions}

^{Reviewed by Cursor Bugbot for commit 2072afe. Configure here.}

mikasenghaas · 2026-05-26T00:22:35Z

-    renderer: RendererConfig = RendererConfig()
-    """Client-side renderer configuration. Only consumed when ``use_renderer=true``."""
+    renderer: RendererConfig | None = Field(default_factory=AutoRendererConfig)
+    """Typed renderer config (``renderers.RendererConfig`` discriminated


can we make this docstring a bit more concise

mikasenghaas · 2026-05-26T00:23:26Z

+    @model_serializer(mode="wrap")
+    def _preserve_mito_renderer(self, handler: SerializerFunctionWrapHandler) -> dict[str, Any]:
+        """Emit ``renderer = "None"`` (string) when MITO so
+        ``model_dump(exclude_none=True)`` round-trips: dumped TOML has
+        ``renderer = "None"``, and on reload
+        ``BaseConfig._none_str_to_none`` coerces it back to ``None``.
+        Without this, a MITO orchestrator config saved to
+        ``control/orch.toml`` would lose the renderer key entirely and
+        reload as the default ``AutoRendererConfig()`` (TITO)."""
+        result = handler(self)
+        if self.renderer is None:
+            result["renderer"] = "None"
+        return result


hmm this is code smell from the pydantic-config side. can merge for now but can you put an issue to fix this one

…cstring - renderer field uses direct AutoRendererConfig() default (frozen, no factory needed) - renderer_pool_size -> pool_size on OrchestratorConfig; cascade through client.py, elastic.py, orchestrator.py call sites and the test; vf.ClientConfig kwarg name (verifiers-side) stays renderer_pool_size - shorter docstring on the renderer field Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mikasenghaas

as discussed, approving this to unblock quickly. short-term cleanup would be to expose resolve_renderer(model_name: str) -> RendererConfig from renderers so we can resolve 1. the renderer 2. the chat template kwargs in the prime-rl configs

Picks up verifiers main with PR #1463 (surface every parsed tool_call, drop status filter) and PR #1468 (pipe chat_template_kwargs into typed RendererConfig) now that both are merged. verifiers 0.1.15.dev11. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…onfig # Conflicts: # docs/multimodal.md

hallerite and others added 2 commits May 25, 2026 22:03

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py

mikasenghaas reviewed May 25, 2026

View reviewed changes

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated

Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py Outdated

cursor Bot reviewed May 25, 2026

View reviewed changes

Comment thread tests/unit/train/test_runs.py

hallerite and others added 6 commits May 25, 2026 23:21

Merge branch 'main' into feat/typed-renderer-config

351e1f4

Merge branch 'main' into feat/typed-renderer-config

74f1376

samsja reviewed May 26, 2026

View reviewed changes

Comment thread src/prime_rl/utils/client.py Outdated

cursor Bot reviewed May 26, 2026

View reviewed changes

Comment thread src/prime_rl/orchestrator/orchestrator.py

cursor Bot reviewed May 26, 2026

View reviewed changes

mikasenghaas reviewed May 26, 2026

View reviewed changes

eligotts requested a review from mikasenghaas May 26, 2026 17:56

mikasenghaas previously approved these changes May 26, 2026

View reviewed changes

eligotts and others added 2 commits May 26, 2026 18:37

Merge remote-tracking branch 'origin/main' into feat/typed-renderer-c…

d82a58d

…onfig # Conflicts: # docs/multimodal.md

eligotts dismissed mikasenghaas’s stale review via d82a58d May 26, 2026 18:44

mikasenghaas mentioned this pull request May 26, 2026

exp: verifiers v1 smoke configs #2637

Draft

eligotts requested review from mikasenghaas and samsja May 26, 2026 18:49

mikasenghaas approved these changes May 26, 2026

View reviewed changes

eligotts merged commit 6d17559 into main May 26, 2026
18 checks passed

mikasenghaas mentioned this pull request May 29, 2026

How should Prime-RL disable Qwen3.5 thinking in renderer mode? #2662

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consume the typed RendererConfig surface#2635

Consume the typed RendererConfig surface#2635
eligotts merged 14 commits into
mainfrom
feat/typed-renderer-config

hallerite commented May 25, 2026 •

edited by eligotts

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot May 26, 2026

Uh oh!

Uh oh!

Uh oh!

mikasenghaas May 26, 2026

Uh oh!

mikasenghaas May 26, 2026

Uh oh!

mikasenghaas left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants


		use_renderer: bool = True
		"""Use the renderer-backed TITO client (client-side tokenization via the ``renderers`` package, served by ``/v1/generate``). When True, the ``[orchestrator.renderer]`` block (name / tool_parser / reasoning_parser / pool_size) applies. Default for both text-only and VLM rollouts; VLMs require it. False falls back to MITO (``openai_chat_completions``)."""

Conversation

hallerite commented May 25, 2026 • edited by eligotts Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

TOML before / after

Dependencies (all landed)

Validation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot May 26, 2026

Choose a reason for hiding this comment

Missing breaking config changelog

Uh oh!

Uh oh!

Uh oh!

mikasenghaas May 26, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas May 26, 2026

Choose a reason for hiding this comment

Uh oh!

mikasenghaas left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

hallerite commented May 25, 2026 •

edited by eligotts

Loading