Skip to content

Consume the typed RendererConfig surface#2635

Merged
eligotts merged 14 commits into
mainfrom
feat/typed-renderer-config
May 26, 2026
Merged

Consume the typed RendererConfig surface#2635
eligotts merged 14 commits into
mainfrom
feat/typed-renderer-config

Conversation

@hallerite
Copy link
Copy Markdown
Member

@hallerite hallerite commented May 25, 2026

Summary

Switches the orchestrator and SFT trainer to consume the typed
renderers.RendererConfig discriminated union (PrimeIntellect-ai/renderers#60)
and the matching vf.ClientConfig.renderer_config field
(PrimeIntellect-ai/verifiers#1467) — both now merged.

  • OrchestratorConfig.renderer: RendererConfig | None — a typed discriminated union (default AutoRendererConfig), None opts into MITO. prime-rl's own duplicate RendererConfig in configs.shared is removed; the renderers package is the single source of truth.
  • Fields are flat under [orchestrator.renderer], dispatched on the name discriminator. No .settings nesting (dropped after review).
  • renderer_pool_sizepool_size, moved to the orchestrator root.
  • setup_inference_pool / setup_clients / Static+ElasticInferencePool collapse the multi-arg renderer surface to a single renderer_config arg.
  • A model_serializer writes renderer = "None" so MITO configs round-trip through the saved orch.toml.

TOML before / after

Before:

[orchestrator]
use_renderer = true

[orchestrator.renderer]
name = "qwen3"

After (typed; or omit the block entirely for AutoRendererConfig):

[orchestrator.renderer]
name = "qwen3"
enable_thinking = false

MITO mode:

renderer = "None"

Bogus combinations (e.g. add_vision_id under name = "qwen3") raise pydantic.ValidationError at config-load time instead of silently no-op'ing.

Dependencies (all landed)

Merged latest main (docs revamp #2602 + wandb-metrics-by-role #2642); resolved the docs/multimodal.md modify/delete conflict and fixed a stale use_renderer = true reference in docs/training.md.

Validation

  • All 39 RL-shaped configs/*.toml load cleanly against the new RLConfig schema; grep confirms zero stale shapes (.settings, use_renderer, renderer_pool_size) remain in-tree.
  • MITO round-trip verified: TOML renderer = "None"cfg.renderer is Nonemodel_dumprenderer = "None" → reload → None.
  • End-to-end smoke on Qwen3.6-35B-A3B + mini-browse-apps (renderer/TITO route, no [renderer] block → AutoRendererConfigQwen36Renderer): 32 rollouts / 686 assistant turns, reward 0.64, renderer route exclusively (0 chat-completions). enable_thinking=false propagated correctly — 0 turns with reasoning content.
  • pytest tests/unit/orchestrator/test_orchestrator_setup.py tests/unit/test_configs.py — pass (the one failing case is an untracked local example config using the old shape, not in this PR).

Note

Medium Risk
Changes how rollout tokenization is selected (TITO vs MITO) and config validation for VLMs and auto-renderer resolution; misconfigured TOML could fail at load time or pick the wrong client until upstream renderers/verifiers deps align.

Overview
Replaces the orchestrator.use_renderer boolean with orchestrator.renderer: a typed renderers.RendererConfig discriminated union (default AutoRendererConfig) when using the renderer/TITO path, or None for MITO. The duplicate flat RendererConfig in configs.shared is removed in favor of the renderers package; create_renderer and verifiers clients now take a single renderer_config instead of scattered **name / parsers / preserve_*` fields.

orchestrator.pool_size moves to the orchestrator root (no longer under [orchestrator.renderer]). Validators and docs are updated accordingly (VLM requires a non-None renderer, SFT forces renderer=None, pool_size only when a renderer is set). A custom serializer writes renderer = "None" so MITO configs round-trip through saved orch.toml. Sample configs, multimodal docs, prime-rl-configs renderers pin, and unit tests follow the new shape.

Reviewed by Cursor Bugbot for commit 310a2fa. Bugbot is set up for automated code reviews on this repo. Configure here.

hallerite and others added 2 commits May 25, 2026 22:03
Reshapes the orchestrator / SFT renderer plumbing around
``renderers.RendererConfig`` (the typed discriminated union) instead of
flat strings + booleans.

- ``configs.shared.RendererConfig`` now composes ``renderers.RendererConfig``
  as a single ``settings`` field; ``pool_size`` stays as a separate
  prime-rl-side concern. The flat ``name`` / ``tool_parser`` /
  ``reasoning_parser`` / ``preserve_*`` fields are gone.
- ``setup_inference_pool`` / ``setup_clients`` / ``StaticInferencePool``
  / ``ElasticInferencePool`` collapse their six-arg renderer surface to
  a single ``renderer_config: RendererConfig | None`` plus the orthogonal
  ``renderer_model_name`` / ``renderer_pool_size`` fields.
- Orchestrator + SFT trainer call sites pass the typed config straight
  through to ``create_renderer(tokenizer, config.renderer.settings)``.
- Cross-env / unmapped-model validators rewritten to inspect
  ``renderer.settings`` (discriminated-union shape) instead of the flat
  field names.
- Tests updated to construct typed configs (``Qwen3VLRendererConfig()``,
  etc.) and assert against the new call shapes.

Pins ``deps/renderers`` and the ``prime-rl-configs`` ``renderers`` dep to
the renderers PR's tip
(``4c9099debbc9bc4fcc3eb6a118bbfa5c364ec395``) until it merges and a dev
release ships.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The typed-config refactor landed on renderers main as ``c86c50b``
(tag ``renderers-v0.1.8.dev27``). Switching:

- ``deps/renderers`` submodule pointer → ``c86c50b``
- ``prime-rl-configs`` git URL pin → ``@renderers-v0.1.8.dev27``

Same commit, more readable, and stable to anyone reading the diff
after the feature branch is deleted. PyPI dev release for the same
version is pending a publisher-config fix; once it lands the git URL
pin can collapse to a plain ``renderers>=0.1.8.dev27`` PyPI pin.
Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py
The new RendererConfig accepts only `settings` (typed
`renderers.RendererConfig` discriminated union) and `pool_size`, so
flat `name`/`reasoning_parser`/`preserve_*` keys under
`[orchestrator.renderer]` now fail pydantic validation. Move those
keys under `[orchestrator.renderer.settings]`. `pool_size` stays in
the outer block. Two configs that only set `preserve_all_thinking`
gain an explicit `name = "auto"` since the discriminator is required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated
Comment thread packages/prime-rl-configs/src/prime_rl/configs/shared.py Outdated
Per Mika's review: drop prime-rl's RendererConfig wrapper that nested
the renderers' discriminated union under a ``.settings`` field. The
wrapper bought a parallel ``pool_size`` attribute but cost a clunky
two-level TOML layout and made every call site dereference ``.settings``.

Use ``renderers.RendererConfig`` (the typed discriminated union) as
the field type directly. Pool size moves up one level to
``orchestrator.renderer_pool_size``. ``use_renderer: bool`` is gone —
``renderer: RendererConfig | None`` with ``None`` for MITO replaces it
(consistent with how SFT, optimizer, and other optional sub-configs
express absence). TOML callers spell MITO with ``renderer = "None"``,
which ``prime_pydantic_config.BaseConfig._none_str_to_none`` coerces
back to Python None.

Net effect on TOMLs is a wash with the previous commit: 26 in-repo
configs lose the ``.settings`` indirection they gained yesterday, and
one (multimodal) splits ``pool_size`` out of the renderer block.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread tests/unit/train/test_runs.py
hallerite and others added 6 commits May 25, 2026 23:21
The slim ``prime-rl-configs`` wheel (installed standalone, outside the
workspace) now resolves ``renderers`` from PyPI instead of the git tag
URL it used while the dev release was waiting on PyPI trusted-publisher
config. The full ``prime-rl`` workspace still pulls renderers from the
``deps/renderers`` submodule, so this only affects downstream slim
consumers.

Re-locks to refresh the lockfile against the same workspace state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The slim ``prime-rl-configs`` install path (configs only, no GPU
deps) asserts that importing the config classes doesn't pull in
``transformers``. ``from renderers import RendererConfig`` at the top
of ``orchestrator.py`` / ``sft.py`` broke that: ``renderers/__init__``
eagerly imports every concrete renderer module (DefaultRenderer,
Qwen3Renderer, …), each of which imports
``transformers.tokenization_utils.PreTrainedTokenizer``.

Defer the renderers import via:
  - ``TYPE_CHECKING``-guarded ``from renderers import RendererConfig``
    so static type checkers still see the typed discriminated union.
  - ``renderer: Any`` field annotation at runtime, so no resolution
    happens at class definition.
  - ``model_validator(mode="before") _resolve_renderer`` that
    lazy-imports ``renderers``, fills the orchestrator default
    (``AutoRendererConfig()``), and coerces dict input through the
    typed union via ``TypeAdapter``. SFT's default stays ``None``.

End-user semantics unchanged; class import is now transformers-free.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
dev28 lazy-loads concrete renderer classes via PEP 562
``__getattr__`` (renderers#64), so ``from renderers import
RendererConfig`` no longer drags ``transformers`` into ``sys.modules``.
The slim-install CI gate is satisfied without the prior workaround.

Drop the ``Any`` annotation and the ``_resolve_renderer``
model-validator that lazy-resolved the discriminated union from
``OrchestratorConfig`` and ``SFTConfig``. The fields are typed
``RendererConfig | None`` again, and pydantic does the full
discriminator validation at class definition.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- deps/renderers c86c50b -> 2ec28a8 — picks up the PEP 562
  ``__getattr__`` so the workspace install path also lazy-loads
  concrete renderer classes.
- deps/verifiers 04651be -> d107442 — adds the typed
  ``ClientConfig.renderer_config`` field that ``setup_clients``
  forwards. Fixes the unit-test
  ``test_setup_clients_assigns_renderer_and_dp_rank_headers``
  AttributeError on ``renderer_config``.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread src/prime_rl/utils/client.py Outdated
Comment thread src/prime_rl/orchestrator/orchestrator.py
Two cleanups on the typed renderer surface:

1. Per Samsja: drop the ``RendererConfig as RendererSettings`` alias
   in ``client.py`` / ``elastic.py``. The alias was a workaround for the
   former local ``RendererConfig`` wrapper class; with that gone, we
   just import ``RendererConfig`` from renderers directly.

2. Per Cursor bugbot: ``OrchestratorConfig.renderer = None`` (MITO) was
   silently dropped by ``model_dump(exclude_none=True)``, so any saved
   ``control/orch.toml`` would reload as the default
   ``AutoRendererConfig()`` (TITO). Add a wrapping ``model_serializer``
   that re-injects ``renderer = "None"`` (string) when the field is
   ``None``, matching what ``BaseConfig._none_str_to_none`` accepts on
   reload. MITO and TITO now round-trip through TOML.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit 2072afe. Configure here.


use_renderer: bool = True
"""Use the renderer-backed TITO client (client-side tokenization via the ``renderers`` package, served by ``/v1/generate``). When True, the ``[orchestrator.renderer]`` block (name / tool_parser / reasoning_parser / pool_size) applies. Default for both text-only and VLM rollouts; VLMs require it. False falls back to MITO (``openai_chat_completions``)."""

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing breaking config changelog

Medium Severity

This PR removes orchestrator.use_renderer, moves renderer pool sizing to orchestrator.renderer_pool_size, and replaces the flat [orchestrator.renderer] schema with the typed renderers.RendererConfig union (MITO via renderer = "None"). Those are breaking config changes, but CHANGELOG.md was not updated with a migration entry.

Fix in Cursor Fix in Web

Triggered by project rule: BugBot Instructions

Reviewed by Cursor Bugbot for commit 2072afe. Configure here.

Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated
Comment thread packages/prime-rl-configs/src/prime_rl/configs/orchestrator.py Outdated
renderer: RendererConfig = RendererConfig()
"""Client-side renderer configuration. Only consumed when ``use_renderer=true``."""
renderer: RendererConfig | None = Field(default_factory=AutoRendererConfig)
"""Typed renderer config (``renderers.RendererConfig`` discriminated
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we make this docstring a bit more concise

Comment on lines +587 to +599
@model_serializer(mode="wrap")
def _preserve_mito_renderer(self, handler: SerializerFunctionWrapHandler) -> dict[str, Any]:
"""Emit ``renderer = "None"`` (string) when MITO so
``model_dump(exclude_none=True)`` round-trips: dumped TOML has
``renderer = "None"``, and on reload
``BaseConfig._none_str_to_none`` coerces it back to ``None``.
Without this, a MITO orchestrator config saved to
``control/orch.toml`` would lose the renderer key entirely and
reload as the default ``AutoRendererConfig()`` (TITO)."""
result = handler(self)
if self.renderer is None:
result["renderer"] = "None"
return result
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm this is code smell from the pydantic-config side. can merge for now but can you put an issue to fix this one

…cstring

- renderer field uses direct AutoRendererConfig() default (frozen, no
  factory needed)
- renderer_pool_size -> pool_size on OrchestratorConfig; cascade through
  client.py, elastic.py, orchestrator.py call sites and the test;
  vf.ClientConfig kwarg name (verifiers-side) stays renderer_pool_size
- shorter docstring on the renderer field

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eligotts eligotts requested a review from mikasenghaas May 26, 2026 17:56
mikasenghaas
mikasenghaas previously approved these changes May 26, 2026
Copy link
Copy Markdown
Member

@mikasenghaas mikasenghaas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as discussed, approving this to unblock quickly. short-term cleanup would be to expose resolve_renderer(model_name: str) -> RendererConfig from renderers so we can resolve 1. the renderer 2. the chat template kwargs in the prime-rl configs

eligotts and others added 2 commits May 26, 2026 18:37
Picks up verifiers main with PR #1463 (surface every parsed tool_call,
drop status filter) and PR #1468 (pipe chat_template_kwargs into typed
RendererConfig) now that both are merged. verifiers 0.1.15.dev11.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@eligotts eligotts merged commit 6d17559 into main May 26, 2026
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants