Summary
renderers.client._build_mm_features (renderers/client.py:202) raises NotImplementedError for Qwen35Renderer, even though Qwen35Renderer is a MultimodalRenderer and ships the same pixel_values + image_grid_thw payload shape as Qwen3VLRenderer.
Reproduced while dogfooding the prime-rl multimodal renderer path (prime-rl PR #2473) with Qwen/Qwen3.5-2B on the color-codeword env: every rollout fails immediately with:
NotImplementedError: Multimodal serialization not implemented for Qwen35Renderer.
Proposed fix: add a Qwen35Renderer dispatch branch to renderers.client._build_mm_features.
Repro
from transformers import AutoTokenizer
from renderers import create_renderer, is_multimodal
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3.5-2B", trust_remote_code=True)
r = create_renderer(tok, renderer="auto")
print(type(r).__name__) # Qwen35Renderer
print(is_multimodal(r)) # True
# ... any /inference/v1/generate call that goes through renderers.client.generate
# with multi_modal_data populated raises NotImplementedError.
Suggested fix
Current code (renderers/client.py:236):
from renderers.qwen3_vl import Qwen3VLRenderer
if issubclass(renderer_cls, Qwen3VLRenderer):
    return _build_qwen_vl_features(mm_data, spatial_merge_size=2)
raise NotImplementedError(...)
renderers/qwen35.py:325–328 packs pixel_values and image_grid_thw into mm_items["image"] — identical to Qwen3VLRenderer. Reading merge_size from proc.image_processor.merge_size (renderers/qwen35.py:187) confirms the Qwen2-VL family field factory should apply. The fix looks like:
from renderers.qwen35 import Qwen35Renderer
if issubclass(renderer_cls, (Qwen3VLRenderer, Qwen35Renderer)):
    return _build_qwen_vl_features(mm_data, spatial_merge_size=2)
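A regression test for the widened dispatch could look roughly like the sketch below. It is self-contained, so every class and function here is a stand-in rather than the real renderers code: the stub Qwen3VLRenderer and Qwen35Renderer only mirror the names, TextOnlyRenderer is hypothetical, and build_mm_features imitates the branch shown above, not the actual body of _build_mm_features.

```python
# Stand-ins only: the real classes live in renderers.qwen3_vl and renderers.qwen35.
class Qwen3VLRenderer:
    pass


class Qwen35Renderer:
    pass


class TextOnlyRenderer:
    """Hypothetical renderer with no multimodal serialization."""


def _build_qwen_vl_features(mm_data, spatial_merge_size):
    # Placeholder for the real helper; it echoes its inputs so the
    # dispatch decision is observable in a test.
    return {"mm_data": mm_data, "spatial_merge_size": spatial_merge_size}


def build_mm_features(renderer_cls, mm_data):
    # issubclass accepts a tuple, so both Qwen renderers (and any
    # subclasses of either) share the same serialization path.
    if issubclass(renderer_cls, (Qwen3VLRenderer, Qwen35Renderer)):
        return _build_qwen_vl_features(mm_data, spatial_merge_size=2)
    raise NotImplementedError(
        f"Multimodal serialization not implemented for {renderer_cls.__name__}."
    )
```

With the tuple form, unsupported renderers still fail with the same message as before, so the change only widens the accepted set.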
One caveat to verify: spatial_merge_size is hardcoded to 2 at the _build_qwen_vl_features call site. Qwen3.5's image processor exposes merge_size per model; if any Qwen3.5 model size ships merge_size != 2, the dispatch should read the value from mm_items metadata rather than hardcoding it.
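If the value does need to be read dynamically, one possible shape is sketched below. resolve_spatial_merge_size is a hypothetical helper, not part of renderers. It assumes only that the processor object may expose image_processor.merge_size, as referenced at renderers/qwen35.py:187, and falls back to the current value of 2 otherwise. How the resolved value would be threaded through mm_items is left open.

```python
from types import SimpleNamespace


def resolve_spatial_merge_size(processor, default=2):
    """Hypothetical helper: prefer the processor's merge_size, else `default`."""
    image_processor = getattr(processor, "image_processor", None)
    merge_size = getattr(image_processor, "merge_size", None)
    # Fall back when the attribute chain is missing or explicitly None.
    return default if merge_size is None else int(merge_size)


# Usage with stand-in processor objects (not real transformers processors).
proc_with_size = SimpleNamespace(image_processor=SimpleNamespace(merge_size=4))
proc_without_size = SimpleNamespace(image_processor=SimpleNamespace())

size_a = resolve_spatial_merge_size(proc_with_size)
size_b = resolve_spatial_merge_size(proc_without_size)
```

Keeping the default at 2 would preserve today's behavior for Qwen3VLRenderer while letting a model with a different merge_size override it.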
Versions
- renderers==0.1.8.dev2
- verifiers==0.1.15.dev7 (verifiers main, post #1395)
- prime-rl PR #2473 branch feat/multimodal-renderer-pr
- vLLM 0.20.2
Workaround used during dogfood
Pivoted the test run to Qwen/Qwen3-VL-4B-Instruct (PR #2473's tested model), which routes through the Qwen3VLRenderer dispatch and works.