Summary
_is_image_part in renderers/qwen3_vl.py:68 uses a permissive fallback (return "image" in item or "image_url" in item) that misfires when the content list has been through datasets.Dataset.from_list — Arrow schema unification adds an image_url: None key to every text part. The fallback returns True, the renderer calls emit_image on the text part, and _load_pil_image raises:
TypeError: Unsupported image source 'NoneType'; expected PIL Image, bytes,
path, http(s):// URL, file:// URL, or data: URI.
Every rollout fails. Reproduced while dogfooding prime-rl#2473 with the
tic-tac-toe env (prime env pull prime/tic-tac-toe), whose dataset
constructor embeds an image in the initial prompt.
Minimal repro
from datasets import Dataset
row = {"prompt": [{"role": "user", "content": [
    {"type": "text", "text": "hello"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,XXX"}},
]}]}
hf = Dataset.from_list([row])
print(hf[0]["prompt"][0]["content"][0])
# => {'image_url': None, 'text': 'hello', 'type': 'text'}
#     ^^^^^^^^^^^^^^^^^ Arrow unified the schema across the list
Now feed this to the renderer:
from transformers import AutoTokenizer
from renderers import create_renderer
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-4B-Instruct", trust_remote_code=True)
r = create_renderer(tok, renderer="auto")
r.render(list(hf[0]["prompt"]), add_generation_prompt=True)
# TypeError: Unsupported image source 'NoneType'; ...
Why the fallback was added
The docstring on _is_image_part:
Permissive fallback: chat templates check 'image' in content to accept loosely-shaped image parts, so mirror that.
Fair intent — but the fallback can't distinguish "key absent vs. key present with None value", and HF Arrow schema unification puts every dict in the list through a "union of keys, fill missing with None" pass. So an envelope of {"type": "text", "text": "x", "image_url": None} looks like an image part to the fallback.
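The difference is easy to see on a plain dict, with no renderer involved. This is a minimal sketch: old_fallback and value_check are hypothetical names used only for illustration, not functions in renderers.

```python
# A text part as it comes back after Arrow schema unification.
text_part = {"type": "text", "text": "x", "image_url": None}


def old_fallback(item: dict) -> bool:
    # Mirrors the current check: key membership only.
    return "image" in item or "image_url" in item


def value_check(item: dict) -> bool:
    # Looks at the value, so a None-filled key does not count.
    return bool(item.get("image")) or bool(item.get("image_url"))


print(old_fallback(text_part))  # True
print(value_check(text_part))   # False
```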
Suggested fix
The type field is authoritative when present. Make the fallback only fire when type is absent, and require the candidate key to have a truthy value:
from typing import Any

def _is_image_part(item: Any) -> bool:
    if not isinstance(item, dict):
        return False
    t = item.get("type")
    if t in ("image", "image_url"):
        return True
    if t is not None:
        return False  # type-tagged but not an image type
    # Untyped fallback: require a truthy image/image_url value
    return bool(item.get("image")) or bool(item.get("image_url"))
Same fix applies to _is_video_part (renderers/qwen3_vl.py:78).
It is also worth tightening _load_pil_image to raise a clearer error when it is called on a non-image part: the current TypeError: ... 'NoneType' ... is technically accurate, but it hides the fact that the caller mis-dispatched.
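One possible shape for that check, as a sketch only. require_image_source is a hypothetical helper, not an existing function in renderers, and the error wording is illustrative. The idea is to fail at the dispatch site and name the offending part, rather than failing later inside the loader.

```python
from typing import Any


def require_image_source(part: dict[str, Any]) -> Any:
    """Hypothetical guard: return the image payload, or fail naming the part."""
    source = part.get("image")
    if source is None:
        source = part.get("image_url")
    if source is None:
        raise ValueError(
            f"Content part with type={part.get('type')!r} was dispatched as an "
            f"image but carries no image payload (keys: {sorted(part)})"
        )
    return source
```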
Why it didn't bite color_codeword
color_codeword constructs its dataset with empty initial prompts and adds images later via env_response/setup_state. So the dataset schema for prompt has no image content parts to unify against, and text parts come through clean. Any env that puts image content into the initial prompt (e.g. tic-tac-toe's visual_prompt = [{"role": "user", "content": [text + image]}]) hits this.
Versions
- renderers==0.1.8.dev2
- datasets==4.x
- Discovered against Qwen3VLRenderer; the same _is_image_part/_is_video_part pattern exists in Qwen35Renderer and likely others. Please audit.
Workaround in tree
For now, envs can construct their initial prompts so that the dataset only carries images via columns that never go through schema unification (HF Arrow doesn't unify scalar str columns), or strip the None keys client-side before rendering. Cleanest fix is at the renderer.
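A minimal sketch of the client-side stripping option. strip_none_keys is a hypothetical helper name; it assumes messages are dicts whose content is either a string or a list of part dicts, as in the repro above.

```python
from typing import Any


def strip_none_keys(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with None-valued keys dropped from content parts."""
    cleaned = []
    for message in messages:
        new_message = dict(message)  # shallow copy; the input is not mutated
        content = message.get("content")
        if isinstance(content, list):
            new_message["content"] = [
                {k: v for k, v in part.items() if v is not None}
                if isinstance(part, dict)
                else part
                for part in content
            ]
        cleaned.append(new_message)
    return cleaned
```

With the repro above, this would be applied as r.render(strip_none_keys(list(hf[0]["prompt"])), add_generation_prompt=True). It only removes top-level None values in each part, which is what schema unification adds here.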