Qwen3VLRenderer misclassifies text parts as images when content list has been through HF Dataset round-trip #40

@eligotts

Description

Summary

_is_image_part in renderers/qwen3_vl.py:68 uses a permissive fallback (return "image" in item or "image_url" in item) that misfires when the content list has been through datasets.Dataset.from_list — Arrow schema unification adds an image_url: None key to every text part. The fallback returns True, the renderer calls emit_image on the text part, and _load_pil_image raises:

TypeError: Unsupported image source 'NoneType'; expected PIL Image, bytes,
path, http(s):// URL, file:// URL, or data: URI.

Every rollout fails. Reproduced while dogfooding prime-rl#2473 with the
tic-tac-toe env (prime env pull prime/tic-tac-toe), whose dataset
constructor embeds an image in the initial prompt.

Minimal repro

from datasets import Dataset
row = {"prompt": [{"role": "user", "content": [
    {"type": "text", "text": "hello"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,XXX"}},
]}]}
hf = Dataset.from_list([row])
print(hf[0]["prompt"][0]["content"][0])
# => {'image_url': None, 'text': 'hello', 'type': 'text'}
#                ^^^^^^^^^^^^^^^^^^^^^^ Arrow unified the schema across the list

Now feed this to the renderer:

from transformers import AutoTokenizer
from renderers import create_renderer
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-4B-Instruct", trust_remote_code=True)
r = create_renderer(tok, renderer="auto")
r.render(list(hf[0]["prompt"]), add_generation_prompt=True)
# TypeError: Unsupported image source 'NoneType'; ...

Why the fallback was added

The docstring on _is_image_part:

Permissive fallback: chat templates check 'image' in content to accept loosely-shaped image parts, so mirror that.

Fair intent, but the fallback can't distinguish a key that is absent from a key that is present with a None value, and HF Arrow schema unification puts every dict in the list through a "union of keys, fill missing with None" pass. So a text part of the form {"type": "text", "text": "x", "image_url": None} looks like an image part to the fallback.
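The distinction can be checked without the renderer or datasets installed. This is a minimal sketch, assuming the fallback is exactly the one-line expression quoted in the Summary; `permissive_is_image` is a hypothetical stand-in, not the function in renderers/qwen3_vl.py.

```python
def permissive_is_image(item: dict) -> bool:
    # Stand-in for the quoted fallback: it only tests key membership,
    # so a key whose value is None still counts.
    return "image" in item or "image_url" in item


# A text part as the env constructs it.
clean_text_part = {"type": "text", "text": "x"}
# The same part after schema unification filled the missing key with None.
round_tripped_text_part = {"type": "text", "text": "x", "image_url": None}

clean_result = permissive_is_image(clean_text_part)
round_tripped_result = permissive_is_image(round_tripped_text_part)
print(clean_result, round_tripped_result)
```

Only the second part carries the image_url key, so only the second one is treated as an image.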

Suggested fix

The type field is authoritative when present. Make the fallback only fire when type is absent, and require the candidate key to have a truthy value:

from typing import Any

def _is_image_part(item: Any) -> bool:
    if not isinstance(item, dict):
        return False
    t = item.get("type")
    if t in ("image", "image_url"):
        return True
    if t is not None:
        return False  # type-tagged but not an image type
    # Untyped fallback: require a truthy image/image_url value
    return bool(item.get("image")) or bool(item.get("image_url"))

Same fix applies to _is_video_part (renderers/qwen3_vl.py:78).

Also worth tightening _load_pil_image to raise a clearer error when it is called on a non-image part. The current TypeError: ... 'NoneType' ... is technically accurate, but it hides the real problem: the caller mis-dispatched a text part.

Why it didn't bite color_codeword

color_codeword constructs its dataset with empty initial prompts and adds images later via env_response/setup_state. So the dataset schema for prompt has no image content parts to unify against, and text parts come through clean. Any env that puts image content into the initial prompt (e.g. tic-tac-toe's visual_prompt = [{"role": "user", "content": [text + image]}]) hits this.

Versions

  • renderers==0.1.8.dev2
  • datasets==4.x
  • discovered against Qwen3VLRenderer; same _is_image_part/_is_video_part pattern exists in Qwen35Renderer and likely others — please audit.

Workaround in tree

For now, envs can construct their initial prompts so that the dataset only carries images via columns that never go through schema unification (HF Arrow doesn't unify scalar str columns), or strip the None keys client-side before rendering. Cleanest fix is at the renderer.
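The client-side option could look roughly like this. It is a sketch, not part of renderers or datasets: `strip_none_keys` is a hypothetical helper name, and it assumes messages are plain dicts whose content is either a string or a list of part dicts.

```python
from typing import Any


def strip_none_keys(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of messages with None-valued keys dropped from each content part."""
    cleaned = []
    for message in messages:
        message = dict(message)  # shallow copy so the input is not mutated
        content = message.get("content")
        if isinstance(content, list):
            message["content"] = [
                # Drop only top-level keys whose value is None; the nested
                # image_url dict of a real image part is kept as-is.
                {k: v for k, v in part.items() if v is not None}
                if isinstance(part, dict)
                else part
                for part in content
            ]
        cleaned.append(message)
    return cleaned
```

In the repro above, this would be applied just before rendering, e.g. r.render(strip_none_keys(list(hf[0]["prompt"])), add_generation_prompt=True).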
