Qwen3VLRenderer misclassifies text parts as images when content list has been through HF Dataset round-trip #40

## Summary

`_is_image_part` in `renderers/qwen3_vl.py:68` uses a permissive fallback (`return "image" in item or "image_url" in item`) that misfires when the content list has been through `datasets.Dataset.from_list` — Arrow schema unification adds an `image_url: None` key to every text part. The fallback returns True, the renderer calls `emit_image` on the text part, and `_load_pil_image` raises:

    TypeError: Unsupported image source 'NoneType'; expected PIL Image, bytes,
    path, http(s):// URL, file:// URL, or data: URI.

Every rollout fails. Reproduced while dogfooding prime-rl#2473 with the
`tic-tac-toe` env (`prime env pull prime/tic-tac-toe`), whose dataset
constructor embeds an image in the initial prompt.

## Minimal repro

```python
from datasets import Dataset
row = {"prompt": [{"role": "user", "content": [
    {"type": "text", "text": "hello"},
    {"type": "image_url", "image_url": {"url": "data:image/png;base64,XXX"}},
]}]}
hf = Dataset.from_list([row])
print(hf[0]["prompt"][0]["content"][0])
# => {'image_url': None, 'text': 'hello', 'type': 'text'}
#      ^^^^^^^^^^^^^^^^^ Arrow unified the schema across the list
```

Now feed this to the renderer:

```python
from transformers import AutoTokenizer
from renderers import create_renderer
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-VL-4B-Instruct", trust_remote_code=True)
r = create_renderer(tok, renderer="auto")
r.render(list(hf[0]["prompt"]), add_generation_prompt=True)
# TypeError: Unsupported image source 'NoneType'; ...
```

## Why the fallback was added

The docstring on `_is_image_part`:

> Permissive fallback: chat templates check ``'image' in content`` to accept loosely-shaped image parts, so mirror that.

Fair intent, but a key-membership check can't distinguish "key absent" from "key present with a `None` value", and HF Arrow schema unification rewrites every dict in the list as the union of all keys, filling the missing ones with `None`. So a text part such as `{"type": "text", "text": "x", "image_url": None}` looks like an image part to the fallback.
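To make the failure mode concrete, here is a small standalone sketch. `permissive_is_image_part` is a name chosen for this illustration, not the library's; it only wraps the fallback expression quoted in the summary and runs it against the round-tripped text part from the repro.

```python
from typing import Any


def permissive_is_image_part(item: Any) -> bool:
    # Stand-in for the fallback expression quoted in the summary;
    # this is not the library's full function.
    return "image" in item or "image_url" in item


# The text part before the HF Dataset round trip.
original_text = {"type": "text", "text": "hello"}
# The same part as printed in the repro after the round trip.
round_tripped_text = {"image_url": None, "text": "hello", "type": "text"}

print(permissive_is_image_part(original_text))       # False
print(permissive_is_image_part(round_tripped_text))  # True: key present, value None
print(round_tripped_text.get("image_url") is None)   # True
```

Membership (`in`) only asks whether the key exists, so the `None` filler is enough to flip the result.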

## Suggested fix

The `type` field is authoritative when present. Make the fallback only fire when `type` is absent, and require the candidate key to have a truthy value:

```python
from typing import Any


def _is_image_part(item: Any) -> bool:
    if not isinstance(item, dict):
        return False
    t = item.get("type")
    if t in ("image", "image_url"):
        return True
    if t is not None:
        return False  # type-tagged but not an image type
    # Untyped fallback: require a truthy image/image_url value
    return bool(item.get("image")) or bool(item.get("image_url"))
```

Same fix applies to `_is_video_part` (`renderers/qwen3_vl.py:78`).

It is also worth tightening `_load_pil_image` to raise a clearer error when it is called on a non-image part. The current `TypeError: ... 'NoneType' ...` is technically accurate, but the real problem is that the caller mis-dispatched.
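One possible shape for such a guard, as a rough sketch only. `image_source_or_raise` is a hypothetical helper invented for this example; how the renderer actually extracts the payload before calling `_load_pil_image` is not shown in this issue, so the lookup order below is an assumption.

```python
from typing import Any


def image_source_or_raise(part: dict[str, Any]) -> Any:
    """Hypothetical guard: return the image payload of a content part, or
    raise an error that names the mis-dispatched part."""
    source = part.get("image")
    if source is None:
        source = part.get("image_url")
    if source is None:
        raise ValueError(
            f"Content part with type={part.get('type')!r} and keys "
            f"{sorted(part)} was dispatched as an image but carries no "
            "image payload (image/image_url is missing or None)."
        )
    return source


try:
    image_source_or_raise({"type": "text", "text": "hello", "image_url": None})
except ValueError as exc:
    print(exc)
```

The message points at the offending part rather than at the `None` value, which is what a reader debugging a failed rollout needs first.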

## Why it didn't bite color_codeword

`color_codeword` constructs its dataset with **empty initial prompts** and adds images later via `env_response`/`setup_state`. So the dataset schema for `prompt` has no image content parts to unify against, and text parts come through clean. Any env that puts image content into the initial prompt (e.g. `tic-tac-toe`'s `visual_prompt = [{"role": "user", "content": [text + image]}]`) hits this.

## Versions

- `renderers==0.1.8.dev2`
- `datasets==4.x`
- discovered against `Qwen3VLRenderer`; same `_is_image_part`/`_is_video_part` pattern exists in `Qwen35Renderer` and likely others — please audit.

## Workaround in tree

For now, envs can construct their initial prompts so that the dataset only carries images via columns that never go through schema unification (HF Arrow doesn't unify scalar `str` columns), or strip the `None` keys client-side before rendering. The cleanest fix is still at the renderer.
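A minimal sketch of the client-side stripping option. `strip_none_keys` is a hypothetical helper, not part of `renderers` or `datasets`; it assumes the OpenAI-style message shape used in the repro and only drops top-level `None` values inside each content part.

```python
from typing import Any


def strip_none_keys(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return a copy of `messages` in which None-valued keys are dropped
    from every dict inside a list-valued `content`."""
    cleaned = []
    for message in messages:
        message = dict(message)  # shallow copy; the input is left untouched
        content = message.get("content")
        if isinstance(content, list):
            message["content"] = [
                {k: v for k, v in part.items() if v is not None}
                if isinstance(part, dict)
                else part
                for part in content
            ]
        cleaned.append(message)
    return cleaned


# Shape of the repro prompt after the HF Dataset round trip.
round_tripped = [{"role": "user", "content": [
    {"type": "text", "text": "hello", "image_url": None},
    {"type": "image_url", "text": None,
     "image_url": {"url": "data:image/png;base64,XXX"}},
]}]

print(strip_none_keys(round_tripped)[0]["content"][0])
# {'type': 'text', 'text': 'hello'}
```

With the names from the repro, the call would then presumably become `r.render(strip_none_keys(list(hf[0]["prompt"])), add_generation_prompt=True)`.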
