
fix(data): serialize extra_info as JSON string in verl postprocessing #603

Open
bingshao333 wants to merge 1 commit into rllm-org:main from bingshao333:fix/extra-info-json-serialization


@bingshao333


Fixes #364.

This PR serializes extra_info with json.dumps(..., default=str) during Verl postprocessing.

Previously, extra_info was stored as a Python dict when generating the _verl.parquet companion dataset. Explicitly serializing it keeps the data format consistent with Verl's expected JSON-string handling in verl.utils.dataset.rl_dataset.py.

When saving datasets to parquet via apply_verl_postprocessing, the extra_info field was stored as a Python dict. However, parquet serializes nested dicts as JSON strings. When verl's rl_dataset.py reads the parquet file, it needs to call json.loads() on the extra_info field to access its contents.

This fix explicitly serializes extra_info using json.dumps() so that the data format is consistent and verl can correctly parse it with json.loads().
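As a minimal sketch of the round trip described above, using only the standard library: the contents of `entry` and the `row` dict are hypothetical stand-ins, not the actual structures in rllm/data/dataset.py or in verl's reader.

```python
import json

# Hypothetical row contents; the real `entry` comes from the dataset being postprocessed.
entry = {"index": 0, "split": "train", "question": "What is 2 + 2?"}

# Writer side: store extra_info as a JSON string, as this PR does.
row = {"extra_info": json.dumps(entry, default=str)}

# Reader side: decode the string before using dict methods such as .get().
extra_info = json.loads(row["extra_info"])
index = extra_info.get("index")
```

With JSON-native values like these, the decoded dict compares equal to the original `entry`.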

Fixes rllm-org#364
Copilot AI review requested due to automatic review settings May 31, 2026 05:39

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Changes the extra_info field in verl postprocessing output from a raw dict to a JSON-serialized string.

Changes:

  • Serializes entry via json.dumps(..., default=str) instead of passing the dict directly.
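One consequence of `default=str` that may be worth keeping in mind: values that are not JSON-native are converted with `str()` rather than raising, so they come back as plain strings after `json.loads()`. The sketch below illustrates this with made-up values; whether real `entry` dicts contain such types is an assumption.

```python
import json
from datetime import date
from pathlib import PurePosixPath

# Hypothetical entry holding values that json cannot encode natively.
entry = {
    "created": date(2024, 1, 2),
    "source": PurePosixPath("data/train.jsonl"),
    "index": 3,
}

# Without default=str, json.dumps raises TypeError on the date object.
try:
    json.dumps(entry)
    strict_ok = True
except TypeError:
    strict_ok = False

# With default=str, unsupported values are stringified instead of failing.
encoded = json.dumps(entry, default=str)
decoded = json.loads(encoded)
```

Here `decoded["created"]` is the string "2024-01-02", not a `date`, so a consumer that needs the original type has to parse it again.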


Comment thread rllm/data/dataset.py
"ground_truth": None,
},
- "extra_info": entry,
+ "extra_info": json.dumps(entry, default=str),


Development

Successfully merging this pull request may close these issues.

'str' object has no attribute 'get'

2 participants