Skip to content

Conversation

davanstrien
Copy link
Member

This PR adds a wandb_log_extra_columns parameter to GRPOConfig that allows logging additional dataset columns to W&B tables during GRPO training. I found this helpful for including ground truth labels or other metadata alongside generated completions for easier evaluation.

Implementation

  • Added wandb_log_extra_columns to GRPOConfig:
  • None (default): No extra columns logged
  • []: Log all available dataset columns
  • ["col1", "col2"]: Log specific columns only
  • Modified GRPOTrainer to collect and log specified columns
  • Added support for both standard generation and vLLM backends

Example Usage

  config = GRPOConfig(
      wandb_log_extra_columns=["ground_truth", "difficulty"],
      # ... other config
  )

This is quite a rough POC developed for a specific use case. Opening as a draft to discuss whether this functionality would be useful for TRL more broadly.

Example of logging an extra column for grount_truth

Screenshot 2025-09-09 at 12 09 51

I have done some minimal testing, but mostly made this for myself to start, so would probably benefit from some better tests if it seemed interesting to add to TRL.

davanstrien and others added 3 commits September 9, 2025 10:08
- Add wandb_log_extra_columns parameter to GRPOConfig
- Support None (default), [] (all columns), or specific column list
- Collect and log extra columns during generation
- Include extra columns in W&B table output
- Backward compatible - no changes when parameter is None
- Handles missing columns gracefully
- Gather extra columns data in vLLM server mode
- Handle extra columns in vLLM colocate mode with tensor parallel
- Properly slice data back to process-specific portions
- Ensures extra columns work with all generation backends
@davanstrien
Copy link
Member Author

@qgallouedec I've been using this for my own projects successfully now. Let me know if this seems useful for TRL more generally, and I can make a cleaner PR. No worries if not, can also keep this on my own fork otherwise :)

@qgallouedec
Copy link
Member

thanks @davanstrien!

Ideally we would have something that is compatible with other loggers as well, like trackio. I'll look into it when I have the bandwidth

@davanstrien
Copy link
Member Author

No rush from my side! Also happy to take a look at a more generic approach, but wanted to check the idea in general was of interest for TRL first :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants