Why doesn't gpt-oss support RL/GRPO training? #3236
I saw in the FAQ that it's currently not possible to perform RL or GRPO training on gpt-oss. Could anyone explain why? Isn't it possible to use transformers for inference and then compute the GRPO loss yourself?
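Roughly what I have in mind, as a minimal sketch: generate a group of completions with plain transformers, score them, and compute a simplified GRPO-style objective by hand. The model (`gpt2` as a small stand-in), rewards, and hyperparameters below are purely illustrative, and clipping, the KL penalty, and padding masks of full GRPO are omitted:

```python
# Minimal sketch: sample a group of completions, normalize rewards
# within the group, and backprop a policy-gradient-style GRPO loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for gpt-oss
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# Sample a group of G completions for the same prompt.
G = 4
with torch.no_grad():
    out = model.generate(
        **inputs, do_sample=True, max_new_tokens=8,
        num_return_sequences=G, pad_token_id=tok.eos_token_id,
    )

# Placeholder rewards; a real setup would use a verifier or reward model.
rewards = torch.randn(G)
# Group-relative advantages: normalize rewards within the group.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Re-run the sampled sequences WITH gradients to get token log-probs.
logits = model(out).logits[:, :-1, :]
logp = torch.log_softmax(logits, dim=-1)
token_logp = logp.gather(-1, out[:, 1:].unsqueeze(-1)).squeeze(-1)

# Score only the generated tokens, not the prompt.
gen_logp = token_logp[:, prompt_len - 1:]

# Policy-gradient-style objective weighted by group advantages.
loss = -(adv.unsqueeze(1) * gen_logp).mean()
loss.backward()
```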
Replies: 2 comments 1 reply
Same here. I've been trying to work around it, but I can't get past the runtime error: `RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.` After some research, I think the error comes from how gpt-oss runs inference; maybe it's because of MXFP4, since using that format on models as big as 20B or 120B is already unusual.
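To show what's going on, here's a toy reproduction of that error and of the `clone()` workaround the message itself suggests (plain tensors, not gpt-oss): tensors created under `torch.inference_mode()` can't be saved for backward, but a clone of them can.

```python
import torch

with torch.inference_mode():
    x = torch.randn(3)  # an "inference tensor"

w = torch.randn(3, requires_grad=True)

# (x * w).sum()  # raises: Inference tensors cannot be saved for backward
y = (x.clone() * w).sum()  # clone() yields a normal tensor usable in autograd
y.backward()
print(w.grad)
```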
@alex-ht @azizDentero What a coincidence, we just added support, with a new notebook too! Summary: we're introducing gpt-oss RL support with the fastest RL inference and the lowest VRAM use of any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning
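For anyone who wants a feel for the setup before opening the notebook, here's a rough sketch assuming Unsloth's `FastLanguageModel` together with TRL's `GRPOTrainer`. The model id, LoRA settings, toy reward, and tiny dataset are all illustrative; the notebook and blog have the exact, supported configuration.

```python
# Hypothetical sketch: Unsloth model loading + TRL GRPO training.
from unsloth import FastLanguageModel  # import unsloth before trl
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint id
    max_seq_length=1024,
    load_in_4bit=True,  # 4-bit loading to keep VRAM low
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions. Swap in a real verifier.
    return [-float(len(c)) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Solve: 12 * 7 =", "Name a prime number greater than 50:"]}
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[reward_len],
    args=GRPOConfig(max_steps=30, num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```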
