Why doesn't gpt-oss support RL/GRPO training? #3236
I saw in the FAQ that it's currently not possible to perform RL or GRPO training on gpt-oss. Could anyone explain why? Isn't it possible to use transformers for inference and then compute the GRPO loss yourself?
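Roughly what I have in mind, as a minimal sketch: generate a group of completions with plain transformers, score them, and compute a simplified GRPO-style objective by hand. The model (`gpt2` as a small stand-in), rewards, and hyperparameters below are purely illustrative, and clipping, the KL penalty, and padding masks of full GRPO are omitted:

```python
# Minimal sketch: sample a group of completions, normalize rewards
# within the group, and backprop a policy-gradient-style GRPO loss.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # stand-in for gpt-oss
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tok("The capital of France is", return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]

# Sample a group of G completions for the same prompt.
G = 4
with torch.no_grad():
    out = model.generate(
        **inputs, do_sample=True, max_new_tokens=8,
        num_return_sequences=G, pad_token_id=tok.eos_token_id,
    )

# Placeholder rewards; a real setup would use a verifier or reward model.
rewards = torch.randn(G)
# Group-relative advantages: normalize rewards within the group.
adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)

# Re-run the sampled sequences WITH gradients to get token log-probs.
logits = model(out).logits[:, :-1, :]
logp = torch.log_softmax(logits, dim=-1)
token_logp = logp.gather(-1, out[:, 1:].unsqueeze(-1)).squeeze(-1)

# Score only the generated tokens, not the prompt.
gen_logp = token_logp[:, prompt_len - 1:]

# Policy-gradient-style objective weighted by group advantages.
loss = -(adv.unsqueeze(1) * gen_logp).mean()
loss.backward()
```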
Replies: 2 comments 1 reply
Same here. I've been trying to work around it, but I can't get past the runtime error: `RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.` After some research, I think the error comes from how gpt-oss runs inference; maybe it's because of MXFP4, since using that format on models as big as 20B or 120B is already unusual.
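To show what's going on, here's a toy reproduction of that error and of the `clone()` workaround the message itself suggests (plain tensors, not gpt-oss): tensors created under `torch.inference_mode()` can't be saved for backward, but a clone of them can.

```python
import torch

with torch.inference_mode():
    x = torch.randn(3)  # an "inference tensor"

w = torch.randn(3, requires_grad=True)

# (x * w).sum()  # raises: Inference tensors cannot be saved for backward
y = (x.clone() * w).sum()  # clone() yields a normal tensor usable in autograd
y.backward()
print(w.grad)
```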
@alex-ht @azizDentero What a coincidence, we just added support, with a new notebook too! Summary: we're introducing gpt-oss RL support with the fastest RL inference and the lowest VRAM use of any implementation. Blog: https://docs.unsloth.ai/new/gpt-oss-reinforcement-learning
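For anyone who wants a feel for the setup before opening the notebook, here's a rough sketch assuming Unsloth's `FastLanguageModel` together with TRL's `GRPOTrainer`. The model id, LoRA settings, toy reward, and tiny dataset are all illustrative; the notebook and blog have the exact, supported configuration.

```python
# Hypothetical sketch: Unsloth model loading + TRL GRPO training.
from unsloth import FastLanguageModel  # import unsloth before trl
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/gpt-oss-20b",  # assumed checkpoint id
    max_seq_length=1024,
    load_in_4bit=True,  # 4-bit loading to keep VRAM low
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

def reward_len(completions, **kwargs):
    # Toy reward: prefer shorter completions. Swap in a real verifier.
    return [-float(len(c)) for c in completions]

train_dataset = Dataset.from_dict(
    {"prompt": ["Solve: 12 * 7 =", "Name a prime number greater than 50:"]}
)

trainer = GRPOTrainer(
    model=model,
    processing_class=tokenizer,
    reward_funcs=[reward_len],
    args=GRPOConfig(max_steps=30, num_generations=4),
    train_dataset=train_dataset,
)
trainer.train()
```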
