[issue] Surprising Performance Drop When Using <think> Instead of <reasoning> as Custom Tags for Fine-tuning #3039
Replies: 3 comments
-
Ok, this is a very odd issue, since the instruct versions with reasoning use `<think>`, so theoretically `<think>` should perform better than `<reasoning>`. But I do know you're using Qwen3-Base, which shouldn't have that much impact. Honestly, we aren't exactly sure what the issue is, since your results showcase the opposite of what Qwen3 uses, so the solution might just be to use the `<reasoning>` tag.
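One hedged way to probe this: Qwen3's tokenizer is believed to reserve `<think>` and `</think>` as dedicated tokens in its vocabulary, while `<reasoning>` is split into ordinary subword pieces. A minimal sketch to check this yourself, assuming the `unsloth/Qwen3-4B-Base` repo id mentioned in this thread:

```python
# Minimal sketch: compare how the Qwen3 tokenizer encodes each candidate tag.
# Assumption: the model id matches the one used in this thread; any Qwen3
# tokenizer is expected to behave the same way.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen3-4B-Base")

for tag in ["<think>", "</think>", "<reasoning>", "</reasoning>"]:
    ids = tok.encode(tag, add_special_tokens=False)
    # A single id suggests the tag is a dedicated vocabulary token;
    # several ids mean it is split into ordinary subword pieces.
    print(f"{tag!r} -> {ids} {tok.convert_ids_to_tokens(ids)}")
```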
-
Thanks for the quick response! I'm using the qwen3-4b-base model, not an instruct-tuned version, which is consistent with the official Unsloth example I'm following.
-
For now I will be moving this issue to a discussion, but if you have any more questions please feel free to ask, or if anyone wants to add to the discussion!
-
Hello Unsloth team!
Please excuse this beginner question. I'm new to the world of fine-tuning, and your library has been a fantastic and accessible starting point for me. While experimenting, I've encountered some model behavior that I don't understand and was hoping to get some clarification on what feels like a fundamental concept.
1. Did you update?
Yes, `pip install --upgrade unsloth` is up to date.
2. Colab or Kaggle or local / cloud
Local.
3. Number GPUs used
1x NVIDIA GeForce RTX 4090
4. Which notebook? Please link!
I only modified the custom tags in the official qwen3-4b GRPO example and removed some unnecessary output checks. Below is the link to the online notebook: https://colab.research.google.com/drive/1id4WqGn3yDZ4uOEmQI5HCR8UM1S64H07?usp=sharing
5. Which Unsloth version, TRL version, etc.?
Transformers: 4.53.2. vLLM: 0.9.2.
NVIDIA GeForce RTX 4090. Num GPUs = 2. Max memory: 23.514 GB. Platform: Linux.
Torch: 2.7.0+cu126. CUDA: 8.9. CUDA Toolkit: 12.6. Triton: 3.3.0
6. Which trainer?
`GRPOTrainer` (but the same issue is observable with `SFTTrainer`).

Problem Description
I am trying to fine-tune the `unsloth/Qwen3-8B-Base` model for mathematical reasoning. My goal is to teach the model to first "think" about the problem and then provide a final answer, using a specific format.

I conducted an experiment with two scenarios. The only difference between them was the custom tags I used in my data formatting.
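To make the setup concrete, here is a minimal sketch of how the tags enter the training data in this kind of setup (names like `reasoning_start` are my own, not necessarily the notebook's):

```python
# Minimal sketch (assumed names, not the notebook's exact code): the custom
# tags are injected into the system prompt so the model is asked to emit
# <tag>working</tag><answer>result</answer>.
reasoning_start, reasoning_end = "<reasoning>", "</reasoning>"  # Scenario A
# reasoning_start, reasoning_end = "<think>", "</think>"        # Scenario B
answer_start, answer_end = "<answer>", "</answer>"

system_prompt = (
    "You are given a problem. Think about the problem and show your working. "
    f"Place your working between {reasoning_start} and {reasoning_end}, "
    f"then give your final answer between {answer_start} and {answer_end}."
)
```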
Scenario A: This works perfectly.
I used `<reasoning>` and `<answer>` as my custom tags. The model learns the format very well and generates responses that follow the `assistant: <reasoning>...</reasoning><answer>...</answer>` structure.

Scenario B: This performs very poorly.
I changed the tags from `<reasoning>` to `<think>`, so the target format became `assistant: <think>...</think><answer>...</answer>`. To my surprise, the model completely fails to learn this format. The output is often incoherent, and it doesn't follow the desired structure at all.

Is there something wrong with my code? How should I fix it? Thank you for your time!
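For reference, a minimal sketch of the kind of tag-parameterized format check I mean (names are my own, and the notebook's actual reward code may differ):

```python
import re

# Sketch of a tag-parameterized format reward for TRL's GRPOTrainer.
# Assumptions: completions arrive as plain strings, and full credit is given
# when the reasoning and answer blocks appear in order.
reasoning_start, reasoning_end = "<think>", "</think>"  # swap the tags here
answer_start, answer_end = "<answer>", "</answer>"

format_pattern = re.compile(
    re.escape(reasoning_start) + r".+?" + re.escape(reasoning_end)
    + r"\s*" + re.escape(answer_start) + r".+?" + re.escape(answer_end),
    re.DOTALL,
)

def format_reward(completions, **kwargs):
    """Return 1.0 per completion that matches the target tag structure."""
    return [1.0 if format_pattern.search(c) else 0.0 for c in completions]
```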