Hi team,
Great work on TraceRL and the TraDo models!
In Appendix B.5, you mentioned using 1.7k random SFT samples from CodeContest (generated by Qwen2.5-32B-Instruct) for the cold start.
Are there any plans to open-source this specific 1.7k-sample SFT dataset? I'm very interested in this data, particularly to see exactly how the eos token was formatted and appended to stabilize the subsequent RL training.
It would be super helpful for reproducing the cold start and RL pipeline.
Thanks!