Description
Dear ROLL team,
Thanks for your great work! I am trying to implement a setup close to Search-R1, with a custom format-based reward function and my custom ToolUse.
Per https://alibaba.github.io/ROLL/docs/English/UserGuide/agentic/Tool_Use, I have registered my tool.
Per https://alibaba.github.io/ROLL/docs/%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87/%E6%89%A9%E5%B1%95%E5%BC%80%E5%8F%91%E6%89%8B%E5%86%8C/custom_reward_cn, I have registered my reward worker as well.
However, I have had trouble finding an example YAML that covers my requirements. gem_math_hotpotqa_search does not seem to contain any reward function configs, and it is unclear to me how traj_envs_gem_math is used. Is the right approach to register everything in config/tra_envs_blabla and then reference those entries in custom_envs in gem_math_hotpotqa_search?