Inquiries about custom reward and tool use #213

@altria-zewei-wang

Dear ROLL team,

Thanks for your great work! I am trying to implement a setup close to Search-R1, with a custom format-based reward function and my own custom ToolUse.
Per https://alibaba.github.io/ROLL/docs/English/UserGuide/agentic/Tool_Use, I have registered my tool.
Per https://alibaba.github.io/ROLL/docs/%E7%AE%80%E4%BD%93%E4%B8%AD%E6%96%87/%E6%89%A9%E5%B1%95%E5%BC%80%E5%8F%91%E6%89%8B%E5%86%8C/custom_reward_cn, I have registered my reward worker as well.
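
For illustration, a format-based reward in the Search-R1 style could look roughly like the sketch below. It is framework-independent and does not use any ROLL API: the function name `format_reward`, the tag names, and the 0/1 scoring are all assumptions chosen for the example, not the configuration ROLL expects.

```python
import re

# Hypothetical tag layout, loosely modeled on Search-R1-style rollouts.
# These names are assumptions for illustration, not ROLL identifiers.
TAGS = ("think", "search", "answer")
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def format_reward(response: str) -> float:
    """Return 1.0 if the response follows the expected tag format, else 0.0."""
    # Every opening tag needs a matching closing tag.
    for tag in TAGS:
        if response.count(f"<{tag}>") != response.count(f"</{tag}>"):
            return 0.0

    # Require exactly one non-empty answer block.
    answers = ANSWER_RE.findall(response)
    if len(answers) != 1 or not answers[0].strip():
        return 0.0

    # The answer block must be the last thing in the response.
    if not response.strip().endswith("</answer>"):
        return 0.0

    return 1.0
```

A function like this only scores the structure of a rollout; a correctness term (for example, exact match on the extracted answer) would have to be added separately.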

However, I have had trouble finding an example YAML that covers my requirements. gem_math_hotpotqa_search does not seem to include any reward function configs, and it is unclear to me how traj_envs_gem_math is used. Is the right approach to register everything in config/tra_envs_blabla and then reference those entries under custom_envs in gem_math_hotpotqa_search?
