Support RLM skills for V1 tools #1444
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 38612491af
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
task_id = str(task.get("task_id") or task.get("task_name") or "task")
key = str(state.get("trajectory_id") or id(state)).replace("/", "_")
target = cache_root / task_id / key
if target.exists():
    shutil.rmtree(target)
Sanitize task identifiers before deleting staged skill dirs
stage_rlm_tool_skills uses task_id directly in target = cache_root / task_id / key and then unconditionally runs shutil.rmtree(target) if the path exists. If a task row provides an absolute ID (e.g. /tmp/x) or traversal segments (e.g. ../x), target can resolve outside cache_root, so rollout setup may delete arbitrary host directories instead of only the RLM skill cache. Because task_id comes from task payloads, this path is data-driven, and the ID should be normalized or rejected before the path is constructed.
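One possible shape for the fix, as a minimal sketch rather than the PR's actual change: reduce each untrusted identifier to a single path component, then confirm the resolved target is still under the cache root before anything is deleted. The helper names `safe_segment` and `staged_target` are hypothetical and do not come from the PR.

```python
import re
from pathlib import Path

# Anything outside this conservative set is replaced with "_".
_UNSAFE = re.compile(r"[^A-Za-z0-9._-]+")


def safe_segment(value: object, default: str = "task") -> str:
    """Collapse an untrusted identifier into one path component (hypothetical helper)."""
    # Replacing "/" removes separators; stripping "." and "_" from the ends
    # removes leftovers such as ".." so the result cannot climb directories.
    cleaned = _UNSAFE.sub("_", str(value)).strip("._")
    return cleaned or default


def staged_target(cache_root: Path, task_id: object, key: object) -> Path:
    """Build the staging path and refuse anything that escapes cache_root."""
    root = cache_root.resolve()
    target = (root / safe_segment(task_id) / safe_segment(key, "state")).resolve()
    # Defense in depth: Path.is_relative_to requires Python 3.9+.
    if not target.is_relative_to(root):
        raise ValueError(f"staging path escapes cache root: {target}")
    return target
```

With a check like this in place, the existing `shutil.rmtree(target)` call would only ever see paths inside the cache root, whatever the task row contains.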
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 3861249.
Approvability
Verdict: Needs human review
This PR introduces new RLM skill generation functionality with significant code additions. Two unresolved P1 review comments identify a potential path traversal vulnerability in skill staging and silent dropping of non-identifier tool parameters, both requiring human attention.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e69611ec7d
    not isinstance(name, str)
    or not name.isidentifier()
    or keyword.iskeyword(name)
):
    continue
Preserve non-identifier tool parameter names
Do not drop schema properties whose names are not valid Python identifiers here. params_from_schema currently skips (via continue) keys such as "repo-name" and Python keyword names, so generated RLM skills omit those arguments entirely and post incomplete payloads to /vf/tools/{name} whenever such a property is present. This breaks endpoint-backed skills for MCP/custom tools that legitimately use such JSON property names, especially when those fields are required.
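One way to keep such properties, sketched here under assumptions: give each JSON property a safe Python argument name and keep a mapping back to the original key, so the payload still uses the schema's names. The functions `python_arg_name`, `arg_names_for_schema`, and `build_payload` are hypothetical and are not taken from the PR.

```python
import keyword
import re


def python_arg_name(json_name: str, taken: set) -> str:
    """Derive a unique, valid Python identifier for a JSON property name."""
    # Replace every character that is not an ASCII letter, digit, or "_".
    candidate = re.sub(r"[^0-9A-Za-z_]", "_", json_name)
    if not candidate or candidate[0].isdigit():
        candidate = "_" + candidate
    if keyword.iskeyword(candidate):
        candidate += "_"
    # Avoid collisions such as "repo-name" vs "repo_name".
    while candidate in taken:
        candidate += "_"
    taken.add(candidate)
    return candidate


def arg_names_for_schema(schema: dict) -> dict:
    """Map each Python argument name to its original JSON property name."""
    taken = set()
    mapping = {}
    for json_name in schema.get("properties", {}):
        if isinstance(json_name, str):
            mapping[python_arg_name(json_name, taken)] = json_name
    return mapping


def build_payload(mapping: dict, **kwargs) -> dict:
    """Translate Python keyword arguments back to the schema's JSON keys."""
    return {mapping[arg]: value for arg, value in kwargs.items()}
```

A generated skill could then expose `repo_name` as its Python parameter while still sending `"repo-name"` in the request body.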
Pushed follow-up commit
Local |

Summary
Verification
Note
Medium Risk
Introduces dynamic, per-rollout skill directory generation and changes RLM to set /rlm/skills via a callable loader, which may affect sandbox uploads and runtime behavior. Adds new code that generates and executes Python modules that call the interception endpoint, so mistakes could break tool calling or leak/incorrectly use endpoint credentials.
Overview
RLM now passes VF_ENDPOINT_ROOT_URL into the command environment and, when skills are not explicitly configured, populates /rlm/skills via a per-task/state loader rather than a static directory.
Adds rlm_skills.py to stage a cache-backed skills directory that merges any taskset-provided skills with auto-generated Python skill packages for each resolved V1 tool; generated skills POST to .../vf/tools/{TOOL_NAME} with a bearer token and fixed OpenAI/Python User-Agent.
Updates docs to describe endpoint-backed skill staging, and expands test_v1_rlm_swe.py to cover staging behavior, cache path sanitization, generated-skill endpoint calls, and required-parameter ordering.
Reviewed by Cursor Bugbot for commit e69611e. Bugbot is set up for automated code reviews on this repo.
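For orientation, the request a generated skill makes might look roughly like the sketch below. The PR's generated module is not shown in this conversation, so this is only an illustration: the function name `build_tool_request`, the JSON body shape (the raw arguments dict), and the way the token is supplied are all assumptions. Only the URL pattern, the bearer authorization, and the User-Agent value come from the summary above.

```python
import json
import urllib.request


def build_tool_request(
    endpoint_root: str, tool_name: str, arguments: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) a POST to <root>/vf/tools/<tool_name>.

    endpoint_root is assumed to be the value of VF_ENDPOINT_ROOT_URL;
    the body format and token source are assumptions for this sketch.
    """
    url = f"{endpoint_root.rstrip('/')}/vf/tools/{tool_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(arguments).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "User-Agent": "OpenAI/Python",
        },
        method="POST",
    )
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; keeping construction separate makes the URL and headers easy to assert on in tests without a live endpoint.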
Note
Add generated endpoint-backed RLM skills for resolved V1 tools
/vf/tools/{TOOL_NAME} using VF_ENDPOINT_ROOT_URL, authorize via Bearer token, and set User-Agent: OpenAI/Python.
VF_ENDPOINT_ROOT_URL to the program environment and wires up a skill_loader callable for the /rlm/skills directory instead of a static path.
Macroscope summarized e69611e.