Support RLM skills for V1 tools by xeophon · Pull Request #1444 · PrimeIntellect-ai/verifiers

xeophon · 2026-05-22T22:01:17Z

Summary

stage taskset skills plus generated endpoint-backed RLM skills from resolved V1 tools
pass the rollout endpoint root URL into the RLM command environment
document generic RLM tool skill staging

Verification

uv run ruff check --fix verifiers/v1/packages/harnesses/rlm.py verifiers/v1/packages/harnesses/rlm_skills.py tests/test_v1_rlm_swe.py
uv run ruff format verifiers/v1/packages/harnesses/rlm.py verifiers/v1/packages/harnesses/rlm_skills.py tests/test_v1_rlm_swe.py
uv run ty check verifiers/v1/packages/harnesses/rlm_skills.py tests/test_v1_rlm_swe.py
uv run pytest tests/test_v1_rlm_swe.py -q
uv run pre-commit run --all-files

Note

Medium Risk
Introduces dynamic, per-rollout skill directory generation and changes RLM to set /rlm/skills via a callable loader, which may affect sandbox uploads and runtime behavior. Adds new code that generates and executes Python modules that call the interception endpoint, so mistakes could break tool calling or leak/incorrectly use endpoint credentials.

Overview
RLM now passes VF_ENDPOINT_ROOT_URL into the command environment and, when skills are not explicitly configured, populates /rlm/skills via a per-task/state loader rather than a static directory.

Adds rlm_skills.py to stage a cache-backed skills directory that merges any taskset-provided skills with auto-generated Python skill packages for each resolved V1 tool; generated skills POST to .../vf/tools/{TOOL_NAME} with a bearer token and fixed OpenAI/Python User-Agent.

Updates docs to describe endpoint-backed skill staging, and expands test_v1_rlm_swe.py to cover staging behavior, cache path sanitization, generated-skill endpoint calls, and required-parameter ordering.

^{Reviewed by Cursor Bugbot for commit e69611e. Bugbot is set up for automated code reviews on this repo. Configure here.}

Note

Add generated endpoint-backed RLM skills for resolved V1 tools

The RLM harness now generates installable skill packages for each resolved V1 tool when using taskset-discovered skills, staging them alongside any provided source skills in a per-task cache directory.
Generated skills in rlm_skills.py POST to /vf/tools/{TOOL_NAME} using VF_ENDPOINT_ROOT_URL, authorize via Bearer token, and set User-Agent: OpenAI/Python.
The RLM harness in rlm.py adds VF_ENDPOINT_ROOT_URL to the program environment and wires up a skill_loader callable for the /rlm/skills directory instead of a static path.
Required tool parameters are ordered before optional ones in the generated async function signature; tool names are sanitized to valid Python identifiers with a SHA1 suffix if modified.
Risk: each task setup now performs file I/O to stage skills and removes any pre-existing target directory, which may affect task setup latency.

^{Macroscope summarized e69611e.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 38612491af

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T22:05:19Z

+    task_id = str(task.get("task_id") or task.get("task_name") or "task")
+    key = str(state.get("trajectory_id") or id(state)).replace("/", "_")
+    target = cache_root / task_id / key
+    if target.exists():
+        shutil.rmtree(target)


Sanitize task identifiers before deleting staged skill dirs

stage_rlm_tool_skills uses task_id directly in target = cache_root / task_id / key and then unconditionally runs shutil.rmtree(target) if it exists. If a task row provides an absolute ID (e.g. /tmp/x) or traversal segments (e.g. ../x), target can resolve outside cache_root, so rollout setup may delete arbitrary host directories rather than only the RLM skill cache. This path is data-driven (task_id comes from task payloads), so it should be normalized/rejected before path construction.

Useful? React with 👍 / 👎.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 3861249. Configure here.}

macroscopeapp · 2026-05-22T22:09:49Z

Approvability

Verdict: Needs human review

This PR introduces new RLM skill generation functionality with significant code additions. Two unresolved P1 review comments identify a potential path traversal vulnerability in skill staging and silent dropping of non-identifier tool parameters, both requiring human attention.

^{You can customize Macroscope's approvability policy. Learn more.}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e69611ec7d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-05-22T22:21:26Z

+            not isinstance(name, str)
+            or not name.isidentifier()
+            or keyword.iskeyword(name)
+        ):
+            continue


Preserve non-identifier tool parameter names

Do not drop schema properties whose names are not valid Python identifiers here. params_from_schema currently continues for keys like "repo-name" or keyword names, so generated RLM skills omit those arguments entirely and always post incomplete payloads to /vf/tools/{name}. This breaks endpoint-backed skills for MCP/custom tools that legitimately use such JSON property names, especially when those fields are required.

Useful? React with 👍 / 👎.

xeophon · 2026-05-22T22:33:00Z

Pushed follow-up commit e69611e for the review comments:

sanitized task/cache path segments before staging/removing RLM skill directories
ordered generated skill parameters with required params before optional params
deferred the default cache root Path.home() lookup until staging time
added regression coverage for path sanitization and mixed required/optional schema ordering

Local ruff, ty, focused pytest, and pre-commit all passed; PR checks are now pass/skipped.

Support RLM skills for v1 tools

3861249

xeophon mentioned this pull request May 22, 2026

Refactor general_agent for verifiers V1 PrimeIntellect-ai/research-environments#395

Open

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

macroscopeapp Bot reviewed May 22, 2026

View reviewed changes

Comment thread verifiers/v1/packages/harnesses/rlm_skills.py

Comment thread verifiers/v1/packages/harnesses/rlm_skills.py

Comment thread verifiers/v1/packages/harnesses/rlm_skills.py Outdated

cursor Bot reviewed May 22, 2026

View reviewed changes

Comment thread verifiers/v1/packages/harnesses/rlm_skills.py

Fix RLM skill staging edge cases

e69611e

chatgpt-codex-connector Bot reviewed May 22, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support RLM skills for V1 tools#1444

Support RLM skills for V1 tools#1444
xeophon wants to merge 2 commits into
mainfrom
fix/v1-rlm-tool-skills

xeophon commented May 22, 2026 •

edited by macroscopeapp Bot

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

macroscopeapp Bot commented May 22, 2026 •

edited

Loading

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Uh oh!

xeophon commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

xeophon commented May 22, 2026 • edited by macroscopeapp Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Add generated endpoint-backed RLM skills for resolved V1 tools

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

macroscopeapp Bot commented May 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Approvability

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 22, 2026

Choose a reason for hiding this comment

Uh oh!

xeophon commented May 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

xeophon commented May 22, 2026 •

edited by macroscopeapp Bot

Loading

macroscopeapp Bot commented May 22, 2026 •

edited

Loading