feat: autobrowse skill — self-improving browser automation by shubh24 · Pull Request #71 · browserbase/skills

shubh24 · 2026-04-07T01:12:37Z

Summary

- Adds the autobrowse skill to the skills library
- Auto-research loop for building reliable browser navigation skills: inner agent browses → outer agent reads trace → improves strategy.md → repeat
- Supports single task (interactive) and parallel multi-task mode via Claude Code sub-agents
- Inspired by Karpathy's autoresearch pattern applied to browser automation
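
The loop in the summary can be sketched roughly as below. This is a minimal illustration and not the skill's actual implementation: `runInnerAgent`, `improveStrategy`, the `RunResult` shape, and both default thresholds are hypothetical names and values chosen for the example. A real harness would be asynchronous because it calls an API; the sketch is synchronous for brevity.

```typescript
// Hypothetical result of one inner-agent run (shape assumed for illustration).
interface RunResult {
  passed: boolean;
  trace: string;
}

interface LoopOutcome {
  iterations: number;
  converged: boolean; // true if the required number of consecutive passes was reached
  strategy: string;
}

// Sketch of the outer loop: run the inner agent, read its trace,
// revise the strategy on failure, and stop after enough consecutive passes.
function autobrowseLoop(
  runInnerAgent: (strategy: string) => RunResult, // assumed callback
  improveStrategy: (strategy: string, trace: string) => string, // assumed callback
  initialStrategy: string,
  requiredPasses = 3, // assumed threshold
  maxIterations = 20, // assumed cap
): LoopOutcome {
  let strategy = initialStrategy;
  let consecutivePasses = 0;
  for (let i = 1; i <= maxIterations; i++) {
    const result = runInnerAgent(strategy);
    if (result.passed) {
      consecutivePasses++;
      if (consecutivePasses >= requiredPasses) {
        return { iterations: i, converged: true, strategy };
      }
    } else {
      // A failure resets the streak and triggers a strategy revision.
      consecutivePasses = 0;
      strategy = improveStrategy(strategy, result.trace);
    }
  }
  return { iterations: maxIterations, converged: false, strategy };
}
```

Passing the two agents in as callbacks keeps the control flow testable without a browser or an API key.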

What's included

| File | Purpose |
| --- | --- |
| `SKILL.md` | `/autobrowse` skill: single entry point for everything |
| `scripts/evaluate.ts` | Inner agent harness (Anthropic API + browse CLI) |
| `references/example-task.md` | Template for writing `task.md` |
| `references/example-skill.md` | Template for a graduated `skill.md` |
| `README.md` | Setup + project structure guide |
| `EXAMPLES.md` | Usage examples |
| `REFERENCE.md` | CLI flags, env vars, trace artifacts |

How it works

/autobrowse --task my-portal          # single task loop
/autobrowse --all --env remote        # parallel via sub-agents

The customer creates tasks/<name>/task.md in their project, runs the skill, and gets back a skill.md that they can drop into any stagehand or browser-use agent.

🤖 Generated with Claude Code

Note

Medium Risk
Adds a new Node-based harness that executes browse CLI commands and writes traces to disk; while it restricts execution to browse and avoids shell expansion, it still introduces new code that runs external processes and handles API credentials.

Overview
Introduces a new skills/autobrowse package that adds the /autobrowse Claude Code skill for iterating on website-specific strategy.md files and optionally running multiple tasks in parallel via sub-agents.

Adds an inner-agent runner (scripts/evaluate.mjs) that calls the Anthropic API, executes only browse CLI commands (no shell), and records per-run artifacts (summary.md, trace.json, messages.json, screenshots) under traces/<task>/ with a latest symlink, plus supporting docs/templates (README.md, REFERENCE.md, EXAMPLES.md, references/) and Node deps/env examples.
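
The "browse only, no shell" restriction could look something like the sketch below. It is an assumption-laden illustration, not the code in `scripts/evaluate.mjs`: the function name `parseBrowseCommand` is hypothetical, the rejected character set is copied from the commit note about shell metacharacters, and splitting on whitespace is a simplification that ignores quoted arguments.

```typescript
// Characters rejected outright; the set mirrors the commit note (; | & ` $ ( ) < >).
const SHELL_METACHARACTERS = /[;|&`$()<>]/;

// Returns the arguments for a `browse` invocation, or throws if the
// command is not a plain `browse ...` call.
// Simplification: quoted arguments are not handled in this sketch.
function parseBrowseCommand(command: string): string[] {
  const trimmed = command.trim();
  if (SHELL_METACHARACTERS.test(trimmed)) {
    throw new Error("shell metacharacters are not allowed");
  }
  const argv = trimmed.split(/\s+/);
  if (argv[0] !== "browse") {
    throw new Error("only browse commands are allowed");
  }
  return argv.slice(1);
}
```

The returned array would then be handed to a process API that takes an argument vector and does not spawn a shell, such as Node's `child_process.execFile`, so the validated string is never interpreted by a shell.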

Reviewed by Cursor Bugbot for commit 70d9ad0. Bugbot is set up for automated code reviews on this repo.

Auto-research loop for building reliable browser navigation skills. Inner agent browses the site, outer agent reads the trace and improves strategy.md. Repeat until it passes consistently.

- SKILL.md: /autobrowse skill (single + parallel task modes via sub-agents)
- scripts/evaluate.ts: inner agent harness (Anthropic API + browse CLI)
- references/: example-task.md and example-skill.md templates
- README, EXAMPLES, REFERENCE docs

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

17 iterations, 3/3 consecutive passes, remote env. Shows real site-specific gotchas: named Browserbase sessions, ESRI map location, Verint form patterns, XPath vs ref tradeoffs.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- structured skill.md graduation template (not just copying strategy.md)
- graduate at max iterations too, not just on pass rate
- persistent session reports in reports/ with per-iteration cost table
- richer sub-agent prompt with key learnings output
- cost column in multi-task report table

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- remove --session flags (skills must be portable, not session-coupled)
- add Known Failure Point section (turn budget exhaustion pattern)
- correct wait syntax docs (browse wait load / browse wait timeout N)
- 22 iterations, updated gotchas order

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- [high] block shell metacharacters (;|&|`$()<>) to prevent injection bypass
- [medium] fix npm evaluate script path: evaluate.ts → scripts/evaluate.ts
- [low] fix tsconfig include: *.ts → scripts/**/*.ts
- [low] lastAssistantText += (accumulate, not overwrite)
- [low] remove unused deps: @browserbasehq/stagehand, zod

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

Autofix Details

Bugbot Autofix prepared fixes for both issues found in the latest run.

✅ Fixed: Opus 4.6 pricing rates are 3x too high
- Updated the pricing table to use Claude Opus 4.6 at $5 input / $25 output and Haiku 4.5 at $1 input / $5 output per million tokens.
✅ Fixed: Broken symlink silently prevents latest link updates
- Reworked latest-link rotation to always attempt unlink with ENOENT-only suppression before creating the symlink and warn on failures.

Or push these changes by commenting:

@cursor push b8ac79b21f

Preview (b8ac79b21f)

diff --git a/skills/autobrowse/scripts/evaluate.ts b/skills/autobrowse/scripts/evaluate.ts
--- a/skills/autobrowse/scripts/evaluate.ts
+++ b/skills/autobrowse/scripts/evaluate.ts
@@ -483,9 +483,9 @@
   const durationSec = (Date.now() - startTime) / 1000;
   // Pricing per million tokens (input/output)
   const pricing: Record<string, [number, number]> = {
-    "claude-opus-4-6": [15, 75],
+    "claude-opus-4-6": [5, 25],
     "claude-sonnet-4-6": [3, 15],
-    "claude-haiku-4-5-20251001": [0.80, 4],
+    "claude-haiku-4-5-20251001": [1, 5],
   };
   const [inputRate, outputRate] = pricing[model] ?? [3, 15];
   const costUsd = (totalInputTokens * inputRate + totalOutputTokens * outputRate) / 1_000_000;
@@ -534,7 +534,16 @@
 
   // Update latest symlink
   const latestLink = path.join(tracesDir, "latest");
-  try { if (fs.existsSync(latestLink)) fs.unlinkSync(latestLink); fs.symlinkSync(runId, latestLink); } catch {}
+  try {
+    try {
+      fs.unlinkSync(latestLink);
+    } catch (err: unknown) {
+      if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
+    }
+    fs.symlinkSync(runId, latestLink);
+  } catch (err: unknown) {
+    console.warn(`Warning: failed to update latest symlink: ${(err as Error).message}`);
+  }
 
   console.log(`\n${summary}`);
   console.log(`\n${"=".repeat(60)}`);
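
To make the pricing change concrete, here is a small worked example of the cost formula from the diff. The rates and the fallback are taken from the patched table; the function name `estimateCostUsd` and the token counts are hypothetical and chosen only for the arithmetic.

```typescript
// Rates in USD per million tokens as [input, output], from the patched table.
const pricing: Record<string, [number, number]> = {
  "claude-opus-4-6": [5, 25],
  "claude-sonnet-4-6": [3, 15],
  "claude-haiku-4-5-20251001": [1, 5],
};

function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  // Unknown models fall back to [3, 15], the same values as the Sonnet row.
  const [inputRate, outputRate] = pricing[model] ?? [3, 15];
  return (inputTokens * inputRate + outputTokens * outputRate) / 1_000_000;
}

// Hypothetical run: 200,000 input tokens and 10,000 output tokens on Opus 4.6.
// (200000 * 5 + 10000 * 25) / 1_000_000 = 1.25
const exampleCost = estimateCostUsd("claude-opus-4-6", 200_000, 10_000);
console.log(exampleCost.toFixed(2)); // prints "1.25"
```

With the previous Opus rates of [15, 75], the same run would have been reported as (200000 * 15 + 10000 * 75) / 1,000,000 = 3.75 USD, three times the corrected figure.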


cursor · 2026-04-10T00:27:48Z

skills/autobrowse/scripts/evaluate.mjs

+      }
+    }
+
+    lastAssistantText = assistantText;


Agent output lost when last turn lacks text

Medium Severity

lastAssistantText is unconditionally overwritten with assistantText on every turn, even when assistantText is empty. If the agent exhausts MAX_TURNS and the final API response contains only tool_use blocks (no text blocks), lastAssistantText becomes "", wiping out any meaningful text the agent produced on prior turns. The summary then omits the "Agent Final Output" section entirely, which the outer agent depends on to understand the inner agent's result.
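
One possible way to address this, shown as a sketch rather than the fix the PR adopted, is to overwrite `lastAssistantText` only when the current turn actually produced text. The `Block` type is a simplified stand-in for the API's content blocks, and the function name is hypothetical.

```typescript
// Minimal stand-ins for response content blocks (shapes assumed for illustration).
type Block = { type: "text"; text: string } | { type: "tool_use"; name: string };

// Returns the text of the most recent turn that contained any text,
// so a final tool-only turn does not erase earlier output.
function lastNonEmptyAssistantText(turns: Block[][]): string {
  let lastAssistantText = "";
  for (const content of turns) {
    let assistantText = "";
    for (const block of content) {
      if (block.type === "text") assistantText += block.text;
    }
    // Only overwrite when this turn actually produced text.
    if (assistantText.trim() !== "") lastAssistantText = assistantText;
  }
  return lastAssistantText;
}
```

In this version a run that ends on a turn containing only `tool_use` blocks still carries the last real text forward into the summary.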

Additional Locations (1)

skills/autobrowse/scripts/evaluate.ts#L524-L527

Reviewed by Cursor Bugbot for commit ef21429.

cursor · 2026-04-10T00:27:48Z

skills/autobrowse/scripts/evaluate.mjs

+    for (const block of response.content) {
+      if (block.type === "text") {
+        reasoningText += block.text;
+        assistantText += block.text;


Redundant identical variables obscure intent

Low Severity

reasoningText and assistantText are populated by the exact same text blocks in the same loop iteration and always hold identical values. The distinct names imply a semantic difference that doesn't exist, which could mislead future contributors into updating one but not the other.

Reviewed by Cursor Bugbot for commit ef21429.

Captures full pipeline from storytelling to render:

- IDEA ENGINE narrative structure
- Remotion gotchas (extrapolateLeft, staticFile, durationInFrames)
- Browserbase brand colors and typography
- UI mocking patterns with real reference video analysis
- Audio production (background, SFX, ffmpeg generation)
- Content authenticity (real failures vs invented ones)
- Render quality settings (CRF 8 for Twitter masters)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

This reverts commit 07e7794.

cursor · 2026-04-10T01:49:12Z

skills/autobrowse/scripts/evaluate.mjs

+        console.log(`  [${turn}] ❌ error: ${output.slice(0, 100)}`);
+      } else {
+        console.log(`  [${turn}] ✓ ${output.slice(0, 100)} (${duration_ms}ms)`);
+      }


Snapshot/screenshot errors masked in summary and logs

Medium Severity

The if-else chain checks isSnapshot/isScreenshot before checking error, so a failed browse snapshot is reported as "📸 snapshot: 0 refs" instead of "❌ error: ...". The same pattern is repeated in the summary.md generation. Since the outer agent reads the summary to diagnose failures and improve strategy.md, a dead session error like "No page available" being misreported as "0 refs" leads it to the wrong hypothesis, wasting improvement iterations.
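
The ordering problem can be illustrated with a small formatter that tests the error flag before anything else. This is a sketch under stated assumptions: the `StepRecord` fields, the `formatStep` name, the `browse screenshot` command prefix, and counting refs by matching `ref=` are all hypothetical; only the log prefixes follow the snippets quoted in the review.

```typescript
// Hypothetical shape of one recorded tool call; field names are assumptions.
interface StepRecord {
  turn: number;
  command: string;
  output: string;
  error: boolean;
  durationMs: number;
}

function formatStep(step: StepRecord): string {
  // Check the error flag first so a failed snapshot or screenshot
  // is never reported as a successful capture.
  if (step.error) {
    return `[${step.turn}] ❌ error: ${step.output.slice(0, 100)}`;
  }
  if (step.command.startsWith("browse snapshot")) {
    // Counting "ref=" occurrences is an assumed stand-in for the real ref count.
    const refs = (step.output.match(/ref=/g) ?? []).length;
    return `[${step.turn}] 📸 snapshot: ${refs} refs`;
  }
  if (step.command.startsWith("browse screenshot")) {
    return `[${step.turn}] 📸 screenshot saved`;
  }
  return `[${step.turn}] ✓ ${step.output.slice(0, 100)} (${step.durationMs}ms)`;
}
```

Using one formatter for both the console log and the summary.md lines would also keep the two outputs from drifting apart, since the review notes that the same pattern is repeated in both places.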

Additional Locations (1)

skills/autobrowse/scripts/evaluate.ts#L510-L521

Reviewed by Cursor Bugbot for commit 0734f7d.

….io conventions

- Converted scripts/evaluate.ts → scripts/evaluate.mjs (plain ESM JavaScript)
- Removed tsx, typescript, @types/node devDependencies
- Removed tsconfig.json (no longer needed)
- Added --help flag with full usage documentation
- Moved diagnostics to stderr, structured JSON result to stdout
- Added license field to SKILL.md frontmatter
- Updated all references from tsx/evaluate.ts to node/evaluate.mjs

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 4 total unresolved issues (including 3 from previous reviews).

Reviewed by Cursor Bugbot for commit a0d48ee.

cursor · 2026-04-10T16:45:39Z

skills/autobrowse/scripts/evaluate.mjs

+  const result = {
+    task: taskName,
+    run: runId,
+    status: turn < MAX_TURNS ? "completed" : "max_turns",


Status misreported when agent completes on final turn

Medium Severity

The status field is computed as turn < MAX_TURNS ? "completed" : "max_turns", but turn equals MAX_TURNS both when the agent successfully completes on the final turn (via break) and when it exhausts all turns without completing. If the agent finishes with stop_reason === "end_turn" on turn 30, the status is incorrectly reported as "max_turns" instead of "completed". The outer agent uses this status to decide pass/fail, so a successful run can be misclassified as an incomplete one.
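
A common way to avoid this ambiguity is to record completion explicitly instead of inferring it from the turn counter. The sketch below is illustrative only: `runTurns`, `TurnResponse`, and the `nextResponse` callback are hypothetical stand-ins for the harness loop and the API call.

```typescript
type RunStatus = "completed" | "max_turns";

// Hypothetical stand-in for one API response; only stop_reason matters here.
interface TurnResponse {
  stop_reason: "end_turn" | "tool_use";
}

function runTurns(
  nextResponse: (turn: number) => TurnResponse, // assumed callback wrapping the API call
  maxTurns: number,
): { turn: number; status: RunStatus } {
  let turn = 0;
  let completed = false; // set only when the agent stops on its own
  while (turn < maxTurns) {
    turn++;
    const response = nextResponse(turn);
    if (response.stop_reason === "end_turn") {
      completed = true;
      break;
    }
  }
  // Derive status from the flag, not from comparing turn with maxTurns.
  return { turn, status: completed ? "completed" : "max_turns" };
}
```

With `maxTurns` set to 30, an agent that returns `end_turn` on turn 30 is reported as "completed" here, whereas the `turn < MAX_TURNS` expression would label it "max_turns".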

Additional Locations (1)

skills/autobrowse/scripts/evaluate.mjs#L387-L393

Reviewed by Cursor Bugbot for commit a0d48ee.

Graduated skills now install as Claude Code slash commands at ~/.claude/skills/<task-name>/SKILL.md instead of committing a local skill.md file. Removed all git commit references from the loop; the working directory (tasks/, traces/, strategy.md) doesn't need to be a git repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

shubh24 and others added 3 commits April 6, 2026 18:12

cursor bot reviewed Apr 7, 2026

cursor bot reviewed Apr 7, 2026

shubh24 requested a review from shrey150 April 7, 2026 01:27

cursor bot reviewed Apr 7, 2026

fix autobrowse bugbot findings

982eecd

cursor bot reviewed Apr 9, 2026

fix autobrowse pricing and symlink handling

ef21429

cursor bot reviewed Apr 10, 2026

shubh24 and others added 2 commits April 9, 2026 18:37

Revert "feat: add demo-video skill — Remotion launch video learnings"

0734f7d

This reverts commit 07e7794.

cursor bot reviewed Apr 10, 2026

shubh24 wants to merge 11 commits into main from feat/autobrowse-skill
