feat: autobrowse skill — self-improving browser automation#71
Auto-research loop for building reliable browser navigation skills. Inner agent browses the site, outer agent reads the trace and improves strategy.md. Repeat until it passes consistently.
- SKILL.md: /autobrowse skill (single + parallel task modes via sub-agents)
- scripts/evaluate.ts: inner agent harness (Anthropic API + browse CLI)
- references/: example-task.md and example-skill.md templates
- README, EXAMPLES, REFERENCE docs
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

17 iterations, 3/3 consecutive passes, remote env. Shows real site-specific gotchas: named Browserbase sessions, ESRI map location, Verint form patterns, XPath vs ref tradeoffs.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- structured skill.md graduation template (not just copying strategy.md)
- graduate at max iterations too, not just on pass rate
- persistent session reports in reports/ with per-iteration cost table
- richer sub-agent prompt with key learnings output
- cost column in multi-task report table
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- remove --session flags (skills must be portable, not session-coupled)
- add Known Failure Point section (turn budget exhaustion pattern)
- correct wait syntax docs (browse wait load / browse wait timeout N)
- 22 iterations, updated gotchas order
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- [high] block shell metacharacters (;|&|`$()<>) to prevent injection bypass
- [medium] fix npm evaluate script path: evaluate.ts → scripts/evaluate.ts
- [low] fix tsconfig include: *.ts → scripts/**/*.ts
- [low] lastAssistantText += (accumulate, not overwrite)
- [low] remove unused deps: @browserbasehq/stagehand, zod
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
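The commits above describe when a task graduates: after a run of consecutive passes (3/3 in the reported session) or once the iteration cap is reached. A minimal sketch of that decision follows. The function name, the record shape, and the thresholds passed in are assumptions made for illustration; they are not taken from SKILL.md.

```typescript
// Hypothetical record of one outer-loop iteration (shape is an assumption).
type IterationResult = { iteration: number; passed: boolean };

interface GraduationDecision {
  graduate: boolean;
  reason: "consecutive_passes" | "max_iterations" | "keep_iterating";
}

// Hypothetical helper: decides whether the loop should stop and write the skill.
function decideGraduation(
  history: IterationResult[],
  requiredConsecutivePasses: number,
  maxIterations: number,
): GraduationDecision {
  // Count passes at the tail of the history; any failure ends the streak.
  let streak = 0;
  for (let i = history.length - 1; i >= 0 && history[i]?.passed; i--) {
    streak++;
  }
  if (streak >= requiredConsecutivePasses) {
    return { graduate: true, reason: "consecutive_passes" };
  }
  // Graduate at the iteration cap too, not only on pass rate.
  if (history.length >= maxIterations) {
    return { graduate: true, reason: "max_iterations" };
  }
  return { graduate: false, reason: "keep_iterating" };
}
```

Returning the reason alongside the boolean lets a report distinguish a skill that converged from one that merely ran out of iterations.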
Cursor Bugbot has reviewed your changes and found 2 potential issues.
Autofix Details
Bugbot Autofix prepared fixes for both issues found in the latest run.
- ✅ Fixed: Opus 4.6 pricing rates are 3x too high
  - Updated the pricing table to use Claude Opus 4.6 at 5/25 and Haiku 4.5 at 1/5 dollars per million tokens (input/output).
- ✅ Fixed: Broken symlink silently prevents latest link updates
  - Reworked latest-link rotation to always attempt unlink with ENOENT-only suppression before creating the symlink and warn on failures.
Or push these changes by commenting:
@cursor push b8ac79b21f
Preview (b8ac79b21f)
diff --git a/skills/autobrowse/scripts/evaluate.ts b/skills/autobrowse/scripts/evaluate.ts
--- a/skills/autobrowse/scripts/evaluate.ts
+++ b/skills/autobrowse/scripts/evaluate.ts
@@ -483,9 +483,9 @@
   const durationSec = (Date.now() - startTime) / 1000;
   // Pricing per million tokens (input/output)
   const pricing: Record<string, [number, number]> = {
-    "claude-opus-4-6": [15, 75],
+    "claude-opus-4-6": [5, 25],
     "claude-sonnet-4-6": [3, 15],
-    "claude-haiku-4-5-20251001": [0.80, 4],
+    "claude-haiku-4-5-20251001": [1, 5],
   };
   const [inputRate, outputRate] = pricing[model] ?? [3, 15];
   const costUsd = (totalInputTokens * inputRate + totalOutputTokens * outputRate) / 1_000_000;
@@ -534,7 +534,16 @@
   // Update latest symlink
   const latestLink = path.join(tracesDir, "latest");
-  try { if (fs.existsSync(latestLink)) fs.unlinkSync(latestLink); fs.symlinkSync(runId, latestLink); } catch {}
+  try {
+    try {
+      fs.unlinkSync(latestLink);
+    } catch (err: unknown) {
+      if ((err as NodeJS.ErrnoException).code !== "ENOENT") throw err;
+    }
+    fs.symlinkSync(runId, latestLink);
+  } catch (err: unknown) {
+    console.warn(`Warning: failed to update latest symlink: ${(err as Error).message}`);
+  }
   console.log(`\n${summary}`);
   console.log(`\n${"=".repeat(60)}`);
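To make the effect of the pricing fix concrete, here is a small worked sketch of the cost formula from the diff. The rates are the corrected values shown in the diff; the wrapper function `estimateCostUsd` and the token counts are hypothetical, since the script computes the cost inline.

```typescript
// USD per million tokens as [input, output], using the corrected rates from the diff.
const PRICING: Record<string, [number, number]> = {
  "claude-opus-4-6": [5, 25],
  "claude-sonnet-4-6": [3, 15],
  "claude-haiku-4-5-20251001": [1, 5],
};

// Fallback for unknown models, matching the `?? [3, 15]` default in the diff.
const DEFAULT_RATES: [number, number] = [3, 15];

// Hypothetical wrapper around the inline formula in evaluate.ts.
function estimateCostUsd(model: string, inputTokens: number, outputTokens: number): number {
  const [inputRate, outputRate] = PRICING[model] ?? DEFAULT_RATES;
  return (inputTokens * inputRate + outputTokens * outputRate) / 1_000_000;
}

// Illustrative token counts: 1M input and 200k output tokens on Opus.
console.log(estimateCostUsd("claude-opus-4-6", 1_000_000, 200_000)); // 10
```

With the previous Opus rates of 15/75, the same run would have been reported as 30 USD, which is the 3x overstatement the review flagged.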
    }
  }

  lastAssistantText = assistantText;
Agent output lost when last turn lacks text
Medium Severity
lastAssistantText is unconditionally overwritten with assistantText on every turn, even when assistantText is empty. If the agent exhausts MAX_TURNS and the final API response contains only tool_use blocks (no text blocks), lastAssistantText becomes "", wiping out any meaningful text the agent produced on prior turns. The summary then omits the "Agent Final Output" section entirely, which the outer agent depends on to understand the inner agent's result.
Reviewed by Cursor Bugbot for commit ef21429.
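One possible guard for the issue described above is to overwrite the saved text only when a turn actually produced text. The sketch below is an assumption about how that could look, not the fix that landed in the PR (the commit list mentions accumulating with `+=` instead). The `ContentBlock` type is a simplified stand-in for the API response content, and `lastNonEmptyAssistantText` is a hypothetical helper.

```typescript
// Simplified stand-in for response content blocks (shape is an assumption).
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string };

// Hypothetical helper: returns the text of the most recent turn that contained any.
function lastNonEmptyAssistantText(turns: ContentBlock[][]): string {
  let lastAssistantText = "";
  for (const blocks of turns) {
    let assistantText = "";
    for (const block of blocks) {
      if (block.type === "text") assistantText += block.text;
    }
    // A tool-only turn leaves the previously saved text untouched.
    if (assistantText.trim() !== "") lastAssistantText = assistantText;
  }
  return lastAssistantText;
}
```

With this guard, a final turn made up only of tool_use blocks no longer erases the text that the summary's "Agent Final Output" section relies on.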
for (const block of response.content) {
  if (block.type === "text") {
    reasoningText += block.text;
    assistantText += block.text;
Redundant identical variables obscure intent
Low Severity
reasoningText and assistantText are populated by the exact same text blocks in the same loop iteration and always hold identical values. The distinct names imply a semantic difference that doesn't exist, which could mislead future contributors into updating one but not the other.
Reviewed by Cursor Bugbot for commit ef21429.
Captures full pipeline from storytelling to render:
- IDEA ENGINE narrative structure
- Remotion gotchas (extrapolateLeft, staticFile, durationInFrames)
- Browserbase brand colors and typography
- UI mocking patterns with real reference video analysis
- Audio production (background, SFX, ffmpeg generation)
- Content authenticity (real failures vs invented ones)
- Render quality settings (CRF 8 for Twitter masters)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
This reverts commit 07e7794.
    console.log(` [${turn}] ❌ error: ${output.slice(0, 100)}`);
  } else {
    console.log(` [${turn}] ✓ ${output.slice(0, 100)} (${duration_ms}ms)`);
  }
Snapshot/screenshot errors masked in summary and logs
Medium Severity
The if-else chain checks isSnapshot/isScreenshot before checking error, so a failed browse snapshot is reported as "📸 snapshot: 0 refs" instead of "❌ error: ...". The same pattern is repeated in the summary.md generation. Since the outer agent reads the summary to diagnose failures and improve strategy.md, a dead session error like "No page available" being misreported as "0 refs" leads it to the wrong hypothesis, wasting improvement iterations.
Reviewed by Cursor Bugbot for commit 0734f7d.
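The ordering problem described above can be avoided by testing the error flag before looking at the command type. The sketch below illustrates that ordering only. The `CommandResult` fields and the `classifyResult` helper are hypothetical, and matching on the command prefix `browse screenshot` is an assumption about how the harness detects screenshots.

```typescript
// Hypothetical record for one executed command (field names are assumptions).
interface CommandResult {
  command: string;
  output: string;
  error: boolean;
}

type ResultKind = "error" | "snapshot" | "screenshot" | "ok";

// Hypothetical helper: the error check comes first so failures are never relabeled.
function classifyResult(result: CommandResult): ResultKind {
  if (result.error) return "error";
  if (result.command.startsWith("browse snapshot")) return "snapshot";
  if (result.command.startsWith("browse screenshot")) return "screenshot";
  return "ok";
}
```

Using one classifier for both the console log and the summary.md generation would keep the two outputs from drifting apart, which is where the review found the duplicated pattern.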
….io conventions
- Converted scripts/evaluate.ts → scripts/evaluate.mjs (plain ESM JavaScript)
- Removed tsx, typescript, @types/node devDependencies
- Removed tsconfig.json (no longer needed)
- Added --help flag with full usage documentation
- Moved diagnostics to stderr, structured JSON result to stdout
- Added license field to SKILL.md frontmatter
- Updated all references from tsx/evaluate.ts to node/evaluate.mjs
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 4 total unresolved issues (including 3 from previous reviews).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit a0d48ee.
const result = {
  task: taskName,
  run: runId,
  status: turn < MAX_TURNS ? "completed" : "max_turns",
Status misreported when agent completes on final turn
Medium Severity
The status field is computed as turn < MAX_TURNS ? "completed" : "max_turns", but turn equals MAX_TURNS both when the agent successfully completes on the final turn (via break) and when it exhausts all turns without completing. If the agent finishes with stop_reason === "end_turn" on turn 30, the status is incorrectly reported as "max_turns" instead of "completed". The outer agent uses this status to decide pass/fail, so a successful run can be misclassified as an incomplete one.
Reviewed by Cursor Bugbot for commit a0d48ee.
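The ambiguity described above goes away if completion is tracked with an explicit flag instead of being inferred from the turn counter. The sketch below simulates the loop with a scripted list of stop reasons. The `runLoop` function and its inputs are hypothetical stand-ins for the real API-driven loop in the harness.

```typescript
type RunStatus = "completed" | "max_turns";

// Hypothetical simulation: `stopReasons[i]` plays the role of the API's stop_reason on turn i + 1.
function runLoop(stopReasons: string[], maxTurns: number): { turn: number; status: RunStatus } {
  let turn = 0;
  let completed = false;
  while (turn < maxTurns) {
    const stopReason = stopReasons[turn];
    turn++;
    if (stopReason === "end_turn") {
      // Record completion explicitly; `turn` may equal maxTurns at this point.
      completed = true;
      break;
    }
  }
  return { turn, status: completed ? "completed" : "max_turns" };
}
```

For example, with a budget of three turns and an `end_turn` on the third, `turn < maxTurns` would be false and the original expression would report "max_turns", while the flag reports "completed".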
Graduated skills now install as Claude Code slash commands at ~/.claude/skills/<task-name>/SKILL.md instead of committing a local skill.md file. Removed all git commit references from the loop — the working directory (tasks/, traces/, strategy.md) doesn't need to be a git repo. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>



Summary
- Adds the autobrowse skill to the skills library
- Inner agent browses the site, outer agent reads the trace and improves strategy.md → repeat

What's included
- SKILL.md: /autobrowse skill, single entry point for everything
- scripts/evaluate.ts: inner agent harness
- references/example-task.md: task.md template
- references/example-skill.md: skill.md template
- README.md, EXAMPLES.md, REFERENCE.md: docs

How it works
Customer creates tasks/<name>/task.md in their project, runs the skill, gets back a skill.md they can drop into any stagehand/browser-use agent.

🤖 Generated with Claude Code
Note
Medium Risk
Adds a new Node-based harness that executes browse CLI commands and writes traces to disk; while it restricts execution to browse and avoids shell expansion, it still introduces new code that runs external processes and handles API credentials.

Overview
Introduces a new skills/autobrowse package that adds the /autobrowse Claude Code skill for iterating on website-specific strategy.md files and optionally running multiple tasks in parallel via sub-agents.

Adds an inner-agent runner (scripts/evaluate.mjs) that calls the Anthropic API, executes only browse CLI commands (no shell), and records per-run artifacts (summary.md, trace.json, messages.json, screenshots) under traces/<task>/ with a latest symlink, plus supporting docs/templates (README.md, REFERENCE.md, EXAMPLES.md, references/) and Node deps/env examples.

Reviewed by Cursor Bugbot for commit 70d9ad0. Bugbot is set up for automated code reviews on this repo.
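The restriction described above (only browse commands, no shell) can be illustrated with a small validator. This is a sketch under stated assumptions, not the check in scripts/evaluate.mjs. The metacharacter set is the one listed in the hardening commit, while the function name, the result shape, and the whitespace-based argument splitting are hypothetical.

```typescript
// Characters listed in the hardening commit: ; | & ` $ ( ) < >
const SHELL_METACHARACTERS = /[;|&`$()<>]/;

type ValidationResult =
  | { ok: true; args: string[] }
  | { ok: false; reason: string };

// Hypothetical validator: accepts only `browse ...` commands free of shell metacharacters.
function validateBrowseCommand(command: string): ValidationResult {
  const trimmed = command.trim();
  if (SHELL_METACHARACTERS.test(trimmed)) {
    return { ok: false, reason: "shell metacharacter in command" };
  }
  // Naive split on whitespace; quoted arguments are not handled in this sketch.
  const [program, ...args] = trimmed.split(/\s+/);
  if (program !== "browse") {
    return { ok: false, reason: "only the browse CLI is allowed" };
  }
  return { ok: true, args };
}
```

The returned `args` array is meant to be passed to a process API that takes an argument list and does not spawn a shell, such as Node's `child_process.execFile`, so that nothing in the command is ever interpreted by a shell.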