Add data-extractor skill for structured web scraping by aq17 · Pull Request #66 · browserbase/skills

aq17 · 2026-04-02T19:45:16Z

Summary

Adds a new data-extractor skill that teaches agents how to extract structured JSON from websites using browse CLI primitives (eval, get text, snapshot)
Covers 5 extraction patterns: single-page, list-page, paginated, search-then-extract, and authenticated extraction
Includes SKILL.md, REFERENCE.md (pattern specs, selector strategies, error reference), and EXAMPLES.md (7 worked examples)
Registers the skill in marketplace.json and adds it to README.md

Why

Over the last 30 days, 6+ customers (Vivian Health, Drata, Rippling, Zogo, Freshsauce, HappyStack) are all trying to extract structured data from websites. The existing browser skill only shows basic browse get text "body". This skill fills the gap by teaching the browse eval "JSON.stringify(...)" pattern and pagination workflows.

The browse CLI stays LLM-free and deterministic — the agent is the intelligence that reads page structure and decides what to extract.

Stress tested against

Hacker News (list extraction with querySelectorAll + .map())
GitHub Trending (list extraction with varied selectors)
Wikipedia (table extraction with IIFE headers-as-keys pattern)
Single-field extraction with browse get text

Test plan

Verify SKILL.md frontmatter parses correctly
Verify marketplace.json is valid JSON
Run Example 1 (single-page extract) against a real URL
Run Example 3 (paginated extract) to test pagination flow
Run Example 7 (table extract) against a Wikipedia table
Confirm skill appears in Claude Code when installed

🤖 Generated with Claude Code

Note

Low Risk
Low risk: adds new documentation-only data-extractor skill content and registers it in the marketplace/README without changing runtime code paths.

Overview
Adds a new data-extractor skill (docs + examples) that guides agents to extract structured JSON from web pages using browse primitives (notably eval/JSON.stringify, snapshot, waiting, and pagination patterns).

Registers the skill in .claude-plugin/marketplace.json under the browse plugin and lists it in README.md, including new SKILL.md, REFERENCE.md, EXAMPLES.md, and an MIT LICENSE.txt.

^{Written by Cursor Bugbot for commit 119df5b. This will update automatically on new commits. Configure here.}

Teaches agents how to extract structured JSON from websites using browse CLI primitives (eval, get text, snapshot). Covers five patterns: single-page, list-page, paginated, search-then-extract, and authenticated extraction — mapped to top customer use cases. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

aq17 force-pushed the add-data-extractor-skill branch from 119df5b to e4283b0 Compare April 2, 2026 20:01

aq17 requested a review from shubh24 April 2, 2026 20:22

aq17 closed this Apr 3, 2026

aq17 reopened this Apr 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add data-extractor skill for structured web scraping#66

Add data-extractor skill for structured web scraping#66
aq17 wants to merge 1 commit intomainfrom
add-data-extractor-skill

aq17 commented Apr 2, 2026 •

edited by cursor bot

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aq17 commented Apr 2, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Stress tested against

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

aq17 commented Apr 2, 2026 •

edited by cursor bot

Loading