
WebCure

Browser automation for AI agents and developers. Search the web, test browser-based applications, and record automation scripts — all from your editor.

WebCure is a hybrid VS Code extension that combines two separate browser automation approaches into one package:

  1. Language Model Tools: 28 tools registered with VS Code's vscode.lm.registerTool() API so GitHub Copilot can control the browser directly in chat.
  2. File Bridge + CLI: A file-based protocol (.webcure/input.json → .webcure/output.json) with a CLI wrapper so AI agents in Cursor, Antigravity, or any terminal-capable IDE can control the same browser engine via shell commands.

Both approaches share the same Playwright-based browser engine. Whether Copilot invokes explorer_navigate or a Cursor agent runs node .webcure/cli.js navigate url=..., the same underlying code executes.


Architecture Overview

┌─────────────────────────────────────────────────────────────────────┐
│                          WebCure Extension                          │
│                                                                     │
│  ┌──────────────────────┐   ┌──────────────────────┐                │
│  │ Language Model Tools │   │ File Bridge + CLI    │                │
│  │ (VS Code Copilot)    │   │ (Cursor / Agents)    │                │
│  │                      │   │                      │                │
│  │ 28 explorer_* tools  │   │ .webcure/input.json  │                │
│  │ registered via       │   │ .webcure/output.json │                │
│  │ vscode.lm API        │   │ .webcure/cli.js      │                │
│  └──────────┬───────────┘   └──────────┬───────────┘                │
│             │                          │                            │
│             ▼                          ▼                            │
│  ┌───────────────────────────────────────────────┐                  │
│  │       Shared Tool Instances (28 tools)        │                  │
│  └───────────────────────┬───────────────────────┘                  │
│                          │                                          │
│                          ▼                                          │
│  ┌───────────────────────────────────────────────┐                  │
│  │       BrowserManager (Playwright-core)        │                  │
│  │          Uses system Chrome or Edge           │                  │
│  └───────────────────────────────────────────────┘                  │
│                                                                     │
│  ┌──────────────────┐  ┌─────────────────────┐  ┌─────────────────┐ │
│  │ HTTP API Server  │  │ API Script Recorder │  │ Browser Session │ │
│  │ (port 5678)      │  │ (Legacy)            │  │ Recorder        │ │
│  └──────────────────┘  └─────────────────────┘  └─────────────────┘ │
└─────────────────────────────────────────────────────────────────────┘

Key points:

  • The 28 Language Model Tool classes are the core engine. They implement vscode.LanguageModelTool.
  • The file bridge handles 61 command names — 38 map to the 28 tool instances, plus 23 bridge-only commands for scrolling, recording, etc.
  • Commands like scrollDown, doubleClick, rightClick, launchBrowser, getPageText, highlight exist only in the file bridge — they have no LM tool equivalent because they are simple Playwright calls that don't need the full tool infrastructure.
  • The HTTP API server provides a third access path to the same tools via POST /invoke.

Prerequisites

  • Node.js 18+ — check with node --version
  • npm — comes with Node.js
  • Google Chrome or Microsoft Edge — WebCure uses playwright-core which connects to your system browser (it does not download Chromium automatically)
  • VS Code 1.95+ — for Language Model Tools support (requires Copilot)

Installation (Step by Step)

Step 1: Get the source code

cd ~/Developer
git clone https://github.com/naveedulislam/webcure.git webcure

Or if you already have the source:

cd ~/Developer/webcure

Step 2: Install dependencies

npm install

Step 3: Compile the TypeScript

npm run compile

This runs tsc and outputs JavaScript to the out/ directory.

Step 4: Package into a .vsix file

npm run package

This produces webcure-<version>.vsix (e.g. webcure-1.0.0.vsix) in the project root. The version comes from package.json.
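Because the file name depends on the version, a hard-coded webcure-1.0.0.vsix in later commands can go stale. A minimal sketch that derives the expected name from package.json, assuming the packager follows the usual <name>-<version>.vsix convention (the helper name vsix_filename is hypothetical, not part of WebCure):

```python
import json
from pathlib import Path


def vsix_filename(project_root="."):
    """Return the expected .vsix file name based on package.json in project_root."""
    manifest_path = Path(project_root) / "package.json"
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    # Assumption: the package is named "<name>-<version>.vsix".
    return f"{manifest['name']}-{manifest['version']}.vsix"
```

The returned name can then be passed to code --install-extension or cursor --install-extension instead of a fixed version string.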

Step 5: Install in your editor

VS Code (command line):

code --install-extension webcure-1.0.0.vsix

VS Code (graphical):

  1. Open VS Code
  2. Press Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux)
  3. Type Extensions: Install from VSIX...
  4. Navigate to ~/Developer/webcure/
  5. Select webcure-1.0.0.vsix
  6. Restart VS Code when prompted

Cursor (command line):

cursor --install-extension webcure-1.0.0.vsix

Cursor (graphical):

  1. Open Cursor
  2. Press Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux)
  3. Type Extensions: Install from VSIX...
  4. Select webcure-1.0.0.vsix
  5. Restart Cursor

Step 6: Verify the installation

  1. Open the Command Palette: Cmd+Shift+P (macOS) or Ctrl+Shift+P (Windows/Linux)
  2. Type WebCure
  3. You should see all WebCure commands listed (Navigate to URL, Click Element, etc.)

Quick Start: VS Code Copilot (Language Model Tools)

Once the extension is installed and VS Code has restarted, the 28 explorer_* tools are automatically registered with VS Code's Language Model API. GitHub Copilot can use them directly in chat.

Example conversation:

You: Navigate to https://example.com and take a screenshot
Copilot: (uses explorer_navigate then explorer_take_screenshot automatically)
Done — screenshot saved to screenshot.png

You: Find the login form and fill in the email and password fields
Copilot: (uses explorer_snapshot, explorer_fill_form)

You can also reference tools explicitly by typing # in chat:

#explorer_navigate to https://news.ycombinator.com
#explorer_snapshot to see the page structure
#explorer_click on "new" link

No configuration required. The tools are available as soon as the extension activates (onStartupFinished).


Quick Start: Cursor / AI Agents (File Bridge + CLI)

When the extension activates, it creates a .webcure/ directory in your workspace root containing:

  • cli.js — CLI helper that writes input.json and polls for output.json
  • input.json — Written by the agent, read by the extension
  • output.json — Written by the extension, read by the agent

How It Works

  1. Agent runs: node .webcure/cli.js navigate url=https://example.com
  2. cli.js writes {"command": "navigate", "args": {"url": "https://example.com"}} to .webcure/input.json
  3. The extension detects the file via fs.watch, executes the command, writes the result to .webcure/output.json, and deletes input.json
  4. cli.js polls for output.json, reads it, prints the result, and deletes it
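The same round trip can be driven without cli.js. The sketch below is an illustration of the protocol as described in the steps above, not the shipped client: it writes the request to input.json and polls for output.json. The function name send_command, the timeout values, and the assumption that output.json contains JSON are all hypothetical.

```python
import json
import time
from pathlib import Path


def send_command(bridge_dir, command, args=None, timeout=10.0, poll_interval=0.05):
    """Write a command to input.json, then poll for output.json and return its parsed content."""
    bridge = Path(bridge_dir)
    input_path = bridge / "input.json"
    output_path = bridge / "output.json"

    # Remove a stale result so we never read the answer to an earlier command.
    output_path.unlink(missing_ok=True)

    # Write the request in the shape shown in step 2.
    input_path.write_text(json.dumps({"command": command, "args": args or {}}))

    # Poll until a result appears (step 4), or give up after the timeout.
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if output_path.exists():
            try:
                result = json.loads(output_path.read_text())
            except json.JSONDecodeError:
                # The file may be only partly written; retry on the next tick.
                time.sleep(poll_interval)
                continue
            output_path.unlink(missing_ok=True)
            return result
        time.sleep(poll_interval)
    raise TimeoutError(f"no output.json after {timeout}s for command {command!r}")
```

For example, send_command(".webcure", "navigate", {"url": "https://example.com"}) mirrors the CLI call in step 1, provided the extension is active in that workspace.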

CLI Examples

# Launch a browser and navigate
node .webcure/cli.js launchBrowser url=https://example.com

# Click by visible text
node .webcure/cli.js click target="Sign In"

# Click with spatial targeting (click "Edit" below "Profile")
node .webcure/cli.js click target="Edit" below="Profile"

# Type into a field
node .webcure/cli.js typeText text="hello" into="Search"

# Take an accessibility snapshot (assigns refs e1, e2, ...)
node .webcure/cli.js snapshot

# Find an element (returns a ref for later use)
node .webcure/cli.js find text="Submit"

# Interact using a ref from snapshot or find
node .webcure/cli.js interact action=click ref=e3

# Fill a form using refs
node .webcure/cli.js fillForm 'fields=[{"name":"Email","type":"textbox","ref":"e3","value":"test@example.com"}]'

# Scroll the page
node .webcure/cli.js scrollDown pixels=500

# Take a screenshot
node .webcure/cli.js screenshot filename=result.png

# Get page text content
node .webcure/cli.js getPageText

# Scrape menu structure
node .webcure/cli.js scrapeMenu

# Scrape page structure (forms, tables)
node .webcure/cli.js scrapePage

# Close the browser
node .webcure/cli.js closeBrowser

# See all available commands
node .webcure/cli.js help
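When an agent drives the CLI from a script rather than a shell, building the argument list programmatically avoids quoting mistakes. This is a sketch under assumptions: build_cli_argv and run_cli are hypothetical helpers, and encoding non-string values as compact JSON is inferred from the fillForm example above rather than from the CLI's source.

```python
import json
import subprocess


def build_cli_argv(command, **args):
    """Build the argv list for one CLI call, using the key=value form shown above."""
    argv = ["node", ".webcure/cli.js", command]
    for key, value in args.items():
        # Strings pass through unchanged; other values are JSON-encoded,
        # matching the shape of the fillForm example.
        text = value if isinstance(value, str) else json.dumps(value, separators=(",", ":"))
        argv.append(f"{key}={text}")
    return argv


def run_cli(command, cwd=".", **args):
    """Run the CLI from the workspace root and return its stdout.

    Requires Node.js and an active WebCure extension in that workspace.
    """
    completed = subprocess.run(
        build_cli_argv(command, **args),
        cwd=cwd,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

Passing a list to subprocess.run bypasses the shell, so values containing spaces or quotes (such as target="Sign In") need no extra escaping.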

Can LM Tools Be Used Through input.json?

Yes. The file bridge routes commands to the same 28 tool instances that Copilot uses. When you run node .webcure/cli.js navigate url=..., the bridge internally calls NavigateTool.invoke() — the exact same code path as when Copilot uses explorer_navigate. The bridge command names are just friendlier aliases (e.g., navigate instead of explorer_navigate, click instead of explorer_click).


Language Model Tools Reference

These 28 tools are registered with VS Code's vscode.lm.registerTool() API. Copilot invokes them automatically based on user requests.

| Tool Name | Display Name | Description |
| --- | --- | --- |
| explorer_navigate | Navigate to URL | Open a URL in the browser |
| explorer_resize | Resize Window | Resize viewport (preset fullscreen or custom width/height) |
| explorer_extract | Extract Content | Extract visible text from the page or a CSS selector |
| explorer_click | Click Element | Click by ref, text, or selector with spatial targeting |
| explorer_hover | Hover Element | Hover over an element |
| explorer_type | Type Text | Type into an input field |
| explorer_type_from_file | Type Text from File | Type large text content from a file into an input |
| explorer_wait_for | Wait For | Wait for text to appear/disappear or a fixed time |
| explorer_wait_for_element | Wait For Element | Wait for an element to be visible/hidden/attached/detached |
| explorer_select_option | Select Option | Pick a value from a dropdown |
| explorer_fill_form | Fill Form | Fill multiple form fields using refs from snapshot |
| explorer_take_screenshot | Take Screenshot | Capture screenshot (full page or specific element) |
| explorer_close | Close Browser | Close the page and release resources |
| explorer_console_messages | Get Console Messages | Return recent browser console messages |
| explorer_drag | Drag And Drop | Drag from one element to another using refs |
| explorer_evaluate | Evaluate Script | Run JavaScript in the page context |
| explorer_file_upload | Upload File | Upload files via file input or file chooser |
| explorer_handle_dialog | Handle Dialog | Accept or dismiss alert/confirm/prompt dialogs |
| explorer_navigate_back | Navigate Back | Go back in browser history |
| explorer_network_requests | Get Network Requests | Return observed network requests |
| explorer_press_key | Press Key | Press a keyboard key (Enter, Tab, etc.) |
| explorer_snapshot | Accessibility Snapshot | Capture accessibility tree with element refs (e1, e2, ...) |
| explorer_tabs | Manage Tabs | List, create, close, or select browser tabs |
| explorer_install | Install Browser | No-op (uses system Chrome/Edge) |
| explorer_find | Find Element | Find elements by text, position, or selector; returns a ref |
| explorer_interact | Interact with Element | Multi-action: click, type, hover, clear, select, focus, check, uncheck |
| explorer_scrape_menu | Scrape Menu Structure | Extract hierarchical navigation menus as JSON |
| explorer_scrape_page | Scrape Page Content | Extract forms, tables, and filters as structured JSON |

File Bridge Commands Reference

The file bridge accepts all of the above tool commands (by alias) plus additional bridge-only commands:

Commands That Map to LM Tools

These bridge commands invoke the same tool code as the LM tools:

| Bridge Command | Maps To LM Tool | Notes |
| --- | --- | --- |
| navigate | explorer_navigate | url arg (or target as alias) |
| click | explorer_click | target arg maps to text |
| hover | explorer_hover | Same spatial targeting (above, below, etc.) |
| typeText | explorer_type | into arg maps to field identifier |
| typeFromFile | explorer_type_from_file | |
| pressKey | explorer_press_key | key or target arg |
| selectOption | explorer_select_option | comboBox + value args |
| fillForm | explorer_fill_form | |
| screenshot | explorer_take_screenshot | filename or outputPath args |
| consoleMessages | explorer_console_messages | |
| networkRequests | explorer_network_requests | |
| handleDialog | explorer_handle_dialog | |
| uploadFile | explorer_file_upload | |
| evaluate | explorer_evaluate | expression arg maps to function |
| navigateBack | explorer_navigate_back | |
| goBack | explorer_navigate_back | Alias for navigateBack |
| snapshot | explorer_snapshot | |
| find | explorer_find | |
| interact | explorer_interact | |
| scrapeMenu | explorer_scrape_menu | |
| scrapePage | explorer_scrape_page | |
| drag / dragTo | explorer_drag | source + target args |
| close | explorer_close | |
| closeBrowser | explorer_close | Alias |
| tabs | explorer_tabs | |
| listTabs | explorer_tabs (list) | |
| newTab | explorer_tabs (new) | |
| closeTab | explorer_tabs (close) | |
| selectTab | explorer_tabs (select) | |
| waitForText | explorer_wait_for | |
| waitForElement | explorer_wait_for_element | |
| wait | explorer_wait_for | ms or time arg |
| resize | explorer_resize | |
| resizeBrowser | explorer_resize | Alias |
| fullscreenBrowser | explorer_resize (fullscreen) | |
| extract | explorer_extract | |

Bridge-Only Commands (No LM Tool Equivalent)

These commands exist only in the file bridge and are implemented directly against BrowserManager / Playwright:

| Bridge Command | Description |
| --- | --- |
| launchBrowser | Open a browser window (optionally navigate to a URL) |
| scrollDown | Scroll down (default 500px, configurable via pixels arg) |
| scrollUp | Scroll up |
| scrollRight | Scroll right (default 300px) |
| scrollLeft | Scroll left |
| doubleClick | Double-click an element by text or ref |
| rightClick | Right-click an element by text or ref |
| refresh | Reload the current page |
| goForward | Go forward in browser history |
| switchWindow | Switch to a tab by title match |
| getPageInfo | Get current URL and page title |
| getPageContent | Get raw HTML content (truncated to 50KB) |
| getPageText | Get visible text content (truncated to 50KB) |
| getAccessibilityTree | Get accessibility tree (delegates to snapshot tool) |
| highlight | Visually outline an element on the page |
| getDialogText | Get text from the current dialog |
| startRecording | Begin recording API-based actions (legacy) |
| stopRecording | Stop API recording and return generated Python script |
| startStepRecorder | Start browser session recording (optionally with url) |
| stopStepRecorder | Stop browser session and generate output |
| restartExtensionHost | Restart VS Code extension host (useful after VSIX install) |
| runScript | Execute a JSON automation script |

How LM Tools and Bridge Commands Relate

Are the commands duplicated?

No. There is one set of 28 tool classes (in tools.ts). The LM tool registration and the bridge both point to the same instances:

  • LM Path: Copilot → vscode.lm.registerTool('explorer_navigate', NavigateTool) → NavigateTool.invoke()
  • Bridge Path: CLI → input.json → bridge → BRIDGE_TO_TOOL['navigate'] → toolInstances.navigate.invoke()

The bridge adds convenience aliases (e.g., goBack → navigateBack, closeBrowser → close) and bridge-only commands (scrolling, page info, recording) that don't need the full LM tool interface.
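The sharing can be pictured as two lookup tables that point at one set of objects. The sketch below is an illustration in Python, not the extension's TypeScript: the class bodies are stand-ins, and the names tool_instances, lm_tools, bridge_to_tool, and bridge_dispatch are hypothetical analogues of what the text calls toolInstances and BRIDGE_TO_TOOL.

```python
class NavigateTool:
    """Stand-in for one of the 28 tool classes; the real ones live in tools.ts."""

    def invoke(self, args):
        return f"navigated to {args['url']}"


class CloseTool:
    """Second stand-in, used here to show an alias."""

    def invoke(self, args):
        return "closed"


# One instance per tool, shared by every access path.
tool_instances = {"navigate": NavigateTool(), "close": CloseTool()}

# LM path: registered tool name -> shared instance.
lm_tools = {
    "explorer_navigate": tool_instances["navigate"],
    "explorer_close": tool_instances["close"],
}

# Bridge path: command name or alias -> key into tool_instances.
bridge_to_tool = {"navigate": "navigate", "close": "close", "closeBrowser": "close"}


def bridge_dispatch(command, args):
    """Resolve a bridge command (or alias) and invoke the shared tool instance."""
    key = bridge_to_tool.get(command)
    if key is None:
        raise KeyError(f"unknown bridge command: {command}")
    return tool_instances[key].invoke(args)
```

Because both tables hold references rather than copies, adding an alias is a one-line change and cannot drift from the LM tool's behavior.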

The Command Palette exposes 41 VS Code commands total. Of these, 27 (e.g., webcure.testNavigate) are test harnesses — they prompt for input via vscode.window.showInputBox and invoke the same tool instances, displaying results in the Output panel. The remaining commands handle recording, assertions, the API server, and script execution.


VS Code Commands (Command Palette)

All commands are accessible via Cmd+Shift+P (macOS) / Ctrl+Shift+P (Windows/Linux) under the WebCure category:

| Command | What it does |
| --- | --- |
| WebCure: Navigate to URL | Prompts for a URL and navigates |
| WebCure: Click Element | Prompts for text/selector and clicks |
| WebCure: Hover Element | Hover over an element |
| WebCure: Type Text | Type into a field |
| WebCure: Type Text from File | Type file contents into a field |
| WebCure: Press Key | Press a keyboard key |
| WebCure: Take Screenshot | Save a screenshot |
| WebCure: Take Snapshot | Capture accessibility tree with refs |
| WebCure: Find Element | Find element by text/selector |
| WebCure: Interact with Element | Multi-action on an element by ref |
| WebCure: Select Option | Pick a dropdown value |
| WebCure: Fill Form | Fill multiple form fields |
| WebCure: Drag Element | Drag and drop |
| WebCure: Evaluate JavaScript | Run JS in page context |
| WebCure: Extract Text Content | Extract visible text |
| WebCure: Wait For Text | Wait for text on page |
| WebCure: Wait For Element | Wait for element state |
| WebCure: Resize Browser Window | Resize viewport |
| WebCure: Navigate Back | Go back in history |
| WebCure: Manage Tabs | List/create/close/select tabs |
| WebCure: Close Browser | Close the browser |
| WebCure: Get Console Messages | Show browser console output |
| WebCure: Get Network Requests | Show observed network requests |
| WebCure: Handle Dialog | Accept/dismiss a dialog |
| WebCure: Upload File | Upload files |
| WebCure: Scrape Menu/Navigation | Extract menu structure |
| WebCure: Scrape Page Structure | Extract forms/tables |
| WebCure: Tools Menu | Quick-pick menu of all tools |
| WebCure: Start API Server | Start the HTTP API on port 5678 |
| WebCure: Stop API Server | Stop the HTTP API |
| WebCure: Record API Script (Legacy) | Begin recording API-based browser actions |
| WebCure: Stop API Script Recording | Stop and generate Python script |
| WebCure: Record Browser Session | Start session recording (choose output) |
| WebCure: Stop Browser Session | Stop session recording and save output |
| WebCure: Insert Wait Step | Insert a timed pause at a specific point |
| WebCure: Assert: Element | Pick assertion type, then click target element (Cmd+Shift+A) |
| WebCure: Assert: Page Title | Assert current page title matches |
| WebCure: Assert: Page URL | Assert current page URL (exact or contains) |
| WebCure: Assert: Element Count | Click element, then assert how many matching elements exist |
| WebCure: Assert: Full Page Snapshot | Assert page body contains specific text |
| WebCure: Run Script | Execute a JSON automation script |

HTTP API Server

WebCure includes an HTTP API server for programmatic access from external scripts or tools.

Enable via settings (webcure.api.enabled: true) or start manually:

  • Cmd+Shift+P → WebCure: Start API Server

Endpoints:

# Invoke any command
curl -X POST http://localhost:5678/invoke \
  -H "Content-Type: application/json" \
  -d '{"command": "navigate", "args": {"url": "https://example.com"}}'

# List available tools
curl http://localhost:5678/tools

# Health check
curl http://localhost:5678/health
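The /invoke call can also be made from Python's standard library. This is a minimal sketch rather than the bundled Python client: build_invoke_request and invoke are hypothetical names, the host and port defaults follow the documented settings, and treating the response body as JSON is an assumption.

```python
import json
import urllib.request


def build_invoke_request(command, args=None, host="127.0.0.1", port=5678):
    """Build a POST /invoke request carrying the same JSON body as the curl example."""
    body = json.dumps({"command": command, "args": args or {}}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:{port}/invoke",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def invoke(command, args=None, **kwargs):
    """Send the request and decode the reply (assumes the server answers with JSON)."""
    request = build_invoke_request(command, args, **kwargs)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API server running, invoke("navigate", {"url": "https://example.com"}) sends the same payload as the first curl command above.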

Browser Session Recording

WebCure can record every user interaction in the browser and produce human-readable documentation and/or a Playwright Python test script. Choose from three output modes when you start a recording.

Recording Modes

Command Palette → WebCure: Record Browser Session presents a mode picker:

| Mode | What is produced |
| --- | --- |
| Markdown + Screenshots | A timestamped folder containing Recording.md plus a screenshot for each step. Great for test documentation, bug reports, and onboarding guides. |
| Python Test Script | A standalone Playwright Python script with self-healing locators. No screenshots folder is created. |
| Both | Markdown with screenshots and a Python script, saved together in the same session folder. |

How It Works

  1. Start recording: Command Palette → WebCure: Record Browser Session
  2. Choose output mode (Markdown / Python / Both)
  3. (Markdown/Both) Optionally enter a folder name (blank = auto-timestamp)
  4. (Python/Both) Optionally enter a script filename (default test_recording.py)
  5. (Python/Both) Optionally enter a default wait between steps in seconds (e.g. 1). If set, time.sleep(N) is added after each action in the generated script.
  6. Optionally enter an initial URL (defaults to https://demo.testfire.net)
  7. Interact normally — every click, form input, file upload, and Enter key press is captured
  8. Stop recording: Command Palette → WebCure: Stop Browser Session, or simply close the browser window

What Gets Captured

| User Action | Recorded As | Screenshot |
| --- | --- | --- |
| Click a link or button | Clicked on button 'Login' | Taken after the page reacts to the click |
| Click a dropdown trigger | Clicked on button 'Options' | Captured via deferred pointerdown |
| Click a menu item | Selected 'Edit' from 'Options' dropdown | ARIA-aware with trigger label lookup |
| Select a dropdown option | Selected menu item 'Orange' | Deferred pointerdown for Radix Select |
| Change a <select> | Selected 'Active' from 'Status' | Taken after change |
| Type into a field | Typed 'admin' into 'Username' | Taken immediately (shows the typed value) |
| Type a password | Typed '********' into 'Password' | Password value is masked |
| Press Enter | Pressed 'Enter' on 'Search' | Taken after the page reacts |
| Upload a file | Uploaded file to 'Resume' (path captured via dialog) | Taken after file selected |
| Navigate to URL | Performed 'navigate' on 'Navigated to https://...' | Taken after page load |
| Close the browser | Performed 'close' on 'Browser window closed' | No screenshot (browser is gone) |
| Insert Wait Step | Wait 2 seconds | No screenshot (no UI change) |

Output Structure

Markdown + Screenshots and Both modes create a timestamped folder:

WebCure_Steps_2026-03-09_22-13-00/
├── Recording.md          # Markdown log with all steps
├── test_recording.py     # Python script (Both mode only)
├── step_1.png
├── step_2.png
└── ...

Python-only mode writes a single timestamped script directly to the workspace root:

test_recording_2026-03-09_22-13-00.py

The Markdown file contains structured entries like:

### Step 3

**Action:** Typed 'admin' into 'Username'

![Screenshot for Step 3](./step_3.png)

The Python script uses self-healing locators that try multiple strategies in confidence order (test ID → ID → ARIA role → link text → visible text → name → CSS → XPath):

# Step 3: Typed 'admin' into 'Username'
self_healing_fill(page, [
    {"strategy": "id", "value": "uid", "confidence": 0.95},
    {"strategy": "ariaLabel", "value": "Username", "confidence": 0.85},
    {"strategy": "css", "value": "input[name='uid']", "confidence": 0.6},
], "admin")
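The source does not show how self_healing_fill is implemented, so the following is only a sketch of the idea it describes: try the candidates from highest to lowest confidence and use the first one that finds an element. The resolve_locator function and its probe callback are hypothetical; in a real script, probe would wrap a Playwright lookup.

```python
def resolve_locator(candidates, probe):
    """Return (candidate, element) for the highest-confidence candidate that probe() can find.

    candidates: dicts with "strategy", "value", and "confidence" keys.
    probe: callable(strategy, value) returning an element, or None if nothing matched.
    """
    ordered = sorted(candidates, key=lambda c: c["confidence"], reverse=True)
    for candidate in ordered:
        try:
            element = probe(candidate["strategy"], candidate["value"])
        except Exception:
            # A strategy that errors out is treated the same as one that finds nothing.
            element = None
        if element is not None:
            return candidate, element
    tried = ", ".join(c["strategy"] for c in ordered)
    raise LookupError(f"no locator strategy matched (tried: {tried})")
```

If the page's id changes from uid to something else, the id strategy fails and the lookup falls through to the ARIA label, which is what makes the script tolerant of small markup changes.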

Assertions

During a recording session, you can insert assertions to verify the state of the page or specific elements. Assertions are recorded as steps and generate corresponding Python assertion calls in the output script.

Quick Start

  1. Start a recording — Command Palette → WebCure: Record Browser Session (Python or Both mode)
  2. Interact normally — click, type, navigate
  3. Insert an assertion — press Cmd+Shift+A (macOS) / Ctrl+Shift+A, pick the assertion type, then click the target element
  4. Stop recording — the generated Python script includes assertion calls with pass/fail logging

Element Assertions

Press Cmd+Shift+A or Command Palette → Assert: Element to see a QuickPick menu:

| Assertion | What It Verifies | Example Use Case |
| --- | --- | --- |
| Visible | Element is visible on the page | Verify a success message appeared after form submission |
| Not Visible | Element is NOT visible | Verify an error banner disappeared after correction |
| Text | Element contains specific text | Verify a heading shows "Welcome, Admin" after login |
| Value | Input/select has a specific value | Verify a form field retained its value after page reload |
| Checked | Checkbox/radio is checked | Verify "Remember me" checkbox is checked by default |
| Not Checked | Checkbox/radio is NOT checked | Verify a terms checkbox starts unchecked |
| Enabled | Element is enabled (clickable) | Verify the submit button is active after filling required fields |
| Disabled | Element is disabled | Verify the submit button is greyed out before filling the form |
| Attribute | Element attribute has expected value | Verify a link's href attribute points to the correct URL |

After picking the type, click the target element on the page. WebCure captures the element's locators and current state, then records the assertion step.

Page-Level Assertions

These don't require clicking an element:

| Command | What It Verifies | Example Use Case |
| --- | --- | --- |
| Assert: Page Title | document.title matches exactly | Verify the page title changes to "Dashboard" after login |
| Assert: Page URL | URL matches (exact or contains) | Verify redirect to /dashboard after successful login |
| Assert: Element Count | Number of matching elements | Verify exactly 5 search results are displayed |
| Assert: Full Page Snapshot | Page body contains specific text | Verify "Sign Off" link appears somewhere on the page |

Pass/Fail Logging

Generated Python scripts include a built-in logging system:

  • Each step is wrapped in try/except with _record_step() for pass/fail tracking
  • Terminal output shows ✅/❌ icons with step descriptions in real time
  • Log file is saved as test_results_YYYYMMDD_HHMMSS.log
  • Summary prints at the end: TEST SUMMARY: 34/35 steps passed, 1 failed
  • Exit code is 1 if any step failed, 0 if all passed

Example output:

✅  Step 1: Navigate to https://example.com — PASS
✅  Step 2: Assert page title is 'Example' — PASS
✅  Step 3: Typed 'admin' into username — PASS
❌  Step 4: Assert heading text contains 'Welcome' — FAIL  [Expected text 'Welcome' in 'Hello']

============================================================
TEST SUMMARY: 3/4 steps passed, 1 failed
============================================================
Failed steps:
  Step 4: Assert heading text contains 'Welcome'
    Error: Expected text 'Welcome' in 'Hello'
Full log saved to: test_results_20260329_160005.log
============================================================
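The bookkeeping behind that output can be sketched in a few lines. This is an illustration, not the generated code: the generated scripts use a helper called _record_step() whose body is not shown here, so record_step, summary_line, and exit_code below are hypothetical, and the plain PASS/FAIL prefixes stand in for the icons.

```python
results = []  # one (description, error-or-None) tuple per executed step


def record_step(description, action):
    """Run one step, print a pass/fail line, and remember the outcome."""
    try:
        action()
    except Exception as exc:  # assertion failures and automation errors alike
        results.append((description, str(exc)))
        print(f"FAIL  Step {len(results)}: {description}  [{exc}]")
    else:
        results.append((description, None))
        print(f"PASS  Step {len(results)}: {description}")


def summary_line():
    """Format the end-of-run summary in the shape shown in the example output."""
    failed = sum(1 for _, error in results if error is not None)
    passed = len(results) - failed
    return f"TEST SUMMARY: {passed}/{len(results)} steps passed, {failed} failed"


def exit_code():
    """Return 1 if any step failed, 0 if all passed."""
    return 1 if any(error is not None for _, error in results) else 0
```

A script would end with print(summary_line()) followed by sys.exit(exit_code()), which lets CI systems treat a failed assertion as a failed job.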

Generated Python Code

Assertions use the same self-healing locator system as action steps:

# Step 5: Assert heading text contains 'Hello Admin User'
assert_element_text(page, [
    {"strategy": "css", "value": "h1", "confidence": 0.8},
    {"strategy": "xpath", "value": "//h1", "confidence": 0.5},
], "Hello Admin User")

# Step 6: Assert page URL contains 'dashboard'
assert_page_url(page, "dashboard", "contains")

# Step 7: Assert exactly 2 checkboxes exist
assert_element_count(page, [
    {"strategy": "css", "value": "input[type='checkbox']", "confidence": 0.9},
], 2)

Inserting Sleep Steps

To insert an explicit pause at a specific point during recording: Command Palette → WebCure: Insert Wait Step. You will be prompted for a duration in seconds. This adds a time.sleep(N) call at that exact position in the generated Python script.

For a uniform wait after every step, use the default wait between steps option at recording start instead.

File Upload Recording

When the browser opens a native file-chooser dialog, WebCure:

  1. Intercepts it automatically
  2. Prompts you to enter the file path (or leave blank to record a placeholder)
  3. Records the upload step with captured locators

The generated Python script emits an upload_file() call with self-healing locators, or a # TODO comment if no path was provided.

Element Identification

The step recorder uses multiple heuristics to produce human-readable element names:

  • <label for="..."> associations — preferred for form fields
  • Parent <label> wrappers — for fields wrapped inside labels
  • Adjacent table cell text — for table-based layouts (e.g., "Username:" from a neighboring <td>)
  • Previous sibling text: <span>, <label>, <b> elements before the input
  • Button value/text — for <button> and <input type="submit">, the button's own text takes priority
  • ARIA attributes: aria-label, title, placeholder
  • ARIA roles: role="menuitem", role="option", etc. with parent menu trigger lookup via aria-labelledby
  • Element ID or name — as a fallback

Each step also records CSS selector and XPath in HTML comments for reference.

Radix UI / Portal-Based Component Support

The step recorder handles modern component libraries (Radix UI, shadcn/ui, etc.) that use pointerdown events and portal-rendered menus:

  • Dropdown triggers: Captured via a "deferred pointerdown" strategy — the recorder captures every pointerdown on interactive elements, waits 400ms, and records it as a click only if no matching click event follows. This handles Radix UI DropdownMenu triggers that open a portal on pointerdown, causing the click to land on document.body.
  • Select options: Radix UI Select options (role="option") are removed from the DOM between pointerdown and mouseup, so the click event never fires on the option. The deferred pointerdown timer catches these and records the selection.
  • Menu items: Elements with role="menuitem" (or data-slot="dropdown-menu-item" / data-slot="select-item") are detected and the recorder walks up the DOM to find the parent [role="menu"] or [role="listbox"] container, then resolves the trigger label via aria-labelledby.
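The deferred pointerdown rule can be expressed as a small decision over an event log. The sketch below is an offline approximation under assumptions: the real recorder presumably runs a timer inside the page, whereas this version scans a finished list of events; the event shape and the name deferred_clicks are hypothetical, and the list is assumed to contain only events on interactive elements.

```python
DEFER_MS = 400  # wait window described in the text


def deferred_clicks(events):
    """Return the pointerdown events that should be logged as clicks.

    events: dicts with "type" ("pointerdown" or "click"), "target", and "t"
    (a timestamp in milliseconds).
    """
    deferred = []
    for event in events:
        if event["type"] != "pointerdown":
            continue
        # A pointerdown is "matched" if a click on the same target follows within the window.
        matched = any(
            other["type"] == "click"
            and other["target"] == event["target"]
            and 0 <= other["t"] - event["t"] <= DEFER_MS
            for other in events
        )
        if not matched:
            # No click ever reached this element, so the pointerdown stands in for it.
            deferred.append(event)
    return deferred
```

An ordinary button produces a pointerdown followed by a click on the same element, so nothing is deferred; a Radix trigger whose click lands on document.body leaves its pointerdown unmatched and is recorded through this path.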

CLI Access

The step recorder can also be started and stopped via the CLI, enabling AI agents to trigger recordings programmatically:

# Markdown mode (default)
node .webcure/cli.js startStepRecorder url=https://example.com
# Python mode with 1-second wait between steps
node .webcure/cli.js startStepRecorder url=https://example.com mode=python defaultWaitSeconds=1
# ... interact with the browser ...
node .webcure/cli.js stopStepRecorder

Automatic Stop on Browser Close

If you close the browser window while recording is active, the recorder:

  1. Logs a final "Browser window closed" step (without a screenshot)
  2. Waits for any pending steps to finish writing
  3. Automatically stops recording and opens the Markdown preview

API Script Recording & Python Playback (Legacy)

WebCure can record your browser actions and generate a Python script that replays them via the API server. This is the original recording method — for most use cases, Browser Session Recording is recommended instead.

Step 1: Install the Python Client

The webcure Python package lives in the python/ directory of this repository. Install it once:

pip install /path/to/webcure/python

Or in development/editable mode:

pip install -e /path/to/webcure/python

This provides module-level convenience functions (navigate, click, type_text, etc.) that the generated scripts import.

Step 2: Record Actions

  1. Open the Command Palette (Cmd+Shift+P on macOS, Ctrl+Shift+P on Windows/Linux)
  2. Run WebCure: Record API Script (Legacy)
  3. Perform browser actions — navigate, click, type, resize, etc. using any WebCure command (Command Palette or Copilot)
  4. Run WebCure: Stop API Script Recording (works even if the browser was closed during the session)

All actions are logged in the WebCure Tools Output channel. Start and stop recording events are also logged there with timestamps.

A new Python script opens in your editor with the recorded actions.

Step 3: Run the Generated Script

The script needs the WebCure API server running to execute commands:

  1. Start the API server — Command Palette → WebCure: Start API Server (or it auto-starts when you stop recording)
  2. Run the script:
python recording.py

Example Output

#!/usr/bin/env python3
# Auto-generated by WebCure

from webcure import click, close_browser, navigate, resize_browser, type_text

navigate("https://demo.testfire.net")
resize_browser("fullscreen")
click("ONLINE BANKING LOGIN")
type_text("admin", into="#uid")
type_text("admin", into="#passw")
click("#login > table > tbody > tr:nth-child(3) > td:nth-child(2) > input[type=submit]")
click("#btnGetAccount")
close_browser()

The Python client automatically detects whether a target is a CSS selector (e.g., #uid, .class, div > span) or visible text (e.g., "ONLINE BANKING LOGIN") and routes it to the correct Playwright locator strategy.
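The client's actual detection rules are not documented here, so the following is a guess at what such a heuristic might look like, based only on the examples above (#uid, .class, div > span). The function looks_like_css is hypothetical and deliberately simple.

```python
import re

# Hypothetical rules: a leading "#", ".", or "[" suggests a selector,
# as does a child/sibling combinator surrounded by spaces.
_SELECTOR_START = re.compile(r"^[#.\[]")
_COMBINATOR = re.compile(r"\s[>+~]\s")


def looks_like_css(target):
    """Guess whether target is a CSS selector (True) or visible text (False)."""
    target = target.strip()
    return bool(_SELECTOR_START.match(target) or _COMBINATOR.search(target))
```

A heuristic like this can misclassify visible text such as "Price > 100", so a production client would likely need an explicit override for ambiguous targets.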

Python Client API

The webcure package also supports class-based usage:

from webcure import WebCure

wc = WebCure(port=5678)
wc.invoke("navigate", {"url": "https://example.com"})
print(wc.health())   # True if API server is running
print(wc.tools())    # List available tool names

To change the default port for module-level functions:

import webcure
webcure.set_port(9999)
webcure.navigate("https://example.com")

JSON Script Runner

Execute multi-step automation scripts with variables, capture patterns, and retry logic.

Script Format

{
  "name": "Login and verify dashboard",
  "stopOnError": true,
  "retries": 1,
  "retryDelay": 2000,
  "variables": {
    "baseUrl": "https://example.com",
    "username": "admin"
  },
  "steps": [
    {
      "command": "navigate",
      "args": { "url": "${baseUrl}/login" }
    },
    {
      "command": "typeText",
      "args": { "text": "${username}", "into": "Username" }
    },
    {
      "command": "click",
      "args": { "target": "Sign In" }
    },
    {
      "command": "find",
      "args": { "text": "Dashboard" },
      "captureRef": "dashRef"
    },
    {
      "command": "interact",
      "args": { "action": "click", "ref": "${dashRef}" }
    }
  ]
}

Running Scripts

  • Command Palette: Cmd+Shift+P → WebCure: Run Script → select a .json file
  • CLI: node .webcure/cli.js runScript file=/path/to/script.json
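
If you prefer to launch a script from Python rather than typing the CLI command, a thin wrapper around `subprocess` is enough. The sketch below only reuses the `node .webcure/cli.js runScript file=...` invocation shown above; the function names are hypothetical, and it assumes `node` is on your PATH and the workspace already contains `.webcure/cli.js`.

```python
import subprocess
from pathlib import Path


def build_run_script_command(script_path: str) -> list:
    """Build the argv for `node .webcure/cli.js runScript file=<abs path>`."""
    absolute = Path(script_path).resolve()
    return ["node", ".webcure/cli.js", "runScript", f"file={absolute}"]


def run_script(script_path: str, workspace_root: str = "."):
    """Run the CLI from the workspace root and return the completed process.

    The output format is not assumed here; stdout/stderr are returned as text
    for the caller to inspect.
    """
    return subprocess.run(
        build_run_script_command(script_path),
        cwd=workspace_root,
        capture_output=True,
        text=True,
        check=False,
    )


if __name__ == "__main__":
    print(build_run_script_command("scripts/login.json"))
```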

Features

| Feature | Description |
| --- | --- |
| Variables | Define at script level, use ${varName} in any step arg |
| captureRef | Auto-extract [ref=eN] from step output into a variable |
| capturePattern | Regex with capture group to extract values from output text |
| captureValue | Map { variableName: "property.path" } to extract structured values |
| Retries | Script-level retries + retryDelay with per-step overrides |
| stopOnError | Continue or halt on failure (default: true, per-step override) |
| Command Aliases | Both camelCase and snake_case accepted |
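
The three capture features can be pictured as small extraction helpers. The Python sketch below is an illustration under stated assumptions, not the runner's code: the helper names are hypothetical, the sample output strings are invented, and it assumes `captureRef` stores the bare reference (`e12`) rather than the full `[ref=e12]` token.

```python
import re
from typing import Any, Optional

# Matches reference markers of the form [ref=eN], e.g. [ref=e12].
_REF = re.compile(r"\[ref=(e\d+)\]")


def capture_ref(output: str) -> Optional[str]:
    """captureRef: return the first eN reference found in the step output."""
    match = _REF.search(output)
    return match.group(1) if match else None


def capture_pattern(output: str, pattern: str) -> Optional[str]:
    """capturePattern: return group 1 of `pattern`, or None if no match.

    The pattern is expected to contain at least one capture group.
    """
    match = re.search(pattern, output)
    return match.group(1) if match else None


def capture_value(result: Any, path: str) -> Any:
    """captureValue: walk a dotted "property.path" through nested dicts."""
    current = result
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


if __name__ == "__main__":
    print(capture_ref('link "Dashboard" [ref=e12]'))
    print(capture_pattern("Order #4821 confirmed", r"#(\d+)"))
    print(capture_value({"page": {"title": "Home"}}, "page.title"))
```

A captured value would then be stored under the variable name given in the step and become available to later steps through `${...}` substitution.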

Configuration

Open Settings (Cmd+, / Ctrl+,) and search for webcure:

| Setting | Default | Description |
| --- | --- | --- |
| webcure.api.enabled | false | Enable the HTTP API server on activation |
| webcure.api.port | 5678 | Port for the HTTP API server |
| webcure.api.host | 127.0.0.1 | Host address for the HTTP API server |
| webcure.bridge.enabled | true | Enable the file bridge for AI agent integration |
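
After enabling `webcure.api.enabled`, a quick way to confirm that something is listening on the configured host and port is a plain TCP connection attempt. This sketch uses only the Python standard library; the function name `api_port_is_open` is hypothetical, and a successful connection shows only that the port is open, not that the listener is WebCure.

```python
import socket


def api_port_is_open(
    host: str = "127.0.0.1", port: int = 5678, timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    state = "open" if api_port_is_open() else "closed"
    print(f"127.0.0.1:5678 is {state}")
```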

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| WEBEXPLORER_BROWSER | (Chrome) | Set to msedge to use Microsoft Edge instead of Chrome |
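
The selection rule in the table can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, `chrome` and `msedge` are the Playwright channel names for the two browsers, and the case-insensitive comparison is an assumption about how the extension reads the variable.

```python
import os
from typing import Mapping, Optional


def resolve_browser_channel(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "msedge" when WEBEXPLORER_BROWSER=msedge, otherwise "chrome"."""
    env = os.environ if env is None else env
    value = env.get("WEBEXPLORER_BROWSER", "").strip().lower()
    return "msedge" if value == "msedge" else "chrome"


if __name__ == "__main__":
    print(resolve_browser_channel())
```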

Development

Build from Source

cd ~/Developer/webcure

# Install dependencies
npm install

# Compile TypeScript
npm run compile

# Watch mode (recompile on file changes)
npm run watch

# Package into .vsix
npm run package

Run in Extension Development Host

  1. Open the webcure/ folder in VS Code
  2. Press F5
  3. A new VS Code window opens with the extension loaded
  4. Open the Command Palette and type "WebCure" to test commands

Project Structure

webcure/
├── src/
│   ├── extension.ts          # Entry point: registers tools, bridge, API, commands
│   ├── tools.ts              # 28 Language Model Tool classes
│   ├── browserManager.ts     # Playwright-core browser singleton
│   ├── apiServer.ts          # HTTP API server
│   ├── constants.ts          # Bridge directory/file names
│   ├── types.ts              # Shared types
│   ├── bridge/
│   │   ├── file-bridge.ts    # File-based command router
│   │   └── cli-template.js   # CLI helper (copied to .webcure/)
│   └── recorder/
│       ├── action-log.ts       # Start/stop/record actions
│       ├── script-generator.ts # Convert actions to Python
│       ├── step-recorder.ts   # Step recorder (Markdown + Python + assertions)
│       └── element-rules-engine.ts  # W3C/ARIA element classification engine
├── python/
│   ├── pyproject.toml        # Python package metadata
│   ├── setup.py              # Package setup (pip install)
│   └── webcure/
│       ├── __init__.py       # Convenience functions (navigate, click, etc.)
│       └── client.py         # WebCure API client class
├── tests/
│   ├── MANUAL-TEST-RESULTS.md  # Manual test documentation
│   ├── bridge-integration-tests.sh  # Automated bridge integration tests
│   ├── unit/
│   │   ├── tools.test.ts     # Unit tests (bridge routing, recording, params)
│   │   └── element-rules-engine.test.ts  # 113 unit tests for the rules engine
│   └── integration/
│       ├── live_engine_test.py  # 63 Python live browser tests against real sites
│       ├── test_assertions.py   # 46 assertion helper integration tests
│       ├── test_recorded_assertions.py  # 35-step end-to-end recorded-style test
│       └── screenshots/         # Test screenshots captured during live tests
├── status/
│   ├── project_status_01.md  # Initial release status report
│   ├── project_status_02.md  # Recording fix & Python package
│   ├── project_status_03.md  # Action persistence & interact tool fixes
│   ├── project_status_04.md  # Step recorder feature
│   ├── project_status_05.md  # Radix UI fixes & CLI step recorder
│   ├── project_status_06.md  # Deferred pointerdown & Select support
│   ├── project_status_07.md  # Element rules engine
│   ├── project_status_08.md  # Python test script generation
│   └── project_status_09.md  # Assertion recording & pass/fail logging
├── out/                      # Compiled JavaScript (tsc output)
├── package.json              # Extension manifest + tool/command declarations
├── tsconfig.json             # TypeScript configuration
└── README.md

Testing

Test Dependencies

# TypeScript unit tests (playwright-core + tsx)
npm install                    # installs playwright-core, tsx, typescript from package.json

# Python live browser integration tests
pip install playwright         # Python Playwright bindings (v1.58+)

# Browser binaries (shared by both TS and Python tests)
npx playwright install chromium

Running Tests

# TypeScript unit tests — 113 tests for the Element Rules Engine
npx tsx tests/unit/element-rules-engine.test.ts

# Bridge routing + parameter transformation unit tests
npm run test:unit

# Python live browser integration tests — 63 tests against real websites
# (demo.testfire.net, the-internet.herokuapp.com, Radix UI Themes Playground, W3C WAI-ARIA)
python3 tests/integration/live_engine_test.py

# Assertion helper integration tests — 46 tests against live websites
python3 tests/integration/test_assertions.py

# End-to-end recorded-style assertion test — 35 steps with pass/fail logging
python3 tests/integration/test_recorded_assertions.py

# Automated bridge integration tests (requires VS Code + extension active)
bash tests/bridge-integration-tests.sh

Troubleshooting

Extension doesn't activate

  • Make sure VS Code version is 1.95 or later (required for vscode.lm.registerTool)
  • Check the Output panel → select "WebCure Tools" for error messages

Browser doesn't launch

  • WebCure uses playwright-core, which drives the Chrome installed on your system rather than a bundled Chromium
  • Make sure Google Chrome or Microsoft Edge is installed
  • To use Edge: set environment variable WEBEXPLORER_BROWSER=msedge

Copilot doesn't use the tools

  • Ensure GitHub Copilot is active and you have a Copilot subscription
  • The tools should appear when you type #explorer_ in Copilot chat
  • Try asking Copilot explicitly: "Use explorer_navigate to go to https://example.com"

File bridge doesn't respond

  • Check that webcure.bridge.enabled is true in settings
  • Verify .webcure/ directory exists in your workspace root
  • Check that input.json is being created (the CLI writes it)
  • Look at the Output panel → "WebCure Tools" for errors
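
The file checks above can be automated with a short Python script. This is a sketch under assumptions: the function name `check_bridge` is hypothetical, and since `input.json` and `output.json` may exist only briefly around a command, a missing file there is a hint rather than proof of a problem.

```python
from pathlib import Path


def check_bridge(workspace_root: str) -> dict:
    """Report which file-bridge artifacts exist under <workspace>/.webcure/."""
    bridge_dir = Path(workspace_root) / ".webcure"
    return {
        ".webcure/": bridge_dir.is_dir(),
        "cli.js": (bridge_dir / "cli.js").is_file(),
        "input.json": (bridge_dir / "input.json").is_file(),
        "output.json": (bridge_dir / "output.json").is_file(),
    }


if __name__ == "__main__":
    for name, present in check_bridge(".").items():
        print(f"{name:<12} {'found' if present else 'missing'}")
```

Run it from the workspace root; if `.webcure/` itself is missing, the bridge has most likely not been initialized for that workspace.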

.vsix build fails

  • Run npm install first to ensure all dependencies are installed
  • If vsce is not found: npx @vscode/vsce package
