AgentBrowser

AgentBrowser

AI-Powered Browser Automation Platform

Give any AI agent a real browser. REST API + Web Dashboard + Vision AI.

Documentation · Quick Start · API Docs · AI Agent Guide · Architecture

What is AgentBrowser?

AgentBrowser is an open-source browser automation platform designed to be the hands and eyes of AI agents. It provides a REST API and real-time WebSocket interface for controlling real browser instances — letting LLMs, CLIs, and autonomous agents navigate the web, interact with pages, extract data, and perform complex multi-step tasks.

Built on Next.js 16, Playwright, and TypeScript, it ships with a professional web dashboard for real-time monitoring and control, a comprehensive REST API for programmatic access, and a built-in Vision AI system that provides screenshots, simplified DOM, accessibility trees, and interactive element detection.

Why AgentBrowser?

Most browser automation tools are designed for testing. AgentBrowser is designed for AI agents.
Clean REST API that any LLM can call — no browser-specific knowledge required.
Vision AI reduces web pages to structured data that language models can reason about.
Session persistence means agents can pick up where they left off.
Compatible with OpenClaw, Hermes, OpenAI Function Calling, and any custom agent framework.

Features

Browser Engine

Multi-browser support — Chromium, Firefox, and WebKit via Playwright
Session management — Persistent cookies, localStorage, and browser state across requests
Headless & headed modes — Run headless on servers, headed for debugging
Configurable viewports — Custom screen sizes and resolutions
Proxy support — Route traffic through HTTP/HTTPS proxies with authentication

25+ Browser Actions

Navigation — navigate, goBack, goForward, reload
Mouse — click, dblclick, hover, rightClick
Keyboard — type, press, select
Scrolling — scroll with direction and element targeting
Waiting — wait, waitForSelector, waitForNavigation
Screenshots — Full page, element-specific, PNG/JPEG
JavaScript — evaluate arbitrary JS in the page context
Cookies — getCookies, setCookies, clearCookies
Storage — getLocalStorage, setLocalStorage, clearLocalStorage
Info — getUrl, getTitle, getContent

Vision AI

Screenshots — Base64 PNG/JPEG screenshots on demand
Simplified DOM — Cleaned HTML with interactive elements highlighted
Accessibility tree — Full a11y tree for screen-reader-style understanding
Interactive element detection — Auto-detects buttons, links, inputs, and more with selectors and coordinates
Page metadata — Title, description, OG tags, favicon, language

Developer Experience

REST API — 8 clean JSON endpoints compatible with any LLM, CLI, or SDK
Real-time WebSocket — Live updates on actions, screenshots, and session changes via Socket.IO
Web Dashboard — Professional dark-themed UI with session management, live preview, and action logs
TypeScript — End-to-end type safety with exported types
SQLite + Prisma — Zero-config persistence for sessions and action logs
Action logging — Every browser action is recorded with timing and results

Quick Start

Prerequisites

Node.js 18+ or Bun 1.x
Playwright browsers (installed automatically via postinstall)

Install

# Clone the repository
git clone https://github.com/smouj/agent-browser.git
cd agent-browser

# Install dependencies
bun install
# or: npm install

# Set up the database
bun run db:push
# or: npx prisma db push

# Install Playwright browsers (if not already installed)
bunx playwright install chromium

Run

# Development mode (hot reload, port 3000)
bun run dev

# Production mode
bun run build
bun run start

Open http://localhost:3000 to access the web dashboard.

One-Line Setup

git clone https://github.com/smouj/agent-browser.git && cd agent-browser && bun install && bun run db:push && bunx playwright install chromium && bun run dev

AI Agent Integration Guide

How It Works

AgentBrowser follows a simple observe-think-act loop:

1. CREATE SESSION  → POST /api/browser/sessions
2. NAVIGATE        → POST /sessions/{id}/action  { action: "navigate" }
3. OBSERVE         → POST /sessions/{id}/vision   (screenshot + DOM + elements)
4. THINK           → LLM analyzes vision data and decides next action
5. ACT             → POST /sessions/{id}/action  { action: "click/type/scroll" }
6. REPEAT 3-5      → Until task is complete
7. CLEANUP         → DELETE /sessions/{id}

OpenClaw Integration

OpenClaw uses tool-calling to interact with external services. Register AgentBrowser as a set of tools:

1. Start AgentBrowser:

bun run dev  # http://localhost:3000

2. Create a tool definition in your OpenClaw project (tools/browser.yaml):

name: browser_navigate
description: "Navigate the browser to a URL"
endpoint: "http://localhost:3000/api/browser/sessions/{session_id}/action"
method: POST
parameters:
  session_id:
    type: string
    description: "Active browser session ID"
  action:
    type: string
    default: "navigate"
  target:
    type: string
    description: "URL to navigate to"

3. Register all tools: browser_navigate, browser_click, browser_type, browser_vision, browser_screenshot, browser_scroll

4. Create a session at startup and pass the session_id to all tool calls

5. Use Vision AI to let the agent "see" the page before deciding what to do

Hermes Integration

Hermes supports MCP (Model Context Protocol) servers. Add AgentBrowser to your Hermes config:

# hermes.config.yaml
mcp_servers:
  agentbrowser:
    type: "rest"
    base_url: "http://localhost:3000/api/browser"
    tools:
      - name: "create_session"
        path: "/sessions"
        method: "POST"
      - name: "execute_action"
        path: "/sessions/{session_id}/action"
        method: "POST"
      - name: "get_vision"
        path: "/sessions/{session_id}/vision"
        method: "POST"
      - name: "close_session"
        path: "/sessions/{session_id}"
        method: "DELETE"

OpenAI Function Calling

Define AgentBrowser as an OpenAI tool:

tools = [{
    "type": "function",
    "function": {
        "name": "browser_navigate",
        "description": "Navigate browser to a URL",
        "parameters": {
            "type": "object",
            "properties": {
                "url": {"type": "string"}
            },
            "required": ["url"]
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "browser_vision",
        "description": "Get AI vision snapshot - screenshot, DOM, interactive elements",
        "parameters": {
            "type": "object",
            "properties": {
                "full_page": {"type": "boolean", "default": False}
            }
        }
    }
}, {
    "type": "function",
    "function": {
        "name": "browser_click",
        "description": "Click an element on the page using a CSS selector",
        "parameters": {
            "type": "object",
            "properties": {
                "selector": {"type": "string"}
            },
            "required": ["selector"]
        }
    }
}]

Python Agent Example

import requests

BASE = "http://localhost:3000/api/browser/sessions"

# 1. Create session
s = requests.post(BASE, json={
    "name": "python-agent",
    "browserType": "chromium",
    "headless": True
}).json()
sid = s["id"]

# 2. Navigate
requests.post(f"{BASE}/{sid}/action", json={
    "action": "navigate",
    "target": "https://news.ycombinator.com"
})

# 3. Get vision (for LLM)
vision = requests.post(f"{BASE}/{sid}/vision", json={}).json()

# 4. Pass interactive elements to your LLM
for el in vision["interactiveElements"]:
    print(f"{el['type']}: {el['text']} → {el['selector']}")

# 5. Clean up
requests.delete(f"{BASE}/{sid}")

REST API

All endpoints return JSON. Base URL: http://localhost:3000

Sessions

Method	Endpoint	Description
`POST`	`/api/browser/sessions`	Create a new browser session
`GET`	`/api/browser/sessions`	List all sessions
`GET`	`/api/browser/sessions/:id`	Get session details
`DELETE`	`/api/browser/sessions/:id`	Close and delete a session
`POST`	`/api/browser/sessions/:id/action`	Execute a browser action
`POST`	`/api/browser/sessions/:id/vision`	Get AI vision snapshot
`GET/POST`	`/api/browser/sessions/:id/cookies`	Get or set cookies
`GET`	`/api/browser/sessions/:id/logs`	Get paginated action logs

Create a Session

curl -X POST http://localhost:3000/api/browser/sessions \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my-agent-session",
    "browserType": "chromium",
    "headless": true
  }'

Request body:

Field	Type	Default	Description
`name`	`string`	Auto-generated	Session name
`browserType`	`"chromium" \| "firefox" \| "webkit"`	`"chromium"`	Browser engine
`headless`	`boolean`	`true`	Run without UI
`proxy`	`object`	`undefined`	Proxy config (`server`, `username?`, `password?`)
`viewport`	`object`	`{width: 1280, height: 720}`	Viewport size
`userAgent`	`string`	Browser default	Custom user agent
`locale`	`string`	`"en-US"`	Browser locale
`timezone`	`string`	`"America/New_York"`	Browser timezone

Execute an Action

# Navigate
curl -X POST http://localhost:3000/api/browser/sessions/{id}/action \
  -H "Content-Type: application/json" \
  -d '{"action":"navigate","target":"https://example.com"}'

# Click
curl -X POST http://localhost:3000/api/browser/sessions/{id}/action \
  -H "Content-Type: application/json" \
  -d '{"action":"click","target":"button#submit"}'

# Type
curl -X POST http://localhost:3000/api/browser/sessions/{id}/action \
  -H "Content-Type: application/json" \
  -d '{"action":"type","target":"input#search","value":"hello","options":{"pressEnter":true}}'

# Screenshot
curl -X POST http://localhost:3000/api/browser/sessions/{id}/action \
  -H "Content-Type: application/json" \
  -d '{"action":"screenshot","options":{"fullPage":true}}'

Get Vision Snapshot

curl -X POST http://localhost:3000/api/browser/sessions/{id}/vision \
  -H "Content-Type: application/json" \
  -d '{}'

Returns: screenshot (base64), simplified DOM, accessibility tree, interactive elements with selectors, and page metadata.

Supported Actions

Category	Action	`target`	`value`	`options`
Navigation	`navigate`	URL	—	`waitUntil`, `timeout`
	`goBack`	—	—	—
	`goForward`	—	—	—
	`reload`	—	—	`waitUntil`
Mouse	`click`	CSS selector	—	`button`, `clickCount`, `delay`
	`dblclick`	CSS selector	—	`delay`
	`hover`	CSS selector	—	—
	`rightClick`	CSS selector	—	`delay`
Keyboard	`type`	CSS selector	Text to type	`clear`, `pressEnter`, `delay`
	`press`	—	Key name	—
	`select`	CSS selector	Option value	—
Scroll	`scroll`	Direction	Pixel amount	`element`
Wait	`wait`	—	Milliseconds	—
	`waitForSelector`	CSS selector	—	`state`, `timeout`
	`waitForNavigation`	—	—	`waitUntil`, `timeout`
Capture	`screenshot`	—	—	`fullPage`, `element`, `quality`, `type`
JS	`evaluate`	—	JS expression	—
Cookies	`getCookies`	—	—	—
	`setCookies`	—	JSON cookies array	—
	`clearCookies`	—	—	—
Storage	`getLocalStorage`	—	—	—
	`setLocalStorage`	—	JSON key-value pairs	—
	`clearLocalStorage`	—	—	—
Info	`getUrl`	—	—	—
	`getTitle`	—	—	—
	`getContent`	—	—	—

WebSocket Events

AgentBrowser emits real-time events via Socket.IO for live dashboards and reactive agent loops.

const socket = io('http://localhost:3000');

socket.on('action', (event) => {
  console.log(`[${event.sessionId}] ${event.action} → ${event.result.success ? 'OK' : 'FAIL'}`);
});

socket.on('session_update', (event) => {
  console.log(`Session ${event.sessionId}: ${event.data.currentUrl}`);
});

socket.on('screenshot', (event) => {
  const img = Buffer.from(event.screenshot, 'base64');
});

Event types: session_created, session_update, session_closed, action, screenshot

Configuration

Environment Variables

Create a .env file in the project root:

# Database (SQLite)
DATABASE_URL="file:./custom.db"

# Server
PORT=3000
NODE_ENV="development"

# Browser defaults
DEFAULT_BROWSER_TYPE="chromium"
DEFAULT_HEADLESS=true
DEFAULT_VIEWPORT_WIDTH=1280
DEFAULT_VIEWPORT_HEIGHT=720

# Session limits
SESSION_TIMEOUT_MS=3600000
MAX_CONCURRENT_SESSIONS=10

Architecture

┌─────────────────────────────────────────────────────┐
│                    Clients                          │
│         LLMs · CLIs · Web Dashboard                │
└──────────────┬──────────────────┬──────────────────┘
               │  REST API        │  WebSocket
┌──────────────▼──────────────────▼──────────────────┐
│              Next.js API Routes                     │
│    /api/browser/sessions/*                         │
└──────────────┬─────────────────────────────────────┘
               │
┌──────────────▼─────────────────────────────────────┐
│           Browser Engine Layer                      │
│    Session Manager · Action Executor · Vision       │
└──────────────┬─────────────────────────────────────┘
               │
┌──────────────▼─────────────────────────────────────┐
│              Playwright                             │
│    Chromium · Firefox · WebKit                     │
└────────────────────────────────────────────────────┘
               │
┌──────────────▼─────────────────────────────────────┐
│          SQLite (via Prisma)                        │
│    Sessions · Action Logs                           │
└────────────────────────────────────────────────────┘

Module	Path	Description
API Routes	`src/app/api/browser/sessions/`	REST endpoints for session and action management
Browser Engine	`src/lib/browser/engine.ts`	Singleton session lifecycle manager
Action Executor	`src/lib/browser/actions.ts`	25+ browser actions with logging
Vision System	`src/lib/browser/vision.ts`	Screenshots, DOM simplification, a11y tree
Types	`src/lib/browser/types.ts`	TypeScript interfaces for all API types
Database	`prisma/schema.prisma`	Session and action log persistence
Dashboard	`src/app/page.tsx`	Web UI with session sidebar and live updates

Tech Stack

Technology	Purpose
Next.js 16	Full-stack React framework, API routes
TypeScript	End-to-end type safety
Tailwind CSS 4	Utility-first styling
shadcn/ui	UI component library
Playwright	Browser automation engine
Prisma	Type-safe database ORM (SQLite)
Socket.IO	Real-time WebSocket communication
Zustand	Client-side state management

Project Structure

agent-browser/
├── assets/                         # Logo, screenshots, OG image
├── docs/                           # GitHub Pages documentation
│   └── index.html                  # Full documentation site
├── src/
│   ├── app/
│   │   ├── api/browser/sessions/
│   │   │   ├── route.ts            # POST (create), GET (list)
│   │   │   └── [id]/
│   │   │       ├── route.ts        # GET (detail), DELETE (close)
│   │   │       ├── action/route.ts # POST (execute action)
│   │   │       ├── vision/route.ts # POST (vision snapshot)
│   │   │       ├── cookies/route.ts# GET, POST (cookies)
│   │   │       └── logs/route.ts   # GET (action logs)
│   │   ├── layout.tsx
│   │   ├── page.tsx                # Web dashboard
│   │   └── globals.css
│   ├── components/ui/              # shadcn/ui components
│   ├── lib/
│   │   ├── browser/
│   │   │   ├── engine.ts           # Session lifecycle manager
│   │   │   ├── actions.ts          # 25+ action executor
│   │   │   ├── vision.ts           # Vision AI system
│   │   │   ├── session.ts          # Session helpers
│   │   │   ├── types.ts            # TypeScript interfaces
│   │   │   └── utils.ts            # Utility functions
│   │   ├── db.ts                   # Prisma client
│   │   └── utils.ts                # General utilities
│   └── hooks/                      # React hooks
├── prisma/
│   └── schema.prisma               # Database schema
├── public/                         # Static assets
└── package.json

Comparison with Alternatives

AgentBrowser vs other AI browser automation tools:

Feature	AgentBrowser	Browser Use	Stagehand	Playwright MCP
Self-hosted	Yes	Yes	No (cloud)	Yes
REST API	8 endpoints	No	SDK only	MCP only
Multi-browser	Chromium, Firefox, WebKit	Chromium only	Chromium only	All 3
Vision AI	Screenshots, DOM, a11y tree, elements	Screenshot only	DOM only	None
Session persistence	Cookies, localStorage, DB	Memory only	No	No
Web Dashboard	Yes (dark theme)	No	No	No
WebSocket events	Yes	No	No	No
25+ browser actions	Yes	Limited	Limited	Basic
Agent agnostic	Any LLM/CLI	Python only	JS/Python	Any MCP client
License	MIT	MIT	Commercial	MIT
Database	SQLite + Prisma	None	Cloud	None

Notable alternatives not in the table:

LaVague — RAG-based, research-focused approach with no REST API.
Skyvern — Commercial cloud platform with no self-hosting option, workflow-only.
WebVoyager — Academic/research project, not production-ready.

AgentBrowser is the only open-source solution that combines a full REST API, Vision AI system, session persistence, AND a web dashboard — making it the most complete platform for AI browser automation.

Contributing

Contributions are welcome! Please read our Contributing Guidelines before submitting a pull request.

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

This project is licensed under the MIT License. See the LICENSE file for details. 2026

Built with Next.js, Playwright, and TypeScript

Made for AI agents, by developers who build with AI agents.

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
assets		assets
docs		docs
mini-services		mini-services
prisma		prisma
public		public
src		src
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md
bun.lock		bun.lock
components.json		components.json
eslint.config.mjs		eslint.config.mjs
next.config.ts		next.config.ts
package.json		package.json
postcss.config.mjs		postcss.config.mjs
tailwind.config.ts		tailwind.config.ts
tsconfig.json		tsconfig.json

Folders and files

Latest commit

History

Repository files navigation

AgentBrowser

What is AgentBrowser?

Features

Browser Engine

25+ Browser Actions

Vision AI

Developer Experience

Quick Start

Prerequisites

Install

Run

One-Line Setup

AI Agent Integration Guide

How It Works

OpenClaw Integration

Hermes Integration

OpenAI Function Calling

Python Agent Example

REST API

Sessions

Create a Session

Execute an Action

Get Vision Snapshot

Supported Actions

WebSocket Events

Configuration

Environment Variables

Architecture

Tech Stack

Project Structure

Comparison with Alternatives

Contributing

License

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages