Agent Browser: The Complete Guide to Vercel's Browser Automation CLI for AI Agents

Agent Browser is a browser automation CLI for AI agents from Vercel Labs, with a command-line frontend written in Rust. It offers ref-based element targeting, semantic locators, an agent mode with JSON output, persistent sessions, a streaming preview, a CDP mode, and an experimental pure-Rust native daemon. The project has 20,200+ GitHub stars and is licensed under Apache-2.0.

What Is Agent Browser?

Agent Browser is a CLI tool that lets AI agents control browsers through simple commands. Instead of writing complex Playwright or Selenium scripts, agents issue one-line commands such as agent-browser click @e2, using ref IDs taken from accessibility snapshots. A Rust CLI frontend talks to a long-running daemon, which keeps individual commands fast.

Language: Rust
License: Apache-2.0
Stars: 20,200+ ⭐
Forks: 1,178
Releases: 37
Author: Vercel Labs
Homepage: agent-browser.dev

Architecture

┌──────────────┐     ┌─────────────────────┐     ┌─────────┐
│  Rust CLI    │────▶│  Daemon             │────▶│ Browser │
│  (fast bin)  │     │  Node.js (default)  │     │Chromium │
│              │     │  or Native (Rust)   │     │Firefox  │
│              │     │                     │     │WebKit   │
└──────────────┘     └─────────────────────┘     └─────────┘

Rust CLI — Fast native binary, parses commands
Node.js Daemon (default) — Manages Playwright browser, supports Chromium/Firefox/WebKit
Native Daemon (experimental) — Pure Rust via CDP, no Node.js required
Auto-start — Daemon starts on first command, persists for fast operations

Core Features

1. Ref-Based Element Targeting

agent-browser snapshot          # Get accessibility tree with refs
agent-browser click @e2         # Click by ref
agent-browser fill @e3 "hello"  # Fill by ref
agent-browser get text @e1      # Get text by ref

The snapshot command returns an accessibility tree where every element gets a ref ID (@e1, @e2, etc.). Agents use these refs to interact — no CSS selectors needed.
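
The snapshot-then-act loop above can also be driven from a script. The following is a minimal Python sketch that only assembles argument lists for the commands shown here (click, fill, get text). The helper names ref_command and REF_PATTERN are hypothetical, and the assumption that refs always look like @e followed by digits is inferred from the examples rather than taken from the tool's documentation.

```python
import re
import shlex

# Assumption: refs look like "@e1", "@e2", ... as in the examples above.
REF_PATTERN = re.compile(r"^@e\d+$")


def ref_command(action, ref, *args, binary="agent-browser"):
    """Build the argv list for a ref-based command such as `click @e2`.

    `action` may contain a space (for example "get text"); it is split
    into separate arguments.
    """
    if not REF_PATTERN.match(ref):
        raise ValueError(f"not a snapshot ref: {ref!r}")
    return [binary, *action.split(), ref, *args]


if __name__ == "__main__":
    plan = [
        ref_command("click", "@e2"),
        ref_command("fill", "@e3", "hello"),
        ref_command("get text", "@e1"),
    ]
    for argv in plan:
        # shlex.join quotes arguments so each line can be pasted into a shell.
        print(shlex.join(argv))
```

Once the CLI is installed, each list could be passed to subprocess.run as-is. Rejecting anything that is not a ref keeps a stray CSS selector from being sent where a ref is expected.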

2. Semantic Locators

Find elements by meaning rather than page structure:

Role-based locators
Text locators
XPath and CSS selectors (structural, also supported)

3. Agent Mode (--json)

Machine-readable JSON output for programmatic agent consumption:

{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"}}}}
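
An agent usually needs to turn this output into a ref it can act on. The following is a minimal Python sketch, assuming the envelope always carries the success, data, and refs keys shown in the sample above. The function name find_ref and its error handling are illustrative and not part of Agent Browser.

```python
import json


def find_ref(raw_output, role, name):
    """Return the "@eN" ref whose role and name match, or None.

    Assumes the envelope shape from the sample above:
    {"success": bool, "data": {"refs": {"e1": {"role": ..., "name": ...}}}}
    """
    payload = json.loads(raw_output)
    if not payload.get("success"):
        raise RuntimeError("command reported failure")
    refs = payload.get("data", {}).get("refs", {})
    for ref_id, info in refs.items():
        if info.get("role") == role and info.get("name") == name:
            # Ref keys are stored without the leading "@" in the sample.
            return f"@{ref_id}"
    return None


if __name__ == "__main__":
    sample = (
        '{"success":true,"data":{"snapshot":"...",'
        '"refs":{"e1":{"role":"heading","name":"Title"}}}}'
    )
    print(find_ref(sample, "heading", "Title"))  # prints @e1
```

Matching on role and name rather than on position means the lookup still works when a later snapshot assigns different ref numbers.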

4. Core Commands

Category	Commands
Navigation	open, close, navigate
Interaction	click, fill, select, hover, scroll, drag
Get Info	get text, get value, get attribute, get html
Check State	is visible, is enabled, is checked
Screenshots	screenshot, annotated screenshots
Find	Semantic locators, CSS, XPath, text
Wait	Wait for element, navigation, network
Mouse	Move, click, double-click, drag
Tabs & Windows	New tab, switch, close
Cookies & Storage	Get/set cookies, localStorage, sessionStorage
Network	Intercept, mock, monitor
Frames	Switch frame context
Dialogs	Handle alerts, confirms, prompts
Diff	Compare page states

5. Sessions & Persistence

Session management — Named sessions across commands
Persistent profiles — Reuse browser profiles with cookies/state
State encryption — Encrypt persisted session state

6. Streaming (Browser Preview)

Agent Browser can stream a live browser preview over WebSocket. Enabling streaming gives real-time visual feedback while the automation runs.

7. CDP Mode

CDP mode connects directly over the Chrome DevTools Protocol (CDP) and can auto-connect to Chrome instances that are already running.

8. Experimental Native Mode

Pure Rust daemon — direct CDP communication, no Node.js or Playwright dependency. Opt-in with --native.

9. Serverless Support

Works on Vercel and AWS Lambda with custom browser executables.

10. Integrations

Integration	Purpose
iOS Simulator	Mobile browser automation
Browserbase	Cloud browser infrastructure
Browser Use	Agent browser framework
Kernel	Browser runtime

11. AI Agent Workflow

Agent Browser is designed for Claude Code, Codex, and other AI coding assistants. It includes AGENTS.md / CLAUDE.md files with agent-specific instructions.

Agent Browser vs Alternatives

Category: Agent Browser is a browser automation CLI for AI agents; the alternatives compared below are libraries or frameworks.

Feature	Agent Browser	Playwright	Browser Use	Puppeteer
Focus	CLI for AI agents	Test framework	Agent browser framework	Node.js browser API
Stars	20.2K ⭐	~70K ⭐	~60K ⭐	~90K ⭐
CLI-First	✅	❌ Library	❌ Library	❌ Library
Ref-Based Targeting	✅ @e1, @e2	❌	✅	❌
Semantic Locators	✅	✅	✅	❌
JSON Agent Mode	✅ --json	❌	✅	❌
Rust CLI	✅ Fast native binary	❌ Node.js	❌ Python	❌ Node.js
Native CDP Mode	✅ No Node.js	❌	❌	✅ CDP only
Session Encryption	✅	❌	❌	❌
Streaming Preview	✅ WebSocket	❌	❌	❌
Serverless	✅ Vercel/Lambda	✅	❌	✅
iOS Simulator	✅	✅	❌	❌
Multi-Browser	✅ Chromium/Firefox/WebKit	✅	Chromium	Chrome/Firefox
Author	Vercel Labs	Microsoft	Community	Google

When to choose Agent Browser: You want a fast CLI that AI agents call directly — snapshot, click, fill via refs with minimal overhead.

When to choose Playwright: You need a full test automation framework with assertions, fixtures, and CI integration.

When to choose Browser Use: You want a Python-based agent browser framework with built-in LLM integration.

When to choose Puppeteer: You need a Node.js library for programmatic browser control with CDP.

Conclusion

Agent Browser is a browser automation tool designed specifically for AI agents. Its Rust CLI and daemon architecture, ref-based element targeting from accessibility snapshots, JSON agent mode, session encryption, streaming preview, and serverless support give agents fast browser control through simple commands. It is built by Vercel Labs, has 20.2K stars, and offers an experimental native Rust mode that removes the Node.js dependency entirely.
