Agent Browser: The Complete Guide to Vercel's Browser Automation CLI for AI Agents
Agent Browser is a browser automation CLI for AI agents, built in Rust by Vercel Labs. It offers ref-based element targeting, semantic locators, an agent mode with JSON output, persistent sessions, a streaming preview, a CDP mode, and an experimental pure-Rust native daemon. The project has 20,200+ stars and is licensed under Apache-2.0.
What Is Agent Browser?
A CLI tool that lets AI agents control browsers through simple commands. Instead of complex Playwright/Selenium scripts, agents issue one-line commands like agent-browser click @e2 using ref IDs from accessibility snapshots. Built with a Rust CLI frontend and daemon architecture for speed.
- Language: Rust
- License: Apache-2.0
- Stars: 20,200+ ⭐
- Forks: 1,178
- Releases: 37
- Author: Vercel Labs
- Homepage: agent-browser.dev
Architecture
┌──────────────┐     ┌─────────────────────┐     ┌──────────┐
│ Rust CLI     │────▶│ Daemon              │────▶│ Browser  │
│ (fast bin)   │     │ Node.js (default)   │     │ Chromium │
│              │     │ or Native (Rust)    │     │ Firefox  │
│              │     │                     │     │ WebKit   │
└──────────────┘     └─────────────────────┘     └──────────┘
- Rust CLI — Fast native binary, parses commands
- Node.js Daemon (default) — Manages Playwright browser, supports Chromium/Firefox/WebKit
- Native Daemon (experimental) — Pure Rust via CDP, no Node.js required
- Auto-start — Daemon starts on first command, persists for fast operations
Core Features
1. Ref-Based Element Targeting
agent-browser snapshot # Get accessibility tree with refs
agent-browser click @e2 # Click by ref
agent-browser fill @e3 "hello" # Fill by ref
agent-browser get text @e1 # Get text by ref
The snapshot command returns an accessibility tree where every element gets a ref ID (@e1, @e2, etc.). Agents use these refs to interact — no CSS selectors needed.
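For agents that drive the CLI from a script rather than a shell, the ref workflow above can be wrapped in a few lines. The following is a minimal Python sketch, not part of Agent Browser itself: `build_command` and `run` are hypothetical helper names, the ref pattern is inferred from the `@e1`, `@e2` examples, and it assumes the `agent-browser` binary is on `PATH`. It uses only the subcommands shown above.

```python
import re
import subprocess

# Refs look like @e1, @e2, ... in the snapshot output described above.
# The exact pattern is an assumption based on those examples.
REF_PATTERN = re.compile(r"^@e\d+$")


def build_command(action, ref=None, *values):
    """Build the argv list for one agent-browser call (hypothetical helper).

    `action` may contain spaces ("get text"); it is split into separate
    arguments, mirroring how the command is typed in a shell.
    """
    argv = ["agent-browser", *action.split()]
    if ref is not None:
        if not REF_PATTERN.match(ref):
            raise ValueError(f"not a snapshot ref: {ref!r}")
        argv.append(ref)
    argv.extend(values)
    return argv


def run(argv):
    """Run a command and return its stdout; requires the CLI to be installed."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch works
    # without a browser. Same sequence as the shell example above.
    for argv in (
        build_command("snapshot"),
        build_command("click", "@e2"),
        build_command("fill", "@e3", "hello"),
        build_command("get text", "@e1"),
    ):
        print(" ".join(argv))
```

Passing an argv list to `subprocess.run` avoids shell quoting problems when a fill value contains spaces or quotes. Rejecting anything that does not look like a ref catches the common mistake of passing a CSS selector where a ref is expected.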
2. Semantic Locators
Find elements by meaning, not structure:
- Role-based locators
- Text and XPath
- CSS selectors (also supported)
3. Agent Mode (--json)
Machine-readable JSON output for programmatic agent consumption:
{"success":true,"data":{"snapshot":"...","refs":{"e1":{"role":"heading","name":"Title"}}}}
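As an illustration of how an agent might consume this output, the sketch below parses the sample response above and lists the refs that have a given role. It assumes that the keys under `refs` (such as `e1`) correspond to the `@e1` form used on the command line; `refs_by_role` is a hypothetical helper name, and real responses may carry more fields than this sample.

```python
import json

# Sample response copied from the example above.
SAMPLE = (
    '{"success":true,"data":{"snapshot":"...",'
    '"refs":{"e1":{"role":"heading","name":"Title"}}}}'
)


def refs_by_role(raw, role):
    """Return '@'-prefixed refs whose role matches; raise if the call failed."""
    payload = json.loads(raw)
    if not payload.get("success"):
        raise RuntimeError("agent-browser reported failure")
    refs = payload.get("data", {}).get("refs", {})
    return [
        f"@{ref_id}"
        for ref_id, info in refs.items()
        if info.get("role") == role
    ]


print(refs_by_role(SAMPLE, "heading"))  # ['@e1']
```

Checking `success` before touching `data` keeps a failed command from surfacing later as a confusing `KeyError`.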
4. Core Commands
| Category | Commands |
|---|---|
| Navigation | open, close, navigate |
| Interaction | click, fill, select, hover, scroll, drag |
| Get Info | get text, get value, get attribute, get html |
| Check State | is visible, is enabled, is checked |
| Screenshots | screenshot, annotated screenshots |
| Find | Semantic locators, CSS, XPath, text |
| Wait | Wait for element, navigation, network |
| Mouse | Move, click, double-click, drag |
| Tabs & Windows | New tab, switch, close |
| Cookies & Storage | Get/set cookies, localStorage, sessionStorage |
| Network | Intercept, mock, monitor |
| Frames | Switch frame context |
| Dialogs | Handle alerts, confirms, prompts |
| Diff | Compare page states |
5. Sessions & Persistence
- Session management — Named sessions across commands
- Persistent profiles — Reuse browser profiles with cookies/state
- State encryption — Encrypt persisted session state
6. Streaming (Browser Preview)
WebSocket-based live browser preview. Enable streaming for real-time visual feedback during automation.
7. CDP Mode
Direct Chrome DevTools Protocol connection. Auto-connect to running Chrome instances.
8. Experimental Native Mode
Pure Rust daemon — direct CDP communication, no Node.js or Playwright dependency. Opt-in with --native.
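A script that wants to switch between the default and native daemons can treat `--native` as a toggle when assembling the command line. The sketch below is illustrative only: `with_global_flags` is a hypothetical helper, and placing the flags directly after the binary name is an assumption, since the text above states only that `--native` and `--json` exist.

```python
def with_global_flags(command, native=False, json_output=False):
    """Prefix a subcommand with the opt-in flags mentioned in this guide.

    `command` is a list such as ["click", "@e2"]. Flag placement before
    the subcommand is an assumption, not documented behavior.
    """
    argv = ["agent-browser"]
    if native:
        argv.append("--native")  # experimental pure-Rust daemon
    if json_output:
        argv.append("--json")    # machine-readable agent mode
    argv.extend(command)
    return argv


print(with_global_flags(["click", "@e2"], native=True))
# ['agent-browser', '--native', 'click', '@e2']
```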
9. Serverless Support
Works on Vercel and AWS Lambda with custom browser executables.
10. Integrations
| Integration | Purpose |
|---|---|
| iOS Simulator | Mobile browser automation |
| Browserbase | Cloud browser infrastructure |
| Browser Use | Agent browser framework |
| Kernel | Browser runtime |
11. AI Agent Workflow
Designed for Claude Code, Codex, and other AI coding assistants. Includes AGENTS.md / CLAUDE.md for agent-specific instructions.
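The workflow such an assistant follows is roughly: take a snapshot, choose a ref, act on it. The sketch below shows one step of that loop under stated assumptions: `click_first` and `FakeRunner` are hypothetical names, the response shape follows the JSON sample in the Agent Mode section, and appending `--json` after `snapshot` is an assumed flag position. The runner is injected so the logic can be exercised without a browser.

```python
import json


def click_first(role, run):
    """Snapshot the page, then click the first element with the given role.

    `run` is any callable that takes an argv list and returns stdout text.
    Returns the ref that was clicked, or None if no element matched.
    """
    raw = run(["agent-browser", "snapshot", "--json"])
    refs = json.loads(raw)["data"]["refs"]
    for ref_id, info in refs.items():
        if info.get("role") == role:
            ref = f"@{ref_id}"
            run(["agent-browser", "click", ref])
            return ref
    return None


class FakeRunner:
    """Stand-in for the real CLI: records calls, returns a canned snapshot."""

    def __init__(self, snapshot_json):
        self.snapshot_json = snapshot_json
        self.calls = []

    def __call__(self, argv):
        self.calls.append(argv)
        return self.snapshot_json if "snapshot" in argv else ""


# Canned snapshot in the shape of the JSON sample; the button is invented
# purely for this demonstration.
CANNED = json.dumps({
    "success": True,
    "data": {
        "snapshot": "...",
        "refs": {
            "e1": {"role": "heading", "name": "Title"},
            "e2": {"role": "button", "name": "Submit"},
        },
    },
})

runner = FakeRunner(CANNED)
print(click_first("button", runner))  # @e2
```

In real use, `run` would be replaced by a function that calls the binary through `subprocess.run`; keeping it as a parameter lets the same loop be unit-tested offline.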
Agent Browser vs Alternatives
Category: This is a browser automation CLI for AI agents.
| Feature | Agent Browser | Playwright | Browser Use | Puppeteer |
|---|---|---|---|---|
| Focus | CLI for AI agents | Test framework | Agent browser framework | Node.js browser API |
| Stars | 20.2K ⭐ | ~70K ⭐ | ~60K ⭐ | ~90K ⭐ |
| CLI-First | ✅ | ❌ Library | ❌ Library | ❌ Library |
| Ref-Based Targeting | ✅ @e1, @e2 | ❌ | ✅ | ❌ |
| Semantic Locators | ✅ | ✅ | ✅ | ❌ |
| JSON Agent Mode | ✅ --json | ❌ | ✅ | ❌ |
| Rust CLI | ✅ Fast native binary | ❌ Node.js | ❌ Python | ❌ Node.js |
| Native CDP Mode | ✅ No Node.js | ❌ | ❌ | ✅ CDP only |
| Session Encryption | ✅ | ❌ | ❌ | ❌ |
| Streaming Preview | ✅ WebSocket | ❌ | ❌ | ❌ |
| Serverless | ✅ Vercel/Lambda | ✅ | ❌ | ✅ |
| iOS Simulator | ✅ | ✅ | ❌ | ❌ |
| Multi-Browser | ✅ Chromium/Firefox/WebKit | ✅ | Chromium | Chromium |
| Author | Vercel Labs | Microsoft | Community | Google |
When to choose Agent Browser: You want a fast CLI that AI agents call directly — snapshot, click, fill via refs with minimal overhead.
When to choose Playwright: You need a full test automation framework with assertions, fixtures, and CI integration.
When to choose Browser Use: You want a Python-based agent browser framework with built-in LLM integration.
When to choose Puppeteer: You need a Node.js library for programmatic browser control with CDP.
Conclusion
Agent Browser is a browser automation tool designed specifically for AI agents. Its Rust CLI and daemon architecture, ref-based element targeting from accessibility snapshots, JSON agent mode, session encryption, streaming preview, and serverless support give agents fast browser control through simple one-line commands. It is developed by Vercel Labs, has 20.2K stars, and includes an experimental native Rust mode that removes the Node.js dependency entirely.
Explore Agent Browser on GitHub
