Page Agent: The Complete Guide to Alibaba's In-Page JavaScript GUI Agent
Page Agent is a JavaScript in-page GUI agent that lets you control web interfaces with natural language — no browser extension, no Python, no headless browser needed. Just in-page JavaScript. Text-based DOM manipulation without screenshots or OCR. By Alibaba (official). 2,500+ stars, TypeScript, MIT.
What Is Page Agent?
Most browser automation tools require external processes — Puppeteer needs Node.js, Playwright needs a driver, Selenium needs a WebDriver. Page Agent takes a radically different approach: everything happens inside your web page. Drop in a <script> tag, and your page becomes controllable through natural language.
- Stars: 2,500+ ⭐
- Forks: 198
- Releases: 18
- Language: TypeScript
- License: MIT
- Author: Alibaba (Official)
- Website: alibaba.github.io/page-agent
Quick Start
One-Line Integration
<script src="https://cdn.jsdelivr.net/npm/page-agent@1.5.4/dist/iife/page-agent.demo.js" crossorigin="true"></script>
NPM Installation
npm install page-agent
import { PageAgent } from 'page-agent'
const agent = new PageAgent({
  model: 'qwen3.5-plus',
  baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  apiKey: 'YOUR_API_KEY',
  language: 'en-US',
})
await agent.execute('Click the login button')
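Real workflows usually need more than one instruction. The sketch below shows one way to run several instructions in order and stop at the first failure. It relies only on the `execute` call shown above; `AgentLike`, `StepResult`, `runSteps`, and `fakeAgent` are hypothetical names introduced for illustration, and it assumes `execute` rejects its promise when a step fails, which the library may or may not do.

```typescript
// Minimal shape of the agent used here: only the `execute` call from the Quick Start.
interface AgentLike {
  execute(task: string): Promise<unknown>
}

interface StepResult {
  task: string
  ok: boolean
  error?: string
}

// Hypothetical helper (not part of page-agent): runs instructions one at a time
// and stops at the first failure so later steps never act on an unexpected page state.
// Assumption: a failed step surfaces as a rejected promise.
async function runSteps(agent: AgentLike, tasks: string[]): Promise<StepResult[]> {
  const results: StepResult[] = []
  for (const task of tasks) {
    try {
      await agent.execute(task)
      results.push({ task, ok: true })
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err)
      results.push({ task, ok: false, error: message })
      break
    }
  }
  return results
}

// Stand-in agent for trying the helper without a page or an API key.
const fakeAgent: AgentLike = {
  async execute(task: string): Promise<void> {
    if (task.includes('missing')) throw new Error(`could not complete: ${task}`)
  },
}
```

In a real page, the `agent` created in the Quick Start would take the place of `fakeAgent`, for example `runSteps(agent, ['Open the settings page', 'Enable dark mode'])`.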
Key Features
🎯 Truly In-Page
No browser extension. No Python. No headless browser. Just JavaScript — everything happens inside your web page. The simplest integration path for adding AI control to any web app.
📖 Text-Based DOM Manipulation
No screenshots, no OCR, no multimodal LLM, and no special permissions. Page Agent reads the DOM directly as text, so it skips the image-capture and recognition steps that vision-based approaches depend on and can run on a text-only model.
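To make the idea concrete, here is a minimal sketch of how a page could be turned into text for a model: walk the element tree and emit one indexed line per interactive element. This is an illustration only, not Page Agent's actual serialization; `NodeLike`, `INTERACTIVE_TAGS`, and `describeInteractive` are hypothetical names, and a plain object stands in for the DOM so the example runs outside a browser.

```typescript
// Simplified stand-in for a DOM element (assumption: real code would read from `document`).
interface NodeLike {
  tag: string
  text?: string
  attrs?: Record<string, string>
  children?: NodeLike[]
}

// Tags treated as interactive in this sketch; a real agent would likely use richer rules.
const INTERACTIVE_TAGS = new Set(['a', 'button', 'input', 'select', 'textarea'])

// Walks the tree depth-first and returns one indexed line per interactive element.
function describeInteractive(root: NodeLike): string[] {
  const lines: string[] = []
  const visit = (node: NodeLike): void => {
    if (INTERACTIVE_TAGS.has(node.tag)) {
      const attrs = node.attrs ?? {}
      const attrText = Object.keys(attrs)
        .map((key) => ` ${key}="${attrs[key]}"`)
        .join('')
      lines.push(`[${lines.length}] <${node.tag}${attrText}>${node.text ?? ''}`)
    }
    for (const child of node.children ?? []) visit(child)
  }
  visit(root)
  return lines
}
```

The index at the start of each line gives a model a short handle to refer to, such as "click element 1", without the page ever being rendered to an image.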
🧠 Bring Your Own LLM
Works with any LLM provider that exposes a compatible API. The default setup targets Qwen (Alibaba's model) via DashScope, but you can plug in OpenAI, Anthropic, or another compatible endpoint by changing the model, base URL, and API key.
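Switching providers then comes down to swapping three values. The sketch below keeps per-provider presets and builds a config with the same fields as the Quick Start. `LlmConfig`, `Provider`, `PRESETS`, and `configFor` are hypothetical names, not part of the library; the DashScope values come from the Quick Start, while the OpenAI model name is only an example and should be replaced with the model you actually use.

```typescript
// Same fields as the Quick Start config; the interface name is ours, not the library's.
interface LlmConfig {
  model: string
  baseURL: string
  apiKey: string
  language: string
}

type Provider = 'dashscope' | 'openai'

// Endpoint and model per provider.
// Assumption: the OpenAI entry is an illustrative choice, not a documented default.
const PRESETS: Record<Provider, Pick<LlmConfig, 'model' | 'baseURL'>> = {
  dashscope: {
    model: 'qwen3.5-plus',
    baseURL: 'https://dashscope.aliyuncs.com/compatible-mode/v1',
  },
  openai: {
    model: 'gpt-4o-mini',
    baseURL: 'https://api.openai.com/v1',
  },
}

// Builds a complete config and fails early when the key is missing.
function configFor(provider: Provider, apiKey: string, language = 'en-US'): LlmConfig {
  if (apiKey.trim() === '') throw new Error(`Missing API key for ${provider}`)
  return { ...PRESETS[provider], apiKey, language }
}
```

The returned object could then be passed to `new PageAgent(...)` in place of the inline literal used in the Quick Start.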
🎨 Pretty UI with Human-in-the-Loop
Visual feedback showing what the agent is doing. Users can approve, modify, or reject actions before execution.
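The approve, modify, or reject flow can be pictured as a small gate between what the agent proposes and what actually runs. The following sketch models that gate in isolation. It is not Page Agent's UI API; `ProposedAction`, `Decision`, and `resolveAction` are hypothetical types and names chosen for the example.

```typescript
// A step the agent wants to take (hypothetical shape for illustration).
interface ProposedAction {
  kind: 'click' | 'type'
  target: string
  value?: string
}

// The three outcomes a user can choose before anything is executed.
type Decision =
  | { type: 'approve' }
  | { type: 'modify'; action: ProposedAction }
  | { type: 'reject'; reason: string }

// Returns the action that should run, or null when the user rejected it.
function resolveAction(proposed: ProposedAction, decision: Decision): ProposedAction | null {
  switch (decision.type) {
    case 'approve':
      return proposed
    case 'modify':
      return decision.action
    case 'reject':
      return null
  }
}
```

Modelling the decision as a discriminated union means the compiler flags any outcome the gate forgets to handle, which is useful when the set of user choices grows.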
🐙 Chrome Extension (Optional)
Extend your agent's reach across browser tabs for multi-page workflows. Optional — core functionality works without it.
4 Use Cases
| Use Case | Description |
|---|---|
| SaaS AI Copilot | Ship an AI copilot in your product in a few lines of code. No backend rewrite needed. |
| Smart Form Filling | Turn 20-click workflows into one sentence in ERP, CRM, and admin systems. |
| Accessibility | Make any web app operable through natural language, for example alongside voice commands or screen readers. |
| Multi-Page Agent | Extend across browser tabs with the Chrome extension. |
Page Agent vs Alternatives
Category: in-page JavaScript GUI agent for browser automation.
| Feature | Page Agent | Playwright | Puppeteer | Agent Browser (Vercel) |
|---|---|---|---|---|
| Focus | In-page JS agent | E2E testing | Browser automation | CLI agent |
| Stars | 2.5K ⭐ | ~70K ⭐ | ~90K ⭐ | 20.2K ⭐ |
| Author | Alibaba | Microsoft | Google | Vercel |
| In-Page (no driver) | ✅ | ❌ External | ❌ External | ❌ CLI |
| Script Tag Deploy | ✅ One line | ❌ | ❌ | ❌ |
| No Screenshots/OCR | ✅ Text DOM | N/A | N/A | Visual |
| Natural Language | ✅ | ❌ Code only | ❌ Code only | ✅ |
| BYOLLM | ✅ | N/A | N/A | ❌ |
| Human-in-the-Loop UI | ✅ | ❌ | ❌ | ❌ |
| SaaS Copilot Ready | ✅ | ❌ | ❌ | ❌ |
| Multi-Page | ✅ Chrome ext | ✅ | ✅ | ✅ |
When to choose Page Agent: You want to add AI control to your web app with one <script> tag, with no external processes, no screenshots, and a human-in-the-loop UI.
When to choose Playwright/Puppeteer: You need programmatic E2E testing with full browser control.
When to choose Agent Browser: You need a CLI-based agent for terminal automation.
Conclusion
Page Agent's radical simplicity (one <script> tag and your web app becomes AI-controllable) sets it apart from the driver-based and CLI tools compared above. By operating directly on the DOM instead of screenshots, it avoids image processing and does not need a multimodal model. Backed by Alibaba, with BYOLLM support and a human-in-the-loop UI, it is purpose-built for adding an AI copilot to a SaaS product.
