# Firecrawl: The Complete Guide to the Web Data API for AI
Firecrawl is a Web Data API for AI: it turns entire websites into LLM-ready markdown or structured data. You can scrape, crawl, search, map, and extract, or hand the whole task to an AI agent. Highlights include Pydantic schema extraction, browser actions, screenshots, and batch processing. The project has 89,500+ GitHub stars, is written in TypeScript, and is licensed under AGPL-3.0.
## What Is Firecrawl?

Firecrawl is an API-first platform that converts web content into clean, structured data ready for LLMs. Instead of wrestling with HTML parsers and headless browsers, you make a single API call that returns markdown, HTML, JSON, screenshots, or brand-identity data for any URL.
- Language: TypeScript
- License: AGPL-3.0
- Stars: 89,500+ ⭐
- Forks: 6,252
- Releases: 32
- Homepage: firecrawl.dev
- Deployment: Cloud (hosted) or Self-hosted (open source)
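For a sense of what that single API call replaces, here is a minimal sketch of the manual route using only Python's standard library: extracting visible text from raw HTML yourself. This is before you even get to JavaScript rendering, pagination, or anti-bot measures. (The HTML string is an inline stand-in for a fetched page.)

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

html = (
    "<html><head><style>body{}</style></head>"
    "<body><h1>Docs</h1><p>Hello</p><script>x=1</script></body></html>"
)
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.parts))  # -> Docs Hello
```

Even this toy version needs state tracking to skip scripts and styles, and it produces plain text rather than markdown with preserved structure.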
## Core Features
### 1. Scrape

Convert any URL to clean markdown, HTML, or structured data:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://docs.firecrawl.dev", formats=["markdown", "html"])
print(doc.markdown)
```
Available formats: `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `branding`
### 2. Extract Structured Data (JSON Mode)

Extract structured data using a Pydantic schema:

```python
from pydantic import BaseModel

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    "https://firecrawl.dev",
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}],
)
print(result.json)
# {"company_mission": "Turn websites into LLM-ready data", "is_open_source": true, "is_in_yc": true}
```
Or extract with just a prompt (no schema needed):

```python
result = app.scrape(
    "https://firecrawl.dev",
    formats=[{"type": "json", "prompt": "Extract the company mission"}],
)
```
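Since the extracted JSON arrives as a plain dict, the same Pydantic model can validate it back into a typed object. A minimal sketch, using a hardcoded dict as a stand-in for a live `result.json` response:

```python
from pydantic import BaseModel

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

# Stand-in for the dict returned in result.json by app.scrape(...)
payload = {
    "company_mission": "Turn websites into LLM-ready data",
    "is_open_source": True,
    "is_in_yc": True,
}

# Raises pydantic.ValidationError if the extracted data doesn't match the schema
info = CompanyInfo.model_validate(payload)
print(info.company_mission)  # -> Turn websites into LLM-ready data
```

Validating at this boundary catches schema drift early instead of letting malformed fields propagate into your pipeline.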
### 3. Actions (Interact Before Scraping)

Click, type, scroll, wait, and take screenshots before extracting:

```python
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "user@example.com"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"},
    ],
)
```
### 4. AI Agent

Describe what you need — no URLs required. The agent searches, navigates, and extracts:

```bash
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{ "prompt": "Find the pricing plans for Notion" }'
```

Returns structured data with source URLs automatically.
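The same request can be built from Python with nothing but the standard library. A sketch mirroring the curl call above; the actual send is commented out because it needs a valid API key:

```python
import json
import urllib.request

def build_agent_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Builds a POST request for Firecrawl's v2 agent endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        "https://api.firecrawl.dev/v2/agent",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_agent_request("Find the pricing plans for Notion", "fc-YOUR_API_KEY")
print(req.get_full_url())  # -> https://api.firecrawl.dev/v2/agent

# Uncomment with a real key to actually run the agent:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```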
### 5. Crawl

Crawl entire websites and get all pages as structured data:

```python
job = app.crawl("https://docs.firecrawl.dev", formats=["markdown"])

for doc in job.data:
    print(doc.metadata.source_url)
```
### 6. Map

Get a site's URL structure without scraping content:

```python
links = app.map("https://firecrawl.dev")
```
### 7. Search

Search the web and get content from results:

```python
results = app.search("best LLM frameworks 2026")
```
### 8. Batch Scraping

Scrape multiple URLs at once:

```python
job = app.batch_scrape(
    [
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pricing",
    ],
    formats=["markdown"],
)
```
### 9. Screenshots & Branding

Capture visual screenshots or extract brand identity:

```python
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot)  # Base64 image

doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)  # {"colors": {...}, "fonts": [...], "typography": {...}}
```
## SDKs
| Language | Type |
|---|---|
| Python | Official |
| Node.js | Official |
| Go | Community |
| Rust | Community |
| Ruby | Community |
| PHP | Community |
| Java | Community |
| C# | Community |
| Elixir | Community |
| Swift | Community |
## Firecrawl vs Alternatives

Firecrawl sits in the web scraping / Web Data API for AI category. Here is how it compares to two popular alternatives:
| Feature | Firecrawl | Crawl4AI | BeautifulSoup |
|---|---|---|---|
| Focus | API-first web data for AI | Open-source AI crawler | HTML parser library |
| Stars | 89.5K ⭐ | ~30K ⭐ | ~30K ⭐ |
| API Service | ✅ Cloud + self-hosted | ❌ Self-hosted only | ❌ Library only |
| LLM-Ready Output | ✅ Markdown/JSON/HTML | ✅ Markdown | ❌ Raw HTML |
| JSON Extraction | ✅ Pydantic schema + prompt | ✅ Schema | ❌ |
| AI Agent Mode | ✅ No URLs needed | ❌ | ❌ |
| Browser Actions | ✅ Click/type/scroll/wait | ✅ | ❌ |
| Screenshots | ✅ | ✅ | ❌ |
| Branding Extraction | ✅ Colors/fonts/typography | ❌ | ❌ |
| Batch Scraping | ✅ | ✅ | Manual |
| Site Mapping | ✅ /map endpoint | ❌ | ❌ |
| Search Integration | ✅ Search + scrape | ❌ | ❌ |
| SDKs | ✅ 10+ languages | Python | Python |
| Crawl Entire Sites | ✅ | ✅ | Manual |
| Self-Hosted | ✅ | ✅ | N/A |
When to choose Firecrawl: You need an API-first solution for converting web data to LLM-ready formats — with JSON extraction, AI agent mode, batch processing, and multi-language SDKs. Best for production AI pipelines.
When to choose Crawl4AI: You want a fully open-source, self-hosted crawler with no cloud dependency and good AI extraction capabilities.
When to choose BeautifulSoup: You need a lightweight HTML parser for simple scraping tasks without AI extraction features.
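To make that last contrast concrete, here is roughly what the BeautifulSoup end of the spectrum looks like: you bring your own HTTP client and extraction logic, but for a static page the parsing itself is only a few lines. A sketch on an inline HTML string standing in for a fetched page:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Pricing</h1>
  <ul class="plans">
    <li>Free</li>
    <li>Pro</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
plans = [li.get_text() for li in soup.select("ul.plans li")]
print(title, plans)  # -> Pricing ['Free', 'Pro']
```

This works well until the page requires JavaScript rendering, login flows, or crawling at scale, which is where the managed API approach earns its keep.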
## Conclusion
Firecrawl is the most mature and feature-complete web data API for AI. With 89.5K stars, it offers everything from simple URL-to-markdown conversion to AI agent-powered extraction that doesn't even need URLs. The Pydantic schema extraction, browser actions, batch processing, site mapping, and 10+ language SDKs make it the go-to choice for feeding web data to LLMs at scale.
