# Firecrawl: The Complete Guide to the Web Data API for AI
Firecrawl is a Web Data API for AI: it turns entire websites into LLM-ready markdown or structured data. You can scrape, crawl, search, map, and extract, or hand the whole task to an AI agent. Highlights include Pydantic schema extraction, browser actions, screenshots, and batch processing. The project has 89,500+ GitHub stars, is written in TypeScript, and is licensed under AGPL-3.0.
## What Is Firecrawl?

Firecrawl is an API-first platform that converts web content into clean, structured data ready for LLMs. Instead of wrestling with HTML parsers and headless browsers, you make a single API call that returns markdown, HTML, JSON, screenshots, or brand-identity data for any URL.
- Language: TypeScript
- License: AGPL-3.0
- Stars: 89,500+ ⭐
- Forks: 6,252
- Releases: 32
- Homepage: firecrawl.dev
- Deployment: Cloud (hosted) or Self-hosted (open source)
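For a sense of what that single API call replaces, here is a minimal sketch of the manual route using only Python's standard library: extracting visible text from raw HTML yourself. This is before you even get to JavaScript rendering, pagination, or anti-bot measures. (The HTML string is an inline stand-in for a fetched page.)

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text outside script/style blocks
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

html = (
    "<html><head><style>body{}</style></head>"
    "<body><h1>Docs</h1><p>Hello</p><script>x=1</script></body></html>"
)
parser = TextExtractor()
parser.feed(html)
print(" ".join(parser.parts))  # -> Docs Hello
```

Even this toy version needs state tracking to skip scripts and styles, and it produces plain text rather than markdown with preserved structure.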
## Core Features
### 1. Scrape

Convert any URL to clean markdown, HTML, or structured data:

```python
from firecrawl import Firecrawl

app = Firecrawl(api_key="fc-YOUR_API_KEY")
doc = app.scrape("https://docs.firecrawl.dev", formats=["markdown", "html"])
print(doc.markdown)
```
Available formats: `markdown`, `html`, `rawHtml`, `screenshot`, `links`, `json`, `branding`
### 2. Extract Structured Data (JSON Mode)

Extract structured data using a Pydantic schema:

```python
from pydantic import BaseModel

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

result = app.scrape(
    "https://firecrawl.dev",
    formats=[{"type": "json", "schema": CompanyInfo.model_json_schema()}],
)
print(result.json)
# {"company_mission": "Turn websites into LLM-ready data", "is_open_source": true, "is_in_yc": true}
```
Or extract with just a prompt (no schema needed):

```python
result = app.scrape(
    "https://firecrawl.dev",
    formats=[{"type": "json", "prompt": "Extract the company mission"}],
)
```
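Since the extracted JSON arrives as a plain dict, the same Pydantic model can validate it back into a typed object. A minimal sketch, using a hardcoded dict as a stand-in for a live `result.json` response:

```python
from pydantic import BaseModel

class CompanyInfo(BaseModel):
    company_mission: str
    is_open_source: bool
    is_in_yc: bool

# Stand-in for the dict returned in result.json by app.scrape(...)
payload = {
    "company_mission": "Turn websites into LLM-ready data",
    "is_open_source": True,
    "is_in_yc": True,
}

# Raises pydantic.ValidationError if the extracted data doesn't match the schema
info = CompanyInfo.model_validate(payload)
print(info.company_mission)  # -> Turn websites into LLM-ready data
```

Validating at this boundary catches schema drift early instead of letting malformed fields propagate into your pipeline.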
### 3. Actions (Interact Before Scraping)

Click, type, scroll, wait, and take screenshots before extracting:

```python
doc = app.scrape(
    url="https://example.com/login",
    formats=["markdown"],
    actions=[
        {"type": "write", "text": "user@example.com"},
        {"type": "press", "key": "Tab"},
        {"type": "write", "text": "password"},
        {"type": "click", "selector": 'button[type="submit"]'},
        {"type": "wait", "milliseconds": 2000},
        {"type": "screenshot"},
    ],
)
```
### 4. AI Agent

Describe what you need — no URLs required. The agent searches, navigates, and extracts:

```bash
curl -X POST 'https://api.firecrawl.dev/v2/agent' \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -d '{ "prompt": "Find the pricing plans for Notion" }'
```

Returns structured data with source URLs automatically.
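The same request can be built from Python with nothing but the standard library. A sketch mirroring the curl call above; the actual send is commented out because it needs a valid API key:

```python
import json
import urllib.request

def build_agent_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Builds a POST request for Firecrawl's v2 agent endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        "https://api.firecrawl.dev/v2/agent",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_agent_request("Find the pricing plans for Notion", "fc-YOUR_API_KEY")
print(req.get_full_url())  # -> https://api.firecrawl.dev/v2/agent

# Uncomment with a real key to actually run the agent:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```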
### 5. Crawl

Crawl entire websites and get all pages as structured data:

```python
job = app.crawl("https://docs.firecrawl.dev", formats=["markdown"])

for doc in job.data:
    print(doc.metadata.source_url)
```
### 6. Map

Get a site's URL structure without scraping content:

```python
links = app.map("https://firecrawl.dev")
```
### 7. Search

Search the web and get content from results:

```python
results = app.search("best LLM frameworks 2026")
```
### 8. Batch Scraping

Scrape multiple URLs at once:

```python
job = app.batch_scrape(
    [
        "https://firecrawl.dev",
        "https://docs.firecrawl.dev",
        "https://firecrawl.dev/pricing",
    ],
    formats=["markdown"],
)
```
### 9. Screenshots & Branding

Capture visual screenshots or extract brand identity:

```python
doc = app.scrape("https://firecrawl.dev", formats=["screenshot"])
print(doc.screenshot)  # Base64 image

doc = app.scrape("https://firecrawl.dev", formats=["branding"])
print(doc.branding)  # {"colors": {...}, "fonts": [...], "typography": {...}}
```
## SDKs
| Language | Type |
|---|---|
| Python | Official |
| Node.js | Official |
| Go | Community |
| Rust | Community |
| Ruby | Community |
| PHP | Community |
| Java | Community |
| C# | Community |
| Elixir | Community |
| Swift | Community |
## Firecrawl vs Alternatives

Firecrawl sits in the web scraping / Web Data API for AI category. Here is how it compares to two popular alternatives:
| Feature | Firecrawl | Crawl4AI | BeautifulSoup |
|---|---|---|---|
| Focus | API-first web data for AI | Open-source AI crawler | HTML parser library |
| Stars | 89.5K ⭐ | ~30K ⭐ | ~30K ⭐ |
| API Service | ✅ Cloud + self-hosted | ❌ Self-hosted only | ❌ Library only |
| LLM-Ready Output | ✅ Markdown/JSON/HTML | ✅ Markdown | ❌ Raw HTML |
| JSON Extraction | ✅ Pydantic schema + prompt | ✅ Schema | ❌ |
| AI Agent Mode | ✅ No URLs needed | ❌ | ❌ |
| Browser Actions | ✅ Click/type/scroll/wait | ✅ | ❌ |
| Screenshots | ✅ | ✅ | ❌ |
| Branding Extraction | ✅ Colors/fonts/typography | ❌ | ❌ |
| Batch Scraping | ✅ | ✅ | Manual |
| Site Mapping | ✅ /map endpoint | ❌ | ❌ |
| Search Integration | ✅ Search + scrape | ❌ | ❌ |
| SDKs | ✅ 10+ languages | Python | Python |
| Crawl Entire Sites | ✅ | ✅ | Manual |
| Self-Hosted | ✅ | ✅ | N/A |
When to choose Firecrawl: You need an API-first solution for converting web data to LLM-ready formats — with JSON extraction, AI agent mode, batch processing, and multi-language SDKs. Best for production AI pipelines.
When to choose Crawl4AI: You want a fully open-source, self-hosted crawler with no cloud dependency and good AI extraction capabilities.
When to choose BeautifulSoup: You need a lightweight HTML parser for simple scraping tasks without AI extraction features.
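To make that last contrast concrete, here is roughly what the BeautifulSoup end of the spectrum looks like: you bring your own HTTP client and extraction logic, but for a static page the parsing itself is only a few lines. A sketch on an inline HTML string standing in for a fetched page:

```python
from bs4 import BeautifulSoup

html = """
<html><body>
  <h1>Pricing</h1>
  <ul class="plans">
    <li>Free</li>
    <li>Pro</li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1").get_text()
plans = [li.get_text() for li in soup.select("ul.plans li")]
print(title, plans)  # -> Pricing ['Free', 'Pro']
```

This works well until the page requires JavaScript rendering, login flows, or crawling at scale, which is where the managed API approach earns its keep.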
## Conclusion
Firecrawl is the most mature and feature-complete web data API for AI. With 89.5K stars, it offers everything from simple URL-to-markdown conversion to AI agent-powered extraction that doesn't even need URLs. The Pydantic schema extraction, browser actions, batch processing, site mapping, and 10+ language SDKs make it the go-to choice for feeding web data to LLMs at scale.
