How to Scrape Any Website Using Firecrawl in 2026
Learn how to scrape any website using Firecrawl in 2026 — from your first API call to crawling sites and extracting structured AI data with code examples.
The web holds more data than ever, but extracting it cleanly has always been the hard part. Traditional scrapers break the moment a site ships a new layout, loads content with JavaScript, or throws up an anti-bot wall. With over 1.1 billion websites online and a growing share rendered entirely client-side, developers need a tool that turns messy HTML into clean, structured data without endless maintenance.
That is exactly what Firecrawl does. It is an AI-powered scraping API that takes any URL and returns LLM-ready markdown, HTML, or structured JSON — handling JavaScript rendering, pagination, and crawling for you. Instead of writing brittle CSS selectors, you point Firecrawl at a page and get usable data back.
In this guide you will learn how to scrape any website using Firecrawl — from your first API call to crawling entire sites and extracting structured data with AI. We will also cover when to pair it with proxies and the mistakes that trip up beginners. If you are new to the field, our introduction to web scraping is a useful companion read.
What Is Firecrawl?
Firecrawl is a developer-first web scraping and crawling API that converts websites into clean, structured data. Give it a URL and it renders the page in a real browser, strips away navigation and clutter, and returns the content in the format you ask for — markdown, raw HTML, screenshots, or schema-defined JSON.
What sets it apart is its AI-native design. It was built to feed large language models and retrieval pipelines, so its default output is markdown that is ready to drop straight into a RAG system or prompt. It also handles the tedious parts of scraping — JavaScript execution, dynamic loading, and following links across a whole domain — through a single endpoint. For a deeper feature breakdown, see our full Firecrawl review.
Why Use Firecrawl for Web Scraping?
The biggest cost in scraping is not writing the first script — it is maintaining hundreds of them as sites change. Firecrawl removes most of that burden by abstracting the page structure away entirely. You request content, not specific DOM nodes, so a redesign rarely breaks your pipeline.
It also solves the JavaScript problem out of the box. Modern sites built with React, Vue, or Next.js render content after the initial HTML loads, which defeats simple HTTP scrapers. Firecrawl runs a real headless browser so dynamic content appears just as it would for a human visitor.
Finally, it scales. A single call can crawl an entire documentation site or blog, returning every page as clean markdown — a task that would take days to build reliably by hand. Compared with stitching together your own stack, it is one of the most efficient options among the best web scraping APIs available today.
Getting Started: Setting Up Firecrawl
You can be making real scrape calls within five minutes. Here is the setup, step by step.
1. Get Your API Key
Create a free account on the Firecrawl dashboard and copy your API key from the settings page. The free tier includes a pool of credits — enough to test scraping and small crawls before you commit to a paid plan. Keep the key in an environment variable rather than hard-coding it.
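As a minimal sketch of the environment-variable advice above, the helper below reads the key at startup and fails loudly if it is missing. The variable name FIRECRAWL_API_KEY and the function name load_api_key are assumptions chosen for illustration; use whatever name fits your deployment.

```python
import os


def load_api_key(var_name: str = "FIRECRAWL_API_KEY") -> str:
    """Return the API key from the environment, or raise a clear error."""
    # var_name is an assumed convention, not something the dashboard enforces.
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

You would then pass the result to the client, for example FirecrawlApp(api_key=load_api_key()), so the key never lands in version control.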
2. Install the SDK
Firecrawl offers official SDKs for Python and Node.js, plus a plain REST API. Install whichever matches your stack:
    # Python
    pip install firecrawl-py

    # Node.js
    npm install @mendable/firecrawl-js

Both SDKs wrap the same endpoints, so the concepts below apply regardless of language. We will use Python for the examples.
3. Make Your First Scrape
With the SDK installed and your key set, a single page scrape takes just a few lines:
    from firecrawl import FirecrawlApp

    # Create a client with your API key
    app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

    # Scrape one page and request both markdown and raw HTML
    data = app.scrape_url(
        "https://example.com",
        params={"formats": ["markdown", "html"]}
    )
    print(data["markdown"])

That call returns the page as clean markdown and raw HTML, ready to store or feed to an LLM. No selectors, no browser setup, no parsing logic required.
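If you prefer the plain REST API mentioned earlier, the same request can be sketched with only the standard library. The endpoint URL, the Bearer authorization header, and the response shape used here are assumptions based on the v1 API; confirm them against the current API reference before relying on them. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; the version segment may differ for your account.
SCRAPE_ENDPOINT = "https://api.firecrawl.dev/v1/scrape"


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Build (but do not send) a POST request for a single-page scrape."""
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        SCRAPE_ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling send(build_scrape_request("https://example.com", "fc-YOUR_API_KEY")) would perform the scrape. Keeping the build step separate makes the payload easy to inspect and test without spending credits.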
Scraping a Single Page with the Scrape Endpoint
The scrape endpoint is the workhorse for grabbing one URL at a time. Beyond format selection, you can control how the page is fetched — waiting for dynamic content, taking screenshots, or returning only the main article body.
    data = app.scrape_url(
        "https://news.example.com/article",
        params={
            "formats": ["markdown"],
            "onlyMainContent": True,  # drop menus, footers, and ads
            "waitFor": 2000           # wait 2000 ms before capturing the page
        }
    )

Here onlyMainContent strips out menus, footers, and ads, while waitFor pauses for 2000 milliseconds (two seconds) so JavaScript-loaded content finishes rendering. These two options alone solve many "my scraper returns empty content" problems.
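When you do not know how long a page needs, one option is to retry with progressively longer waits and stop as soon as the content looks complete. The sketch below assumes a scrape callable you supply yourself and a result dictionary with a markdown key, as in the examples above; the helper name, the wait values, and the 200-character threshold are illustrative choices, not Firecrawl defaults.

```python
from typing import Callable, Sequence


def scrape_with_escalating_wait(
    scrape: Callable[[str, dict], dict],
    url: str,
    waits_ms: Sequence[int] = (0, 2000, 5000),
    min_chars: int = 200,
) -> dict:
    """Retry with longer waitFor values until the markdown is long enough."""
    result: dict = {}
    for wait in waits_ms:
        params = {"formats": ["markdown"], "onlyMainContent": True, "waitFor": wait}
        result = scrape(url, params)
        # Treat very short output as a sign that rendering had not finished.
        if len((result.get("markdown") or "").strip()) >= min_chars:
            break
    return result
```

With the SDK client from earlier, you could pass lambda u, p: app.scrape_url(u, params=p) as the scrape argument. Each retry is a separate scrape, so keep the list of waits short.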
Crawling an Entire Website
When you need more than one page, the crawl endpoint follows links across a domain and returns every page it finds. This is ideal for ingesting documentation, knowledge bases, or an entire blog into a dataset.
    # Crawl up to 100 pages starting from the docs root
    crawl = app.crawl_url(
        "https://docs.example.com",
        params={
            "limit": 100,
            "scrapeOptions": {"formats": ["markdown"]}
        }
    )

    # Print the URL of every page that came back
    for page in crawl["data"]:
        print(page["metadata"]["sourceURL"])

The limit caps how many pages are crawled so you do not burn credits unexpectedly. Firecrawl handles the link discovery, queuing, and deduplication internally, so you get a tidy list of pages back without managing the crawl frontier yourself.
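Once a crawl returns, you will usually want the pages on disk. Here is a small sketch that writes each page's markdown to its own file, assuming every page is a dictionary with markdown and metadata.sourceURL keys as in the loop above. The function names and the file-naming scheme are hypothetical.

```python
import re
from pathlib import Path


def slug_for(url: str) -> str:
    """Turn a URL into a filesystem-safe file stem."""
    stem = re.sub(r"^https?://", "", url)
    stem = re.sub(r"[^A-Za-z0-9]+", "-", stem).strip("-")
    return stem or "index"


def save_pages(pages: list, out_dir: str) -> list:
    """Write each page's markdown into out_dir and return the written paths."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        url = page.get("metadata", {}).get("sourceURL", "")
        path = target / f"{slug_for(url)}.md"
        path.write_text(page.get("markdown", ""), encoding="utf-8")
        written.append(path)
    return written
```

A call such as save_pages(crawl["data"], "docs_dump") would leave one .md file per crawled URL. Two URLs that differ only in punctuation would map to the same file name, so add a hash suffix if that matters for your site.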
Extracting Structured Data with AI
The real power move is asking Firecrawl for structured JSON instead of raw text. You define a schema, and its extraction engine pulls matching fields from the page — perfect for product prices, job listings, or contact details.
    from pydantic import BaseModel

    # Describe the fields you want extracted
    class Product(BaseModel):
        name: str
        price: float
        in_stock: bool

    # Ask for JSON output that matches the Product schema
    data = app.scrape_url(
        "https://store.example.com/item/123",
        params={
            "formats": ["json"],
            "jsonOptions": {"schema": Product.model_json_schema()}
        }
    )
    print(data["json"])

Instead of writing parsing rules for each site, you describe the shape of the data you want and let the model find it. This makes scraping resilient to layout changes: the schema stays the same even when the page markup shifts entirely.
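Because the fields are filled in by a model, it is worth checking them before they reach your database. The sketch below uses only the standard library and mirrors the Product schema above. ProductRecord and parse_product are hypothetical names, and the rules (all three fields required, in_stock strictly boolean) are assumptions you may want to loosen.

```python
from dataclasses import dataclass


@dataclass
class ProductRecord:
    name: str
    price: float
    in_stock: bool


def parse_product(raw: dict) -> ProductRecord:
    """Check the extracted fields and coerce them into a typed record."""
    missing = [k for k in ("name", "price", "in_stock") if raw.get(k) is None]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(raw["in_stock"], bool):
        raise ValueError("in_stock must be a boolean")
    return ProductRecord(
        name=str(raw["name"]).strip(),
        price=float(raw["price"]),  # accepts numbers or numeric strings
        in_stock=raw["in_stock"],
    )
```

If you already depend on pydantic, validating the returned dictionary against the Product model directly achieves the same thing with less code.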
Best Proxies to Pair with Firecrawl
Firecrawl handles rendering and extraction, but for high-volume or geo-targeted scraping you will still want your own proxy pool to avoid rate limits and access region-locked content. Pairing Firecrawl with quality residential proxies keeps your success rate high at scale. Here are three providers from our proxy directory that pair well.
1. Decodo
Decodo offers a massive 115M+ IP pool across 195 countries with a clean, developer-friendly dashboard. Its residential network is reliable for large crawls, and granular geo-targeting lets you scrape localized pricing or search results that change by region.
For Firecrawl users, Decodo is a strong default: easy authentication, generous session control, and competitive per-GB pricing make it simple to route your scrape traffic through fresh IPs and avoid the rate limits that single-IP scraping inevitably hits.
2. IPRoyal
IPRoyal is known for its non-expiring residential traffic, which is ideal for irregular scraping jobs where you buy data once and use it over months. With 32M+ IPs in 195 countries, it covers virtually any geo you need to target.
The pay-as-you-go model suits hobbyists and small teams pairing proxies with Firecrawl for occasional crawls, since you are not locked into a monthly commitment. Sticky sessions help when a target site expects consistent behavior across multiple requests.
3. Oxylabs
Oxylabs is the enterprise choice, with a 102M+ residential IP pool and infrastructure built for serious, large-scale data collection. Its network reliability and high success rates justify the premium for teams running mission-critical pipelines.
When you graduate from testing to scraping millions of pages, Oxylabs paired with Firecrawl gives you the throughput and stability to do it without constant babysitting. Its advanced targeting and dedicated support are valuable when scraping the most heavily protected sites.
Common Mistakes to Avoid When Scraping with Firecrawl
Firecrawl removes a lot of complexity, but a few habits still separate smooth pipelines from frustrating ones. Avoid these and you will save credits and debugging time.
1. Forgetting to set a crawl limit
Launching a crawl without a limit on a large site can consume your entire credit balance in one run. Always start with a small limit, inspect the output, then scale up. Treat the first crawl of any new domain as a reconnaissance run rather than a full extraction.
2. Ignoring JavaScript wait times
If a page returns empty or partial content, the dynamic elements likely had not finished loading. Use the waitFor parameter to give scripts time to render. Beginners often blame the tool when the real fix is a one-line timeout adjustment to match the site's loading behavior.
3. Skipping structured extraction
Pulling raw markdown and then writing regex to parse it defeats the purpose. If you need specific fields, define a schema and use JSON extraction from the start. It is more resilient to layout changes and far less code to maintain than post-processing text by hand.
4. Scraping at scale without proxies
Hammering a target from a single IP invites rate limits and blocks. For anything beyond light use, route traffic through residential proxies and respect reasonable request rates. This is especially true for Cloudflare-protected sites, where IP reputation heavily influences success.
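Respecting a reasonable request rate can be as simple as enforcing a minimum gap between calls. The class below is a minimal sketch of that idea; the Throttle name and the injectable clock and sleep arguments are illustrative choices that make it easy to test, not part of any Firecrawl API.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time the previous call was released

    def wait(self) -> float:
        """Sleep if the last call was too recent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval_s - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

Create one instance per target domain, for example Throttle(1.0), and call wait() immediately before each scrape request.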
5. Disregarding a site's terms and robots.txt
Just because you can scrape a page does not always mean you should. Check the site's terms of service and robots.txt, avoid collecting personal data without a lawful basis, and throttle your requests so you do not degrade the target's performance for real users.
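Checking robots.txt before you queue a URL takes only a few lines with Python's standard library. This sketch assumes you have already downloaded the robots.txt text yourself; the function name and the user-agent string are placeholders.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, with the rules "User-agent: *" and "Disallow: /private/", a URL under /private/ is rejected while a blog post URL is allowed. Robots rules are only one signal; the site's terms of service still apply.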
Tips for Scraping Any Website Successfully
- Start with the scrape endpoint to understand a single page before launching a full crawl.
- Use onlyMainContent to strip boilerplate and keep your output focused and token-efficient.
- Define schemas early so structured data stays consistent even when site layouts change.
- Pair with rotating proxies for volume — see our proxy provider directory for residential options that scale.
- Cache results to avoid re-scraping unchanged pages and wasting credits on every run.
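The caching tip can be sketched as a small file-based cache keyed by a hash of the URL. This assumes your scrape callable returns a JSON-serializable dictionary; the function name, the one-day default lifetime, and the directory layout are all illustrative.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Callable


def cached_scrape(
    url: str,
    scrape: Callable[[str], dict],
    cache_dir: str,
    max_age_s: float = 86400.0,
) -> dict:
    """Return a cached result if it is fresh enough, otherwise scrape and store."""
    folder = Path(cache_dir)
    folder.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    path = folder / f"{key}.json"
    # Reuse the stored copy while it is younger than max_age_s seconds.
    if path.exists() and time.time() - path.stat().st_mtime < max_age_s:
        return json.loads(path.read_text(encoding="utf-8"))
    result = scrape(url)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result
```

A time-based expiry is the simplest policy. If the target exposes a last-modified date or a sitemap, comparing against that would avoid even more unnecessary scrapes.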
Real-World Use Cases for Firecrawl
Understanding what Firecrawl is good at helps you decide where it fits in your stack. These are the scenarios where it consistently outperforms a hand-rolled scraper.
1. Building RAG and AI knowledge bases
Because Firecrawl returns clean markdown by default, it is a natural fit for retrieval-augmented generation pipelines. You can crawl an entire documentation site or knowledge base, chunk the markdown, and embed it into a vector database without writing any HTML-cleaning code. This is one of the most common use cases among AI developers.
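The chunking step might look like the sketch below, which splits at markdown headings first and then packs paragraphs into size-limited chunks. The function name and the 1200-character default are assumptions; tune the size to your embedding model.

```python
import re


def chunk_markdown(markdown: str, max_chars: int = 1200) -> list:
    """Split at headings, then pack paragraphs into chunks near max_chars.

    A single paragraph longer than max_chars is kept whole rather than cut.
    """
    # Zero-width split: each section starts at a line beginning with '#'.
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for section in sections:
        current = ""
        for para in section.split("\n\n"):
            para = para.strip()
            if not para:
                continue
            candidate = f"{current}\n\n{para}" if current else para
            if len(candidate) > max_chars and current:
                chunks.append(current)
                current = para
            else:
                current = candidate
        if current:
            chunks.append(current)
    return chunks
```

Splitting at headings keeps each chunk on a single topic, which tends to help retrieval more than cutting at a fixed character count.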
2. Price and product monitoring
E-commerce teams use Firecrawl's JSON extraction to pull product names, prices, and stock status on a schedule. Paired with rotating proxies for geo-specific pricing, it becomes a reliable competitive-intelligence engine that survives the frequent layout changes retail sites are known for.
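On the monitoring side, the scheduled job usually ends by comparing the latest snapshot with the previous one. This is a minimal sketch that assumes each snapshot is a dictionary mapping a product identifier to a record with a price field; the function name and the data shape are hypothetical.

```python
def price_changes(previous: dict, current: dict) -> dict:
    """Map product key to (old_price, new_price) for every price that changed."""
    changes = {}
    for key, new in current.items():
        old = previous.get(key)
        # Products seen for the first time are skipped, not reported as changes.
        if old is not None and old["price"] != new["price"]:
            changes[key] = (old["price"], new["price"])
    return changes
```

Feeding the result into an alert or a report turns the raw extraction into something a pricing team can act on.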
3. Lead generation and research
Sales and research teams scrape directories, company pages, and listings to build prospect databases. A defined schema lets Firecrawl pull contact fields and company details consistently across thousands of pages, turning days of manual copy-paste into a single automated job.
4. Market research datasets
Analysts assemble large datasets from public listings, reviews, and directories to study pricing trends and consumer sentiment. Firecrawl's schema-based extraction keeps these datasets clean and consistent across thousands of pages, so the data is analysis-ready the moment a crawl finishes rather than after hours of manual cleanup.
5. Content aggregation and SEO
Marketers aggregate articles, monitor competitor blogs, and track SERP changes by crawling target sites regularly. For large jobs that span many domains, combine Firecrawl with the techniques in our guide to bypassing Cloudflare when scraping to keep access reliable on protected targets.
Conclusion
Firecrawl makes scraping any website dramatically simpler by turning messy, JavaScript-heavy pages into clean markdown or structured JSON through a single API. You can scrape one page, crawl an entire domain, or extract schema-defined data with just a few lines of code — and skip most of the maintenance that breaks traditional scrapers.
For serious, large-scale work, pair Firecrawl with reliable residential proxies and follow good scraping etiquette to keep your success rate high and stay on the right side of site policies. Ready to build your pipeline? Read our full Firecrawl review, explore proxy options in our provider directory, or brush up on web scraping with Python to round out your toolkit.


