Puppeteer
Puppeteer is a Node.js library by Google that controls Chrome and Chromium through the DevTools Protocol. It is popular for scraping JavaScript-rendered pages and automating Chrome tasks.
Definition
Puppeteer is an open-source browser automation library maintained by Google's Chrome team. It drives Chrome and Chromium through the Chrome DevTools Protocol, and recent releases also support Firefox through WebDriver BiDi (earlier releases labeled Firefox support experimental). It is one of the most widely used tools for scraping dynamic sites and automating browser workflows.
Language support
Puppeteer is primarily a Node.js (JavaScript/TypeScript) library. A community port called Pyppeteer exists for Python, but the JavaScript ecosystem is the most mature and actively maintained.
How it works and why it matters
Puppeteer launches a real browser, headless by default, that runs JavaScript and renders the DOM, so it can read content that only appears after client-side rendering and that a plain HTTP request would miss. It supports clicking, typing, navigation, screenshots, and PDF generation.
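As a minimal sketch of that workflow, the snippet below opens a page, waits for an element that the page's own JavaScript renders, reads its text, and saves a screenshot. It assumes the puppeteer package is installed; the function names, the URL, the selector, and the output file name are placeholders introduced for illustration, not part of Puppeteer itself.

```javascript
// Sketch only: assumes `npm install puppeteer`.
// scrapeRenderedText and navigationOptions are hypothetical helper names.

// Keeps the navigation settings in one place. 'networkidle2' waits until
// there are no more than two open network connections for a short period.
function navigationOptions(timeoutMs = 30000) {
  return { waitUntil: 'networkidle2', timeout: timeoutMs };
}

async function scrapeRenderedText(url, selector) {
  // Loaded lazily so the helper above can be used without a browser installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, navigationOptions());
    // Wait for the element that client-side JavaScript is expected to render.
    await page.waitForSelector(selector);
    const text = await page.$eval(selector, (el) => el.textContent.trim());
    // Save a full-page screenshot next to the script (placeholder file name).
    await page.screenshot({ path: 'page.png', fullPage: true });
    return text;
  } finally {
    // Always close the browser, even if navigation or extraction fails.
    await browser.close();
  }
}
```

A call such as scrapeRenderedText('https://example.com', 'h1') would return the heading text once the page has rendered; both arguments are placeholders.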
- Proxy support: Pass --proxy-server=host:port in launch args; proxy authentication is handled with page.authenticate().
- Stealth: The puppeteer-extra-plugin-stealth package helps reduce automation fingerprints and bot detection.
Routing Puppeteer through rotating residential proxies spreads requests across many IP addresses, which lets you scrape at scale with less risk of IP-based rate limiting and bans.
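One way to combine the proxy flag, page.authenticate(), and rotation is sketched below. The round-robin rotator is just one simple rotation strategy chosen for illustration, and the helper names, hosts, ports, and credentials are all hypothetical; a real proxy provider may expose a single rotating endpoint instead.

```javascript
// Sketch only: assumes `npm install puppeteer` and a list of HTTP proxies.
// createProxyRotator, proxyLaunchOptions and fetchHtmlViaProxy are hypothetical names.

// Returns a function that hands out proxies in round-robin order.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error('proxy list is empty');
  }
  let index = 0;
  return () => {
    const proxy = proxies[index];
    index = (index + 1) % proxies.length;
    return proxy;
  };
}

// Builds launch options carrying the --proxy-server flag for one proxy.
function proxyLaunchOptions(proxy) {
  return { args: [`--proxy-server=http://${proxy.host}:${proxy.port}`] };
}

async function fetchHtmlViaProxy(url, proxy) {
  // Loaded lazily so the pure helpers above work without a browser installed.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch(proxyLaunchOptions(proxy));
  try {
    const page = await browser.newPage();
    // Supply proxy credentials only when the proxy entry has them.
    if (proxy.username) {
      await page.authenticate({ username: proxy.username, password: proxy.password });
    }
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    return await page.content();
  } finally {
    await browser.close();
  }
}
```

Because the proxy is set at launch, this sketch starts a fresh browser per proxy; calling the rotator before each fetchHtmlViaProxy call moves to the next entry in the pool.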
Examples
Launching with a proxy: puppeteer.launch({ args: ['--proxy-server=http://proxy:8000'] })
Authenticating a proxy via page.authenticate({ username, password })
Using puppeteer-extra-plugin-stealth to evade bot detection while scraping
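The stealth example can be sketched as follows, assuming the puppeteer, puppeteer-extra, and puppeteer-extra-plugin-stealth packages are installed. The helper names are hypothetical, and reading navigator.webdriver is only one simple signal picked for illustration; sites may use many other checks, so a clean result here does not mean a scraper is undetectable.

```javascript
// Sketch only: assumes
// `npm install puppeteer puppeteer-extra puppeteer-extra-plugin-stealth`.
// getStealthPuppeteer, isFlaggedAsAutomated and checkWebdriverFlag are hypothetical names.

let stealthPuppeteer = null;

// Registers the stealth plugin once and reuses the wrapped puppeteer afterwards.
function getStealthPuppeteer() {
  if (!stealthPuppeteer) {
    const puppeteer = require('puppeteer-extra');
    const StealthPlugin = require('puppeteer-extra-plugin-stealth');
    puppeteer.use(StealthPlugin());
    stealthPuppeteer = puppeteer;
  }
  return stealthPuppeteer;
}

// Interprets the value of navigator.webdriver as read from a page:
// only an explicit `true` is treated as an automation flag.
function isFlaggedAsAutomated(webdriverValue) {
  return webdriverValue === true;
}

async function checkWebdriverFlag(url) {
  const browser = await getStealthPuppeteer().launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // Read the flag inside the page context, as a site's own script would.
    const webdriverValue = await page.evaluate(() => navigator.webdriver);
    return isFlaggedAsAutomated(webdriverValue);
  } finally {
    await browser.close();
  }
}
```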
Keep Learning
Web Scraping
Web scraping is the automated extraction of data from websites: fetching pages programmatically and parsing their content into structured data.
Anti-Detect Browser
An anti-detect browser lets you run many isolated browser profiles, each with its own fingerprint, cookies and proxy, so sites see them as separate, genuine users.
Rate Limiting
Rate limiting restricts how many requests a client can make in a given time, and it is one of the most common defenses scrapers must work around.
Headless Browser
A headless browser is a real browser that runs without a visible interface and is controlled by code. It is the workhorse for scraping JavaScript-heavy sites and automation.