What is covered in this tutorial?

A beginner-friendly guide to setting up a web automation and scraping framework using Puppeteer, Node.js, and JavaScript.

Introduction

🎯 Quick Answer

Puppeteer and Node.js provide a high-performance, native-feeling automation experience for Chrome and Chromium. Unlike Selenium, Puppeteer communicates directly with the browser via the DevTools Protocol, enabling faster execution, better stability, and advanced features like network interception and PDF generation. It is the preferred choice for modern web scraping, performance analysis, and UI testing in the JavaScript ecosystem.

For a long time, I wanted to look into Puppeteer, a tool developed by the Chrome DevTools team. Unlike Selenium, which drives the browser through a WebDriver layer, Puppeteer talks to the browser directly over the DevTools Protocol.

📖 Key Definitions

Puppeteer: A Node.js library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol.
DevTools Protocol: A protocol that allows for tools to instrument, inspect, debug, and profile Chromium, Chrome, and other Blink-based browsers.
Headless Mode: Running a browser without a visible UI, which is faster and uses fewer resources, ideal for CI/CD environments.
Web Scraping: The process of using bots to extract content and data from a website.

What is Puppeteer?

Puppeteer is a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.

What can be done using Puppeteer?

Most of the operations that you can do manually in the browser can be done using Puppeteer! Here are a few examples to get you started:

Generate screenshots and PDFs of pages.
Automate form submission, UI testing, keyboard input, etc.
Create an up-to-date, automated testing environment.
Capture a timeline trace of your site to help diagnose performance issues.
Test Chrome Extensions.
It is also widely used in Web Scraping.
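
To make the scraping use case concrete, here is a minimal sketch of pulling text out of a page. The helper name scrapeText is my own (it is not part of Puppeteer), and it expects a Puppeteer page object that you have already created with browser.newPage(). The selector in the usage comment is an assumption about the markup of the sample site, so check it in DevTools before relying on it.

```javascript
// Sketch: collect the trimmed text of every element matching a CSS selector.
// `scrapeText` is a hypothetical helper; `page` is a Puppeteer Page object.
async function scrapeText(page, url, selector) {
  // Wait until the DOM is parsed before querying it.
  await page.goto(url, { waitUntil: 'domcontentloaded' });

  // $$eval runs the callback inside the browser against all matching
  // elements and sends the (serializable) result back to Node.js.
  return page.$$eval(selector, (elements) =>
    elements.map((el) => el.textContent.trim())
  );
}

// Usage inside an async function (selector is an assumed example):
//   const names = await scrapeText(
//     page,
//     'https://scrapethissite.com/pages/forms/',
//     'tr.team td.name'
//   );
```

Passing the page in as an argument keeps the helper small and lets you reuse one browser tab for several scraping calls.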

🚀 Step-by-Step Implementation

Prerequisites

Install Node.js and Visual Studio Code.

Initialize Project

Create a folder named puppeteer-nodejs-javascript and run npm init --yes to generate a package.json.

Install Puppeteer

Run npm install --save puppeteer to download the library and a compatible version of Chromium.

Create Spec Folder

Create a specs folder to house your automation scripts.

Write Your Script

Create a .js file using require('puppeteer') and implement an async function to launch the browser and navigate to a URL.

Execute Script

Run your script using node specs/yourfilename.js in the terminal.

Building the Framework

Let's go through the step-by-step process of creating an automation framework using Puppeteer, Node.js, and JavaScript.

Install Node.js
Install Visual Studio Code
Create a folder named puppeteer-nodejs-javascript.
Create a .gitignore file:

Code Snippet

node_modules/
temp/
test-results/
downloads/*
log/*

Create a default package.json file:

Code Snippet

npm init --yes

Install Puppeteer:

Code Snippet

npm install --save puppeteer

Install dev dependencies:

Code Snippet

npm install --save-dev @types/node

Creating Your First Test

Create a folder named specs. Inside it, create getPageScreenshot.js:

Code Snippet

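const fs = require('fs');
const puppeteer = require('puppeteer');

async function getPageScreenshot() {
    // Create the output folder first; page.screenshot() fails if it is missing.
    fs.mkdirSync('./screenshots', { recursive: true });

    const browser = await puppeteer.launch();
    try {
        const page = await browser.newPage();

        await page.goto('https://scrapethissite.com/pages/forms/');
        console.log('User navigated to site');

        await page.screenshot({
            path: './screenshots/HockeyTeams.png'
        });
        console.log('Page screenshot taken');
    } finally {
        // Close the browser even if a step above throws.
        await browser.close();
        console.log('Browser closed');
    }
}

// Report any failure and exit with a non-zero status code.
getPageScreenshot().catch((error) => {
    console.error(error);
    process.exitCode = 1;
});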

⚠️ Common Errors & Pitfalls

Chromium Download Failure
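Sometimes npm install fails to download the browser due to network restrictions. Set PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true (PUPPETEER_SKIP_DOWNLOAD in newer Puppeteer releases) to skip the download, then point Puppeteer at a local Chrome installation, for example through the executablePath launch option.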
Zombie Processes
If your script crashes before browser.close(), Chromium processes might stay alive. Always use try...catch...finally to ensure the browser closes.
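
One way to apply this is to wrap the launch and close calls in a small helper. This is only a sketch: withBrowser is a hypothetical name of my own, and it takes the puppeteer module as an argument so the same helper works with whichever Puppeteer package you have installed.

```javascript
// Sketch: run `task` with a fresh browser and always close it afterwards.
// `withBrowser` is a hypothetical helper, not part of the Puppeteer API.
// `puppeteer` is the module returned by require('puppeteer').
async function withBrowser(puppeteer, task, launchOptions = {}) {
  const browser = await puppeteer.launch(launchOptions);
  try {
    // Hand the browser to the caller's code and pass its result through.
    return await task(browser);
  } finally {
    // Runs whether `task` resolved or threw, so no Chromium process lingers.
    await browser.close();
  }
}

// Usage inside an async function:
//   const puppeteer = require('puppeteer');
//   const title = await withBrowser(puppeteer, async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://scrapethissite.com/pages/forms/');
//     return page.title();
//   });
```

Any error thrown inside the task still reaches the caller, so you can add a catch block around the withBrowser call to log it.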
Selector Not Found
Puppeteer is fast. If you try to click an element before it's rendered, it will fail. Use page.waitForSelector() before interacting.
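
A minimal sketch of that wait-then-click pattern is shown below. The helper name clickWhenReady and the 5000 ms default timeout are my own choices for illustration; waitForSelector and its visible and timeout options come from Puppeteer.

```javascript
// Sketch: wait for an element to be visible, then click it.
// `clickWhenReady` is a hypothetical helper; `page` is a Puppeteer Page.
async function clickWhenReady(page, selector, timeoutMs = 5000) {
  // Resolves with an element handle once the element is in the DOM and
  // visible; rejects if that does not happen within `timeoutMs`.
  const element = await page.waitForSelector(selector, {
    visible: true,
    timeout: timeoutMs,
  });
  await element.click();
}

// Usage inside an async function (the selector is a made-up example):
//   await clickWhenReady(page, 'button[type="submit"]');
```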

✅ Best Practices

✔ Use headless: false during development to see what's happening in the browser.
✔ Always await each Puppeteer action. These calls return Promises, and awaiting them keeps the steps running in order.
✔ Create a dedicated screenshots directory and make sure it exists before running scripts that save images.
✔ Use page.setViewport() to ensure consistent rendering across different environments.
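
The practices above can be combined in a couple of small helpers. Treat this as a sketch under assumptions: buildLaunchOptions and screenshotWithViewport are hypothetical names, and the 1280x800 viewport and the 50 ms slowMo delay are arbitrary example values.

```javascript
const fs = require('fs');
const path = require('path');

// Sketch: show the browser window (slowed down a little) when debugging,
// and run headless otherwise. Pass the result to puppeteer.launch().
function buildLaunchOptions(debug) {
  return debug ? { headless: false, slowMo: 50 } : { headless: true };
}

// Sketch: take a screenshot with a fixed viewport, creating the target
// folder first so the write cannot fail on a missing directory.
async function screenshotWithViewport(page, filePath) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  await page.setViewport({ width: 1280, height: 800 });
  await page.screenshot({ path: filePath });
}

// Usage inside an async function:
//   const browser = await puppeteer.launch(buildLaunchOptions(true));
//   const page = await browser.newPage();
//   await page.goto('https://scrapethissite.com/pages/forms/');
//   await screenshotWithViewport(page, './screenshots/HockeyTeams.png');
```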

Frequently Asked Questions

Does Puppeteer support Firefox?

Yes. Firefox support started out as experimental, and recent Puppeteer releases drive Firefox through WebDriver BiDi. Puppeteer is still primarily optimized for Chrome/Chromium.

How do I handle multiple tabs?

Use browser.pages(), which resolves to an array of all open pages, or browser.waitForTarget() to detect new tabs opening.
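
Here is a rough sketch of the waitForTarget approach for a link that opens a new tab. The helper name openInNewTab is hypothetical, and the trigger argument is whatever action opens the tab in your case. The sketch starts waiting before the trigger runs so the new tab cannot be missed.

```javascript
// Sketch: run `trigger` (for example a click on a target="_blank" link)
// and resolve with the Page object of the tab that it opens.
// `openInNewTab` is a hypothetical helper, not part of the Puppeteer API.
async function openInNewTab(browser, page, trigger) {
  const pageTarget = page.target();

  // Wait for a target whose opener is the current page, while the
  // trigger runs at the same time.
  const [newTarget] = await Promise.all([
    browser.waitForTarget((target) => target.opener() === pageTarget),
    trigger(),
  ]);

  return newTarget.page();
}

// Usage inside an async function (the selector is a made-up example):
//   const newPage = await openInNewTab(browser, page, () =>
//     page.click('a[target="_blank"]')
//   );
//   console.log(await newPage.title());
```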

Can I use Puppeteer with Jest?

Absolutely! Puppeteer is often paired with Jest for a complete testing solution (Jest-Puppeteer).

Running the Test

To run the test, open the terminal and type:

Code Snippet

node specs/getPageScreenshot.js

Code Repository

The sample framework is hosted on GitHub: puppeteer-nodejs-javascript

Have a suggestion or found a bug? Fork this project to help make this even better.

Conclusion

Puppeteer makes web automation in the JavaScript ecosystem much more approachable. Its direct connection to the browser over the DevTools Protocol gives you fast, fine-grained control over Chrome and Chromium. By following this guide, you've taken the first step toward mastering a tool that is widely used in modern web engineering and data extraction.

📝 Summary & Key Takeaways

This guide introduced Puppeteer as a high-level API for controlling Chrome/Chromium via Node.js. We covered the fundamental definitions, provided a step-by-step setup guide for a basic automation framework, and demonstrated a practical example of capturing a page screenshot. By highlighting best practices like using waitForSelector and addressing common errors like zombie processes, this tutorial provides a solid foundation for beginners to explore the vast capabilities of Puppeteer in web testing and scraping.