Running Headless Chrome with Puppeteer and Docker

Google’s Chrome team has published Headless Chrome, causing a lot of excitement across the board and an immediate shift in the tooling landscape: the development of PhantomJS, for example, has come to a complete stop.

Puppeteer is the Node API for Headless Chrome. It can be used to control Headless Chrome over the DevTools protocol.

Most things that you can do manually in the browser can be done using Puppeteer!

You can use it for crawling, scraping, automation, generating PDFs, automated testing and much, much more. It’s an amazing tool, and will probably become the de facto standard for most of the tasks above.
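
To get a feel for the API, here’s a minimal sketch of a Puppeteer script (the URL and file names are just placeholders): it opens a page, takes a screenshot and renders a PDF.

const puppeteer = require('puppeteer');

(async () => {
  // start Headless Chrome and open a new tab
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // capture the page as an image and as an A4 PDF
  await page.screenshot({ path: 'example.png' });
  await page.pdf({ path: 'example.pdf', format: 'A4' });
  await browser.close();
})();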

The Goal

In this post, I’d like to introduce my take on a quick, Dockerized Puppeteer development environment for working with the Headless Chrome Node API locally.

To run Puppeteer on your machine, you’ll need to install all kinds of libraries and packages. This may lead to a bit of fumbling around, especially if you are on a less-well-supported OS and have to find out the names of all packages by yourself, just to discover that you need to apply all kinds of fixes to make it work properly.

That’s time you can save and spend working with Puppeteer and Headless Chrome right away instead! Don’t clutter your system: type a single command and skip the boring setup part.

Docker makes it possible. The following is the result of creating a very quick prototyping environment. You can find the final code in this GitHub repository. It’s meant for development only, and should be treated with caution :)

Starting Out

You’ll need to have Docker installed, as well as Docker Compose.

After cloning the GitHub repository, issue a

$ make rebuild

to build the image and bring up the container. By default, the container runs in the foreground, so you’ll need to switch to a new terminal tab. The image is based on the official Node image and contains additional libraries and tools needed by Puppeteer and Headless Chrome.

Once the container is up, you can enter it with:

$ make enter
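
Both targets are thin wrappers around Docker Compose. A plausible sketch of the Makefile (the actual targets in the repository may differ slightly):

rebuild:
	docker-compose down
	docker-compose build
	docker-compose up

enter:
	docker-compose exec puppeteer bash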

The only thing left to do is to install the Node packages with

$ yarn install

and you can run the example app with

$ node index.js
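
What index.js does exactly is up to the repository, but a typical example app for this setup could look like the following sketch. The launch flags are an assumption: since the code runs as root inside the container, Chrome won’t start without disabling its sandbox.

const puppeteer = require('puppeteer');

(async () => {
  // we're running as root inside the container, so Chrome's sandbox
  // has to be disabled for it to start at all
  const browser = await puppeteer.launch({
    args: ['--no-sandbox', '--disable-setuid-sandbox'],
  });
  const page = await browser.newPage();
  await page.goto('https://example.com');
  // ../output is mounted from the host (more on that below), so the
  // result shows up on your machine right away
  await page.screenshot({ path: '../output/screenshot.png' });
  await browser.close();
})();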

Understanding the Setup

The Dockerfile simply defines which additional libraries and tools are needed, apart from Node itself.

FROM node:8

RUN apt-get update

# for https
RUN apt-get install -yyq ca-certificates
# install libraries
RUN apt-get install -yyq libappindicator1 libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libnss3 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6
# tools
RUN apt-get install -yyq gconf-service lsb-release wget xdg-utils
# and fonts
RUN apt-get install -yyq fonts-liberation

What glues everything together and configures the container is the docker-compose.yml file.

version: '3'

services:
  puppeteer:
    build: .
    volumes:
    # mount code and output directories into the container
      - ./output:/app/output
      - ./code:/app/code
    working_dir: /app/code
    shm_size: 1gb # Chrome needs more shared memory than Docker's 64MB default
    # just run the container doing nothing
    entrypoint: ["sh", "-c", "sleep infinity"]

Among other things, it makes sure that the local code directory gets mounted into the container, so you can edit everything as usual with your editor of choice, while the project is immediately available inside the container.

The Node dependencies reside in the code folder as well, and are ignored by Git. This means that restarting the container does not require reinstalling everything from scratch, keeping startup times low and iteration cycles as quick as possible.

To make sure that results don’t mess with the code, an extra folder for outputs is created and mounted, so you can view the output of your code on the local machine without copying it out of the container.
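
With the hypothetical index.js sketch from above, a run inside the container leaves its screenshot right in the local output folder:

# inside the container
$ node index.js

# on your machine
$ ls output
screenshot.png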

Ways Forward

Right now, docker-compose.yml only defines a single service. We might add a database or a message queue: any supporting services our code needs.

The current Node code does not require many additional packages or libraries. You can install anything you like to create a really cool project. If you need system libraries or tools, simply add them to the Dockerfile and execute a

$ make rebuild

followed by a new

$ make enter

session.
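
For example, if your project needed image-processing tools, you might append a line like this to the Dockerfile (imagemagick is just a stand-in for whatever you actually need):

RUN apt-get update && apt-get install -yyq imagemagick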

Of course, the current setup is not meant for anything but development environments. The emphasis is on iterating quickly and seeing changes immediately. For a production setup, you’d need to put in a bit more work to make sure everything runs according to security requirements and best practices, in a practical fashion that keeps maintenance possible.

In Conclusion

This post introduced my take on a quick, Dockerized Puppeteer development environment for working with the Headless Chrome Node API locally. Clone the project and try Headless Chrome and Puppeteer yourself!

As stated above, this is my take on a quick Dockerized setup, which is meant solely for development. Headless Chrome is started without regard for possible security issues, and your code will run as root within the container. Use it with caution!
