Puppeteer’ing in Firebase & Google Cloud Functions

Earlier this year, the Google Cloud team announced support for the Node.js 8 runtime in Google App Engine Standard. I had been stoked about this for a while because most of my server work these days happens in Node. More importantly, I had been eager to run Puppeteer and headless Chrome in the cloud…at Google scale!

The headless “browser as a service”

I got Paul Kinlan excited about the idea of running the browser in a web server. He tends to get excited about many things, but this notion of the “Browser as a Service” (BaaS) really geeked us both out. Imagine being able to set up URL endpoints that use headless Chrome and leverage features in the browser. You could do all sorts of neat stuff. For example, you could create a PDF generation web service or a handler that takes screenshots of a web page by passing it a ?url parameter. The list is truly endless!


Shortly after our geek-out, we started coding. We put together a proof-of-concept site that shows how to run headless Chrome in the cloud. It’s called “Puppeteer As A Service” (https://pptraas.com/). It has lots of examples and shows off what the browser can do when you bake it into a backend. The site runs on Google App Engine Standard (Node).

Some of our endpoint examples:

  • creating screenshots
  • generating PDFs
  • pre-rendering JS apps
  • producing DevTools timeline traces
  • collecting performance metrics
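To give a flavor of what one of these endpoints involves, here is a minimal sketch of the core of a PDF generator. It assumes a Puppeteer `browser` has already been launched and is passed in. The `renderPdf` name and the A4 format are illustrative choices, not code from the pptraas.com source.

```javascript
// Hypothetical helper (not from the pptraas.com source): renders `url`
// to a PDF using an already-launched Puppeteer browser.
async function renderPdf(browser, url) {
  const page = await browser.newPage();
  try {
    // Wait until the network is mostly idle so client-rendered content exists.
    await page.goto(url, {waitUntil: 'networkidle2'});
    // page.pdf() resolves to a Buffer holding the PDF bytes.
    return await page.pdf({format: 'A4', printBackground: true});
  } finally {
    // Close the tab but leave the browser running for other requests.
    await page.close();
  }
}

module.exports = {renderPdf};
```

An HTTP handler would then send the result with a `Content-Type` of `application/pdf`. Note that Puppeteer only supports `page.pdf()` in headless mode.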

Check out the full source on GitHub.


Running headless Chrome in a Cloud Function

For the longest time, you couldn’t run headless Chrome / Puppeteer inside a Google Cloud Function. That’s because the environment used Node 6, and its Linux runtime was missing OS packages that Chromium needs.

Although Puppeteer can run in Node 6 without transpiling, the missing OS dependencies made it impossible to run headless Chrome in Cloud Functions.

Fast forward to today…things have improved! Now, Google Cloud Functions (GCF) and Cloud Functions for Firebase (FCF) both support the same Node.js 8 runtime as App Engine Standard. This means you can write cloud functions that use Puppeteer and headless Chrome, and seamlessly author serverless apps that use all the features of a web browser!

Using Puppeteer in Google Cloud Functions

Simply install Puppeteer from npm (npm i puppeteer) and use it. The package downloads a compatible build of Chromium when it installs, so it comes with everything you need and works out of the box on App Engine, GCF, and FCF.

Puppeteer is an ideal way to control headless Chrome in environments like Google Cloud Functions and Cloud Functions for Firebase because you spend less time configuring Chrome (and its dependencies) and more time writing your own code.

The key to running Puppeteer in GCF is to include it as a dependency in your package.json and deploy the functions using the --runtime nodejs8 flag. For example:

```shell
gcloud beta functions deploy screenshot \
  --trigger-http --runtime nodejs8 --memory 1024MB
```

The rest remains the same as before: write your cloud functions and deploy.

Using Puppeteer in Firebase Functions

Another option is to use Cloud Functions for Firebase. It’s slightly nicer to use than GCF, but for all intents and purposes they’re the same thing under the hood.

The key to running Puppeteer in a Firebase function is to specify Node 8 as the runtime in package.json. My functions/package.json looks like this:

```json
{
  "name": "myfunctions",
  "version": "0.0.1",
  "engines": {
    "node": "8"
  },
  "dependencies": {
    "express": "4.16.3",
    "firebase-admin": "5.13.1",
    "firebase-functions": "2.0.2",
    "puppeteer": "1.7.0"
  }
}
```

That’s it!

Example

The example below sets up a little Express server and two endpoints to handle requests. One is a screenshot service, and the other simply prints the version of Chrome being used by Puppeteer.

Main functions/index.js file:

```javascript
const express = require('express');
const functions = require('firebase-functions');
const puppeteer = require('puppeteer');

const app = express();

// Runs before every route. Launches headless Chrome.
app.all('*', async (req, res, next) => {
  try {
    // Note: --no-sandbox is required in this env.
    // Could also launch chrome and reuse the instance
    // using puppeteer.connect()
    res.locals.browser = await puppeteer.launch({
      args: ['--no-sandbox']
    });
  } catch (e) {
    // Express 4 doesn't catch rejected promises, so forward the error.
    return next(e);
  }
  next(); // pass control to next route.
});

// Handler to take screenshots of a URL.
app.get('/screenshot', async function screenshotHandler(req, res) {
  const browser = res.locals.browser;
  const url = req.query.url;

  try {
    if (!url) {
      return res.status(400).send(
          'Please provide a URL. Example: ?url=https://example.com');
    }
    const page = await browser.newPage();
    await page.goto(url, {waitUntil: 'networkidle2'});
    const buffer = await page.screenshot({fullPage: true});
    res.type('image/png').send(buffer);
  } catch (e) {
    res.status(500).send(e.toString());
  } finally {
    // Always shut Chrome down, including on the 400 and 500 paths.
    await browser.close();
  }
});

// Handler that prints the version of headless Chrome being used.
app.get('/version', async function versionHandler(req, res) {
  const browser = res.locals.browser;
  try {
    res.status(200).send(await browser.version());
  } catch (e) {
    res.status(500).send(e.toString());
  } finally {
    await browser.close();
  }
});

const opts = {memory: '2GB', timeoutSeconds: 60};
exports.screenshot = functions.runWith(opts).https.onRequest(app);
exports.version = functions.https.onRequest(app);
```
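The comment in the middleware hints at an alternative to launching Chrome on every request: reuse one instance. The sketch below uses a slightly different technique than `puppeteer.connect()`. It caches the launched browser in module scope, which survives between invocations for as long as a function instance stays warm. `createBrowserCache` is a hypothetical helper, and the launcher is passed in so the caching logic can be exercised without Chrome.

```javascript
// Hypothetical helper: memoizes a browser launch across requests handled
// by the same (warm) function instance. `launch` is any function that
// returns a Promise for a Puppeteer Browser.
function createBrowserCache(launch) {
  let browserPromise = null;

  return function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().then((browser) => {
        // If Chrome exits or crashes, forget it so the next call relaunches.
        browser.on('disconnected', () => {
          browserPromise = null;
        });
        return browser;
      }, (err) => {
        // Don't cache a failed launch.
        browserPromise = null;
        throw err;
      });
    }
    return browserPromise;
  };
}

module.exports = {createBrowserCache};
```

At module scope you might write `const getBrowser = createBrowserCache(() => puppeteer.launch({args: ['--no-sandbox']}));`, then call `await getBrowser()` in each handler. Handlers would close only the pages they open, never the shared browser. The trade-off is that Chrome's memory stays allocated between requests and concurrent requests on one instance share the same browser.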

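The screenshot handler passes `req.query.url` straight to `page.goto()`. Because that browser runs inside your backend, you may want to accept only `http:` and `https:` URLs. Here is a minimal sketch using Node's built-in `url` module. `parseTargetUrl` is a hypothetical helper, and which schemes to allow is a policy choice.

```javascript
const url = require('url');

// Hypothetical helper: returns a normalized http(s) URL string, or null if
// `input` is missing, not a single string (Express turns ?url=a&url=b into
// an array), not an absolute URL, or uses another scheme (file:, data:, ...).
function parseTargetUrl(input) {
  if (typeof input !== 'string' || input === '') {
    return null;
  }
  let parsed;
  try {
    parsed = new url.URL(input);
  } catch (e) {
    return null; // Not an absolute URL.
  }
  const allowed = parsed.protocol === 'http:' || parsed.protocol === 'https:';
  return allowed ? parsed.href : null;
}

module.exports = {parseTargetUrl};
```

In the handler, `const url = parseTargetUrl(req.query.url);` followed by the existing `if (!url)` check would reject bad input with a 400. This only checks the scheme; it does not stop requests to hosts on internal networks, which would need a separate check.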
Lastly, you probably want to set up human-readable URLs for your endpoints instead of the default ones GCF gives you. To do that, use Firebase Hosting and map each cloud function to a URL path with rewrites in your firebase.json file.

Here’s what that looks like:

```json
{
  "hosting": {
    "public": "public",
    "cleanUrls": true,
    "rewrites": [{
      "source": "/version",
      "function": "version"
    }, {
      "source": "/screenshot",
      "function": "screenshot"
    }],
    "ignore": [
      "firebase.json",
      "**/.md",
      "**/node_modules/**"
    ]
  },
  "functions": {
    "source": "functions"
  }
}
```

This gives users a nicer URL to use:

https://pptr-functions.firebaseapp.com/screenshot vs. https://us-central1-pptr-functions.cloudfunctions.net/screenshot

and works great even if you’re using a custom domain with Firebase Hosting.
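The two URL styles follow a predictable pattern. As a rough sketch, assuming the default `us-central1` region and the project's default `firebaseapp.com` Hosting domain (`functionUrl` and `hostingUrl` are hypothetical helpers):

```javascript
// Hypothetical helpers that build the two URL styles shown above.

// Default HTTPS endpoint of a Cloud Function:
// https://REGION-PROJECT_ID.cloudfunctions.net/FUNCTION_NAME
function functionUrl(projectId, functionName, region = 'us-central1') {
  return `https://${region}-${projectId}.cloudfunctions.net/${functionName}`;
}

// A path on the default Firebase Hosting domain, which a firebase.json
// rewrite then routes to the matching function.
function hostingUrl(projectId, path) {
  return `https://${projectId}.firebaseapp.com${path}`;
}

module.exports = {functionUrl, hostingUrl};
```

For example, `functionUrl('pptr-functions', 'screenshot')` yields the cloudfunctions.net URL above, and `hostingUrl('pptr-functions', '/screenshot')` yields the firebaseapp.com one.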

The full source code is available on GitHub.

Additional ways to go headless on Google’s Cloud

Before we depart, it’s worth mentioning that Puppeteer is well supported on a number of platforms across Google’s cloud. They range from a low-level environment with complete control over a VM (GCE), to fully managed VMs (GAE Flex), to pure serverless platforms like GCF. It’s kind of daunting, so here’s a breakdown:

  1. Compute Engine (GCE) — full control over a VM. Deploys take minutes.
  2. App Engine Flex (Node.js) — Dockerized container with full benefits of App Engine (e.g. auto scaling). You can use any Node version you want. Deploys take minutes. Example app.
  3. App Engine Standard (Node.js 8) — Regular App Engine (free tier, scaling benefits, etc). Uses Node 8. Deploys take seconds. Example app.
  4. Cloud Functions / Cloud Functions for Firebase — serverless environment. Use Node 8 with the --runtime nodejs8 flag (GCF) or the engines field in package.json (Firebase). Deploys take minutes.


Written by Eric Bidelman.