Using Puppeteer in Google Cloud Functions
This is part of a Google Cloud Functions Tutorial Series. Check out the series for all the articles
Ever since I heard the term headless Chrome, I have been curious about what exactly it means and the kind of applications it can help write. Recently I watched an excellent talk by Eric Bidelman from Google I/O 2018 titled “The power of Headless Chrome and browser automation”. I recommend you watch at least the first half of this video to understand what headless Chrome is and how it works.
In summary, headless Chrome:
- Lets you run Chrome in a headless environment (without the visible UI shell)
- Is very useful for automated testing
- Lets you do several things with a web page, e.g. query specific elements, take a screenshot, create a PDF, etc.
For more information, you can check out the following article:
Support for headless Chrome in Google Cloud Functions environment
When Google Cloud Functions was first released, the only runtime that it supported was Node.js version 6, and the OS was missing several packages, which made it difficult to run Chrome in this fashion.
A couple of months back came the announcement that headless Chrome support was now available in App Engine standard and Cloud Functions. This was made possible by the release of the Node.js 8 runtime on App Engine standard, which is the same runtime used for Google Cloud Functions. Check out the official blog post that announced it:
Enter Puppeteer
To make things dead simple for developers, there is an npm package called Puppeteer that makes working with headless Chrome a breeze. The default installation even comes bundled with a version of Chromium, so it is self-contained and has everything you need to get started.
To install Puppeteer, simply use the following command:
npm install --save puppeteer
If you would like to learn more about Puppeteer, check out https://pptr.dev/. There is even a Puppeteer playground at https://try-puppeteer.appspot.com/
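To give a feel for the API, here is a minimal sketch of a script that opens a page and does the three things listed earlier: query an element, take a screenshot and create a PDF. The function names snapshotPage and main, the h1 selector and the output file names are assumptions chosen for illustration; they do not come from the talk or from this series.

```javascript
// Minimal Puppeteer sketch: query an element, take a screenshot, create a PDF.
// snapshotPage, main, the 'h1' selector and the file names are illustrative assumptions.

async function snapshotPage(page, url) {
  await page.goto(url);
  // Run a DOM query inside the page and return the heading text.
  // Note: $eval rejects if no element matches the selector.
  const heading = await page.$eval('h1', el => el.textContent);
  await page.screenshot({ path: 'page.png' });
  await page.pdf({ path: 'page.pdf' });
  return heading;
}

async function main(url) {
  // Loaded lazily, so snapshotPage can be tried with a stand-in page object
  // without launching Chrome.
  const puppeteer = require('puppeteer');
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    console.log(await snapshotPage(page, url));
  } finally {
    await browser.close(); // always shut Chrome down, even on errors
  }
}

// Example: main('https://example.com');
```

Running main on your own machine should leave page.png and page.pdf in the current directory, assuming the page has an h1 element.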
Our Google Cloud Function: Love is Comic
While there are multiple ways in which one could have done this, I wanted to try this out with Puppeteer and see how it goes.
Our Cloud Function will return just the Comic Strip from the page at https://loveiscomix.com/random. In short it will return HTML content that will contain just the comic strip, an example of which is shown below:
Let us check out the index.js file that contains our Cloud Function:
Before we go through the key pieces of the code, here is what I am trying to do:
1. Launch Chrome in headless mode.
2. Visit the Random Comic page at https://loveiscomix.com/random.
3. Extract the specific comic strip image via a DOM query selector.
4. Take the image URL obtained in (3) and send back a simple HTML page that contains an img HTML element with the src attribute set to that URL.
Steps (1), (2) and (3) are handled by the Puppeteer package in the following code snippet:
The package.json file is standard stuff and it contains the dependency for our puppeteer package.
{
  "name": "loveiscomic",
  "version": "1.0.0",
  "description": "LoveIsComic Puppeteer Script",
  "dependencies": {
    "puppeteer": "^1.9.0"
  }
}
Deploying the Cloud Function
Ensure that both the index.js and package.json files that we created above are present in the same directory.
We can use the gcloud functions command to deploy the function as shown below. Note that we will use an HTTP trigger for our Cloud Function, the runtime will be Node.js version 8, and we will give it ample memory (more on that later in the post):
$ gcloud functions deploy --region=us-central1 --runtime=nodejs8 \
--memory=1024MB --trigger-http sendComic
Once it is deployed, you can check that the sendComic function is available via the gcloud functions list command.
Executing our Cloud Function
You can get the details of the function via the describe command as shown below:
$ gcloud functions describe sendComic
The output from the above command will contain the httpsTrigger property, an example of which is shown below:
httpsTrigger:
  url: https://<region>-<projectid>.cloudfunctions.net/sendComic
In your case, the <region> and <projectid> above will contain the appropriate values. Simply open the above URL in your browser and it should invoke the HTTP-triggered Google Cloud Function sendComic and give you a random Love Is Comic strip. This is what I got:
A few points to note
I hit a few blocks while trying to run this Google Cloud Function. I erred in not paying enough attention to the official blog post, and learnt a couple of things about deploying Google Cloud Functions that use Puppeteer.
Chrome in Sandbox mode
When I first wrote the function, I launched the headless browser as shown below:
const browser = await puppeteer.launch();
This resulted in an exception as shown below:
Error: Failed to launch chrome! [1020/082529.501520:ERROR:zygote_host_impl_linux.cc(89)] Running as root without --no-sandbox is not supported.
This was corrected by providing the --no-sandbox flag while launching Chrome:
const browser = await puppeteer.launch({args: ['--no-sandbox']});
Allocate more memory to your function
I deployed the function with the default memory allocated to it i.e. 256MB and that is definitely not enough. I got the following error during function execution:
Error: memory limit exceeded. Function invocation was interrupted.
I had to deploy the function with the --memory=1024MB option while using the gcloud functions deploy command.
Do keep in mind that allocating more memory to your function will be reflected in your costs.
Hope you enjoyed this article. Do share what you plan to run with Puppeteer.