Gain web performance insights with GCP Cloud Functions and Puppeteer

David Verdejo
Published in bluekiri
5 min read · Sep 9, 2019

It has been a long time since my last story on Medium…

source: https://www.bradycarlson.com

But the wait is over. I am building on my previous story, "Create a multiregional http monitor in less than five minutes with google cloud function", and this time I will use Puppeteer and Cloud Functions to gain web performance insights for our websites.

Puppeteer

Puppeteer is “a Node library which provides a high-level API to control Chrome or Chromium over the DevTools Protocol. Puppeteer runs headless by default, but can be configured to run full (non-headless) Chrome or Chromium.”

The main benefits of using Puppeteer are:

  • Faster than Selenium
  • Easy to install and update
  • Maintained by Google

The cons are:

  • Only supports tests in Chrome
  • Only works with Node.js

If you want to try it online, go to https://try-puppeteer.appspot.com and have fun.

We can use Puppeteer for:

  • Rendering
  • Web scraping
  • Web performance
  • Testing

We are going to focus on the web performance of our websites. We want to use the browser's JavaScript performance APIs through Puppeteer to gather statistics on how our site performs for a user in different locations. Two specific APIs measure how fast documents and resources load for users:

  • Navigation Timing collects performance metrics for HTML documents.
  • Resource Timing collects performance metrics for document-dependent resources such as style sheets, scripts, and images.
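
As a rough sketch of what Resource Timing data can tell us, the snippet below groups entries by initiator type. The function name summarizeResources and the sample entries are made up for illustration; in Puppeteer the real entries could be read with performance.getEntriesByType('resource') inside page.evaluate.

```javascript
// Sketch: summarize Resource Timing entries by initiator type.
// In Puppeteer, the entries could be obtained with something like:
//   const entries = JSON.parse(await page.evaluate(
//     () => JSON.stringify(performance.getEntriesByType('resource'))));
function summarizeResources(entries) {
  const summary = {};
  for (const entry of entries) {
    const type = entry.initiatorType || 'other';
    if (!summary[type]) {
      summary[type] = { count: 0, transferSize: 0, slowest: 0 };
    }
    summary[type].count += 1;
    summary[type].transferSize += entry.transferSize || 0;
    summary[type].slowest = Math.max(summary[type].slowest, entry.duration);
  }
  return summary;
}

// Hypothetical entries, reduced to the fields used above.
const sampleEntries = [
  { name: 'https://example.com/app.js', initiatorType: 'script', transferSize: 1200, duration: 40 },
  { name: 'https://example.com/vendor.js', initiatorType: 'script', transferSize: 800, duration: 65 },
  { name: 'https://example.com/style.css', initiatorType: 'link', transferSize: 300, duration: 20 },
];

console.log(summarizeResources(sampleEntries));
```

A summary like this makes it easier to spot which kind of resource dominates the transfer size or the load time.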

For example, the output of the navigation timing could be:

{
"connectEnd": 152.20000001136214,
"connectStart": 85.00000007916242,
"decodedBodySize": 1270,
"domComplete": 377.90000007953495,
"domContentLoadedEventEnd": 236.4000000525266,
"domContentLoadedEventStart": 236.4000000525266,
"domInteractive": 236.2999999895692,
"domainLookupEnd": 85.00000007916242,
"domainLookupStart": 64.4000000320375,
"duration": 377.90000007953495,
"encodedBodySize": 606,
"entryType": "navigation",
"fetchStart": 61.600000015459955,
"initiatorType": "navigation",
"loadEventEnd": 377.90000007953495,
"loadEventStart": 377.90000007953495,
"name": "https://example.com/",
"nextHopProtocol": "h2",
"redirectCount": 0,
"redirectEnd": 0,
"redirectStart": 0,
"requestStart": 152.50000008381903,
"responseEnd": 197.80000008177012,
"responseStart": 170.00000004190952,
"secureConnectionStart": 105.80000001937151,
"startTime": 0,
"transferSize": 789,
"type": "navigate",
"unloadEventEnd": 0,
"unloadEventStart": 0,
"workerStart": 0
}
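
The individual timestamps are most useful as differences between them. Here is a minimal sketch, assuming an entry shaped like the one above; the function name navigationPhases and the phase names are my own choice, not part of any API.

```javascript
// Round to one decimal place to hide floating-point noise.
const round1 = (ms) => Math.round(ms * 10) / 10;

// Sketch: turn a navigation timing entry into per-phase durations (ms).
function navigationPhases(entry) {
  return {
    dns: round1(entry.domainLookupEnd - entry.domainLookupStart),
    connect: round1(entry.connectEnd - entry.connectStart),
    // secureConnectionStart is 0 for plain HTTP, so report no TLS time then.
    tls: entry.secureConnectionStart > 0
      ? round1(entry.connectEnd - entry.secureConnectionStart)
      : 0,
    waiting: round1(entry.responseStart - entry.requestStart),
    download: round1(entry.responseEnd - entry.responseStart),
    domProcessing: round1(entry.domComplete - entry.responseEnd),
  };
}

// The relevant fields from the sample output above.
const sampleEntry = {
  domainLookupStart: 64.4000000320375,
  domainLookupEnd: 85.00000007916242,
  connectStart: 85.00000007916242,
  secureConnectionStart: 105.80000001937151,
  connectEnd: 152.20000001136214,
  requestStart: 152.50000008381903,
  responseStart: 170.00000004190952,
  responseEnd: 197.80000008177012,
  domComplete: 377.90000007953495,
};

console.log(navigationPhases(sampleEntry));
```

For the sample entry, the connection phase takes about 67 ms, of which about 46 ms is the TLS handshake, while the server answers in under 18 ms.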

The following figure illustrates the timing attributes defined by the PerformanceTiming interface and the PerformanceNavigation interface with or without redirect, respectively.

source: https://www.w3.org

Cloud Functions + Puppeteer = Perfect match

We are going to create a new Cloud Function. We need enough memory to run Puppeteer, and we are going to trigger the execution via HTTP. The most important field is the runtime: we have to select Node.js 8 to run Puppeteer.

Next, we need to set up the dependencies in our package.json file:

{
  "name": "puppeteer",
  "version": "0.0.1",
  "dependencies": {
    "puppeteer": "^1.8.0"
  }
}

And put the code in the index.js:

const puppeteer = require('puppeteer');

// Launch headless Chrome and open a blank page.
const openConnection = async () => {
  // --no-sandbox is required in the Cloud Functions environment.
  const browser = await puppeteer.launch({args: ['--no-sandbox']});
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/74.0.3729.169 Chrome/74.0.3729.169 Safari/537.36 - QA testing'
  );
  // Optionally set the screen resolution:
  // await page.setViewport({ width: 1680, height: 1050 });
  return { browser, page };
};

// Keep only the requested fields, expressed in ms since navigationStart.
const extractDataFromPerformanceTiming = (timing, ...dataNames) => {
  const navigationStart = timing.navigationStart;
  const extractedData = {};
  dataNames.forEach(name => {
    extractedData[name] = timing[name] - navigationStart;
  });
  return extractedData;
};

// Close the page and the browser if they were created.
const closeConnection = async (page, browser) => {
  page && (await page.close());
  browser && (await browser.close());
};

// HTTP entry point: load the page given in ?url= and return its timings.
exports.webperformance = async (req, res) => {
  const url = req.query.url;
  if (!url) {
    return res.send('Please provide URL as GET parameter, for example: <a href="?url=https://example.com">?url=https://example.com</a>');
  }
  let { browser, page } = await openConnection();
  try {
    await page.goto(url, { waitUntil: 'load' });
    // Serialize inside the page, because the timing object cannot be
    // passed out of the browser context directly.
    const performanceTiming = JSON.parse(
      await page.evaluate(() => JSON.stringify(window.performance.timing))
    );
    const data = extractDataFromPerformanceTiming(
      performanceTiming,
      'responseEnd',
      'domInteractive',
      'domContentLoadedEventEnd',
      'loadEventEnd'
    );
    res.status(200).send(data);
  } catch (err) {
    res.status(500).send(err.message);
  } finally {
    await closeConnection(page, browser);
  }
};

Finally, select the function to execute, in this case webperformance:

Before moving ahead, let's explain the code. The first step is to require Puppeteer:

const puppeteer = require('puppeteer');

Next, we move to the main function, webperformance. First, we check whether we have a URL to test; if we do, we call openConnection to create a Puppeteer instance. Inside it, we set the arguments to launch Puppeteer (note: --no-sandbox is required in this environment), we can force a user agent (remember: it only works with Chrome), and we can set the screen resolution for our test:

const openConnection = async () => {
  const browser = await puppeteer.launch({args: ['--no-sandbox']});
  const page = await browser.newPage();
  await page.setUserAgent(
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/74.0.3729.169 Chrome/74.0.3729.169 Safari/537.36 - Bluekiri QA'
  );
  // Optionally set the screen resolution:
  // await page.setViewport({ width: 1680, height: 1050 });
  return { browser, page };
};

Once Puppeteer is ready, we request the page and read the navigation timing data. Finally, we filter the fields received and keep only a few of them (“as you wish…”):

await page.goto(url, { waitUntil: 'load' });
const performanceTiming = JSON.parse(
  await page.evaluate(() => JSON.stringify(window.performance.timing))
);
const data = extractDataFromPerformanceTiming(
  performanceTiming,
  'responseEnd',
  'domInteractive',
  'domContentLoadedEventEnd',
  'loadEventEnd'
);
res.status(200).send(data);
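
To see what the filtering step produces, here is the helper run on a hand-made timing object. The values in window.performance.timing are epoch milliseconds, so subtracting navigationStart gives offsets from the start of the navigation. The sample numbers are invented for illustration.

```javascript
// Same helper as in index.js: keep the requested fields,
// expressed in milliseconds since navigationStart.
const extractDataFromPerformanceTiming = (timing, ...dataNames) => {
  const navigationStart = timing.navigationStart;
  const extractedData = {};
  dataNames.forEach(name => {
    extractedData[name] = timing[name] - navigationStart;
  });
  return extractedData;
};

// Invented sample, shaped like window.performance.timing.
const sampleTiming = {
  navigationStart: 1568030400000,
  responseEnd: 1568030400198,
  domInteractive: 1568030400236,
  domContentLoadedEventEnd: 1568030400236,
  loadEventEnd: 1568030400378,
};

const extracted = extractDataFromPerformanceTiming(
  sampleTiming,
  'responseEnd',
  'domInteractive',
  'domContentLoadedEventEnd',
  'loadEventEnd'
);

console.log(extracted);
// { responseEnd: 198, domInteractive: 236,
//   domContentLoadedEventEnd: 236, loadEventEnd: 378 }
```

Any field name from the timing object can be added to the list in the same way.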

Finally, we close the Puppeteer instance to clean up our environment:

   await closeConnection(page, browser);

Let’s try our function. Go to the Trigger tab and open the provided URL in a new browser tab:

And then: https://<gcp_region>-<project_id>.cloudfunctions.net/<cloud_function>?url=<page to test>
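
Since the function loads whatever is passed in the url parameter, it may be worth validating it before launching Chrome. This is a minimal sketch, assuming we only want to accept http and https pages; parseTargetUrl is a hypothetical helper, not something the function above already has.

```javascript
const { URL } = require('url');

// Sketch: return a normalized http(s) URL string, or null if the
// input is missing, malformed, or uses another scheme.
function parseTargetUrl(raw) {
  if (typeof raw !== 'string' || raw.length === 0) {
    return null;
  }
  try {
    const parsed = new URL(raw);
    return ['http:', 'https:'].includes(parsed.protocol) ? parsed.href : null;
  } catch (err) {
    // new URL() throws on input it cannot parse.
    return null;
  }
}

console.log(parseTargetUrl('https://example.com'));  // https://example.com/
console.log(parseTargetUrl('file:///etc/passwd'));   // null
console.log(parseTargetUrl('not a url'));            // null
```

In webperformance, the result could replace the plain !url check, so that anything other than a web page is rejected before the browser starts.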

One more thing…

We can send our test results to Cloud Pub/Sub, use Cloud Dataflow to read them from Pub/Sub and store them in BigQuery, and finally create a report in Data Studio to share with our teams.

First of all, we need to modify our package.json to include the Pub/Sub client library:

{
  "name": "puppeteer",
  "version": "0.0.1",
  "dependencies": {
    "puppeteer": "^1.8.0",
    "@google-cloud/pubsub": "^0.30.1"
  }
}

And then publish a message with the results:

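// At the top of index.js: create the Pub/Sub client once.
const {PubSub} = require('@google-cloud/pubsub');
const pubsub = new PubSub();
// Placeholder: replace with the name of your own topic.
const topicName = '<topic_name>';

// Inside webperformance, after loading the page:
const data = extractDataFromPerformanceTiming(
  performanceTiming,
  'responseEnd',
  'domInteractive',
  'domContentLoadedEventEnd',
  'loadEventEnd'
);

const message = JSON.stringify(data);
const dataBuffer = Buffer.from(message);
const messageId = await pubsub.topic(topicName).publish(dataBuffer);
console.log(`Message ${messageId} sent to Pub/Sub`);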
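
For the report in BigQuery, it usually helps if each message also says which page was tested, from where, and when. This is a minimal sketch, assuming we add those fields ourselves; buildMessage and the field names url, region, and measuredAt are hypothetical, and the region value has to be supplied by us (for example, from the function's configuration).

```javascript
// Sketch: wrap the timing data with some context before publishing.
// All parameter and field names here are illustrative assumptions.
function buildMessage(url, data, region, measuredAt) {
  const payload = Object.assign(
    { url, region, measuredAt: measuredAt.toISOString() },
    data
  );
  // Pub/Sub expects the message body as a Buffer.
  return Buffer.from(JSON.stringify(payload));
}

const exampleBuffer = buildMessage(
  'https://example.com/',
  { responseEnd: 198, loadEventEnd: 378 },
  'europe-west1',
  new Date(Date.UTC(2019, 8, 9, 12, 0, 0))
);

console.log(exampleBuffer.toString());
```

The resulting buffer can be passed to publish in place of dataBuffer, and the extra fields become columns that the report can group and filter by.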

After that, you can use a Dataflow template to save the results to BigQuery.

And that’s all. Don’t miss the next episode from the Bluekiri team.
