Server side rendering with puppeteer and headless chrome: a practical guide

Dejan Blazeski
The Startup

TL;DR: serve the JS app to users and server side rendered content to bots (the source code is available here: http://bit.ly/2m6HN8w)

While building Binge, I chose VueJs as my framework and built the site as a single page application (SPA). I'm happy with both choices.

If you (assuming you're a dev) view the page source, you will only see a couple of lines of code: some meta tags and an almost empty <body> tag with a root div and a bundled JS file. For people visiting it, it behaves like a typical website: the browser downloads the bundle and renders the page.

When Google (or another search engine or social network) parses it, all it sees is that (almost) empty <body>.

Google does understand JS, but it can take a couple of days to render and index it (as this Google Webmasters video says).

The solution I used: server side rendering with puppeteer and headless Chrome. My goals for implementing SSR were:

  • When people share a link on a social media platform, the HTML should already be rendered so that a pretty preview (you know, text & image) is shown instead of just plain text or a bare link. If Facebook can't parse the HTML, it won't make the link look pretty.
  • Search engines understand HTML. When bots visit the site, pre-render the page and serve them the rendered version.
  • It should be resource friendly and fast to load.

The setup we will use:

  • node server (with expressjs)
  • puppeteer with reusable headless chrome instance
  • yarn package manager instead of npm in the examples

Let's use the existing GitHub repo to speed up the setup. I'll explain the important bits along the way.

#clone the repo
git clone https://github.com/dblazeski/express-chrome-ssr.git
#install the dependencies
cd express-chrome-ssr
yarn install

Our entry file is ./src/index.js. Importing express for our server and puppeteer for managing our headless Chrome instance and rendering the URL, then booting our app using express, looks like this:
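As a rough guide, a minimal entry file along these lines might look like the sketch below. It is not the repo's exact code: the names createApp and start, the default port 8080 and the CommonJS require style are assumptions for illustration, and express is passed in as a parameter only so the snippet can be tried without the package installed. puppeteer comes into play once the /ssr route is added.

```javascript
// Sketch of ./src/index.js. createApp, start and the port are illustrative
// assumptions; see the repo for the real file.

// WebSocket endpoint of the shared headless Chrome instance.
// It stays null until the first render request launches Chrome.
let browserWSEndpoint = null;

// Create the express app. The express module is a parameter (defaulting to
// the real package) only so this sketch can be exercised with a stand-in.
function createApp(express = require('express')) {
  const app = express();
  return app;
}

// Boot the server and return whatever app.listen() returns
// (an http.Server when the real express package is used).
function start(port = 8080, express = require('express')) {
  const app = createApp(express);
  return app.listen(port, () => {
    console.log(`SSR server listening on port ${port}`);
  });
}
```

Calling start() at the bottom of the file would boot the server on the assumed port.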

That’s the minimum setup we need for the server.

Let's add a route that accepts a url parameter: /ssr?url=http://google.com. Once the server is running, we pass it a URL and get the rendered HTML back as the response.

Here’s the code:

Our /ssr route handled by express and passed to the ssr function we imported above

What happens?

  • We're registering our /ssr?url route.
  • browserWSEndpoint (initialized to null in the first file above) is the WebSocket endpoint of our headless Chrome instance. We talked about performance: reusing the same instance is a lot faster, since we only manage new “pages” (or tabs, if you will) on it. In my tests, this saves more than 0.5s on a 2s total response time. Chrome is only launched the first time we ping our server, or again after it crashes.
  • We're calling the async function ssr, which we imported above. We'll go over this function next.
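The route described above can be sketched roughly as follows. This is an approximation under stated assumptions, not the repo's code: registerSsrRoute is a made-up helper name, ssr is assumed to resolve to an object of the form { html, status }, the ./ssr path and the error messages are illustrative, and resetting the endpoint on any error is one simple way to get the "relaunch after a crash" behaviour. puppeteer and ssr can be injected only so the snippet can be tried with stubs.

```javascript
// Declared in the entry file; repeated here so the snippet stands alone.
let browserWSEndpoint = null;

// Registers GET /ssr?url=... on an express-style app.
// Assumes ./ssr exports the ssr function directly (module.exports = ssr).
function registerSsrRoute(
  app,
  { puppeteer = require('puppeteer'), ssr = require('./ssr') } = {}
) {
  app.get('/ssr', async (req, res) => {
    const { url } = req.query;
    if (!url) {
      return res.status(400).send('Missing url parameter, e.g. /ssr?url=https://example.com');
    }

    try {
      // Launch Chrome only if we have no endpoint yet: on the first
      // request, or after the endpoint was reset in the catch block.
      if (!browserWSEndpoint) {
        const browser = await puppeteer.launch();
        browserWSEndpoint = browser.wsEndpoint();
      }

      // Render the page in a new tab of the shared Chrome instance.
      const { html, status } = await ssr(url, browserWSEndpoint);
      return res.status(status).send(html);
    } catch (err) {
      // Chrome may have crashed: forget the endpoint so that the next
      // request launches a fresh instance.
      browserWSEndpoint = null;
      return res.status(500).send('Rendering failed');
    }
  });
}
```

With the real packages installed, wiring it up would be a single call such as registerSsrRoute(app) after the app is created.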

Let’s take a look at the ssr function that actually does the render:

Our ssr function, see the github repo for the full version

What happens?

  • #1 Our async function accepts two params: the url and the browserWSEndpoint of the existing Chrome instance.
  • #3 We connect to our browser and open a new page (or tab).
  • #[6–10] We wait for Chrome to fetch the url and render the page. networkidle0 comes in handy for SPAs that load resources with fetch requests; networkidle2 comes in handy for pages that do long-polling or any other side activity.
  • #[13–18] We're adding the base tag to make sure relative links work.
  • #[20–24] We remove all scripts, as they have already been executed.
  • #[26–29] We get the page content and close the page (tab).
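The steps above can be sketched roughly as follows. This is an approximation, not the repo's version, so the line markers in the list do not map onto it. Assumptions: the function returns { html, status }, a missing navigation response is treated as status 200, networkidle0 is chosen because the target is an SPA, and the optional third parameter exists only so the snippet can be tried with a stub (normal callers pass two arguments).

```javascript
// Renders `url` in the shared headless Chrome and resolves to the final HTML
// plus the HTTP status of the navigation.
async function ssr(url, browserWSEndpoint, puppeteer = require('puppeteer')) {
  // Attach to the already-running Chrome and open a new page (tab).
  const browser = await puppeteer.connect({ browserWSEndpoint });
  const page = await browser.newPage();

  try {
    // Wait until there have been no network connections for a short while,
    // so the SPA has had time to fetch its data and render.
    // Use 'networkidle2' instead for pages that long-poll.
    const response = await page.goto(url, { waitUntil: 'networkidle0' });

    // Runs inside the page: add a <base> tag so relative links resolve
    // against the original URL, then remove scripts (already executed).
    await page.evaluate((pageUrl) => {
      const base = document.createElement('base');
      base.href = pageUrl;
      document.head.prepend(base);
      document.querySelectorAll('script').forEach((el) => el.remove());
    }, url);

    // Serialize the rendered DOM.
    const html = await page.content();
    return { html, status: response ? response.status() : 200 };
  } finally {
    // Close the tab and drop our connection; Chrome itself keeps running.
    await page.close();
    await browser.disconnect();
  }
}
```

Closing the page in a finally block means a failed navigation does not leave orphaned tabs behind in the shared instance.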

Once the render is complete, the result is passed back to our index.js file and sent to the client (#[18–20] in ssr-2.js). That's the HTML we're after!

The content can then be printed in the browser and bots can parse it 🎉

Sending the content to the browser

This final step can vary depending on your programming language / framework. I use Laravel, so the example will be in Laravel / PHP — but I’m sure it’s easy to understand.

  • Check if the visitor is a bot
  • Ping our nodejs server with the url and get the html
  • Output the html directly to the browser

A good PHP package for user agent detection is CrawlerDetect (it supports all popular frameworks).

Pseudo code example:
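For a Node front end, the same three steps can be sketched as Express-style middleware (JavaScript here rather than Laravel / PHP, to match the render server). Everything named in it is an assumption for illustration: isBot, prerenderForBots, the ssrEndpoint default and the fetchImpl option are made up, and the short regular expression is a crude stand-in for a maintained crawler list such as CrawlerDetect. The default fetch relies on the global available in Node 18+.

```javascript
// Crude user-agent check; a real setup would use a maintained crawler list.
const BOT_PATTERN = /bot|crawler|spider|facebookexternalhit|slurp|whatsapp/i;

// Step 1: check if the visitor is a bot.
function isBot(userAgent) {
  return typeof userAgent === 'string' && BOT_PATTERN.test(userAgent);
}

// Returns express-style middleware: bots get pre-rendered HTML,
// everyone else falls through to the normal SPA response.
function prerenderForBots({
  ssrEndpoint = 'http://localhost:8080/ssr', // assumed address of the render server
  fetchImpl = fetch,                         // injectable so it can be stubbed
} = {}) {
  return async function prerender(req, res, next) {
    if (!isBot(req.get('user-agent'))) {
      return next();
    }

    try {
      // Step 2: ping the render server with the full URL being requested.
      const pageUrl = `${req.protocol}://${req.get('host')}${req.originalUrl}`;
      const response = await fetchImpl(
        `${ssrEndpoint}?url=${encodeURIComponent(pageUrl)}`
      );
      const html = await response.text();

      // Step 3: output the rendered HTML directly.
      return res.status(response.status).send(html);
    } catch (err) {
      // Render server unreachable: serve the regular SPA instead.
      return next();
    }
  };
}
```

It would be mounted before the routes that serve the SPA, for example app.use(prerenderForBots()). Falling back to next() on errors keeps the site reachable for bots even when the render server is down.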

The source code with examples you can use is available on http://bit.ly/2m6HN8w. The repo also has ready server scripts (see package.json) you can use with nodemon or pm2.

I started using SSR in an attempt to serve the content bots require for rich link previews. If you're an avid movie fan, check out my latest project, Binge.

Thanks for reading.


Dejan Blazeski

Founder binge.app. Curious dev. Laravel / Angular / VueJs / React Native / Docker. Love stand up shows.