Implementing a pre-renderer for SPAs (No, it’s not rocket science)

Erasmo Marín
Frontend Weekly
4 min read · Dec 2, 2017


A common problem you will run into when building a SPA is SEO. Since the HTML is dynamically rendered on the front end, web crawlers (also called spiders) can have trouble parsing your website’s HTML. It’s true that the Google bot can now deal with JavaScript, but it’s also true that it’s not perfect, and from experience I can tell you that in most cases you will see a blank page in the “fetch as Google” tool. A popular solution is server side rendering. It’s a good option if you are using a framework that supports it, but if you need a generic solution that works across frameworks, or if the cost of implementing SSR is too high, then you should consider using a pre-renderer instead.

What is a website pre-renderer?

A website pre-renderer is a tool capable of rendering your website just as the end user will see it and saving the result as HTML, or even as a PDF or image file. It is basically like running a web browser server-side.

No, a pre-renderer is not rocket science, as some people say, and I will show you how easy it is to implement one yourself using Puppeteer.

Puppeteer

In this article we will be using Puppeteer, a headless Chrome Node API created and maintained by the Chrome DevTools team. I tested other available libraries, but Puppeteer is the one I liked the most.

Generating and saving snapshots

With Puppeteer you can do everything you can do in a web browser: open a page, navigate to a URL, access the content, and even execute some JavaScript to modify the DOM.

So the first thing we will do is write a small function that fetches a URL and returns the rendered HTML. First, we need to instantiate a Puppeteer browser, which will launch a Chromium process and connect to it. Then we will navigate to the web page, wait until all network connections are idle, and return the HTML. Most of the API calls return promises, so I will be using async/await everywhere.
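A minimal sketch of such a function could look like this (the renderPage name and the networkidle0 wait are illustrative choices):

```javascript
const puppeteer = require('puppeteer');

// Render a URL in headless Chrome and return the resulting HTML.
async function renderPage(url) {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // 'networkidle0' resolves once there have been no network connections
  // for 500 ms, giving the SPA time to fetch its data and render.
  await page.goto(url, { waitUntil: 'networkidle0' });
  const html = await page.content();
  await browser.close();
  return html;
}
```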

To serve static files we need to save the rendered HTML in a file. I will be using fs-path to generate the file and the path at the same time. Also, I will use the URL pathname to decide where to save the file. So, if your website URL looks like this:

https://mycoolwebsite.com/a/very/long/pathname/283749837472

The path in the file system will be:

./a/very/long/pathname/283749837472.html

For simplicity we will assume that the URL doesn’t contain a query string or invalid characters. A root pathname will be saved as index.html.
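A sketch of the saving step could look like this; it assumes fs-path’s writeFile creates any missing directories along the path (saveSnapshot and the output directory are illustrative names):

```javascript
const fsPath = require('fs-path');
const { URL } = require('url');

// Map the URL pathname to a file path and write the snapshot to disk.
function saveSnapshot(url, html, outputDir = './snapshots') {
  const { pathname } = new URL(url);
  // A root pathname becomes index.html; everything else mirrors the pathname.
  const fileName = pathname === '/' ? '/index' : pathname.replace(/\/+$/, '');
  fsPath.writeFile(`${outputDir}${fileName}.html`, html, (err) => {
    if (err) throw err;
  });
}
```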

Then we will call our crawler in a loop to generate all the files, but instead of creating a new browser instance every time, I will pass a browser instance as a parameter to the crawler function.
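The crawler function could then be a small variation of the renderer above, reusing the shared browser instance:

```javascript
// Render a URL with a browser instance that is created once by the caller.
async function crawlPage(browser, url) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const html = await page.content();
  await page.close();
  return html;
}
```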

And then the main function:
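A possible version, with a hard-coded list of routes standing in for whatever source you use (sitemap, anchors, etc.):

```javascript
async function main() {
  // Illustrative values: replace with your own domain and routes.
  const baseUrl = 'https://mycoolwebsite.com';
  const routes = ['/', '/about', '/a/very/long/pathname/283749837472'];

  const browser = await puppeteer.launch();
  for (const route of routes) {
    const url = `${baseUrl}${route}`;
    const html = await crawlPage(browser, url);
    saveSnapshot(url, html);
  }
  await browser.close();
}

main().catch(console.error);
```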

If you already have a sitemap, you can use it to get all the routes. Another approach is to get all the URLs from the website’s anchors, which you can also do with Puppeteer.
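For example, a sketch that collects the href of every anchor on a rendered page (extractLinks is an illustrative name):

```javascript
// Discover routes by reading every anchor href from a rendered page.
async function extractLinks(browser, url) {
  const page = await browser.newPage();
  await page.goto(url, { waitUntil: 'networkidle0' });
  const links = await page.$$eval('a[href]', (anchors) =>
    anchors.map((a) => a.href)
  );
  await page.close();
  return links;
}
```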

Serving the snapshots

If you are using Express, you can write a small middleware to serve the files. Static files should only be sent to web crawlers, so I will use spider-detector, a small utility that includes a list of the most common web crawler user agents and an Express middleware. Also, you should exclude some routes, like static files or API endpoints if you have any. I will leave that logic to you.

The express middleware should look something like this:
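Here is a sketch, assuming spider-detector exposes an isSpider(userAgent) check; the snapshot directory and the /api exclusion are placeholders:

```javascript
const express = require('express');
const path = require('path');
const spiderDetector = require('spider-detector');

const app = express();
const SNAPSHOT_DIR = path.resolve(__dirname, 'snapshots');

// Serve a pre-rendered snapshot only when the request comes from a crawler.
app.use((req, res, next) => {
  const userAgent = req.headers['user-agent'] || '';
  // Skip regular users and API routes; adapt this check to your own app.
  if (!spiderDetector.isSpider(userAgent) || req.path.startsWith('/api')) {
    return next();
  }
  const file = req.path === '/' ? '/index.html' : `${req.path}.html`;
  res.sendFile(path.join(SNAPSHOT_DIR, file), (err) => {
    if (err) next(); // fall back to the SPA if the snapshot doesn't exist
  });
});
```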

Performance concerns

Rendering pages like this is a heavy task, so you should run it in a separate process. It also takes a lot of time, and the fetching error ratio is high. You should take all these cases into consideration to avoid serving a file that failed to generate.

The code examples in this article are for demonstration only; real world code should have better error handling and configuration options. For example, it could be interesting to remove your app’s scripts from the DOM to prevent API calls in the version you send to the web spiders. Also, you should handle URLs with query strings or characters that are not allowed by the file system.

Conclusion

SEO is a problem for SPAs. Web spiders like the Google bot don’t reliably wait for the page to be fully rendered, so in most cases they will see a blank page instead of your content. It’s important to send them a fully rendered page, and pre-renderers are a good and easy solution to this problem, since they are framework agnostic and can be implemented anywhere.
