Single Page Applications, commonly referred to as SPAs, are a way to reduce page load times and improve the user experience. This is achieved by not reloading the page. The absence of page reloads means that, after the initial load, only data is sent over the wire. The result is a drastic decrease in the time needed to navigate and display data.
Our website, Forcit.be, is a Single Page Application written in Angular 4. After some analysis, our team found that Googlebot was crawling the website poorly. I would like to share some methods for SEO on SPAs with you. We will look at Angular 4 in particular, followed by a general method of providing indexable pages.
SEO in Angular 4
Angular has its own way of dealing with SEO, called Angular Universal. This technique pre-renders the HTML on the server and transmits it to the client. Server-side rendering ensures that the spider crawling your page receives old-fashioned HTML, which spiders are built to handle.
There are some disadvantages to this method:
- The server hosting your website should have Node enabled. Most hosting providers don’t offer Node yet.
- Checks on the environment should be made before accessing the window, document and navigator objects, as these only exist in the browser.
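The second point can be illustrated with a minimal sketch. The guard below is an assumption on my part; an Angular Universal project would typically inject `PLATFORM_ID` and use `isPlatformBrowser()` instead, but the underlying idea is the same:

```javascript
// Guard browser-only globals so server-side code doesn't crash.
// On the server (Node), `window` is undefined; touching it directly
// would throw a ReferenceError during pre-rendering.
const isBrowser = typeof window !== 'undefined';

// Hypothetical helper: returns a safe fallback when rendered on the server.
function getViewportWidth() {
  return isBrowser ? window.innerWidth : 0;
}
```

Any code path that touches `window`, `document` or `navigator` needs a check like this before Angular Universal can render it on the server.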
There were some problems implementing this into our website. Therefore, and for the sake of experimentation, I started my search for a method to provide indexable pages.
A general method for indexable pages
The requirements for my solution were:
- No changes to the current codebase
- Low load & response time for serving the spiders
- And last but not least, serve an easily indexable page
Earlier in this post I noted that spiders are built to process HTML. Therefore, the decision to serve static HTML was easily made: I had to serve static HTML to the spiders and present the SPA to normal users.
Distinguishing spiders from normal users can be done via an HTTP header that is sent along with each request: the user agent. Spiders use this string to identify themselves. Examples of spider user agents are:
- Google: googlebot
- Bing: bingbot
- DuckDuckGo: duckduckbot
- Facebook: facebookexternalhit
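On the server, a check against these markers could look like the sketch below. The function name and the marker list are my own; a production bot list would be considerably longer:

```javascript
// Substrings taken from the spider user agents listed above.
const BOT_MARKERS = ['googlebot', 'bingbot', 'duckduckbot', 'facebookexternalhit'];

// Return true when the user agent string identifies a known spider.
function isSpider(userAgent) {
  const ua = (userAgent || '').toLowerCase();
  return BOT_MARKERS.some((marker) => ua.includes(marker));
}
```

A request whose user agent matches one of these markers gets the static HTML; everyone else gets the SPA.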
Generating static HTML
I was feeling a little worried about this endeavor until I decided to look into headless browsers. A headless browser behaves like a normal browser but has no graphical user interface. While this makes it pointless for normal users, it is ideal for automated tasks such as testing, as the processing overhead of drawing the view is absent.
With renewed courage I found Puppeteer, a Node library to control headless Chrome. This library was exactly what I needed: it lets you navigate to a page and extract its HTML after the SPA has finished rendering. All you need is a list of routes to all your pages.
The code below visits all supplied routes and stores the rendered HTML in the following tree.
```
static
├── …
│   ├── full-product.html
│   ├── ideation.html
│   ├── mvp.html
│   └── prototyping.html
└── …
    ├── x.html
    └── y.html
```
We now have an HTML file for each route of our application. However, we are not done: the extracted HTML still contains the SPA's bootstrap code. This code needs to be removed for the page to be truly static; otherwise the app may try to bootstrap, fail, and leave us with a blank page.
Removing bootstrap code
The following approach works for Angular 4, your mileage may vary.
To remove the bootstrap code, we need to remove the script tags that contain it. This can be done with cheerio, a server-side implementation of jQuery's API.
Serving the static HTML
All that is left now is to serve the static HTML to the spiders. This can be done using a .htaccess file, a configuration file used by Apache, the most common web server offered by hosting providers.
Assuming the static files are located in /static, you can use the following configuration.
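The original configuration is not reproduced here, but a minimal sketch using mod_rewrite might look like this (assuming mod_rewrite is enabled and the files in /static mirror the application's routes):

```apache
RewriteEngine On
# If the visitor's user agent matches a known spider...
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|duckduckbot|facebookexternalhit) [NC]
# ...and a pre-rendered file exists for the requested path...
RewriteCond %{DOCUMENT_ROOT}/static%{REQUEST_URI}.html -f
# ...serve the static HTML instead of the SPA.
RewriteRule ^(.*)$ /static/$1.html [L]
```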
The example does not contain all the spiders out there, but the most common ones are present.
Testing the solution
There are multiple ways to change your browser's user agent. Chrome has a built-in setting to change it in the Chrome Developer Tools, under Settings > More tools > Network conditions.
The scraping of the routes and rendering of the static HTML is integrated into our Continuous Integration pipeline: a change to the website's content queues the static HTML build process. That being said, CI is a topic for another post.
Learned something? Click the 👏 to say “thanks!” and feel free to share stories with your network. Be part of our journey to inspire people, adventure into the future together and have a lasting impact.