SPA and SEO: Google (Googlebot) properly renders Single Page Applications and executes Ajax calls

I ran some tests to understand how the Google search engine handles a Single Page Application. I built the test website in Elm, but the same results should also hold for React, Angular or any other language or framework.

Findings Overview

  1. Googlebot runs the JavaScript on the page, and the Ajax calls are properly executed
  2. Googlebot waits between 5 and 20 seconds before taking a snapshot of each page
  3. The fetching done on request from the Search Console (I call these “T5”) and the “natural” fetching done by Google (I call these “T20”) are different
  4. T5 fetches take a snapshot after around 5 seconds, T20 fetches after around 20 seconds
  5. Different sections of the page are snapshotted at different times. For example, in the T20 case the title always shows T19 while the meta-description shows T20
  6. There are mysterious situations where the snapshot shows an impossible state. For example, the snapshot is taken after 5 seconds but the page already shows the result of the Ajax call that arrived after 10 seconds

Test methodology

The website used for this test is a Single Page Application that:

  • is built in Elm 0.18
  • uses pushState to navigate across pages
  • uses forward slashes for the URL structure

The website has 5 pages:

  1. http://elm-spa-seo-testing.guupa.com/
  2. http://elm-spa-seo-testing.guupa.com/section1
  3. http://elm-spa-seo-testing.guupa.com/section2
  4. http://elm-spa-seo-testing.guupa.com/section3
  5. http://elm-spa-seo-testing.guupa.com/sitemap

The pages automatically update their title and meta-description, so when Googlebot indexes them it is possible to verify which state they were in. Three kinds of events change the state:

  1. Time: every second
  2. Type A Ajax calls: these calls are initiated after several different delays
  3. Type B Ajax calls: these calls are all initiated at the beginning, and the server replies after several different delays

The delays for both types of Ajax calls are set to 0, 1, 3, 6 and 10 seconds.
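
To make the timing easier to reason about, here is a minimal Python sketch (not the Elm code of the test site) of the state a page should show after a given number of seconds. It assumes that network latency is negligible, so that a reply with delay d is visible as soon as d seconds have passed, whether the delay sits on the client (Type A) or on the server (Type B). The names `latest_reply` and `expected_status` are made up for this illustration.

```python
from bisect import bisect_right

# Delays, in seconds, used for both Type A and Type B calls (from the article).
DELAYS = [0, 1, 3, 6, 10]


def latest_reply(elapsed):
    """Return the largest delay whose reply has arrived after `elapsed` seconds.

    Returns None when no reply has arrived yet (shown as NaN in the titles).
    """
    index = bisect_right(DELAYS, elapsed)
    return DELAYS[index - 1] if index > 0 else None


def expected_status(elapsed):
    """Idealised (T, A, B) values for a page that has been alive `elapsed` seconds.

    Assumption: zero network latency, so A and B always show the same delay.
    """
    reply = latest_reply(elapsed)
    return (elapsed, reply, reply)


for seconds in (0, 5, 20):
    print(expected_status(seconds))
```

Under these assumptions a snapshot at 5 seconds should show the 3-second replies, and a snapshot at 20 seconds the 10-second ones.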

This is the sequence of the calls:

Sequence of Ajax calls

This is the history of the title changes. Reading from bottom to top, it starts from “No JS, No Ajax”, which is what search engines would index if they did not execute JavaScript.

Model of the application, where it is possible to follow the history of the title changes

Note: this screen is from the Elm debugger. Click on the button in the lower right corner of the screen to activate it.

Results

These are some of the first results that I got:

╒═══════╤══════╤═════════╤════════╤════════╤══════════════════════╤═══════════╕
│ Vers. │ Time │ History │ Type A │ Type B │ Date                 │ Page      │
╞═══════╪══════╪═════════╪════════╪════════╪══════════════════════╪═══════════╡
│ 1     │      │ 1       │ 0      │ NaN    │ 2017-08-18           │ /         │
│ 1     │      │ 1       │ 10     │ 6      │ 2017-08-19           │ /         │
│ 3     │      │ 1       │ 0      │ 3      │ 2017-08-20           │ /         │
│ 4     │      │ 1       │ 0      │ NaN    │ 2017-08-21T19:35:57Z │ /         │
│ 4     │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:53Z │ /section1 │
│ 4     │ 5    │ 1       │ 0      │ 1      │ 2017-08-24T15:32:57Z │ /         │
│ 7     │ 5    │ 1       │ 0      │ 6      │ 2017-08-28T07:44:57Z │ /         │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-08-28T00:00:00Z │ /         │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-08-30T00:00:00Z │ /sitemap  │
│ 7     │ 19   │ 1       │ 10     │ 10     │ 2017-09-03T00:00:00Z │ /sitemap  │
│ 7     │ 20   │ 1       │ 10     │ 10     │ 2017-09-06T00:00:00Z │ /section1 │
│ 7     │ 20   │ 1       │ 10     │ 10     │ 2017-09-07T00:00:00Z │ /section2 │
│ 7     │ 5    │ 1       │ 0      │ 1      │ 2017-09-09T13:33:50Z │ /section3 │
│ 7     │ 20   │ 1       │ 10     │ 11     │ 2017-09-10T00:00:00Z │ /section2 │
╘═══════╧══════╧═════════╧════════╧════════╧══════════════════════╧═══════════╛

Googlebot waits between 5 and 20 seconds before taking a snapshot of the page. The Ajax results, both Type A and Type B, do not seem to agree with this when the waiting time is in the 5-second range.

This is an example of the search result on 24 August 2017:

You can extract the data from the title or description of the page:

V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1
  • V5 The version of the code
  • T5 The seconds that passed before Googlebot took a snapshot of the page
  • H7 The number of clicks (or items in the History). This number increases while browsing the site. Googlebot would probably always get “1” as the value because it doesn’t “click” on links but sends new HTTP requests.
  • A0 The Type A Ajax call got only the first reply at 0 seconds
  • B3 The Type B Ajax call got the reply at 3 seconds
  • 2017-08-24T16:46:04Z Date and time when the page was indexed
  • /section1 The path of the page
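
When many search results have to be compared, the status string can be decoded with a small script. The following Python sketch parses the format described above. The `Status` type and the `parse_status` function are hypothetical names introduced here, and the pattern assumes that the fields always appear in this order and that a missing Ajax result is written as `[NaN]`.

```python
import re
from typing import NamedTuple, Optional


class Status(NamedTuple):
    version: int
    seconds: int            # T: seconds elapsed when the snapshot was taken
    history: int            # H: number of items in the history
    type_a: Optional[int]   # A: delay of the last Type A reply, None for [NaN]
    type_b: Optional[int]   # B: delay of the last Type B reply, None for [NaN]
    timestamp: str
    path: str


# Assumed layout: V<n>,T<n>,H<n>,A<n|[NaN]>,B<n|[NaN]>,<timestamp>,<path>
_PATTERN = re.compile(
    r"V(\d+),T(\d+),H(\d+),A(\d+|\[NaN\]),B(\d+|\[NaN\]),([^,]+),(/\S*)"
)


def _reply(value):
    """Convert an A or B field to an int, or None when it is [NaN]."""
    return None if value == "[NaN]" else int(value)


def parse_status(text):
    """Find and decode the first status string in `text`, or return None."""
    match = _PATTERN.search(text)
    if match is None:
        return None
    version, seconds, history, a, b, timestamp, path = match.groups()
    return Status(int(version), int(seconds), int(history),
                  _reply(a), _reply(b), timestamp, path)


print(parse_status("V5,T5,H7,A0,B3,2017-08-24T16:46:04Z,/section1"))
```

Because the pattern is searched rather than matched from the start, the function also works on a full page title that has some text before the status string.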

I also replicated the title in the body of the page in a large font, so that it is possible to read it in the small previews of “Fetch as Google” in the Google Search Console. The values shown on this page are not the same as those in the search result. Google probably uses a different program to create these snapshots than the one used for the search engine.

Screenshot of the “Fetch as Google” section of the Google Search Console

Mysterious impossible state

Another thing to note in the screenshot above is that the time of the snapshot is T5, but the second Ajax answer was B10. This should be an impossible state of the page: it means that the screenshot was taken after 5 seconds, yet the Type B API had 10 seconds to answer. This is a typical title history, and you can see that T5 and B10 never appear at the same time:

V7,T6,H1,A6,B6,2017-08-28T12:57:20Z,/
V7,T6,H1,A6,B3,2017-08-28T12:57:20Z,/
V7,T6,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T5,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T4,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B3,2017-08-28T12:57:20Z,/
V7,T3,H1,A3,B1,2017-08-28T12:57:20Z,/
V7,T3,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T2,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B1,2017-08-28T12:57:20Z,/
V7,T1,H1,A1,B0,2017-08-28T12:57:20Z,/
V7,T1,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B0,2017-08-28T12:57:20Z,/
V7,T0,H1,A0,B[NaN],2017-08-28T12:57:20Z,/
V7,T0,H1,A[NaN],B[NaN],2017-08-28T12:57:20Z,/
No JS, No Ajax

At T5, the page typically shows A3 and B3.
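
The rule behind “impossible” can be written down explicitly: a reply with delay d cannot be on the page before d seconds have passed, so A and B can never be larger than T. The Python sketch below applies that rule. The function name `classify` and the three labels are invented for this illustration, and the “expected” case assumes negligible network latency.

```python
# Delays, in seconds, used for both Type A and Type B calls (from the article).
DELAYS = (0, 1, 3, 6, 10)


def classify(seconds, type_a, type_b):
    """Classify a snapshot as 'expected', 'behind' or 'impossible'.

    `type_a` and `type_b` are the delays shown on the page, or None for NaN.
    """
    replies = [r for r in (type_a, type_b) if r is not None]
    # A reply cannot be visible before its own delay has elapsed.
    if any(r > seconds for r in replies):
        return "impossible"
    # Largest delay that should have arrived by now (None if none has).
    expected = max((d for d in DELAYS if d <= seconds), default=None)
    if type_a == expected and type_b == expected:
        return "expected"
    # Possible, but at least one reply is older than it should be.
    return "behind"


print(classify(5, 3, 3))     # the typical T5 state
print(classify(5, 3, 10))    # T5 with B10; the A value here is a placeholder
print(classify(6, 6, 3))     # a transitional state from the history above
```

With these definitions, the typical T5 state comes out as “expected”, T5 with B10 as “impossible”, and the transitional T6,A6,B3 entry as “behind”.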

Fetching done on request and the “natural” fetching are different

The fetching done on request (from the Search Console) and the “natural” fetching done by Google are different.

  • The fetch done on request doesn’t reach the 10-second timeout; it usually waits around 5 seconds (hence the name “T5”) before taking a snapshot. The time stamp in this case is an exact time, for example 13:55:50
  • The “natural” fetch waits longer (19~20 seconds, hence the name “T20”) before taking the snapshot of the page. It always has a time stamp of 00:00:00
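
Based only on the observation above, the two kinds of fetch can be told apart by the time stamp alone. This is a heuristic sketch in Python. `fetch_kind` is a hypothetical helper, it assumes a full time stamp in the format used in the titles, and it would mislabel an on-request fetch that happened exactly at midnight.

```python
from datetime import datetime, time


def fetch_kind(timestamp):
    """Guess the kind of fetch from a time stamp like 2017-08-28T00:00:00Z."""
    moment = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # Heuristic from the results: "natural" fetches always show 00:00:00.
    if moment.time() == time(0, 0, 0):
        return "natural (T20)"
    return "on request (T5)"


print(fetch_kind("2017-08-28T07:44:57Z"))
print(fetch_kind("2017-08-28T00:00:00Z"))
```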

Search result on 30 August 2017

Search result on 10 September 2017

The titles are mixed. Some of them reflect the HTML title element, others are extracted from the page content.

Search result on 26 September 2017

For the first time, all pages are indexed at T20. Note that the title always shows T19 while the description shows T20. Both A and B are 10. I also noticed that while the title shows T19, the content of the page shows T20. This is weird behaviour, because these two values should be the same.

Search result on 2 October 2017

There are two updated entries compared to 26 September. One entry has a date of 25 September but was not showing on 26 September. It seems that there is some delay between crawling and publishing.

Search result on 19 October 2017

Search result on 8 November 2017

It seems that the sitemap and other pages got blocked by a weird robots.txt. Maybe my account has been hacked? [Edited: it turned out that I was not hacked but surge.sh changed their policy, read below.] I restored the original robots.txt, let’s see what happens over the next few days.

Original robots.txt

User-agent: *
Disallow:
Sitemap: http://elm-spa-seo-testing.surge.sh/sitemap.txt

Wrong robots.txt

User-agent: *
Disallow: /
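
The difference between the two files can be checked with the robots.txt parser in Python’s standard library. This is only an approximation, since `urllib.robotparser` is not the parser that Googlebot uses. The `allowed` helper and the `/section1` URL on the old surge.sh host are chosen here just for illustration.

```python
from urllib.robotparser import RobotFileParser

ORIGINAL = """\
User-agent: *
Disallow:
Sitemap: http://elm-spa-seo-testing.surge.sh/sitemap.txt
"""

WRONG = """\
User-agent: *
Disallow: /
"""


def allowed(robots_txt, url, agent="Googlebot"):
    """Return True if `agent` may fetch `url` according to `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Illustrative URL: one of the test pages on the old surge.sh host.
URL = "http://elm-spa-seo-testing.surge.sh/section1"
print(allowed(ORIGINAL, URL))  # empty Disallow: everything may be crawled
print(allowed(WRONG, URL))     # Disallow: / blocks the whole site
```

An empty `Disallow:` allows everything, while `Disallow: /` blocks every path for every crawler.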

Update 27 November 2017

Unfortunately, it seems that surge.sh changed their policy about robots.txt, so none of the pages under the surge.sh domain are indexed anymore. I moved this site to https://elm-spa-seo-testing.guupa.com/ for the moment. Google has not indexed it yet. I just created a new account in the Search Console and submitted the new URL.

Within a few minutes Google had already indexed the new site:

Search result on 28 November 2017

After 24 hours the first “T20” starts to appear:

Search result on 30 November 2017

Google updated two more pages. For one of them it decided to create its own title. I believe that Google generates a new title when it considers the original one not meaningful. In this case I think it got confused by all these letters and numbers.

It is also interesting to see how consistently Google renders T19 in the title and T20 in the meta-description. It seems that different sections of the page are rendered at different points in time.

Update 5 December 2017

Because Google was not reindexing my new version, V9, I decided to request indexing from the Search Console.

Again, from the screenshots of “Fetch as Google” I get an impossible state where the snapshot seems to have been taken after 5 seconds (T5) but the second Ajax call had already received the 10-second result (B10).

I wonder how this impossible state, “T5,B10”, could happen. If you have any idea, leave a comment below.

The mystery of the impossible state “T5,B10”

After a few seconds Google updated the search result. The new version of the page is there, in the first position.

The content of the TITLE element in the HEAD section should be something like “SPA and SEO Testing - V9,T5,H1,A0,B[NaN],2017-12-05T08:07:51Z,/” but Google decided to go with a simpler title, probably coming from the H1 element.

Note also that the impossible state “T5,B10” has been replaced with a possible (but strange) state, “T5,B[NaN]”. NaN means that Elm created the page but no Ajax call has returned yet. At T5, both A and B should already have received the 3-second result (A3,B3).

Update 13 December 2017

A page has been indexed with two different versions. This is clearly another impossible state. It seems that Google renders different parts of the page at different moments. This is the same as the T19 vs T20 issue that I mentioned earlier.

Thank you for reading!
