SEO Fail: Figuring out why I can’t get my content into Google

Michael Flynn
Nov 14, 2014


TL;DR: For Single Page Applications, Googlebot and AWS S3 don’t play nice with each other.

I’ve been working on a project that uses Amazon Web Services (AWS) as our infrastructure. Here are the main services the project used:

- Amazon’s Elastic Compute Cloud (EC2) hosted the custom Application Programming Interface (API).
- Amazon’s Relational Database Service (RDS) hosted the database.
- Amazon’s Simple Storage Service (S3) hosted the static website with a Single Page Application (SPA).

With these services set up, we launched an iOS app and the website with only minor issues. For the website, I used the AngularJS framework to develop an SPA, where a single web page displays all the dynamic content that the user’s browser requests from the API. To help the project get some traction, I attempted to do some Search Engine Optimization (SEO) on the website. At this point, I learned that Googlebot might have issues with SPA sites. If Google can’t crawl your website, then you’re going to be missing from search results.

To help Google index the website, I tried their Webmaster Tools to get everything straightened out. Their documentation indicated that HTML snapshots were the way to get the content indexed. This meant that a process or service would have to create these snapshots of the website. PhantomJS seemed to be a good candidate for creating the snapshots. Unfortunately, this solution seems to need a more robust web server than I had with an S3 website.
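To make the snapshot idea concrete: under Google's AJAX crawling scheme as I understand it, the crawler turns a "#!" URL into a request with an `_escaped_fragment_` query parameter, and the server is expected to answer that request with the pre-rendered HTML. Below is a minimal Python sketch of that URL mapping. The function name and the example URL are made up for illustration, and the escaping is simplified compared to the real scheme.

```python
from urllib.parse import urlsplit, urlunsplit, quote

def to_escaped_fragment_url(url):
    """Map a '#!' URL to the '_escaped_fragment_' form a crawler would request.

    Hypothetical helper for illustration; the character escaping here is a
    simplification of the real scheme.
    """
    parts = urlsplit(url)
    if not parts.fragment.startswith("!"):
        # Not a hash-bang URL, so there is nothing to map.
        return url
    token = quote(parts.fragment[1:], safe="/=")
    param = "_escaped_fragment_=" + token
    # Append to any existing query string and drop the fragment.
    query = parts.query + "&" + param if parts.query else param
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))

print(to_escaped_fragment_url("http://example.com/#!/items/42"))
```

The catch for a static S3 site is the second half of the scheme: something has to answer that `_escaped_fragment_` request with a snapshot, and a plain file host has no place to run that logic.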

Further digging into this problem revealed that Google had improved Googlebot. Now when Google crawled your website, it would run the JavaScript included on your web pages. It seemed I shouldn’t have to do anything to get the dynamic content into the search index. However, the “Fetch as Google” feature wouldn’t render the content I expected. No matter how I configured my SPA, it always displayed the default content.

This led me to create some simple examples that would cover all the configuration options for AngularJS. It wasn’t clear to me if Googlebot could handle HTML5’s pushState, which allows an SPA to have normal-looking URLs without the “#” present. In any case, no matter how I configured the SPAs, “Fetch as Google” would always show the default content when hosting on S3. In a normal browser, all configurations worked and displayed the correct content.

At this point, I decided to take the examples and host them on a different server. Through a hosting plan I already have, an Apache web server would host the examples. Testing with the “Fetch as Google” resulted in the correct content rendered for Google! Now I needed to identify what changed between the two servers to find the cause of my problem.

The HTML code didn’t change between the servers, but there was a difference in the configuration. On the Apache web server, an .htaccess file used the rewrite directive to serve up the main HTML file if a file wasn’t found. On the S3 server, by contrast, a routing rule redirected the browser to the main HTML file if a file wasn’t found. I’m convinced the redirect is causing Google to render the default content. The S3 server’s redirect includes all the information in the URL needed to render the correct content. Unfortunately, Googlebot seems to drop some of that information and thus ends up with the default content. Since S3 doesn’t support the rewrite directive, I couldn’t see if that would fix the problem.
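Here is a rough Python sketch of the difference as I understand it. It is a toy model, not the actual server configuration: the function names, the 301 status, and the idea that the redirect carries the original path in a "#!" fragment are all assumptions for illustration. The `keeps_fragment` flag models my suspicion that the crawler drops part of the redirect URL.

```python
from urllib.parse import urlsplit

INDEX_HTML = "<html>app shell</html>"
FILES = {"/index.html": INDEX_HTML}

def apache_style_rewrite(path):
    """Internal rewrite: a missing file is answered with the main HTML file.
    The client sees a 200 and its URL never changes."""
    return 200, {}, FILES.get(path, INDEX_HTML)

def s3_style_redirect(path):
    """Redirect: a missing file sends the client to the main page.
    Assumption: the original path is carried in a '#!' fragment."""
    if path in FILES:
        return 200, {}, FILES[path]
    return 301, {"Location": "/#!" + path}, ""

def route_seen_by_app(requested_path, server, keeps_fragment=True):
    """Return the route the SPA ends up with after the server responds."""
    status, headers, _body = server(requested_path)
    if status == 200:
        # URL unchanged, so the app routes on the path that was requested.
        return requested_path
    target = urlsplit(headers["Location"])
    if keeps_fragment and target.fragment.startswith("!"):
        return target.fragment[1:]
    # Fragment lost: only the bare path is left, so default content renders.
    return target.path

print(route_seen_by_app("/items/42", apache_style_rewrite))
print(route_seen_by_app("/items/42", s3_style_redirect))
print(route_seen_by_app("/items/42", s3_style_redirect, keeps_fragment=False))
```

In this model a normal browser (which keeps the fragment) lands on the right route either way, while a client that discards the fragment after the redirect only ever sees the root page. That would match what “Fetch as Google” showed me, though I haven’t confirmed that this is what Googlebot actually does.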

An AngularJS Single Page Application will work on Amazon’s S3 server for the end user. Unfortunately, the need for inclusion in Google’s search index means I should avoid using the S3 server. It is possible that a Content Delivery Network might solve this problem, though that is not something the project is planning on using at this point due to costs. The website will most likely end up moving to a more traditional web server to make SEO easier.

Michael Flynn

Hacker / Dad / Web Developer / Consultant / Wannabe entrepreneur / In search of ideas I can implement