Using an NGINX Reverse Proxy to Host WordPress in a Subdirectory: SEO for Blogs

And how we solved the obstacles along the way

--

By Evan O’Connor — Haven Life Software Developer

Haven Life has a marketing blog that is built on WordPress and hosted by a third party, WP Engine. To improve our Search Engine Optimization (SEO) ranking, we wanted the blog to be accessible in a subdirectory of our main site: havenlife.com/blog. The problem was that WP Engine does not support hosting blogs in subdirectories; it only allows hosting on a subdomain, such as blog.havenlife.com.

The debate over the SEO impact of hosting on a subdirectory versus a subdomain is ongoing, with valid points on both sides. But after researching case studies citing SEO improvements from subdirectory hosting¹, we decided to host our blog this way.

To achieve this, we built an NGINX reverse proxy that directs requests for havenlife.com/blog to the WP Engine host. When you first set up a blog on WP Engine, it is hosted at “{install_name}.wpengine.com” (for us, that was havenlifeblog.wpengine.com).
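In sketch form, the initial proxy looked something like the snippet below. This is an illustrative NGINX fragment, not our exact production config; the specific headers and directives shown are assumptions about a typical setup.

```nginx
# Proxy requests for havenlife.com/blog to the default WP Engine install.
location /blog/ {
    proxy_pass https://havenlifeblog.wpengine.com/;
    # WP Engine routes requests by Host header, so it must match the install domain
    proxy_set_header Host havenlifeblog.wpengine.com;
    # Preserve the original client IP for logging and analytics
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```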

The Obstacles

However, our decision led to a few complications related to WP Engine’s lack of support for this use case. While this initial reverse proxy setup did successfully serve the blog as a subdirectory of havenlife.com, we noticed that search engines were having difficulty rendering our site.

Google provides a mobile usability tool (https://search.google.com/test/mobile-friendly) that lets you preview how your site appears to its web crawlers. After testing with this tool, we saw that Google did not consider our site mobile friendly.

Failed Mobile-Friendly Test results for havenlife.com/blog

This result was frustrating because our blog did load successfully on real mobile phones. The failure occurred on every page of our blog, so we knew there would be a negative impact on SEO. The test details showed that Google was unable to load our site’s JavaScript/CSS files because they were being blocked by a robots.txt file. The robots.txt file tells web crawlers which files should and should not be crawled. Without access to these files, crawlers see our site stripped of its themes and styles, making it difficult to navigate on mobile devices.

To solve this obstacle, we first checked our main site’s (havenlife.com) robots.txt file, but found that it allowed crawlers to access every file, so it couldn’t have been the culprit:

havenlife.com/blog allows crawlers to access all files on this domain
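An allow-all robots.txt is typically just a wildcard user agent with an empty Disallow rule; ours was along these lines (the real file may also contain extra entries, such as a Sitemap line):

```
User-agent: *
Disallow:
```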

Upon further research we found that, because our blog’s JS/CSS files are hosted on WP Engine, there is another robots file on the domain havenlifeblog.wpengine.com that was blocking crawlers from every file:

havenlifeblog.wpengine.com/robots.txt blocking crawlers from all files on this domain
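A robots.txt that blocks everything does so with a blanket Disallow on the root path, essentially:

```
User-agent: *
Disallow: /
```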

This robots.txt file was created by WP Engine, and because our JS/CSS were hosted on this domain, web crawlers were blocked from viewing them. The simple solution would have been to change this file; however, WP Engine does not allow that. Domains like havenlifeblog.wpengine.com are the default install domains for new blogs, and if you are using this default install (with “wpengine” in the URL), your blog is considered “not production-ready,” so crawlers are not given access. To edit the robots.txt file, we had to configure a custom domain in WP Engine.

The Solution

We chose to configure blog.havenlife.com as a custom domain for our site in WP Engine. This process involves adding the domain in the WP Engine console and setting up a DNS entry pointing to the WP Engine domain. Because we host our main site in AWS, we set this up using Route 53. Once this custom domain was set up, we were given access to edit a new robots file for that domain.
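In zone-file notation, the DNS entry amounts to a CNAME pointing the custom subdomain at the WP Engine install. This is a sketch of the record, not a dump of our Route 53 configuration; the TTL shown is illustrative.

```
blog.havenlife.com.    300    IN    CNAME    havenlifeblog.wpengine.com.
```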

blog.havenlife.com/robots.txt file allows crawlers access only to files in /wp-content/ and /wp-includes directories

Because our site is accessible at both havenlife.com/blog and blog.havenlife.com, we did not want web crawlers to treat this as duplicate content. The file at havenlife.com/robots.txt makes the blog crawlable via the subdirectory “/blog”. However, we wanted to block bots from crawling blog.havenlife.com directly and only allow them to fetch the JS/CSS hosted there (in order to pass the mobile-friendly test). For this reason, the robots.txt at blog.havenlife.com allows access only to the /wp-content/ and /wp-includes/ directories, which contain the blog’s JS/CSS.
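That behavior can be expressed with Allow rules (an extension to the original robots.txt convention that Googlebot and most major crawlers honor) overriding a blanket Disallow; the file looked roughly like this:

```
User-agent: *
Disallow: /
Allow: /wp-content/
Allow: /wp-includes/
```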

To prove that we would not be marked down for duplicate content, we verified that crawlers were blocked from our custom domain blog.havenlife.com.

Google web crawler is blocked from our custom domain blog.havenlife.com to prevent duplicate content

The final steps were to change our NGINX proxy to direct requests to our custom domain (blog.havenlife.com) instead of the WP Engine default domain, and to change the value of “WordPress Address (URL)” to the same (via WP Admin). These changes moved our JS/CSS files to be hosted on the custom domain, which allows crawlers to access them. After these changes, Google’s mobile testing tool reported havenlife.com/blog as mobile friendly.

Successful Mobile-Friendly Test results for havenlife.com/blog

Our NGINX reverse proxy
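In sketch form, the final configuration differed from the initial setup mainly in the upstream domain. As before, this is an illustrative fragment rather than our exact production config.

```nginx
# Proxy requests for havenlife.com/blog to our custom domain, whose
# robots.txt we control and which lets crawlers fetch JS/CSS.
location /blog/ {
    proxy_pass https://blog.havenlife.com/;
    proxy_set_header Host blog.havenlife.com;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```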

The Aftermath and Future Considerations

In the end, we were pleased to see that Google was reporting all of our blog pages as mobile friendly. However, a fair amount of work went into getting this reverse proxy working. The reverse proxy adds a layer of complexity to our blog and, because it is not supported by WP Engine, prevents us from asking their support team certain questions. Because Google’s algorithms are constantly in flux and how they rank sites is somewhat of a black box, the benefits of hosting on a subdirectory versus a subdomain are somewhat unclear. We did see an uptick in search engine rankings, but this change was one of many improvements to our blog, so it is unclear how much impact it had. In the future, we may opt for a simpler setup.

We also considered hosting our blog in-house on AWS, but decided to stick with WP Engine for the time being. WP Engine allows simple access control for our WordPress developers and comes with useful features like one-click staging. The downside of using WP Engine is that we lack visibility into their system when debugging issues. Fortunately, their support team is available 24/7 via online chat and has been able to help us sort out any issues that have come up.
