Server Side Rendering and its Relationship with SEO
How Airtable Improved SEO with Server Side Rendered React
Searching the web is a common way for new users to discover new businesses and solutions. Successful user acquisition strategies usually involve finding ways to increase traffic to your website, rank higher than your competitors, and ultimately convince users to sign up for your product or install your app. Small optimizations have the potential for massive impact, which is why search engine optimization (SEO) is crucial.
During 2021, we set out to create a new SEO-focused infrastructure for our marketing landing pages to optimize our search engine rankings and engage more potential users of Airtable. Higher rankings mean more impressions, clicks, and conversions from people using search engines.
Historically, Airtable’s logged-in product and logged-out marketing landing pages lived in the same repo and primarily utilized client-side rendered React. When we decided to split the logged-out pages into their own repo, we had the opportunity to redesign the infrastructure of the landing pages to better address SEO needs.
Optimizing how HTML is generated is the foundation on which all other SEO improvements are built. In other words, if your web app framework and build system do not facilitate SEO well, it becomes much harder to invest in SEO efforts at other levels. We explored three infrastructure options for our marketing pages:
- Static web app
- Client-side rendered web app
- Server-side rendered web app
We ultimately decided that server-side rendering best set up a codebase for SEO success, and selected Next.js as the framework to facilitate it. Let me walk you through why we made this decision and how it could benefit teams focused on improving their SEO infrastructure.
Project Goals
- Make it as easy as possible for search engines to index our web pages
- Ensure changes to our pages can be indexed in a timely manner
- Enable dynamically rendered content without re-deploying the web app
- Continue to utilize React and TypeScript for consistency across the two repos
How Google crawls and indexes pages
To evaluate the different web-app infrastructure options, we need to understand how search engines discover and parse web pages to produce organic search results. Google accounted for more than 90% of all search engine searches in 2021, so our SEO evaluation will focus almost exclusively on Google search.
When a page is updated or a new page is added it can take between 4 days and 4 weeks for the new content to appear in Google search results. Google’s web crawler, Googlebot, has three processing phases before sending page data to the ranking algorithm:
- Crawling
- Rendering
- Indexing
The Crawling Queue
The crawler is responsible for searching the internet to discover URLs. It sends an HTTP request to a known URL from the crawl queue to retrieve the page’s HTML. The crawler then parses the HTML to look for more URLs to add to the crawl queue.
For example, if a URL was retrieved from the crawl queue and the below HTML were returned, then the /store URL would be discovered and added to the crawl queue after checking the robots.txt file to ensure the page is allowed to be crawled.
<html lang="en">
  <head>
    <script src="main.js"></script>
  </head>
  <body>
    <div>
      <h1>Kitty Litter</h1>
      <a href="/store">Online store</a>
    </div>
  </body>
</html>
The crawler does not execute any JavaScript and will only parse through the HTML it is provided. This means if the page contains all the necessary content prior to JS executing then Googlebot already has all of the information needed to index and rank the page.
On the other hand, if JS needs to run in order to produce the page’s content then Googlebot needs to do the extra work of rendering the website before it can properly crawl and index the page.
The JS Rendering Queue
Googlebot’s renderer will use a headless Chromium instance to parse and execute the JS in order to produce the complete HTML page. The HTML is sent to the crawler so it can be parsed again and URLs can be discovered. The rendered HTML is also sent to the indexer for processing.
Imagine how much processing power would be required to continuously crawl and render all of the internet daily! Even Google has its limits. How quickly your website is indexed is limited by availability of Googlebot instances, bandwidth, and time.
As with any code, though, there may be bugs in the JS that prevent Googlebot from indexing the page. Businesses need to proactively monitor Google Search Console to check that Googlebot is not encountering JS errors.
The Indexing Queue
The indexer is responsible for parsing and analyzing a page’s content before adding it to the search index. In this step Googlebot will attempt to understand the intent of the content based on the headings, keywords, and overall page content.
Optimizing content for the indexer becomes the main focus of SEO after the initial web app infrastructure is set up.
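For instance, a well-structured page makes its intent obvious through its title, meta description, and headings. Here is a minimal sketch of what that can look like in a Next.js page (the page name and copy are hypothetical, not taken from Airtable’s site):

// pages/cat-supplies.tsx — hypothetical example page
import Head from "next/head";

export default function CatSuppliesPage() {
  return (
    <>
      <Head>
        {/* The title and meta description are strong signals for the indexer */}
        <title>Cat Supplies | Kitty Litter</title>
        <meta
          name="description"
          content="Shop litter boxes, scratching posts, and more supplies for your cat."
        />
      </Head>
      {/* A single descriptive h1 helps convey the page's intent */}
      <h1>Cat Supplies</h1>
      <p>Everything your cat needs, from litter to toys.</p>
    </>
  );
}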
Comparing Infrastructure Options
The sections below compare the three infrastructure options we considered and discuss the benefits and drawbacks of each in more detail.
Static HTML
A static website is arguably the easiest to set up since, in its simplest form, it consists of HTML files with fixed content that can be served as static, cacheable assets. No server logic is required and the files are sent to the browser exactly as stored. The term Jamstack is sometimes used to describe this architecture since a deployment is simply a stack of files.
Although a static website can be built without any web libraries or frameworks, static site generators (SSG) such as Next.js, Hugo, Gatsby, Jekyll, and more allow the use of complex web development features such as components. These frameworks convert code into static HTML at build time, which can then be deployed to a content delivery network (CDN). The process of producing markup during a build instead of on-demand on the web server is called prerendering.
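As a minimal sketch of prerendering (assuming Next.js’s Pages Router; the page and data below are hypothetical), a page like this is rendered to static HTML exactly once, at build time:

// pages/store.tsx — hypothetical page prerendered at build time
import type { GetStaticProps } from "next";

interface StorePageProps {
  products: string[];
}

// Runs only at build time; the resulting HTML is deployed as a static asset
export const getStaticProps: GetStaticProps<StorePageProps> = async () => {
  return { props: { products: ["Kitty Litter", "Scratching Post"] } };
};

export default function StorePage({ products }: StorePageProps) {
  return (
    <ul>
      {products.map((name) => (
        <li key={name}>{name}</li>
      ))}
    </ul>
  );
}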
Initial HTML contains all content
The same HTML is provided to every visitor on first page load, including search engine bots. This makes it very easy for Googlebot to crawl and index the page without parsing and executing the JS. This doesn’t mean that we can’t have JS on the page to make elements interactive, but since we aren’t using JS on the client to produce the HTML we are setting ourselves up for SEO success.
Unbeatable speed and performance
Because a static web app is simply a stack of files, those files can be served directly from a CDN. The lack of an app server or any pre-processing, and the fact that the static files can be cached by the user’s browser, mean the app will have very fast performance. Fast loading times mean a better user experience and better SEO enablement.
Requires deployment to change content
Updating a page means updating one or more files on the file system through a code deployment. This may be fine for simple projects, but for larger, more complex projects that update frequently, it can become a huge hassle without continuous deployment (CD).
Client-side rendered (CSR) HTML
A client-side rendered app means that the HTML content is almost entirely rendered in the browser with JS. The initial GET request for a page returns a partial HTML file with a <script> tag that references the main JS file. The JS code runs and injects the contents of the page into the browser’s DOM.
Instead of having different HTML files per page, each route is created dynamically in the browser. React, Vue, Angular, and Ember are all examples of client-side rendered frameworks.
Similar to a static site, no web server logic is needed and the JS bundle and other build artifacts can be deployed to a static server or a CDN.
Initial HTML is missing crucial page content
If you were to disable JS in your browser and visit a client-side rendered web page, most of the crucial content would be missing. This is because the initial HTML retrieved from the server is incomplete until the JS referenced in a <script> tag is executed to inject the contents of the page into the browser’s DOM.
For example, if the below HTML is fetched by the crawler, it would have no content to crawl and Google would need to add the URL to the rendering queue. The element with id="root" is the DOM container where the page’s content will be injected at runtime after the JS is executed.
<html lang="en">
  <head>
    <script src="main.js"></script>
  </head>
  <body>
    <div id="root">
      Loading…
    </div>
  </body>
</html>
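For context, main.js in a React app boils down to something like the sketch below (React 18 APIs assumed; the component is hypothetical). Nothing meaningful appears inside the root element until this code downloads and runs in the browser:

// main.tsx — compiled into main.js; hypothetical client-side entry point
import React from "react";
import { createRoot } from "react-dom/client";

function App() {
  return (
    <div>
      <h1>Kitty Litter</h1>
      <a href="/store">Online store</a>
    </div>
  );
}

// Replaces the "Loading…" placeholder only after the JS has executed
createRoot(document.getElementById("root")!).render(<App />);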
Performance implications
Since the browser is doing the heavy lifting of producing the HTML, any 3rd-party JS dependencies being used need to be downloaded to the client, which increases bandwidth usage and decreases performance. The more work the browser needs to do, the larger the impact on the user experience and SEO. Because the JS must run before the HTML can be crawled, CSR web pages take longer to appear in search results.
Dynamic content
Since the HTML is produced dynamically on the client there is a lot of flexibility to call APIs and display different content to different users, or update content without a code deploy by updating a CMS or database.
Server-side rendered (SSR) HTML
Server-side rendering is the process of rendering a client-side JavaScript application to static HTML on the server. SSR is a large category that overlaps with both static sites and CSR. It is similar to static site generators in that JS frameworks are used to produce HTML from the server, but SSR web apps can do so dynamically at runtime on a web server. This means that when a user or bot visits a web page, the server receives the HTTP request and dynamically generates HTML specific to that request, which it then returns to the client. This “flavor” of SSR is sometimes called classic SSR. To avoid confusing this infrastructure with prerendering, the main distinction is that the rendering occurs at runtime and not at build time.
All popular modern web frameworks such as React, Vue, and Angular support some form of SSR, and frameworks such as Next.js are specifically built around supporting server-side rendering.
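To make the distinction concrete, here is a bare-bones sketch of classic SSR using Express and React’s renderToString (illustrative only, not Airtable’s actual implementation). The server builds the complete HTML for each request before anything reaches the browser:

// server.tsx — hypothetical Express server rendering React to HTML at request time
import express from "express";
import React from "react";
import { renderToString } from "react-dom/server";

function Page({ name }: { name: string }) {
  return (
    <div>
      <h1>Welcome, {name}</h1>
      <a href="/store">Online store</a>
    </div>
  );
}

const app = express();

app.get("/", (req, res) => {
  // The response already contains the full content, so no client-side JS is needed to crawl it
  const html = renderToString(<Page name={String(req.query.name ?? "friend")} />);
  res.send(`<!DOCTYPE html><html lang="en"><body><div id="root">${html}</div></body></html>`);
});

app.listen(3000);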
Initial HTML contains all content
Since the initial GET request for a page returns all of the content in the HTML, it is very easy for Googlebot to crawl and index the page.
Dynamic content
Since the HTML is produced dynamically on the server, there is a lot of flexibility to call APIs within the web server, display different content to different users, or update content without a code deploy using a CMS or database.
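In Next.js, this is what getServerSideProps enables. As a sketch (the CMS endpoint below is hypothetical), fresh content can be fetched on every request:

// pages/solutions.tsx — hypothetical server-side rendered page
import type { GetServerSideProps } from "next";

interface SolutionsPageProps {
  headline: string;
}

// Runs on the web server for every request, so CMS edits show up without a redeploy
export const getServerSideProps: GetServerSideProps<SolutionsPageProps> = async () => {
  const res = await fetch("https://cms.example.com/api/solutions"); // hypothetical CMS endpoint
  const data = await res.json();
  return { props: { headline: data.headline } };
};

export default function SolutionsPage({ headline }: SolutionsPageProps) {
  return <h1>{headline}</h1>;
}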
Performance implications
Since the server is doing the heavy lifting of producing the HTML, there will be a performance impact. This can be mitigated by adding some caching of the HTML, but it won’t be as fast as a static site.
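One common mitigation (a sketch, assuming Next.js’s Pages Router; the header values are illustrative) is to set caching headers on the server-rendered response so a CDN can serve repeat requests:

import type { GetServerSideProps } from "next";

export const getServerSideProps: GetServerSideProps = async ({ res }) => {
  // Let a CDN serve the cached HTML for up to 60s, then refresh it in the background
  res.setHeader("Cache-Control", "public, s-maxage=60, stale-while-revalidate=300");
  return { props: {} };
};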
Airtable’s SEO optimized marketing stack
Airtable concluded that an SSR infrastructure would meet all of our goals for a new marketing site. In the past we experimented with a custom React SSR solution using ReactDOMServer but found it cumbersome, so we opted to use an SSR framework that would handle the boilerplate for us. Next.js allows us to continue to use React and TypeScript, and also provides additional performance benefits such as incremental static regeneration. Incremental static regeneration allows Next.js to operate as a hybrid between a static site generator that produces static HTML at build time and an SSR framework that can produce updated HTML at runtime.
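As a sketch of incremental static regeneration (the page and CMS endpoint are hypothetical; the revalidation interval is illustrative), the page is served as prerendered static HTML, but Next.js regenerates it in the background at most once per interval:

// pages/templates.tsx — hypothetical page using incremental static regeneration
import type { GetStaticProps } from "next";

interface TemplatesPageProps {
  titles: string[];
}

export const getStaticProps: GetStaticProps<TemplatesPageProps> = async () => {
  const res = await fetch("https://cms.example.com/api/templates"); // hypothetical CMS endpoint
  const data = await res.json();
  return {
    props: { titles: data.titles },
    // Serve the cached static page, and rebuild it at most once every 60 seconds
    revalidate: 60,
  };
};

export default function TemplatesPage({ titles }: TemplatesPageProps) {
  return (
    <ul>
      {titles.map((title) => (
        <li key={title}>{title}</li>
      ))}
    </ul>
  );
}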
The logged-in Airtable product would continue to live on https://airtable.com and be client-side rendered, but all marketing landing pages would be migrated to the subdomain https://www.airtable.com. Since search engine crawlers don’t log in we only need to optimize the logged-out pages for search engines.
Over the majority of 2021, Airtable’s growth and marketing teams built this infrastructure, migrated existing pages to the new stack, and created brand new marketing pages such as https://www.airtable.com/solutions and https://www.airtable.com/enterprise. In a future post, we will cover some of the challenges Airtable faced during the subdomain migration, resolving backlinks between the domains, and how we manage multiple projects inside our new monorepo. The project was a large success, and Google has been happily indexing our new, blazingly fast https://www.airtable.com/ marketing pages. We will continue to build on top of our SEO-focused infrastructure and look forward to seeing the long-tail effects it has on our search engine rankings!
If you found this interesting, we’d love to tell you more about what we’re up to and how you can be a part of this! Do reach out by clicking here.