MASTER YOUR TOOLS: GATSBY

Marta Palau
Coverwallet Engineering
12 min read · Dec 22, 2021

At Coverwallet, I work on the Acquisitions team. We are in charge of the landing pages, which are the first contact users have with our company. And as we know, the customers’ experience navigating the website has a big impact on their overall satisfaction.

Some years ago, 5 years to be precise, Coverwallet was (and in its soul still is) a startup. The first initiatives the company developed were the landing pages, since the hardest mission to achieve in any entrepreneur’s adventure is to get new clients.

So far so good. The first projects are usually the ones that receive the most love, but they are also the ones that quickly become obsolete.

Initially, our landing pages were developed in Ruby on Rails. For those not familiar with it, it’s an open-source server-side web framework written in Ruby that provides predefined structures for creating databases and web pages. We combined this monolith with a few lines of jQuery for some simple pieces of interaction.

The dynamic content of the landing pages is obtained from Contentful (https://www.contentful.com), a headless content management system (CMS), where we have created several Content Models associated with different kinds of web pages for SEO and SEM (Insurance, Industry, Business…). This enables the Marketing team to easily update the content of the pages.

Looking at it this way, Ruby didn’t seem like a bad idea for building a few landing pages. And it wasn’t.

The downsides

Now that we are part of Aon, we are expanding globally and launching several initiatives at the same time, while maintaining the ones launched in the past. So, now that the number of landing pages has grown along with our opportunities, we faced some issues that could prevent us from continuing with the same approach:

1. Hard scalability

Some of the pages still had hard-coded content that was not created in a very dynamic way. A fast-growing environment implies fast development, and this usually implies lower code quality.

2. Complex architecture in a backend language: Ruby vs. React for simple landing pages that work as a SPA (Single Page Application)

We had a complex backend architecture just to build a bunch of SPAs that used a CMS to get the content and did not really need a database.

Besides that, most frontend developers prefer to work with the latest JavaScript libraries, like React, over Rails and jQuery.

3. Different repositories with different architectures and logics for different tenants: multi-tenancy

In Coverwallet we had different tenants consuming all our applications, so we needed to create scalable code able to adapt automatically to different configurations of data or styles.

The proposal

A few months ago, we decided to rethink our landing pages and proposed to the stakeholders that we work with the Gatsby framework (https://www.gatsbyjs.org/), a free and open-source framework based on React that builds static HTML files. It connects easily with Contentful and other CMSs via GraphQL to get the content.

It would allow us to create landing pages in a faster and better-structured way.

Initially, it sounded like the perfect option, and it really was, but when we started to dig deeper into the framework, we found several complex challenges and lots of new learnings.

Fortunately, Gatsby has very good documentation support, thousands of plugins developed by the community, and a strong network of developers supporting it.

The challenges

These are the 5 main challenges that we faced:

1. Server-Side Rendering: How to make it work properly

The first time I heard the acronym SSR was when I started working with Gatsby. And of course, I did not understand what it really meant. What I basically discovered is that SSR is a look back to the past, when websites were composed of just a plain HTML file and a CSS file.

Let’s go deeper

With current pages built with frameworks like Angular, or libraries like React or Vue, the JavaScript is executed in the browser, which means that we are not really serving an entire HTML file with all the content of the page for the route we access. The routes of the pages are created directly in the browser, and we render the page at runtime, that is, while the program is being executed.

This is called Client-Side Rendering (CSR), and it is slightly different from SSR.

With Server-Side Rendering, instead of creating the pages in the browser, when we access a website the request is sent directly to the server, which responds with a file containing the complete HTML code.

There is a lot of documentation on the Internet titled “SSR vs. CSR”. I can’t really tell you which one is better, but all I know is that for our purpose, SSR seemed like the right choice.

Why?

We wanted to build simple pages without complex interaction. We also wanted to improve our performance as much as we could, so serving full HTML pages, already built on the server, seemed like it would save us a bunch of operations in the browser and help us serve the pages faster.

The framework

So far so good, but Gatsby is a framework, which means that you need to understand how to use it properly, and sometimes this is not so straightforward.

We started creating our first landing pages without really noticing that we weren’t creating all our content on the server side. It took us some time to discover it, but we found it by taking a look at the page request response in the Network tab of the browser’s developer tools.

It turned out that Gatsby puts all of the content inside a <div> with the id ___gatsby. If there is nothing inside this <div>, it means that nothing has been built ahead of time.

What is the proper way to build pages with SSR?

As I have said, there is a lot of documentation, and some of it can be found here: https://www.gatsbyjs.com/docs/reference/config-files/gatsby-ssr/.

But summing it up, Gatsby has two important files: gatsby-browser.js and gatsby-ssr.js. It is necessary to instantiate our main layout in both files for SSR to work properly, and for the pages to be properly hydrated in the browser.
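A minimal sketch of what that looks like, using Gatsby’s wrapPageElement API (the Layout import path is a hypothetical example; the same export would live in both gatsby-browser.js and gatsby-ssr.js):

```javascript
// Shared by gatsby-browser.js and gatsby-ssr.js so the markup
// generated on the server matches what the browser hydrates.
// "./src/components/layout" is a hypothetical path.
import React from "react";
import Layout from "./src/components/layout";

export const wrapPageElement = ({ element, props }) => (
  <Layout {...props}>{element}</Layout>
);
```

Keeping this wrapper identical in both files is what guarantees the server-built HTML and the client-side tree match.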

2. Hydration: what is this?

First of all, what is hydration?

When we render our page on the client side within a React application, we use ReactDOM as the entry point to the application, and then all the components are loaded one by one.

But when we build the application on the server side, before deploying it, the pages are generated based on the React components defined in the code. At that point, it is impossible to render those pre-created components with the usual ReactDOM; in this case we use ReactDOMServer, which works in a Node environment.

After rendering the page with ReactDOMServer, what is left is to add the functionality to the components, and this is done by ReactDOM.hydrate on the client side. Calling this method assumes that all the elements have already been rendered; it only takes care of attaching event listeners for the interaction.
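Conceptually, the two halves look like this (a simplified sketch, not Gatsby’s actual internals; App stands in for any root component):

```javascript
import React from "react";
import { renderToString } from "react-dom/server";
import { hydrate } from "react-dom";
import App from "./App"; // hypothetical root component

// On the server (Node), at build time: produce the full HTML string
// that gets written into the static page.
const html = renderToString(<App />);

// In the browser: don't rebuild the DOM from scratch, just attach
// event listeners to the markup that was already served.
hydrate(<App />, document.getElementById("___gatsby"));
```

In a real project these two calls live in different environments: renderToString runs during the build, hydrate runs in the browser.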

The problem we faced here is that at some point this render method seemed to fail during the build process, and we detected that after the deployment some elements just disappeared. We could not find a specific pattern: sometimes it affected some texts, other times some images, or even an entire component.

Honestly, we could not find a proper way to fix this, and after some extensive research, we found that the easiest approach for those failing components was just to render them on the client side. So we created a React hook to detect whether we were on the server side or the client side, and we only rendered the component through the usual render method when in the browser.
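A minimal version of such a hook (the name and shape are ours for illustration, not a Gatsby API) could look like this:

```javascript
import { useState, useEffect } from "react";

// During the server-side build, useEffect never runs, so this
// returns false; in the browser it flips to true right after the
// first render, once hydration has finished.
export const useIsClient = () => {
  const [isClient, setIsClient] = useState(false);
  useEffect(() => setIsClient(true), []);
  return isClient;
};
```

A fragile component can then be skipped during the build: `const isClient = useIsClient(); return isClient ? <FragileComponent /> : null;`.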

It is just a simple hack rather than an actual solution, but there is a lot more research needed to understand how to properly fix this hydration problem, and we are still learning.

3. Public folder vs. multi-tenancy

When Gatsby builds all your pages, it automatically creates two folders: public & .cache.

The public folder is where all the essential files to serve the landing pages are created during the build process. There you can usually find all the compiled JavaScript files and, for each page, a folder with its assets and the main HTML file with all the code already compiled. On each build, all the assets get unique hashes.

The .cache folder is fundamental during development because it is where the data persists between builds.

Each time we re-run gatsby develop (which emulates the build process) or build from scratch, those folders are removed and created again. That’s why working with multi-tenancy was a real “pain in the ass”.

Why multi-tenancy?

Imagine that you need to build landing pages for different affinities, and each affinity has several landing pages under the same domain but with a different pathname. All of them share the same source code, which means the same GitHub repository.

However, we only have one folder, the public folder to build them under the same application. One ring to rule them all…

At some point, we needed to build all of them and differentiate their assets. So our first approach was: OK, let’s build them, and each time a build finishes, we copy all the files generated in the public folder and paste them into a dynamically created folder named after the corresponding affinity.

For an optimal solution, we decided to make the process asynchronous. What’s the problem with that? Race conditions.

Since we did not have control over the different build processes, and all the affinities were sharing the same public folder, at some point, while we had the assets for one of them, another build started, and we ended up mixing assets.

OK, let’s do it synchronously. One build at a time. This time, we move all the files and assets from the public folder to our custom folder for each affinity, and we ensure that with each new build the folder is clean.

This sounded like the best option at the very beginning, but the process was never perfect, and we always ended up with a race-condition at some point.

What would be the perfect solution? For Gatsby to be able to dynamically create different public folders. We found some interesting attempts from the Gatsby community to make it work, but none of them were successful.

So finally we decided to create independent micro-Gatsby applications for each affinity inside our application, each of them with all the files needed to make the build process work, and with their own public & .cache folders.

Later on, we improved the process to remove the unneeded files after the build, in order to leave only the static public files and the .cache folder on the server.

4. GraphQL: How it works

I have to say I love GraphQL; it simplifies the way of requesting data. No need to think about endpoints, requests, promises…

Very simplified, but basically, all the content is stored in JSON files, and you just access it as if you were accessing an object.

The good thing is that the JSON files are created based on a schema, which is built from your request. This means that you only get the data that you need. Nothing less, nothing more.

And Gatsby handles all of this process for you (https://www.gatsbyjs.com/docs/graphql/): it creates a schema based on what you are requesting, in our case from Contentful, and makes it easy to access the data in all the components.

Apart from this, it has a UI (User Interface) to inspect the whole GraphQL schema created during the build process.

Here is an example of how a simple query looks when executed inside the GraphiQL UI. This simple UI helps a lot when emulating the queries of your project.
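A query like the following, run in GraphiQL, fetches exactly the listed fields and nothing else (the field names here are illustrative, in the shape the Contentful source plugin typically generates for a Content Model called Author):

```graphql
query {
  allContentfulAuthor {
    nodes {
      name
      age
    }
  }
}
```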

As I said, in Gatsby, during the build process, a GraphQL schema is created based on our query.

In our case, we use a specific plugin to connect Gatsby with Contentful, so in this schema, we get all the content of the entries that we have requested and that are stored in Contentful.

What was the challenge here?

GraphQL needs to know in advance the types of the requested data in order to properly create the schema. And to properly define the types, the data needs to exist.

Let’s understand quickly how Contentful works. It has two important elements: Content Models and Entries.

A Content Model is a schema of fields and their corresponding types, in which we can create entries with content based on this schema.

Example of Content Model

Example of entry creation

Here’s a simple example: we have the Content Model Authors, which is composed of two fields: name and age. Both of them are of type short text (a string with a limited number of characters). So, I can create all the author entries that I need, and they will always follow this schema. This means all of them can have name and/or age, and the corresponding content needs to be a short text.

But of course, I could also create entries and only fill in the field name, leaving the field age empty if it is something optional.

Here comes the problem. Whenever we define our GraphQL query, we would want to request both fields: name and age, because eventually one of them will have content.

But what happens if none of the entries has the age field filled in? GraphQL does not know which type it should assign to this field and triggers an error, ruining the build.

What could we do?

The initial solution, and the one most suggested in forums, was to create dummy entries for all the Content Models; this way there would always be content from which to infer the types. We used this approach for a long time, until the problems with updating and changing the content started to appear, and our queries broke all too frequently.

That’s when we found that the best option was to create a schema with a definition of all the types that will be used during the build process, independently of the content.

In fact, Gatsby has a specific method for this purpose in its Node APIs, to be used in the gatsby-node.js file, that allows you to define all the types:

exports.createSchemaCustomization = ({ actions }) => {
  const { createTypes } = actions;
  createTypes(schema);
};

By using this method and properly defining the types of all the fields, our project became much more robust and reliable. And we also discovered that the build process time decreased drastically!
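For the Author example above, the schema string passed to createTypes could look like the following (the ContentfulAuthor type name follows the naming the Contentful source plugin generates; treat the exact fields as illustrative):

```javascript
// Illustrative type definition for the Author Content Model.
// Fields are declared without "!", i.e. nullable, so an entry
// with an empty "age" no longer breaks the build.
const schema = `
  type ContentfulAuthor implements Node {
    name: String
    age: String
  }
`;
```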

5. Performance: Not so easy to improve

In the earlier phase of our Gatsby project, we were delighted to see that our landing pages scored very well on Google PageSpeed Insights (https://developers.google.com/speed/pagespeed/insights/), the tool Google provides for developers to test the performance of their websites, along with a lot of documentation and information about how to improve it.

Note: it is a bit tricky; the resulting score can change depending on the computer and the Internet connection.

So we were happy, until we started to add third-party libraries to our project for tracking purposes, and this killed our performance. We needed to improve our code to load all our external scripts asynchronously and avoid blocking the DOM rendering.
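A minimal sketch of non-blocking script injection (the helper name and the URL are placeholders, not our actual tracking setup):

```javascript
// Append a third-party script with the async flag set, so the
// browser keeps parsing and rendering the page while the script
// downloads in the background.
function loadScriptAsync(src) {
  const script = document.createElement("script");
  script.src = src;
  script.async = true; // don't block HTML parsing
  document.body.appendChild(script);
}

// e.g. loadScriptAsync("https://example.com/tracker.js");
```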

This improved our performance quite a lot, but we noticed that the difference between the desktop and mobile scores was huge, so we needed to do something else, and we decided to refactor all our styles to mobile-first.

What is mobile-first?

The styles that we define as “default” are the ones that the browser interprets first; when we resize our screen, it re-calculates all the media queries and re-paints the DOM in order to adapt the site.

What happens if we open the page directly on mobile? The browser needs to interpret the default styles first, and later re-calculate the media queries. This was really hurting our performance on mobile.

So we basically refactored our styles to make the mobile styles the default behavior, and to recalculate the media queries on tablet or desktop. And yes, it affected the desktop performance, but not by much, and it was definitely worth it.
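In practice, the refactor means writing the base rules for mobile and overriding them upward (the selector and breakpoint here are illustrative):

```css
/* Mobile styles are the default: no media query has to be
   evaluated on small screens. */
.hero {
  padding: 16px;
  font-size: 1rem;
}

/* Tablet and desktop override the defaults. */
@media (min-width: 768px) {
  .hero {
    padding: 32px;
    font-size: 1.25rem;
  }
}
```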

So, in conclusion, there was much more to learn than we had originally estimated, but it was definitely worth it. A journey of learning lots of new concepts and, most of all, of discovering our true aim. As my good friend Andrea Carrozzo has told me thousands of times: “Master your tools”.
