Top lessons to learn from refactoring code with a headless CMS

Published in

Harry's Engineering

9 min read · May 2, 2024

Background

The Harry’s European website (for Harry’s customers in the UK and the rest of Europe) is, like most websites, content-heavy: it provides customers with information about our products and services. We therefore keep our content management separate from our codebase, so that each can change independently of the other and content can be edited without any engineering intervention.

Having worked on a variety of websites using different Content Management Systems (CMSs) over the years, from WordPress to Drupal to Sitecore among others, I have always found it a challenge to decide on good abstractions between the code I write and the CMS without being constrained by what’s possible within the CMS. The tricky part is striking the right balance between being fully dynamic and being so complicated that maintaining the codebase becomes a nightmare for the team.

What is a Headless CMS?

What makes a CMS “headless” is how you integrate it with your codebase. Unlike traditional Content Management Systems, a headless CMS is agnostic to your codebase and focuses only on the data you want, returning it as JSON in response to a request. This gives the developer more freedom and flexibility, as they are not limited by a particular access library or templating language.

We’re currently using DatoCMS as our headless content management system. DatoCMS provides a dashboard for managing content, which enables our Product, Design and Marketing partners to manage content for the website without having to write or know any code. We can then access that content via the APIs that DatoCMS provides and pull it into our site at build time. Because of this decoupling, a headless CMS, unlike some traditional CMSs (for example, WordPress or Drupal), doesn’t prescribe a particular architecture for your site.
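To make "content as JSON over an API" concrete, here is a minimal sketch of what a build step might do. It assumes DatoCMS's GraphQL Content Delivery API; the query, model name, field name and token are hypothetical placeholders, and a Gatsby site would more likely go through a source plugin than raw HTTP. No network call is made; the sketch only builds the request and unwraps the response.

```typescript
// Minimal shape of a GraphQL-over-HTTP request; nothing is sent here.
interface ContentRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// DatoCMS's GraphQL Content Delivery API endpoint.
const DATOCMS_ENDPOINT = "https://graphql.datocms.com/";

// Build the request a build step would send to the CMS.
function buildContentRequest(query: string, apiToken: string): ContentRequest {
  return {
    url: DATOCMS_ENDPOINT,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify({ query }),
  };
}

// Standard GraphQL response envelope: `data` on success, `errors` on failure.
interface GraphQLResponse<T> {
  data?: T;
  errors?: { message: string }[];
}

// Unwrap the JSON payload, failing loudly so a build never ships empty content.
function unwrapContent<T>(response: GraphQLResponse<T>): T {
  if (response.errors && response.errors.length > 0) {
    throw new Error(response.errors.map((e) => e.message).join("; "));
  }
  if (response.data === undefined) {
    throw new Error("CMS response contained no data");
  }
  return response.data;
}

// Hypothetical query: the model and field names are placeholders.
const TRIAL_BUILDER_QUERY = `{ trialBuilder { headline } }`;
const request = buildContentRequest(TRIAL_BUILDER_QUERY, "example-read-only-token");
```

The point of the sketch is that the CMS side is nothing more than a query and a JSON payload; everything about layout and flow is left to the code that consumes it.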

NOTE: It’s worth mentioning at this point that our Harry’s European website is currently a statically generated JavaScript-based Single-Page Application (SPA) written in TypeScript with Gatsby and React.

What is the Trial Builder?

One of the most important flows that we have on our site is the ‘Trial Builder’ which is the name given to the customer journeys where you subscribe for a Harry’s trial set (e.g. razor handle, blades and shave gel) and then decide on your subscription cadence for refill blades as well as other products.

See it for yourself, here: https://www.harrys.com/en/gb/signup/customize?step=1

This flow references no fewer than 15 pieces of content in DatoCMS, made up of approximately 10 images and 25 strings. We also invest a lot of time in A/B testing this customer journey, to optimise it further and to surface additional features so that customers can get the most benefit from Harry’s products. We can then measure the impact on our top ecommerce funnel metrics such as conversion rate (CVR), average order value (AOV) and bill-through rate (BTR).

The Trial Builder is also leveraged by the Subscription Signup flow. They both have a similar customer journey but with an alteration to the steps or content. Though the Trial Builder is the most important of these flows, any changes need to support the other journey to facilitate code reuse. This leads to requirements for the logic to be flexible to handle these variations. With multiple variations, continuous A/B tests and changes based on country, we can quickly start to gain tech debt, spaghetti code and a generally bloated codebase.

What problem are we solving?

As mentioned earlier, the engineering challenge for working with Headless CMSs is to strike the right balance between code maintainability and flexibility (or dynamism). You want to be able to realise the ideas that the Product Managers come up with without having to re-architect your website every time but you also want to make sure you don’t over-engineer your code when it’s likely you’ll never need 100% full flexibility either.

Back in 2023, we got this balance wrong. We attempted to make the most flexible code possible to handle as many possibilities as we could think of, but the reality was that we never made use of it. We had a clever setup which enabled entire subscription flows to be created from DatoCMS, from how many steps there were to which elements would appear on each page. The idea was to remove any engineering effort when setting up changes or new alternative versions of the journey. This worked well at first; the complexity came when we started to make UI changes based on A/B tests by locale. Implementing these changes without breaking the complex dynamic logic became incrementally more difficult over time. Eventually, it started to block requirements.

We had decided to make the code interacting with the DatoCMS APIs as flexible as possible, but in doing so we made the codebase itself the opposite: rigid and hard to change.

Though the intention was to enable a customer journey to be completely configurable via the CMS, over time most of the UI changes still needed engineering implementation, and now each change had to work around the complex conditional logic set up in the codebase to support our super dynamic relationship with DatoCMS.

Over time this caused each update to take longer and even blocked us from implementing some features. As the code became overly complex, it also started to introduce bugs and inconsistencies across devices.

How did we tackle the refactor?

1. Aligning with stakeholders on the problem

To begin, I looked back at DatoCMS and broke down what we needed based on the last 6 months of work as well as the planned work coming up. I noticed that a lot of the dynamic features set up in DatoCMS were not used much and actually slowed down content editors.

Before any architectural changes, it was important to align with my Engineering Manager and Product Managers who use the CMS each day to agree on what the pain points are and to ensure we did not just tackle the refactor from a purely engineering perspective alone.

The overall plan was to simplify the complexity in the CMS, making it easier for content editors to manage changes, while moving any dynamic complexity into the codebase. The expectation was that this would actually bring more creative flexibility for Product, Design and Engineering, and shorter lead times for adding new features to the Trial Builder.
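As an illustration of what "dynamic complexity controlled by the codebase" can look like, the sketch below keeps the steps of each flow as plain data in code and models an experiment as a small, explicit transformation scoped to a flow and a set of locales. Every flow, step and experiment name here is a hypothetical placeholder, not the real Trial Builder configuration.

```typescript
// All flow, step and experiment names below are hypothetical placeholders.
type FlowId = "trial" | "subscriptionSignup";
type StepId = "handle" | "cadence" | "extras" | "review";

// The base journey for each flow is plain data owned by the codebase.
const BASE_STEPS: Record<FlowId, readonly StepId[]> = {
  trial: ["handle", "cadence", "extras"],
  subscriptionSignup: ["cadence", "extras"],
};

// An experiment is an explicit transformation scoped to a flow and locales.
interface Experiment {
  name: string;
  flow: FlowId;
  locales: readonly string[];
  apply: (steps: readonly StepId[]) => StepId[];
}

// Resolve the steps a customer sees by applying matching experiments in order.
function resolveSteps(
  flow: FlowId,
  locale: string,
  experiments: readonly Experiment[]
): StepId[] {
  return experiments
    .filter((e) => e.flow === flow && e.locales.includes(locale))
    .reduce<StepId[]>((steps, e) => e.apply(steps), [...BASE_STEPS[flow]]);
}

// Example experiment: add a review step to the trial flow for one locale.
const addReviewStep: Experiment = {
  name: "add-review-step",
  flow: "trial",
  locales: ["en-GB"],
  apply: (steps) => [...steps, "review"],
};
```

With this shape, removing an experiment after it concludes means deleting one object, rather than unpicking conditions spread across CMS models and rendering logic.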

2. Simplify models for DatoCMS

We wanted to simplify the data models within DatoCMS to focus only on content and no longer tightly couple them to the Trial Builder flow or the structure of the UX. The new content was set up as a simple set of key / value pairs one level deep (i.e. not nested). This made it easier to add extra fields when needed.

Before: The previous data structure contained multiple levels of nested data models, controlling the layout and flow of the trial builder as well as its content.

**Before**: Representation of our data structure before the refactor
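For readers without the diagram, the nested structure can be approximated in TypeScript. This is a hypothetical reconstruction from the description above, not the real DatoCMS schema: the type names, fields and the `onlyForLocales` condition are all assumptions chosen to show how layout, flow and content were mixed together.

```typescript
// Hypothetical "before" shape: the CMS decides steps, layout and content.
interface CmsElement {
  kind: "heading" | "image" | "productPicker";
  text?: string;
  imageUrl?: string;
  // A condition stored in the CMS that the code had to interpret.
  onlyForLocales?: string[];
}

interface CmsStep {
  slug: string;
  elements: CmsElement[];
}

interface CmsFlow {
  name: string;
  steps: CmsStep[];
}

// Generic interpreter: work out which elements are visible for a locale.
function visibleElements(flow: CmsFlow, stepSlug: string, locale: string): CmsElement[] {
  const step = flow.steps.find((s) => s.slug === stepSlug);
  if (!step) return [];
  return step.elements.filter(
    (el) => !el.onlyForLocales || el.onlyForLocales.includes(locale)
  );
}

// Example data as it might have come back from the CMS.
const exampleFlow: CmsFlow = {
  name: "trial",
  steps: [
    {
      slug: "handle",
      elements: [
        { kind: "heading", text: "Choose your handle" },
        { kind: "image", imageUrl: "https://example.com/handle.png", onlyForLocales: ["en-GB"] },
        { kind: "productPicker" },
      ],
    },
  ],
};
```

Every new kind of variation in a model like this needs both a new CMS field and a new branch in the interpreter, which is how the conditional logic accumulates.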

After: The new data model only concerns itself with the content required by the Trial Builder. It no longer affects the structure or layout of the user journey, which reduces its overall complexity.

**After**: Representation of our data structure after the refactor
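In code, a flat model like this could be consumed roughly as follows. This is a sketch under assumptions: the key naming scheme (`<step>Heading`) and the step names are hypothetical, and the real model may type its fields differently.

```typescript
// Hypothetical "after" shape: one level of key/value content, no layout.
type TrialBuilderContent = Record<string, string>;

// The structure of the journey now lives in code, not in the CMS.
const TRIAL_BUILDER_STEPS = ["handle", "cadence", "extras"] as const;
type StepId = (typeof TRIAL_BUILDER_STEPS)[number];

// Look up a content key, failing loudly if an editor has not filled it in.
function getContent(content: TrialBuilderContent, key: string): string {
  const value = content[key];
  if (value === undefined) {
    throw new Error(`Missing CMS content for key "${key}"`);
  }
  return value;
}

// Assumed naming convention: each step has a "<step>Heading" key.
function stepHeading(content: TrialBuilderContent, step: StepId): string {
  return getContent(content, `${step}Heading`);
}

// Example payload as it might arrive from the CMS at build time.
const exampleContent: TrialBuilderContent = {
  handleHeading: "Choose your handle",
  cadenceHeading: "How often do you shave?",
  extrasHeading: "Add extras",
};
```

Adding a new string is then a single new key in the CMS and a single lookup in the component that needs it.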

3. Refactoring the Trial Builder code

In order to update the Trial Builder codebase while minimising the impact to planned Sprint work, I was able to break down the job into smaller chunks in collaboration with our Product Manager. This helped me to make sure we had alignment on what work would be done and in which forthcoming sprint.

I was able to split this into the following 6 high-level tasks:

[CMS] Import the new data model into the code via GraphQL.
[CMS] Replace all references to the data in our current codebase.
[Refactor] Duplicate the existing feature code.
[Refactor] Refactor the code to no longer need any logic required by the old version.
[Cleanup] Delete the older version, routing users to the new version.
[Cleanup] Delete the old CMS model.
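While the duplicated and original versions coexist (between the [Refactor] and [Cleanup] tasks), something has to decide which one a customer reaches. The sketch below shows one way to do that with a locale-based rollout list. The rollout mechanism, the locale codes and the route paths are all assumptions for illustration; the article does not describe how traffic was actually switched.

```typescript
type BuilderVersion = "legacy" | "refactored";

interface Rollout {
  // Locales that should already see the refactored Trial Builder (assumed mechanism).
  refactoredLocales: readonly string[];
}

// Decide which implementation serves a given locale.
function pickBuilderVersion(locale: string, rollout: Rollout): BuilderVersion {
  return rollout.refactoredLocales.includes(locale) ? "refactored" : "legacy";
}

// Hypothetical route paths for the two implementations.
const ROUTES: Record<BuilderVersion, string> = {
  legacy: "/trial-builder/legacy",
  refactored: "/trial-builder/next",
};

// Resolve the route a customer in this locale should be sent to.
function routeFor(locale: string, rollout: Rollout): string {
  return ROUTES[pickBuilderVersion(locale, rollout)];
}

// Example: only one locale has been migrated so far.
const currentRollout: Rollout = { refactoredLocales: ["en-GB"] };
```

Once every locale is in the list, the legacy branch is dead code and can be deleted together with its route, which corresponds to the first [Cleanup] task.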

Each of these tasks was broken into smaller incremental tickets but this order allowed us to do it over a series of months and in such a way that any member of the team could pick it up.

Learnings & Takeaways

Based on this project, our team collected the following learnings, which we think are worth sharing with other teams considering a refactor.

1. Aligning team members on the expected outcome

The aim of the process at each step was to be transparent and keep the team updated. The scope and goal for the refactor were initially not well defined. Thanks to this process of alignment and transparency, we defined the goal and made quick progress afterwards. Ensuring a good shared understanding earlier on between Product and Engineering of the problem we were solving, the benefit to the business and the timeline would have helped to speed up the process.

2. Duplication of code is OK, in the short-term

During the refactor, it was helpful to have the older and newer implementations running in parallel, with the main drawback being that it resulted in duplicating a large portion of our codebase. While the duplication helped, we could have been more diligent in removing repetitive code incrementally rather than leaving it all to the end. However, at the point we deleted the old code, we were confident we no longer needed any of it. (Just be sure to actually delete the dead code!)

3. Smaller tasks don’t always mean small PRs

As the Trial Builder is a critical part of our customer journey, we struggled on occasion to keep our pull requests (PRs) small due to interdependencies with other parts of the codebase. Large PRs, even those with small functional changes, made peer review slower than we would have liked, as we had to be confident that they didn’t introduce a regression for customers in different locales. Instead of being constrained by the scope of the task, we could have scoped by the complexity of the change and broken things down further into smaller PRs.
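One technique that can make large refactoring PRs easier to review is a parity check that runs the old and new implementations against the same inputs and reports where they differ. The sketch below is a generic illustration of that idea, not something the article says the team used; the locale codes and the two example functions are hypothetical.

```typescript
// Return the locales for which the legacy and refactored outputs differ.
// Outputs are compared via JSON.stringify, so this suits plain data such as
// arrays and strings; object outputs would need a consistent key order.
function findMismatchedLocales<T>(
  locales: readonly string[],
  legacy: (locale: string) => T,
  refactored: (locale: string) => T
): string[] {
  return locales.filter(
    (locale) => JSON.stringify(legacy(locale)) !== JSON.stringify(refactored(locale))
  );
}

// Hypothetical legacy behaviour: one locale gets an extra step.
const legacySteps = (locale: string): string[] =>
  locale === "en-GB" ? ["handle", "cadence", "extras"] : ["handle", "cadence"];

// Hypothetical refactored behaviour with a deliberate regression for de-DE.
const refactoredSteps = (locale: string): string[] =>
  locale === "de-DE" ? ["handle"] : legacySteps(locale);

const mismatches = findMismatchedLocales(
  ["en-GB", "de-DE", "fr-FR"],
  legacySteps,
  refactoredSteps
);
// mismatches is ["de-DE"]: the only locale where the two versions disagree.
```

A check like this only works while both versions exist, which is one practical benefit of the short-term duplication described in the previous learning.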

4. Balance of flexibility against complexity

Working with a headless CMS like DatoCMS opens up lots of potential for creative freedom and there is always a strong desire to find interesting and clever solutions that help your Content Editors realise those dreams without being bottlenecked by engineering effort.

However, this leads to a potential pitfall of over-engineering your solutions when the reality of those needs is likely to be much more modest. If you take the time to meet with your Product Manager and review your Product Roadmap to discuss exactly which features will be needed, you may well start to see that ‘You Ain’t Gonna Need It’ (YAGNI). This is where striking the right balance between flexibility and complexity is important for codebase maintainability and long-term team velocity.

Conclusion

Overall we completed the Trial Builder refactor in July 2023. In total, this project spanned about 5 months with small incremental deliveries each sprint to avoid a big-bang refactor. We preferred an approach of upgrading the code by chipping away at the underlying tech debt each sprint which will be sustainable long-term. We managed to achieve this consistency by collaborating closely with our Product partners to negotiate bringing in specific tech debt tickets each sprint so we made regular progress.

In the end, we were able to drastically improve our CodeClimate score for tech debt by deleting over 30K lines of dead or duplicated code (partially due to code we chose to duplicate to make the refactor easier).

Because we were able to simplify our codebase, we’ve been able to streamline our approach to experimentation in the checkout: setting up an A/B test in the Trial Builder now takes approximately 2–3 days to launch instead of the previous 5+ days, which was half of our sprint. We see this as a big win for the business because, by increasing our team’s experimentation velocity, we can learn faster about which features customers like or don’t like and further optimize their checkout experience.

We know there is still more room to improve here, but we now have a good basis for evolving the codebase with better practices and hygiene around how we set up and clean up experimental code after our tests conclude.