How to migrate digital projects from WordPress to Contentful

The obvious and the not so obvious you need to know about moving content around

Published in Apolitical Engineering · 10 min read · Aug 24, 2021

It’s been a while since my last blog post about Building a People API for a Growing Platform, and a lot has happened since then. The Apolitical platform allows public servants across the globe to discover knowledge and learn from the community, and it continues to grow.

Let’s take a minute to talk about the context of this new blog post. Big changes were underway at the start of 2021 regarding Content Management Systems (CMS), driven by frustration with our self-hosted CMS’s inability to scale, and a desire to build and ship faster. The blog post about the Journey to Finding the Right CMS has covered the reasons behind the decision-making process to adopt Contentful into Apolitical’s tech stack. As a result, the focus of this blog post will be to look at how the engineering team accomplished the migration of already existing digital projects from WordPress to Contentful.

TL;DR: Do not underestimate the time and effort required to migrate digital projects. Start by designing your content model, then spend some time building internal tooling to handle the migrations. And finally, make use of the Contentful APIs to present the content on your website or any other channel. There is light at the end of the tunnel. And some trees.

Tunnel in The Smokies. Photo by Ryan Wallace. Unsplash

Getting started with Contentful

This section guides you through some steps to help you get started with Contentful as well as providing useful resources to help you manage the projects successfully and smoothly.

If you are new to Contentful, there’s no better way to learn about it than by taking some of the free courses at the Learning Center. Contentful has a very low barrier to entry: you can create your own account for free and start experimenting with it straightaway. The course Managing Content at Scale can be a great source of inspiration to understand how to use Spaces and Environments on Contentful. Start by putting the focus on choosing the best space modelling strategy, and on using environments to create an agile development process.

Contentful spaces allow you to group all the related resources for a project together. Space modelling is the concept of utilizing spaces to design the best architecture for your project. A space is not only a collection of content; it can also be configured individually using Webhooks, API keys, Locales, and more. There are different space modelling patterns, such as one space to rule them all, a separate space per project, and multi-tier. At Apolitical, we chose the one space to rule them all pattern, which fit all of our requirements: content types and entries can be reused across projects that share the same localization needs, and multiple teams can collaborate across projects. From a cost perspective, it is also worth bearing in mind that spaces are billed individually, which makes this pattern the more affordable one.

Now that you have seen the different space architectures, it’s time to think about how to use environments to implement an agile deployment pipeline that adheres to CI/CD best practices. Space environments are entities within a space that allow you to create and maintain multiple versions of the space-specific data, and make changes to them in isolation. By default, every space has one environment, called master. Additional sandbox environments can be created, and each one starts as an exact copy of the master environment.

At Apolitical, we needed to host multiple staging environments for testing, so we created one sandbox environment for each of them. In addition, we activated environment aliases, which turns master into an alias that points to an environment rather than being an environment itself. In our case, the master alias initially pointed to live-v0.0.1, and the production content was served from it.

You might wonder: why would I do that? Well, there are lots of benefits from using environment aliases, such as minimizing downtime and allowing instant rollbacks.

Promote an environment to master. Graphic by Contentful
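
As a rough sketch, promoting an environment could look something like the following with the contentful-management module. The function name promoteEnvironment and the environment ID live-v0.0.2 are hypothetical, and the space object is assumed to come from the module's client.getSpace call; this is an illustration rather than the code used at Apolitical.

```javascript
// Sketch: repoint the `master` alias at another environment.
// Assumption: `space` is the object returned by
// createClient({ accessToken }).getSpace(spaceId) from contentful-management.
async function promoteEnvironment(space, targetEnvironmentId, aliasId = "master") {
  const alias = await space.getEnvironmentAlias(aliasId);
  const previousEnvironmentId = alias.environment.sys.id;

  // Changing the alias target is what switches production content over.
  alias.environment.sys.id = targetEnvironmentId;
  await alias.update();

  // Returning the previous target keeps a rollback one call away.
  return previousEnvironmentId;
}

// Usage (hypothetical IDs):
//   const previous = await promoteEnvironment(space, "live-v0.0.2");
//   await promoteEnvironment(space, previous); // roll back
```

Because the alias only stores a pointer, rolling back is the same operation with the previous environment ID.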

Our biggest challenge was keeping the staging environments synchronised with the content that the content team creates day after day on the live environment. There was no built-in way to automate this synchronisation, so the engineering team decided to build an internal Command-line Interface (CLI) tool to handle it. The tool is implemented in Node.js and uses the contentful-management module, a client for Contentful’s Content Management API (CMA). It exposes a recreate command, which automatically deletes a staging environment and creates a new instance of it from live. The CLI tool is integrated with GitLab CI/CD scheduled pipelines so that all the content types and entries are transferred from live to the staging environments nightly. It’s still very impressive how quickly and reliably Contentful handles the environment recreation process.
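
A minimal sketch of what such a recreate command might do is shown below. It is not the internal tool itself: the function name and the environment IDs are hypothetical, and the space object is assumed to come from the contentful-management client.

```javascript
// Sketch of a `recreate` command: drop a staging environment and clone it from live.
// Assumption: `space` comes from contentful-management's client.getSpace(spaceId).
async function recreateEnvironment(space, stagingId, sourceId) {
  // Remove the stale copy first so the ID can be reused.
  const staging = await space.getEnvironment(stagingId);
  await staging.delete();

  // The third argument asks Contentful to copy the new environment from `sourceId`.
  return space.createEnvironmentWithId(stagingId, { name: stagingId }, sourceId);
}

// Usage (hypothetical IDs):
//   await recreateEnvironment(space, "staging-1", "live-v0.0.1");
```

A real command would probably also handle the case where the staging environment does not exist yet, and wait until the new environment reports as ready before finishing.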

Finally, another command was added to the CLI tool to back up the content from Contentful to Google Cloud Storage (GCS). The backup command uses the contentful-export module to export everything from the live environment into a JSON file, and then uploads the file to GCS. It also runs nightly.
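
The backup flow could be sketched along these lines. The file naming scheme and the contentful/ destination prefix are hypothetical, exportFn is assumed to be the function exported by contentful-export, and bucket is assumed to be a Bucket object from @google-cloud/storage. Both are passed in so the sketch runs without credentials.

```javascript
const path = require("node:path");

// Sketch of a nightly `backup` command.
// Assumptions: `exportFn` is the function exported by contentful-export and
// `bucket` is a Bucket from @google-cloud/storage (new Storage().bucket(name)).
async function backup({ exportFn, bucket, spaceId, managementToken, environmentId, exportDir, date = new Date() }) {
  // Hypothetical naming scheme: one dated JSON file per environment.
  const contentFile = `${environmentId}-${date.toISOString().slice(0, 10)}.json`;

  // Export everything from the environment into a single JSON file on disk.
  await exportFn({ spaceId, managementToken, environmentId, exportDir, contentFile });

  // Upload the exported file to the bucket under a fixed prefix.
  const destination = `contentful/${contentFile}`;
  await bucket.upload(path.join(exportDir, contentFile), { destination });
  return destination;
}
```

Injecting exportFn and bucket is a design choice for this sketch only; it makes the command easy to exercise with fakes in a test.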

Designing the content model

If that sounded a bit discouraging and boring, don’t worry, here comes the fun stuff. In this section, you will delve into the content modelling design process. Depending on the circumstances, you may want to follow a strict Content Modeling Design Process, involving steps such as: agreeing on strategic objectives and creating a content model. However, in Apolitical’s case, the digital projects were already well defined and live on the platform, which allowed for this journey to be shorter, and the process could start straight from the creation of the Content Model.

Content modelling design process. Graphic by Contentful

In the beginning, as described in this related blog post, it’s difficult to switch from the traditional CMS approach, where content and code are combined in web-centric frameworks, to the headless CMS approach, where the content is stored separately from the code used to build the presentation layer.

When thinking about content, it is easy to focus too much on where the content will live, leading to a fixation on the interface: a website, a mobile device, a digital screen. But this interface-centric thinking is what often leads to trouble, endless redesigns, siloed content, and wasted time and money.

Using content models can help you to separate the content from its context (the interface or presentation layer). Start by taking a look at the Intro to Content Modelling and Content Modelling Design Patterns courses to understand all the required and relevant concepts and vocabulary. The key to succeeding at creating your content model is breaking down the content into small, well-defined and reusable pieces. We found contentmodel.io incredibly useful to visualise and iterate on the content model for Apolitical’s digital projects.

The first content model was designed by the Apolitical engineering team for the Microcourses digital project. It was not a simple content model. For example, the Microcourse content type includes multiple fields with nested content types such as the Summary, Editions, and Sections. At the same time, the Section content type includes the field Lessons, which links to another nested content type. The following image partially illustrates the mapping between the Microcourse content type and the Microcourse Overview page for one of the Microcourses on the platform: Applying Strategic Thinking in Government.

Microcourse Overview page and Microcourse content type mapping

Building the content migration script

Once you have designed and created the content model on Contentful, it’s time to write some more code. The focus of this section is the transfer of the content from source to target, or in other words, from WordPress to Contentful.

The migration scripts are exposed through the internal CLI tool, for instance through the migrate-microcourse command. There are numerous ways of storing your content on WordPress and even more ways to access it. For example, content can be accessed through REST APIs, such as the Articles API on the Apolitical platform, or even by directly connecting to the database. Yet, to keep the blog post relevant to all readers, emphasis will not be placed on how to access the content, but rather on how to structure the migration script and how to reformat and convert the content.

The first problem is how to reformat WordPress posts from HTML to markdown. To solve it, a helper function was implemented based on this related blog post, using the turndown, turndown-plugin-gfm and node-wpautop modules. It is also necessary to extract all the references to assets stored in WordPress and convert them into Contentful assets: many WordPress posts contain images or other media files, and those files need to be uploaded to Contentful. The asset.type.js file handles uploading media files and, in the same way, the microcourse.type.js file handles uploading the Microcourse content type. This is where the concept of .type.js files comes in: an arbitrary structure designed to contain all the logic required to upload an individual content type. The following image partially illustrates the microcourse.type.js file.
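
As for the asset references mentioned above, a minimal sketch of the extraction step could look like the following. The function name extractAssetUrls is hypothetical, and the sketch assumes that media files live under /wp-content/uploads/, which is the WordPress default but may differ on a given installation.

```javascript
// Sketch: collect the media URLs referenced by a WordPress post body.
// Assumption: assets are stored under /wp-content/uploads/ (the WordPress default).
function extractAssetUrls(html) {
  const pattern = /(?:src|href)=["']([^"']*\/wp-content\/uploads\/[^"']+)["']/gi;
  const urls = new Set(); // a Set drops duplicates while keeping first-seen order
  for (const match of html.matchAll(pattern)) {
    urls.add(match[1]);
  }
  return [...urls];
}
```

Each returned URL can then be uploaded once as a Contentful asset and swapped into the converted markdown. A regular expression is enough for a sketch; an HTML parser would be more robust for real content.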

Each .type.js file exposes an object with a build function, which handles the complete process. Internally, the build function first executes the preprocess function and then the upload function:

The preprocess function is in charge of parsing the inputs (the content coming from WordPress) and mapping them to the content type fields. For fields with nested content types, it calls the build function of the relevant .type.js file, which makes sure the nested entry is uploaded and available on Contentful before the current one is uploaded. This guarantees that the links between entries are in place.
The upload function is in charge of uploading the entries by calling the uploadEntry function, which uses the CMA to upload and publish entries.
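
The build, preprocess and upload flow described above can be sketched as follows. This is a simplified illustration and not the actual microcourse.type.js file: the field names, the en-US locale, and the way uploadEntry and the nested section type are passed in are all assumptions.

```javascript
// microcourse.type.js (sketch). Field names and locale are hypothetical;
// `uploadEntry` and `sectionType` are injected stand-ins for the CMA-backed
// helper and the nested section.type.js module.
function createMicrocourseType({ uploadEntry, sectionType }) {
  // Map the WordPress payload onto the content type fields, building nested
  // entries first so that their IDs exist before this entry links to them.
  async function preprocess(wpCourse) {
    const sections = [];
    for (const wpSection of wpCourse.sections) {
      sections.push(await sectionType.build(wpSection));
    }
    return {
      title: { "en-US": wpCourse.title },
      sections: {
        "en-US": sections.map((entry) => ({
          sys: { type: "Link", linkType: "Entry", id: entry.sys.id },
        })),
      },
    };
  }

  // Upload (and publish) the entry through the shared helper.
  async function upload(fields) {
    return uploadEntry("microcourse", fields);
  }

  // build runs the whole process: preprocess first, then upload.
  async function build(wpCourse) {
    return upload(await preprocess(wpCourse));
  }

  return { build };
}
```

Because every type only exposes build, a parent type never needs to know how a nested type is assembled; it just awaits the nested entry and links to its ID.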

Apart from that, the assetModel and entryModel would simply read the entry sys.id and create the JSON object required to link nested assets or entries.
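
For illustration, the two helpers might be as small as the sketch below. The bodies are an assumption based on the link format Contentful uses for references, not a copy of the internal code.

```javascript
// Sketch: build the link object that Contentful expects in a reference field.
function entryModel(entry) {
  return { sys: { type: "Link", linkType: "Entry", id: entry.sys.id } };
}

function assetModel(asset) {
  return { sys: { type: "Link", linkType: "Asset", id: asset.sys.id } };
}

// Usage with hypothetical field names and locale:
//   fields.summary = { "en-US": entryModel(summaryEntry) };
//   fields.image = { "en-US": assetModel(imageAsset) };
```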

Presenting the content from apps

You have made it to the last section! It is time for the most rewarding part of the migration process. All the content has now been stored on Contentful and all the benefits from using Contentful APIs will start to shine.

It is a good idea to study the different content delivery architectures and their optimal use cases. The APIs and Delivery Architectures course covers everything you need to know. The frontend of the Apolitical platform is implemented with single-page React apps, and those apps are bootstrapped with Create React App (CRA). For that reason, with the use of the dynamic on-device delivery architecture, the apps directly make Contentful API calls and fully leverage Contentful’s Content Delivery Network (CDN). It is important to note that Search Engine Optimisation (SEO) support requires more effort when using this architecture.

Dynamic on-device delivery architecture. Graphic by Contentful

The contentful.js module can fetch the content by making use of Contentful’s Content Delivery API (CDA), which is very easy to integrate with React apps. However, at Apolitical, a wrapper function was internally developed to flatten the JSON objects returned by the module because they are cumbersome to work with, given that every entry contains system data that is rarely used.
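
A flattening wrapper of this kind could be sketched as below. The function name flatten and the choice to keep only the id are assumptions, and the sketch expects links to be already resolved and free of circular references.

```javascript
// Sketch: strip system data from a resolved CDA response, keeping id and fields.
// Assumption: links are already resolved and contain no circular references.
function flatten(value) {
  if (Array.isArray(value)) return value.map(flatten);
  if (value && typeof value === "object") {
    if (value.sys && value.fields) {
      // Entry or asset: lift the fields up and keep the id (handy as a React key).
      return { id: value.sys.id, ...flatten(value.fields) };
    }
    // Plain object (for example an asset's file details): flatten each property.
    return Object.fromEntries(
      Object.entries(value).map(([key, nested]) => [key, flatten(nested)])
    );
  }
  return value; // strings, numbers, booleans, null
}

// Usage with contentful.js (space and token omitted):
//   const entry = await client.getEntry(entryId, { include: 2 });
//   const microcourse = flatten(entry);
```

With a shape like this, a component can read microcourse.sections[0].title instead of walking through fields and sys at every level.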

Conclusion

This blog post has covered the obvious and the not so obvious of moving content around. You have taken a close look at the steps required in the process of migrating digital projects from WordPress to Contentful: getting started with Contentful; designing the content model; building the content migration script; and presenting the content from apps.

It would be unfair to have come this far and end the blog post without sharing our results at Apolitical. On the one hand, it’s important to highlight that the migration process is not as easy as you may think, and you should not underestimate the time and effort required to complete it. On the other hand, the user experience improved notably: Time to Interactive (TTI) was cut in half, and the Lighthouse reports showed an increase in performance of about 20%. The overall scalability and maintainability of the platform improved too. We no longer have to deal with database CPU usage spikes caused by the Articles API, which was one of the goals at the start of the Journey to Finding the Right CMS.

Perhaps the most important benefit of the migration was that the content team’s workflow was completely revamped. The introduction of a new headless CMS with a great visual editor (the Contentful Web App) was a great success. Major improvements include previewing content before publishing, smooth navigation and search of content, localisation that can be easily extended, and many more content management capabilities.