The Great Migration: WordPress to Contentful

Part 2: Leveraging Contentful Rich Text Fields

Published in Flatiron Labs, Sep 13, 2019

Welcome to part 2! If you missed part 1, you might want to read it first to get a fuller understanding of what we aimed to accomplish. Otherwise, keep reading for the much-anticipated ending to our journey!

Contentful Rich Text Fields

The rich text field is the fanciest field type Contentful offers. It provides a nice UI for adding extra formatting to text. It also lets you embed other content types, either inline or as a block, right in the field, as long as you handle their rendering in your codebase.

https://www.contentful.com/developers/docs/concepts/rich-text/

We wanted to push content directly into a rich text field, but because of the field's complex structure, we took a different route: we pushed our markdown into a markdown field, used Contentful’s rich-text-from-markdown library to convert it to rich text, and then pushed the result into a rich text field.
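To give a sense of that structure, here is a minimal sketch of what a rich text document looks like: a tree of nodes, each with a nodeType, a data object, and either child content or a text value. The helper names (textNode, paragraph, richTextDocument) are made up for illustration and are not part of any Contentful library.

```javascript
// Hypothetical helpers (not part of any Contentful library) that build
// the nested node shape a rich text field stores.
const textNode = (value) => ({ nodeType: 'text', value, marks: [], data: {} })

// A paragraph node wraps one or more text nodes.
const paragraph = (value) => ({
  nodeType: 'paragraph',
  data: {},
  content: [textNode(value)]
})

// The root of every rich text value is a single "document" node.
const richTextDocument = (paragraphs) => ({
  nodeType: 'document',
  data: {},
  content: paragraphs.map(paragraph)
})

const doc = richTextDocument(['Hello', 'World'])
console.log(JSON.stringify(doc, null, 2))
```

Even two short paragraphs produce three levels of nesting, which is why converting from markdown with a library is usually less error-prone than assembling these trees by hand.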

Upload Time!

It’s time to finally upload our blogs to Contentful. We used the Contentful Management API via their contentful-management gem to accomplish this. Here’s what we did:

require 'contentful/management'
require 'csv'
require 'date'

blogs = CSV.read("./results/blog_posts_with_tags.csv", headers: true)

# To prevent the creation of duplicates, we created the content model entries
# for each of the tags, campuses and authors and stored their IDs here so that
# we can do a simple lookup to see if we already created one
TAG_IDS = {
  "Example tag" => 'tagId'
  # more tag ids omitted for brevity
}

# String keys (=>) so that lookups by the CSV's string values match
CAMPUS_IDS = {
  "Example campus" => "campusId"
}

AUTHOR_IDS = {
  "Example author" => "authorId"
}

# Contentful client
client = Contentful::Management::Client.new('CONTENTFUL_API_KEY', raise_errors: true)

# To find the correct environment, we search for it based on the client stored in the variable above
environment = client.environments('CONTENTFUL_SPACE_ID').find('CONTENTFUL_ENV_ID')

# Gets content types that will be used to find or create entries of these types
blog_type = environment.content_types.find('blogPost')
person_type = environment.content_types.find('person')
campus_type = environment.content_types.find('campus')
tag_type = environment.content_types.find('tags')

blogs.each do |blog|
  begin
    tags_arr = []

    # There are a finite number of campuses and all of them are in the CAMPUS_IDS hash above
    if CAMPUS_IDS[blog['campus']]
      campus_entry = environment.entries.find(CAMPUS_IDS[blog['campus']])
    end

    # For the tags, the logic is a bit more involved. If there is no tag present, we use a
    # default tag (we made it a required field for our blogs, so things would break without one).
    # If a tag exists, we either find it using our hash of tags above or create a new one.
    if !blog['tag']
      tag_entry = environment.entries.find(TAG_IDS['Default Blog Tag'])
      tags_arr << tag_entry
    else
      slug = blog['tag'].downcase.gsub(' ', '-')

      if TAG_IDS[blog['tag']]
        puts 'found tag'
        tag_entry = environment.entries.find(TAG_IDS[blog['tag']])
        tags_arr << tag_entry
      else
        puts 'creating tag from blog'
        tag_entry = tag_type.entries.create(name: blog['tag'], slug: slug)
        TAG_IDS[blog['tag']] = tag_entry.id
        tags_arr << tag_entry
      end
    end

    # Find or create author
    if AUTHOR_IDS[blog['author']]
      author_entry = environment.entries.find(AUTHOR_IDS[blog['author']])
    else
      author_entry = person_type.entries.create(name: blog['author'], jobTitle: 'Blog Post Author')
      AUTHOR_IDS[blog['author']] = author_entry.id
    end

    # Creates the blog post
    entry = blog_type.entries.create(
      title: blog['title'],
      publishedAt: DateTime.parse(blog['publishedAt']),
      markdown: blog['content'],
      slug: blog['slug'],
    )

    # Associates tags, campus and author to blog post
    entry.update(tags: tags_arr)
    entry.update(campus: campus_entry)
    entry.update(author: author_entry)

    # Throttle the request so we don't get rate limit errors
    sleep 0.15

  # Print out the blogs that didn't successfully upload to Contentful
  rescue => error
    puts '______________________________________'
    puts blog['id'], blog['status'], blog['slug']
    puts error
  end
end

puts 'DONE 🎉'
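The find-or-create lookup for tags and authors is the part of this script that prevents duplicates, so here is the same idea isolated in a small JavaScript sketch. It runs entirely in memory: createEntry is a hypothetical stand-in that we made up, not a real Contentful call, and tagIds plays the role of the TAG_IDS hash.

```javascript
// Hypothetical stand-in for the Contentful API: createEntry is NOT a real
// library call, it just hands back an object with a fresh id.
let nextId = 1
const createEntry = (fields) => ({ id: `entry-${nextId++}`, fields })

// Same slug rule as the Ruby script: lowercase, spaces become hyphens.
const slugify = (name) => name.toLowerCase().replace(/ /g, '-')

// Cache of tag name -> entry id, playing the role of TAG_IDS.
const tagIds = new Map([['Example tag', 'tagId']])

// Return the cached id when the tag is known; otherwise create the entry
// once and remember its id for every later lookup.
const findOrCreateTagId = (name) => {
  if (tagIds.has(name)) return tagIds.get(name)
  const entry = createEntry({ name, slug: slugify(name) })
  tagIds.set(name, entry.id)
  return entry.id
}

const first = findOrCreateTagId('Career Advice')
const second = findOrCreateTagId('Career Advice')
console.log(first === second) // true: the second call hits the cache
```

Writing the new id back into the cache right after creation is what keeps a tag that appears on many posts from being created more than once.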

Amazing. At this point, all of our blogs are on Contentful with their content in the markdown field. The next step is to convert those markdown fields to rich text.

Converting to Rich Text 🤑

Time to switch to using JavaScript! We used Contentful’s contentful-migration and rich-text-from-markdown libraries to handle the conversion of markdown to rich text. The contentful-migration library handles the actual passing of data between the fields and the rich-text-from-markdown library handles converting the markdown itself. There’s only a bit of setup for the contentful-migration portion:

// convert_markdown_to_rich_text.js
const runMigration = require('contentful-migration/built/bin/cli').runMigration
const dotenv = require('dotenv')
dotenv.config()

// We define some options that allow the library to find our space and environment
// on Contentful and then point it to where our migration lives in our file tree
const options = {
  filePath: 'data/migration-test.js',
  spaceId: process.env.CONTENTFUL_SPACE_ID,
  accessToken: process.env.CONTENTFUL_MANAGEMENT_API_KEY,
  environmentId: process.env.CONTENTFUL_ENVIRONMENT_ID,
  yes: true
}

// Runs the migration
runMigration({...options})
  .then(() => console.log('Migration Done!'))
  .catch((e) => console.error(e))

Now for what you probably clicked on this post for:

Here’s the migration itself:

// migration-test.js
const { richTextFromMarkdown } = require('@contentful/rich-text-from-markdown')
const { createClient } = require('contentful-management')

// Our function takes in the migration for free from the runMigration function in
// convert_markdown_to_rich_text.js. We also get our space id, environment id and access token.
module.exports = async function(migration, { spaceId, accessToken, environmentId }) {
  // We need to find our client, space and environment because, like we saw when we used
  // the ruby gem above, to get to the environment which is where we create entries,
  // we need our space and client first.
  const client = await createClient({ accessToken: accessToken })
  const space = await client.getSpace(spaceId)
  const environment = await space.getEnvironment(environmentId)

  // We call the transformEntries function on our migration to ask the library to find our
  // blog post content model and for each one, take its markdown field, do something to it
  // (defined below) and push that result into its content field. The shouldPublish attribute
  // set to true also publishes it rather than leaving it as a draft.
  migration.transformEntries({
    contentType: 'blogPost',
    from: ['markdown'],
    to: ['content'],
    shouldPublish: true,
    // The transformEntryForLocale attribute's value is an anonymous function that is called
    // with the value of the current field (fromFields) and that field's locale (currentLocale)
    transformEntryForLocale: async function(fromFields, currentLocale) {
      // If the currentLocale isn't 'en-US' or if the markdown field is empty we want to move on
      // and process the next field rather than waste time trying to process something that isn't there
      if (
        currentLocale !== 'en-US' ||
        fromFields.markdown === undefined
      ) {
        return
      }

      // The conversion sits inside the try block so that the catch below can actually
      // report a failure for this entry.
      try {
        // This is where more ✨magic✨ happens. Here we call on the powers of the
        // rich-text-from-markdown library to convert the nodes of our markdown field into nodes
        // that the rich text field can understand. If it comes across a node that it can't
        // automatically parse, it's passed into the second argument of our richTextFromMarkdown
        // function which then passes it into a switch statement that is able to determine what
        // kind of node it is. In our case, code blocks and images were the ones we had to
        // define manually.
        const content = await richTextFromMarkdown(fromFields.markdown['en-US'], async (node) => {
          switch (node.type){
            case 'code':
              return processCode(node)
            case 'image':
              return await processImage(environment, node)
          }
        })

        // Regular text nodes are converted automatically; the result goes into the content field
        return {
          content: content
        }
      } catch (error){
        console.error(error)
      }
    }
  })
}

// If the richTextFromMarkdown comes across a code block, the node is passed into this helper
// function that converts it to a format that the rich text field can understand
const processCode = async (node) => {
  return {
    nodeType: "blockquote",
    content: [
      {
        nodeType: "paragraph",
        data: {},
        content: [
          {
            nodeType: "text",
            value: node.value,
            marks: [],
            data: {}
          }
        ]
      }
    ],
    data: {}
  }
}

// If the richTextFromMarkdown comes across an image, the node is passed into this helper
// function that creates an asset in our Contentful environment, uploads and processes that
// image and returns it in a format that the rich text field can understand
const processImage = async (environment, node) => {
  const title = node.url.split('/').pop()
  const ext = title.split('.').pop()

  const asset = await environment.createAsset({
    fields: {
      title: {
        'en-US': `Blog post image: ${title}`
      },
      description: {
        'en-US': node.alt || `Blog post image: ${title}`
      },
      file: {
        'en-US': {
          contentType: `image/${ext}`,
          fileName: title,
          upload: node.url
        }
      }
    }
  }).catch(e => {
    // Log and rethrow so we never continue with an undefined asset
    console.log('in create asset catch')
    throw e
  })

  // Wait for Contentful to finish processing the uploaded file
  await asset.processForAllLocales()

  return {
    nodeType: 'embedded-asset-block',
    content: [],
    data: {
      target: {
        sys: {
          type: 'Link',
          linkType: 'Asset',
          id: asset.sys.id
        }
      }
    }
  }
}
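One detail in processImage that may be worth tightening: building the content type as image/${ext} yields image/jpg for .jpg files, while the registered MIME type is image/jpeg, and a URL with a query string would end up in the file name. Below is a possible refinement, sketched under the assumption that image URLs are absolute. The describeImage helper and its extension table are hypothetical and not part of contentful-management.

```javascript
// Hypothetical helper (not part of contentful-management): derive the
// fileName and contentType for an asset from an absolute image URL.
const MIME_BY_EXTENSION = {
  jpg: 'image/jpeg',
  jpeg: 'image/jpeg',
  png: 'image/png',
  gif: 'image/gif',
  webp: 'image/webp',
  svg: 'image/svg+xml'
}

const describeImage = (url) => {
  // Parsing the URL drops any query string or fragment from the path.
  const { pathname } = new URL(url)
  const fileName = pathname.split('/').pop()
  const ext = fileName.includes('.')
    ? fileName.split('.').pop().toLowerCase()
    : ''
  // contentType is null when the extension is not in the table above.
  return { fileName, contentType: MIME_BY_EXTENSION[ext] || null }
}

console.log(describeImage('https://example.com/wp-content/uploads/Photo.JPG?w=600'))
```

Returning null for unknown extensions leaves the decision to the caller, who could skip the image or fall back to a default rather than upload it with a guessed type.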

DONE 🎉

And that’s that.

This was a ton of work with a lot of trial and error. I excluded all of the rabbit holes and just included what works. That being said, here are some key learnings:

Ask for extra eyes from colleagues sooner rather than later. This project took me about a month, due in part to the amount of trial and error I endured and the lack of resources available on the internet. However, I’m sure it would have gone faster if I had asked for help earlier.

Sometimes the work you do as an engineer will just straight up suck. There were parts of this project (the gloomy Friday afternoon) where I felt really uninspired and burned out. However, I stuck through it, and now I can look back at that experience and realize how much I learned from it.

Growth lies outside of your comfort zone. This is probably obvious for most people, but it really rang true for me during the course of this project.

Take a vacation. The burnout was real after this one so I made sure to take 1.5 weeks off to recuperate. Luckily, I work with a team of very understanding engineers and managers so this wasn’t an issue at all.

Wait, there’s more

Here’s a bonus script from when we needed to downgrade headings for accessibility purposes:

require 'contentful/management'

client = Contentful::Management::Client.new('CONTENTFUL_API_KEY', raise_errors: true)
environment = client.environments('CONTENTFUL_SPACE_ID').find('CONTENTFUL_ENV_ID')

# Fetch blog posts one page (100 entries) at a time
entries = client.entries('CONTENTFUL_SPACE_ID', 'CONTENTFUL_ENV_ID').all(content_type: "blogPost", limit: 100)

while entries.next_page
  entries.each do |blog|
    puts blog.title
    # Downgrade h1 to h3 and h2 to h4
    blog.markdown = blog.markdown.gsub(/(^# )/, "### ")
    blog.markdown = blog.markdown.gsub(/(^## )/, "#### ")
    blog.save
  end

  entries = entries.next_page
end

puts 'DONE 🎉'
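For anyone who would rather run this step in JavaScript, for example inside the migration shown earlier, here is a sketch of the same two substitutions as a pure function. The name downgradeHeadings is ours for illustration. The one difference to watch for is that JavaScript needs the m flag for ^ to match at the start of every line, which Ruby does by default.

```javascript
// Downgrade markdown headings by two levels: "# " -> "### ", "## " -> "#### ".
// The m flag makes ^ match at the start of every line, which is how ^
// already behaves in Ruby; the g flag replaces every occurrence.
const downgradeHeadings = (markdown) =>
  markdown
    .replace(/^# /gm, '### ')
    .replace(/^## /gm, '#### ')

const sample = '# Title\n\nIntro text\n\n## Section\n\n### Untouched'
console.log(downgradeHeadings(sample))
```

The order of the two replacements is safe because a freshly downgraded "### " line no longer matches the "## " pattern, so no heading is downgraded twice. Headings that were already at level three or deeper are left alone.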

Thanks for reading! Want to work on a mission-driven team that loves the JAM stack? We’re hiring!
