Let’s look at a basic example of migrating content from a Drupal site to a Wagtail site. This post explains how to prepare your Drupal data for exporting and how to write a content importer for Wagtail so moving won’t be the most stressful time of your life.
A Drupal migration is scary and usually the type of task that’s reserved for senior developers. Just tinkering with the Drupal migrate module was enough to put me off learning more about it. Migrating content from one system to another presents a lot of problems to solve, and pre-written solutions won’t catch them all. So how hard is it to migrate content into Wagtail?
Understanding what we need to do
The easiest way for me to think about this task is to break it down like so:
- Create a Drupal view to give us an idea of how much content we are dealing with, listing node types and counts.
- Create a mapping sheet to understand the differences between each content type in Drupal and Wagtail.
- Export the Drupal node data.
- Run a Python script to programmatically create content using the Drupal data export.
Steps 1 and 2 are important. You’ll find this information out in step 3 anyway, but a skilled developer once told me that at least 30% of your work should be planning. I like to do step 1 because it gives me an idea of the overall migration. Step 2 is essential. A mapping sheet with a tab for each node type gives a really good idea of targets and sources at a field level and helps you spot any tricky bits up front. Here is an example of what that might look like (what we are actually migrating in this post is a lot simpler):
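A mapping sheet can also live alongside the code. The sketch below is a hypothetical example: the Drupal field names (nid, created, path) and the Wagtail target names are assumptions chosen for illustration, not the fields of a real site.

```python
# Hypothetical field-level mapping for one node type.
# Keys are assumed Drupal source fields; values are assumed Wagtail targets.
ARTICLE_FIELD_MAP = {
    "nid": "legacy_id",
    "title": "title",
    "body": "body",
    "created": "publication_date",
    "path": "legacy_url",
}


def unmapped_fields(node, field_map):
    """Return source fields present in a node that have no target yet."""
    return sorted(set(node) - set(field_map))


def map_fields(node, field_map):
    """Rename the mapped source fields of a node to their target names."""
    return {
        target: node[source]
        for source, target in field_map.items()
        if source in node
    }
```

Running unmapped_fields over a sample of exported nodes is a cheap way to spot fields the mapping sheet has missed.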
Step 3 can be done using Drupal modules. For this example we will be using a core Drupal 8 module, RESTful Web Services. For step 4 we’ll be working with a few Django, Wagtail and Python modules. Essentially the Wagtail solution will be split into two main areas:
Importer classes
These handle formatting/converting the source data and creating the destination models.
Django management commands
Responsible for instantiating the importer classes with correctly formatted JSON data and calling the main process() method to perform the import.
Producing the Drupal source data
With steps 1 and 2 done, we should have a good idea of what we are exporting. So it’s time to dive into Drupal and set up a data export. For this example we will be migrating the ‘article’ content type. Field wise it’s pretty basic:
If you’re using Drupal 8, turn on the RESTful Web Services core module. For Drupal 7, you can use some community modules that provide JSON exports:
- The Drupal services module: https://www.drupal.org/project/services
- Drupal Views Data Export: https://www.drupal.org/project/views_data_export_json
My example source data is from a Drupal 8 site, so after enabling the above, we can create a view showing all the node data as JSON by using the ‘REST export’ display provided by the module:
- To change the label value in the JSON, change the view’s format settings.
- Depending on your core version, you might need to append a format parameter to your URL, e.g. news/export?_format=json, when viewing the actual page, not in the view path itself.
- For images and files, you’ll want to add a relationship to the view
Here is an example of a node from that view. Note that we are getting more than just the image and body fields:
There should be one source data file per content type, which will map to one importer class per content type, e.g. the NewsImporter class will import from the news.json file.
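As a minimal sketch of reading one of these files, assuming the REST export is a JSON list of node objects (the load_nodes helper is a hypothetical name used for illustration):

```python
import json
from pathlib import Path


def load_nodes(path):
    """Read a JSON export and return its list of node dictionaries."""
    with Path(path).open(encoding="utf-8") as handle:
        nodes = json.load(handle)
    # Fail early if the export is not the list of nodes we assumed.
    if not isinstance(nodes, list):
        raise ValueError(f"Expected a JSON list of nodes in {path}")
    return nodes
```

Checking the shape of the file up front means a bad export fails before any pages are created.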
The page model we are migrating to in Wagtail consists of the following:
Nothing too special here, but a few things to note:
- legacy_id and legacy_url will come in handy when checking whether the content already exists and when creating a URL slug.
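To illustrate how these two fields might be used, here is a sketch in plain Python. The helper names are hypothetical, and a set of already-imported IDs stands in for a database lookup on legacy_id.

```python
import re
from urllib.parse import urlparse


def slug_from_legacy_url(legacy_url):
    """Build a URL slug from the last path segment of a legacy URL."""
    path = urlparse(legacy_url).path.rstrip("/")
    last_segment = path.rsplit("/", 1)[-1].lower()
    # Replace anything that is not a letter, digit or hyphen with a hyphen.
    slug = re.sub(r"[^a-z0-9-]+", "-", last_segment)
    return slug.strip("-")


def already_imported(legacy_id, imported_ids):
    """Return True when a node with this legacy ID has been imported before."""
    return str(legacy_id) in imported_ids
```

Keeping the old ID on the page makes the import safe to re-run, because existing pages can be skipped or updated instead of duplicated.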
Here’s a tree view of the data_migration app we are going to use:
data_migration
├── data
│   └── news.json             # our exported JSON data
├── importers
│   ├── base.py               # Base importer class
│   ├── __init__.py
│   └── news.py               # The news importer (inherits base importer)
├── management
│   ├── commands
│   │   ├── base.py           # Base class for importing file into Wagtail
│   │   ├── import_news.py    # Specific import management for news
│   │   └── __init__.py
│   └── __init__.py
└── README.md                 # Instructions for running
How we are going to actually run the code
Imports are run via management commands that are responsible for providing the importer with the source data in JSON format and, for pages, the parent page to import under. The data file needs to be uploaded to a location that the server can access. By default the commands will look for files relative to manage.py, e.g.:
./manage.py [importer] [parent_page_id] [data_file.json]
./manage.py import_news 4 wagtailmigration/data_migration/data/news.json
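Django passes a standard argparse parser to a command’s add_arguments method, so the two positional arguments can be sketched with argparse alone. The argument names below are assumptions based on the usage line above, not the example app’s actual code.

```python
import argparse


def build_parser():
    """Create a parser for: import_news [parent_page_id] [data_file.json]."""
    parser = argparse.ArgumentParser(prog="import_news")
    parser.add_argument(
        "parent_page_id", type=int, help="ID of the page to import under"
    )
    parser.add_argument(
        "source", help="Path to the JSON data file, relative to manage.py"
    )
    return parser


# Parse the same arguments used in the example invocation.
args = build_parser().parse_args(
    ["4", "wagtailmigration/data_migration/data/news.json"]
)
print(args.parent_page_id, args.source)
```

In a real management command the same add_argument calls would sit inside add_arguments, and the parsed values would arrive in the options dictionary passed to handle.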
Here is a link to the example data_migration app on GitHub. But here are some highlights of the code:
The import_news command class extends BaseCommand, and we target it in our command above. For each additional content type we would create another command class. Along with the BaseCommand methods, here we are instantiating a new importer object, NewsImporter…
Our importer extends the BasePageImporter (from importers/base.py). Using a base importer ensures we don’t repeat anything we are likely to need in every importer, for example node_id, titles, etc. It also means that when we extend the BasePageImporter, we can build in any specific functionality needed for the import we are running. So, for example, if a blog post has an extra field we need, we can override the format_data method (like above, where we add in the body and publication_date). Have a scan of importers/base.py to get more of an idea of this.
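The pattern can be sketched without Django. In the sketch below the names BasePageImporter, NewsImporter, format_data and process come from this post, but the method bodies, the create_page hook and the source field names are assumptions. A real importer would create Wagtail pages instead of appending dictionaries to a list.

```python
class BasePageImporter:
    """Shared import logic: loop over nodes, format them, create pages."""

    def __init__(self, nodes, parent_page_id):
        self.nodes = nodes
        self.parent_page_id = parent_page_id
        self.created = []  # stand-in for pages saved to the database

    def format_data(self, node):
        # Fields every importer is likely to need (source keys are assumed).
        return {"legacy_id": node["nid"], "title": node["title"].strip()}

    def create_page(self, data):
        # Hypothetical hook: a real importer would add a child page
        # under the parent page here.
        self.created.append({"parent": self.parent_page_id, **data})

    def process(self):
        seen = set()
        for node in self.nodes:
            data = self.format_data(node)
            if data["legacy_id"] in seen:
                continue  # skip nodes that were already imported
            seen.add(data["legacy_id"])
            self.create_page(data)
        return len(self.created)


class NewsImporter(BasePageImporter):
    def format_data(self, node):
        data = super().format_data(node)
        # News-specific fields on top of the shared ones.
        data["body"] = node.get("body", "")
        data["publication_date"] = node.get("created")
        return data
```

Because process lives in the base class, a new content type only needs its own format_data override and a matching management command.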
We’ve had a look at migrating your Drupal content to a Wagtail site, using a very simple content type as an example. Many other aspects of a migration will be more complex, like page relationships, Drupal taxonomy, and exporting Drupal paragraphs into Wagtail StreamFields, but hopefully this will give you an idea of how to start the process.
Check out the full data_migration example app on GitHub.