Importing an Atom Feed with the Drupal 8 Migrate API and Paragraphs (Part 1)
I recently worked on a Drupal 8 site where we had to create blog posts (nodes) from an Atom feed that updates regularly. In earlier versions of Drupal, this would have been a job for the Feeds module, but in Drupal 8 we use the core Migrate API instead.
Here are some of the features of this project.
- Handle an Atom feed with the Migrate framework.
- Download images and create File and Media entities in Drupal.
- Skip some entries based on a custom field.
- Split the main body into separate image and text paragraphs.
The last part is the most fun, but I will postpone that to Part 2 of this post.
There are several code snippets in this post. The full code is available at https://github.com/isovera/atom_migrate.
Getting Started
We already had a Drupal 8 site set up. I added the following contrib modules:
I also added a custom module, atom_migrate, to hold the configuration and a little bit of custom code for the migration.
This is pretty standard for using the Migrate API in Drupal 8. The Migrate Plus module adds several features, including the framework for handling XML sources. The Migrate Tools module provides drush commands for managing migrations.
I use the Features module to update the site configuration after making updates to my module. I declare the module as a feature, and then run drush fim atom_migrate after making changes. Another method is to uninstall and re-install the module (ugh). I am told that you can put your YAML files under atom_migrate/migrations/ instead of atom_migrate/config/install/, and then clear caches (the plugin cache, to be specific), but I have not tested this. I have also seen the Configuration development module recommended, but I have not tested that, either.
Finally, this migration is supposed to run periodically, so we need a way to trigger that. One way would be to set up a cron job on the server that invokes
drush migrate-import --group=atom --update
On this project, we decided it would be more portable to avoid server dependencies, so I implemented hook_cron:
File atom_migrate/atom_migrate.module (excerpt)
use Drupal\migrate\MigrateExecutable;
use Drupal\migrate\MigrateMessage;

/**
 * Implements hook_cron().
 *
 * Run the migrations.
 */
function atom_migrate_cron() {
  $manager = \Drupal::service('plugin.manager.migration');
  $migration_ids = ['blog_image', 'blog_featured_image', 'blog_node'];
  foreach ($migration_ids as $migration_id) {
    $migration = $manager->createInstance($migration_id);
    $executable = new MigrateExecutable($migration, new MigrateMessage());
    $executable->import();
  }
}
Room for improvement
- I do not think this code will update existing articles, as drush migrate-import --update would.
- I should get the migration IDs programmatically instead of hard-coding the list. As it is, if I add a new migration, then I will have to update this code.
Maybe Migrate Tools should add API functions so that it is easier to get the same effect as the drush commands from custom code.
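One possible way to address the first point is sketched below. It is an untested variant of the hook above, not code from the project. It assumes that calling prepareUpdate() on the migration's ID map before importing has the same effect as the --update option, since that method flags previously imported rows for re-import. The list of migration IDs is still hard-coded.

```php
<?php

use Drupal\migrate\MigrateExecutable;
use Drupal\migrate\MigrateMessage;

/**
 * Implements hook_cron().
 *
 * Sketch: run the migrations and re-import rows that were imported before.
 */
function atom_migrate_cron() {
  $manager = \Drupal::service('plugin.manager.migration');
  // Same hard-coded list as in the original hook.
  $migration_ids = ['blog_image', 'blog_featured_image', 'blog_node'];
  foreach ($migration_ids as $migration_id) {
    $migration = $manager->createInstance($migration_id);
    if (!$migration) {
      // The migration plugin was not found, so there is nothing to run.
      continue;
    }
    // Assumption: this mirrors "drush migrate-import --update" by marking
    // every row in the ID map as needing an update.
    $migration->getIdMap()->prepareUpdate();
    $executable = new MigrateExecutable($migration, new MigrateMessage());
    $executable->import();
  }
}
```

This needs a bootstrapped Drupal site with the Migrate module enabled, so it cannot be run on its own. Re-importing every row on each cron run may be expensive for a large feed.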
Configuring for an Atom feed
The Migrate Plus module provides plugins for downloading a file from an external URL (or from a local file, useful for testing purposes) and for parsing XML, so this is mostly just a question of configuration. I have the following configuration in migrate_plus.migration_group.atom.yml:
File atom_migrate/config/install/migrate_plus.migration_group.atom.yml (excerpt)
shared_configuration:
  source:
    plugin: url
    data_fetcher_plugin: http
    data_parser_plugin: xml
    namespaces:
      atom: 'http://www.w3.org/2005/Atom'
    urls: 'https://api.example.com/v2/feed?format=atom'
    item_selector: '/feed/entry'
    ids:
      guid:
        type: string
The one difficulty I had is that the XML I got had some elements qualified by namespace prefixes and some (including everything from the Atom spec) without. Thanks to @robcast on the Drupal #migration Slack channel for pointing out the namespaces option, which has the effect of doing
$xpath->registerNamespace('atom', "http://www.w3.org/2005/Atom");
in PHP. This lets me use selectors like atom:id to target XML tags like <id> (no namespace prefix). Curiously, this option does not seem to apply to the item_selector key.
File atom_migrate/config/install/migrate_plus.migration.blog_node.yml (excerpt)
source:
  fields:
    -
      name: guid
      label: Guid
      selector: 'atom:id'
(The selector is relative to the item_selector used above.)
Before using the namespaces key, I found a work-around with a little help from Google and Stack Overflow:
selector: '*[local-name()="author"]/*[local-name()="name"]'
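To see both approaches side by side, here is a minimal standalone sketch in plain PHP, outside Drupal. The sample feed and the helper name atom_demo_query() are made up for illustration. It assumes, as is usual for Atom, that the feed declares the Atom namespace as its default namespace, so the tags carry no prefix.

```php
<?php

// Hypothetical sample feed, not the real data from the project.
$xml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <id>tag:example.com,2017:post-1</id>
    <author><name>Jane Doe</name></author>
  </entry>
</feed>
XML;

/**
 * Runs an XPath query and returns the text of each matching node.
 *
 * When $register is TRUE, the "atom" prefix is bound to the Atom namespace,
 * as the namespaces option is described as doing.
 */
function atom_demo_query(string $xml, string $selector, bool $register): array {
  $doc = new DOMDocument();
  $doc->loadXML($xml);
  $xpath = new DOMXPath($doc);
  if ($register) {
    $xpath->registerNamespace('atom', 'http://www.w3.org/2005/Atom');
  }
  $values = [];
  foreach ($xpath->query($selector) as $node) {
    $values[] = $node->nodeValue;
  }
  return $values;
}

// An unprefixed path only matches elements that are in no namespace at all,
// so it finds nothing in this feed.
$plain = atom_demo_query($xml, '/feed/entry/id', FALSE);

// With the prefix registered, the same elements can be selected.
$prefixed = atom_demo_query($xml, '/atom:feed/atom:entry/atom:id', TRUE);

// The local-name() work-around ignores namespaces entirely.
$local = atom_demo_query($xml, '//*[local-name()="author"]/*[local-name()="name"]', FALSE);
```

The local-name() form works without any configuration, but the prefixed selectors are easier to read once the namespace is registered.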
Skip some entries based on a custom field
In order to use the Migrate API effectively, it helps to get some practice with chaining multiple process plugins. The building blocks are there, and a few examples go a long way in learning how to use them.
For example, the feed I was using had some articles that I wanted to import and also some items that I wanted to ignore. In the source section of my migration, I defined the content_format key and the XPath that selects it. Then I chained the static_map and skip_on_empty plugins in the process section of the migration:
File atom_migrate/config/install/migrate_plus.migration.blog_node.yml (excerpt)
process:
  blog_type:
    -
      plugin: static_map
      source: content_format
      map:
        Blog: Blog
    -
      plugin: skip_on_empty
      method: row
The static_map plugin converts “Blog” to “Blog” (not much of a conversion) and anything else to NULL. In the latter case, the skip_on_empty plugin cancels processing of the current item.
There is not actually a field called blog_type. The Migrate API lets you make up a destination field like this and ignore it, or use it as an intermediate result in other fields. There is an example of this in the next section.
Creating File and Media entities
There are two ways to manage complex migrations. The first is to have a separate migration for each step of the process, and that is how I managed the “featured image” for my migration. (This is not part of the Atom specification. It is a custom field on the feed I was using.)
The second method is to create the intermediate entities in the “process” phase of the migration. My next blog post will give an example of this method.
Here is an example of using separate migrations. The first migration uses the download process plugin to fetch images referenced in the feed and create file entities of type image:
File atom_migrate/config/install/migrate_plus.migration.blog_image.yml (excerpt)
source:
  fields:
    -
      name: url
      label: 'Image URL'
      selector: 'media:content/@url'
  constants:
    image_base_dir: 'public:/'
    image_name: 'post.jpg'
    date_format: Y-m
process:
  settings:
    plugin: skip_row_if_not_set
    source: url
  temp_date:
    plugin: callback
    callable: date
    source: constants/date_format
  temp_image_uri:
    plugin: concat
    source:
      - constants/image_base_dir
      - '@temp_date'
      - constants/image_name
    delimiter: /
  uri:
    plugin: download
    source:
      - url
      - '@temp_image_uri'
    rename: true
  type:
    plugin: default_value
    default_value: image
destination:
  plugin: 'entity:file'
There are a few things going on in the snippet above.
The settings key is a fake destination. It is just there so that the migration will quit early if the url for this row is empty.
First look at how I create the destination file name.
- I have a fake source key called constants. (You can call it whatever you want, but constants is the convention.) It has several sub-keys, which are referenced as constants/image_base_dir, constants/image_name, and so on.
- I set one of my fake destination keys, temp_date, using the callback plugin: this has the effect of setting this intermediate result to date('Y-m'), or something like 2017-12.
- I set the next fake destination key, temp_image_uri, using the concat plugin to paste together 'public:/', the date string I just created, and 'post.jpg', using / as glue. That gives something like public://2017-12/post.jpg.
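The steps in the list above amount to a call to date() followed by a join. Here is a rough plain-PHP equivalent, as a sketch only: the function name atom_demo_image_uri() and its optional timestamp parameter are made up for illustration and are not part of the migration.

```php
<?php

/**
 * Builds a destination URI the way the callback and concat steps do.
 *
 * The $timestamp parameter is hypothetical. It is only here so that the
 * result can be reproduced for a fixed date.
 */
function atom_demo_image_uri(string $base_dir, string $date_format, string $image_name, ?int $timestamp = NULL): string {
  // Equivalent of the callback plugin calling date() on the format constant.
  $date = date($date_format, $timestamp ?? time());
  // Equivalent of the concat plugin with "/" as the delimiter.
  return implode('/', [$base_dir, $date, $image_name]);
}

// With a date in December 2017, this gives public://2017-12/post.jpg.
$uri = atom_demo_image_uri('public:/', 'Y-m', 'post.jpg', mktime(12, 0, 0, 12, 15, 2017));
```

Note that the base directory constant ends in a single slash, 'public:/', because the delimiter supplies the second one.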
Next I use the download plugin. The first argument is the URL of the image file, which I have defined in the source section of the migration. The second argument is the destination file name. Since I supply the optional rename: true key, Drupal will add _0, _1, and so on in order to create distinct file names.
The download plugin returns the URI of the created file, something like public://2017-12/post_17.jpg. I assign this to the uri property of the File entity that this migration creates.
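The renaming behavior described above can be pictured with a small plain-PHP sketch. This is not Drupal's own code: atom_demo_unique_name() is a made-up helper that checks against an array of existing names instead of the file system, and it only illustrates the _0, _1 numbering.

```php
<?php

/**
 * Returns $basename, or a numbered variant if that name is already taken.
 *
 * Hypothetical helper for illustration. Drupal does this check against the
 * real destination directory.
 */
function atom_demo_unique_name(string $basename, array $existing): string {
  if (!in_array($basename, $existing, TRUE)) {
    return $basename;
  }
  $name = pathinfo($basename, PATHINFO_FILENAME);
  $extension = pathinfo($basename, PATHINFO_EXTENSION);
  $suffix = $extension === '' ? '' : '.' . $extension;
  $counter = 0;
  // Try post_0.jpg, post_1.jpg, ... until a free name is found.
  do {
    $candidate = $name . '_' . $counter++ . $suffix;
  } while (in_array($candidate, $existing, TRUE));
  return $candidate;
}

$first = atom_demo_unique_name('post.jpg', []);
$second = atom_demo_unique_name('post.jpg', ['post.jpg']);
$third = atom_demo_unique_name('post.jpg', ['post.jpg', 'post_0.jpg']);
```

Because every image in a given month shares the name post.jpg, the numeric suffix is what keeps the files distinct.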
Now that we have the migration creating File entities, we could attach those files to a content type with a file-reference field, using the migration_lookup plugin. In fact, we did something a little different on this project.
We decided to use the core Media module. This is a little aggressive with Drupal 8.4, but the Media module is developing quickly. It looks as though everyone will be using it once Drupal 8.5 is released, and we hope to be one step ahead of the crowd.
Since we are using Media entities, we have another migration that deals with them. Here is the interesting part of this migration:
File atom_migrate/config/install/migrate_plus.migration.blog_featured_image.yml (excerpt)
process:
  bundle:
    plugin: default_value
    default_value: image
  field_media_image/target_id:
    plugin: migration_lookup
    migration: blog_image
    no_stub: true
    source: guid
  field_media_image/alt: description
destination:
  plugin: 'entity:media'
migration_dependencies:
  required:
    - blog_image
This migration creates Media entities.
The Migrate API keeps track of the association between guid (the source key) and file ID in the blog_image migration (the one before this). The migration_lookup plugin translates the guid to the file ID, and that gets assigned to the target_id property of field_media_image on the Media entity.
The alt property is populated by description: one of the source fields that I left out because there is nothing new about it.
The migration_dependencies key tells the Migrate API that this migration should not be run until the blog_image migration is complete. We can override that by adding the --force option to drush migrate-import.
The migration that creates nodes uses these Media entities, translating the feed’s guid into a Media entity ID using the migration_lookup plugin. There is nothing new in this step, but you can see the full code in the GitHub repository.
References
In the spirit of Open Source, I have borrowed heavily from blog posts, Slack messages, documentation, and other forms of support. Here are links to some of the sources I found helpful.
- Campbell Vertesi’s blog post Stop Waiting for Feeds Module: How to Import RSS in Drupal 8
- Mike Ryan’s answer to How to start a migration programmatically on the Drupal (core) issues queue
- The Migrate API handbook, especially the section on Migrate process plugins
- The #migration channel on Drupal Slack
I mentioned it at the top but here again is the link to the full code for this post: https://github.com/isovera/atom_migrate.
Here are all the contrib modules mentioned above:
This article was originally published by Benji Fisher on our Isovera blog.