Tagged and bagged: migrating AEM pages to use new tags requires planning and care

Handle with care: a guide to safely migrating AEM pages to using new tags

Rachna Mehta
The Telegraph Engineering

--

Introduction

At The Telegraph, we use Adobe Experience Manager (AEM) as our web platform, with various websites running on it. The entire website is divided into two sections:

  1. Core web, which includes content from the wider Editorial teams (News, Business, Sport, etc.)
  2. Travel web, which includes content from the Travel Editorial team only.

The Core Web team uses a custom authoring tool that runs on top of AEM Author, providing a bespoke UI engineered for Editorial’s needs and helping to streamline the creation, updating and publication of content on the website. The authoring tool also includes a cool feature whereby new topic pages are created as soon as X number of pages are given the same tag and published. In contrast, our Travel Editorial team has historically used AEM’s Classic UI to create and tag content.

We were given a new requirement: the Travel Editorial team should also use our custom tool for their daily workflow, so they too can benefit from its streamlined processes. Doing this would also mean the automatic topic-page generation feature would be available to the Travel team at no extra development cost. To make this happen, we would first need to migrate all of the AEM Tags used by Travel from the Classic UI to the custom tool.

But how does this migration work, and what is its role in moving Travel Editorial over to our custom tool? The Telegraph’s custom authoring tool supports a few tag namespaces, but the Travel namespace is not one of them. While we could have changed the custom authoring tool to support the Travel namespace, this was not a good idea: the existing Travel tags were duplicated and not properly named, which would have made them impossible to maintain and would have required a clean-up before we started using them. What follows is a detailed explanation of how we carried out the migration.

How did we do this?

We created a working group that consisted of Engineers, Architects, Editorial, Adobe Managed Services and the Delivery team. We spent a significant amount of time generating a mapping from the old tags to the new tags; there were nearly 4,000 travel tags that needed mapping. After discussing several approaches, we divided the entire migration into the phases detailed in the table below. This set of phases came about because we were trying to automate as much of the process as possible and minimise the amount of manual work that would need to be done afterwards.

Phase 0: Backup of existing travel tags and travel content.

This phase was required to back up content so it could be recovered if we uncovered any issues during migration.

Implementation

In this phase we wrote a script to list all the tags under the /etc/tags/travel namespace and all the pages under /[Telegraph_Root]/travel, ordered by most recent modification.

Later, we packaged all of this content into a few AEM content packages using an automated script.
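
As a rough illustration of the listing step (not our exact script), something like the following could be run in the AEM Groovy Console, which provides a JCR session binding. The /content/telegraph/travel path is a placeholder for the real [Telegraph_Root]/travel root:

    import javax.jcr.query.Query

    def queryManager = session.workspace.queryManager

    // 1. Every tag under the old /etc/tags/travel namespace.
    def tagQuery = queryManager.createQuery(
            "SELECT * FROM [cq:Tag] AS tag WHERE ISDESCENDANTNODE(tag, '/etc/tags/travel')",
            Query.JCR_SQL2)
    tagQuery.execute().nodes.each { println it.path }

    // 2. Every Travel page, most recently modified first.
    def pageQuery = queryManager.createQuery(
            "SELECT * FROM [cq:PageContent] AS content " +
            "WHERE ISDESCENDANTNODE(content, '/content/telegraph/travel') " +
            "ORDER BY content.[cq:lastModified] DESC",
            Query.JCR_SQL2)
    pageQuery.execute().nodes.each { println it.parent.path }   // parent of jcr:content is the page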

Phase 1: Create new Travel tags with namespaces supported by custom authoring.

Since we had decided on the cleaner approach of creating new Travel tags under namespaces supported by our custom authoring tool, this phase created those new tags in AEM.

Implementation

We received a spreadsheet containing all the tags that needed to be created. We wrote a bash script to read this spreadsheet and save it as a CSV file, which was then uploaded to an S3 bucket so that a groovy script could read the entries one by one and create each tag in AEM Author if it didn’t already exist. We also produced an automated test script to verify that all the new tags were created properly, as manually checking hundreds of new tags was simply impossible.
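
The tag-creation part of that groovy script could be sketched roughly as below, using AEM’s TagManager API via the Groovy Console’s resourceResolver binding. The file path, CSV layout and the travel-destinations namespace are illustrative placeholders rather than our real values:

    import com.day.cq.tagging.TagManager

    def tagManager = resourceResolver.adaptTo(TagManager)

    // Each CSV line is assumed to be: tagId,title (no commas in titles)
    // e.g. travel-destinations:europe/france,France (hypothetical namespace and tag)
    new File('/tmp/new-travel-tags.csv').eachLine { line ->
        def (tagId, title) = line.tokenize(',')
        if (tagManager.resolve(tagId.trim()) == null) {                // only create missing tags
            tagManager.createTag(tagId.trim(), title.trim(), null, true)  // autoSave = true
            println "Created ${tagId}"
        } else {
            println "Skipped ${tagId} (already exists)"
        }
    }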

Phase 2: Add new Travel tags to existing Travel pages.

We decided to add the new tags to Travel pages and have them co-exist with the old tags initially, instead of replacing the old tags with the new ones in one go. This is because we have List Components on section pages that are configured with tags to drive a search in AEM; if we had simply replaced the tags, we could have ended up with empty blocks on those section pages.

Implementation

We produced a bash script to extract the mapping of old tags to new tags from a spreadsheet and upload the mapping file to an S3 bucket. We then wrote a few groovy scripts and followed the steps below (the core of the migration script is sketched after this list):

  1. The migration groovy script went through the entries in the mapping file, ran a search in AEM based on the old tag, added the new tags to all of the pages returned by the query, then repeated the process for the next entry in the file, and so on.
  2. The verification groovy script highlighted the pages that were not updated in step 1.
  3. A follow-up groovy script went through the list of URLs produced in step 2 and added the new tags to them.
  4. We ran step 2 again to make sure all the pages with old tags now had the new tags on them.
  5. A manual check: we opened a few random page URLs and confirmed that the page properties dialog showed the new tags and that the page looked OK visually.
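
As a rough sketch of step 1 (not our production script), the mapping could be applied in the AEM Groovy Console along these lines, using the TagManager API to find pages carrying an old tag. The content root, file location and CSV layout are placeholders:

    import com.day.cq.tagging.TagManager

    def tagManager = resourceResolver.adaptTo(TagManager)

    // Placeholder mapping file: each line is oldTagId,newTagId
    def oldToNew = [:]
    new File('/tmp/tag-mapping.csv').eachLine { line ->
        def (oldId, newId) = line.tokenize(',')
        oldToNew[oldId.trim()] = newId.trim()
    }

    oldToNew.each { oldId, newId ->
        // find() returns the resources (typically jcr:content nodes) tagged with the old tag.
        tagManager.find('/content/telegraph/travel', [oldId] as String[])?.each { resource ->
            def node = resource.adaptTo(javax.jcr.Node)
            def prop = node.hasProperty('cq:tags') ? node.getProperty('cq:tags') : null
            def tags = prop ? (prop.multiple ? prop.values.collect { it.string } : [prop.string]) : []
            if (!tags.contains(newId)) {
                // Keep the old tag and add the new one alongside it.
                node.setProperty('cq:tags', (tags + newId) as String[])
            }
        }
        session.save()   // save once per mapping entry
        println "Processed ${oldId} -> ${newId}"
    }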

AEM learnings from this phase

  1. AEM checks whether a tag exists before saving a session that updates the cq:tags property.
  2. AEM traverses the tags tree to work out whether a tag exists.
  3. AEM sometimes failed to save the session (possibly because the tag traversal takes time and the session times out), so we had to write a script to go through the missed pages and update their cq:tags property. We had to run multiple iterations of step 3 before the verification script passed (a sketch of this retry pattern follows below).
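
A minimal sketch of that retry pattern, assuming the list of missed page paths from the verification script has been saved to a local file, and using a placeholder tag where the real script would look up the new tag from the mapping:

    import javax.jcr.RepositoryException

    def missedPages = new File('/tmp/missed-pages.txt').readLines()
    def stillMissed = []

    missedPages.each { pagePath ->
        try {
            def content = session.getNode("${pagePath}/jcr:content")
            def prop = content.hasProperty('cq:tags') ? content.getProperty('cq:tags') : null
            def tags = prop ? (prop.multiple ? prop.values.collect { it.string } : [prop.string]) : []
            // 'travel-new:placeholder' stands in for the real new tag from the mapping.
            content.setProperty('cq:tags', (tags + 'travel-new:placeholder').unique() as String[])
            session.save()                 // save per page so one failed save doesn't lose the rest
        } catch (RepositoryException e) {
            session.refresh(false)         // discard the failed change set and carry on
            stillMissed << pagePath        // feed these back into another iteration of step 3
        }
    }
    println "Still missed ${stillMissed.size()} pages: ${stillMissed}"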

Phases 3 & 4: Migrate Destination & Onward Journey Lists in an automated way

Phases 3 and 4 were very similar, in that both involved migrating List Components by replacing their old tags with new ones. Telegraph Travel pages have a variety of List Components, and these two types should carry a tag based on their page URL.

Implementation

We produced a bash script to extract the List Component URLs from a spreadsheet, generate tag names based on each URL and store them as comma-separated entries in a CSV, which was then uploaded to an S3 bucket. A groovy script went through each entry in the file, updated the List Component with its new configuration and saved the changes. We also wrote a verification script to check that all of the List Components had the new tags saved in their configuration, and a manual check was performed to ensure the pages looked OK visually.
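
The URL-to-tag step could be sketched roughly as below. The path-to-tag convention, the 'tags' property name on the List Component and the file location are all assumptions for illustration; the real components and naming differ:

    // Placeholder CSV: one List Component path per line, each under a page's jcr:content subtree.
    def listComponentPaths = new File('/tmp/list-components.csv').readLines()

    listComponentPaths.each { listPath ->
        // Derive a tag ID from the page portion of the component path, e.g.
        // /content/telegraph/travel/destinations/europe/france/... -> travel-destinations:europe/france
        def pagePath = listPath.substring(0, listPath.indexOf('/jcr:content'))
        def relative = pagePath.replace('/content/telegraph/travel/destinations/', '')
        def newTagId = 'travel-destinations:' + relative

        def listNode = session.getNode(listPath)
        listNode.setProperty('tags', [newTagId] as String[])   // assumed property name on the List Component
        println "Updated ${listPath} -> ${newTagId}"
    }
    session.save()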

Phase 5: Migrate single-tagged List Components

Phase 5 migrated List Components configured with a single tag. Here the script needed to be told the value of the new tag rather than deriving it from the URL, which made it slightly different from phases 3 and 4.

Implementation

We wrote two bash scripts to read a spreadsheet and create two CSV files:

  1. A CSV file detailing all the List Components to be migrated.
  2. A CSV file mapping old tags to new tags.

These two CSV files were then uploaded to an S3 bucket, and a groovy script went through the list of URLs, updating each List Component’s tag configuration by looking it up in the map and saving the session. Like all the other phases, we wrote a verification script that went through all of the List Component URLs and checked that the new tags were saved as the List configuration. Again, a manual check was performed to ensure that the pages appeared correct visually.

By completing Phases 3, 4 and 5, we went from about 7,000 pages down to only 300 that needed manual attention.

Phase 6: Migrate multi-tagged List Components

Migrating multi-tagged List Components proved to be a complex process, so we decided to migrate them manually rather than automate this phase. We only wrote a bash script to extract the List URLs and store them in a CSV, which could then be given to Editorial so they could migrate the List Components by hand.

Phase 7: Clean-up script to remove old travel tags.

This phase was about removing tags with the Travel namespace (i.e. old tags) from the pages. A list of these pages had been generated as part of phase 2, when we added the new tags to them. So we wrote two groovy scripts:

  1. To remove the old tags from all pages.
  2. To verify that pages no longer had the old tags.

The last step was to manually check a sample of random pages using page properties and verify that the old tags were no longer present.
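
As an illustration only, the removal pass could look something like this in the Groovy Console, assuming the old tags all sit under a 'travel:' namespace prefix and the phase 2 page list has been downloaded locally:

    def pagePaths = new File('/tmp/migrated-pages.txt').readLines()

    pagePaths.each { pagePath ->
        def content = session.getNode("${pagePath}/jcr:content")
        if (!content.hasProperty('cq:tags')) {
            return
        }
        def prop = content.getProperty('cq:tags')
        def tags = prop.multiple ? prop.values.collect { it.string } : [prop.string]
        def kept = tags.findAll { !it.startsWith('travel:') }   // drop anything in the old namespace
        if (kept.size() != tags.size()) {
            content.setProperty('cq:tags', kept as String[])
        }
    }
    session.save()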

Phase 8: Delete travel namespace.

This phase involved removing the old namespace manually using the CQ interface and verifying that the old tags were no longer searchable in the Tag field in the UI.

Rollout

Once the implementation was designed and built, we needed to decide how to roll out the phases. We decided to migrate the content on our Staging environment first and then install the migrated content on the Production environment. This was for a couple of reasons:

  1. It required multiple scripts to migrate different types of pages and we didn’t want to block Editorial work while it took place.
  2. It involved dealing with tags on a total of about 40,000 migrated content pages.

Since the migration was only going to happen on the Author instances, we needed to activate pages so that the changes were available on the website. We also had to ensure we were only activating pages that had previously been activated. We decided to use the same migration script for each phase to generate a set of buckets, as shown below; a sketch of how a page’s activation state can be classified follows the list. Each bucket contained a list of URLs as follows:

  1. List of URLs that need to be packaged up to be installed on author.
  2. List of URLs that need to be activated after installing on author.
  3. List of URLs that were modified before migration and required manual activation.
  4. List of URLs that were deactivated or created but not activated.
  5. List of URLs that didn’t have properties to identify whether a page was already activated or not.
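
As a minimal sketch of the kind of check involved (not our exact script), a page’s activation state can be classified from the cq:lastReplicated, cq:lastReplicationAction and cq:lastModified properties on its jcr:content node; the path below is a placeholder and the bucket labels are simplified:

    // Classify a single page by its replication state; the real script looped over all migrated pages.
    def bucketFor = { contentNode ->
        if (!contentNode.hasProperty('cq:lastReplicated')) {
            return 'no replication metadata - never activated'                    // buckets 4/5
        }
        def action = contentNode.hasProperty('cq:lastReplicationAction') ?
                contentNode.getProperty('cq:lastReplicationAction').string : null
        if (action == 'Deactivate') {
            return 'deactivated'                                                  // bucket 4
        }
        def lastReplicated = contentNode.getProperty('cq:lastReplicated').date
        def lastModified = contentNode.hasProperty('cq:lastModified') ?
                contentNode.getProperty('cq:lastModified').date : null
        if (lastModified != null && lastModified.after(lastReplicated)) {
            return 'modified since last activation - needs manual activation'     // bucket 3
        }
        return 'previously activated and unmodified - safe to auto-activate'      // bucket 2
    }

    def content = session.getNode('/content/telegraph/travel/destinations/europe/jcr:content')  // placeholder
    println bucketFor(content)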

This shows a summary of the number of pages/tags in each bucket during the real run.

Dry Run & Real Run

We wanted to do a dry run before going live for a couple of reasons:

  1. To help estimate the time it would take to complete the process.
  2. To avoid any last-minute problems that could delay the process or cause issues in production.

Environments:

We needed two production-like environments, so we used our Pre-Prod and Staging environments.

We pretended Pre-Prod was Staging and that Staging was Prod. We treated Staging as production because it was closer to production in terms of configuration and its integrations with other Pre-Prod environments.

Steps:

The table below shows the steps we took. As mentioned above, for the dry run Pre-Prod took the place of “Staging”, while Staging took the place of “Prod”.

Learnings from the dry-run

  1. We learnt that the order of installation was very important. We couldn’t install content before tags: if a tag doesn’t exist, AEM removes it from the page while installing the migrated packages, so we ended up with non-migrated content installed. Tags therefore had to be installed before the migrated content.
  2. By default, AEM runs Tag Garbage Collection every night, deleting hidden and unused tags. This became an issue for us during the activation of migrated pages: we noticed a few List Components were coming out empty on the publisher because the migrated List Components had not been activated yet. We decided to change the Tag GC schedule from daily to monthly, as that covered our migration period.

The PDF on this link explains how Tag GC affects this project.

Rollback Plan (steps 8 & 9 from the table above)

We had to be ready with a rollback plan in case there was any issue during migration, so we decided to follow the steps below in the event that we needed to recover any original content. We brainstormed various practical options and produced pros and cons for each before deciding on the final strategy; they are linked in this PDF and this PDF.

Step 8: Delta content rollback

  1. Get the content URLs from the migration run on Staging, then use our custom create-package utility to create multiple packages on Prod as a delta backup.
  2. Try to recover the content using the delta backup packages. It’s important to note that only those pages would be impacted; the rest of the editorial work could still continue during this rollback.

Step 9: Entire Travel content rollback

In case we couldn’t roll back using Step 8, we decided to be ready with the entire Travel repository content. We followed much the same approach, but identified the list of URLs using the first step below.

  1. Get the latest content based on the last-modified date using the Query Packager Tool.
  2. Use our custom create-package utility to package up the content and tags in multiple packages.
  3. Perform the above steps before installing the migrated content on Prod.
  4. Install the packages created in step 2 to recover the original content. It’s important to note that only Travel Editorial would be impacted; the rest of Editorial could still continue while we performed this rollback.

Disaster Recovery Plan

We also came up with a disaster recovery (DR) plan: if recovery was not possible using the rollback, we needed this option to recover our Author instance. We decided to be ready with the following steps.

  1. Take an Author snapshot before installing the migrated content.
  2. Assume there is an issue and we have to perform DR.
  3. Use the Query Packager Tool to get the content created or modified since the last snapshot was taken (i.e. the delta content), so that nothing is lost.
  4. Get the packages.
  5. Allow a minimum downtime of one hour for Editorial while the Author instance is configured with the new snapshot.
  6. Power up the production instance to speed up disk warm-up.
  7. Install the delta content packages obtained in step 4.
  8. Run ElasticSearch reindexing. This step was required because our custom authoring tool uses ES for various features.
  9. The whole of Editorial is back. We rehearsed this before going live and calculated three hours of downtime for the entire Editorial team to be back at this stage.

It is important to note that neither the rollback nor disaster recovery would impact the public-facing website, which could still function as normal! The only impacted area is Author.

Result

The migration went really smoothly: we performed the real run on production without any issues and without causing any disruption to the editorial teams, who continued to use the Author instance in the background.

This was a huge effort by the entire Travel crew and showcased great collaboration amongst engineers, architects, Editorial, Adobe Managed Services and the Delivery team. All the hard work paid off at the end, when we migrated 40,000+ pages on production. The following are screenshots of tags in Page Properties (the phase 2 migration) before and after migration.

Before:

After:

List Migration (phases 3, 4 & 5) screenshots

Before:

After:

Rachana Mehta is a Senior Software Engineer at The Telegraph. You can follow her on Twitter at Rachna81185836.
