Migration approaches for AEM Guides & Best Practices
Introduction
I recently encountered a use case involving the migration of Salesforce content to AEM Guides for one of my company’s clients. Extensive effort was invested in research before the actual migration took place, and our team gained valuable insights throughout the process. As a result, I decided to share our learnings and experiences, hoping that they might assist others who are planning to migrate to AEM Guides.
This article explains why you might consider AEM Guides and outlines the phases involved in migrating to it from other technologies.
After reading, you will:
- Understand the benefits of transitioning to AEM Guides.
- Familiarize yourself with the steps necessary for a successful migration to AEM Guides.
- Learn about best practices to follow during the migration process to AEM Guides.
Understanding AEM Guides
AEM Guides (also known as XML Documentation for Adobe Experience Manager) is a powerful content management solution (CMS). It is a plugin within AEM that enables native DITA support (content creation and delivery) in Adobe Experience Manager.
Benefits of moving to AEM Guides
The core functions of a CMS include content creation, collaboration, review, translation, search, and report generation for DITA content. AEM Guides empowers authors to perform all these essential CMS tasks using its web editor. Additionally, it supports efficient content reuse, translation, and robust DITA-based author review and publishing workflows.
Reviewed DITA files can be sent for output generation in multiple formats, such as Experience Manager Sites, PDF, HTML5, EPUB, and custom outputs through the DITA-OT plugin. The tool seamlessly integrates with Adobe FrameMaker and other desktop DITA editing tools, providing complete control over content integrity while collaborating across diverse user groups.
Understanding the transition to AEM Guides
The migration journey comprises several phases. Below, there is a pictorial representation of an end-to-end migration process:
The phases are described below.
Readiness/Conceive Phase:
During this phase, we develop a deep understanding of the existing system and delve into the relevant features provided by AEM Guides. This process includes conducting a thorough system study to comprehend the migration approach, pinpointing tools that align best with the requirements, assembling the right team, prioritizing tasks, performing a series of Proof of Concepts (POCs) to validate the chosen approach, and ensuring seamless access to customer environments.
The following subtasks are essential for a successful migration to AEM Guides:
1. Requirement elicitation: Here, we gain insight into the end-to-end source system. In our scenario, the website content was hosted on Salesforce. Content authoring took place within Salesforce, and the content was accessible through an API service. The existing AEM site instance made API calls to Salesforce, retrieving the content which was then rendered on the AEM pages. These AEM pages were generated for multiple regions and languages.
2. Choosing a scripting tool/language: The main factors in choosing a scripting language were robustness, the availability of ready-made libraries, and the team members' familiarity with the language. We opted for Python.
Why Python?
We chose Python for several reasons. It is an open-source, versatile, and efficient object-oriented programming language, and one of its key strengths is its large and supportive community. During the Conceive phase, we analyzed our requirements and found that Python met our needs well. Its standard library covers essential tasks such as API invocation, file reading, and CSV processing, which makes scripts easy to write and run. Python also has strong data processing libraries, which added to its appeal for our use case.
Why not Java?
In Java, data manipulation often requires third-party JARs like Jsoup, whereas Python offers readily available modules that are easy to invoke. Additionally, Java is typically employed for web applications and advanced coding tasks. In contrast, Python excels at scripting, which makes it a preferred choice for many leading organizations for script-based tasks. For this kind of work, Python’s ease of use and versatility let a team become productive faster than with Java. Even individuals with limited programming knowledge can quickly learn Python and start implementing solutions effectively.
Next step: The POCs
The next step involves conducting a series of proof of concepts (POCs). During this stage, specific hypotheses and strategies are tested to validate their feasibility and effectiveness. POCs provide a hands-on approach to assessing the viability of the planned migration, helping to identify potential challenges and refine the migration strategy before proceeding to the actual implementation phase.
i. Article fetch: There were multiple options for retrieving the content from Salesforce. The first was to invoke the API after generating an authentication token. The second was to export the articles directly from Salesforce as a CSV file, with the related HTML snippets and images in separate folders. We ran POCs on both approaches and concluded that invoking the API is time-consuming and, when a large number of articles is involved, error-prone if there are network issues. Exporting the data as CSV, on the other hand, was straightforward and quicker.
ii. Content cleanup: Using a Python script, we tried out various types of content clean-up, including HTML sanitization. This is essential because DITA enforces strict validation.
iii. H2D plugin: The H2D plugin enables HTML to DITA conversion. We ran our POC for the H2D conversion and identified the XSLT rules to be updated for better-formatted DITA output.
iv. Define a metadata schema: After reviewing the input CSV and engaging in further discussions, we finalized the schema for the AEM DITA files, which subsequently integrates into the AEM site page. A custom schema was developed to ensure compatibility and efficient data processing.
v. Asset upload: Our scripts successfully retrieved Salesforce articles and generated corresponding DITA files. Another script generated a CSV file for bulk asset upload in AEM, complete with metadata. This metadata was selected from the source CSV and mapped carefully to the corresponding AEM metadata schema.
3. Developer System Setup: On the local AEM instance, we installed the UUID version of AEM Guides (XML Documentation plugin) and confirmed that DITA authoring and site generation worked with this setup.
4. Access & Environment Provisioning: We ensured that all relevant access was obtained from the customer and that the DITA-specific configurations were deployed before starting the implementation.
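The metadata mapping described in the asset upload POC can be sketched in Python as follows. This is a minimal illustration, not the script used in the project: the source column names (`ArticleNumber`, `Title`, `Summary`, `Language`), the target property names, the `assetPath` column, and the folder path are all hypothetical placeholders, and the exact column layout expected by the bulk importer in your AEM setup should be checked before reusing it.

```python
import csv
import io

# Hypothetical mapping from source CSV columns to AEM metadata properties.
# Both sides are placeholders; replace them with your agreed schema.
FIELD_MAP = {
    "Title": "dc:title",
    "Summary": "dc:description",
    "Language": "dc:language",
}


def build_bulk_upload_rows(source_rows, dita_folder):
    """Map each source article row to one row of the bulk upload CSV."""
    rows = []
    for row in source_rows:
        out = {"assetPath": f"{dita_folder}/{row['ArticleNumber']}.dita"}
        for source_col, aem_prop in FIELD_MAP.items():
            # Missing or empty source values become empty strings.
            out[aem_prop] = (row.get(source_col) or "").strip()
        rows.append(out)
    return rows


def write_csv(rows, handle):
    """Write the mapped rows, preceded by a header row."""
    fieldnames = ["assetPath", *FIELD_MAP.values()]
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Tiny in-memory stand-in for the Salesforce CSV export.
source = io.StringIO(
    "ArticleNumber,Title,Summary,Language\n"
    "000001,Reset a password, How to reset ,en-US\n"
)
mapped = build_bulk_upload_rows(csv.DictReader(source), "/content/dam/docs/en-us")
buffer = io.StringIO()
write_csv(mapped, buffer)
print(buffer.getvalue())
```

Keeping the mapping in a single dictionary makes it easy to review against the metadata schema with the customer before any files are generated.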
Implementation Phase:
In this phase, we use the selected tools to convert the content to DITA. Once the DITA files are ready, we move them to AEM Guides and validate that the generated DITA conforms to the allowed syntax.
We executed the following predefined steps during implementation.
- Define the DITA folder structure: Plan the folder structure with the future growth of DITA files in mind. This helps keep the customer's environment scalable and performant as the content grows.
- Define the output site structure with multilingual pages in mind: This organizes the site for a multi-lingual, multi-region approach using ISO country codes, which is essential when planning for DITA translation.
- Script to generate DITA (read input, clean up, convert HTML to DITA, post-conversion clean-up if any)
Scripts play a pivotal role in any migration process. When transitioning regular HTML content to DITA format, a specific set of steps must be meticulously followed to ensure a smooth and accurate migration. These steps often involve careful planning, data extraction, content transformation, validation, and sometimes manual intervention to handle unique or complex elements. Scripts automate and streamline these processes, making them essential tools in the migration toolkit.
The following steps were executed in this stage.
  - Read the input CSV and generate HTML.
  - Tidy up the HTML.
  - Apply XSLT rules for any additional clean-up, if required.
  - Convert HTML to DITA using the H2D plugin.
  - Post-process clean-up: if the H2D plugin produces unnecessary formatting, clean it up so that the output conforms to DITA syntax.
  - Generate tags: the script consumes the input CSV and generates a CSV that can be uploaded through the ACS Commons Tag Maker tool.
  - Generate the DITA folder structure based on the customer's requirements while meeting performance standards, and place the DITA files with their corresponding images in it.
  - Generate the bulk upload CSV with the DITA file details and their relevant metadata.
- Upload the tags using the Tag Maker tool.
- Use the CSV file to import the DITA files and their metadata with the bulk asset importer.
- Create custom components and templates based on the needs
- Content exporting framework: In our case, the customer wanted to export the content for use in SPA applications, so we created custom exporters to support this.
- Create a sitemap for the generated site pages.
- Define the workflows and add users to the AEM Guides-specific user groups (Author, Reviewer, Publisher) so that the DITA approval workflow can be triggered.
- Configure the translation service for DITA translation based on the needs.
- Create element mapping based on needs.
- Create folder profiles
- Create Custom Presets
- Create Map collections & include the DITA topics for bulk site generation.
- Trigger the DITA Map for site generation using the custom presets.
- Test the generated DITA & Site pages.
- Manual fixes if required.
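The folder-structure step in the list above can be sketched in Python. This is only an illustration under stated assumptions: the `Locale` and `ArticleNumber` columns, the `batch-NNN` naming, the root path, and the per-folder limit are hypothetical and not taken from the project. The idea is to split each locale's files into fixed-size batches so that no single folder grows without bound.

```python
import csv
import io
from pathlib import PurePosixPath


def plan_dita_paths(rows, root, max_per_folder=1000):
    """Map ArticleNumber -> root/<locale>/<batch>/<ArticleNumber>.dita.

    max_per_folder is an assumed cap; tune it for your repository.
    """
    seen_per_locale = {}  # how many articles each locale has received so far
    plan = {}
    for row in rows:
        locale = row["Locale"].strip().lower()
        index = seen_per_locale.get(locale, 0)
        seen_per_locale[locale] = index + 1
        # Start a new batch folder every max_per_folder articles.
        batch = f"batch-{index // max_per_folder + 1:03d}"
        target = PurePosixPath(root) / locale / batch / f"{row['ArticleNumber']}.dita"
        plan[row["ArticleNumber"]] = str(target)
    return plan


# Tiny in-memory stand-in for the input CSV (hypothetical columns).
export = io.StringIO(
    "ArticleNumber,Locale\n"
    "A1,en-US\n"
    "A2,en-US\n"
    "A3,en-US\n"
    "B1,fr-FR\n"
)
plan = plan_dita_paths(csv.DictReader(export), "/content/dam/docs", max_per_folder=2)
for article, path in plan.items():
    print(article, "->", path)
```

Computing the plan before writing any files means the same mapping can feed both the file placement step and the bulk upload CSV.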
End-to-end testing is crucial for ensuring the success of the migration process. It involves rigorous evaluation of (i) DITA files, (ii) DITA Metadata, and (iii) Generated Site pages. This comprehensive testing approach helps identify and rectify issues at various stages of the migration.
In certain situations, if the source HTML files contain errors or inconsistencies, the scripts may struggle to generate complete and accurate DITA syntax. In such cases, manual intervention becomes necessary to correct the DITA files through re-authoring. This hands-on approach ensures that the migrated content meets the required standards and maintains its integrity throughout the migration process.
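Before uploading, a quick well-formedness pass can flag the files that are likely to need manual attention. The sketch below uses Python's standard `xml.etree.ElementTree` parser and in-memory sample topics that are made up for illustration. It only checks that each file is well-formed XML; it does not validate against the DITA DTDs, so a file that passes here can still be rejected by AEM Guides.

```python
import xml.etree.ElementTree as ET


def find_malformed(dita_by_name):
    """Return {name: parser error} for every document that fails to parse."""
    problems = {}
    for name, text in dita_by_name.items():
        try:
            ET.fromstring(text)
        except ET.ParseError as exc:
            problems[name] = str(exc)
    return problems


# Made-up sample topics: the second one has an unclosed <p> element.
samples = {
    "good.dita": '<topic id="t1"><title>Reset a password</title>'
                 "<body><p>Steps</p></body></topic>",
    "bad.dita": '<topic id="t2"><title>Broken</title>'
                "<body><p>Unclosed</body></topic>",
}
report = find_malformed(samples)
for name, error in report.items():
    print(f"{name}: {error}")
```

In a real run, the dictionary would be filled by reading the generated `.dita` files from disk, and the report would go to whoever handles re-authoring.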
Go Live Phase:
Once all components are in place, planning for the Go Live phase is essential. To ensure a seamless transition, a well-prepared cutover plan is crucial. This plan should be developed in parallel with User Acceptance Testing (UAT) Sign Off. Key elements of the cutover plan include:
Preparation of Deployment Runbook: Documenting step-by-step procedures for deployment and rollback, ensuring clarity during the Go Live process.
Checking CI/CD Pipeline Readiness: Verifying the Continuous Integration and Continuous Deployment pipelines to ensure they are ready for production deployment.
Content Package Readiness: Ensuring that all content packages are finalized and validated for production deployment.
Workflow & User Group Setups: Validating workflow configurations and user group setups to guarantee smooth content approval processes.
API and Third-Party Endpoint Validations: Verifying the functionality of APIs and third-party integrations to confirm seamless data exchange.
PROD AEM Instance Readiness: Checking the production AEM instance to ensure it is configured correctly and optimized for performance.
Dispatcher Readiness: Verifying the configuration of the dispatcher for efficient content delivery and caching.
Code & Content Freeze: Implementing a freeze on code changes and content updates to maintain stability during the Go Live phase.
During the production Go Live, special attention is given to validating all DITA-specific items outlined in the deployment runbook. Each step is carefully executed to guarantee the successful migration of DITA content and to minimize any potential disruptions.
Below are the major items covered in the checklist.
- AEM Guides plugin installation on the PROD instance
- DAM DITA package deployment
- Tag package deployment
- Schema, presets, element mapping, and custom DITA template packages
- Confirmation of DITA-specific OSGi changes
On the day of the production Go Live, the steps outlined in the runbook were meticulously executed while closely evaluating the prepared checklist. By adhering to the runbook and continuously referencing the checklist, the team successfully navigated through the Go Live process, contributing to a smooth and efficient deployment.
Post Go Live Phase:
The post-Go Live phase focuses on monitoring for any issues and implementing performance improvements. During this period, DITA authors begin authoring content in the migrated environment. Simultaneously, the developer team, with support from the infrastructure team, diligently monitors the production (PROD) instance. Proactive support is provided wherever necessary, ensuring that any potential issues are promptly addressed and performance is optimized to guarantee a seamless user experience. This vigilant monitoring and support approach continues to uphold the stability and efficiency of the migrated environment.
What challenges might you face, and how do you troubleshoot them?
During the end-to-end migration, we faced a few issues that are common across content migration projects.
Sanitizing the source: This is one of the major tasks, and it is time-consuming.
H2D plugin output corrections: In some cases, the generated DITA produced syntax errors in AEM Guides, so a post-processing script was required for additional clean-up.
DITA syntax issues in final output: Use a query browser to find all nodes with ‘fmerror’ in the generated site pages.
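To give a feel for the source sanitization challenge, here is a minimal Python sketch built on the standard `html.parser` module. It is a stand-in for illustration, not the clean-up tooling used in the project, and the rules it applies are assumptions: it drops `script` and `style` elements with their content, strips `style`, `class`, and `on*` attributes, writes void elements in self-closing form, and re-escapes text so that the output is closer to what an XML-based conversion expects.

```python
from html import escape
from html.parser import HTMLParser

DROP_WITH_CONTENT = {"script", "style"}  # removed together with their text
DROP_ATTRS = {"style", "class"}          # presentational attributes
VOID_TAGS = {"br", "hr", "img"}          # written in self-closing form


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS or name.startswith("on"):
                continue
            kept.append(' {}="{}"'.format(name, escape(value or "")))
        closer = "/>" if tag in VOID_TAGS else ">"
        self.out.append("<{}{}{}".format(tag, "".join(kept), closer))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in VOID_TAGS:
            return  # void tags were already closed in handle_starttag
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            # Entities were decoded by the parser; re-escape for XML.
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = Sanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)


dirty = ('<p class="x" style="color:red" onclick="go()">'
         "Tom &amp; Jerry<br><script>alert(1)</script></p>")
clean = sanitize(dirty)
print(clean)
```

Real source content usually needs more rules than this (unbalanced tags, nested tables, inline styles that carry meaning), which is why the sanitization effort tends to dominate the schedule.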
Best Practices
- Always do a smaller POC to confirm the script chosen for migration covers varying scenarios.
- Do a thorough study of tools involved.
- Test a smaller piece of migration before we go end to end.
- Plan for delta migration in advance.
- Ensure the HTML is in good shape before conversion. If possible, get the curated content offline so that the script can run without any other dependency.
- With a large number of pages, finding DITA syntax errors is difficult. During the migration phase, run an ‘fmerror’* query in the AEM query browser to find the DITA files with syntax errors and fix them. During the run phase, use the ACS Commons report generation plugin and configure the ‘fmerror’ query so that a report is generated at regular intervals.
- Follow Adobe's recommendations on indexing optimization, content organization, and workflow offloading.
- Check for broken links in the DITA Map console reports after conversion.
- Plan the dispatcher changes for AEM Guides in parallel with development.
*Note: ‘fmerror’ is a new node created on a DITA asset when there is an error in the DITA syntax.
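As an illustration of the ‘fmerror’ check, the following Python sketch builds a request URL for AEM's QueryBuilder JSON servlet using the `path`, `nodename`, and `p.limit` predicates. It rests on assumptions that are not confirmed by this article: that a node-name search matches how ‘fmerror’ appears in your repository, and that the host and search root shown (both placeholders) fit your environment. The sketch only constructs the URL; sending it requires an authenticated session.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def fmerror_query_url(host, search_root, limit=-1):
    """Build a QueryBuilder JSON URL that looks for nodes named 'fmerror'."""
    params = {
        "path": search_root,     # subtree to search (placeholder value below)
        "nodename": "fmerror",   # node name taken from the note above
        "p.limit": str(limit),   # -1 asks QueryBuilder for all hits
    }
    return "{}/bin/querybuilder.json?{}".format(host.rstrip("/"), urlencode(params))


# Placeholder host and DAM folder; replace with your own.
url = fmerror_query_url("http://localhost:4502", "/content/dam/docs")
print(url)

# Decode the query string again to show which predicates were sent.
print(parse_qs(urlsplit(url).query))
```

The same predicates can be pasted into the QueryBuilder debugger or adapted for a scheduled report, so the migration-time and run-phase checks stay aligned.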
Summary
The end-to-end project took approximately 12 weeks of effort to migrate between 4,000 and 6,000 articles. Our observation indicates that once script coverage is completed for all types of articles, there is no major change in the duration of execution, even if the article count increases significantly. However, certain factors such as output generation time for those articles and the overall asset size (including DITA files and images) might impact the bulk upload process and other aspects of the migration. These variables are essential considerations during the migration process.