AEM Assets Cold Archival and Storage (Part 3/3)

Dipen Sagar
Jun 2, 2023


Co-Author: Deepali Rathi

The conclusion

We have been on a long journey, and we are as happy as you are as we inch towards our goal. We appreciate you being with us along the way.

This is the last part of our 3-part series on AEM Assets Cold Archival and Storage. If you have not had a chance yet, please go through Part 1 and Part 2 before proceeding. Part 1 helped us realize the need for an asset archival capability in a modern Digital Asset Management platform, and Part 2 demonstrated how such a feature can be implemented in Adobe Experience Manager Assets using AWS S3 as the archival storage.

Welcome to Part 3!

The solution described in this part builds on what was implemented in AEM earlier and tries to address many of the problems discussed in the previous parts. The major change to the AEM implementation is the porting of the asset archival/restoration logic to external Adobe IO Runtime functions. This moves virtually all of the archival processing out of AEM Assets, so that AEM remains available for regular digital asset management. We will also look at Adobe IO Events, which act as a powerful way to bridge AEM Assets DAM with external Adobe IO Runtime functions.

The topics we’ll elaborate on today are as follows:

1. Adobe IO Events and IO Runtime

2. AEM Integration with Adobe IO

3. Special Scenarios — extensions

1. Adobe IO

1.1 Overview

Adobe IO is a developer platform that offers a cohesive set of tools and services that enable developers to extend and integrate Adobe solutions all in one package.

1.1.1 Adobe IO Events

Adobe I/O Events enables building reactive, event-driven applications, based on events originating from various Adobe services, such as Creative Cloud, Adobe Experience Manager, and Analytics Triggers.

In this implementation, we registered Adobe Experience Manager as an event provider that triggers an event whenever a certain real-world action occurs, such as a DAM user opting to manually trigger archiving of an asset. These events can then be delivered to any application of our choice for further event handling.

Adobe IO Runtime functions, discussed later, became our de facto application for handling the archive/unarchive events triggered by AEM Assets.

1.1.2 Adobe IO Runtime

Adobe I/O Runtime is Adobe’s serverless platform which can be used by developers to respond to events and execute functions in the cloud — no server required.

To handle the archive and unarchive asset events triggered by AEM Assets, we deployed a NodeJS-based function that runs on Adobe I/O Runtime. This function streams the asset binaries directly between AEM and our external cloud storage, AWS S3.

These I/O Runtime functions automatically scale based on requests, keeping performance high and costs low. Automatic scaling helps asset archiving and restoration at scale.
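The streaming idea can be sketched with Node's built-in stream utilities. This is a minimal sketch under stated assumptions, not our production action: `copyStream`, `demoSource` and `demoSink` are hypothetical names, and the in-memory source and sink stand in for the AEM rendition download and the S3 upload respectively. In a real action the sink would be whatever writable stream your S3 upload code exposes; that wiring is assumed and not shown.

```javascript
const { Readable, Writable, Transform } = require('node:stream');
const { pipeline } = require('node:stream/promises');

// Hypothetical helper: pipes `source` into `sink` chunk by chunk and
// resolves with the number of bytes transferred. Because data flows through
// in chunks, the whole asset never has to fit in the action's memory.
async function copyStream(source, sink) {
  let bytes = 0;
  const counter = new Transform({
    transform(chunk, _encoding, callback) {
      bytes += chunk.length;
      callback(null, chunk);
    },
  });
  // pipeline() propagates errors and backpressure across all three streams.
  await pipeline(source, counter, sink);
  return bytes;
}

// Stand-ins for illustration only (assumptions, not real AEM or S3 clients).
function demoSource() {
  return Readable.from([Buffer.from('asset-'), Buffer.from('binary')]);
}

function demoSink(chunks) {
  return new Writable({
    write(chunk, _encoding, callback) {
      chunks.push(chunk);
      callback();
    },
  });
}
```

Calling `copyStream(demoSource(), demoSink([]))` resolves with the byte count once the sink has consumed everything; swapping in a network source and sink leaves the helper unchanged.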

1.1.3 End to End flow

For a deeper understanding of the flow, you may watch the video at Experience League.

(flow screenshot)

1.2 I/O Event Setup

AEM Assets is registered as an event provider in Adobe I/O. We can configure multiple custom event types as needed. For this implementation, we will register three event types corresponding to the three action rules configured in AEM (as discussed in Part 2): archive, unarchive and delete.

You can use either the Adobe I/O Events SDK or the Adobe I/O Events API to register AEM Assets as an event provider.

Setup

For this implementation, we made use of the ready-to-use open source AEM project aio-aem-events, which makes this registration simpler. Find more information about how to use this package at aio-aem-events installation.

Once this package is installed, add OSGI configs for the three event types, as shown below for the archive event type. Take note of the OSGI Topic, as the same value will be used in our custom AEM project OSGI configs (discussed later).

Event mapping for `Archive` event

Additional event provider configurations are needed; these are covered later in section 2.2.c of this article.

1.3 I/O Runtime Setup

We developed custom NodeJS functions that perform the appropriate actions on receiving an event from AEM Assets, as explained below.

Follow the tutorials to create your custom runtime actions.

1.3.1 Archive Actions

The snapshot shows the two actions used to handle archiving of AEM assets.

1. `archive-web`: This is an asynchronous function that receives the archive event information from AEM Assets by means of the Adobe IO Events journal. This action internally invokes the other archive I/O Runtime action to do the actual processing.

2. `archive`: This is a synchronous function that can run for minutes or hours. This function establishes a connection with AEM, fetches the asset rendition binary and streams it directly into an AWS S3 object, effectively archiving it.
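The hand-off between the two actions can be sketched as below. This is an illustrative sketch, not our actual code: `handleArchiveEvent` and the `assetPath` payload field are hypothetical, and the sketch assumes the second action is invoked without waiting for its result so that the first action can return quickly. The invoker is injected as a parameter; inside I/O Runtime it could wrap an OpenWhisk client call, which is assumed and not shown.

```javascript
// Sketch of the `archive-web` hand-off. `invokeAction` is injected so the
// logic can be exercised without a live namespace.
async function handleArchiveEvent(params, invokeAction) {
  // Hypothetical payload field: the DAM path of the asset to archive.
  const assetPath = params.assetPath;
  if (!assetPath) {
    return { statusCode: 400, body: { error: 'assetPath is required' } };
  }

  // Non-blocking invocation: an activation id comes back immediately
  // while the long-running `archive` action continues in the background.
  const activation = await invokeAction({
    name: 'archive',
    blocking: false,
    params: { assetPath },
  });

  return { statusCode: 202, body: { activationId: activation.activationId } };
}
```

Returning the activation id makes it possible to look up the logs and result of the long-running action later.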

Finally, upon successful archiving of the asset, AEM Assets is notified by calling the appropriate custom AEM servlet (which acts like a webhook). As soon as AEM is notified, post-archive actions are performed, such as replacing the original rendition with a small placeholder image and marking the asset as archived.
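The callback to AEM might look roughly like the sketch below. Everything specific here is an assumption made for illustration: the servlet path `/bin/example/archive-complete`, the form field names, the bearer token and the helper names are all hypothetical, so substitute whatever your custom servlet actually expects. The sketch relies on the global `fetch` available in Node 18 and later.

```javascript
// Builds the callback request that tells AEM an asset has been archived.
// The servlet path and form field names are hypothetical.
function buildArchiveCallback(aemHost, assetPath, s3Key, token) {
  const url = new URL('/bin/example/archive-complete', aemHost);
  const body = new URLSearchParams({ assetPath, s3Key, status: 'archived' });
  return {
    url: url.toString(),
    options: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${token}`,
        'Content-Type': 'application/x-www-form-urlencoded',
      },
      body: body.toString(),
    },
  };
}

// Sends the callback and fails loudly on a non-2xx response, so the
// action's activation record shows that AEM was not updated.
async function notifyAem(request, fetchImpl = fetch) {
  const response = await fetchImpl(request.url, request.options);
  if (!response.ok) {
    throw new Error(`AEM callback failed with status ${response.status}`);
  }
  return response.status;
}
```

Keeping request construction separate from sending makes the payload easy to inspect without a running AEM instance.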

1.3.2 Unarchive Actions

Following a similar approach, we can implement two functions to handle the restoration/unarchive of assets back from AWS S3 to AEM Assets.

2. AEM Integration with Adobe IO

Make the following configurations to complete the integration setup.

2.1 AEM OSGI Settings

a. Make sure to check the Use Adobe I/O option in the OSGI configs of the project.

b. The OSGI Topic Prefix should match the one used in the event mapping config covered in section 1.2 (I/O Event Setup).

OSGI configs

2.2 Adobe Developer Console Project

a. Create a project.

b. Add the I/O Management API and the I/O Events API to the project.

Snapshot of our project

c. Download your project workspace credentials from the developer console and use them in the AEM Events Workspace OSGI configs of the aio-aem-events package.

OSGI Workspace configuration for your Developer Project

Note: Make sure that all the credentials are correct; otherwise AEM will not be registered as an event provider in Adobe I/O.

d. Event Registration: Under Events, add a new event registration, select AEM as the event provider and configure three event registrations: archive, unarchive and delete. The event delivery type should be Journal (for guaranteed event handling) for these events. Lastly, the respective Adobe I/O Runtime actions set up in section 1.3 of this blog should be configured as the event consumer/event delivery. An example is shown below.

Subscribing to `Archive` event provided by AEM as event provider
Event journal for `Archive` event with our custom Runtime Action as the event consumer

3. Special Scenarios

3.1 Handling of large assets

To handle huge assets, it is important to understand the default system settings and limits of the containers that run I/O Runtime actions. These container configurations should be tuned when creating runtime actions to allow for large asset processing. For example, the timeout limit of a runtime function can be increased up to 1 hour, and the container memory can be increased up to a maximum of 4096 MB. I had to maximize the timeout in my trials to handle a 50 GB asset file. The minuteRate option controls the number of I/O Runtime actions that can be invoked per minute.
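As a rough planning aid, the limits above can be captured in a small helper. This is a sketch under stated assumptions: the caps are simply the figures quoted in this article, `buildActionLimits` and `requiredThroughputMBps` are hypothetical names, and the returned object follows the OpenWhisk convention of `timeout` in milliseconds and `memory` in MB. Verify the actual caps for your own namespace before relying on them.

```javascript
// Caps quoted in this article; verify them against your own namespace.
const MAX_TIMEOUT_MS = 60 * 60 * 1000; // 1 hour
const MAX_MEMORY_MB = 4096;

// Hypothetical helper: builds an OpenWhisk-style `limits` object
// (timeout in milliseconds, memory in MB), rejecting values above the caps.
function buildActionLimits({ timeoutMs, memoryMb }) {
  if (timeoutMs > MAX_TIMEOUT_MS) {
    throw new RangeError(`timeout ${timeoutMs} ms exceeds ${MAX_TIMEOUT_MS} ms`);
  }
  if (memoryMb > MAX_MEMORY_MB) {
    throw new RangeError(`memory ${memoryMb} MB exceeds ${MAX_MEMORY_MB} MB`);
  }
  return { timeout: timeoutMs, memory: memoryMb };
}

// Average throughput (in MB/s, with 1 MB = 10^6 bytes) needed to move
// `sizeBytes` before the action times out.
function requiredThroughputMBps(sizeBytes, timeoutMs) {
  return sizeBytes / 1e6 / (timeoutMs / 1000);
}
```

For a 50 GB asset and a 1 hour timeout, `requiredThroughputMBps(50e9, 3600000)` works out to roughly 13.9 MB/s sustained end to end, which is a useful sanity check before attempting very large transfers.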

3.2 Delayed asset retrieval

Some cloud storage options do not allow an asset to be retrieved immediately. For example, the Amazon S3 Glacier Deep Archive storage class provides two retrieval options, with retrieval times ranging from 12 to 48 hours. When initiating asset unarchive/restoration from such storage, the runtime action needs to wait for a certain time before a valid download link for the archived asset becomes available.

This can be handled by creating and firing triggers at the right time (once or periodically) using the Apache OpenWhisk Alarms package. For example, to restore an archived asset from S3 Glacier into AEM, a trigger can be fired once the expected retrieval time has elapsed, which ensures that asset processing starts only after the retrieval link becomes available. We wish to implement this in our core AEM project, configured through OSGI settings, some time in the near future. We would appreciate your feedback if you choose to implement this or have any other approach to handle such a case.
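Since we have not implemented this yet, the following is only a sketch of how the timing could be computed. `buildRestoreAlarm` is a hypothetical helper, and the retrieval windows are the upper bounds of the two Deep Archive options mentioned above. The returned object uses the `date` and `trigger_payload` parameter names of the Alarms package's fire-once feed; creating the trigger and feed themselves is assumed and not shown.

```javascript
// Expected retrieval windows for S3 Glacier Deep Archive, in hours.
const RETRIEVAL_HOURS = { Standard: 12, Bulk: 48 };

// Hypothetical helper: builds parameters for a fire-once alarm so that a
// trigger fires after the expected retrieval window has elapsed.
function buildRestoreAlarm(assetPath, tier, requestedAt = new Date()) {
  const hours = RETRIEVAL_HOURS[tier];
  if (hours === undefined) {
    throw new Error(`Unknown retrieval tier: ${tier}`);
  }
  const fireAt = new Date(requestedAt.getTime() + hours * 60 * 60 * 1000);
  return {
    date: fireAt.toISOString(),
    // Payload handed to the unarchive action when the trigger fires.
    trigger_payload: { assetPath, tier },
  };
}
```

Because the windows are estimates, the action fired by the trigger should still check whether the restored object is actually available, and reschedule itself if it is not.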

Thank you

We started with a simple use case of exporting a simple asset from AEM Assets to an external storage, which led us onto the path of implementing a fully working asset archival solution. Thank you for your patience as we covered a lot of ground. We hope that we were able to arm you with enough insights, topics and technologies to figure out your own data archival strategy.

Please leave a message if you wish to know more. We would love to hear from you about your implementation and challenges. Give a thumbs up if you found this series interesting. Follow us for more. Thank you!
