AEM Assets Cold Archival and Storage (Part 2/3)

Dipen Sagar
8 min read · Jun 2, 2023


Co-Author: Deepali Rathi

Introduction

In Part 1 of the AEM Assets Cold Archival and Storage series, we introduced the challenges faced by industries and organizations that churn out huge volumes of digital assets.

We hope you picked up some useful insights: why an effective digital asset archiving strategy is the need of the hour, and how to implement such a strategy in digital solutions such as a DAM or CMS. We also hope you will stay with us as we cover a few easy-to-fall-into pitfalls.

Welcome to Part 2! Here, you’ll learn about one such asset archival solution, implemented using Adobe AEM (AEM 6.5 / AEM as a Cloud Service) and cloud storage (AWS S3). You can integrate any other cloud storage solution using the same methodology.

The topics we’ll cover in Part 2 are as follows:

1. Setting up the AWS S3 bucket that assets will be archived to and restored from.

2. AEM Solution flow

3. AEM Project Setup — Customizations and configurations available in the custom AEM solution

(Assumption: AEM is already set up. Optional: Adobe Analytics is integrated.)

1. AWS S3 SETUP

a. Create a regular S3 bucket. You can name the bucket as you like; the name will be used later when configuring the OSGi settings in AEM. No special permissions are required on the bucket.

We have named our bucket aemcoldstore.

Our S3 bucket, named aemcoldstore

Next, we will generate an access key ID and a secret access key for an IAM user that can access this bucket.

b. Enable programmatic access to the S3 bucket.

Navigate to the IAM service in the AWS console.

Select “Users” in the left menu and press the blue “Add user” button.

Enter any username you like (e.g., S3-Full-User, shown below). Make sure to enable “Programmatic Access” in the first step.

In the next step, press “Attach existing policies directly”, then search for S3 and select “AmazonS3FullAccess”.

Complete the user creation and, at the end, make sure to note your “Access Key ID” and “Secret Access Key”. These will also be used in the AEM OSGi settings later.

IAM User with access to S3 bucket

Optional: You can create a bucket-specific policy for the user that restricts its permissions to the dedicated bucket used for asset archival.
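As a rough illustration of such a bucket-specific policy, the sketch below builds the policy JSON for a given bucket name. The class `ColdStorePolicy` and its method are hypothetical helpers, and the list of actions is our assumption about what the archive, restore, and delete flows need; adjust it to your own setup (multipart uploads, for instance, may need additional actions).

```java
/** Hypothetical helper: builds an IAM policy document limited to one archival bucket. */
final class ColdStorePolicy {
    private ColdStorePolicy() {}

    static String forBucket(String bucket) {
        // Bucket-level actions target the bucket ARN; object-level actions target "<bucket ARN>/*".
        String bucketArn = "arn:aws:s3:::" + bucket;
        return String.join("\n",
            "{",
            "  \"Version\": \"2012-10-17\",",
            "  \"Statement\": [",
            "    {",
            "      \"Effect\": \"Allow\",",
            "      \"Action\": [\"s3:ListBucket\"],",
            "      \"Resource\": \"" + bucketArn + "\"",
            "    },",
            "    {",
            "      \"Effect\": \"Allow\",",
            // Assumed minimum for archive (put), unarchive (get) and the Delete rule (delete).
            "      \"Action\": [\"s3:GetObject\", \"s3:PutObject\", \"s3:DeleteObject\"],",
            "      \"Resource\": \"" + bucketArn + "/*\"",
            "    }",
            "  ]",
            "}");
    }
}
```

Calling `ColdStorePolicy.forBucket("aemcoldstore")` yields a document you could paste into the IAM policy editor and attach to the user in place of “AmazonS3FullAccess”.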

Next, let us dive into AEM!

2. AEM SOLUTION FLOW

2.1 Overview

While there are different ways to design a custom asset archival and unarchive mechanism in AEM, we chose to implement it in the following manner.

2.1.1 Archival Flows

There are three flows by which digital assets can be archived.

i) Manual flow

· Select any asset(s).

· Trigger a custom workflow (named Cold Store) that consists of a custom process step (named Archive in S3) which processes the asset and performs the archival actions discussed later in this article.

Custom archive workflow model in AEM

(flow screenshot)

ii) Configurable flows

· The custom AEM project offers a variety of configurable rules, as shown below. DAM users can create as many rules as needed.

Three different types of rules can be created

Archive Rules card is highlighted
Example of a few `Archive` rules

· Each custom rule is internally implemented as a scheduled Sling job. These background jobs run at the defined frequency, search for assets that match the rule’s filters, and take the appropriate action: archive.

DAM users can combine these granular filters to keep full control over the DAM-wide asset archival strategy.
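To make the filter idea concrete, here is a minimal sketch of how one rule could decide which assets a scheduled run picks up. The `Asset` and `Rule` types and the `select` method are illustrative stand-ins of our own, not AEM APIs or the project’s actual classes, and only three of the filters (path, size range, file type) are modelled.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Sketch only: these types are illustrative stand-ins, not AEM or project APIs. */
final class ArchiveRuleSketch {
    private ArchiveRuleSketch() {}

    /** Minimal view of a DAM asset, holding just the fields the filters need. */
    static final class Asset {
        final String path;
        final long sizeBytes;
        final String mimeType;

        Asset(String path, long sizeBytes, String mimeType) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.mimeType = mimeType;
        }
    }

    /** One rule: a folder, a size range and an optional set of file types. */
    static final class Rule {
        final String pathPrefix;      // end with "/" so sibling folders do not match
        final long minBytes;
        final long maxBytes;
        final Set<String> mimeTypes;  // an empty set means "any type"

        Rule(String pathPrefix, long minBytes, long maxBytes, Set<String> mimeTypes) {
            this.pathPrefix = pathPrefix;
            this.minBytes = minBytes;
            this.maxBytes = maxBytes;
            this.mimeTypes = mimeTypes;
        }

        /** An asset matches only if every configured filter accepts it. */
        boolean matches(Asset asset) {
            return asset.path.startsWith(pathPrefix)
                && asset.sizeBytes >= minBytes
                && asset.sizeBytes <= maxBytes
                && (mimeTypes.isEmpty() || mimeTypes.contains(asset.mimeType));
        }
    }

    /** What one scheduled run would select: the paths of all assets the rule matches. */
    static List<String> select(Rule rule, List<Asset> assets) {
        return assets.stream()
            .filter(rule::matches)
            .map(asset -> asset.path)
            .collect(Collectors.toList());
    }
}
```

In a real deployment the candidate list would come from a repository query rather than an in-memory list; the point here is only that the filters combine with a logical AND.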

(flow screenshot)

iii) Smart Flow (Analytics Driven)

This flow uses the native integration between AEM and Adobe Analytics to find candidate assets for archival based on their usage on the web. For example, one can choose to automatically archive all assets that are unused or underused, judged by the `impression count` provided by Adobe Analytics.

· A background job runs daily to find assets that have fewer impressions than a predefined number. The assets found are added to a dynamic collection in AEM.

· An inbox notification is generated, which notifies the DAM user/admin and allows them to review the collection and approve the archival of the assets found.

· The DAM user can then add or remove assets in the collection. Once the collection is ready, it is sent for archival.

(flow screenshot)
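The candidate-selection step of the smart flow boils down to a threshold check on impression counts. The sketch below assumes the counts have already been fetched from Adobe Analytics into a map keyed by asset path; `SmartFlowSketch` and `archiveCandidates` are hypothetical names, and how assets with no reported impressions are treated is left out.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Sketch only: selects archival candidates from impression counts fetched elsewhere. */
final class SmartFlowSketch {
    private SmartFlowSketch() {}

    /** Returns, in sorted order, the paths whose impression count is below the threshold. */
    static List<String> archiveCandidates(Map<String, Long> impressionsByPath, long threshold) {
        return impressionsByPath.entrySet().stream()
            .filter(entry -> entry.getValue() < threshold)
            .map(Map.Entry::getKey)
            .sorted()
            .collect(Collectors.toList());
    }
}
```

The returned paths correspond to what would be added to the collection for review; nothing is archived until the DAM user approves it.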

2.1.2 Unarchive Flows

There are two flows by which digital assets can be restored.

i) Manual flow

· Select any asset(s).

· Trigger a custom workflow (named Fetch Cold Stored Asset) that consists of a custom process step (named Get asset from external storage) which performs the unarchive (restoration) actions discussed later in this article.

Custom unarchive workflow model in AEM

(flow screenshot)

ii) Configurable flows

· As with the configurable archival flows, DAM users can create as many unarchive rules as needed.

Unarchive Rules card is highlighted
Example of an `Unarchive` rule

(As with archive rules, you can add as many unarchive rules as you want, each varying in filters such as location path, file size range, file type, scheduler frequency, custom asset property=value, etc.)

· These custom rules are also implemented as scheduled Apache Sling jobs that run at the configured frequency. On each job run, assets that match the filters are searched for and the appropriate action is taken on them: unarchive. As with archival, DAM users can combine these granular filters to keep full control over the DAM-wide restoration strategy.

(flow screenshot)

2.2 Asset Processing

2.2.1 Archival Actions

This section describes the actions performed in AEM during archival flows.

1. Only the original rendition of the asset is uploaded to the AWS S3 bucket. Relevant metadata of the original asset is attached to the uploaded S3 object.

A snapshot of the S3 bucket containing a sample asset archived from AEM

This snapshot shows custom metadata uploaded along with the S3 object

Note: You can choose to upload/archive all available asset renditions, including thumbnails, web renditions, etc.

2. Once the asset is archived in S3, the original rendition of the asset in AEM is replaced with a small dummy placeholder image (~1 KB). This frees up disk space (the binary is reclaimed once data store garbage collection runs).

The placeholder image ships with the custom deployed AEM project. It can also be picked up from a user-defined location configured in the OSGi settings.

Note: Other renditions and the original asset metadata (JCR nodes) are left untouched in AEM. This means that the asset stays available within AEM for regular use, including search, drag and drop, etc.

3. Optional action: An Archived rendition is also created, which contains the same placeholder image.

Note: If an asset is already archived, re-archiving it makes no changes. The entire archival process is skipped in this case.
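The archival steps above can be sketched with in-memory stand-ins, which makes the ordering and the skip-if-archived behaviour easy to see. Everything here is an assumption for illustration: `Asset`, `StoredObject`, and `PLACEHOLDER` are hypothetical, a `Map` stands in for the S3 bucket, and the presence of an "archived" rendition is used as the already-archived marker (the actual project may detect this differently).

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: in-memory model of the archival actions, not the project's code. */
final class ArchiveActionSketch {
    private ArchiveActionSketch() {}

    /** Stand-in for the ~1 KB placeholder image. */
    static final byte[] PLACEHOLDER = new byte[] {0};

    /** Stand-in for an AEM asset: renditions by name, plus metadata properties. */
    static final class Asset {
        final String path;
        final Map<String, byte[]> renditions = new HashMap<>();
        final Map<String, String> metadata = new HashMap<>();

        Asset(String path, byte[] original) {
            this.path = path;
            renditions.put("original", original);
        }
    }

    /** Stand-in for an S3 object: body plus the metadata uploaded with it. */
    static final class StoredObject {
        final byte[] body;
        final Map<String, String> metadata;

        StoredObject(byte[] body, Map<String, String> metadata) {
            this.body = body;
            this.metadata = metadata;
        }
    }

    /** Returns true if the asset was archived, false if it was skipped as already archived. */
    static boolean archive(Asset asset, Map<String, StoredObject> bucket) {
        // Assumed marker: an existing "archived" rendition means there is nothing to do.
        if (asset.renditions.containsKey("archived")) {
            return false;
        }
        // 1. Upload the original rendition together with a copy of the asset metadata.
        bucket.put(asset.path,
            new StoredObject(asset.renditions.get("original"), new HashMap<>(asset.metadata)));
        // 2. Only after the upload, swap the original for the placeholder.
        asset.renditions.put("original", PLACEHOLDER);
        // 3. Optional action: add the Archived rendition holding the same placeholder.
        asset.renditions.put("archived", PLACEHOLDER);
        return true;
    }
}
```

Uploading before replacing the original is deliberate in this sketch: if the upload fails, the asset in AEM is still intact.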

2.2.2 Unarchive Actions

This section describes the actions performed in AEM during unarchive/restoration flows.

1. The corresponding archived object (e.g., asset.png, shown below) is fetched from the AWS S3 bucket, and the original rendition of the AEM asset is replaced with it.

2. Then, the Archived rendition created earlier is deleted.
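The two restoration steps can be sketched in the same in-memory style. As before, the names are hypothetical and plain maps stand in for the asset’s renditions and for the S3 bucket. Leaving the object in the bucket after a restore is our assumption, suggested by the existence of a separate Delete rule.

```java
import java.util.Map;

/** Sketch only: in-memory model of the unarchive actions, not the project's code. */
final class UnarchiveActionSketch {
    private UnarchiveActionSketch() {}

    /** Returns true if the asset was restored, false if no archived object exists for it. */
    static boolean unarchive(String assetPath,
                             Map<String, byte[]> renditions,
                             Map<String, byte[]> bucket) {
        byte[] archivedBytes = bucket.get(assetPath);
        if (archivedBytes == null) {
            return false; // nothing to restore; leave the asset untouched
        }
        // 1. Replace the placeholder with the bytes fetched from the archive.
        renditions.put("original", archivedBytes);
        // 2. Delete the Archived rendition (a no-op if it was never created).
        renditions.remove("archived");
        return true;
    }
}
```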

3. AEM SETUP AND SETTINGS

A custom project based on the Maven AEM Project Archetype was created. The project embeds the AWS SDK for Java 2.x to make interactions with AWS S3 robust and fast.

3.1 Project OSGi Settings

Several OSGi configs are exposed by the project.

Custom AEM Project OSGi configs

(Those of you with a keen eye may notice a few settings not discussed yet. Many of these will be explained in the next part of this 3-part series. Feel free to reach out to us to know more, e.g., to understand the S3 multipart upload feature used within AEM to make uploads faster and more reliable.)

Another OSGi config, meant only for smart flow rules (used for archiving assets based on their impression count), is also available.

OSGi configs for smart flows

3.2 Custom Archive/Unarchive Rules

As described under the configurable flows, a variety of configurable rules can be created.

(The project offers three different types of configurable rules)

The Delete rule was implemented to clear the objects archived in S3 without making any changes in AEM. This rule should be used with caution, as all the original renditions will be permanently deleted from the archive storage.

You can add as many rules as you want, each varying in filters such as location path, file size range, file type, scheduler frequency, custom asset property=value, etc.

Custom rule creation

What’s Next?

As you might appreciate, the project is meant to be a simple implementation, but it can be extended to connect with other cloud storage solutions following the same approach. In the near future, we might extract a base framework with core archiving and restoring capabilities and then create additional connectors for each external cloud storage: AWS S3 Glacier, AWS EBS, Google Cloud, Azure Blob, etc.

Now, some of you might already be wondering about the potential issues (listed below) that one might face with this custom implementation. In the next part of this series, we will try to address these and discuss scalability, performance, and security aspects as we move the asset processing outside of the AEM context and implement it in Adobe I/O Runtime.

1. Archiving/restoring several assets in parallel.

2. The largest asset file that can be processed.

3. Processing a huge volume of assets (running into GBs or even TBs), which can cause disk space issues.

4. Potential issues if the AEM server shuts down midway through archiving/restoration, etc.

Thank you for taking the time to read through all of this. We covered a lot of ground, but there’s still so much more to discover as you move to our final part of this series — Part 3. Did this seem a bit overwhelming? We are here to support you along your way as you plan to implement your own archival strategy. Kindly reach out to us if you have any questions.

We can’t wait to hear your feedback as you apply these topics in your implementation. Stay tuned!
