Four highlights from adaptTo 2018

Rachana Mehta
The Telegraph Engineering
8 min read · Dec 12, 2018

Visiting the adaptTo conference in Potsdam, Germany

Javier and Rachana of The Telegraph with Carmen and Francois from Adobe at the adaptTo conference in Germany

I was recently fortunate to attend adaptTo 2018 in Potsdam, Germany, where people gathered to hear about the most recent developments and innovations in the world of Sling, AEM, OSGi and the wider open-source community. While all of the sessions were useful and informative, I would like to talk about four sessions that really stood out to me.

AEM and Single Page Applications (SPAs)

I loved this session, which made several intriguing points about SPAs.

An SPA-enabled webpage can update content without reloading the entire page, which is very helpful for web applications where performance is a top priority. Instead of re-fetching common elements such as headers and images, it requests only the dynamic parts of the page, as JSON. Loading only the JSON it needs keeps load times short and helps sites feel more alive.

However, there are a few drawbacks to SPAs. Firstly, the initial page load can be slower if your JavaScript bundle is large. Secondly, an SPA adds complexity to your site and can impact your SEO.

Sling, the framework AEM is built on, works well as a headless CMS: a back-end-only content management system built from the ground up as a content repository that makes content accessible via a RESTful API for display on any device. AEM acts as a content repository that supports RESTful calls, so an SPA implementation maps naturally onto this aspect of it. The AEM SPA Editor gets content as JSON instead of HTML!

There are two aspects to the AEM SPA Editor:

  1. The AEM side, which consists of Sling Models, a dialog and (optionally) HTL. There are two main exporter interfaces, com.adobe.cq.export.json.ContainerExporter and com.adobe.cq.export.json.ComponentExporter, both built on top of the Sling Model Exporter framework. ContainerExporter is used for pages and layout-container components, while ComponentExporter is used for all other components (a minimal model sketch follows this list).
  2. The front-end side, which consists of npm-distributed components. Here, React components are mapped to an AEM resourceType.
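
As a rough illustration of the AEM side, here is a minimal Sling Model implementing ComponentExporter. The resource type, class name and "text" property are hypothetical examples, not taken from the talk:

```java
import com.adobe.cq.export.json.ComponentExporter;
import com.adobe.cq.export.json.ExporterConstants;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.models.annotations.DefaultInjectionStrategy;
import org.apache.sling.models.annotations.Exporter;
import org.apache.sling.models.annotations.Model;
import org.apache.sling.models.annotations.injectorspecific.ValueMapValue;

// Minimal SPA-ready component model; the resource type and "text"
// property are hypothetical examples.
@Model(
        adaptables = SlingHttpServletRequest.class,
        adapters = ComponentExporter.class,
        resourceType = TextModel.RESOURCE_TYPE,
        defaultInjectionStrategy = DefaultInjectionStrategy.OPTIONAL)
@Exporter(
        name = ExporterConstants.SLING_MODEL_EXPORTER_NAME,   // "jackson"
        extensions = ExporterConstants.SLING_MODEL_EXTENSION) // "json"
public class TextModel implements ComponentExporter {

    static final String RESOURCE_TYPE = "myproject/components/text";

    @ValueMapValue
    private String text;

    public String getText() {
        return text;
    }

    // The front end uses this value to decide which React component
    // should render this piece of JSON
    @Override
    public String getExportedType() {
        return RESOURCE_TYPE;
    }
}
```

Requesting a page with the .model.json extension then returns the component's properties as JSON, which the React side picks up.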

The biggest takeaway is mapping a React component to a resource type with MapTo. This overcomes the SEO drawbacks by using a mixture of HTL and React, combined with Sling Models. The disadvantages are that it uses two data models (Sling and the DOM) and that it's not a full-blown SPA solution: you cannot mix and match HTL and SPA components yet. It is available from AEM 6.4 with the SPA Feature Pack. There is an open-source sample project called We.Retail Journal which demonstrates the SPA editor.

Sling memory deep dive

This session was all about the memory usage of Sling. It used various examples to explain how the different parts of a Sling application utilise memory.

For instance, if a system has 8GB of RAM, it might be divided into the following parts:

  1. The kernel takes some space: around 200MB to store the slab, kernel stacks and page tables.
  2. The heap takes 4GB (java -Xmx4G cq-quickstart). Separately, the page cache takes 2GB (visible in /proc/meminfo). The segment tar needs to be loaded into the page cache for maximum performance, otherwise you will see bad response times in AEM.
  3. Logstash takes 800MB.
  4. The remaining 3GB is managed by the system and used for different optimisations, including the page cache above.

The heap and the page cache are very important for the reliability and performance of AEM. Only part of the heap is actually available to the application, due to the way Java manages memory.
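
As a trivial illustration of that last point, the JVM will tell you how much of the configured heap is committed and free at any given moment; a small sketch:

```java
// Small sketch: what the JVM reports about its own heap. With -Xmx4G,
// maxMemory() is roughly 4GB, but only totalMemory() is currently
// committed, and only part of that is free for the application.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        System.out.printf("max heap:  %d MB%n", rt.maxMemory() / mb);
        System.out.printf("committed: %d MB%n", rt.totalMemory() / mb);
        System.out.printf("free:      %d MB%n", rt.freeMemory() / mb);
    }
}
```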

We were shown a dissection of a 4GB heap. The dissection was performed in two steps: first, a heap dump was taken; second, that dump was loaded into the Eclipse Memory Analyzer. After analysing the live objects, the following breakdown of usage was found:

  1. Segment cache (256MB).
  2. Deduplication caches (200MB).
  3. Link checker (72MB).
  4. Sling Discovery (42MB).
  5. Template cache (64MB).
  6. Lucene (23MB).
  7. Replication (3MB).
  8. Felix service registry (2MB).
  9. Sling servlet resolver (3MB).

It is good to know that some objects, such as the segment cache, have a fixed limit and can't grow, while others, such as the Sling servlet resolver, can grow. Increased load on the system can also be handled by scaling the heap.
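
On the heap-dump step above: a dump of live objects (the kind analysed in the session) can be triggered programmatically via the JDK's HotSpotDiagnosticMXBean. A minimal sketch, with a hypothetical output file name:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // live=true keeps only reachable objects, which is what you want
        // when analysing actual usage in the Eclipse Memory Analyzer
        bean.dumpHeap("aem-heap.hprof", true);
    }
}
```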

Page cache memory utilisation

The application has no power over this cache, which is controlled solely by the OS. This cache holds content from the segment tar. The question is: how does the segment tar use this memory over time? As part of the session, vmtouch was used to find out which parts were loaded and available to the application.

The sign of business as usual for the page cache is that the old generation is empty while the current generation has some files partially loaded. When the old generation grows large, it indicates that the cache is not properly sized. Major page faults are bad for the system; if you see them, look at adding memory or tuning.

The biggest takeaway from this session was that “you need to leave enough RAM for the system to store the content/tar that your application is using”. The obvious question then is: how much memory should be assigned? The answer is, it depends! To summarise, the heap should be scaled according to the system load, and the page cache should be sized based on your current repository size.

Using Apache OpenWhisk to scale and simplify AEM Asset processing

This session was all about AEM asset processing; it felt like an extended version of my first blog post. AEM currently stores and processes images in the same instance, digital photos are growing fast, and the average AEM asset is now greater than 100MB.

Current constraints and challenges

When binaries get uploaded or downloaded, they go through a load balancer, AEM and binary cloud storage. The constraints: bandwidth is limited by VM size (more bandwidth means a larger VM), a single instance does all the processing, and on the development side everything is Java and the JVM. This means image/video processing requires a Java library, and the DAM image workflow is complex.

There is also a functional challenge: when an Adobe Photoshop file gets uploaded to AEM, the preview shows blank, meaning that AEM doesn't fully support Adobe's own format!

However, all of these challenges can be addressed by the simplified asset processing available to Adobe Managed Services clients.

The future is bright. A summary of what asset processing in AEM will look like:

  1. Direct access to cloud binaries. This reduces IO and bottlenecks, scales much better and opens up options for a CDN.
  2. The Asset Compute Service by Adobe. It's a multi-tenant cloud service, it's scalable and it supports Adobe's file formats. Its setup is free and there is no need to build customisations for new file formats.
  3. It is easy to extend.
  4. It provides serverless agility.
  5. It moves binary processing out of the AEM instance.

The Asset Compute Service is available with AEM Assets 6.5. The session also exposed some technical details: the service is serverless and uses the cloud power of Apache OpenWhisk. They selected OpenWhisk because:

  1. It’s open source. (The project was started by IBM and completed in one-and-a-half years, which was then made available as open source).
  2. It’s serverless.
  3. Adobe is committing code to it, which makes developing custom requirements easier.
  4. It supports containers.
  5. It allows Adobe to provide rendering tools, SDKs and libraries.
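
To give a feel for the programming model (this is not the actual Asset Compute code), an OpenWhisk action is essentially a function that takes JSON in and returns JSON out. In the Java runtime it looks roughly like this; the "source" and "width" parameters are made-up examples:

```java
import com.google.gson.JsonObject;

// Sketch of an OpenWhisk Java action showing the shape of the programming
// model. A real asset action would fetch the binary, render it and store
// the result; here we just echo back a hypothetical rendition name.
public class RenditionAction {
    public static JsonObject main(JsonObject args) {
        String source = args.has("source") ? args.get("source").getAsString() : "unknown";
        int width = args.has("width") ? args.get("width").getAsInt() : 320;

        JsonObject response = new JsonObject();
        response.addProperty("source", source);
        response.addProperty("rendition", source + "-thumb-" + width + ".png");
        return response;
    }
}
```

The platform invokes actions like this asynchronously and in parallel, which is exactly what the next list builds on.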

The next obvious question is: why serverless for asset processing? Here is a short summary:

  1. It’s asynchronous by nature; serverless is based on eventing.
  2. It’s highly scalable with parallel processing
  3. It allows for sandboxing , which helps with bad files, viruses, etc.
  4. It’s extensibile by nature, for enterprise workflows.
  5. It has dynamic scheduling , meaning that at invocation time you can make a decision on where to run.
  6. It provides you with greater development agility; a lot of responsibilities are handled by the platform and focuses on the application.
  7. It means you can use the Asset Compute service.
  8. It provides the same as existing AEM support for assets as well as support for Photoshop file formats.
  9. It increases performance.
  10. It supports source files such as raster images, raw and pdf.
  11. It allows for renditions such as thumbnails, web renditions, metadata and plain text.

The API uses a JSON request body, and storage is S3. They finished off the session with a demo.

It’s not public yet, but AEM 6.5 will have the Asset Compute service and will be available for AMS customers only.

Integrating AEM with the Blockchain

Everyone talks about blockchain nowadays, so Adobe decided to experiment with it, and I decided to include it in my blog too! Blockchain is immutable, while Communiqué (CQ/AEM) is about publishing content, so it was good to see how the two can be integrated.

How does it work with AEM? It was proven out with a very simple architecture:

  • AEM publishes data (a publish listener on the author instance identifies the event) and makes a call to a processor (Validator AEM DOCKTRK tx Processor), which then submits it to a blockchain processor.
  • The demo showed an AEM page going onto the blockchain, where it stays forever.

There were a few things about blockchain in this session that I found very thought-provoking. A blockchain is a public, distributed database; it's decentralised and immutable. Each block carries a hash of the previous block, so it's impossible to change earlier data; it's also slow.
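
A minimal sketch of that hash chaining (illustrative only, not Adobe's implementation). Each block's hash covers its own content plus the previous block's hash, so tampering with any earlier block breaks every later one:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Block {
    final String content;
    final String previousHash;
    final String hash;

    Block(String content, String previousHash) throws Exception {
        this.content = content;
        this.previousHash = previousHash;
        // The hash chains this block to its predecessor
        this.hash = sha256(previousHash + content);
    }

    static String sha256(String input) throws Exception {
        MessageDigest digest = MessageDigest.getInstance("SHA-256");
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest(input.getBytes(StandardCharsets.UTF_8))) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        Block genesis = new Block("first published page", "0");
        Block next = new Block("second published page", genesis.hash);
        // Changing "first published page" would change genesis.hash, which
        // would no longer match next.previousHash
        System.out.println(next.previousHash.equals(genesis.hash)); // true
    }
}
```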

Governance?

Blockchain is decentralised and everyone owns it, while access control depends on implementation.

GDPR?

This currently poses an unsolvable problem for publishers using blockchain. Anyone can request that their data be deleted, but a blockchain is immutable. And because there is no single real owner, there is no one to direct such requests to, so data can't be deleted.

Auditable/verifiable?

Public data: yes, but links: no. Data gets encrypted before being put onto the blockchain, so only people with the keys can decrypt it.

Unchanged?

As it’s inside a blockchain, the data is unchangeable. This is nearly true, because it’s based on hashing and over time the hashing algorithm gets outdated and becomes easy to break. Rehashing in blockchain is not possible, so would be interesting to see what will happen in future, because in the long-term it doesn’t seem realistic.

In summary, the possible applications of blockchain, along with the caveats raised:

  1. Distributed file storage: it's contract-based, so think about storing a contract you receive from somewhere. Do you want it public or encrypted? Maybe there are other solutions?
  2. Supply chain, although other solutions exist here too.
  3. DRM.
  4. Distributed identity and key management: public key infrastructure has existed for 20 years, so the real work is making it user-friendly, not reinventing it.
  5. Monetising open-source work: it's contract-based, so if someone wants a patch, you submit the fix and get paid. But how do you get the committers of an open-source project to work that way?
There were other interesting sessions as well and all of them are available to download from here.

Rachana Mehta is a Senior Software Engineer at The Telegraph. You can follow her on Twitter at Rachna81185836.
