13 steps to rock-stable AEM package installs

Wim Symons
VRT Digital Products
Sep 16, 2019

Websites do not magically update themselves, fix their own problems or implement new features. Developers do this. What you want when you make changes to your website is that every part of this process goes smoothly.

At the moment, we have 4 websites running on the Adobe Experience Manager (AEM) content management system and that number will increase.

We also deploy very often: 10 to 15 times a day, and about 2 or 3 times a day to production.

One way to ensure this process goes smoothly is to follow the Continuous Delivery principles.

By doing it this way, we constantly improve small things, avoiding big bang releases.

In this Medium post we want to show you the approach we developed, after 5 years of perfecting the process, to make AEM package installs safe.

How?

On AEM you make changes by deploying CRX packages. This can be done in several ways. You can use the AEM Package Manager user interface, you can drop the file in the crx-quickstart/install folder of your AEM server instances, or you can use the AEM Package Manager REST API.

To automate this, we chose the last option: the REST API.

Our whole process is wrapped inside a Puppet run, which uses Bryan Stopp’s amazing AEM Puppet module (which we forked and modified a little for our purposes) to install the needed packages. I’m not going to describe that part in detail, but I want to zoom in on the parts which are important and hard to find documented elsewhere.

Why?

Why should you care? Well, we usually don’t install only 1 CRX package to AEM, but a whole series. It could be a ui.apps package followed by a ui.config package. But if we are setting up an entirely new AEM instance, this could be a Service Pack, some hot fixes, the ACS Commons package, the AEM Core Components and all our custom-built packages.

For that to succeed, you have to monitor AEM and the state of its OSGi container before installing the next CRX package. Because with OSGi, everything can disappear at any given time. Even the AEM CRX package manager itself.

Upload

This part is documented on the Adobe docs website.

For example, to upload the CRX package called name_of_package.zip:

curl -u admin:admin_password -F package=@"name_of_package.zip" "http://localhost:4502/crx/packmgr/service/.json/?cmd=upload"

Install

Like the upload step, this part is also documented on the Adobe docs website.

To continue our example, you can install the package as follows:

curl -u admin:admin_password -X POST "http://localhost:4502/crx/packmgr/service/.json/etc/packages/package_group/name_of_package?cmd=install"

The package_group depends on the package. For custom-built packages, this usually is the Maven Group Id. For any CRX package, you can also fetch this information from the metadata stored inside the zip file. Use the following snippet to do that:

unzip name_of_package.zip META-INF/vault/properties.xml -d /tmp
cat /tmp/META-INF/vault/properties.xml | xq -r '.properties.entry[] | select(."@key"=="group") | ."#text"'

Note: You need xq installed for the above snippet to work. It ships with the python-yq package, which you can install easily using Homebrew on macOS:

brew install python-yq
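
Keep in mind that curl exits with status 0 for any HTTP response unless you tell it otherwise, so a rejected upload or install can slip through a script unnoticed. Below is a minimal sketch of a response check. It assumes the package manager answers with a JSON body that contains a "success" flag (something like {"success":true,"msg":"..."}), which you should verify against your AEM version. The helper name pkgmgr_succeeded is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a package manager response on stdin and
# succeeds only if it contains "success": true.
# Assumption: the response is JSON shaped like {"success":true,"msg":"..."}.
pkgmgr_succeeded() {
  grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}

# Example (not executed here): upload, then fail loudly on rejection.
# response=$(curl -s -u admin:admin_password -F package=@"name_of_package.zip" \
#   "http://localhost:4502/crx/packmgr/service/.json/?cmd=upload")
# if ! printf '%s' "$response" | pkgmgr_succeeded; then
#   echo "Upload failed: $response" >&2
#   exit 1
# fi
```

The grep pattern is deliberately crude so the check has no extra dependencies. If you already have jq around, parsing the "success" property properly is the more robust choice.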

Clean up the previous versions

This is not something you’ll find in the docs, but each time you install a newer version of a package, AEM keeps the older versions around in case you want to roll back the latest deployment. If you deploy as much as we do, this rapidly increases your repository size.

Since our Jenkins build stores every CRX package version in our Maven artifact repository, we can easily install older versions from there.

Because of that, we can remove all previous versions of the package after a successful install.

This is possible with a few API calls to the AEM package manager, but we created a Ruby Gem to wrap this in an even simpler command. (It is a Ruby gem so that we can re-use it in our Puppet code.)

You can find the source code here and you can download it from Rubygems as well.

For example:

/usr/bin/aemcrxpkgmgr --pass admin_password --query package_name --action delete_crx_zip
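
If you prefer plain API calls over the gem, the selection logic can be sketched in Bash. This is an illustration, not how the gem works internally. It assumes package files follow a name-version.zip naming convention, that you already have a list of package paths, and that the package manager accepts cmd=delete on the same URL scheme as cmd=install. The helper names select_other_versions and delete_package, and the AEM_URL variable, are hypothetical.

```shell
#!/usr/bin/env bash
# Reads package paths on stdin (one per line) and prints those that belong to
# package NAME but are not version KEEP.
# Assumption: files are named NAME-VERSION.zip and versions start with a digit.
select_other_versions() {
  local name=$1 keep=$2 path file
  while IFS= read -r path; do
    file=${path##*/}
    case $file in
      "$name-$keep.zip") ;;                          # the version to keep
      "$name"-[0-9]*.zip) printf '%s\n' "$path" ;;   # any other version
    esac
  done
}

# Deletes one package by its repository path, e.g. /etc/packages/group/app-1.0.zip.
# Assumption: cmd=delete is available on your AEM version.
delete_package() {
  curl -s -u admin:admin_password -X POST \
    "${AEM_URL:-http://localhost:4502}/crx/packmgr/service/.json$1?cmd=delete"
}

# Example (not executed here), with the path list coming from your own source:
# select_other_versions my-app 1.2.0 < package_paths.txt | while IFS= read -r p; do
#   delete_package "$p"
# done
```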

Wait for OSGi Installer

After the CRX package is installed, /apps is searched for any OSGi bundles to install. That is the responsibility of the OSGi Installer. It has a JMX bean in which you can monitor its state.

Screenshot of the Sling OSGi Installer JMX bean as seen in the AEM System Console.

Once the installer has finished its work after a package install, ActiveResourceCount goes down to zero and Active returns to false. So we have to wait until that state is reached.

The ActiveResourceCount number can fluctuate anywhere from a little to a lot (especially with Service Packs) and can reach 0 several times, so you have to monitor it for some time.
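
One way to deal with a counter that hits 0 more than once is to require several idle readings in a row before moving on. The sketch below shows that idea. How you read the JMX bean over HTTP depends on your setup, so the probe takes its URL from a variable you have to set yourself. INSTALLER_MBEAN_URL, the flat JSON shape with Active and ActiveResourceCount as top-level properties, and both function names are assumptions.

```shell
#!/usr/bin/env bash
# wait_until_stable REQUIRED INTERVAL TIMEOUT COMMAND [ARGS...]
# Returns 0 once COMMAND has succeeded REQUIRED times in a row.
# A single failure resets the streak; returns 1 once TIMEOUT seconds have passed.
wait_until_stable() {
  local required=$1 interval=$2 timeout=$3
  shift 3
  local streak=0 start
  start=$(date +%s)
  while true; do
    if (( $(date +%s) - start >= timeout )); then
      return 1
    fi
    if "$@"; then
      streak=$(( streak + 1 ))
      if (( streak >= required )); then
        return 0
      fi
    else
      streak=0
    fi
    sleep "$interval"
  done
}

# Hypothetical probe: succeeds when the installer bean reports an idle state.
# Assumptions: INSTALLER_MBEAN_URL points at a JSON view of the Sling OSGi
# Installer JMX bean, and that JSON exposes Active and ActiveResourceCount
# as top-level properties.
installer_idle() {
  curl -s -S -f -m 30 -u admin:admin_password "${INSTALLER_MBEAN_URL}" \
    | jq -e '.Active == false and .ActiveResourceCount == 0' >/dev/null
}

# Example (not executed here): 3 idle readings, 10 seconds apart, within 30 minutes.
# wait_until_stable 3 10 1800 installer_idle
```

Because a failed curl call makes the probe fail too, an unreachable AEM simply resets the streak instead of being mistaken for an idle installer.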

The pauseInstallation node

The OSGi installer sometimes doesn’t clean up its state and leaves a /system/sling/installer/jcr/pauseInstallation node hanging around in your JCR repository.

This needs to be tracked, and if such a node exists, it needs to be removed from the system in order for any subsequent deploy to succeed.

If you don’t remove it, AEM will stop installing any OSGi bundle. The AEM package manager won’t notice this and won’t report any error.

This would result in a deploy which seems to be successful, but no code changes are deployed, causing serious headaches and debugging sessions for the developers trying to figure out why newly deployed code doesn’t function.

6dglobal wrote a nice article about it some time ago.

We check it as follows (Bash snippet):

COUNT_NODES=$(curl -s -S -f -m 60 -u admin:admin_password "http://localhost:4502/system/sling/installer/jcr/pauseInstallation.1.json" | jq -r 'keys|.[]' | grep -cvE '(jcr:created|jcr:createdBy|jcr:primaryType|jcr:mixinTypes)')
if [[ ${COUNT_NODES} -gt 0 ]]
then
  echo "Found $COUNT_NODES sub-nodes below /system/sling/installer/jcr/pauseInstallation, please remove them and deploy again"
  exit 1
else
  echo "No pauseInstallation nodes found"
fi
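
If you would rather clean up automatically than abort, the leftover sub-nodes can be deleted through the Sling POST servlet. The sketch below is only one way this could look: it applies the same key filter as the check above and sends an :operation=delete request per sub-node. The function names and the AEM_URL variable are hypothetical, and it assumes that deleting the sub-nodes (not the pauseInstallation node itself) is what your situation calls for.

```shell
#!/usr/bin/env bash
PAUSE_PATH="/system/sling/installer/jcr/pauseInstallation"

# Reads JSON key names (one per line) on stdin and prints only child node
# names, dropping the properties of the pauseInstallation node itself.
filter_child_nodes() {
  # grep exits 1 when nothing is left over; that is not an error here
  grep -vE '^(jcr:created|jcr:createdBy|jcr:primaryType|jcr:mixinTypes)$' || true
}

# Deletes every sub-node below pauseInstallation, one POST per node.
remove_pause_nodes() {
  local base=${AEM_URL:-http://localhost:4502} node
  curl -s -S -f -m 60 -u admin:admin_password "${base}${PAUSE_PATH}.1.json" \
    | jq -r 'keys|.[]' | filter_child_nodes \
    | while IFS= read -r node; do
        echo "Removing ${PAUSE_PATH}/${node}"
        curl -s -S -f -u admin:admin_password -X POST -F ':operation=delete' \
          "${base}${PAUSE_PATH}/${node}" >/dev/null
      done
}

# Example (not executed here):
# remove_pause_nodes
```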

Wait until all bundles and components are up and running

This one has been tricky for a long time. You can check for the state of all OSGi bundles and the state of all OSGi DS components, but this also fluctuates greatly during OSGi bundle installation. Depending on which services your bundle wires, and which OSGi configuration files you install, OSGi can reload quite a lot of other bundles in AEM.

We started out by checking the bundle state and the component state every 20 seconds, and only continued once 3 consecutive checks showed everything up and running. But in some cases this just wasn’t enough. If you schedule a few CRX package installs one after another, this leads to problems: you start OSGi bundle installation of the next CRX package too soon, and if that depends on bundles provisioned by the previous CRX package install, you might be left with bundles that just won’t start.

Last year, at the adaptTo conference in Berlin, Germany, there was a talk about the Systemready Framework. This framework (just another OSGi bundle) wires up together with Apache Felix (the OSGi runtime container shipped with AEM). It monitors the readiness of the OSGi runtime. It actually tracks the OSGi runlevels. In short, when the OSGi container starts, it has runlevel 0, then it starts the bundles at runlevel 1, and so on, until everything is up and running and it reaches its final runlevel (like 32 in AEM’s case).

This is actually a much better check as it completely removes the need to check consecutively if the bundles and components are up. If you reach the final level, everything is running. And if new bundles are added to the system, or new versions of existing bundles, the runlevel goes down and back up again until it reaches the final level again.

It also comes with a JMX bean you can monitor, and, even better, you can register a servlet where you can query the system state. It will return HTTP 200 if all is well, or 503 if the system is in a state of flux.

It is not installed in AEM 6.4, but it is included out of the box in AEM 6.5.

It’s not that hard to install and configure the bundle and the OSGi config for the servlet on AEM 6.4. I’ll show you how.

Installing Apache Felix Systemready Framework on AEM 6.4

For this to work from the initial install of AEM, you must put the OSGi bundle and the corresponding OSGi configuration files in the crx-quickstart/install folder before you start AEM (after the unpack phase).

Download org.apache.felix.systemready-0.4.0.jar and put it into the crx-quickstart/install/19 folder (create it if it doesn’t exist yet).

Create 2 OSGi config files:

  • org.apache.felix.systemready.impl.ServicesCheck.config
  • org.apache.felix.systemready.impl.servlet.SystemAliveServlet.config

and place them into the crx-quickstart/install folder.

The contents of org.apache.felix.systemready.impl.ServicesCheck.config should be:

service.pid="org.apache.felix.systemready.impl.ServicesCheck"
services.list=["org.apache.sling.launchpad.api.StartupService"]
type="ALIVE"

And the contents of org.apache.felix.systemready.impl.servlet.SystemAliveServlet.config should be:

service.pid="org.apache.felix.systemready.impl.servlet.SystemAliveServlet"
osgi.http.whiteboard.servlet.pattern="/systemalive"
osgi.http.whiteboard.context.select="(osgi.http.whiteboard.context.name\=org.osgi.service.http)"

For a month or so now, we have been using this system to check if AEM is ready for action.
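
A minimal sketch of such a readiness check in Bash could look like this. It polls the /systemalive servlet pattern configured above and waits for HTTP 200. The function names are hypothetical, and it assumes the servlet is reachable on localhost:4502 with the admin credentials used elsewhere in this post.

```shell
#!/usr/bin/env bash
# Prints the HTTP status code returned by the /systemalive servlet.
# Assumption: reachable on localhost:4502 with these credentials.
systemalive_status() {
  curl -s -o /dev/null -m 10 -w '%{http_code}' -u admin:admin_password \
    "http://localhost:4502/systemalive"
}

# wait_until_systemready TIMEOUT_SECONDS [INTERVAL_SECONDS]
# Returns 0 as soon as the servlet answers 200, 1 when the timeout is reached.
wait_until_systemready() {
  local timeout=$1 interval=${2:-5} start status
  start=$(date +%s)
  while true; do
    # a failed connection counts as "not ready" rather than aborting the wait
    status=$(systemalive_status) || status=000
    if [[ $status == 200 ]]; then
      return 0
    fi
    if (( $(date +%s) - start >= timeout )); then
      echo "AEM not ready after ${timeout}s (last status: ${status})" >&2
      return 1
    fi
    sleep "$interval"
  done
}

# Example (not executed here): wait at most 10 minutes, polling every 5 seconds.
# wait_until_systemready 600 5
```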

Note: the Apache Felix Systemready framework is already deprecated. Apache Sling donated its Sling Healthcheck framework to Apache Felix where it was renamed to the Apache Felix Healthcheck framework. The system ready components were integrated into that system of health checks. I guess we will see this coming up in AEM 6.6 or later.

Note 2: At the 2019 adaptTo conference, there was a follow-up talk on this subject. Georg Henzler from Netcentric released https://github.com/Netcentric/healthcheck-migration-kit to migrate AEM 6.4 or 6.5 to Apache Felix Healthchecks.

Installing Service Packs

Service packs look like your default CRX package, but they are a very different beast. A service pack copies a number of CRX packages into the JCR repository, then installs and starts an OSGi bundle called updater.some_name to orchestrate the entire installation process.

So, in order to install these, we do not only have to take care of everything else we learned, but due to the asynchronous nature of the install process, we also need to monitor the state of this OSGi bundle.

The bundle is started by an installation hook in the service pack CRX package and once its job is completely finished, it removes itself. This is something we learned by monitoring our service pack installations.

We had implemented the necessary AEM restarts (before and/or after, as documented by Adobe in the Service Pack release notes), and we noticed the service pack was only half installed at the time Puppet restarted the AEM service. It is vital to let the service pack installation finish before you restart (or stop) AEM. If you don’t, the service pack will install itself over and over again.

We use this Bash snippet:

# wait max 30 minutes for SP update
timeout=1800
starttime=$(date +%s)
echo "Waiting until AEM Service Pack Updater is finished ..."
# the loop ends on timeout, or as soon as no bundle named updater.* is left
until (($(date +%s) > starttime+timeout)) || ! curl -s -S -f -u admin:admin_password "http://localhost:4502/system/console/bundles.json" | jq -e '.data[] | select(.symbolicName|test("^updater\\."))' >/dev/null
do
  sleep 5
done
if (($(date +%s) > starttime+timeout))
then
  echo "AEM Service Pack Updater did not finish after ${timeout}s." >&2
  exit 1
fi
echo "AEM Service Pack Updater is finished."

Wrap-up

To wrap things up, here are the steps we take to safely install every CRX package we deploy to AEM:

  1. Wait until ‘systemready’
  2. Upload the package
  3. Restart AEM if needed before installation of the package
  4. Wait until ‘systemready’
  5. Check for the pauseInstallation node
  6. Install the package
  7. Wait until OSGi installer is idle
  8. Wait until ‘systemready’
  9. Wait until Service Pack install bundle is gone
  10. Wait until ‘systemready’
  11. Restart AEM if needed after installation of the package
  12. Wait until ‘systemready’
  13. Clean up the previous package versions

Now repeat from step 1 for each subsequent package you install.
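
The 13 steps can be sketched as a small Bash skeleton. This is only an outline of the sequencing, not our Puppet implementation: every step is a placeholder that prints its name, and all function names are hypothetical. Replace the bodies with real checks and API calls.

```shell
#!/usr/bin/env bash
# Placeholder steps: each one only prints its name.
wait_systemready()         { echo "wait_systemready"; }
upload_package()           { echo "upload_package $1"; }
restart_before_if_needed() { echo "restart_before_if_needed $1"; }
check_pause_installation() { echo "check_pause_installation"; }
install_package()          { echo "install_package $1"; }
wait_installer_idle()      { echo "wait_installer_idle"; }
wait_updater_gone()        { echo "wait_updater_gone"; }
restart_after_if_needed()  { echo "restart_after_if_needed $1"; }
cleanup_old_versions()     { echo "cleanup_old_versions $1"; }

# Runs the 13 steps for one package and stops at the first failing step.
deploy_package() {
  local pkg=$1
  wait_systemready                || return 1   # step 1
  upload_package "$pkg"           || return 1   # step 2
  restart_before_if_needed "$pkg" || return 1   # step 3
  wait_systemready                || return 1   # step 4
  check_pause_installation        || return 1   # step 5
  install_package "$pkg"          || return 1   # step 6
  wait_installer_idle             || return 1   # step 7
  wait_systemready                || return 1   # step 8
  wait_updater_gone               || return 1   # step 9
  wait_systemready                || return 1   # step 10
  restart_after_if_needed "$pkg"  || return 1   # step 11
  wait_systemready                || return 1   # step 12
  cleanup_old_versions "$pkg"     || return 1   # step 13
}

# Deploys packages in the given order; a failure stops the whole series.
deploy_all() {
  local pkg
  for pkg in "$@"; do
    deploy_package "$pkg" || return 1
  done
}

# Example (not executed here):
# deploy_all service-pack.zip acs-commons.zip my-app.ui.apps.zip
```

Stopping at the first failure matters here, because a later package may depend on bundles that the failed one was supposed to provide.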

I hope you learned a thing or two about AEM deploys. Over the past few years we learned a lot as well.

To end, a big thank you to Christian Schneider and Andrei Dulvac for building the Systemready framework. It was key to making our AEM package installs rock-stable.

If you want to chat about this, hit me up on Twitter.

Until next time!

Update 25/09/2019: I discovered the cURL documentation on the Adobe docs website. You can find it at https://helpx.adobe.com/experience-manager/6-5/sites/administering/using/curl.html.

Wim Symons
VRT Digital Products

Adobe Experience Manager lead architect, Java developer, Dev-ops fanboy. Always learning.