Caching Static Resources

Use the AEM Dispatcher and a CDN to cache custom static resources and update them on demand

German Cely
Globant
8 min read · Aug 22, 2023


Photo by Marcin Simonides on Unsplash

Some web page resources rarely change, and fetching them from the source on every page load is expensive. This tutorial explains one way to cache static resources that are exposed as AEM web services. The aim is to let the CDN and the AEM Dispatcher return the content produced by an AEM service. This avoids requests to the AEM publish instance while still updating the content as soon as it changes.

First, a CDN cache improves page load performance because the nearest server in the network answers the request, so pages load faster when the cache hit ratio is high. CDN services are also inexpensive, because many providers offer them. The dispatcher, on the other hand, is a node that acts mainly as a cache and also as a load balancer in front of the publish instances. When the CDN does not have a resource cached, the request goes to the dispatcher, which looks for the resource in its own cache.

The idea is to cache the different resources of a page in these two layers for several reasons:

- to improve page performance;
- to add security, by using them as a filter against malicious requests;
- to avoid excessive load on the AEM publish instance;
- to reduce licensing costs in AEM as a Cloud Service, or infrastructure costs in on-premise architectures.

Custom Assets Caching

The following architecture was defined to meet a functional requirement: change the styles of a specific component, or of an entire web page, through the AEM Style System without a deployment for each change.

Figure: Architecture Diagram

Through a Content Fragment, the author defines the CSS classes that the page or a specific component will have. The page loads the style sheet from a web service, which reads the Content Fragment and returns its content to the page as a CSS file. As a result, end users see the page with the updated styles as soon as the fragment is published, without waiting for a production release.
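The following minimal sketch shows how such a service could turn the classes authored in a Content Fragment into a style sheet. It rests on a hypothetical assumption: the fragment's fields have already been read into a map from CSS class name to property/value pairs. The `CssRenderer` class and that data shape are illustrations only. They are not part of any AEM API.

```java
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: renders style rules authored in a Content Fragment
 * (modelled here as a plain map) into the text of a CSS file.
 */
final class CssRenderer {
    // Only simple class names are accepted, so an authored key cannot break out of the selector.
    private static final Pattern CLASS_NAME = Pattern.compile("[a-zA-Z_][a-zA-Z0-9_-]*");

    private CssRenderer() {}

    /** Rules are emitted in the map's iteration order; pass a LinkedHashMap to control it. */
    static String render(Map<String, Map<String, String>> rules) {
        StringBuilder css = new StringBuilder();
        for (Map.Entry<String, Map<String, String>> rule : rules.entrySet()) {
            String className = rule.getKey();
            if (!CLASS_NAME.matcher(className).matches()) {
                throw new IllegalArgumentException("Invalid CSS class name: " + className);
            }
            css.append('.').append(className).append(" {\n");
            // Declaration values are copied as-is; a real service should validate them too.
            for (Map.Entry<String, String> declaration : rule.getValue().entrySet()) {
                css.append("  ").append(declaration.getKey())
                   .append(": ").append(declaration.getValue()).append(";\n");
            }
            css.append("}\n");
        }
        return css.toString();
    }
}
```

A rule map containing `cmp-title--promo` with `color: #ff0000` renders as a single `.cmp-title--promo { color: #ff0000; }` block. The service would then return that text with a `text/css` content type.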

As a non-functional requirement, the CSS that the page loads must be cached in the dispatcher and the CDN. This is for performance, and to avoid sending many requests to the publish instance.

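To make the CSS file cacheable, the service receives its parameters as a suffix instead of as query parameters. The dispatcher does not cache requests that contain query parameters by default, and the way a CDN treats query strings depends on its configuration. For example: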

https://domain.com/urlService.html/contentFragmentPath/Variant.css

The URL is made of three parts, in order:

- the site's domain;
- the service path;
- the path of the CSS resource, which is the path of the Content Fragment including the variant.
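The service has to split that suffix back into the Content Fragment path and the variant name. The sketch below shows one way to do it. The `StyleSuffix` class is hypothetical and makes three assumptions:

- the last path segment is the variant;
- the extension is .css or .json;
- the fragment path uses the same character set the dispatcher filter allows.

It also rejects `..` sequences, so that a crafted suffix cannot escape the DAM path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical parser for suffixes such as "/content/dam/myproject/styles/home/promo.css":
 * fragment path = "/content/dam/myproject/styles/home", variant = "promo", extension = "css".
 */
final class StyleSuffix {
    // Negative lookahead rejects any ".."; the greedy first group stops at the last "/".
    private static final Pattern SUFFIX = Pattern.compile(
            "^(?!.*\\.\\.)(/content/dam/[/a-zA-Z0-9._-]+)/([a-zA-Z0-9_-]+)\\.(css|json)$");

    final String fragmentPath;
    final String variant;
    final String extension;

    private StyleSuffix(String fragmentPath, String variant, String extension) {
        this.fragmentPath = fragmentPath;
        this.variant = variant;
        this.extension = extension;
    }

    static StyleSuffix parse(String suffix) {
        Matcher m = SUFFIX.matcher(suffix);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported suffix: " + suffix);
        }
        return new StyleSuffix(m.group(1), m.group(2), m.group(3));
    }
}
```

Failing fast on an unexpected suffix keeps the service from resolving arbitrary repository paths. It also means that malformed URLs do not produce cacheable responses.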

Caching the custom content

In the AEM architecture, we have the following nodes:

Image by: Optimizing AEM Site Caches

The first cache level is the browser, the second is the CDN, and the last is the dispatcher. By default, the dispatcher denies all requests and only allows basic ones that do not expose the project to vulnerabilities. The first step, therefore, is to allow these requests in the dispatcher's /filter section by adding the following rule to the filters.any file:

/0100 {
    /type "allow"
    /method "GET"
    /path "/bin/project/servicePath"
    /extension 'html'
    /suffix '^(\/content\/dam\/)([\/a-zA-Z0-9._-]+)(\.css|\.json)$'
}
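To reason about what the rule lets through, the following simplified model evaluates it in Java. It only approximates the dispatcher, which decomposes URLs the way Sling does. The model assumes the resource path contains no dots, and it splits the URL as follows:

- everything up to the first dot is the path;
- the segment up to the next slash holds the selectors and the extension;
- the rest is the suffix.

The class name `ServiceFilter` is hypothetical.

```java
import java.util.regex.Pattern;

/** Simplified, hypothetical model of the /0100 dispatcher filter rule. */
final class ServiceFilter {
    private static final String ALLOWED_PATH = "/bin/project/servicePath";
    private static final Pattern EXTENSION = Pattern.compile("html");
    private static final Pattern SUFFIX =
            Pattern.compile("^(\\/content\\/dam\\/)([\\/a-zA-Z0-9._-]+)(\\.css|\\.json)$");

    private ServiceFilter() {}

    static boolean allows(String method, String uri) {
        if (!"GET".equals(method)) {
            return false;
        }
        int dot = uri.indexOf('.');
        if (dot < 0) {
            return false; // no extension, so this rule cannot match
        }
        String path = uri.substring(0, dot);
        int slash = uri.indexOf('/', dot);
        String selectorsAndExtension = slash < 0 ? uri.substring(dot + 1) : uri.substring(dot + 1, slash);
        String suffix = slash < 0 ? "" : uri.substring(slash);
        // The extension is what follows the last dot; anything before it would be selectors.
        String extension = selectorsAndExtension.substring(selectorsAndExtension.lastIndexOf('.') + 1);
        return ALLOWED_PATH.equals(path)
                && EXTENSION.matcher(extension).matches()
                && SUFFIX.matcher(suffix).matches();
    }
}
```

For example, a GET request for `/bin/project/servicePath.html/content/dam/myproject/styles/home/promo.css` is allowed. The same URL with POST, or with a `.js` suffix, or with a suffix outside `/content/dam/`, is not.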

The /0100 rule allows GET requests to the service path with the .html extension, as long as the suffix is a DAM path ending in .css or .json. In addition, caching rules are added to the /available_vhosts/myProject.vhost file:

<LocationMatch "^/bin/project/servicePath.html/contentFragmentRootPath/(.*)">
    Header set Cache-Control "max-age=300, s-maxage=300"
    Header set Surrogate-Control "stale-while-revalidate=3600,stale-if-error=43200"
    Header set Age 0
</LocationMatch>

The max-age directive of the Cache-Control header defines how long the browser keeps the .css file in its cache. Here it is 5 minutes (300 seconds), after which the browser requests the file from the site again. The s-maxage directive applies to shared caches such as a CDN, and keeps the file cached there for 5 minutes as well. Besides improving performance, this reduces costs, given how inexpensive CDN services are today. A CDN follows these rules just as browsers do: it reads the Cache-Control header of the response to decide how long to keep the content before refreshing it.
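The sketch below shows how the two directives differ. A private cache such as the browser uses max-age. A shared cache such as a CDN prefers s-maxage and falls back to max-age, which is the rule in the HTTP caching specification (RFC 9111). `CacheControlTtl` is a deliberately small, hypothetical parser: it ignores quoted values and directives such as no-store or private.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper that reads the freshness directives of a Cache-Control header. */
final class CacheControlTtl {
    private final Map<String, String> directives = new HashMap<>();

    CacheControlTtl(String header) {
        // Directives are comma-separated; directive names are case-insensitive.
        for (String part : header.split(",")) {
            String token = part.trim();
            if (token.isEmpty()) {
                continue;
            }
            int eq = token.indexOf('=');
            String name = (eq < 0 ? token : token.substring(0, eq)).trim().toLowerCase(Locale.ROOT);
            String value = eq < 0 ? "" : token.substring(eq + 1).trim();
            directives.put(name, value);
        }
    }

    private OptionalLong seconds(String directive) {
        String value = directives.get(directive);
        if (value == null || value.isEmpty()) {
            return OptionalLong.empty();
        }
        try {
            return OptionalLong.of(Long.parseLong(value));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** TTL for a private cache such as the browser: only max-age applies. */
    OptionalLong browserTtl() {
        return seconds("max-age");
    }

    /** TTL for a shared cache such as a CDN: s-maxage overrides max-age. */
    OptionalLong sharedTtl() {
        OptionalLong shared = seconds("s-maxage");
        return shared.isPresent() ? shared : seconds("max-age");
    }
}
```

With the header configured above, both TTLs are 300 seconds. A header such as `max-age=60, s-maxage=600` would instead let the CDN keep the file ten times longer than the browser.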

Surrogate-Control, in turn, is a header honored by Fastly, the CDN used (at the time of writing) by AEM as a Cloud Service. Fastly reads it the same way as Cache-Control to decide how long to keep its content, and if both headers are set, Surrogate-Control takes precedence. Fastly also strips Surrogate-Control before the response reaches the browser, so it only affects the CDN. This example uses it to define the stale-while-revalidate and stale-if-error parameters.
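For the edge TTL, Fastly's documentation describes this order of precedence:

1. Surrogate-Control max-age;
2. Cache-Control s-maxage;
3. Cache-Control max-age.

The minimal sketch below follows that order. It leaves out Expires and Fastly's own defaults, and `EdgeTtl` is a hypothetical name. With the headers configured above, Surrogate-Control carries no max-age, so the edge TTL falls back to s-maxage=300.

```java
import java.util.OptionalLong;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of an edge cache choosing its TTL from response headers. */
final class EdgeTtl {
    private EdgeTtl() {}

    /** Reads "name=<seconds>" from a comma-separated header; empty if absent. */
    static OptionalLong directive(String header, String name) {
        if (header == null) {
            return OptionalLong.empty();
        }
        Pattern p = Pattern.compile("(?:^|,)\\s*" + Pattern.quote(name) + "\\s*=\\s*(\\d+)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(header);
        return m.find() ? OptionalLong.of(Long.parseLong(m.group(1))) : OptionalLong.empty();
    }

    /** Surrogate-Control max-age, then Cache-Control s-maxage, then Cache-Control max-age. */
    static OptionalLong edgeTtl(String surrogateControl, String cacheControl) {
        OptionalLong ttl = directive(surrogateControl, "max-age");
        if (ttl.isPresent()) {
            return ttl;
        }
        ttl = directive(cacheControl, "s-maxage");
        if (ttl.isPresent()) {
            return ttl;
        }
        return directive(cacheControl, "max-age");
    }
}
```

Adding a max-age to Surrogate-Control would let the CDN cache the CSS longer than the browser without touching Cache-Control.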

Once the TTL expires, the CDN tries to refresh its copy, whether or not the resource has actually changed. Two parameters control what it serves in the meantime:

- stale-while-revalidate defines how long (here, 3600 seconds) the CDN may keep serving the expired copy while it fetches a fresh one from the origin in the background;
- stale-if-error defines how long (here, 43200 seconds, or 12 hours) the CDN may keep serving the cached copy when the origin returns an error during the refresh.
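The stale windows can be modeled as a small decision function that follows RFC 5861 semantics. This is a simplified sketch, not Fastly's implementation. It assumes the cache already knows whether the origin is failing, and `StaleDecision` is a hypothetical name.

```java
/** Hypothetical model of an edge cache applying stale-while-revalidate and stale-if-error. */
final class StaleDecision {
    enum Outcome { SERVE_FRESH, SERVE_STALE_AND_REVALIDATE, SERVE_STALE_ON_ERROR, GO_TO_ORIGIN }

    private StaleDecision() {}

    static Outcome decide(long ageSeconds, long ttlSeconds, long staleWhileRevalidate,
                          long staleIfError, boolean originFailing) {
        if (ageSeconds < ttlSeconds) {
            return Outcome.SERVE_FRESH; // still within the TTL
        }
        long staleFor = ageSeconds - ttlSeconds;
        if (staleFor <= staleWhileRevalidate) {
            // Serve the old copy now and refresh it in the background.
            return Outcome.SERVE_STALE_AND_REVALIDATE;
        }
        if (originFailing && staleFor <= staleIfError) {
            // The origin is down, but the copy is young enough to keep serving.
            return Outcome.SERVE_STALE_ON_ERROR;
        }
        return Outcome.GO_TO_ORIGIN; // the request has to wait for (or fail with) the origin
    }
}
```

With this article's values (TTL 300 s, stale-while-revalidate 3600 s, stale-if-error 43200 s), the outcomes are:

- a copy 1000 seconds old is served stale while being revalidated;
- a copy 10000 seconds old is served stale only while the origin is failing;
- otherwise, the request goes to the origin.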

You can check the cached content in a local environment by inspecting the dispatcher cache directory generated with this configuration:

Figure: local Dispatcher with the specific resources cached

The image above shows the content cached in the dispatcher. One folder is etc.clientlibs, which stores the client libraries (JavaScript and CSS) used by the front end. This path includes the Granite clientlibs, for example core/wcm and the foundation libraries. It also includes the project's custom clientlibs, such as clientlib-site, which is generated from the code of the ui.frontend module.

Another folder in the dispatcher cache corresponds to the project's content, and holds the cached content of the whole website:

Figure: Local Dispatcher with commons files cached

It contains folders such as dam, the site's web pages (in this case “MyProject”), and the experience fragments. This content falls into two groups:

- content that stays static over time, such as images and other DAM assets;
- high-traffic information that receives many requests and must be served quickly.

Finally, in this example there is a folder named bin. It is a custom project folder that not every project has, although readers who have implemented services in AEM will recognize this path. In this project, a series of web services was created to provide information that other applications consume:

- an email service that provides a template, which a bulk-email service then uses to build its messages;
- a service that returns a list of countries with their telephone codes;
- a service that returns a list of special words in JSON format.

All this information is easy to manage through the CMS, either with specific Content Fragment models or simply as structured text.

This particular path cannot be invalidated when its content changes. The etc.clientlibs are refreshed after every deployment, and regular content is invalidated on demand when a page or an asset is published. This path, however, is custom, so its cached content is not updated until the TTL defined in the dispatcher expires.

Manual invalidation of specific content

As mentioned above, ideally the dispatcher and the CDN update the content according to their configurations. But what happens with a group of resources in high demand? In this article's example, every page of the site requires the CSS files, so lowering the TTL is not a good option: it reduces performance and increases the load on the publish instance. Moreover, as seen in the section where the cache policy was defined, that policy applies to the whole set of CSS files.
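A rough, back-of-the-envelope model shows why lowering the TTL hurts. Under steady traffic, each cache node refreshes each resource about once per TTL, so origin requests per hour grow as 3600 / TTL. The formula and the `OriginLoad` class are simplifying assumptions, not measurements: they ignore request collapsing and uneven traffic.

```java
/**
 * Hypothetical estimate: origin refreshes per hour
 * ≈ cacheNodes × resources × 3600 / ttlSeconds (steady traffic assumed).
 */
final class OriginLoad {
    private OriginLoad() {}

    static double requestsPerHour(int cacheNodes, int resources, long ttlSeconds) {
        if (ttlSeconds <= 0) {
            throw new IllegalArgumentException("TTL must be positive");
        }
        return (double) cacheNodes * resources * 3600.0 / ttlSeconds;
    }
}
```

For example, cutting the TTL from 300 to 30 seconds raises the refresh traffic for one node and one resource from about 12 to about 120 requests per hour. That cost multiplies across every CSS file and every cache node.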

To avoid lowering the TTL for all the CSS files, the cache can be invalidated manually, for example through a workflow. This forces the update of one specific resource without changing the TTL of the others.

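DistributionRequest distributionRequest = new SimpleDistributionRequest(DistributionRequestType.INVALIDATE, false, completePath);
DistributionResponse dResponse = distributor.distribute(PUBLISH_AGENT, resolver, distributionRequest);

AEM as a Cloud Service provides the Sling Content Distribution API, whose SimpleDistributionRequest class defines the specific resource to update. Its constructor takes three arguments, from left to right:

- the request type;
- a deep flag that indicates whether the request also applies to the children of the path (false means only the given path);
- the path or paths to process.

To flush the dispatcher cache, Adobe's documentation uses the INVALIDATE type, which discards the cached copy so that the next request fetches the resource from publish again. A DELETE request, by contrast, asks the publish tier to remove (unpublish) the content itself.

The distributor service sends the request to the publish tier, which flushes the corresponding content from the dispatcher. A workflow can automate this process; for example, a workflow step can send the invalidation request whenever a Content Fragment is published:

import com.adobe.granite.workflow.WorkflowException;
import com.adobe.granite.workflow.WorkflowSession;
import com.adobe.granite.workflow.exec.WorkItem;
import com.adobe.granite.workflow.exec.WorkflowData;
import com.adobe.granite.workflow.exec.WorkflowProcess;
import com.adobe.granite.workflow.metadata.MetaDataMap;
import org.apache.sling.api.resource.ResourceResolver;
import org.apache.sling.distribution.DistributionRequest;
import org.apache.sling.distribution.DistributionRequestType;
import org.apache.sling.distribution.DistributionResponse;
import org.apache.sling.distribution.Distributor;
import org.apache.sling.distribution.SimpleDistributionRequest;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Component(
    immediate = true,
    service = WorkflowProcess.class,
    property = {"process.label=The name of my workflow"}
)
public class InvalidateCacheProcessStep implements WorkflowProcess {
    // Sling Content Distribution agent that targets the publish tier
    private static final String PUBLISH_AGENT = "publish";
    private static final Logger LOG = LoggerFactory.getLogger(InvalidateCacheProcessStep.class);
    @Reference
    private Distributor distributor;

    @Override
    public void execute(WorkItem workItem, WorkflowSession workflowSession, MetaDataMap metaDataMap) throws WorkflowException {
        // The payload is the path of the published Content Fragment
        WorkflowData wfData = workItem.getWorkflowData();
        String resourcePath = wfData.getPayload() != null ? wfData.getPayload().toString() : "";
        ResourceResolver resolver = workflowSession.adaptTo(ResourceResolver.class);
        if (resourcePath.isEmpty() || resolver == null) {
            LOG.warn("Skipping invalidation: empty payload or no resource resolver");
            return;
        }
        // Placeholder: derive the cached service path (servicePath.html + suffix) from resourcePath
        String completePath = "clean path";
        // INVALIDATE flushes the cached copy; DELETE would unpublish the content instead
        DistributionRequest distributionRequest = new SimpleDistributionRequest(DistributionRequestType.INVALIDATE, false, completePath);
        DistributionResponse dResponse = distributor.distribute(PUBLISH_AGENT, resolver, distributionRequest);
        if (!dResponse.isSuccessful()) {
            LOG.error("Error invalidating cached content for {}: {}", completePath, dResponse.getMessage());
        } else {
            LOG.info("Cache invalidated for: {}", completePath);
        }
    }
}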
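The `completePath` placeholder has to match the URL under which the dispatcher cached the CSS. The hypothetical helper below builds those paths from the workflow payload. It assumes the URL layout described earlier: servicePath.html, then the fragment path, then the variant and .css. `CachedStylePaths`, `SERVICE_PREFIX`, and the variant list are assumptions to adapt to each project. Whether the dispatcher actually flushes a given path also depends on its /invalidate rules.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical mapping from a published Content Fragment to its cached CSS URLs. */
final class CachedStylePaths {
    // Assumed service path; must match the dispatcher filter and vhost rules.
    static final String SERVICE_PREFIX = "/bin/project/servicePath.html";

    private CachedStylePaths() {}

    static List<String> forFragment(String fragmentPath, List<String> variants) {
        // Drop a trailing slash so the joined path has no "//".
        String base = fragmentPath.endsWith("/")
                ? fragmentPath.substring(0, fragmentPath.length() - 1)
                : fragmentPath;
        List<String> paths = new ArrayList<>();
        for (String variant : variants) {
            paths.add(SERVICE_PREFIX + base + "/" + variant + ".css");
        }
        return paths;
    }
}
```

Because the SimpleDistributionRequest constructor accepts several paths, the resulting list can be passed to it directly, for example as `paths.toArray(new String[0])`. One request can then invalidate every variant of the fragment, including the default `master` variation.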

Conclusions

It is important to use the CDN and the dispatcher to cache the different resources of a page, such as images, JSON files, videos, and CSS styles. This improves page performance and can reduce the cost of running the site.

In AEM as a Cloud Service, it is possible to cache specific resources, but you need a reliable way to revalidate or refresh them. Otherwise, changes to those resources may not reach users until the cache expires, because the dispatcher and the CDN have no way of knowing that the content changed.
