SOLR Integration on AEM as a Cloud Service

Tushar Bias
Sep 21, 2022


SOLR is an enterprise-grade, secure, highly scalable, fast, open-source NoSQL search platform built on Apache Lucene. Its major features include powerful full-text search, faceted search and analytics, rich document parsing, geospatial search, extensive REST-like APIs, and parallel SQL.

Web Crawler — Option 1

To index the entire website for the first time, you can use a web crawler. Several web crawlers are available, such as Nutch, jSpider, and Heritrix, and SOLR also ships with its own simple web crawler.

If real-time content updates are not needed, the web crawler can be run on a schedule to index the content via sitemap.xml.
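As a rough illustration, the Java sketch below shows how a scheduled job could read sitemap.xml and collect the page URLs to crawl and index. The sitemap URL is a placeholder, and the actual crawler and scheduler are left out.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class SitemapReader {

    // Reads sitemap.xml and returns the <loc> URLs a scheduled crawl would visit.
    public static List<String> readPageUrls(String sitemapUrl) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(new URL(sitemapUrl).openStream());

        NodeList locs = doc.getElementsByTagNameNS("*", "loc");
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < locs.getLength(); i++) {
            urls.add(locs.item(i).getTextContent().trim());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sitemap location; invoke this from a scheduler (e.g. a nightly job).
        readPageUrls("https://www.example.com/sitemap.xml").forEach(System.out::println);
    }
}
```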

Custom Application — Option 2

In this second option, you build a custom application that reads the content from the CDN (so that the content gets cached on all layers) and then passes it to SOLR for indexing. The custom application invokes an AEM servlet with a page path as a parameter, and the servlet returns the content of that page as JSON, as sketched below.
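The following sketch shows what such a servlet could look like. The servlet path, the request parameter, and the JSON fields are assumptions made for illustration, not something prescribed by AEM or this article; it also assumes Jackson is available on the classpath.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.Servlet;

import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.resource.Resource;
import org.apache.sling.api.resource.ValueMap;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.apache.sling.servlets.annotations.SlingServletPaths;
import org.osgi.service.component.annotations.Component;

// Illustrative servlet: returns basic page properties as JSON for the external
// indexing application. Path, parameter name and JSON shape are assumptions.
@Component(service = Servlet.class)
@SlingServletPaths("/bin/solr/pagecontent")
public class PageContentServlet extends SlingSafeMethodsServlet {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response)
            throws IOException {
        // Page path sent by the indexing client, e.g. /content/site/en/home
        String pagePath = request.getParameter("path");

        Resource content = request.getResourceResolver().getResource(pagePath + "/jcr:content");
        if (content == null) {
            response.sendError(SlingHttpServletResponse.SC_NOT_FOUND);
            return;
        }

        ValueMap props = content.getValueMap();
        Map<String, Object> json = new HashMap<>();
        json.put("path", pagePath);
        json.put("title", props.get("jcr:title", ""));
        json.put("description", props.get("jcr:description", ""));

        response.setContentType("application/json");
        response.setCharacterEncoding("UTF-8");
        MAPPER.writeValue(response.getWriter(), json);
    }
}
```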

Because the servlet response gets cached on the Dispatcher and the CDN, you can rely on TTL-based cache expiry to flush the cache instead of writing custom invalidation logic.
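A minimal sketch of the custom indexing application is shown below, assuming SolrJ and Jackson are on the classpath. The CDN host, SOLR URL, servlet path, and field names are placeholders; in a real setup the list of page paths would come from sitemap.xml or a queue.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SolrIndexer {

    private static final String CDN_HOST = "https://www.example.com";            // placeholder
    private static final String SOLR_URL = "http://localhost:8983/solr/website"; // placeholder

    public static void main(String[] args) throws Exception {
        ObjectMapper mapper = new ObjectMapper();
        HttpClient http = HttpClient.newHttpClient();

        try (SolrClient solr = new HttpSolrClient.Builder(SOLR_URL).build()) {
            String pagePath = "/content/site/en/home"; // would normally come from sitemap.xml

            // Fetch the page JSON through the CDN so it gets cached on every layer.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(CDN_HOST + "/bin/solr/pagecontent?path=" + pagePath))
                    .GET()
                    .build();
            String body = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
            JsonNode page = mapper.readTree(body);

            // Map the JSON fields onto a SOLR document and index it.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", page.get("path").asText());
            doc.addField("title", page.get("title").asText());
            doc.addField("description", page.get("description").asText());

            solr.add(doc);
            solr.commit();
        }
    }
}
```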

SOLR offers a couple more options to crawl the content:

Post Tool — Option 3

SOLR includes a simple command line tool, bin/post, for posting various types of content to a SOLR server. The bin/post tool is a Unix shell script; on Windows (non-Cygwin), the underlying SimplePostTool Java class has to be invoked directly.

This tool is meant for use by new users exploring SOLR’s capabilities and is not intended as a robust solution to be used for indexing documents into production systems.

For more details, refer to the Post Tool section of the SOLR Reference Guide.
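For illustration, a typical invocation (with a placeholder collection name) posts the example documents shipped with SOLR like this:

```
bin/post -c mycollection example/exampledocs/*.xml
```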

SOLR’s Web Crawler — Option 4

Use SOLR's own capabilities to crawl content with the SimplePostTool web crawler mode:

https://solr.apache.org/docs/5_4_1/solr-core/org/apache/solr/util/SimplePostTool.html

A very simple crawler that pulls URLs to fetch and recurses N levels deep if recursive > 0.

Links are parsed from the HTML by first getting an XHTML version using SolrCell with extractOnly, and they are followed only if they are local. The crawler pauses for a default delay of 10 seconds between fetches; this can be configured via the delay variable. This is only meant for test purposes, as it does not respect robots.txt.
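For example, a crawl that recurses two levels deep with a 10 second delay between fetches could be started as follows (collection name and URL are placeholders):

```
bin/post -c mycollection https://www.example.com/ -recursive 2 -delay 10
```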

Note: neither of these options (3 & 4) is recommended for a production setup.
