Frequent Out of Memory Errors in Apache Solr

Amrit Sarkar
Oct 22, 2017

This article details the various things to check when Solr is frequently running Out-Of-Memory. The causes can be quite diverse, and here we try to list the most common ones:

  • Solr Caches — QueryResultCache, DocumentCache, FilterCache, FieldCache
  • Sorting/Faceting/Grouping on fields which are not DocValues
  • Insufficient Heap Memory allocated to Solr JVMs

Solr Caches — QueryResultCache, DocumentCache, FilterCache, FieldCache:

Refer to the official Solr cwiki page, Solr-Caches, to understand what the caches are and how to configure them in your solrconfig.xml. They seem like a very powerful tool, and they are, as long as you understand how each cache fits your current environment/infrastructure and query patterns. Otherwise they can take a big chunk of the heap memory allocated to Solr and eventually run it out of memory even when you aren't doing any heavy operations (indexing/searching/analytics) whatsoever.

Let's look at a sample cache configuration and break down the parameters that can be responsible for eating up memory:

<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>

The numbers specified for 'size', 'initialSize' and 'autowarmCount' are numbers of cache entries, not bytes. Let's say we have N documents, each document has 20 fields, and the average field value is 10 characters long. Each cached document can then easily weigh in at a kilobyte or more once Java object overhead is counted, and with 512 such entries per cache, all the caches together can take a sizeable percentage of the total heap memory allocated to Solr. 'autowarmCount', explained wonderfully on the Solr-Caches page, is the number of entries pre-loaded from the old cache every time a new searcher is opened (not for documentCache, though!). With current scenarios asking for Near Real Time search, a new searcher is opened via commits at relatively short intervals. If autowarmCount is a sizeable number (in this case 128), imagine the time each new searcher takes to open, as it keeps re-loading entries for the cache again and again, and a part of the heap is already filled up before the searcher has served a single query.

So, should we use the Solr caches? The answer is: it depends on the queries that are coming in. For a government/firm/private/low-content website, where the data to be searched is limited and there are specific queries which users hit often, you can have the Solr caches enabled. At what size? We will discuss that shortly. For an e-commerce website, where we have millions or billions of documents and the incoming queries are dynamic and different (yes, there can still be products which are searched very frequently), we don't need to cache most results/documents, so set the size to zero for every cache (or maybe a relatively small count; how small? We will discuss that next).

The rule of thumb is that the size of each cache and the autowarmCount associated with it should be set only after a thorough analysis of the queries hitting the searchers over a decent timeframe of live traffic. What should you set when you set things up for the first time? Start with zero, as in the sketch below.
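A minimal starting-point sketch for solrconfig.xml, assuming the default cache classes from a stock install; the zero sizes simply follow the advice above and are meant to be raised only after you have analysed live traffic:

<!-- Illustrative starting values only; raise size/autowarmCount after analysing live queries. -->
<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
<documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>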

Sorting/Faceting/Grouping on non-DocValues fields:

A better explanation than the official Solr cwiki page, DocValues, cannot be written. Sorting, faceting, or grouping on fields that do not have docValues forces Solr to un-invert those fields on the heap (the FieldCache), which is a classic recipe for running out of memory. So don't do it!
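As an illustration, here is a minimal sketch of how such fields might be declared in the managed-schema. The field names and types are made up for the example and assume those types exist in your schema; note that enabling docValues on an existing field requires re-indexing:

<!-- Hypothetical fields used for sorting/faceting/grouping, with docValues enabled. -->
<field name="category" type="string" indexed="true" stored="true" docValues="true"/>
<field name="price" type="plong" indexed="true" stored="true" docValues="true"/>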

Insufficient Heap Memory allocated to Solr JVMs:

In other words, it is just not enough! Suppose you have set up the Solr caches properly and are not sorting/faceting/grouping on non-DocValues fields, but you are still running out of memory now and then. You checked the other endpoints/configurations and everything seems alright. You are indexing huge batches of documents consistently and Solr is supposed to index everything smoothly. You are requesting 300 rows in the result set for a query and Solr should return them. Red flag! You allocated just-not-enough memory to the Solr nodes. By default, each Solr node is started with a small heap (512MB out of the box), and the operations above need space of their own to execute.

The rule of thumb is to give the Solr node as much memory as possible without starving the operating system. It is best to fire up a separate machine for each Solr node and not run any other application on it (yes, that is not possible every time, everywhere). If you have a machine with 16G of physical memory, don't allot the whole 16G, or even 15G or 14G, to Solr. Give the operating system some breathing room so that you don't end up with unnecessary swapping between primary and secondary memory, and remember that Lucene relies on the OS page cache to keep the index files themselves fast. At Lucidworks, we have seen 8G of heap per node do a decent job when the cluster has multiple collections, you are running light-to-medium complex queries, indexing in light-to-medium batches and have the Solr caches set up too, though an optimal number can only be reached by analysing the cluster on live traffic for a decent timeframe.
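For instance, assuming a standard bin/solr install, the heap could be raised either at startup or in solr.in.sh; the 8g figure is just the ballpark mentioned above, not a recommendation for your cluster:

bin/solr start -m 8g

# or, persistently, in solr.in.sh (solr.in.cmd on Windows):
SOLR_HEAP="8g"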

The explanations and examples above may no longer apply to future Solr releases. Please leave your suggestions, improvements and feedback in the comments. Cheers!
