Elasticsearch In Production — Deployment Best Practices

Abhinay Dronavalli
Jan 5, 2018 · 11 min read
Elasticsearch is a highly optimized search engine for modern data analytics.

Elasticsearch is a real-time search and analytics engine built on Apache Lucene. It is distributed, RESTful, easy to start using, and highly available. Its use cases include powering search, transaction monitoring and error detection, content discovery, log analytics, fuzzy search, event data aggregation, and data visualization. Elasticsearch and the rest of the Elastic Stack have proven to be extremely versatile, and as the use cases above show, there are multiple ways to integrate Elasticsearch into what your product delivers today and add extra insight to it.

We use it heavily for search and analytics at Botmetric: we index about a billion documents a day and run very complex aggregations for real-time data visualization.

That said, bootstrapping an application and running and maintaining it in production are totally different things. This article draws on real-life experience and covers the basic, common items you should consider when running Elasticsearch in production.


Elasticsearch and Lucene are written in Java, which means you must keep an eye on heap space and JVM stats. The more heap available to Elasticsearch, the more memory it can use for filter and other caches to increase query performance. But note that too much heap can subject you to long garbage collection pauses. Don’t set Xmx above the cutoff that the JVM uses for compressed object pointers (compressed oops); the exact cutoff varies but is near 32 GB.

A common problem is configuring a heap that is too large. You have a 64 GB machine — and by golly, you want to give Elasticsearch all 64 GB of memory. More is better! Heap is definitely important to Elasticsearch. It is used by many in-memory data structures to provide fast operation. But with that said, there is another major user of memory that is off heap: OS file cache.

Lucene is designed to leverage the underlying OS for caching in-memory data structures. Lucene segments are stored in individual files. Because segments are immutable, these files never change. This makes them very cache friendly, and the underlying OS will happily keep hot segments resident in memory for faster access. These segments include both the inverted index (for full-text search) and doc values (for aggregations). Lucene’s performance relies on this interaction with the OS. But if you give all available memory to Elasticsearch’s heap, there won’t be any left over for the OS file cache. This can seriously impact performance. The standard recommendation is to give 50% of the available memory to the Elasticsearch heap, while leaving the other 50% free. It won’t go unused; Lucene will happily consume whatever is left over for file cache. The Elasticsearch heap can be configured in either of the following ways:

export ES_HEAP_SIZE=10g


ES_JAVA_OPTS="-Xms10g -Xmx10g" ./bin/elasticsearch
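The sizing rule above (half of the machine's memory, but never past the compressed oops cutoff) can be written down as a small helper. This is only a sketch: the function names are made up for illustration, and the 31 GB cap is an assumed safe value just under the "near 32 GB" cutoff, not a number Elasticsearch publishes.

```python
def recommended_heap_gb(total_ram_gb, compressed_oops_cap_gb=31):
    """Return a heap size in GB: half of RAM, capped below the ~32 GB cutoff.

    compressed_oops_cap_gb is an assumed safe value, not an official constant.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    half = total_ram_gb // 2  # leave the other half for the OS file cache
    return max(1, min(half, compressed_oops_cap_gb))


def heap_flags(total_ram_gb):
    """Build the -Xms/-Xmx pair, using the same value for both."""
    heap = recommended_heap_gb(total_ram_gb)
    return "-Xms{0}g -Xmx{0}g".format(heap)


print(heap_flags(20))  # -Xms10g -Xmx10g
print(heap_flags(64))  # -Xms31g -Xmx31g (capped rather than 32g)
```

The resulting string can be passed through ES_JAVA_OPTS as in the command above.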


Elasticsearch supports aggregations and filtered queries. Running complex filtered queries, intensive indexing, percolation, and queries against indices all need heavy CPU, so picking the right processor is critical. You should understand the CPU specs and how they behave with Java, as the queries run on the JVM.

Elasticsearch manages its CPU work through thread pools. Each pool runs a number of threads, which can be configured, and has a queue. Changing these settings is not recommended unless you have a very specific requirement, as Elasticsearch sizes the pools dynamically based on the available cores.

Thread pool types:

Elasticsearch has three types of thread pools:

  1. Cached: The cached thread pool is an unbounded thread pool that will spawn a thread if there are pending requests. This thread pool is used to prevent requests submitted to this pool from blocking or being rejected. Unused threads in this thread pool will be terminated after a keep alive expires (defaults to five minutes). The cached thread pool is reserved for the generic thread pool.
  2. Fixed: The fixed thread pool holds a fixed number of threads to handle requests, with a queue (optionally bounded) for pending requests that have no threads to service them. The size parameter controls the number of threads, and defaults to the number of cores times 5.
  3. Scaling: The scaling thread pool holds a dynamic number of threads. This number is proportional to the workload and varies between 1 and the value of the size parameter.
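To make the "fixed" type more concrete, here is a toy model of it: a fixed number of worker threads plus a bounded queue, where a request is rejected once the queue is full. It is not how Elasticsearch implements its pools (those live inside the JVM); the FixedPool class and its method names are invented for this illustration.

```python
import queue
import threading


class FixedPool:
    """Toy 'fixed' pool: `size` worker threads and a queue of `queue_size` slots.

    queue_size=0 means an unbounded queue, mirroring "optionally bounded".
    """

    def __init__(self, size, queue_size):
        self._tasks = queue.Queue(maxsize=queue_size)
        self.rejected = 0
        for _ in range(size):
            threading.Thread(target=self._worker, daemon=True).start()

    def _worker(self):
        while True:
            task = self._tasks.get()
            try:
                task()
            finally:
                self._tasks.task_done()

    def submit(self, task):
        """Queue a task; return False (and count a rejection) if the queue is full."""
        try:
            self._tasks.put_nowait(task)
            return True
        except queue.Full:
            self.rejected += 1
            return False

    def wait(self):
        """Block until every accepted task has finished."""
        self._tasks.join()


# Demo: one thread, two queue slots. Keep the only thread busy, then overfill the queue.
pool = FixedPool(size=1, queue_size=2)
started, gate = threading.Event(), threading.Event()


def blocker():
    started.set()
    gate.wait()


pool.submit(blocker)
started.wait(timeout=5)  # the worker is now busy and the queue is empty
accepted = [pool.submit(lambda: None) for _ in range(3)]
gate.set()
pool.wait()
print(accepted, pool.rejected)  # [True, True, False] 1
```

The rejected counter plays the role of the rejection counts a real cluster reports when a fixed pool's queue fills up.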

Elasticsearch divides the CPU use into thread pools of various types:

  • generic: for standard operations such as discovery. Thread pool type is cached.
  • index: for index/delete operations. Thread pool type is fixed.
  • search: for count/search operations. Thread pool type is fixed.
  • get: for get operations. Thread pool type is fixed.
  • bulk: for bulk operations such as bulk indexing. Thread pool type is fixed. The best bulk size depends on the cluster configuration and can be found by trying out multiple values.
  • percolate: for percolation. Thread pool type is fixed.
  • refresh: for refresh operations. Thread pool type is scaling.

Changing a specific thread pool can be done by setting its type-specific parameters.

Read more: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-threadpool.html#types

Shard size:

The shard is the unit at which Elasticsearch distributes data within the cluster. The speed at which Elasticsearch can move shards around when rebalancing data, e.g. following a failure, will depend on the size and number of shards as well as network and disk performance.

In Elasticsearch, each query is executed in a single thread per shard. Multiple shards can however be processed in parallel, as can multiple queries and aggregations against the same shard.

This means that the minimum query latency, when no caching is involved, will depend on the data, the type of query, as well as the size of the shard. Querying lots of small shards will make the processing per shard faster, but as many more tasks need to be queued up and processed in sequence, it is not necessarily going to be faster than querying a smaller number of larger shards. Having lots of small shards can also reduce the query throughput if there are multiple concurrent queries.

Each shard has data that needs to be kept in memory and uses heap space. This includes data structures holding information at the shard level and also at the segment level in order to define where data resides on disk. The size of these data structures is not fixed and will vary depending on the use case. One important characteristic of the segment-related overhead, however, is that it is not strictly proportional to the size of the segment. This means that larger segments have less overhead per data volume compared to smaller segments. The difference can be substantial.

Choosing the right number of shards is complicated because you never know how many documents you’ll get before you start. Having lots of shards can be both good and terrible for a cluster. Index and shard management can overload the master node, which might become unresponsive, leading to some strange and nasty behavior. Allocate your master nodes enough resources to cope with the cluster size.

The bad news is that the number of primary shards is immutable: it is defined when you create the index. Once an index is created, the only way to change the number of shards is to delete the index, create it again with the new setting, and reindex.
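Because the shard count has to be chosen up front, it helps to estimate it from the expected index size before creating the index. The helper below is a rough sketch of that arithmetic; the function name is hypothetical, and the 30 GB target shard size in the example is purely an illustrative assumption that you would replace with a value found by testing on your own data.

```python
import math


def primary_shards_needed(expected_index_gb, target_shard_gb):
    """Estimate how many primary shards keep each shard near the target size."""
    if expected_index_gb <= 0 or target_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return max(1, math.ceil(expected_index_gb / target_shard_gb))


# Example: a 300 GB index with an assumed 30 GB target per shard.
print(primary_shards_needed(300, 30))  # 10
print(primary_shards_needed(301, 30))  # 11
```

Rounding up keeps every shard at or below the target; rounding down would push shards above it.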


Elasticsearch supports replication: data is replicated among the data nodes so that the loss of a node does not lead to data loss. By default the number of replicas is 1, but it can be increased depending on your product requirements. The more replicas you have, the more disaster resistant your data will be. Another advantage of having more replicas is that more nodes hold a copy of each shard, which improves query performance because replicas are used for querying too.

The quorum formula used by Elasticsearch for write consistency is:

int( (primary + number_of_replicas) / 2 ) + 1
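As a quick worked example of the formula, the sketch below evaluates it for a few replica counts. It assumes the division is an integer division (the result is rounded down before adding 1); the function name is made up for illustration.

```python
def write_quorum(number_of_replicas, primary=1):
    """Shard copies that must be available: int((primary + replicas) / 2) + 1."""
    return (primary + number_of_replicas) // 2 + 1


for replicas in (1, 2, 3, 4):
    print(replicas, write_quorum(replicas))
# 1 2
# 2 2
# 3 3
# 4 3
```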

Optimising allocation

Based on product data requirements, we can classify data into hot and cold. Indices that are accessed more frequently than others can be allocated more data nodes, while less frequently accessed indices can have fewer resources allocated. This strategy is especially useful for storing time series data like application logs (e.g. ELK).

This can be achieved by running a cron job that moves the indices to different nodes at regular intervals.

A hot node is a type of data node that performs all indexing within the cluster. Hot nodes also hold the most recent indices, since these generally tend to be queried most frequently. As indexing is a CPU and I/O intensive operation, these servers need to be powerful and backed by attached SSD storage. We recommend running a minimum of 3 hot nodes for high availability. Depending on the amount of recent data you wish to collect and query, though, you may well need to increase this number to achieve your performance goals.

A warm node is a type of data node designed to handle a large number of read-only indices that are not likely to be queried frequently. As these indices are read-only, warm nodes tend to use large attached disks (usually spinning disks) instead of SSDs. As with hot nodes, we recommend a minimum of 3 warm nodes for high availability, with the same caveat that larger amounts of data may require additional nodes to meet performance requirements. Also note that CPU and memory configurations will often need to mirror those of your hot nodes. This can only be determined by testing with queries similar to what you would experience in production.

For more details on hot and warm node refer here.
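The periodic job that moves older indices off the hot nodes could look roughly like the sketch below. It only decides which daily indices are old enough and builds the settings body that would be sent to each index's _settings endpoint; the HTTP call itself is left out. Several things here are assumptions made for illustration: the logs-YYYY.MM.DD naming scheme, the 7-day hot window, and a custom node attribute called box_type that tags nodes as hot or warm.

```python
from datetime import date, datetime


def indices_to_move(index_names, today, hot_days=7, prefix="logs-"):
    """Return daily indices (prefix + YYYY.MM.DD) older than hot_days."""
    old = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # not a dated index, leave it alone
        if (today - day).days > hot_days:
            old.append(name)
    return old


def warm_settings(attribute="box_type", value="warm"):
    """Index settings body that requires shards to live on nodes tagged `value`."""
    return {"index.routing.allocation.require." + attribute: value}


names = ["logs-2018.01.01", "logs-2018.01.04", "metrics-2018.01.01", "logs-bad"]
print(indices_to_move(names, today=date(2018, 1, 5), hot_days=3))
# ['logs-2018.01.01']
print(warm_settings())
# {'index.routing.allocation.require.box_type': 'warm'}
```

Once the setting is applied to an index, Elasticsearch relocates its shards to matching nodes on its own, so the job does not need to move any data itself.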

Another strategy that you can adopt is archiving the indices to S3 and restoring them when you need data from those indices. You can read more about it from here.

Node Topology:

Elasticsearch nodes can be divided into three categories: master node, data node, and client node.

  1. Master node: A master node can be small if it is not also a data node, as it does not store any indices/shards. Its responsibility is to store the detailed cluster state and to help data and other nodes with index/shard metadata lookups. A cluster should have multiple master nodes to avoid the split brain problem.
  2. Data node: A data node is responsible for storing and querying the actual index data.
  3. Client node: A client node is used as a proxy for indexing and searching. This is highly recommended if aggregations are heavily used. These are special Elasticsearch nodes that are neither data nor master eligible. Client nodes are cluster aware and can therefore act as smart load balancers. You can send your queries to the client nodes, which then take on the expensive task of gathering the responses from each of the data nodes.

Add these settings to the elasticsearch.yml file on the respective nodes.

Master node:
node.master: true
node.data: false

Data node:
node.master: false
node.data: true

Client node:
node.master: false
node.data: false

Troubleshooting tips:

Elasticsearch performance depends heavily on the machine it is installed on. CPU, memory usage, and disk I/O are the basic operating system metrics to watch on each Elasticsearch node. When CPU usage spikes, look into the Java Virtual Machine (JVM) metrics; a typical reason for such a spike is higher garbage collection activity.

  1. Heap pressure: High memory pressure works against cluster performance in two ways. As memory pressure rises to 75% and above, less memory remains available, and your cluster now also needs to spend CPU resources to reclaim memory through garbage collection. These CPU cycles are not available for handling user requests while garbage collection is running. As a result, response times for user requests increase as the system becomes more and more resource constrained. If memory pressure continues to rise and nears 100%, a much more aggressive form of garbage collection is used, which in turn affects cluster response times dramatically. The Index Response Times metric shows that high memory pressure leads to a significant performance impact.
  2. Non-heap memory: Watch for growth in the JVM’s non-heap memory, which eats away at memory intended for the page cache and can cause kernel-level OOM-reaping.
  3. Avoid the split brain problem. Split brain is a scenario where the cluster splits up. For example, say you have a 6 node cluster. 2 nodes disconnect from the cluster, but they are still able to see each other. These 2 nodes then form another cluster and even elect a new master among themselves. We now have two clusters with the same name, one with 4 nodes and the other with 2 nodes, and each has a master node. This is what is called the split-brain issue with ES clusters. To avoid it, set the ES parameter discovery.zen.minimum_master_nodes to (number of master-eligible nodes / 2) + 1.
  4. Since Elasticsearch uses storage devices heavily, monitoring disk I/O ensures that this basic need is fulfilled. There are many possible reasons for reduced disk I/O, and it is considered a key metric for predicting many kinds of issues. It is a good metric for checking the effectiveness of indexing and query performance. Analysing read and write operations directly indicates what the system needs most in the specific use case. The operating system settings for disk I/O are the base for all other optimizations, and tuning disk I/O can avoid potential problems. If disk I/O is still not sufficient, countermeasures such as optimizing the number of shards and their size, throttling merges, replacing slow disks, moving to SSDs, or adding more nodes should be evaluated according to the circumstances causing the I/O bottlenecks.
  5. For applications that rely on search, the user experience is highly correlated with the latency of search requests. Many things can affect query performance, such as the way queries are constructed, an improperly configured Elasticsearch cluster, JVM memory and garbage collection issues, disk I/O, and so on. Query latency is the metric that directly impacts users, so make sure you put some alerts on it.
  6. Most of the filters in Elasticsearch are cached by default. That means that during the first execution of a filtered query, Elasticsearch will find the documents matching the filter and build a structure called a “bitset” from that information. The bitset records, for each document identifier, whether that document matches the filter. Subsequent executions of queries with the same filter reuse the information stored in the bitset, making query execution faster by saving I/O operations and CPU cycles. Using filters in queries is recommended. For more details refer here.
  7. Refresh time and merge time are closely related to indexing performance, plus they affect overall cluster performance. Refresh time increases with the number of file operations for the Lucene index (shard).
  8. Enable slow query logging. It helps identify which queries are slow and what can be done to improve them, and is especially useful for wildcard queries.
  9. Increase the ulimit for open files so that Elasticsearch can keep as many files open as it needs.
  10. Elasticsearch performance can suffer when the OS decides to swap out unused application memory. Disable swapping through OS-level settings, or set the following in the Elasticsearch config: bootstrap.mlockall: true
  11. Disable deleting all the indices by wildcard query. To ensure that someone does not issue a DELETE operation on all indexes (* or _all) set action.destructive_requires_name to true.
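The split brain rule from tip 3 is easy to check with a few lines of arithmetic. The sketch below assumes that all 6 nodes in that example are master eligible; the function names are invented for illustration.

```python
def minimum_master_nodes(master_eligible_nodes):
    """Quorum of master-eligible nodes: half of them (rounded down) plus one."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


def can_elect_master(visible_nodes, master_eligible_nodes):
    """True if a partition that sees `visible_nodes` is allowed to elect a master."""
    return visible_nodes >= minimum_master_nodes(master_eligible_nodes)


print(minimum_master_nodes(6))  # 4
print(can_elect_master(2, 6))   # False: the 2-node side cannot form its own cluster
print(can_elect_master(4, 6))   # True: the 4-node side keeps working
```

With the setting at 4, the two disconnected nodes in the example can no longer elect their own master, so only one cluster survives the partition.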

Before finishing off, here is a list of URLs that are useful for watching the metrics.

  • /_cluster/health?pretty: For the cluster health indicator.
  • /_status?pretty: For all information about all the indices.
  • /_nodes?pretty: For all information about the nodes.
  • /_cat/master?pretty: For the master node.
  • /_stats?pretty: For shard allocation and indices stats.
  • /_nodes/stats?pretty: For individual node stats, including JVM, HTTP, and I/O stats for the node.
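As an example of putting these endpoints to use, the sketch below scans an already-parsed /_nodes/stats response for nodes whose heap usage is at or above the 75% mark mentioned under heap pressure. Fetching the JSON is left out, and the sample data, node names, and function name are made up for illustration; only the jvm.mem.heap_used_percent field is taken from the real response format.

```python
def nodes_under_heap_pressure(nodes_stats, threshold_percent=75):
    """Return (node name, heap used %) pairs at or above the threshold, worst first."""
    flagged = []
    for node_id, node in nodes_stats.get("nodes", {}).items():
        used = node.get("jvm", {}).get("mem", {}).get("heap_used_percent")
        if used is not None and used >= threshold_percent:
            flagged.append((node.get("name", node_id), used))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Made-up sample shaped like a trimmed /_nodes/stats response.
sample = {
    "nodes": {
        "a1": {"name": "es-data-1", "jvm": {"mem": {"heap_used_percent": 78}}},
        "b2": {"name": "es-data-2", "jvm": {"mem": {"heap_used_percent": 91}}},
        "c3": {"name": "es-data-3", "jvm": {"mem": {"heap_used_percent": 42}}},
    }
}
pressured = nodes_under_heap_pressure(sample)
print(pressured)  # [('es-data-2', 91), ('es-data-1', 78)]
```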

Most system monitoring tools, such as Datadog and TICK, support aggregating Elasticsearch metrics. Using such a tool is recommended, and creating a funnel in it is strongly recommended for continuous monitoring of Elasticsearch.


Elasticsearch is a distributed full-text search and analytics engine that enables multiple tenants to search through their entire data sets, regardless of size, at unprecedented speeds. In addition to its full-text search capabilities, Elasticsearch doubles as an analytics system and distributed database. Elasticsearch has great defaults to get started, but once past the initial experimentation stage you must spend some time tweaking the settings for your needs. It is recommended that you revisit your configuration later, along with the official documentation, to ensure that your cluster is configured to meet your needs.

