5 Ways to Reduce Bills and Improve Performance — Elasticsearch

Pawneet Singh
Jul 22 · 4 min read

Elasticsearch is an open-source, RESTful, distributed search engine built on Apache Lucene. It is one of the most popular search engines and is highly scalable.

Searchkick is a wrapper for Elasticsearch that makes it easy to use with Ruby on Rails and quick to get started. However, as your application scales, you may run into poor performance and large server bills. Increasing the size and number of servers seems to be the path of least resistance.

This guide shows how to reduce overheads and improve the performance of the Elasticsearch cluster behind a Rails application. The techniques are not specific to Ruby on Rails, though: a non-Rails implementation is described for each case.

1. Searchable and filterable

Sometimes you need to store fields in Elasticsearch without searching or filtering on them. By default, Searchkick makes all string fields searchable and filterable, which slows down indexing and increases index size.

Scope

  • Faster indexing
  • Smaller index size, i.e. lower storage

Rails Implementation

class Product < ApplicationRecord
  searchkick searchable: [:name], filterable: [:brand]
end

Non-Rails Implementation

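Elasticsearch provides an enabled setting, which can be applied only to the top-level mapping definition and to object fields. With enabled set to false, Elasticsearch skips parsing that content entirely but still returns it in _source. For an individual field, setting index to false in the mapping keeps the value in _source while making the field not queryable.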
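A minimal sketch of such a mapping, written as the Ruby hash you would serialize and send with the Create Index API (PUT /products). The index name and the description and metadata fields are hypothetical; only name and brand mirror the Searchkick example above.

```ruby
require "json"

# Mapping for a hypothetical "products" index: only `name` is full-text
# searchable and only `brand` is used for exact-match filtering.
PRODUCT_MAPPING = {
  mappings: {
    properties: {
      name:        { type: "text" },                   # searchable
      brand:       { type: "keyword" },                # filterable
      description: { type: "text", index: false },     # kept in _source, not queryable
      metadata:    { type: "object", enabled: false }  # not parsed at all
    }
  }
}.freeze

# JSON body for PUT /products
puts JSON.pretty_generate(PRODUCT_MAPPING)
```

Fields that are not indexed still come back with every search hit, so you can display them without paying the indexing cost.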

2. Number of nodes, shards, and replicas

If you are familiar with Elasticsearch, you have probably heard of the intricate concepts of shards and replicas. The right number of nodes depends on factors such as the amount of data, the complexity of your queries, and the number of queries per minute. You will come across various sources offering formulas for the number of nodes, shards, and replicas, but every use case is different. The best approach is to experiment with these numbers and monitor resource utilization regularly.

The downside of experimenting with shards is that changing the number of primary shards requires reindexing your data (the number of replicas, by contrast, can be changed on a live index). With a small data set you can tweak the number freely, but with a large data set reindexing can take a long time; parallel indexing helps in that case. You can read more about the optimal number of shards per node.
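The arithmetic behind these trade-offs can be sketched in a few lines. This is a rough model, not an Elasticsearch formula: each replica is a full copy of its primary, so replicas multiply both the number of shard copies and the storage. The primaries_for helper and its target_shard_gb parameter are hypothetical; pick the target from your own measurements.

```ruby
# Each primary shard is stored (1 + number_of_replicas) times in the cluster.
def total_shard_copies(primaries, replicas)
  primaries * (1 + replicas)
end

# Replicas multiply disk usage too; index_size_gb is one copy of the data.
def storage_gb(index_size_gb, replicas)
  index_size_gb * (1 + replicas)
end

# Hypothetical sizing rule: enough primaries to keep each shard under
# target_shard_gb, and never fewer than one.
def primaries_for(index_size_gb, target_shard_gb)
  [(index_size_gb.to_f / target_shard_gb).ceil, 1].max
end

primaries = primaries_for(120, 40) # => 3
puts "primaries: #{primaries}, shard copies: #{total_shard_copies(primaries, 1)}, storage: #{storage_gb(120, 1)} GB"
```

Dropping from one replica to zero halves storage in this model, which is why replicas are usually the first knob to revisit when bills grow, at the cost of redundancy.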

Scope

  • Lower CPU usage
  • Reduced storage
  • Lower memory consumption

Rails Implementation

class Product < ApplicationRecord
  searchkick settings: {number_of_shards: 1, number_of_replicas: 1}
end

Non-Rails Implementation

Shards and replicas can be defined in the index settings passed to Elasticsearch's Create Index API.
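A minimal sketch of the request bodies involved, assuming a hypothetical index named products. The first hash is the body for PUT /products; the second is a body for the Update Index Settings API (PUT /products/_settings).

```ruby
require "json"

# Body for the Create Index API (PUT /products).
# number_of_shards is fixed once the index has been created.
CREATE_BODY = {
  settings: {
    index: { number_of_shards: 1, number_of_replicas: 1 }
  }
}.freeze

# Body for PUT /products/_settings.
# number_of_replicas is a dynamic setting and can be changed on a live index.
def replica_update(count)
  { index: { number_of_replicas: count } }
end

puts JSON.generate(CREATE_BODY)
puts JSON.generate(replica_update(2))
```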

3. Control what records to index

Sometimes we do not need to index every record in Elasticsearch. By default, Searchkick indexes all records in the table, but it provides a way to choose which records to index.

Scope

  • Smaller index
  • Faster queries

Rails Implementation

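class Product < ApplicationRecord
  # Limits which records are loaded when reindexing in bulk
  scope :search_import, -> { where(active: true) }

  # Decides per record whether it belongs in the index
  def should_index?
    active # only index active records
  end
end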

Non-Rails Implementation

Without Searchkick, check each record with a conditional before sending it to the index, and skip the records that do not need to be searchable.
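A minimal sketch of that check feeding the Bulk API (POST /_bulk), which takes newline-delimited JSON: an action line, then the document for index actions. The Record struct, its fields, and the products index name are hypothetical. As a design choice, records that fail the check get a delete action so that a record which has just become inactive does not linger in the index.

```ruby
require "json"

# Hypothetical record type; in a real app these come from your data store.
Record = Struct.new(:id, :name, :active, keyword_init: true)

# Same rule as the Rails example: only active records belong in the index.
def should_index?(record)
  record.active
end

# Build an NDJSON body for the Bulk API (POST /_bulk).
def bulk_body(records, index:)
  lines = records.flat_map do |r|
    if should_index?(r)
      [{ index: { _index: index, _id: r.id } }, { name: r.name }]
    else
      [{ delete: { _index: index, _id: r.id } }]
    end
  end
  # The Bulk API requires a trailing newline after the last line.
  lines.map { |line| JSON.generate(line) }.join("\n") + "\n"
end

records = [
  Record.new(id: 1, name: "Lamp", active: true),
  Record.new(id: 2, name: "Chair", active: false)
]
puts bulk_body(records, index: "products")
```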

4. Routing

Routing can significantly improve Elasticsearch's performance. Routing determines which shard a document is stored in. By default, a search is broadcast to all shards, and the results from each shard are merged to produce the final result. If you index documents with a routing value and pass the same value when searching, Elasticsearch only needs to query the shard that holds those documents. The default routing value is the document's _id, but you can route on any other value, such as an external ID.
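The idea can be sketched with a simplified model of shard selection: hash the routing value and take it modulo the number of primary shards. Elasticsearch actually uses a Murmur3 hash (plus a routing-shards factor); Zlib.crc32 is only a stand-in here, so the shard numbers will not match a real cluster. The businesses and city IDs are hypothetical.

```ruby
require "zlib"

# Simplified shard selection: same routing value -> same shard.
# Zlib.crc32 stands in for Elasticsearch's Murmur3 hash.
def shard_for(routing, num_primary_shards)
  Zlib.crc32(routing.to_s) % num_primary_shards
end

# Hypothetical businesses routed by city_id: all businesses in one city
# land on the same shard, so a search routed by city hits a single shard.
businesses = [
  { id: 1, city_id: 42 },
  { id: 2, city_id: 42 },
  { id: 3, city_id: 7 }
]
businesses.each do |b|
  puts "business #{b[:id]} -> shard #{shard_for(b[:city_id], 5)}"
end
```

A skewed routing key (for example, one very large city) concentrates data on one shard, so choose a value that spreads documents reasonably evenly.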

Scope

  • Faster results
  • Lower search bandwidth
  • Less memory consumption

Rails Implementation

class Business < ApplicationRecord
  searchkick routing: true

  def search_routing
    city_id
  end
end

Then pass the same routing value when searching:

Business.search "ice cream", routing: params[:city_id]

Non-Rails Implementation

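Without Searchkick, pass a routing parameter when indexing a document and the same value when searching. Elasticsearch stores the value in the document's _routing metadata field.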
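A minimal sketch of the request paths, assuming you call the REST API through any HTTP client and use a hypothetical businesses index routed by city_id, as in the Rails example. The routing value goes in the query string of both the index and the search request.

```ruby
require "uri"

INDEX = "businesses" # hypothetical index name

# Index a document with a routing value: PUT /businesses/_doc/<id>?routing=<value>
def index_path(id, routing)
  "/#{INDEX}/_doc/#{id}?#{URI.encode_www_form(routing: routing)}"
end

# Search only the shard selected by the same routing value.
def search_path(routing)
  "/#{INDEX}/_search?#{URI.encode_www_form(routing: routing)}"
end

puts "PUT #{index_path(1, 42)}"
puts "GET #{search_path(42)}"
```

If a search omits the routing parameter, Elasticsearch falls back to querying every shard, so the routing value has to be passed consistently on both sides.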

5. Monitoring your resources

When an Elasticsearch cluster fails or returns 502 errors, the usual reaction is to move to a bigger instance. Instead, keep tabs on resource utilization first. AWS and almost all other providers offer a wide range of instance types: if your Elasticsearch cluster needs more CPU, a compute-optimized instance may be the better fit, while a memory-optimized instance suits a cluster that needs more memory. Monitor CPU and memory utilization to find out which case applies to you.
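A minimal sketch of telling the two cases apart from the output of GET _cat/nodes?format=json&h=name,cpu,heap.percent,ram.percent. The sample response and both thresholds are hypothetical; the _cat APIs return numbers as strings, hence the to_i calls.

```ruby
require "json"

# Hypothetical sample of GET _cat/nodes?format=json&h=name,cpu,heap.percent,ram.percent
SAMPLE = <<~NODES
  [
    {"name": "node-1", "cpu": "91", "heap.percent": "40", "ram.percent": "60"},
    {"name": "node-2", "cpu": "20", "heap.percent": "88", "ram.percent": "95"}
  ]
NODES

# Arbitrary thresholds; tune them for your workload.
CPU_LIMIT  = 80
HEAP_LIMIT = 75

# Returns { node_name => [:cpu, :memory, ...] } for nodes over a threshold.
def bottlenecks(nodes_json)
  JSON.parse(nodes_json).map do |node|
    issues = []
    issues << :cpu    if node["cpu"].to_i > CPU_LIMIT
    issues << :memory if node["heap.percent"].to_i > HEAP_LIMIT
    [node["name"], issues]
  end.to_h
end

p bottlenecks(SAMPLE)
```

The sketch uses heap.percent rather than ram.percent as the memory signal because the operating system fills spare RAM with filesystem cache, so a high ram.percent on its own is not necessarily a problem.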

Bonus: Rails Specific Performance Tweaking

1. Fast JSON generation

Faster JSON parsing and generation can significantly improve performance, since every request to and response from Elasticsearch is JSON. Oj is a fast JSON parser and generator for Ruby, and Searchkick automatically uses Oj to parse and generate JSON if the gem is included in your application.

gem 'oj'

2. Persistent HTTP Connections

Typhoeus is a Ruby HTTP client, built on libcurl, that can run requests in parallel and reuse connections instead of opening a new one for every request. Just add it to your Gemfile and Searchkick will use it automatically.

gem 'typhoeus'

Happy Coding!
