Managing your Elasticsearch indexes in Rails

Most Rails apps start life with a single relational database (usually PostgreSQL or MySQL). These are great for many purposes, but can get pretty slow when doing unstructured searches over huge datasets. When you hit this kind of problem, it’s time to reach for something like Elasticsearch or Solr.

At carwow, a few of the team already had experience using Elasticsearch, so we went with that.

One of the bigger challenges with using a specialised search tool is keeping the data in sync between your main database and the search index. There’s a great blog post called Syncing Postgres to Elasticsearch over on the GoCardless blog, which details how to make sure the right data always gets pushed into Elasticsearch.

At the end, it mentions zero-downtime reindexing, which is detailed in the Elasticsearch docs. We’ve put together a little gem which manages most of this for you. There are two main classes:

  • IndexManager: organises Elasticsearch aliases for you, so you can re-index without downtime
  • Indexer: provides helpful methods for indexing and removing individual documents or batches of documents from the index, and hides the complexity of the aliases

Here’s an example. First, create a configuration for your index:

class MyModel < ApplicationRecord
  ...
end

MyModelIndex = EsIndex.new(
  client: Elasticsearch::Client.new(...),
  mapping: {
    ...
  },
  data_source: MyModel.some_scope
) do |my_model|
  # this block transforms an instance of MyModel into the hash which goes into Elasticsearch
  {
    attr_1: my_model.attr_1,
    attr_2: my_model.attr_2,
    attr_3: my_model.attr_3
  }
end
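
To make that more concrete, here’s what a filled-in configuration might look like for a hypothetical Car model (the attribute names, mapping, and scope are made up for illustration):

CarIndex = EsIndex.new(
  client: Elasticsearch::Client.new(url: ENV['ELASTICSEARCH_URL']),
  mapping: {
    properties: {
      make:  { type: 'keyword' },
      model: { type: 'keyword' },
      price: { type: 'integer' }
    }
  },
  data_source: Car.where(published: true)
) do |car|
  # the returned hash is exactly what gets stored in Elasticsearch
  { make: car.make, model: car.model, price: car.price }
end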

Now you’ll need to work out where this model gets updated and deleted, and ensure the change is sent to the index afterwards:

indexer = EsIndex::Indexer.new(MyModelIndex)
indexer.index_record(my_model)
indexer.index_batch([my_model1, my_model2])
indexer.delete_by_id(123)
indexer.delete_by_ids([123, 456])
indexer.delete_by_query(...)
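
To give a concrete idea of where these calls might live, here’s a minimal sketch using ActiveRecord after_commit callbacks (the callback wiring is our own; in a real app you’d probably push these calls onto a background job, as the GoCardless post describes, so a slow Elasticsearch call can’t hold up the request):

class MyModel < ApplicationRecord
  # after_commit fires only once the database transaction has
  # committed, so we never index data that gets rolled back
  after_commit :update_search_index, on: [:create, :update]
  after_commit :remove_from_search_index, on: :destroy

  private

  def update_search_index
    EsIndex::Indexer.new(MyModelIndex).index_record(self)
  end

  def remove_from_search_index
    EsIndex::Indexer.new(MyModelIndex).delete_by_id(id)
  end
end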

This is how you re-index everything:

new_name = SecureRandom.hex(3)
index_manager = EsIndex::IndexManager.new(MyModelIndex)

First, we create a new index called my_model_{new_name}, and add it to the write alias, my_model_write:

index_manager.create_index(new_name)

Now, every time the Indexer makes any changes, it’ll apply them to the old index and the new index, since both are in the write alias.
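
If you want to see this for yourself, you can ask Elasticsearch what the write alias points at, using the standard get_alias API from the elasticsearch-ruby client (here client is assumed to be the same client you passed to EsIndex.new):

# both the old and the new index should show up under the write alias
client.indices.get_alias(name: 'my_model_write')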

Next, we populate the new index, which simply re-indexes every record from the configured scope:

index_manager.populate_index(new_name)

At this point, you’ll have two indexes:

  • my_model_{old_name} will still be the index for the read alias (my_model), and will still be an index for the write alias (my_model_write)
  • my_model_{new_name} will only be under the write alias

You’ll probably want to fire up your favourite Elasticsearch UI (e.g. Kibana, Kopf, Cerebro) and make sure that the new index looks about right.
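
A quick sanity check is to compare document counts between the read alias and the new index, using the standard count API from the elasticsearch-ruby client (again, client is assumed to be the one you configured earlier):

# these two numbers should be in the same ballpark
client.count(index: 'my_model')['count']
client.count(index: "my_model_#{new_name}")['count']

Once you’re happy, it’s time to switch the aliases over, so your app starts sending queries to the shiny new index: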

index_manager.switch_read_index(new_name)

If things start breaking, you can always switch back to the old one using:

index_manager.switch_read_index(old_name)

Once you’re confident everything is working fine with the new index, it’s time to clean up. These two methods stop writing to the old index (i.e., remove it from the write alias), and then delete the old index altogether:

index_manager.stop_dual_writes
index_manager.cleanup_old_indices
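
Putting the whole flow together, here’s roughly what a reindex might look like as a rake task (a sketch under our own naming; the pause for manual verification is the important part):

namespace :search do
  desc 'Zero-downtime reindex of MyModel'
  task reindex_my_model: :environment do
    new_name = SecureRandom.hex(3)
    manager = EsIndex::IndexManager.new(MyModelIndex)

    manager.create_index(new_name)   # dual writes begin here
    manager.populate_index(new_name) # backfill from the configured scope

    puts "Populated my_model_#{new_name}; verify it, then switch the read alias."
    # after verifying:
    #   manager.switch_read_index(new_name)
    #   manager.stop_dual_writes
    #   manager.cleanup_old_indices
  end
end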

Hopefully this saves you a bit of time when getting your Elasticsearch index set up!