Get to Know the Power of Elasticsearch From a Rails App Perspective

Tools for using Elasticsearch with custom index analyzers and filters

Franco Prudhomme
Wolox

--

When working with Rails applications, we tend to use different types of databases depending on the needs of each particular project. A project may be well served by a certain type of database, yet still require querying a table with potentially hundreds of thousands of entries: searching by a text field, sorting the results by proximity to a geolocation, and combining other columns of that table.

With all of these requirements, we reached the conclusion that a full-text search engine would add the relevance scoring and text analysis that those databases lack. That is why we decided to use Elasticsearch.

This post gives you the tools you’ll need to use Elasticsearch with custom index analyzers and filters, in addition to setting up the configuration for using Elasticsearch with AWS Elastic Beanstalk.

What you will need:

  1. Familiarity with the basic concepts of Elasticsearch.
  2. Elasticsearch installed and running on your computer. (For this example we use Elasticsearch v2.3.0; you can follow this installation guide.) You can validate that by opening http://localhost:9200 (or the port you specified in its configuration) and getting a response similar to this:
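For Elasticsearch 2.3, the root endpoint returns something like the JSON below (abbreviated; the node name is randomly generated and your build values will differ):

```json
{
  "name" : "your-node-name",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "2.3.0",
    "lucene_version" : "5.5.0"
  },
  "tagline" : "You Know, for Search"
}
```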

Getting Started: Rails Integration

First, we must add Elasticsearch to our Rails application. The example below is an extract from a small app where we will query the Person model.

To integrate Elasticsearch into our Rails application, simply follow the steps below:

  1. Add the elasticsearch-model gem to your Gemfile and run bundle install.
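The Gemfile entry is just the following (pin a version compatible with your Elasticsearch server if needed):

```ruby
# Gemfile
gem 'elasticsearch-model'
```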

2. Include Elasticsearch::Model in the model you want to search.

3. Add a method the model will use to query Elasticsearch.
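A sketch of steps 2 and 3 together. Elasticsearch::Model comes from the elasticsearch-model gem, and the search method shown is illustrative, so this fragment is not runnable on its own:

```ruby
# app/models/person.rb (sketch)
class Person < ActiveRecord::Base
  include Elasticsearch::Model

  # Step 3: delegate searching to Elasticsearch via our query builder
  def self.search(params)
    __elasticsearch__.search(Elasticsearch::PersonQueryBuilder.new(params).query)
  end
end
```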

Our Elasticsearch::PersonQueryBuilder will be the class in charge of building the query hash that Elasticsearch expects in order to complete the query. You can take a look at our PersonQueryBuilder with the empty methods that we are going to implement in the following sections.
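A sketch of that skeleton. The class and method names follow the post; the bodies are filled in over the next sections:

```ruby
module Elasticsearch
  # Builds the query hash that Person.search hands to Elasticsearch
  class PersonQueryBuilder
    def initialize(params)
      @params = params
    end

    # Will return the full request body: the text query plus the geo sort
    def query
    end

    private

    # Will build the clause that matches the full_name field
    def match_full_name
    end

    # Will build the geo-distance sort clause
    def sort_by_address
    end
  end
end
```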

4. Finally, define the fields of the documents to be indexed.
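One way to do this is to override as_indexed_json, which elasticsearch-model calls to serialize a record into the document body. A minimal sketch, written on a plain Struct so it runs outside Rails; the field names are illustrative:

```ruby
# Stand-in for the ActiveRecord model, so the sketch runs standalone
Person = Struct.new(:full_name, :lat, :lng) do
  # elasticsearch-model calls this when indexing the record
  def as_indexed_json(_options = {})
    {
      full_name: full_name,
      # the address field will later be mapped as a geo_point
      address: { lat: lat, lon: lng }
    }
  end
end
```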

In this next section, we will add custom mappings and analyzers to our model.

Let’s do it!

Start with the text search. There are two likely difficulties one faces when implementing text search over a field:

  1. The search input is given by users, so we cannot expect them to spell the text exactly as it was stored in the Elasticsearch database; capitalization and special characters may differ. For example, searching “Mcfly” instead of “McFly”, or “México” instead of “Mexico” (or the other way around).
  2. Users search by partial text, for example “Arnold Sch” or “nold Schwar” instead of “Arnold Schwarzenegger”.

How do we tackle this? Elasticsearch has different types of tokenizers, and the correct one for this case is the keyword tokenizer, which emits the whole field value as a single token.

To achieve this, create a custom analyzer and an explicit mapping for the address and full_name indexes. Add the following to the model:
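In raw form, the analysis settings might look like the hash below; in the model they would be passed through elasticsearch-model’s settings DSL. The filter and analyzer names follow the post:

```ruby
# Index analysis settings: a keyword tokenizer plus lowercase and
# ASCII-folding filters, so "McFly" and "México" index predictably
analysis_settings = {
  analysis: {
    filter: {
      ascii_folding: {
        type: 'asciifolding',
        preserve_original: true  # also keep the original (accented) token
      }
    },
    analyzer: {
      string_lowercase: {
        tokenizer: 'keyword',              # the whole field as one token
        filter: %w[lowercase ascii_folding]
      }
    }
  }
}
```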

This creates our string_lowercase analyzer, which uses the keyword tokenizer (the solution to difficulty 2). To it we add the lowercase filter, which normalizes the text to lowercase, and a custom filter called ascii_folding. The ASCII Folding token filter converts non-ASCII characters to their closest ASCII equivalents, with the option of preserving the original token you are indexing (the solution to difficulty 1).

Now, add to the PersonQueryBuilder the method that matches the desired text:
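A sketch of that method, extracted as a standalone function so it runs outside Rails. In the app, the normalization would use I18n.transliterate directly; here the call is guarded so the sketch falls back to plain downcasing when the i18n gem is absent:

```ruby
# Builds a regexp clause over the single keyword token of full_name
def match_full_name(params)
  term = params[:full_name]
  # In the Rails app: I18n.transliterate folds "México" to "Mexico"
  term = I18n.transliterate(term) if defined?(I18n)
  { regexp: { full_name: ".*#{term.downcase}.*" } }
end
```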

Notice the following points:

  • we normalized the text to lowercase (for difficulty 1)
  • we used I18n.transliterate to convert UTF-8 characters to ASCII (for difficulty 1)
  • we used a regex instead of matching the exact text (for difficulty 2)

Finally, to sort results by proximity to a geolocation, we need to map the address field with the type geo_point: it is not detected automatically, so the field must be mapped explicitly.
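As a raw hash, the explicit mapping might look like this; in the model it would be declared with elasticsearch-model’s mapping DSL. Note that 'string' is the Elasticsearch 2.x text type:

```ruby
person_mapping = {
  person: {
    properties: {
      full_name: { type: 'string', analyzer: 'string_lowercase' },
      address:   { type: 'geo_point' }  # not auto-detected, must be explicit
    }
  }
}
```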

This is accomplished by adding the following implementation of the sort_by_address method to the PersonQueryBuilder.
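A sketch of sort_by_address as a standalone function, assuming lat and lng arrive as strings in the search params:

```ruby
# Geo-distance sort clause: closest addresses first, distances in km
def sort_by_address(params)
  {
    _geo_distance: {
      address: { lat: params[:lat].to_f, lon: params[:lng].to_f },
      order: 'asc',
      unit: 'km'
    }
  }
end
```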

Now it’s just a matter of calling the following command in the Rails console:

Person.search({ full_name: 'Arnold Sw', lat: '34.603', lng: '58.381' })

And waiting for the query results.

Our final version of the PersonQueryBuilder class will be the following:
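Under the assumptions above, the complete builder might look like this. The I18n.transliterate call is guarded so the sketch also runs outside Rails:

```ruby
module Elasticsearch
  # Builds the full request body for Person.search
  class PersonQueryBuilder
    def initialize(params)
      @params = params
    end

    def query
      {
        query: { bool: { must: [match_full_name] } },
        sort: [sort_by_address]
      }
    end

    private

    # Regexp over the single lowercase keyword token of full_name
    def match_full_name
      { regexp: { full_name: ".*#{normalized_name}.*" } }
    end

    # Closest addresses first, distances measured in km
    def sort_by_address
      {
        _geo_distance: {
          address: { lat: @params[:lat].to_f, lon: @params[:lng].to_f },
          order: 'asc',
          unit: 'km'
        }
      }
    end

    def normalized_name
      name = @params[:full_name]
      name = I18n.transliterate(name) if defined?(I18n) # "México" -> "Mexico"
      name.downcase
    end
  end
end
```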

Indexing data to Elasticsearch

Every time you create, update, or delete a Person record, you must keep your Elasticsearch data in sync with your original database. Personally, I recommend not using Active Record callbacks, to avoid callback hell. In this case, it is not necessary to update Elasticsearch synchronously, so we can create background jobs and place them in a queue to be processed later on. The following code is a Sidekiq job for keeping both databases synchronized.
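A minimal sketch of such a job. The class name and index calls are illustrative, and the Sidekiq include is guarded so the sketch loads without the gem:

```ruby
# Keeps the Elasticsearch index in sync with the relational database
class PersonIndexerJob
  include Sidekiq::Worker if defined?(Sidekiq::Worker)

  def perform(operation, person_id)
    case operation.to_sym
    when :index
      # created or updated: (re)index the current document
      Person.find(person_id).__elasticsearch__.index_document
    when :delete
      # the record is gone, so delete by id through the client
      Person.__elasticsearch__.client.delete(
        index: Person.index_name, type: 'person', id: person_id
      )
    else
      raise ArgumentError, "Unknown operation: #{operation}"
    end
  end
end
```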

You simply call the perform method with the specific operation (:index when you are creating or editing, :delete when you are deleting that record) and the id of the record.

BONUS: Signing your request to Elasticsearch

If your Elasticsearch service is configured to only receive requests from trusted applications, you must sign the requests you send to the Elasticsearch service.

Here is the modification for signing the request when the environment is production.
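The idea is to compute an AWS Signature Version 4 for each outgoing request. As one possible approach (not necessarily the one the original post used), a sketch with the aws-sigv4 gem might look like the following; the domain URL is a placeholder:

```ruby
require 'aws-sigv4'  # gem from the AWS SDK family

signer = Aws::Sigv4::Signer.new(
  service: 'es',  # Amazon Elasticsearch Service
  region: ENV['AWS_REGION'],
  access_key_id: ENV['AWS_ACCESS_KEY_ID'],
  secret_access_key: ENV['AWS_SECRET_ACCESS_KEY']
)

# Sign a single request; the returned headers (Authorization, X-Amz-Date,
# and so on) are merged into the actual HTTP request before sending it.
signature = signer.sign_request(
  http_method: 'GET',
  url: 'https://my-es-domain.us-east-1.es.amazonaws.com/_search'
)
request_headers = signature.headers
```

In the Rails app you would only build the signer when Rails.env.production?, leaving local requests unsigned.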

DONE!

Now you have your Rails app integrated with Elasticsearch, as well as the basic knowledge to customize your own analyzers and sort by geolocation proximity. Don’t hesitate to check the official Elasticsearch documentation to read more about all of its power (remember to select the version you are using!)

If you have questions or suggestions regarding the post, please feel free to reach out!
