From Zero to Hero, MultiModel & Autocompletion Search with Elasticsearch & Rails

Elastic-What?

David León Calermo
Wolox


“My name is Search… ElasticSearch”

Elasticsearch is an open-source, enterprise-grade search engine that can power extremely fast searches for all kinds of data discovery applications. With Elasticsearch we can store, search and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine for applications with search features and requirements, from simple to complex.

It’s document-oriented, which means that it stores complex entities as structured JSON documents and indexes all fields by default, which is what makes searching them fast.
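To make "indexes all fields" a bit more concrete, here is a toy sketch in plain Ruby. It is not how Elasticsearch is implemented; it only illustrates the idea of storing JSON documents and keeping a lookup from every word to the documents that contain it. The ToyIndex class and the second dojo are made up for this example.

```ruby
require 'json'

# Toy inverted index (hypothetical, for illustration only): maps each
# lowercase word to the ids of the documents that contain it.
class ToyIndex
  def initialize
    @documents = {}
    @postings = Hash.new { |hash, token| hash[token] = [] }
  end

  # Stores the document and indexes the words of every field.
  def index(id, document)
    @documents[id] = document
    document.each_value do |value|
      value.to_s.downcase.split(/\W+/).reject(&:empty?).each do |token|
        @postings[token] << id unless @postings[token].include?(id)
      end
    end
  end

  # Returns the stored documents that contain the given word.
  def search(word)
    @postings.fetch(word.downcase, []).map { |id| @documents[id] }
  end
end

index = ToyIndex.new
index.index(1, JSON.parse('{"name": "Cobra Kai", "city": "Pasadena"}'))
index.index(2, JSON.parse('{"name": "Miyagi-Do", "city": "Reseda"}'))

# No field was named in the search: every field was indexed.
puts index.search('pasadena').map { |doc| doc['name'] }.inspect
```

The lookup goes from word to document, not the other way around, which is why a search does not have to scan every record.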

ElasticSearch, are you sure?

Are you sure about this? It’s not that I don’t believe you, but if I were you I would check again whether your problem can be solved with a better-defined query.
It is usually worth asking someone else whether this is the right path, or rethinking how you are approaching the problem. As we mentioned before, Elasticsearch is an enterprise-grade search engine, so you need to consider whether your problem is worth the use of it; otherwise it will be like killing a bug with a piano.

“You need to consider whether your problem is worth the use of Elasticsearch, otherwise it will be like killing a bug with a piano.”

Ok, if you are still here, that means that we will do our best to push this forward.

First of all, you obviously need to install Elasticsearch. Here is the LINK.

Kibana, your hero in this mess

Trust me, you will need it.

I spent almost 3 weeks trying mappings, queries and indexes over and over locally, debugging to find out what was wrong.

Kibana is the development tool that lets you try requests against your local Elasticsearch server, which will help you understand how it works a lot faster. It has to be downloaded and installed separately, which you can easily do through this link: Kibana.

I want it the Rails way

All right, so the first things that we will need are these beautiful gems over here:

# Gemfile
gem 'elasticsearch-rails'
gem 'elasticsearch-model'

group :test do
  gem 'elasticsearch-extensions'
end

Elasticsearch-rails: contains features for RoR applications.
Elasticsearch-model: contains search integration for Ruby/Rails models.
Elasticsearch-extensions: this is the one we will use for testing.

Great, now that we already have those gems in our Gemfile, we need to run bundle install in the terminal.

Now that everything is finally installed, we can start on our application. In this case we have our friend Johnny, who is looking for an activity for his son and wants to search through all the possible activities nearby. At first he thought that learning a martial art could be a good option.

So the first thing we need is a concern that makes a model searchable, to be included in every model that we want to search through.

# models/concerns/searchable.rb
require "elasticsearch/model"

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model

    # Index the document after every create or update
    after_commit :index_document, if: :persisted?

    # Remove the document from the index when the record is destroyed
    after_commit on: [:destroy] do
      __elasticsearch__.delete_document
    end
  end

  private

  def index_document
    __elasticsearch__.index_document
  end
end

Here we are including Elasticsearch::Model to be able to use the gem’s methods.
The after_commit callbacks mean that every creation, update or deletion of a record will index or delete the associated document in Elasticsearch.

After that, we need to add the concern to the desired model.

# /models/karate_dojo.rb
class KarateDojo < ApplicationRecord
  include Searchable

  # we are assuming these fields are string fields
  validates :name, :city, :activity_code, :category, presence: true

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      indexes :name
      indexes :city
      indexes :activity_code
      indexes :category
    end
  end
end

Here we are including the Searchable concern in the model and validating the presence of the fields. After that we see something weird: the settings part.
That section configures the index: it defines the number of shards (for now, 1 is ok) and sets which fields we want to search through.

Now we have to think about this: we are adding the Elasticsearch integration to our Rails app, but we may already have existing data in our DB, and the Elasticsearch indexes will be empty at first. So we need an importer that we can run manually to migrate our data to the indexes.

# poros/elasticsearch_data_importer.rb
module ElasticsearchDataImporter
  def self.import
    [KarateDojo].each do |model_to_search|
      model_to_search.__elasticsearch__.create_index!(force: true)
      model_to_search.find_in_batches do |records|
        bulk_index(records, model_to_search)
      end
    end
  end

  def self.prepare_records(records)
    records.map do |record|
      {
        index: {
          _id: record.id,
          data: record.__elasticsearch__.as_indexed_json
        }
      }
    end
  end

  def self.bulk_index(records, model)
    model.__elasticsearch__.client.bulk(
      index: model.__elasticsearch__.index_name,
      type: model.__elasticsearch__.document_type,
      body: prepare_records(records)
    )
  end
end

First, we create an index for each of the models that we will search through (force: true drops the index first if it already exists), and then we import our records into Elasticsearch in batches, mainly to avoid loading all the records at once.
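If the shape of the bulk body is not obvious, this small sketch may help. It runs without Rails or Elasticsearch: the Dojo struct is a made-up stand-in for the model, its as_indexed_json plays the role of record.__elasticsearch__.as_indexed_json, and each_slice plays the role of find_in_batches (which loads 1000 records per batch by default).

```ruby
# Stand-in for an ActiveRecord model (hypothetical, only for this sketch).
Dojo = Struct.new(:id, :name, :city) do
  # Plays the role of record.__elasticsearch__.as_indexed_json
  def as_indexed_json
    { 'name' => name, 'city' => city }
  end
end

# Same shape as prepare_records in the importer: one "index" action
# per record, carrying the document id and the document itself.
def prepare_records(records)
  records.map do |record|
    { index: { _id: record.id, data: record.as_indexed_json } }
  end
end

dojos = [
  Dojo.new(1, 'Cobra Kai', 'Pasadena'),
  Dojo.new(2, 'Miyagi-Do', 'Reseda'),
  Dojo.new(3, 'Eagle Fang', 'Encino')
]

# A batch size of 2 is used here only to force more than one batch.
batches = dojos.each_slice(2).map { |batch| prepare_records(batch) }

batches.each_with_index do |body, i|
  puts "batch #{i + 1}: #{body.length} action(s)"
end
```

Each element of batches is what the importer would pass as body: to a single bulk call, so three records with a batch size of two end up as two requests.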

To run the importer you need to run this in your console.

# You need to have ElasticSearch running on port 9200 to make this 
# work.
ElasticsearchDataImporter.import

Okay, now it looks like we are almost set… not yet, we need to define the search itself.

At this point, you can create a KarateDojo.

KarateDojo.create(name: 'Cobra Kai', city: 'Pasadena', category: 'teenagers', activity_code:'karate')

And it would be ready to be searched.

# This should return how many records were found
KarateDojo.__elasticsearch__.search('Cobra Kai').results.total
# This should return the found records
KarateDojo.__elasticsearch__.search('Cobra Kai').response.hits.hits

This could be a first approach to test if all of our effort was worth it. But the thing that we really need, is a Controller.

# controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    search_result = Elasticsearch::Model.search(
      params[:query].to_s, [KarateDojo]
    ).records.records
    # :ok (200) instead of :found, which is the 302 redirect status
    render json: search_result, status: :ok
  end
end

# routes.rb
resources :search, only: [] do
  collection do
    get :index
  end
end

This is the SearchController, where we call Elasticsearch::Model.search with the query string and the model (or models) that we want to search in.

Now we are ready to try this.

This can read my mind? Nope, it’s just a well-defined search

At this moment your search will bring back results if you search for Cobra Kai, but if you search for Cebra Key it will find nothing.

So we need to upgrade our search.

# services/autocompleter.rb
class Autocompleter
  MODELS_TO_SEARCH = [KarateDojo].freeze

  attr_accessor :query

  def initialize(query)
    @query = query
  end

  def self.call(query)
    new(query).call
  end

  def call
    results.map do |result|
      {
        hint: build_hint(result),
        record_type: result.class.name,
        record_id: result.id
      }
    end
  end

  private

  def results
    Elasticsearch::Model.search(search_query, MODELS_TO_SEARCH).records
  end

  def build_hint(record)
    HintBuilder.call(record)
  end

  def search_query
    {
      "size": 50,
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "must": [multi_match]
            }
          },
          "functions": priorities
        }
      }
    }
  end

  def multi_match
    {
      "multi_match": {
        "query": @query,
        "fields": %w[name category city activity_code],
        "fuzziness": 'auto'
      }
    }
  end

  def priorities
    [
      {
        "filter": {
          "term": { "_type": 'karate_dojo' }
        },
        "weight": 5000
      }
    ]
  end
end

Here we are tuning our search so that it returns something closer to what you actually need from it.
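Since Kibana speaks JSON and not Ruby, it can help to print the body that search_query builds. The sketch below rebuilds the same hash outside of Rails (search_body is a name I made up for it) and dumps it as JSON, so you can paste it into the Kibana console under a _search request for your index. I am assuming the index is called karate_dojos, so check the real name with KarateDojo.index_name.

```ruby
require 'json'

# Rebuilds the hash returned by Autocompleter#search_query for a given
# query string. Hypothetical helper, only meant for debugging.
def search_body(query)
  {
    size: 50,
    query: {
      function_score: {
        query: {
          bool: {
            must: [
              {
                multi_match: {
                  query: query,
                  fields: %w[name category city activity_code],
                  fuzziness: 'auto'
                }
              }
            ]
          }
        },
        functions: [
          { filter: { term: { _type: 'karate_dojo' } }, weight: 5000 }
        ]
      }
    }
  }
end

puts JSON.pretty_generate(search_body('Cebra Key'))
```

Trying the body by hand first makes it much easier to tell a problem in the query from a problem in the Rails code around it.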

Also, we are building a better and more refined search.

  • The size part sets how many results will be returned.
  • The function_score query lets you modify the score of the documents retrieved by the query, giving some of them more weight (see functions below).
  • The multi_match query lets you search in multiple fields; now we can search through name, category, city, and activity_code.
  • The query part is the text that we are searching for.
  • The fuzziness option tries to solve typo issues; the auto value tolerates up to 2 edited characters per term, depending on the length of the term.
  • The functions help us define the priorities that the search will have. In this case, we give more importance to the results belonging to the KarateDojo class. Believe me, this will make more sense later.
  • The HintBuilder will be in charge of building the final response to the search.

# services/hint_builder.rb
class HintBuilder
  attr_accessor :record

  def initialize(record)
    @record = record
  end

  def self.call(record)
    new(record).call
  end

  def call
    KarateDojoResultBuilder.new(@record).autocomplete_hint
  end
end

# services/result_builder_base.rb
class ResultBuilderBase
  def initialize(record)
    @record = record
  end

  private

  # `private attr_reader :record` on one line fails before Ruby 3.0
  attr_reader :record
end

# services/karate_dojo_result_builder.rb
class KarateDojoResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.name}, #{record.city}"
  end
end

The HintBuilder builds the response with the help of the KarateDojoResultBuilder, which is based on ResultBuilderBase and returns the name and city of the record as the hint.

And you need to update your search_controller.rb to change how you are calling the search.

# controllers/search_controller.rb
def index
  search_result = Autocompleter.call(params[:query].to_s)
  # :ok (200) instead of :found, which is the 302 redirect status
  render json: search_result, status: :ok
end

After all of this work, we have a well-defined search that knows which fields it should search in and copes with misspelled words.
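If you are wondering why Cebra Key now finds Cobra Kai, this plain-Ruby sketch shows the idea behind fuzziness. It is a simplification and not Elasticsearch's own code: it counts single-character edits (Levenshtein distance) and applies the default auto thresholds, while Elasticsearch additionally counts a swap of two adjacent characters as a single edit. The method names are made up for this example.

```ruby
# Classic dynamic-programming Levenshtein distance between two strings.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [
        previous[j] + 1,       # deletion
        current[j - 1] + 1,    # insertion
        previous[j - 1] + cost # substitution (or match)
      ].min
    end
    previous = current
  end
  previous.last
end

# Edits allowed by fuzziness "auto" with its default thresholds:
# 0 for terms of 1-2 characters, 1 for 3-5 characters, 2 for longer ones.
def auto_fuzziness(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

def fuzzy_match?(typed, indexed)
  edit_distance(typed.downcase, indexed.downcase) <= auto_fuzziness(typed)
end

puts fuzzy_match?('Cebra', 'Cobra') # one substitution, within the limit
puts fuzzy_match?('Key', 'Kai')     # two substitutions, over the limit
```

So, as far as I can tell, it is the Cebra half of the query that rescues the search: Key on its own is too short to tolerate two wrong letters, but the multi_match query does not require every word to match by default.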

Why just one? We can take over the world!

“Ok, now is when all the things start to come together.”

Remember Johnny? Now he is not so sure about karate. He was talking with his son, and now they are also considering DramaClub, SoccerClub, and PhotographyWorkshop.

And that means that we need to find them a solution.

We will assume the following model structures.

# models/
class DramaClub < ApplicationRecord
  validates :title, :city, :category, :time_range, presence: true
end

class SoccerClub < ApplicationRecord
  validates :title, :city, :sponsors, :professional, presence: true
end

class PhotographyWorkshop < ApplicationRecord
  validates :name, :city, :max_students, :available_cameras,
            presence: true
end

As you can see, we have different fields here, so how are we going to search through all of these records at once?

Luckily we’ve been setting the ground for this.

To save time we will show the process of adding one model to the search; the process is the same for the rest of them.

First, we need to update the model to include the Searchable concern and its mappings.

class DramaClub < ApplicationRecord
  include Searchable

  validates :title, :city, :category, :time_range, presence: true

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      indexes :title
      indexes :city
      indexes :category
      indexes :time_range
    end
  end
end

We need to update the ElasticsearchDataImporter so that the new model is included when importing the database.

# to change this line
[KarateDojo].each do |model_to_search|
# into this one
[KarateDojo, DramaClub].each do |model_to_search|

The Autocompleter needs an update too to make this work.

MODELS_TO_SEARCH = [KarateDojo, DramaClub].freeze

Also, we need to add the fields of the new model that we want to search through.

# Here we should list all the fields that will be part of the search
"fields": %w[name category city activity_code title time_range]

And some minor changes to the HintBuilder.

# instead of directly calling the KarateDojoResultBuilder
def call
  KarateDojoResultBuilder.new(@record).autocomplete_hint
end

# we will use a method to pick the matching ResultBuilder
def call
  result_builder.autocomplete_hint
end

private

def result_builder
  "#{@record.class}ResultBuilder".constantize.new(@record)
end

# and as a consequence we will need to create another result builder
class DramaClubResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.title}, #{record.city}"
  end
end

Now we are covering all possible outcomes for the response, and you will get a response like this if you search for Pasadena.

[
  {
    hint: "Cobra Kai, Pasadena",
    record_type: "KarateDojo",
    record_id: 1
  },
  {
    hint: "Macbeth Drama Club, Pasadena",
    record_type: "DramaClub",
    record_id: 1
  }
]
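The trick of building the class name from the record can be tried without Rails. In this sketch the two Structs are stand-ins for the real models, and Object.const_get does the job that constantize does in Rails; hint_for is a name I made up for the example.

```ruby
# Plain-Ruby stand-ins for the ActiveRecord models (only for this sketch).
KarateDojo = Struct.new(:name, :city)
DramaClub = Struct.new(:title, :city)

class ResultBuilderBase
  def initialize(record)
    @record = record
  end

  private

  attr_reader :record
end

class KarateDojoResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.name}, #{record.city}"
  end
end

class DramaClubResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.title}, #{record.city}"
  end
end

# Looks up "<ModelName>ResultBuilder" and asks it for the hint.
# Object.const_get is the plain-Ruby counterpart of String#constantize.
def hint_for(record)
  builder_class = Object.const_get("#{record.class.name}ResultBuilder")
  builder_class.new(record).autocomplete_hint
end

puts hint_for(KarateDojo.new('Cobra Kai', 'Pasadena'))
puts hint_for(DramaClub.new('Macbeth Drama Club', 'Pasadena'))
```

The nice part of this convention is that adding a model only requires adding a builder class with the matching name. The flip side is that a searchable model without its builder raises a NameError at search time, so it is worth covering each one with a test.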

“But I want the search to know what I want to search for before I even start”

To do this we need to make a few changes to the Searchable file.

We are going to use nGrams. This means that it won’t index only the whole text: it will index the text and parts of the text, so a partial word can already match.

For more information about nGrams you can take a look here:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis-ngram-tokenizer.html

# models/concerns/searchable.rb

# outside the included block
ngram_filter = { type: 'nGram', min_gram: 2, max_gram: 20 }
ngram_analyzer = {
  type: 'custom',
  tokenizer: 'standard',
  filter: %w[lowercase asciifolding ngram_filter]
}
whitespace_analyzer = {
  type: 'custom',
  tokenizer: 'whitespace',
  filter: %w[lowercase asciifolding]
}

# inside the included block
settings analysis: {
  filter: {
    ngram_filter: ngram_filter
  },
  analyzer: {
    ngram_analyzer: ngram_analyzer,
    whitespace_analyzer: whitespace_analyzer
  }
}

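Here we are defining the ngram_filter, the ngram_analyzer and the whitespace_analyzer:

  • ngram_filter: defines the minimum and maximum number of consecutive characters in each block (gram) that will be indexed.
  • ngram_analyzer: the analyzer used when indexing. It splits the text with the standard tokenizer, lowercases it, folds accented characters to ASCII and then applies the ngram_filter.
  • whitespace_analyzer: the analyzer used when searching. It breaks text into terms whenever it encounters a whitespace character, then lowercases them and folds accented characters to ASCII.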
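To get a feeling for what the ngram_filter produces, here is a small plain-Ruby sketch. It is my own approximation of the idea rather than the filter's actual implementation: for one token it lists every run of characters whose length is between min_gram and max_gram.

```ruby
# Returns every substring of the token whose length is between
# min_gram and max_gram (hypothetical helper, for illustration only).
def ngrams(token, min_gram: 2, max_gram: 20)
  grams = []
  (0...token.length).each do |start|
    (min_gram..max_gram).each do |size|
      # Stop growing the gram once it would run past the end of the token
      break if start + size > token.length

      grams << token[start, size]
    end
  end
  grams
end

indexed = ngrams('cobra')
puts indexed.inspect

# The search side is not split into grams, so a partial word typed by
# the user only has to equal one of the indexed grams.
puts indexed.include?('cob')
```

This is also why the search_analyzer is the plain whitespace one. If the typed text were cut into grams as well, a query like cob would also match any word containing co or ob, which is far more than what an autocomplete should return.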

And we need to use those analyzers in each model that you want to search through:

# models/karate_dojo.rb
settings index: { number_of_shards: 1 } do
  mappings dynamic: 'false' do
    indexes :name, type: 'text', analyzer: 'ngram_analyzer',
                   search_analyzer: 'whitespace_analyzer'
    indexes :city, type: 'text', analyzer: 'ngram_analyzer',
                   search_analyzer: 'whitespace_analyzer'
    indexes :activity_code, type: 'text',
                            analyzer: 'ngram_analyzer',
                            search_analyzer: 'whitespace_analyzer'
    indexes :category, type: 'text', analyzer: 'ngram_analyzer',
                       search_analyzer: 'whitespace_analyzer'
  end
end

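This adds the new analyzers to the mapping, so each field is indexed with the ngram_analyzer and queried with the whitespace_analyzer. Since the mapping changed, run the importer again so that the indexes are recreated with it.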

How should I test this?

Finally! Johnny will have his search through all of the activities in Pasadena! 🎉

But wait, we still need to write tests for this. We were so close.

Remember the gem 'elasticsearch-extensions'? This is the one that will help us test our code.

First of all, we are going to work on our spec_helper to start and stop Elasticsearch when needed.

# spec/support/spec_helper.rb
# This will allow us to use the following methods
require 'elasticsearch/extensions/test/cluster'

RSpec.configure do |config|
  # It will start ES unless it's already running
  config.before :all, elasticsearch: true do
    Elasticsearch::Extensions::Test::Cluster.start unless
      Elasticsearch::Extensions::Test::Cluster.running?(port: 9200)
  end

  # It will create and refresh all the indexes before each test.
  config.before :each, elasticsearch: true do
    ActiveRecord::Base.descendants.each do |model|
      next unless model.respond_to?(:__elasticsearch__)

      begin
        model.__elasticsearch__.create_index!
        model.__elasticsearch__.refresh_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        puts "There was an error creating the elasticsearch index " \
             "for #{model.name}: #{e.inspect}"
      end
    end
  end

  # After all the tests are done it will stop the ES instance if it's
  # still running
  config.after :suite do
    Elasticsearch::Extensions::Test::Cluster.stop if
      Elasticsearch::Extensions::Test::Cluster.running?(port: 9200)
  end

  # It will delete the indexes after each test.
  config.after :each, elasticsearch: true do
    ActiveRecord::Base.descendants.each do |model|
      next unless model.respond_to?(:__elasticsearch__)

      begin
        model.__elasticsearch__.delete_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        puts "There was an error removing the elasticsearch index " \
             "for #{model.name}: #{e.inspect}"
      end
    end
  end
end

Ok, now Elasticsearch will start and stop when necessary, and the indexes will be created and deleted too.
You might be asking how ES knows which port it should run on and how many nodes it should have.

It doesn’t; we need to define it.

We do that with the help of the .env file.

# .env
TEST_CLUSTER_PORT=9200
TEST_CLUSTER_NODES=1
TEST_CLUSTER_COMMAND=/path/to/elasticsearch/file
TEST_CLUSTER_NAME=my_testing_cluster
ELASTICSEARCH_URL=http://localhost:9200

The TEST_CLUSTER environment variables will be consumed by Elasticsearch::Extensions.
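A forgotten variable tends to show up as a confusing failure in the middle of the suite, so it can be handy to check them up front. This is only a sketch of that idea: test_cluster_options is a helper I made up, the keys of the hash it returns are my own naming, and nothing here is part of the gem.

```ruby
# Reads the TEST_CLUSTER_* variables and fails fast when the path to
# the Elasticsearch executable is missing (hypothetical helper).
def test_cluster_options(env = ENV)
  {
    port: Integer(env.fetch('TEST_CLUSTER_PORT', '9200')),
    number_of_nodes: Integer(env.fetch('TEST_CLUSTER_NODES', '1')),
    # No sensible default exists for the executable: raise KeyError
    command: env.fetch('TEST_CLUSTER_COMMAND'),
    cluster_name: env.fetch('TEST_CLUSTER_NAME', 'my_testing_cluster')
  }
end

# A plain hash stands in for ENV here, using the values from the .env file.
example_env = {
  'TEST_CLUSTER_PORT' => '9200',
  'TEST_CLUSTER_NODES' => '1',
  'TEST_CLUSTER_COMMAND' => '/path/to/elasticsearch/file',
  'TEST_CLUSTER_NAME' => 'my_testing_cluster'
}

puts test_cluster_options(example_env).inspect
```

Calling it with no arguments at the top of the spec_helper would read the real ENV and raise a KeyError naming the missing variable.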

And remember that every existing test that creates one of the Searchable models needs to have this tag.

# this will enable elasticsearch for this test.
describe SearchController, elasticsearch: true do
  ...
end

CircleCI keeps failing, what am I doing wrong?

“What? Why does this keep failing?”

Now you are thinking: how am I supposed to run Elasticsearch in CircleCI?

This is the final part, and then we are ready.

You need to add these changes to the CircleCI config file.

# .circleci/config.yml
steps:
  - run:
      name: Starting Elasticsearch
      command: |
        wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz
        tar -xzf elasticsearch-6.3.0.tar.gz
      background: true
      sleep: 5

This will download a version of Elasticsearch so that it is ready to run on CircleCI.

And now the only thing left is to set the TEST_CLUSTER_COMMAND environment variable in CircleCI as well.

It commonly starts with /home/circleci/repo/ followed by the path to the Elasticsearch executable, like /home/circleci/repo/elasticsearch-6.3.0/bin/elasticsearch

CircleCI Environment Variables Config

The NODES can be 3 for now, and the PORT is usually between 9200 and 9300.

Now CircleCI is ready.

Finally!

We did it!! We were able to create a good search app where Johnny will be able to find a sport/activity for his son, searching through different fields and models.

It was hell for me to figure out most of the things needed to make this work, so I hope that this “little” guide makes things easier for you.

“I’ve found a better way”

Oh please, I would love to hear (read) about it. Any kind of comment is more than welcome; just send me an email at leon.calermo@wolox.com.ar.

Also, all of this work is based on this repo, which is waiting to receive pull requests and comments about how it can be improved: Github Repo.

Helpful links and resources

The helpful pages that helped me to survive this full dive into ElasticSearch.
