From Zero to Hero, MultiModel & Autocompletion Search with Elasticsearch & Rails
Elastic-What?
“My name is Search… ElasticSearch”
Elasticsearch is an open-source, enterprise-grade search engine that can power extremely fast searches for data discovery applications. With Elasticsearch we can store, search, and analyze large volumes of data quickly and in near real time. It is generally used as the underlying engine for applications with simple or complex search features and requirements.
It’s document-oriented, which means that it stores complex entities as structured JSON documents and indexes all fields by default, which keeps searches fast.
ElasticSearch, are you sure?
Are you sure about this? It’s not that I don’t believe you, but if I were you I would check again to see if your problem can’t be solved with a better-defined query.
More often than not, it is worth asking someone else whether this is the right path, or rethinking how you are approaching the problem. As we mentioned before, Elasticsearch is an enterprise-grade search engine, so you need to consider whether your problem is worth it; otherwise it will be like killing a bug with a piano.
“You need to consider if your problem is worth the use of Elasticsearch, otherwise it will be like killing a bug with a piano.”
Ok, if you are still here that means that we will do our best to pull this forward.
First of all, you obviously need to install Elasticsearch. Here is the link.
Kibana, your hero in this mess
Trust me, you will need it.
I spent almost 3 weeks trying mappings, queries, and indexes over and over locally, debugging to find out what was wrong.
Kibana is the developer tool that lets you try requests against your local Elasticsearch server, which will help you understand how it works a lot faster. You need to download it, and you can easily download and install it through this link: Kibana.
I want it Rails way
All right, so now the first things that we will need are these beautiful gems over here:
# Gemfile
gem 'elasticsearch-rails'
gem 'elasticsearch-model'

group :test do
  gem 'elasticsearch-extensions'
end
Elasticsearch-rails: contains features for RoR applications.
Elasticsearch-model: contains search integration for Ruby/Rails models.
Elasticsearch-extensions: this is the one we will use for testing.
Great, now that we have those gems in our Gemfile, we need to run bundle install in the terminal.
Now that everything is finally installed, we are going to start our application. For this case we have our friend Johnny, who is looking for an activity for his son and wants to search through all the possible activities nearby. At first, he thought that learning a martial art could be a good option.
So the first thing we need is a concern that makes a model searchable, to be included in every model we want to search through.
# models/concerns/searchable.rb
require "elasticsearch/model"

module Searchable
  extend ActiveSupport::Concern

  included do
    include Elasticsearch::Model
    after_commit :index_document, if: :persisted?
    after_commit on: [:destroy] do
      __elasticsearch__.delete_document
    end
  end

  private

  def index_document
    __elasticsearch__.index_document
  end
end
Here we are including Elasticsearch::Model to be able to use the gem's methods.
The after_commit callbacks mean that every time a record is created, updated, or deleted, the associated document in Elasticsearch is indexed or deleted.
After that, we need to add the concern to the desired model.
# /models/karate_dojo.rb
class KarateDojo < ApplicationRecord
  include Searchable

  # we are assuming these fields are string fields
  validates :name, :city, :activity_code, :category, presence: true

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      indexes :name
      indexes :city
      indexes :activity_code
      indexes :category
    end
  end
end
Here we are including the Searchable concern in the model and validating the presence of the fields. After that, we see something weird: the settings part.
That section configures the index: it defines the number of shards (for now, 1 is ok) and declares which fields we want to search through.
Now we have to think about this: we are adding the Elasticsearch integration to our Rails app, but we may already have existing data in our DB, and the Elasticsearch indexes will be empty at first. So we need an importer that we can run manually to migrate our data to the indexes.
# poros/elasticsearch_data_importer.rb
module ElasticsearchDataImporter
  def self.import
    [KarateDojo].each do |model_to_search|
      model_to_search.__elasticsearch__.create_index!(force: true)
      model_to_search.find_in_batches do |records|
        bulk_index(records, model_to_search)
      end
    end
  end

  def self.prepare_records(records)
    records.map do |record|
      {
        index: {
          _id: record.id,
          data: record.__elasticsearch__.as_indexed_json
        }
      }
    end
  end

  def self.bulk_index(records, model)
    model.__elasticsearch__.client.bulk(
      index: model.__elasticsearch__.index_name,
      type: model.__elasticsearch__.document_type,
      body: prepare_records(records)
    )
  end
end
First, we create an index for each of the models that we will search through, and then we import our records into Elasticsearch in batches, mainly to avoid loading all the records at once.
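To see what the importer actually sends, here is a minimal plain-Ruby sketch of the batching and payload-building steps, without Rails or Elasticsearch. FakeDojo, bulk_bodies, and the extra sample dojos are made up for illustration; the real importer relies on find_in_batches (default batch size 1000) and as_indexed_json.

```ruby
# Stand-in for an ActiveRecord model (hypothetical); the real importer calls
# record.__elasticsearch__.as_indexed_json instead of to_h.
FakeDojo = Struct.new(:id, :name, :city, keyword_init: true)

# Builds the body of one bulk request: one "index" action per record.
def prepare_records(records)
  records.map do |record|
    { index: { _id: record.id, data: record.to_h.reject { |key, _| key == :id } } }
  end
end

# Splits the records into batches, similar to what find_in_batches does
# for a database table.
def bulk_bodies(records, batch_size:)
  records.each_slice(batch_size).map { |batch| prepare_records(batch) }
end

# Made-up sample data.
dojos = [
  FakeDojo.new(id: 1, name: 'Cobra Kai', city: 'Pasadena'),
  FakeDojo.new(id: 2, name: 'Miyagi-Do', city: 'Reseda'),
  FakeDojo.new(id: 3, name: 'Eagle Fang', city: 'Reseda')
]

bodies = bulk_bodies(dojos, batch_size: 2)
puts bodies.length        # number of bulk requests that would be sent
puts bodies.first.inspect # payload of the first request
```

With three records and a batch size of 2, two bulk requests would be built: one with two actions and one with the remaining record.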
To run the importer you need to run this in your console.
# You need to have ElasticSearch running on port 9200 to make this
# work.
ElasticsearchDataImporter.import
Okay, now it looks like we are almost set… not yet, we need to define the search itself.
At this point, you can create a KarateDojo.
KarateDojo.create(name: 'Cobra Kai', city: 'Pasadena', category: 'teenagers', activity_code:'karate')
And it would be ready to be searched.
# This should return how many records were found
KarateDojo.__elasticsearch__.search('Cobra Kai').results.total

# This should return the found records
KarateDojo.__elasticsearch__.search('Cobra Kai').response.hits.hits
This could be a first approach to test if all of our effort was worth it. But the thing that we really need, is a Controller.
# controllers/search_controller.rb
class SearchController < ApplicationController
  def index
    search_result = Elasticsearch::Model.search(
      params[:query].to_s, [KarateDojo]
    ).records.records
    # 200 OK; :found would send a 302 redirect status
    render json: search_result, status: :ok
  end
end

# routes.rb
resources :search, only: [] do
  collection do
    get :index
  end
end
This is the SearchController, where we use the method Elasticsearch::Model.search and pass it the query string and the model (or models) that we want to search in.
Now we are ready to try this.
This can read my mind? Nope, it’s just a well-defined search
At this moment your search will bring results if you search for Cobra Kai, but if you search for Cebra Key it will fail.
So we need to upgrade our search.
# services/autocompleter.rb
class Autocompleter
  MODELS_TO_SEARCH = [KarateDojo].freeze

  attr_accessor :query

  def initialize(query)
    @query = query
  end

  def self.call(query)
    new(query).call
  end

  def call
    results.map do |result|
      {
        hint: build_hint(result),
        record_type: result.class.name,
        record_id: result.id
      }
    end
  end

  private

  def results
    Elasticsearch::Model.search(search_query, MODELS_TO_SEARCH).records
  end

  def build_hint(record)
    HintBuilder.call(record)
  end

  def search_query
    {
      "size": 50,
      "query": {
        "function_score": {
          "query": {
            "bool": {
              "must": [multi_match]
            }
          },
          "functions": priorities
        }
      }
    }
  end

  def multi_match
    {
      "multi_match": {
        "query": @query,
        "fields": %w[name category city activity_code],
        "fuzziness": 'auto'
      }
    }
  end

  def priorities
    [
      {
        "filter": {
          "term": { "_type": 'karate_dojo' }
        },
        "weight": 5000
      }
    ]
  end
end
Here we are tuning our search so that it returns only what you actually need from it: a hint, the record type, and the record id.
Also, we are building a better and more refined query:
- The size part sets how many results will be returned.
- The function_score query allows you to modify the score of the documents retrieved by the query, giving them some weight (also mentioned down below) that is used to rank them.
- The multi_match query allows you to search across multiple fields; now we are able to search through name, category, city, and activity_code.
- The query part is the text that we are searching for.
- The fuzziness option tries to solve typo issues; the auto value tolerates up to 2 edited characters per term, depending on the length of the term.
- The functions help us define the priorities that the search will have; in this case, we give more importance to the results belonging to the KarateDojo class. Believe me, this will make more sense later.
- The HintBuilder will be in charge of building the final response to the search.
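To get a feeling for what fuzziness does with Cebra Key, here is a rough plain-Ruby sketch. It is a simplification under stated assumptions: the real matching happens inside Elasticsearch and also counts a swap of two adjacent characters as a single edit, while this sketch uses plain Levenshtein distance. The helper names edit_distance, allowed_edits, and fuzzy_match? are made up for illustration. The thresholds follow the documented behavior of the auto setting: terms of 0 to 2 characters must match exactly, terms of 3 to 5 characters allow 1 edit, and longer terms allow 2.

```ruby
# Plain Levenshtein distance: the number of single-character insertions,
# deletions, or substitutions needed to turn `a` into `b`.
def edit_distance(a, b)
  previous = (0..b.length).to_a
  a.each_char.with_index(1) do |char_a, i|
    current = [i]
    b.each_char.with_index(1) do |char_b, j|
      cost = char_a == char_b ? 0 : 1
      current << [previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost].min
    end
    previous = current
  end
  previous.last
end

# Edits allowed by fuzziness "auto" for a term of the given length.
def allowed_edits(term)
  case term.length
  when 0..2 then 0
  when 3..5 then 1
  else 2
  end
end

# True when the typed term is close enough to the indexed term.
def fuzzy_match?(typed, indexed)
  edit_distance(typed, indexed) <= allowed_edits(typed)
end

puts fuzzy_match?('cebra', 'cobra') # one substitution, within the limit
puts fuzzy_match?('key', 'kai')     # two substitutions, over the limit
```

In this sketch only cebra is close enough to cobra; key is too far from kai for a 3-character term. That is still enough to find Cobra Kai, because multi_match by default only needs one of the typed terms to match.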
# services/hint_builder.rb
class HintBuilder
  attr_accessor :record

  def initialize(record)
    @record = record
  end

  def self.call(record)
    new(record).call
  end

  def call
    KarateDojoResultBuilder.new(@record).autocomplete_hint
  end
end

# services/result_builder_base.rb
class ResultBuilderBase
  def initialize(record)
    @record = record
  end

  private

  attr_reader :record
end

# services/karate_dojo_result_builder.rb
class KarateDojoResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.name}, #{record.city}"
  end
end
The HintBuilder builds the response with the help of the KarateDojoResultBuilder, which is based on the ResultBuilderBase and returns the name and city as the body of the record.
You also need to update your search_controller.rb to change how you are calling the search method.
# controllers/search_controller.rb
def index
  search_result = Autocompleter.call(params[:query].to_s)
  # 200 OK; :found would send a 302 redirect status
  render json: search_result, status: :ok
end
After all of this work, we have a well-defined search that knows which fields it should search in and tolerates misspelled words.
Why just one? We can take over the world!
“Ok, now is when all the things start to come together.”
Remember Johnny? Now he is not so sure about karate. He was talking with his son and now they are also considering DramaClub, SoccerClub, and PhotographyWorkshop.
And that means that we need to find them a solution.
We will assume the following model structures.
# models/
class DramaClub < ApplicationRecord
  validates :title, :city, :category, :time_range, presence: true
end

class SoccerClub < ApplicationRecord
  validates :title, :city, :sponsors, :professional, presence: true
end

class PhotographyWorkshop < ApplicationRecord
  validates :name, :city, :max_students, :available_cameras,
            presence: true
end
As you can see, we have different fields here, so how are we going to search through all of these records at once?
Luckily we’ve been laying the groundwork for this.
To save time we will show the process of adding one model to the search; the rest of them follow the same process.
First, we need to update the model we want to add.
class DramaClub < ApplicationRecord
  include Searchable

  validates :title, :city, :category, :time_range, presence: true

  settings index: { number_of_shards: 1 } do
    mappings dynamic: 'false' do
      indexes :title
      indexes :city
      indexes :category
      indexes :time_range
    end
  end
end
We need to update the ElasticsearchDataImporter so that the new model is included when importing the database.
# to change this line
[KarateDojo].each do |model_to_search|

# into this one
[KarateDojo, DramaClub].each do |model_to_search|
The Autocompleter needs an update too to make this work.
MODELS_TO_SEARCH = [KarateDojo, DramaClub].freeze
Also, we need to add the fields of the new model that we want to search through.
# Here we should list all the fields that will be part of the search
"fields": %w[name category city activity_code title time_range]
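The priorities method could be extended in the same spirit, with one filter per model. The following is only a sketch under assumptions: the weights are hypothetical values picked for illustration, type_name is a made-up helper standing in for ActiveSupport's underscore, and it assumes each model's documents carry a _type equal to the underscored class name, as the original karate_dojo filter does.

```ruby
# Hypothetical weights: a higher number pushes that model's results up.
MODEL_WEIGHTS = { 'KarateDojo' => 5000, 'DramaClub' => 3000 }.freeze

# Converts "KarateDojo" into "karate_dojo" without ActiveSupport.
def type_name(class_name)
  class_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Builds the "functions" array for function_score, one filter per model.
def priorities(weights = MODEL_WEIGHTS)
  weights.map do |class_name, weight|
    { filter: { term: { _type: type_name(class_name) } }, weight: weight }
  end
end

puts priorities.inspect
```

Keeping the weights in one hash means that adding a model to the search only requires adding one entry there.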
And some minor changes to the HintBuilder.
# instead of directly calling KarateDojo
def call
  KarateDojoResultBuilder.new(@record).autocomplete_hint
end

# we will use a method to call each one of the ResultBuilders
def call
  result_builder.autocomplete_hint
end

private

def result_builder
  "#{@record.class}ResultBuilder".constantize.new(@record)
end

# and as a consequence we will need to create another result builder
class DramaClubResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.title}, #{record.city}"
  end
end
Now we are covering all possible outcomes for the response, and you will get a response like this if you search for Pasadena.
[
{
hint: "Cobra Kai, Pasadena",
record_type: "KarateDojo",
record_id: 1
},
{
hint: "Macbeth Drama Club, Pasadena",
record_type: "DramaClub",
record_id: 1
}
]
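The class-name lookup that produces those hints can be tried out in plain Ruby. This is a minimal sketch, not the app's code: the two Struct definitions are stand-ins for the ActiveRecord models, hint_for is a made-up helper, and Object.const_get is used as the plain-Ruby counterpart of Rails' constantize.

```ruby
# Stand-ins for the ActiveRecord models (illustration only).
KarateDojo = Struct.new(:name, :city)
DramaClub  = Struct.new(:title, :city)

class ResultBuilderBase
  def initialize(record)
    @record = record
  end

  private

  attr_reader :record
end

class KarateDojoResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.name}, #{record.city}"
  end
end

class DramaClubResultBuilder < ResultBuilderBase
  def autocomplete_hint
    "#{record.title}, #{record.city}"
  end
end

# Looks up "<RecordClass>ResultBuilder" by name and asks it for the hint.
def hint_for(record)
  Object.const_get("#{record.class}ResultBuilder").new(record).autocomplete_hint
end

puts hint_for(KarateDojo.new('Cobra Kai', 'Pasadena'))
puts hint_for(DramaClub.new('Macbeth Drama Club', 'Pasadena'))
```

The trade-off of this convention is that every searchable model must have a matching ResultBuilder class; a model without one would raise a NameError at lookup time.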
“But I want the search to know what I want to search for before I even start”
To do this we need to make a few changes in the Searchable file.
We are going to use nGrams. This means that it won't search only for the whole text; it will also match parts of the text.
For more information about nGrams you can take a look here:
https://www.elastic.co/guide/en/elasticsearch/reference/6.4/analysis-ngram-tokenizer.html
# models/concerns/searchable.rb

# outside the included block
ngram_filter = { type: 'nGram', min_gram: 2, max_gram: 20 }
ngram_analyzer = {
  type: 'custom',
  tokenizer: 'standard',
  filter: %w[lowercase asciifolding ngram_filter]
}
whitespace_analyzer = {
  type: 'custom',
  tokenizer: 'whitespace',
  filter: %w[lowercase asciifolding]
}

# inside the included block
settings analysis: {
  filter: {
    ngram_filter: ngram_filter
  },
  analyzer: {
    ngram_analyzer: ngram_analyzer,
    whitespace_analyzer: whitespace_analyzer
  }
}
Here we are defining the ngram_filter, the ngram_analyzer, and the whitespace_analyzer:
- ngram_filter: defines the minimum and maximum number of consecutive characters in each block used to perform the search.
- ngram_analyzer: defines how the stream of characters is tokenized before the ngram_filter is applied.
- whitespace_analyzer: breaks text into terms whenever it encounters a whitespace character.
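To make the idea concrete, here is a small plain-Ruby approximation of what these analyzers produce. It is a sketch under assumptions, not what Elasticsearch runs internally: the standard tokenizer is approximated by splitting on non-word characters, asciifolding is left out, and the helper names ngrams, index_terms, and search_terms are made up for illustration.

```ruby
# All substrings of `token` between min and max characters long, which is
# roughly what an nGram token filter emits for a single token.
def ngrams(token, min: 2, max: 20)
  (min..[max, token.length].min).flat_map do |size|
    token.chars.each_cons(size).map(&:join)
  end
end

# Index side: split into words, lowercase, then expand each word into ngrams.
def index_terms(text)
  text.downcase.split(/\W+/).reject(&:empty?).flat_map { |token| ngrams(token) }.uniq
end

# Search side: split on whitespace and lowercase, with no ngram expansion.
def search_terms(text)
  text.downcase.split
end

indexed = index_terms('Cobra Kai')
puts indexed.inspect
puts search_terms('cob').all? { |term| indexed.include?(term) } # partial word is found
puts search_terms('x').all? { |term| indexed.include?(term) }   # shorter than min_gram
```

Because the pieces are generated at index time, the search text itself is only lowercased and split, which is why a different analyzer is used on the search side.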
And we need to use those analyzers in each model that you want to search through:
# models/karate_dojo.rb
settings index: { number_of_shards: 1 } do
  mappings dynamic: 'false' do
    indexes :name, type: 'text', analyzer: 'ngram_analyzer',
                   search_analyzer: 'whitespace_analyzer'
    indexes :city, type: 'text', analyzer: 'ngram_analyzer',
                   search_analyzer: 'whitespace_analyzer'
    indexes :activity_code, type: 'text',
                            analyzer: 'ngram_analyzer',
                            search_analyzer: 'whitespace_analyzer'
    indexes :category, type: 'text', analyzer: 'ngram_analyzer',
                       search_analyzer: 'whitespace_analyzer'
  end
end
This adds the new analyzers to the mapping: each field is indexed with the ngram_analyzer and searched with the whitespace_analyzer.
How should I test this?
Finally! Johnny will have his search through all of the activities in Pasadena! 🎉
But wait, we still need to write tests for this; we were so close.
Remember the gem 'elasticsearch-extensions'? This is the one that will help us test our code.
First of all, we are going to work on our spec_helper to start and stop Elasticsearch when needed.
# spec/support/spec_helper.rb
# This will allow us to use the following methods
require 'elasticsearch/extensions/test/cluster'

# It will start ES unless it's already running
config.before :all, elasticsearch: true do
  Elasticsearch::Extensions::Test::Cluster.start() unless
    Elasticsearch::Extensions::Test::Cluster.running?(port: 9200)
end

# It will create all the indexes and load the documents if they exist.
config.before :each, elasticsearch: true do
  ActiveRecord::Base.descendants.each do |model|
    if model.respond_to?(:__elasticsearch__)
      begin
        model.__elasticsearch__.create_index!
        model.__elasticsearch__.refresh_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        puts "There was an error creating the elasticsearch index for #{model.name}: #{e.inspect}"
      end
    end
  end
end

# After all the tests are done it will stop the ES instance if it's
# still running
config.after :suite do
  Elasticsearch::Extensions::Test::Cluster.stop if
    Elasticsearch::Extensions::Test::Cluster.running?(port: 9200)
end

# It will delete the indexes after each test.
config.after :each, elasticsearch: true do
  ActiveRecord::Base.descendants.each do |model|
    if model.respond_to?(:__elasticsearch__)
      begin
        model.__elasticsearch__.delete_index!
      rescue Elasticsearch::Transport::Transport::Errors::NotFound => e
        puts "There was an error removing the elasticsearch index for #{model.name}: #{e.inspect}"
      end
    end
  end
end
Ok, now Elasticsearch will start and stop when necessary, and the indexes will be created and deleted too.
You might be asking how ES knows which port it should run on and how many instances it should have.
It doesn’t; we need to define it, with the help of the .env file:
# .env
TEST_CLUSTER_PORT=9200
TEST_CLUSTER_NODES=1
TEST_CLUSTER_COMMAND=/path/to/elasticsearch/file
TEST_CLUSTER_NAME=my_testing_cluster
ELASTICSEARCH_URL=http://localhost:9200
The TEST_CLUSTER environment variables will be consumed by Elasticsearch::Extensions.
And remember that every existing test that creates one of the Searchable models needs to have this tag:
# this will enable elasticsearch for this test.
describe SearchController, elasticsearch: true do
  ...
end
CircleCI keeps failing, what am I doing wrong?
“What? Why does this keep failing?”
Now you are thinking: how am I supposed to run Elasticsearch in CircleCI?
This is our final part, and then we are done.
You need to add these changes to the CircleCI config file.
# .circleci/config.yml
steps:
  - run:
      name: Starting Elasticsearch
      command: |
        wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.3.0.tar.gz
        tar -xzf elasticsearch-6.3.0.tar.gz
      background: true
      sleep: 5
This will download a version of Elasticsearch so that it is ready to run on CircleCI.
And now the only thing left is to update the TEST_CLUSTER_COMMAND environment variable, but in CircleCI.
It commonly starts with /home/circleci/repo/ followed by the path to the Elasticsearch file, like /home/circleci/repo/elasticsearch-6.3.0/bin/elasticsearch
The NODES can be 3 for now, and the PORT is usually between 9200 and 9300.
Now CircleCI is ready.
Finally!
We did it!! We were able to create a good search app where Johnny will be able to find a sport/activity for his son, searching through different fields and models.
It was hell for me to figure out most of the things needed to make this work, so I hope that this “little” guide makes things easier for you.
“I’ve found a better way”
Oh please, I would love to hear (read) about it. Any kind of comment is more than welcome; just send me an email at leon.calermo@wolox.com.ar.
Also, all of this work is based on this repo, which is waiting to receive pull requests and comments about how it can be improved: Github Repo.
Helpful links and resources
The helpful pages that helped me to survive this full dive into ElasticSearch.