Guide to ElasticSearch Mappings


Written by Krupa and Sindhuri Kalyanapu

In this post, we share our experience powering search with MongoDB and the journey we took in bootstrapping search on Elasticsearch.

Our client, an ERP self-service platform, needed a searchable analytics dashboard capable of slicing and filtering data across various dimensions.

MongoDB was the core serving datastore, and the initial thought was to leverage it for search as well. Beginning with version 2.4, MongoDB supports full-text indices, which provide out-of-the-box full-text search, plus regular-expression matching via the $regex operator.
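For context, this is roughly what that looks like in the mongo shell (a minimal sketch; the users collection and field names are from the example further below):

// Create a text index on the fields to be searched (one text index per collection)
db.users.createIndex({ fullName: "text", email: "text" })

// Full-text query against the text index
db.users.find({ $text: { $search: "john" } })

// Pattern matching via $regex (anchored patterns can use a regular index)
db.users.find({ fullName: { $regex: /^jo/ } })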

This brings up the question: “Why would one switch from MongoDB to Elasticsearch (or any other search engine/library)?”

MongoDB is a popular choice among schema-less data stores. While it seemed a plausible solution for search as well, in reality the full-text search (FTS) in MongoDB is plagued by low performance and high resource usage. Combined with larger datasets, this renders MongoDB a less attractive search solution for most practical use cases, as queries can take dozens of seconds to execute. Another problem is the complete lack of customization or fuzziness in search results.

Elasticsearch is a platform-independent search engine that offers a rich feature set and met most of our business requirements. It is highly customizable and has plugin support too.

Porting data to Elasticsearch

Consider a sample users collection in MongoDB comprising the documents below.

{ "fullName": "smitha", "email": "smitha@gmail.com", "gender": "female","city":"bangalore", "login":0}{ "fullName": "john", "email": "john@gmail.com","gender": "Male","state" : "karnataka", "login":0}{"fullName": "ricky","email": "ricky@gmail.com","gender": "male", "login":0}

Before porting data, the Elasticsearch index needs to be set up.

Index Creation API

curl --location --request PUT 'http://localhost:9200/users' \
--header 'Content-Type: application/json'

The next step is to port data to ES. Mongo-connector is a handy, easy-to-use utility for syncing data from MongoDB to ES. One can also apply transformations to the data being migrated with very minimal code; a sample snippet is below. While porting each document to Elasticsearch, we keep track of the corresponding MongoDB ID for reference. For ease of lookups, the ID of a document in ES is the same as the ID of the document in MongoDB.

Sample code for Transformation

function transform(msg) {
  // Preserve the MongoDB _id as a separate field, then drop it;
  // the same value is reused as the ES document ID.
  msg.data.mongoId = msg.data._id;
  delete msg.data._id;
  return msg;
}

Caveats: mongo-connector hits a roadblock when it comes to integration with newer versions of Elasticsearch. It has integrations for Elasticsearch versions up to 6.8.1, but not 7.3.2. Hence, we had to write custom code for porting the data.
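Such porting code can be fairly small. Here is a minimal sketch (not our production code), assuming Node.js with the official mongodb and @elastic/elasticsearch clients, a local testdb.users collection, and default ports:

// Minimal MongoDB-to-Elasticsearch porting sketch.
const { MongoClient } = require('mongodb');
const { Client } = require('@elastic/elasticsearch');

async function port() {
  const mongo = await MongoClient.connect('mongodb://localhost:27017');
  const es = new Client({ node: 'http://localhost:9200' });

  const cursor = mongo.db('testdb').collection('users').find();
  for await (const doc of cursor) {
    const mongoId = doc._id.toString();
    delete doc._id;
    doc.mongoId = mongoId; // keep a reference back to the source document
    // Reuse the Mongo ID as the ES document ID, as described above
    await es.index({ index: 'users', id: mongoId, body: doc });
  }

  await mongo.close();
}

port().catch(console.error);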

One can also add documents to an ES index through APIs.

API to add data to users index

curl --location --request POST 'http://localhost:9200/users/_doc/5e60a50854d12d716e81351e' \
--header 'Content-Type: application/json' \
--data-raw '{
  "fullName": "john",
  "email": "john@gmail.com",
  "mongoId": "5e60a50854d12d716e81351e",
  "gender": "Male",
  "state": "karnataka",
  "login": 0
}'

Notice that in the above API call, the ES document ID in the URL is the same as the MongoDB ID.

The next step is to identify a common index mapping for the documents. The ERP platform, being domain agnostic, had myriad attributes that differed across customer use cases. Continuing with the User entity in the platform, each user could have fields like ‘fullName’, ‘gender’, ‘mongoId’, ‘email’ and ‘login’. A few users could have additional fields like ‘city’ and ‘state’. To accommodate attributes with varying structure, we had to come up with a well-crafted mapping that fits most of the use cases.

A static schema typically works well for structured data. There is also the option of dynamic mapping, where Elasticsearch infers the schema from the data flowing through and all fields in the document are auto-indexed.
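Whichever approach you pick, the resulting mapping can always be inspected with the standard get-mapping API:

curl --location --request GET 'http://localhost:9200/users/_mapping'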

A hybrid approach of a static schema with dynamic bindings works best for slightly unstructured data. Below is the curl request for creating a hybrid mapping for the users index, based on the example users data shown earlier.

curl --location --request PUT 'http://localhost:9200/users' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "dynamic": "true",
    "dynamic_templates": [{
      "anything": {
        "match": "*",
        "mapping": {
          "index": true,
          "type": "text"
        }
      }
    }],
    "properties": {
      "fullName": {
        "type": "text"
      },
      "email": {
        "type": "text"
      },
      "gender": {
        "type": "text"
      },
      "mongoId": {
        "type": "text",
        "index": false
      },
      "login": {
        "type": "integer",
        "index": false
      }
    }
  }
}'

With the above index mapping, any new field outside the predefined set (such as ‘city’ or ‘state’) is dynamically mapped to a text field and indexed by default.

One has to identify the fields that genuinely need to be searchable from a business standpoint. In the users’ case, fields like ‘mongoId’ and ‘login’ are not searchable.
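To verify the behaviour, a hypothetical match query against an indexed field (assuming the documents added earlier) looks like this:

curl --location --request GET 'http://localhost:9200/users/_search' \
--header 'Content-Type: application/json' \
--data-raw '{
  "query": {
    "match": { "fullName": "john" }
  }
}'

Running the same query against ‘mongoId’ or ‘login’ fails with an error along the lines of “Cannot search on field [mongoId] since it is not indexed”, which is exactly the intent of "index": false.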

Index Mapping Update

Periodic changes to the ES mapping (search fuzziness, etc.) are needed to incorporate feedback from business users. Since ES offers no in-place UPDATE for an index mapping, a seamless update with zero data loss was a challenge. The approach below works well and updates are fairly fast; the time taken to swap the data depends on the volume of data in the index.

Swapping is one approach for updating an index mapping.

1. Create a temp index. (The new mapping is applied when the users index is recreated in step 4.)

curl --location --request PUT 'http://localhost:9200/temp' \
--header 'Content-Type: application/json'

2. Use the reindex API to copy documents from the current index to the temp index.

curl --location --request POST 'http://localhost:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "users"
  },
  "dest": {
    "index": "temp"
  }
}'

3. Delete the old index.

curl --location --request DELETE 'http://localhost:9200/users'

4. Create the users index with the desired new mapping, as sketched below.
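For illustration, suppose the update adds a keyword sub-field to ‘fullName’ for exact matching (a hypothetical change; in practice, reuse the hybrid mapping from earlier with your actual modifications):

curl --location --request PUT 'http://localhost:9200/users' \
--header 'Content-Type: application/json' \
--data-raw '{
  "mappings": {
    "properties": {
      "fullName": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}'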

5. Reindex from the temp index back into the newly created users index.

curl --location --request POST 'http://localhost:9200/_reindex' \
--header 'Content-Type: application/json' \
--data-raw '{
  "source": {
    "index": "temp"
  },
  "dest": {
    "index": "users"
  }
}'

6. Delete the temp index.

curl --location --request DELETE 'http://localhost:9200/temp'
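As a sanity check after the swap, the document count of the rebuilt index can be compared against the original using the standard count API:

curl --location --request GET 'http://localhost:9200/users/_count'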

In the next post, we will cover some aspects of full-text search. Also check out our post on Elasticsearch Aggregations.
