Elasticsearch is an open-source, RESTful, distributed search and analytics engine built on Apache Lucene. Since its release in 2010, Elasticsearch has quickly become one of the most popular search engines and is commonly used for log analytics, full-text search, security intelligence, business analytics, and operational intelligence use cases. A few months back, I got a chance to try Elasticsearch when we were implementing search functionality in our platform. These are the insights I gained while working on it.
1. Difference between Elasticsearch and Relational Databases
Elasticsearch is a powerful open-source, distributed, RESTful search engine, which implies that it is not intended to be, and should not be, used as a primary database. Given a search query, it is expected to return results in roughly under a second. To work that way, it is designed and functions very differently from primary relational databases like PostgreSQL and MySQL. That is why designing an Elasticsearch structure calls for a different thought process than designing a relational database structure.
A few key differences in the thought process:
- Relational databases like PostgreSQL and MySQL favor ‘Normalization’, while Elasticsearch favors ‘Denormalization’.
- This is related to the point above. Relational databases offer inner and outer joins so that data stays consistent and tables stay uncluttered. Elasticsearch provides ‘joins’ features through nested fields and parent/child relationships, but they can make search queries slow, which defeats the purpose of using Elasticsearch in the first place.
- Elasticsearch supports One-to-One and One-to-Many relationships but it does not support Many-to-Many.
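To make the ‘joins’ point a bit more concrete, here is a minimal sketch of what the nested approach looks like next to a denormalized one, with the request bodies written as Python dictionaries. The `nested` field type and the `nested` query are the Elasticsearch features mentioned above; the document layout and the field names (`title`, `authors`, `author_names`) are hypothetical and only for illustration.

```python
import json

# Hypothetical mapping: each book document embeds its authors as nested objects.
book_mapping = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},
            "authors": {
                "type": "nested",
                "properties": {
                    "name": {"type": "text"},
                    "country": {"type": "keyword"},
                },
            },
        }
    }
}

# Searching inside nested objects requires wrapping the clause in a `nested` query.
nested_query = {
    "query": {
        "nested": {
            "path": "authors",
            "query": {"match": {"authors.name": "tolkien"}},
        }
    }
}

# Denormalized alternative: author names are copied into a plain text field
# (hypothetical `author_names`), so an ordinary match query is enough.
flat_query = {"query": {"match": {"author_names": "tolkien"}}}

print(json.dumps(nested_query, indent=2))
print(json.dumps(flat_query, indent=2))
```

The flat version is simpler to query, at the cost of keeping the copied author names up to date.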
2. Add Only ‘Searchable’ Fields to Elasticsearch
If the primary database has 10 fields and search functionality is needed on only 5 of them, then add only those 5 fields to Elasticsearch. This helps keep the index size small.
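As a rough sketch of this idea, the snippet below trims a 10-field database row down to the 5 fields that search needs before it is sent to Elasticsearch. The row, the field names, and the `to_search_document` helper are all hypothetical; the actual indexing call is left out on purpose.

```python
# Hypothetical list of the 5 fields that search actually needs
# (the id is kept so a hit can be mapped back to the primary database).
SEARCHABLE_FIELDS = ("id", "title", "author_name", "description", "tags")


def to_search_document(row: dict) -> dict:
    """Keep only the searchable fields of a primary-database row."""
    return {field: row[field] for field in SEARCHABLE_FIELDS if field in row}


# Hypothetical row with 10 fields as it might come from the primary database.
book_row = {
    "id": 42,
    "title": "A Sample Book",
    "author_name": "Jane Doe",
    "description": "A short description used for full-text search.",
    "tags": ["sample", "demo"],
    "price": 19.99,
    "stock_count": 7,
    "warehouse_code": "W-3",
    "created_at": "2020-01-01T00:00:00Z",
    "updated_at": "2020-06-01T00:00:00Z",
}

search_doc = to_search_document(book_row)
print(sorted(search_doc))
```

Fields such as `price` or `stock_count` stay in the primary database and never reach the index.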
3. Many-to-Many Relationships in Elasticsearch
Elasticsearch does not support Many-to-Many relationships. There is an easy way to implement them, but it can be cumbersome if not designed correctly. To handle Many-to-Many, we need to convert it to the One-to-Many form, which Elasticsearch supports. The solution is Duplication! Duplication! Duplication!
This can be transformed into one-to-many as below:
Users are duplicated to transform the problem into one-to-many. This implies that if there is an update for ‘User 1’, then all the duplicated rows related to ‘user-book #’ need to be updated with the latest info too.
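The duplication described above can be sketched in a few lines of Python. The sample users, books, and the `doc_id` naming scheme are hypothetical; the point is only to show one document per user-book pair, with the user info copied into each one.

```python
# Hypothetical sample data: two users, two books, and the many-to-many links.
users = {
    1: {"user_name": "User 1"},
    2: {"user_name": "User 2"},
}
books = {
    101: {"book_title": "Book 1"},
    102: {"book_title": "Book 2"},
}
user_books = [(1, 101), (1, 102), (2, 101)]  # (user_id, book_id) pairs


def build_user_book_docs(users, books, user_books):
    """Create one flat document per user-book pair, duplicating the user info."""
    docs = []
    for user_id, book_id in user_books:
        docs.append({
            "doc_id": f"user-book-{user_id}-{book_id}",
            "user_id": user_id,
            **users[user_id],
            "book_id": book_id,
            **books[book_id],
        })
    return docs


def docs_affected_by_user(docs, user_id):
    """List the documents that must be rewritten when this user changes."""
    return [doc["doc_id"] for doc in docs if doc["user_id"] == user_id]


docs = build_user_book_docs(users, books, user_books)
print(docs_affected_by_user(docs, 1))
```

Here a change to ‘User 1’ touches two documents, one for each book that user is linked to.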
A few tricks to minimize the updates:
- Duplicate the resource that is less likely to be updated. For example, on most platforms, ‘User’ info is less likely to change on a regular basis than the other dependent fields (in this case, books).
- This is where #2 from above is very important too. If we have a restricted scope of fields for search functionality, then the number of updates in Elasticsearch can be reduced significantly.
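One way to picture the second trick is a helper that only produces an update when a changed field is actually part of the index. The sketch below builds a request body for Elasticsearch's update-by-query API (`POST /<index>/_update_by_query`) with a small Painless script. The field names, the `INDEXED_USER_FIELDS` set, and the helper itself are assumptions for illustration, and sending the request is left out.

```python
# Hypothetical set of user fields that are duplicated into the search index.
INDEXED_USER_FIELDS = {"user_name"}


def build_user_update_body(user_id, changed_fields):
    """Return an update-by-query body, or None when no indexed field changed."""
    relevant = {
        name: value
        for name, value in changed_fields.items()
        if name in INDEXED_USER_FIELDS
    }
    if not relevant:
        return None  # Nothing searchable changed, so Elasticsearch is untouched.

    # One Painless assignment per changed field; values are passed as params.
    source = "; ".join(
        f"ctx._source.{name} = params.{name}" for name in sorted(relevant)
    )
    return {
        "query": {"term": {"user_id": user_id}},
        "script": {"lang": "painless", "source": source, "params": relevant},
    }


# A change to a non-indexed field produces no Elasticsearch update at all.
print(build_user_update_body(1, {"last_login": "2020-06-01"}))

# A change to an indexed field updates every duplicated document of that user.
print(build_user_update_body(1, {"user_name": "User One"}))
```

The smaller the set of indexed fields, the more often the helper returns `None` and the update is skipped entirely.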
4. Keep Elasticsearch as the Last Resort
First of all, check the scale of your data to verify whether Elasticsearch is actually needed before moving to it. Setting up an Elasticsearch structure is expensive, and it is an ‘additional overhead’ to maintain. If the scale is not considerably large and the search can be handled by a simpler but still fast implementation, consider that first. For example, relational databases like PostgreSQL provide full-text search functionality that is pretty fast, and you keep the advantages of relational database features like ‘joins’. Elasticsearch has a way to provide ‘joins’ functionality, but it should be avoided because it is a costly operation that might affect search time. Check out this awesome blog for more details regarding Postgres Full-Text Search.
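For a feel of what the PostgreSQL route looks like, here is a minimal sketch that runs a full-text query through a DB-API cursor. `to_tsvector`, `plainto_tsquery`, `ts_rank`, and the `@@` match operator are built-in PostgreSQL features; the `books` table, its columns, and the `search_books` helper are hypothetical, and the `%s` placeholders assume a driver such as psycopg2.

```python
# Full-text search using PostgreSQL's built-in functions.
# The `books` table and its columns are hypothetical.
SEARCH_SQL = """
    SELECT id, title
    FROM books
    WHERE to_tsvector('english', title || ' ' || coalesce(description, ''))
          @@ plainto_tsquery('english', %s)
    ORDER BY ts_rank(
        to_tsvector('english', title || ' ' || coalesce(description, '')),
        plainto_tsquery('english', %s)
    ) DESC
    LIMIT %s
"""


def search_books(cursor, term, limit=10):
    """Run the full-text query on an open DB-API cursor and return the rows."""
    # The search term is passed twice: once for matching, once for ranking.
    cursor.execute(SEARCH_SQL, (term, term, limit))
    return cursor.fetchall()
```

With an open connection, `search_books(connection.cursor(), "distributed search")` would return the best-matching `(id, title)` rows. On larger tables, a GIN index on the `to_tsvector(...)` expression is the usual way to keep this fast.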
Thanks for reading. I hope this helps. Don’t hesitate to correct any mistakes in the comments or provide suggestions for future posts!