Introduction to Elasticsearch

Phase 01 — Introduction to Elasticsearch — Blog 01

Arun Mohan
elasticsearch
6 min read · Dec 9, 2017


This is the first article in my blog series, Introduction to Elasticsearch. The series aims to introduce you to Elasticsearch, its capabilities and its real-life use cases, and to familiarise you with the other components of the Elasticsearch stack. This article gives a brief idea of what Elasticsearch is, the solutions it offers and the reasons to choose it.

1. What is Elasticsearch?

This is the first question every newcomer asks. Here is the answer in a few sentences:
Elasticsearch is a NoSQL database and search engine built on top of Lucene. It provides a distributed, near real-time, JSON-based, multi-tenant capable full-text search solution.
Even though this definition takes just two sentences, it contains several terms you may not have come across before. Let us split them up and explore them individually.

1.a Lucene

Simply put, Lucene is a library written in Java. The next obvious questions are what it does and what its capabilities are.
Lucene is a search library. This means it offers functions and methods, written in Java, which are optimised for different search strategies. Lucene is one of the most widely used search libraries, and many open source and commercial search implementations have it as their backbone.
A series of questions arises from this definition. If Elasticsearch uses Lucene for the search part, why can't we use bare Lucene for our purposes? Why go for Elasticsearch? What is the difference between Elasticsearch and Lucene?
The answer is that Lucene is an extremely well-written library, but it is a low-level one, which makes it hard to work with directly when customising search to the end customer's needs. What Elasticsearch does is build an API layer on top of Lucene, which makes using Lucene's methods and functions a very simple affair.
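To get a feel for the kind of work a search library does underneath, here is a minimal Python sketch of an inverted index, the data structure at the heart of Lucene. This is only a toy under simplifying assumptions: the sample documents and the function names are made up for illustration, and Lucene's real implementation (analysis, scoring, on-disk segments) is far more involved.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start with the postings of the first term, then intersect with the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Lucene is a search library",
    2: "Elasticsearch is built on Lucene",
    3: "Kibana draws dashboards",
}
index = build_inverted_index(docs)
matches = search(index, "search library")
```

Looking up a term becomes a dictionary access instead of a scan over every document, which is why this structure suits full-text search.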

1.b Distributed System

Apart from the difficulty of configuring bare Lucene in our applications, what makes Elasticsearch preferable to Lucene is its distributed nature. Distributed essentially means that Elasticsearch can run on several systems/nodes at the same time and harness the resources of all of them to solve a single problem. Lucene by itself does not support this, which is a major roadblock for many implementations.
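One way to picture the distributed part: an index is split into shards, and each document is routed to a shard by hashing a routing value (the document id by default) modulo the number of primary shards. The Python sketch below illustrates that idea only. CRC32 is a stand-in hash chosen because it ships with Python, not the hash Elasticsearch actually uses, and the function names and document ids are hypothetical.

```python
import zlib


def pick_shard(doc_id, number_of_primary_shards):
    """Choose a shard number for a document id.

    CRC32 is a stand-in here; Elasticsearch uses its own hash function.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def distribute(doc_ids, number_of_primary_shards):
    """Group document ids by the shard they are routed to."""
    shards = {n: [] for n in range(number_of_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, number_of_primary_shards)].append(doc_id)
    return shards


# Hypothetical document ids spread over three shards.
doc_ids = [f"doc-{n}" for n in range(20)]
shards = distribute(doc_ids, 3)
```

Because the same id always hashes to the same shard, any node can work out where a document lives without asking a central lookup table.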

1.c Near real-time

Documents inserted into Elasticsearch become available for search almost instantaneously (by default, within about a second). This capability works out of the box, and no external or additional configuration has to be made.
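The word "near" is there because a freshly indexed document becomes searchable only after the next refresh, which Elasticsearch performs periodically. The toy Python class below mimics that behaviour. It is a sketch of the concept, not of Elasticsearch internals, and the class, its methods and the sample document are all hypothetical.

```python
class ToyIndex:
    """Toy model of near real-time search: writes are visible after a refresh."""

    def __init__(self):
        self._buffer = []      # indexed, but not yet visible to search
        self._searchable = []  # visible to search

    def index(self, doc):
        """Accept a document; it is not searchable until refresh() runs."""
        self._buffer.append(doc)

    def refresh(self):
        """Make every buffered document searchable."""
        self._searchable.extend(self._buffer)
        self._buffer.clear()

    def search(self, word):
        """Return searchable documents whose message contains the word."""
        return [d for d in self._searchable if word in d["message"]]


toy = ToyIndex()
toy.index({"message": "disk almost full"})
before = toy.search("disk")  # the refresh has not happened yet
toy.refresh()
after = toy.search("disk")
```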

1.d JSON based

Elasticsearch communicates in JSON: its APIs accept and return JSON. This gives great flexibility in usage and interoperability, as most web applications and services nowadays communicate in JSON.
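In practice this means a document is just a JSON object. The short Python sketch below builds one and serialises it the way a client would before sending it over HTTP. The field names and values are made up for illustration, and no request is actually sent.

```python
import json

# Hypothetical blog-post document; Elasticsearch does not require these fields.
document = {
    "title": "Introduction to Elasticsearch",
    "published": "2017-12-09",
    "tags": ["elasticsearch", "search"],
    "reading_minutes": 6,
}

# Serialise to the JSON text that would form the request body.
request_body = json.dumps(document)

# A JSON response is parsed back into native data structures the same way.
restored = json.loads(request_body)
```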

1.e Multi-tenant capable

Multi-tenancy refers to an application architecture in which a single instance of the application on a server/cloud can be accessed by multiple tenants (user groups), each with its own level of access. In Elasticsearch, one cluster can host many indices, so the data of each tenant can be kept and queried separately.

2. Elasticsearch: Use cases

2.a Search

The primary use case, and the purpose for which Elasticsearch was built, is to make search faster and better. So searching is the number one use case of Elasticsearch. It provides many search strategies, such as case-sensitive and case-insensitive search, partial matches and auto-suggestions, right out of the box. Heavy customisation of the search, such as selective weighting of fields or highlighting of matches, is also easy to build and implement. These factors make it one of the most common choices for search.
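As a rough illustration of selective weighting and highlighting, the Python sketch below assembles a search request body using the multi_match query and the highlight section of the Elasticsearch query DSL. The field names "title" and "body" and the helper function are assumptions for this example, and the code only builds the JSON; it does not contact a cluster.

```python
import json


def build_search_body(text, boosted_field="title", other_field="body"):
    """Build a search request body (hypothetical helper, assumed field names)."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # "^2" weights matches in the boosted field twice as heavily.
                "fields": [f"{boosted_field}^2", other_field],
                # Tolerate small typos in the search text.
                "fuzziness": "AUTO",
            }
        },
        # Ask for matching fragments of the body field to be returned marked up.
        "highlight": {"fields": {other_field: {}}},
    }


body = build_search_body("elasticsearch intro")
payload = json.dumps(body)
```

The same dictionary could be handed to any HTTP client or to an Elasticsearch client library as the body of a search request.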

2.b Log Collection/Parsing and Analytics

Elasticsearch, together with other members of the stack such as Logstash and the Beats platform, makes collecting data from a wide range of sources a smooth process. Logstash and Beats forward data from the various sources, and because they integrate natively with Elasticsearch, it is very easy to set them up and start collecting data in Elasticsearch.
The problem solved here is the need for a different handler for each data source. If you collect logs from different sources and need to standardise them, both the forwarding and the parsing parts of the process can be handled by Logstash. This removes many intermediate steps, and with them the time and effort otherwise spent on producing a standard format.
The parsed and saved data can then be visualised with Kibana, the visualisation tool of the stack. Many kinds of analytics are built into Elasticsearch, such as different types of aggregations and statistical computations. These can be applied to the logs and rendered as interactive visuals in Kibana to gain useful insights into the log data.
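To make the parse-then-aggregate flow concrete, here is a small Python sketch. The regular expression stands in for the pattern matching Logstash would do, the log lines are invented, and counting with Counter mimics what a terms aggregation on a status field would return. The field name "status" and the aggregation name "by_status" are assumptions.

```python
import re
from collections import Counter

# Pattern for a common web access log layout (a stand-in for Logstash parsing).
LOG_PATTERN = re.compile(
    r'(?P<client>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Turn one raw log line into a structured document, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    doc = match.groupdict()
    doc["status"] = int(doc["status"])
    doc["bytes"] = 0 if doc["bytes"] == "-" else int(doc["bytes"])
    return doc


# Invented sample lines; the last one is deliberately malformed.
lines = [
    '127.0.0.1 - - [09/Dec/2017:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1043',
    '127.0.0.1 - - [09/Dec/2017:10:00:01 +0000] "GET /missing HTTP/1.1" 404 0',
    'not a log line',
]
docs = [doc for doc in map(parse_line, lines) if doc is not None]

# Local equivalent of counting documents per status code.
status_counts = Counter(doc["status"] for doc in docs)

# Shape of the request body for the same count done as a terms aggregation.
terms_aggregation_body = {
    "size": 0,
    "aggs": {"by_status": {"terms": {"field": "status"}}},
}
```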

2.c Content connectors

As with the logs in the previous section, the next big use case of Elasticsearch is collecting data from a multitude of sources such as Twitter, SharePoint and Jive. There are strong community connector plugins that extract data from these sources, with the required customisation, and stream it into Elasticsearch. This makes for powerful, purpose-specific data collection, and it also makes the collected data searchable. For example, data for a specific hashtag can be streamed into Elasticsearch, and if we can provide lightning-fast search on that data, imagine how easy it becomes to surface the content users want. A similar implementation is used by the Guardian, where the latest comments on its news articles are streamed to Elasticsearch. The data is then analysed and made searchable, so that trends around the articles can be found as quickly as possible.

2.d Instant Visualisation

The ability to create insightful dashboards within minutes of indexing data is another main use case of the Elasticsearch stack. The visualisation tool of the stack is Kibana, which loads data from Elasticsearch, applies a good number of analytics to it and renders the results in a wide range of graphs. These can be arranged in any order to create reports and dashboards. Application and process monitoring benefits hugely from the Kibana-Elasticsearch combination, as anomalies or threats can be detected and countered in near real time.

3. Why Elasticsearch?

Finally, on to the million-dollar question: why should Elasticsearch be preferred? Let us look at the most important factors that answer it:

3.a Scalability

One of the major advantages of using Elasticsearch is its scalability. In most use cases, to get a decent search time you just need to index the data into Elasticsearch. Yes, that is right: there are no hassles or pain points in handling its distributed nature, because Elasticsearch handles the scaling by itself. For example, when a new node is added to a cluster, we need not set up routing to it or make large, critical settings changes to make it discoverable and functional. The master node handles this with little or no intervention from us.

3.b Schema less

By design, Elasticsearch is schema-less. This means we don't need to provide a schema before putting documents into Elasticsearch, which is a huge relief when dealing with multiple data sources. In a relational database, tables and columns have to be defined in advance. In Elasticsearch we can breathe a sigh of relief on this part and simply start indexing the data. If no schema (mapping) is supplied, Elasticsearch automatically assigns one based on the fields of the document.
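The automatic assignment works by looking at the JSON value of each new field and choosing a field type for it. The Python sketch below imitates the basic rules: booleans become boolean, whole numbers long, decimals float, strings text and nested objects object. It is a simplified toy; features such as date detection on strings are left out, and the function names and sample document are hypothetical.

```python
def infer_field_type(value):
    """Guess a field type from a JSON value (simplified toy rules)."""
    # bool is tested first because in Python bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


def infer_mapping(document):
    """Build a field-to-type mapping for one document."""
    mapping = {}
    for field, value in document.items():
        if isinstance(value, list):
            if not value:
                continue  # an empty array tells us nothing about the type
            value = value[0]  # toy rule: judge an array by its first element
        if value is None:
            continue  # a null value adds no field
        mapping[field] = infer_field_type(value)
    return mapping


# Hypothetical document with a mix of value types.
sample = {
    "title": "Introduction to Elasticsearch",
    "views": 3,
    "score": 1.5,
    "draft": False,
    "tags": ["search"],
    "meta": {"lang": "en"},
    "subtitle": None,
}
mapping = infer_mapping(sample)
```

The convenience has a cost worth knowing: an inferred type may not be the one you wanted, so an explicit mapping is still useful once the shape of the data is known.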

3.c Customisation

Another resounding answer to the question "why Elasticsearch?" is the customisation it allows in the solutions it offers. For example, as mentioned in an earlier section, the search options can be customised to cover almost every search use case. Getting data into and out of Elasticsearch can also be done in a wide variety of ways, from default add-ons and plugins to user-developed solutions, all of which integrate with it gracefully.

3.d Community

Last but not least, the amazing community led by Shay Banon and other equally talented developers is one of the most robust open source communities. The community has created many plugins, add-ons and libraries, ranging from simple analyzer plugins to data connector implementations. The responsive forums and active online presence will also save a lot of development time.

Conclusion

In this article, I have introduced Elasticsearch, the problems it attempts to solve and the compelling reasons for choosing it. In the next article in the series, I will briefly introduce the Elasticsearch stack and what each component does.
