What Is the Elastic Stack and Where to Use It

Published in Tecnología · 6 min read · Feb 6, 2020

Introduction

The Elastic Stack provides a set of open-source tools for data ingestion, enrichment, storage, analysis, and visualization. It consists of Elasticsearch (an open-source, enterprise-grade search engine), Logstash (an open-source, server-side data processing pipeline), Kibana (an open-source data visualization dashboard for Elasticsearch), and Beats (a platform for single-purpose data shippers). The stack is also commonly called the ELK Stack, after Elasticsearch, Logstash, and Kibana. Depending on usage and functional requirements, it can be deployed on any cloud provider or on-premises.

Elasticsearch

Elasticsearch is a distributed, open-source search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured. It is already a well-known search engine (it ranks #1 under the search engine database model in DB rankings).

Elasticsearch offers full-text search: linguistic search with relevance-ranked results. Relevance was classically scored with the TF/IDF algorithm; since version 5.0 the default is BM25, a refinement of TF/IDF. It also has an auto-complete feature. We can store complex entities as JSON documents, then index and search them by individual attributes. It is also built for scaling: sharding, replicas, and distributed clusters are supported. We can solve a variety of search use cases using different index types in Elasticsearch.
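To make the idea of relevance scoring more concrete, here is a minimal sketch of textbook TF/IDF in Python. It is an illustration only: the function name, the smoothing, and the sample documents are assumptions made for this example, and Lucene's actual scoring formulas are more refined than this.

```python
import math
from collections import Counter

def tf_idf_scores(query, docs):
    """Score each document against the query with a textbook TF/IDF formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Term frequency: how often the term occurs in this document.
            tf = counts[term] / len(tokens) if tokens else 0.0
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized if term in other)
            # Smoothed inverse document frequency: rarer terms weigh more.
            idf = math.log((1 + n_docs) / (1 + df)) + 1.0
            score += tf * idf
        scores.append(score)
    return scores

docs = ["the quick brown fox", "the lazy dog", "quick quick fox"]
scores = tf_idf_scores("quick", docs)
```

With these sample documents, the third one scores highest for "quick" because the term makes up a larger share of its text, and the second scores zero because it does not contain the term at all.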

Terminology:

Let’s look at some of the most common terms related to Elasticsearch:

Document: A document is the basic unit of information that can be indexed.
Index: An index is a collection of documents that have similar characteristics. A cluster can have many indices, and there are many ways (types of index) in which an index can be defined.
Node: A node is a single server that is part of a cluster. You can think of it as a running Elasticsearch instance connected to the cluster.
Cluster: A cluster is a collection of one or more nodes. All the data is stored across the cluster (across multiple nodes).
Shard: Each index in Elasticsearch is divided into multiple shards, and each shard can have one or more copies (replicas). An index can potentially store more data than the hardware of a single node can hold, so sharding lets us scale the content volume horizontally and also distribute and parallelise operations across shards.
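The way documents spread across shards can be sketched with simple hash-based routing. This is a simplified illustration, not Elasticsearch's implementation: the functions `pick_shard` and `distribute` are hypothetical, and `zlib.crc32` stands in for whatever hash function Elasticsearch actually uses.

```python
import zlib

def pick_shard(doc_id: str, num_primary_shards: int) -> int:
    """Map a document id to a shard number with a deterministic hash."""
    # crc32 is a stand-in hash chosen for this sketch because it is
    # stable across runs; it is not the hash Elasticsearch uses.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards

def distribute(doc_ids, num_primary_shards):
    """Group document ids by the shard they would be routed to."""
    shards = {n: [] for n in range(num_primary_shards)}
    for doc_id in doc_ids:
        shards[pick_shard(doc_id, num_primary_shards)].append(doc_id)
    return shards

ids = [f"book-{i}" for i in range(100)]
shards = distribute(ids, 5)
for number, members in shards.items():
    print(f"shard {number}: {len(members)} documents")
```

One consequence of modulo-style routing in this sketch is that the same id always lands on the same shard, while changing the number of shards would send existing ids to different ones.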

How does Elasticsearch work?

All data in Elasticsearch is stored internally in Apache Lucene as an inverted index. Although Lucene stores the data, Elasticsearch is what makes it distributed and provides the easy-to-use APIs. An inverted index is similar to the card catalog used in libraries: it maps each term to the documents that contain it, so looking up a term does not require scanning every document.
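The inverted index idea can be sketched in a few lines of Python. This is a toy version under simple assumptions (whitespace tokenization, lowercasing, made-up book titles); Lucene's real index adds analysis, positions, compression, and much more.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every term of the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Start from the posting set of the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

docs = {
    1: "Elasticsearch in Action",
    2: "Lucene in Action",
    3: "Learning Elasticsearch",
}
index = build_inverted_index(docs)
```

Here `search(index, "elasticsearch")` only has to read one entry of the index instead of scanning all three titles, which is the property that makes term lookups fast.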

Data is stored in JSON format in Elasticsearch and can be queried to fetch the needed documents. Every feature of Elasticsearch is also exposed as an API. For instance, Get and Search APIs can be used to get documents.
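As a rough sketch of what calling these APIs looks like over HTTP, the Python below builds (but does not send) a Get request and a Search request using only the standard library. The base URL, the document id, and the helper names are assumptions for illustration; `booksdb_index` is the index name used in the sample queries of this article.

```python
import json
import urllib.request

# Assumption: a local cluster without authentication.
BASE_URL = "http://localhost:9200"

def get_document_request(index, doc_id):
    """Build a Get API request: GET /<index>/_doc/<id>."""
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_doc/{doc_id}", method="GET"
    )

def search_request(index, query):
    """Build a Search API request: POST /<index>/_search with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{index}/_search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

get_req = get_document_request("booksdb_index", "1")
search_req = search_request("booksdb_index", {"match": {"title": "search"}})
print(get_req.get_method(), get_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

When a cluster is actually reachable, either request object could be passed to `urllib.request.urlopen` and the JSON response decoded with `json.loads`.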

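Just as databases have their own query languages (SQL, PL/SQL, and so on), Elasticsearch has its own query language, which works on indices, types, and document properties. Note that mapping types were deprecated in Elasticsearch 7.0, so on recent versions queries address an index and the fields of its documents.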

Query DSL: Elasticsearch’s query DSL consists of two types of clauses: leaf query clauses (match, term, range), which look for a particular value in a particular field, and compound query clauses, which wrap other leaf or compound queries. Here are some sample queries:

1. Regexp query: this query fetches the title and author name of every book whose authorname matches the regular expression. The pattern is anchored, so it must match an entire term.

POST /booksdb_index/book/_search
{
  "query": { "regexp": { "authorname": "a[a-z]*a" } },
  "_source": ["title", "authorname"],
  "highlight": { "fields": { "authorname": {} } }
}

2. Range query: this query fetches the title, publish date, and publisher name of every book whose publishdate falls within the given date range (both bounds inclusive).

POST /booksdb_index/book/_search
{
  "query": { "range": { "publishdate": { "gte": "1986-01-01", "lte": "1986-10-31" } } },
  "_source": ["title", "publishdate", "publisher"]
}
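Both sample queries are leaf clauses. To illustrate a compound clause, the sketch below builds a bool query in Python that wraps the same regexp and range clauses, so a book must satisfy both. The helper functions are hypothetical conveniences written for this example; only the resulting JSON structure (bool, must, filter, regexp, range) is part of the query DSL.

```python
import json

def regexp(field, pattern):
    """Leaf clause: match terms of a field against a regular expression."""
    return {"regexp": {field: pattern}}

def date_range(field, gte, lte):
    """Leaf clause: match values of a field within an inclusive range."""
    return {"range": {field: {"gte": gte, "lte": lte}}}

def bool_query(must=(), filters=()):
    """Compound clause: wrap other clauses that must all match."""
    return {"bool": {"must": list(must), "filter": list(filters)}}

search_body = {
    "query": bool_query(
        must=[regexp("authorname", "a[a-z]*a")],
        filters=[date_range("publishdate", "1986-01-01", "1986-10-31")],
    ),
    "_source": ["title", "authorname", "publishdate"],
}

# The printed JSON is what would be sent as the body of a _search request.
print(json.dumps(search_body, indent=2))
```

Placing the date range under filter rather than must is a design choice in this sketch: filter clauses restrict the result set without contributing to the relevance score.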

[Diagram: how a query flows through an Elasticsearch cluster]

What are some of the use cases for Elasticsearch?

If your application has any complex search requirements, Elasticsearch can be used as the underlying engine/technology.

Other use cases include logging and log analytics, infrastructure metrics and container monitoring, application performance monitoring, geospatial data analysis and visualization, security and business analytics, and scraping and combining public data.

Lyft uses Elasticsearch for operational log analytics. So does eBay. SoundCloud uses Elasticsearch as a flexible, easy-to-use, real-time search engine that caters to over 30,000,000 users. Optum’s Cyber Defense organization uses the Elastic Stack within its Security Big Data Lake (SBDL) to search and pivot between cyber threats.

Connectors for Elasticsearch

A data connector is a stand-alone piece of software or a function that imports or exports data, or converts it from one format to another. A number of connectors are available to push data to or pull data from Elasticsearch. They make it possible to use Elasticsearch functions for specific use cases even when the data is not pushed into Elasticsearch directly.

Here are some of the most commonly used connectors:

UiPath Connector: Perform CRUD operations on an Elasticsearch instance
Mongo Connector: Push data from MongoDB to an Elasticsearch instance
Kafka Elasticsearch Connector: Copy data between Kafka and Elasticsearch

Let’s now review the other products in the Elastic Stack.

Logstash

Logstash collects data in real time. It can dynamically unify data from disparate sources, normalize it, and send it to the destination of your choice. Instead of using connectors, we can use Logstash for almost any data ingestion use case.
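The idea of unifying and normalizing data from disparate sources can be illustrated with a small Python sketch. This is a conceptual illustration only, not Logstash configuration or Logstash code: the two input formats, the field names, and the function names are all made up for this example.

```python
import json

def from_json_line(line):
    """Normalize an event from a hypothetical source that logs JSON lines."""
    raw = json.loads(line)
    return {
        "timestamp": raw["ts"],
        "level": raw["level"].upper(),
        "message": raw["msg"],
    }

def from_key_value_line(line):
    """Normalize an event from a hypothetical source that logs key=value pairs."""
    fields = dict(pair.split("=", 1) for pair in line.split())
    return {
        "timestamp": fields["time"],
        "level": fields["severity"].upper(),
        "message": fields["text"],
    }

# Two differently shaped inputs end up as events with the same fields.
events = [
    from_json_line('{"ts": "2020-02-06T10:00:00Z", "level": "info", "msg": "started"}'),
    from_key_value_line("time=2020-02-06T10:00:05Z severity=warn text=disk_almost_full"),
]
for event in events:
    print(json.dumps(event))
```

Once every source produces events with the same fields, a single index and a single set of queries can serve all of them, which is the practical benefit of normalizing at ingestion time.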

Beats

The Elastic Stack provides multiple Beats, each capturing a different type of data: Auditbeat for audit data, Heartbeat for availability monitoring, Metricbeat for metrics, and so on. Beats can send data directly to Elasticsearch, or via Logstash if the data needs further processing and enrichment.

Kibana

Data in Elasticsearch can be visualized using Kibana. We can track query load times and see how requests flow. A number of chart types are available by default, including histograms, line charts, and pie charts. Kibana also supports location analysis, advanced time-series analysis, data relationship analysis (using elastic-graph), and anomaly detection.

Kibana Lens offers a drag-and-drop experience and provides smart suggestions for visualizations based on the fields you select.


About the Author

Arpit is a seasoned technologist with vast experience in leading large cross-functional and cross-geography teams. Arpit also consults clients on competitive market analysis, defining MVPs, product ideation, product monetisation, and go-live strategies.

Arpit is also interested in early-stage investments in startups in the design & fashion, finance, renewable energy, media, real estate, and manufacturing domains.

Arpit believes we should all contribute back to society. He has set his goals for social work in five broad areas. You can read more about the same in his blog post “Do Good, Together” on Tumblr. Arpit is interested in working with people who want to contribute towards the same goals.

You can follow Arpit on LinkedIn and Twitter.
