PART I | Quick Overview of Elasticsearch for Node Developers

Published in

eDonec

May 21, 2021

First and foremost, Elasticsearch excels at something traditional databases handle poorly: full-text search. Relational and non-relational databases typically offer only limited full-text features. A naive query such as LIKE '%pizza%' has to scan every row, which becomes very slow on large data sets.

Elasticsearch is a distributed, open-source search and analytics engine built on Apache Lucene and written in Java. It began as a scalable layer on top of the Lucene search library, adding the ability to scale Lucene indices horizontally across machines. Elasticsearch lets you store, search, and analyze large volumes of data in near real time, typically returning answers in milliseconds. Its searches are fast because, instead of scanning the text directly, it searches an index. It stores data as JSON documents instead of rows in tables with fixed schemas, and it exposes extensive REST APIs for storing and searching that data. At its core, you can think of Elasticsearch as a server that accepts JSON requests and returns JSON responses.

Every feature of Elasticsearch is exposed as a REST API. Some of the most commonly used are:

Index API: Used to add a document to an index (or replace an existing one)
Get API: Used to retrieve a document by its ID
Search API: Used to submit a query and get back the matching results
Put Mapping API: Used to override the default (dynamic) choices and define our own mapping
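To make these four APIs concrete for Node developers, here is a minimal TypeScript sketch. It only builds the HTTP method, path, and JSON body each API expects and does not send anything. The endpoint shapes follow Elasticsearch 7.x conventions (/<index>/_doc/<id>, /<index>/_search, /<index>/_mapping). The EsRequest type and the helper function names are hypothetical, and the "products" index and its fields are illustrative only.

```typescript
// Hypothetical description of an HTTP call to Elasticsearch; no network I/O happens here.
interface EsRequest {
  method: "GET" | "PUT" | "POST";
  path: string;
  body?: unknown;
}

// Index API: create or replace the document stored under the given ID.
function indexDocument(index: string, id: string, doc: object): EsRequest {
  return { method: "PUT", path: `/${index}/_doc/${encodeURIComponent(id)}`, body: doc };
}

// Get API: fetch a single document by its ID.
function getDocument(index: string, id: string): EsRequest {
  return { method: "GET", path: `/${index}/_doc/${encodeURIComponent(id)}` };
}

// Search API: send a query written in the Query DSL.
function searchIndex(index: string, query: object): EsRequest {
  return { method: "POST", path: `/${index}/_search`, body: { query } };
}

// Put Mapping API: declare field types explicitly (here, adding a new field to an existing index).
function putMapping(index: string, properties: object): EsRequest {
  return { method: "PUT", path: `/${index}/_mapping`, body: { properties } };
}

// Example usage with an illustrative "products" index.
const requests: EsRequest[] = [
  indexDocument("products", "2", { name: "Pizza stone", price: 24.5 }),
  getDocument("products", "2"),
  searchIndex("products", { match: { name: "pizza" } }),
  putMapping("products", { releasedAt: { type: "date" } }),
];

for (const r of requests) {
  console.log(r.method, r.path, r.body === undefined ? "" : JSON.stringify(r.body));
}
```

In a real application you would send these descriptors with an HTTP client such as the built-in fetch in Node 18+, setting Content-Type: application/json. Alternatively, Elastic's official @elastic/elasticsearch client wraps the same endpoints.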

Let’s take a look at some of Elasticsearch’s main components and principles to see how it functions on the inside.

Document

Documents are the basic unit of information that can be indexed. Data in a document is stored in JSON format. In the world of relational databases, a document can be compared to a row in a table.

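If you’re creating an e-commerce website, for example, you can keep the details of each product in a separate document. Fields can hold different types of data, from numbers to text to dates. When you index or update a document, Elasticsearch responds with metadata about the operation, such as the index name, the document ID, and its version: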

{
   "_index" : "products",
   "_type" : "_doc",
   "_id" : "2",
   "_version" : 7,
   "result" : "updated",
   "_shards" : {
      "total" : 2,
      "successful" : 1,
      "failed" : 0
   },
   "_seq_no" : 3,
   "_primary_term" : 1
}
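The JSON above is the metadata Elasticsearch returns after an update, not the product itself. As a hedged illustration, the product document stored under _id 2 might look like the sketch below. The Product interface and every field in it are hypothetical, chosen to show text, numeric, boolean, and date values side by side.

```typescript
// Hypothetical shape of one product document in the "products" index.
interface Product {
  name: string;        // full-text searchable text
  description: string;
  price: number;       // numeric value
  inStock: boolean;
  createdAt: string;   // ISO 8601 date string; JSON has no native date type
  tags: string[];      // arrays are allowed; Elasticsearch has no dedicated array type
}

const product: Product = {
  name: "Cast-iron pizza stone",
  description: "Heats evenly for a crisp crust",
  price: 24.5,
  inStock: true,
  createdAt: "2021-05-21T10:00:00Z",
  tags: ["kitchen", "baking"],
};

// The body sent to the Index API is simply this object serialized as JSON.
const body: string = JSON.stringify(product);
console.log(body);
```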

Index

An index is a collection of documents with similar characteristics, and it is what you target when you store or search data. If a document is comparable to a row, an index is loosely comparable to a table in a relational database.

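For example, let’s assume that you are running an e-commerce application. You could have a products index with one document per product and an orders index with one document per order. There is no fixed limit on the number of documents in an index as a whole. However, each shard can hold at most about 2.1 billion documents (a Lucene limit).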

Shards

Elasticsearch can subdivide an index into multiple pieces called shards. Each shard is itself a fully functional and independent “index” (a Lucene index) that can be hosted on any node within a cluster. Elasticsearch distributes the documents in an index across multiple shards and those shards across multiple nodes. This lets it scale horizontally: adding nodes increases both storage and query capacity. Combined with replicas, this distribution also provides the redundancy that protects against hardware failures.
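Sharding raises a practical question: which shard does a given document go to? Elasticsearch's documentation describes the default rule as roughly shard = hash(_routing) % number_of_primary_shards, where _routing defaults to the document's _id. The TypeScript sketch below illustrates that idea. Note that FNV-1a is swapped in for Elasticsearch's real Murmur3 hash, so the shard numbers will not match a real cluster. The names fnv1a and shardFor are hypothetical.

```typescript
// Stand-in 32-bit FNV-1a string hash. Elasticsearch actually uses Murmur3,
// so the shard numbers produced here will NOT match a real cluster.
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // convert to an unsigned 32-bit integer
}

// Simplified routing rule: shard = hash(_routing) % number_of_primary_shards,
// where _routing defaults to the document _id.
function shardFor(routing: string, numPrimaryShards: number): number {
  return fnv1a(routing) % numPrimaryShards;
}

// Spread 1,000 document IDs over an index with 3 primary shards.
const counts: number[] = [0, 0, 0];
for (let id = 1; id <= 1000; id++) {
  counts[shardFor(String(id), 3)]++;
}
console.log(counts); // roughly even; exact numbers depend on the hash
```

The shard is derived from the number of primary shards, so that number cannot simply be changed on an existing index. Changing it requires reindexing or the split and shrink APIs.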

Replicas

A replica shard is a copy of a primary shard. Each document in an index belongs to exactly one primary shard, and each primary can have zero or more replicas. Replicas provide redundant copies of your data. They protect against hardware failure and increase capacity to serve read requests such as searching or retrieving a document.
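An index with P primary shards and R replicas per primary has P × (1 + R) shard copies in total. Elasticsearch never places two copies of the same shard on the same node. The simplified TypeScript sketch below models only that rule. The allocate function and ShardCopy type are hypothetical, and real allocation also balances shard counts, disk usage, and awareness settings.

The sketch also explains the _shards block in the Document section's response. "total": 2 counts the primary plus one replica. "successful": 1 is what you would typically see when that replica cannot be assigned, for example on a single-node cluster.

```typescript
// Simplified shard allocation: the only real rule modelled is that
// two copies of the same shard never share a node.
interface ShardCopy {
  shard: number;
  primary: boolean;
  node: number | "unassigned";
}

function allocate(numPrimaries: number, numReplicas: number, numNodes: number): ShardCopy[] {
  const copies: ShardCopy[] = [];
  for (let shard = 0; shard < numPrimaries; shard++) {
    // copy 0 is the primary; copies 1..numReplicas are its replicas
    for (let copy = 0; copy <= numReplicas; copy++) {
      // (shard + copy) % numNodes gives every copy of this shard a different node,
      // but only while there are at least as many nodes as copies.
      const node: number | "unassigned" = copy < numNodes ? (shard + copy) % numNodes : "unassigned";
      copies.push({ shard, primary: copy === 0, node });
    }
  }
  return copies;
}

// One primary with one replica on a single-node cluster: the replica stays unassigned.
console.log(allocate(1, 1, 1));
// Three primaries with one replica each on two nodes: all six copies are assigned.
console.log(allocate(3, 1, 2));
```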

Inverted Index

Elasticsearch uses a special data structure called an “inverted index” for very fast full-text searches. An inverted index consists of a list of all the unique words (terms) that appear in any document and, for each term, a list of the documents in which it appears. The inverted index is built from the documents you store in Elasticsearch through a process called analysis. Analysis breaks text into tokens (tokenization) and normalizes them with token filters, for example by lowercasing.

For example, if the term “the” occurs in document 2, the index maps “the” to document 2. This gives a quick look-up of which documents contain a given search term, without scanning every document. Because these inverted indices are distributed across shards, Elasticsearch quickly finds the best matches for full-text searches even in very large data sets.
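A minimal in-memory version of this idea fits in a few lines of TypeScript. It is a sketch, not Lucene's implementation. The analyze function is a hypothetical analyzer, far simpler than Elasticsearch's standard analyzer: it only lowercases and splits on non-alphanumeric characters. For simplicity, search uses AND semantics, whereas Elasticsearch's match query defaults to OR and ranks hits by relevance.

```typescript
// Hypothetical, simplified analysis step: lowercase, then split on non-alphanumerics.
function analyze(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter((t) => t.length > 0);
}

class InvertedIndex {
  // term -> IDs of the documents that contain it (the "postings list")
  private postings = new Map<string, Set<number>>();

  add(docId: number, text: string): void {
    for (const term of analyze(text)) {
      let docs = this.postings.get(term);
      if (!docs) {
        docs = new Set<number>();
        this.postings.set(term, docs);
      }
      docs.add(docId);
    }
  }

  // Return the sorted IDs of documents containing every query term (AND semantics).
  search(query: string): number[] {
    let result: Set<number> | undefined;
    for (const term of analyze(query)) {
      const docs = this.postings.get(term) ?? new Set<number>();
      result = result === undefined
        ? new Set(docs)
        : new Set(Array.from(result).filter((id) => docs.has(id)));
    }
    return result ? Array.from(result).sort((a, b) => a - b) : [];
  }
}

const docs = new InvertedIndex();
docs.add(1, "The quick brown fox");
docs.add(2, "Quick pizza delivery");
docs.add(3, "the brown bear");
console.log(docs.search("quick"));     // [ 1, 2 ]
console.log(docs.search("Brown THE")); // [ 1, 3 ]
```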

Node

A single server that holds some data and participates in the cluster’s indexing and querying is called a node. A node joins a specific cluster by being configured with that cluster’s name (the cluster.name setting).

A single cluster can have as many nodes as we want. A node is simply one running Elasticsearch instance, comparable to a running MySQL server process. You can run several MySQL instances on one machine on different ports, and you can do the same with Elasticsearch nodes. In practice, though, one Elasticsearch instance generally runs per machine. Because Elasticsearch distributes work across nodes, placing them on separate machines gives the cluster more hardware resources.

Mapping

Whenever we index a document containing a field Elasticsearch has not seen before, a process called dynamic mapping steps in. Dynamic mapping is Elasticsearch’s method of automatically determining the data type of each new field.

It can recognize dates (by their format), numbers, booleans, strings, and nested objects. If dynamic mapping is not enough for our needs, we can define an explicit mapping instead, either when creating the index or with the Put Mapping API.
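The default rules can be sketched in TypeScript. This is a simplified model, not Elasticsearch's code, and inferMapping and inferProperties are hypothetical names. The rules it applies are:

- booleans become boolean;
- whole numbers become long and fractional numbers become float;
- strings that look like dates become date;
- other strings become text with a keyword sub-field;
- objects get nested properties;
- arrays take the type of their first non-null element;
- null adds no field.

Date detection is approximated with two regular expressions. Numeric detection for strings, which is off by default in Elasticsearch, is not modelled.

```typescript
// Simplified model of Elasticsearch's default dynamic field mapping rules.
type FieldMapping =
  | { type: "long" | "float" | "boolean" | "date" }
  | { type: "text"; fields: { keyword: { type: "keyword"; ignore_above: number } } }
  | { properties: Record<string, FieldMapping> };

// Approximations of the default date formats (strict_date_optional_time, yyyy/MM/dd ...).
const DATE_LIKE: RegExp[] = [
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:?\d{2})?)?$/,
  /^\d{4}\/\d{2}\/\d{2}( \d{2}:\d{2}:\d{2})?( [+-]\d{4})?$/,
];

function inferMapping(value: unknown): FieldMapping | undefined {
  if (value === null || value === undefined) return undefined; // null adds no field
  if (Array.isArray(value)) {
    // There is no array type: use the first non-null element.
    return inferMapping(value.find((v) => v !== null && v !== undefined));
  }
  if (typeof value === "boolean") return { type: "boolean" };
  if (typeof value === "number") {
    // JS cannot tell 1.0 from 1, unlike Elasticsearch's JSON parser.
    return Number.isInteger(value) ? { type: "long" } : { type: "float" };
  }
  if (typeof value === "string") {
    const s: string = value;
    if (DATE_LIKE.some((re) => re.test(s))) return { type: "date" };
    return { type: "text", fields: { keyword: { type: "keyword", ignore_above: 256 } } };
  }
  if (typeof value === "object") {
    return { properties: inferProperties(value as Record<string, unknown>) };
  }
  return undefined;
}

function inferProperties(doc: Record<string, unknown>): Record<string, FieldMapping> {
  const properties: Record<string, FieldMapping> = {};
  for (const [field, value] of Object.entries(doc)) {
    const mapping = inferMapping(value);
    if (mapping !== undefined) properties[field] = mapping;
  }
  return properties;
}

const sample = { name: "Pizza stone", price: 24.5, stock: 12, onSale: true, createdAt: "2021-05-21" };
console.log(JSON.stringify(inferProperties(sample), null, 2));
```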

What Is Elasticsearch Used For?

Application search: For applications that rely heavily on a search platform to access, retrieve, and report on data.

Website search: Websites that store a lot of content find Elasticsearch a very useful tool for effective and accurate search. It’s no surprise that Elasticsearch is steadily gaining ground in the site search domain.

Geo-search: Elasticsearch can be used to search for products or places by location. For example, a query such as ‘all the restaurants that serve pizza within 5 km of me’ can use Elasticsearch to display the relevant pizzerias instantly.
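A query like this combines full-text matching with a geographic filter. The sketch below builds a Search API body using the Query DSL's bool, match, and geo_distance clauses, with results sorted by distance. It assumes a hypothetical restaurants index with a menu text field and a location field mapped as geo_point. The pizzeriasNearby helper is also hypothetical. Elasticsearch filters by straight-line distance, not travel time.

```typescript
// Hypothetical builder for a "pizzerias within radiusKm" search body.
// Assumes documents have a `menu` text field and a `location` geo_point field.
function pizzeriasNearby(lat: number, lon: number, radiusKm: number) {
  return {
    query: {
      bool: {
        must: { match: { menu: "pizza" } }, // full-text part, contributes to scoring
        filter: {
          // geographic part: keep only documents within the radius (no scoring)
          geo_distance: { distance: `${radiusKm}km`, location: { lat, lon } },
        },
      },
    },
    // closest results first
    sort: [{ _geo_distance: { location: { lat, lon }, order: "asc", unit: "km" } }],
  };
}

const geoBody = pizzeriasNearby(48.8566, 2.3522, 5);
console.log(JSON.stringify(geoBody, null, 2));
```

Putting the distance check in filter rather than must keeps it out of relevance scoring and lets Elasticsearch cache it.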

Autocompletion and instant search: Elasticsearch can autocomplete search queries by completing partially typed words in a search box. Suggestions come from terms stored in an index, for example product names or previously logged searches.
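One common way to implement this in Elasticsearch is the edge_ngram tokenizer; the completion suggester and the search_as_you_type field type are alternatives. At index time, edge n-grams expand each word into its prefixes, so partially typed input matches directly. The in-memory TypeScript sketch below shows only that idea. minGram and maxGram mirror the tokenizer's min_gram and max_gram settings, but their default values here are arbitrary. The functions edgeNgrams, buildPrefixIndex, and suggest are hypothetical.

```typescript
// Expand a word into its leading prefixes ("edge n-grams").
function edgeNgrams(word: string, minGram = 1, maxGram = 20): string[] {
  const grams: string[] = [];
  for (let len = minGram; len <= Math.min(maxGram, word.length); len++) {
    grams.push(word.slice(0, len));
  }
  return grams;
}

// Index time: map every prefix to the full terms that start with it.
function buildPrefixIndex(terms: string[]): Map<string, Set<string>> {
  const index = new Map<string, Set<string>>();
  for (const term of terms) {
    for (const gram of edgeNgrams(term.toLowerCase())) {
      const matches = index.get(gram) ?? new Set<string>();
      matches.add(term);
      index.set(gram, matches);
    }
  }
  return index;
}

// Query time: a partially typed word is looked up directly, with no scanning.
function suggest(index: Map<string, Set<string>>, typed: string): string[] {
  const hits = index.get(typed.toLowerCase());
  return hits ? Array.from(hits).sort() : [];
}

const prefixes = buildPrefixIndex(["pizza", "pizzeria", "pasta"]);
console.log(suggest(prefixes, "piz")); // [ 'pizza', 'pizzeria' ]
```

The trade-off is a larger index, since every prefix is stored, in exchange for fast look-ups while the user types.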

Metrics, analytics, and logging: Elasticsearch can ingest and analyze large volumes of data from sources such as emails, logs, databases, and syslogs. It can also power dashboards that help businesses make sense of their data and gain actionable insights.
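Dashboards of this kind are usually powered by aggregations rather than by fetching raw documents. The sketch below builds a Search API body that counts log entries per hour over the last 24 hours, broken down by log level. It assumes the logs were indexed with an @timestamp date field and a log.level keyword field; both field names and the logLevelsPerHour helper are hypothetical. The fixed_interval parameter requires Elasticsearch 7.2 or later.

```typescript
// Hypothetical aggregation request body for a log dashboard.
function logLevelsPerHour() {
  return {
    size: 0, // return only aggregation results, no individual log entries
    query: { range: { "@timestamp": { gte: "now-24h" } } }, // last 24 hours
    aggs: {
      per_hour: {
        // one bucket per hour of @timestamp
        date_histogram: { field: "@timestamp", fixed_interval: "1h" },
        aggs: {
          // inside each hour, count entries per log level
          by_level: { terms: { field: "log.level" } },
        },
      },
    },
  };
}

console.log(JSON.stringify(logLevelsPerHour(), null, 2));
```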

Conclusion

At its core, Elasticsearch is a search engine with a fast, scalable underlying architecture. It sits at the heart of an ecosystem of complementary tools that can be used together for a variety of use cases, including search, analytics, and data processing and storage.

This has been developed by me at eDonec.