Elasticsearch

Cephalo
Nov 30, 2022


Elasticsearch is a distributed, open-source search and analytics engine for textual, numerical, geographical, structured, and unstructured data. It is built on Apache Lucene and was first released in 2010 by Elasticsearch N.V. (now known as Elastic). Known for its simple REST APIs, distributed nature, speed, and scalability, Elasticsearch is the heart of the Elastic Stack, a collection of free and open tools for data ingestion, enrichment, storage, analysis, and visualization. Commonly referred to as the ELK Stack (after Elasticsearch, Logstash, and Kibana), the Elastic Stack now also includes a collection of lightweight shipping agents known as Beats for sending data to Elasticsearch (Elasticsearch B.V., 2022).

Elasticsearch is a free and open-source engine based on Apache Lucene, a full-text search engine library. Lucene is arguably the most powerful, high-performance, and fully featured search engine library available today, whether open source or proprietary. Lucene, however, is only a library. To make use of its capabilities, you must work in Java and integrate Lucene directly into your application. Worse, understanding how it works may well require a degree in information retrieval. Lucene is very complex.

Elasticsearch, however, is considerably more than Lucene and more than “just” full-text search. It can also be described as a distributed search engine with real-time analytics, capable of scaling to hundreds of servers and petabytes of structured and unstructured data.

All of this functionality is packaged as a standalone server that your application can communicate with via a simple RESTful API, using a web client from your preferred programming language or even the command line (O’Reilly Media, Inc., 2015).
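As a rough illustration of talking to that RESTful API from a programming language, the sketch below builds a full-text search request using only Python's standard library. The base URL assumes a local node on the default HTTP port 9200, and the index name `articles` and field name `title` are hypothetical. The request is only constructed here; the `send` helper would need a running node.

```python
import json
import urllib.request

# Assumption: a local node listening on the default HTTP port (9200).
BASE_URL = "http://localhost:9200"


def build_search_request(index: str, field: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a full-text "match" search for one index."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{BASE_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply (requires a running node)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# "articles" and "title" are hypothetical names used only for illustration.
request = build_search_request("articles", "title", "distributed search")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the shape of the call easy to inspect without a cluster at hand.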

Many businesses rely on Elasticsearch to provide strong search capabilities in their apps that are simple to set up, scalable, and cloud-ready. Elasticsearch is better thought of as a Lucene interface built from the ground up for big data. Because Lucene is ultimately the library that indexes and queries the data, the extensive search feature set that Lucene provides is immediately available through Elasticsearch. This also implies that plugins that work with Lucene will function out of the box with Elasticsearch.

Elasticsearch’s capabilities, built on Lucene, are intended to make it the ideal tool for full-text search on large datasets (Goyal & Divya, 2013).

Lucene’s background:

Lucene is an open-source Java library for text search developed by Apache. The Lucene project has been evolving for more than a decade and has established itself as the standard reference for how to build a strong yet simple integrated open-source search engine. For an application to use Lucene as a search library, Lucene must be wrapped in an interface. Many such interfaces have been created for various platforms and use cases, Solr among them. An interface like Solr, however, is built for a world where a single server can handle the whole effort of indexing and querying data. When the data volume exceeds a certain threshold, Solr (and comparable Lucene interfaces) become difficult to use: the same sharding, replication, and query dispatching issues that plague RDBMS systems resurface in this environment, as does the problem of distributing Solr itself (Goyal & Divya, 2013).

Background:

A. Basic features

1) REST API:

Elasticsearch uses a REST API to store and retrieve items. Its PUT, POST, GET, and DELETE APIs provide version checks (optionally on PUT), ID generation (optionally on POST), and the ability to read your own writes (on GET). This is why it can be classified as a key-value store.

2) Elasticsearch’s Key-Value Store:

Every item of data in Elasticsearch belongs to an index and has a type. An index can be compared to a collection of documents or a database table. Unlike rows in a table, however, the documents uploaded to an index have no predetermined structure or field types. Objects are assigned a type and placed in an index. As a result, the relative URL to any object from a REST perspective is /index/type/id. A REST API is used to create indexes and types at runtime.
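To make the key-value view concrete, here is a toy in-memory model of the behavior described above: PUT with an optional version check, POST with ID generation, GET, and DELETE, all keyed by (index, type, id). The class and method names are hypothetical, and this is not Elasticsearch code. It follows the /index/type/id addressing as this article describes it; note that recent Elasticsearch releases have dropped mapping types.

```python
import uuid
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (index, type, id), mirroring /index/type/id


class VersionConflict(Exception):
    """Raised when the caller's expected version differs from the stored one."""


class ToyDocumentStore:
    """In-memory stand-in for illustration only; not how Elasticsearch stores data."""

    def __init__(self) -> None:
        self._docs: Dict[Key, Tuple[int, dict]] = {}

    def put(self, index: str, doc_type: str, doc_id: str, doc: dict,
            expected_version: Optional[int] = None) -> int:
        """Store a document under a caller-chosen id, optionally checking the version."""
        key = (index, doc_type, doc_id)
        current_version = self._docs.get(key, (0, {}))[0]
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(f"expected {expected_version}, found {current_version}")
        new_version = current_version + 1
        self._docs[key] = (new_version, dict(doc))
        return new_version

    def post(self, index: str, doc_type: str, doc: dict) -> str:
        """Store a document under a generated id and return that id."""
        doc_id = uuid.uuid4().hex
        self.put(index, doc_type, doc_id, doc)
        return doc_id

    def get(self, index: str, doc_type: str, doc_id: str) -> Optional[dict]:
        """Return a copy of the document, or None if it does not exist."""
        entry = self._docs.get((index, doc_type, doc_id))
        return None if entry is None else dict(entry[1])

    def delete(self, index: str, doc_type: str, doc_id: str) -> bool:
        """Remove the document; report whether anything was deleted."""
        return self._docs.pop((index, doc_type, doc_id), None) is not None


store = ToyDocumentStore()
version = store.put("blog", "post", "1", {"title": "Hello"})
generated_id = store.post("blog", "post", {"title": "Auto id"})
```

A write is visible to the very next read in this model, which is the "read your own writes" property mentioned above.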

3) Multi-tenancy:

Indexes may be created, updated, retrieved, and deleted. You may set up sharding and replication for each index separately. As a result, Elasticsearch is multi-tenant and very adaptable.
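As a small sketch of per-index configuration, the helper below builds the JSON body that would accompany an index-creation request (a PUT to the index name), using the `number_of_shards` and `number_of_replicas` settings. The helper name and the two example indexes are hypothetical, and nothing is sent over the network.

```python
import json


def index_settings_body(shards: int, replicas: int) -> str:
    """Return a JSON body with per-index sharding and replication settings."""
    if shards < 1:
        raise ValueError("an index needs at least one primary shard")
    if replicas < 0:
        raise ValueError("the replica count cannot be negative")
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return json.dumps(body)


# Two hypothetical tenants with different needs, configured independently.
logs_body = index_settings_body(shards=5, replicas=1)
catalog_body = index_settings_body(shards=1, replicas=2)
print(logs_body)
```

Because each index carries its own settings, a write-heavy index and a small, read-heavy one can share a cluster without sharing a layout.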

4) Mapping:

Elasticsearch indexes your documents using either a dynamic mapping or a mapping that you specify (recommended). Once indexed, your documents can be retrieved through the search API.
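The difference between a dynamic and an explicit mapping can be sketched with a deliberately simplified type-inference function. The rules below are an approximation for illustration: real dynamic mapping has more cases, such as date detection and keyword sub-fields for strings. The function names and sample document are hypothetical.

```python
from typing import Any, Dict, Optional


def infer_field_type(value: Any) -> str:
    """Guess a field type from a JSON value (simplified rules)."""
    # bool must be tested before int because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "text"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"no rule for value of type {type(value).__name__}")


def build_mapping(document: Dict[str, Any],
                  explicit: Optional[Dict[str, Dict[str, str]]] = None) -> dict:
    """Infer a type per field, then let explicitly specified fields win."""
    properties = {name: {"type": infer_field_type(value)}
                  for name, value in document.items()}
    properties.update(explicit or {})
    return {"properties": properties}


sample = {"title": "Elasticsearch basics", "views": 42, "rating": 4.5, "published": True}
mapping = build_mapping(sample, explicit={"title": {"type": "keyword"}})
print(mapping)
```

Specifying the mapping yourself avoids surprises when the first document seen happens to be unrepresentative, which is one reason the explicit route is recommended.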

5) Replication & Sharding:

Shards divide an index into smaller portions, and replicas (copies of those portions) are used to improve availability and performance.

B. Elasticsearch cluster

By default, Elasticsearch is clustered. That is, if two nodes are started in the same network, they will connect and form a cluster. This does not need any extra setup.

Elasticsearch may be used as a stand-alone search engine. It may, however, also be operated on a large number of cooperating servers to handle massive amounts of data and provide fault tolerance. Collectively these servers are referred to as a cluster, and each of them individually is referred to as a node.

Index sharding (dividing data into smaller individual portions) distributes large quantities of data across several nodes, and Elasticsearch replicates shards across whatever nodes are available in the network. In practice, you adjust this configuration to suit the environment you run in.
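The idea of spreading shards and their copies across nodes can be sketched as follows. This is a toy model under stated assumptions: `zlib.crc32` stands in for the routing hash, placement is plain round-robin, and the node names are hypothetical. Elasticsearch's actual routing and allocation logic is considerably more involved.

```python
import zlib
from typing import Dict, List


def shard_for(doc_id: str, primary_shards: int) -> int:
    """Pick a primary shard by hashing the document id (crc32 is a stand-in hash)."""
    return zlib.crc32(doc_id.encode("utf-8")) % primary_shards


def place_shards(primary_shards: int, replicas: int, nodes: List[str]) -> Dict[str, List[str]]:
    """Round-robin placement that keeps every copy of a shard on a different node."""
    if replicas + 1 > len(nodes):
        raise ValueError("need at least replicas + 1 nodes to separate all copies")
    layout: Dict[str, List[str]] = {node: [] for node in nodes}
    for shard in range(primary_shards):
        for copy in range(replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            label = f"P{shard}" if copy == 0 else f"R{shard}"  # P = primary, R = replica
            layout[node].append(label)
    return layout


layout = place_shards(primary_shards=3, replicas=1, nodes=["node-a", "node-b", "node-c"])
for node, shards in layout.items():
    print(node, shards)
```

With three nodes and one replica per shard, losing any single node still leaves one copy of every shard on the remaining two, which is the availability benefit replicas are meant to provide.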

Elasticsearch is designed for large-scale installations on Amazon Web Services (AWS), Heroku, or your own data center. That means it has built-in monitoring capabilities, a pluggable architecture for adapting to various contexts, and a slew of additional features you’ll need to run in such an environment. This is in stark contrast to Solr, which lacks these capabilities out of the box.

C. Basic Elasticsearch Workflow:

In a nutshell, here is the basic Elasticsearch workflow. Documents can be uploaded or saved, and they can be of any form, size, or quantity. The JSON Builder then converts these documents from their original format to JSON. The Tokenizer then splits the data into individual words. These words are indexed, and mapping groups words of the same type into one mapping type. This lets text be retrieved more quickly in response to a user’s query. The parser parses the query, which is then run against the indexed documents to fetch the matching text (Goyal & Divya, 2013).
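The tokenize, index, and query steps in this workflow can be sketched with a minimal inverted index. This is a simplified model, not Lucene's implementation: the tokenizer just lowercases and splits on non-alphanumeric characters, a query matches only documents containing every query term, and the sample documents are made up.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each word to the set of ids of the documents that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return dict(index)


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return ids of documents that contain every word of the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


documents = {
    "1": "Elasticsearch is a distributed search engine",
    "2": "Lucene is a search library",
    "3": "Kibana visualizes data",
}
index = build_index(documents)
print(sorted(search(index, "search engine")))
```

Looking words up in the index rather than scanning every document is what makes retrieval fast: the cost of a query depends on the number of query terms and matching documents, not on the total volume of text.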

D. Application support:

Elasticsearch is more than a search engine; it is also a database on which you can build a whole frontend application. Elasticsearch supports multiple indices (databases) and multiple mappings (tables) per index. This functionality, paired with Elasticsearch’s rich document structure, allows you to create complex data models that support applications. In addition to rich search queries over the data, Elasticsearch also supports the more “conventional” operations of an application database: listing, creating, updating, and deleting records. These capabilities provide everything you need to build a standard database-driven, read/write application on top of the same database, complete with full-text search, complex queries, and built-in horizontal scalability.

Data is always flowing into your system. The challenge is how rapidly that data can be transformed into knowledge. With Elasticsearch, real time is the only time that matters. Elasticsearch allows you to start small and scale up as your company grows. It is designed to scale horizontally right out of the box: simply add more nodes as needed, and the cluster will reconfigure itself to make use of the additional hardware. Under the hood, Elasticsearch leverages Lucene to provide some of the most advanced full-text search capabilities of any open-source product. Search includes multi-language support, a rich query language, geolocation support, context-aware did-you-mean suggestions, auto-complete, and search snippets. Elasticsearch can store complex real-world items as structured JSON documents. By default, all fields are indexed, and all indices may be combined in a single query to return results quickly. It is under active development, with further text analytics capabilities expected in future releases, and it may well usher in a new age in full-text search and text analytics (Goyal & Divya, 2013).
