Elasticsearch Tutorials Part 2: Installation, Setup, and Creating Index

Abhishek Bairagi
Dec 29, 2023


In the previous part of our Elasticsearch series, we explored the fundamentals and use cases of this powerful search and analytics engine. Now, it’s time to delve into the practical aspects of Elasticsearch, starting with the installation and setup process. So Lesssgooo!

Installing Elasticsearch:

Official Documentation:

Elasticsearch offers comprehensive documentation that guides you through the installation process. Follow the official installation guide at https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html and pick the instructions for your operating system.

Verifying the Installation:

Once you’ve successfully installed and started Elasticsearch, verifying its status is crucial. You can do this by checking localhost:9200 in your web browser.

Open http://localhost:9200/ in your browser. Here, localhost is the host where Elasticsearch is running: if you are running it on your local PC it stays localhost, but if you are running ES on a virtual machine, replace localhost with that machine's address. 9200 is the default port Elasticsearch listens on; depending on your Dockerfile and installation it could be a different port, such as 9201.

If everything is set up correctly, you should see a JSON response similar to the following:

{
  "name": "af54bc19sdf7ecb",
  "cluster_name": "elasticsearch",
  "cluster_uuid": "LC7PptTsdhSJiSJJVVkMHAdsd4g",
  "version": {
    "number": "7.9.3",
    "build_flavor": "default",
    "build_type": "docker",
    "build_hash": "c413sdfs8e51121ef06a6404866cddc601906fe5c868",
    "build_date": "2020-10-16T10:36:16.141335Z",
    "build_snapshot": false,
    "lucene_version": "8.6.2",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

This JSON response confirms that Elasticsearch is up and running, providing essential details such as the version, build information, and cluster name.
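If you'd rather verify from code than from a browser, here is a quick sketch using Python's requests library (an assumption on my part: it expects requests to be installed and Elasticsearch to be reachable on the default host and port without authentication):

import requests

# Ask the root endpoint for basic cluster and version information
response = requests.get("http://localhost:9200")
info = response.json()

print(info["version"]["number"])  # e.g. "7.9.3"
print(info["tagline"])            # "You Know, for Search"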

So, how it’s gonna go?

Now that we’ve successfully set up Elasticsearch, it’s time to get our hands dirty and explore ES further. The overall Elasticsearch workflow follows these key steps: create an index (with its settings and mappings), ingest documents into it, and query the indexed data to retrieve results.

Note: Depending on the use case, data ingestion can be a one-time or a continuous process.

The Book Library Analogy:

Now, I understand that terms like “index,” “query,” and “documents” in the context of Elasticsearch might be unfamiliar to some of you. But don’t worry — I’ve got your back.

Imagine Elasticsearch as a vast digital book library, where the shelves are neatly organized, and each book is precisely cataloged for quick retrieval. Now let’s explore three essential concepts: Index, Query, and Document.

1. Index — Think of it as a Bookshelf: In our library, an index is comparable to a bookshelf. Each bookshelf contains a specific genre or category of books. For instance, there’s a bookshelf dedicated to fiction, another for non-fiction, and so on. Similarly, in Elasticsearch, an index acts as a virtual bookshelf, containing a collection of related documents.

2. Document — Each Book on the Shelf: As you navigate through the library, you’ll notice that each book on a shelf is an individual entity with its own unique content. In Elasticsearch, these individual entities are referred to as documents. Each document holds a piece of information, much like each book contains a unique story.

3. Query — Your Search Request: Now, let’s say you walk into the library and want to find a book. You’d approach the librarian and provide details about the book you’re looking for — this request is your query. In Elasticsearch, a query serves the same purpose; it’s your way of asking for specific information from the indexed data.

The Elasticsearch Index:

Now, let’s delve into Elasticsearch’s core: the Index. Once your Elasticsearch journey begins, the next step is creating an index. To do that, you’ll need two essentials:

  1. Mappings, defining your data’s structure, and
  2. Settings, configuring your index’s behavior.

Index Settings

The settings block contains details about shards, replicas, analyzers, and much more. Let me give a short explanation of each of them:

  1. Number of Shards (number_of_shards):
  • Think of your library as a collection of books and each bookshelf as a shard. If you have more bookshelves (higher number_of_shards), it's like having many shelves to distribute and organize your books. Each shelf (shard) can be managed independently, making it efficient to find and retrieve books.
  • Example: If you set number_of_shards to 3, it's like having three bookshelves. Each bookshelf (shard) contains a portion of your book collection, allowing Elasticsearch to manage them separately for faster access.

2. Number of Replicas (number_of_replicas):

  • Now, imagine having a twin library that mirrors your original library. If a book is not available in one library, you can always find it in the other. Replicas work similarly by creating duplicate copies of your data.
  • Example: Setting number_of_replicas to 1 means you have a twin library with the same books. If one library (shard) is occupied or undergoing maintenance, the replica in the twin library steps in, ensuring you always have access to your entire book collection.

3. Analyzer (analysis):

  • Defines how the text in fields is processed during indexing and searching.
  • Determines the process of tokenization, stemming, and other text transformations.

4. Tokenizer:

  • Breaks down text into individual terms during indexing.
  • Examples include the standard tokenizer, whitespace tokenizer, and more.
  • Example:
  • Text: “This is a fiction book.”
  • Tokenizer: Whitespace Tokenizer.
  • Resulting Terms: [“This”, “is”, “a”, “fiction”, “book.”]

5. Filter:

  • Modifies the terms produced by the tokenizer.
  • Common filters include lowercase (for case-insensitivity), stop (for removing common words), and stemming (for word normalization); a combined sketch follows this list.
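To see how a tokenizer and filters fit together, here is a minimal sketch of a custom analyzer inside the analysis block of the index settings. The name my_custom_analyzer is just an illustrative choice; it combines the whitespace tokenizer from the example above with the lowercase and stop filters:

"analysis": {
  "analyzer": {
    "my_custom_analyzer": {
      "type": "custom",
      "tokenizer": "whitespace",
      "filter": ["lowercase", "stop"]
    }
  }
}

Run through this analyzer, “This is a fiction book.” is split on whitespace, lowercased, and stripped of stop words such as “this”, “is”, and “a”, leaving roughly [“fiction”, “book.”].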

A sample settings block would look something like this (here the default analyzer is explicitly set to standard, which is also what Elasticsearch uses if you omit the analysis block entirely):


"settings": {
"number_of_shards": 1,
"number_of_replicas": 1,
"analysis": {
"analyzer": "standard"
}
}

Mapping

Let’s understand mapping now. Index mapping defines the structure of your data: the fields, their data types, and, optionally, the analyzer to use for a specific field.

Let’s say we want to create an Elasticsearch index named “library” which contains information about books. The information includes the title, author, description, published_date, and purchase_url of each book. The mapping for it would look something like this:


"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "standard"
},
"author": {
"type": "text",
"analyzer": "standard"
},
"description": {
"type": "text",
"analyzer": "standard"
},
"published_date": {
"type": "date"
},
"purchase_url": {
"type": "keyword"
}
}
}

🤔 You might have a question: why keyword for purchase_url?

We use keyword for purchase_url because a URL is a value we want to store and match as a whole; it doesn't need breaking into parts. Elasticsearch keeps a keyword field in one piece, exactly as it is, in the index. If it were a text field, the URL would be split into tokens (scheme, domain, path pieces), which would break exact matching and filtering.

Once we have the settings and mappings ready, to create an index named ‘library’ you send a PUT request to http://localhost:9200/library with this body:

{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "default": {
          "type": "standard"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "author": {
        "type": "text",
        "analyzer": "standard"
      },
      "description": {
        "type": "text",
        "analyzer": "standard"
      },
      "published_date": {
        "type": "date"
      },
      "purchase_url": {
        "type": "keyword"
      }
    }
  }
}

Once successful, you will get a response like this:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "library"
}
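If you want to send that PUT request from a script rather than a browser extension or REST client, here is a minimal sketch using Python's requests library (my choice for illustration; any HTTP client works). The body is the same settings-and-mappings document shown above, condensed into a Python dict; the analysis block is left out because standard is already the default analyzer:

import requests

# Same settings and mappings as the body above, as a Python dict
index_body = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1
    },
    "mappings": {
        "properties": {
            "title": {"type": "text", "analyzer": "standard"},
            "author": {"type": "text", "analyzer": "standard"},
            "description": {"type": "text", "analyzer": "standard"},
            "published_date": {"type": "date"},
            "purchase_url": {"type": "keyword"}
        }
    }
}

# PUT http://localhost:9200/library creates the index
response = requests.put("http://localhost:9200/library", json=index_body)
print(response.json())  # expect: {'acknowledged': True, 'shards_acknowledged': True, 'index': 'library'}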

But wait, sending a raw HTTP request isn’t the only way! You can also achieve this using Elasticsearch clients in various languages, such as Python or JavaScript. Let me guide you through the Python approach. Start by installing the official Elasticsearch Python client:

python -m pip install elasticsearch

You can then connect to Elasticsearch with the following Python code:

from elasticsearch import Elasticsearch

# Use http for a default local install; switch to https (with credentials) if security is enabled
es = Elasticsearch('http://localhost:9200')

To create an index you can use this code:

from elasticsearch import Elasticsearch

# Connect to Elasticsearch (use https and credentials if security is enabled on your cluster)
es = Elasticsearch('http://localhost:9200')

# Index settings and mappings
index_definition = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
        "analysis": {
            "analyzer": {
                "default": {"type": "standard"}
            }
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "standard"
            },
            "author": {
                "type": "text",
                "analyzer": "standard"
            },
            "description": {
                "type": "text",
                "analyzer": "standard"
            },
            "published_date": {
                "type": "date"
            },
            "purchase_url": {
                "type": "keyword"
            }
        }
    }
}

# Create the 'library' index
index_name = 'library'
es.indices.create(index=index_name, body=index_definition)

# Check if the index was created successfully
if es.indices.exists(index=index_name):
    print(f"Index '{index_name}' created successfully.")
else:
    print(f"Failed to create index '{index_name}'.")

Yayy, we have our library ready. But it’s empty :( Not to worry, we will fix this in the next part, where we will ingest some documents (books) into it. Till then, stay safe.

Next Chapter is here: Elasticsearch Tutorial Part 3
