Elasticsearch for Dummies: Part 1

Published in CodeX · 6 min read · Apr 25, 2021

One of my favorite technologies that I’ve had the pleasure of working with is Elasticsearch (which I will refer to as Elastic from now on). Elastic is an open-source search engine that is quite flexible and powerful. I could give countless use-cases of where and why Elastic is used, but that would detract from the purpose of this article.

This series is dedicated to the dummies out there. It’s for those who’ve realized that they know nothing about Elastic and are willing to admit it. For those who embody the following Shakespeare quote:

“The fool doth think he is wise, but the wise man knows himself to be a fool.”
-William Shakespeare. Credit: tymoff.com

These articles are a resource I wish existed when I was trying to figure all this out on my own. My hope is that this will help, inspire, and simplify your journey through the vast world of software engineering.

Let’s get started, shall we?

Setting the Stage

Time to put on your imagination cap: You’ve just been handed a hard drive which contains the entire published works of William Shakespeare and it’s up to YOU to put it into Elastic.

We will get to indexing the entire works of Shakespeare, but let’s start small for now with a line from the play “As You Like It” that I quoted earlier. The data is in JSON format, which will make it easier to put into Elastic.

{
    "type": "line",
    "line_id": 18122,
    "play_name": "As you like it",
    "speech_number": 19,
    "line_number": "5.1.28",
    "speaker": "TOUCHSTONE",
    "text_entry": "The fool doth think he is wise, but the wise man knows himself to be a fool."
}

For those who are not familiar, JSON is simply a way of storing data using key-value pairs (which you Python people may recognize as a Python Dictionary!)
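To make the dictionary comparison concrete, here is a minimal sketch that loads the Shakespeare line with Python’s built-in json module. The variable names raw and line are just my own picks for illustration.

```python
import json

# The Shakespeare line from above, as raw JSON text.
raw = """
{
    "type": "line",
    "line_id": 18122,
    "play_name": "As you like it",
    "speech_number": 19,
    "line_number": "5.1.28",
    "speaker": "TOUCHSTONE",
    "text_entry": "The fool doth think he is wise, but the wise man knows himself to be a fool."
}
"""

# json.loads turns a JSON object into a plain Python dict.
line = json.loads(raw)

print(type(line).__name__)  # dict
print(line["speaker"])      # TOUCHSTONE
print(line["line_number"])  # 5.1.28 (a string, because it is quoted in the JSON)
```

Each key in the JSON becomes a key in the dict, and numbers like line_id stay numbers.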

Looking at the data, we can see that this line belongs to a fellow by the name of Touchstone, and the line number is 5.1.28. Not super useful information right now, but perhaps we can do some cool stuff with it later.

Let’s form a game plan. We need to do the following things (and it’s OK if you don’t know what each step means):

Spin up a sandbox Elastic cluster.
Create an index that will store our data.
Insert our document into the index.

Spinning Up Your First Cluster

Firstly, download Elastic onto your local computer, choosing the appropriate download file for your system (you can find the downloads here).

To get Elastic up and running on Mac:

Unzip the file you’ve downloaded and cd into the unzipped directory.
Edit the elasticsearch.yml file in the config directory and disable X-Pack security by setting xpack.security.enabled to false. While we’re learning Elasticsearch, I don’t want to bog us down in the details of authentication and security. Only do this for local development and easy testing, never in production.

Disabling xpack authentication (for now)

To start Elasticsearch, run bin/elasticsearch (or bin\elasticsearch.bat on Windows).
To verify that the cluster is up and running, let’s make an unauthenticated request to http://localhost:9200/ in a separate terminal session:

➜ curl http://localhost:9200
{
  "name" : "Tims-MacBook-Pro.local",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "JZcAF-F1S-y-CmBtQ-roOQ",
  "version" : {
    "number" : "8.4.3",
    "build_flavor" : "default",
    "build_type" : "tar",
    "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73",
    "build_date" : "2022-10-04T07:17:24.662462378Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}
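If you would rather check from Python than from curl, here is a rough sketch using only the standard library. The helper names fetch_root and summarize are made up for this example, and sample is a trimmed copy of the response above so the snippet runs even when no cluster is up.

```python
import json
import urllib.request


def fetch_root(base_url="http://localhost:9200"):
    """GET the cluster's root endpoint and return the parsed JSON body."""
    with urllib.request.urlopen(base_url, timeout=5) as response:
        return json.load(response)


def summarize(info):
    """Pick out the fields worth glancing at from the root response."""
    return f'{info["cluster_name"]} is running Elasticsearch {info["version"]["number"]}'


# A trimmed copy of the response shown above, so this runs without a cluster.
sample = {
    "name": "Tims-MacBook-Pro.local",
    "cluster_name": "elasticsearch",
    "version": {"number": "8.4.3"},
    "tagline": "You Know, for Search",
}
print(summarize(sample))
```

With your cluster running, summarize(fetch_root()) should describe your own cluster instead of the sample.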

Now let’s connect our trusty VSCode IDE to our local cluster so we can start talking to it via some simple REST APIs.

To connect your local Elastic cluster to VSCode:

Download the Elasticsearch for VSCode extension.
Create a new file in VSCode called hello_world.es.
When you create an .es file with the extension enabled, it will prompt you to specify the connection (you can leave it as localhost:9200).

To verify that we’ve connected to our sandbox cluster, type GET _cluster/health into your new file and press ctrl+enter to run the command. You should get back a JSON response that reports the cluster’s health status.

Congrats: you just created your first cluster!

Creating Your First Index

First, some vocabulary:

Simply put, an index is an extremely optimized place where we store documents. An index is usually composed of many documents.
Documents are a collection of one or more key-value pairs that we can search and filter through (think things that look like that Shakespeare JSON object we talked about earlier).
A cluster is a place where we store and manage indexes.

Relationship between Documents, Indexes, and Clusters
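If it helps, you can picture the relationship as nested Python dictionaries. This is only a toy mental model, not how Elastic actually stores anything, and the index name and document id here are placeholders I chose for illustration.

```python
# A toy in-memory picture of the vocabulary above (not how Elastic stores data).
cluster = {}                      # a cluster holds indexes, keyed by name
cluster["hello_world"] = {}       # an index holds documents, keyed by id
cluster["hello_world"]["1"] = {   # a document is a set of key-value pairs
    "speaker": "TOUCHSTONE",
    "play_name": "As you like it",
}

for index_name, documents in cluster.items():
    print(index_name, "holds", len(documents), "document(s)")
```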

Before we can put documents into our cluster, we first have to create an index. To create your first index, run PUT hello_world in your hello_world.es file.
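Under the hood, PUT hello_world is just an HTTP request to the cluster. Here is a minimal sketch of that request built with Python’s standard library. The helper name create_index_request is my own, and the snippet only builds the request without sending it, so it runs even when the cluster is down.

```python
import urllib.request


def create_index_request(index, base_url="http://localhost:9200"):
    """Build (but do not send) the HTTP request behind `PUT <index>`."""
    return urllib.request.Request(f"{base_url}/{index}", method="PUT")


request = create_index_request("hello_world")
print(request.get_method(), request.full_url)
```

To actually send it, pass the request to urllib.request.urlopen while your local cluster is running.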

In the real world, setting up an index is usually more complicated than this, but we are trying to stay simple for now. I want you to focus on the concepts, not the nitty gritty (which can come later).

Now that we’ve created an index, we are ready to start indexing documents.

Inserting a Document into Your First Cluster

If you don’t like typing, copy and paste the put_document.es file and run the PUT hello_world/_doc/1 command:

Let’s break down the command we just made: PUT hello_world/_doc/1

PUT specifies the action type. We want to PUT a document.
hello_world is the index that we want to PUT the document into.
_doc/1 specifies the doc_id, here 1, which is the unique id associated with the document in the index.
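Here is how those three pieces fit together as a plain HTTP request, sketched with Python’s standard library. The helper name index_document_request is hypothetical, and the snippet builds the request without sending it.

```python
import json
import urllib.request

# The Shakespeare line we want to store.
document = {
    "type": "line",
    "line_id": 18122,
    "play_name": "As you like it",
    "speech_number": 19,
    "line_number": "5.1.28",
    "speaker": "TOUCHSTONE",
    "text_entry": "The fool doth think he is wise, but the wise man knows himself to be a fool.",
}


def index_document_request(index, doc_id, doc, base_url="http://localhost:9200"):
    """Build (but do not send) the request behind `PUT <index>/_doc/<doc_id>`."""
    return urllib.request.Request(
        f"{base_url}/{index}/_doc/{doc_id}",          # index + doc_id form the URL
        data=json.dumps(doc).encode("utf-8"),         # the document is the JSON body
        headers={"Content-Type": "application/json"},
        method="PUT",                                 # the action type
    )


request = index_document_request("hello_world", 1, document)
print(request.get_method(), request.full_url)
```

The action, the index, and the doc_id all live in the first line of the request, and the document itself rides along as the body.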

But how do we know that the document made it into the index? Well, we got a success response from the cluster, but to be certain, let’s make our first Elastic query and manually find that document.

Making Your First Elastic Query

Elastic uses a specialized Query DSL (Domain Specific Language) based on JSON to define queries. Getting yourself used to the query syntax will help a lot down the road, so don’t just mindlessly copy and paste the snippets into your files! Typing a few out can go a long way to helping you understand their structure.

To make a query, we first have to decide what fields to query on. The quote is the most memorable part of the document we want to find, so let’s run a match_phrase query on the text_entry field.

The following query will search for docs whose text_entry field contains the phrase “the fool doth think” (plus some highlighting to show exactly what we found):
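For reference, a request body for this kind of search typically looks like the sketch below. I’m assuming the hello_world index from earlier and building the body as a Python dict so we can print it as JSON. Treat it as one reasonable way to write the query rather than the only one.

```python
import json

# Query DSL body: a match_phrase query on text_entry, plus highlighting
# on the same field so the response shows which part of the text matched.
search_body = {
    "query": {
        "match_phrase": {"text_entry": "the fool doth think"}
    },
    "highlight": {
        "fields": {"text_entry": {}}
    },
}

# Print it in the shape you would type into your .es file.
print("GET hello_world/_search")
print(json.dumps(search_body, indent=2))
```

The empty braces after text_entry in the highlight section simply mean “use the default highlighting settings for this field.”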

And because we asked for highlights, we can see the phrase we searched for in the response.json! Super cool.

Let’s switch it up: what do you think will happen if we search for the phrase “the brave fool doth think”? (Notice I added an extra word into the query)

If you guessed we would get no results, you would be right! That’s because, by default, the match_phrase query only matches when every word of the query appears in the field, next to each other and in the same order.
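To build some intuition for why the extra word breaks the match, here is a deliberately simplified imitation of phrase matching in plain Python. This is not Elastic’s actual implementation, and the helper names tokens and phrase_matches are made up. It just lowercases the text, splits it into words, and checks whether the query’s words show up side by side in order.

```python
import re


def tokens(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def phrase_matches(phrase, text):
    """True if the phrase's words appear in the text, adjacent and in order."""
    wanted, words = tokens(phrase), tokens(text)
    return any(
        words[i:i + len(wanted)] == wanted
        for i in range(len(words) - len(wanted) + 1)
    )


line = "The fool doth think he is wise, but the wise man knows himself to be a fool."
print(phrase_matches("the fool doth think", line))        # True
print(phrase_matches("the brave fool doth think", line))  # False
```

The word “brave” never appears in the line, so there is no run of five adjacent words that lines up with the second query.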

Well, that wraps up our first session. To clean up, press ctrl+c in the terminal that’s running the cluster to stop it.

I’m proud of you if you’ve made it this far! Hopefully, you understand at the conceptual level what documents, indexes, and clusters are. And you were able to successfully create your first index, index your first document, and then make your first Elastic query.

This is just the tip of the iceberg, and there is so much more to learn. In my next post, I’ll talk about Elasticsearch mappings and data types for dummies…