Elasticsearch: Hands-on with Searching (1)

Ying Ray Lu
DeepQ Research Engineering Blog
Feb 26, 2018

Learning how to search with Elasticsearch is a big topic, so I will split it into several posts. In this one, I’d like to share some hands-on experience with searching in Elasticsearch.

Installation

Setting up the environment is often the most troublesome part of getting started with Elasticsearch. To make it quick and convenient, I propose a simple solution based on docker-compose. The docker-compose config for setting up Elasticsearch and Kibana is below:

The configuration for setting up Elasticsearch and Kibana (EK).

And then start it up:

docker-compose up -d && docker-compose logs -f

Test it out by opening another terminal window and running the following:

curl 'http://localhost:9200'

Communication

How you talk to Elasticsearch depends on whether you are using Java. Elasticsearch provides two ways: the Java API and the RESTful API. The Java API comes with two built-in clients that talk to the cluster over port 9300 using the native Elasticsearch transport protocol, but it is only available to Java programs. Therefore, in this post we only cover the RESTful API, which any language can use over port 9200.

RESTful API

A request to Elasticsearch consists of the same parts as any HTTP request:

curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>'\
-d '<BODY>'

To keep the examples concise and readable, we will write requests in this shortened HTTP notation:

<VERB> /<PATH>?<QUERY_STRING>
<BODY>

Kibana comes with a development tool named “Console” (Sense) that makes it easy to send requests written in this notation.
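If you are working outside of Console, the short notation can be expanded back into a curl command. Here is a minimal Python sketch of that mapping. The helper name to_curl and the localhost:9200 defaults are my own assumptions for illustration, and the Content-Type header is added because Elasticsearch 6.0 and later expect it on requests that carry a body.

```python
import json
import shlex


def to_curl(verb, path, body=None, host="localhost", port=9200):
    """Expand a '<VERB> /<PATH>' console request into a curl command line."""
    url = f"http://{host}:{port}/{path.lstrip('/')}"
    parts = ["curl", f"-X{verb.upper()}", shlex.quote(url)]
    if body is not None:
        # Elasticsearch 6.0+ expects an explicit JSON content type.
        parts += ["-H", shlex.quote("Content-Type: application/json")]
        parts += ["-d", shlex.quote(json.dumps(body))]
    return " ".join(parts)


print(to_curl("GET", "/megacorp/employee/_search"))
print(to_curl("GET", "/megacorp/employee/_search",
              {"query": {"match": {"last_name": "Smith"}}}))
```

The function only builds the command string and does not send anything, so it is safe to run without a cluster.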

Build a Catalog

Before we start the tutorial, we need to build a catalog to play with in Elasticsearch. First, we create a document with a single command:

PUT /megacorp/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}

Notice that the request path /megacorp/employee/1 contains 3 pieces of information:

  • megacorp is the Index name
  • employee is the Type name
  • 1 is the ID of this particular employee

The request body contains all the information about this employee.
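The three pieces of the path can be pulled apart mechanically. The sketch below is only an illustration of the /index/type/id layout described above; the names DocPath and parse_doc_path are hypothetical and not part of any Elasticsearch client.

```python
from typing import NamedTuple, Optional


class DocPath(NamedTuple):
    index: str
    doc_type: str
    doc_id: Optional[str]


def parse_doc_path(path):
    """Split '/<index>/<type>/<id>' into its three pieces (the id is optional)."""
    pieces = path.strip("/").split("/")
    if len(pieces) not in (2, 3):
        raise ValueError(f"expected /index/type[/id], got {path!r}")
    doc_id = pieces[2] if len(pieces) == 3 else None
    return DocPath(pieces[0], pieces[1], doc_id)


print(parse_doc_path("/megacorp/employee/1"))
```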

After we send this request, we get a response like this:

Let’s add a few more employees to the directory before moving on:

PUT /megacorp/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}

PUT /megacorp/employee/3
{
"first_name" : "Douglas",
"last_name" : "Fir",
"age" : 35,
"about": "I like to build cabinets",
"interests": [ "forestry" ]
}

Search the document

Let’s talk about how to search and read documents. If you are familiar with RESTful APIs, you will instinctively use a GET request like this:

GET /megacorp/employee/1

That’s right: the response contains John Smith’s information in its _source field.

Next, let’s put _search in place of the ID in the request path. This returns all the employees in the megacorp index.

GET /megacorp/employee/_search

It also supports query-string searching, which lets you append the parameters to the request path:

GET /megacorp/employee/_search?q=last_name:Smith

Query-string searching is a fairly easy way to search, but it has its limitations. Elasticsearch provides a rich, flexible query language called the Query DSL, which allows us to build much more complicated and robust queries. The same search looks like this:

GET /megacorp/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}

Looking at the request body, we find a field named match, which is one of several types of queries.
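To make the relationship between the two styles concrete, here is a small Python sketch that turns the simplest field:value query string into the equivalent match request body. The function name query_string_to_match is hypothetical, and the sketch deliberately handles only this one form; the real query-string syntax is much richer.

```python
def query_string_to_match(q):
    """Turn a simple 'field:value' query string into a match query body."""
    field, sep, value = q.partition(":")
    if not sep or not field or not value:
        raise ValueError(f"expected 'field:value', got {q!r}")
    return {"query": {"match": {field: value}}}


print(query_string_to_match("last_name:Smith"))
```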

Finally, let’s try a more complicated search: employees named Smith who are older than 30:

GET /megacorp/employee/_search
{
"query" : {
"bool" : {
"must" : {
"match" : {
"last_name" : "smith"
}
},
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
}
}
}
}

Putting a request body in a GET request is uncommon, so the same search also works with a POST request. To stay consistent with the official documentation, I will keep using GET for searches in this tutorial.
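To make the must and filter clauses of the bool query above concrete, here is a pure-Python sketch that applies the same two conditions to our three employees. It is only a simulation of the logic under simplifying assumptions (exact, case-insensitive name comparison and no relevance scoring); the function name bool_search is my own.

```python
EMPLOYEES = [
    {"first_name": "John", "last_name": "Smith", "age": 25},
    {"first_name": "Jane", "last_name": "Smith", "age": 32},
    {"first_name": "Douglas", "last_name": "Fir", "age": 35},
]


def bool_search(docs, last_name, min_age_exclusive):
    """Keep docs that satisfy both the must clause and the filter clause."""
    hits = []
    for doc in docs:
        # "must" + "match": compared case-insensitively, mimicking an
        # analyzer that lowercases both the query and the indexed text.
        matches_name = doc["last_name"].lower() == last_name.lower()
        # "filter" + "range" with "gt": strictly greater than.
        passes_filter = doc["age"] > min_age_exclusive
        if matches_name and passes_filter:
            hits.append(doc)
    return hits


print([d["first_name"] for d in bool_search(EMPLOYEES, "smith", 30)])
```

In Elasticsearch itself, the must clause contributes to the relevance score while the filter clause only includes or excludes documents.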

Full-Text Queries

Elasticsearch is built on Lucene, which makes full-text search easy. Let’s use the same match query as before to search the about field for “rock climbing”.

GET /megacorp/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}

We get back 2 matching documents, each containing “rock” or “climbing” in the about field.

If you want only documents that contain both “rock” and “climbing”, add the operator and to the query body:

GET /megacorp/employee/_search
{
"query": {
"match": {
"about": {
"query": "climbing rock",
"operator": "and"
}
}
}
}

If you want to match the exact phrase “rock climbing”, use a slight variation of the match query called the match_phrase query:

GET /megacorp/employee/_search 
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}

GET /megacorp/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "climbing rock"
}
}
}

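The first request returns 1 employee, the one who really loves to go rock climbing rather than the one who collects rock albums. The second request returns no hits, because match_phrase requires the terms to appear in the same order as in the query.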
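The difference between match, match with the and operator, and match_phrase can be reproduced on the three about fields with a few lines of Python. This is a rough sketch, not how Elasticsearch is implemented: the tokenizer below (lowercase, split on spaces) is a simplified stand-in for the standard analyzer, relevance scoring is ignored, and all function names are my own.

```python
ABOUT = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
    3: "I like to build cabinets",
}


def tokenize(text):
    """Simplified stand-in for an analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def match(query, operator="or"):
    """Return ids of docs containing any ("or") or all ("and") query terms."""
    terms = tokenize(query)
    combine = all if operator == "and" else any
    return [doc_id for doc_id, text in ABOUT.items()
            if combine(term in tokenize(text) for term in terms)]


def match_phrase(query):
    """Return ids of docs containing the query terms adjacent and in order."""
    terms = tokenize(query)
    hits = []
    for doc_id, text in ABOUT.items():
        tokens = tokenize(text)
        windows = range(len(tokens) - len(terms) + 1)
        if any(tokens[i:i + len(terms)] == terms for i in windows):
            hits.append(doc_id)
    return hits


print(match("rock climbing"))
print(match("climbing rock", operator="and"))
print(match_phrase("rock climbing"))
print(match_phrase("climbing rock"))
```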

Geo Queries

Elasticsearch also has the convenient ability to search geographic data. To make this example more concrete, let’s take real data from Google Maps. As the picture below shows, we can look up the latitude and longitude of a real-world 7-Eleven.

Before we start making geo queries, we need to map a field as type geo_point or geo_shape. In this example, I chose geo_point for demonstration.

PUT /my_locations
{
"mappings": {
"_doc": {
"properties": {
"pin": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}

Then, let’s create the documents and check them with _search.

PUT /my_locations/_doc/1
{
"pin": {
"name": "7-Eleven",
"location": {
"lat": 24.9840705,
"lon": 121.5399874
}
}
}

PUT /my_locations/_doc/2
{
"pin": {
"name": "Mos burger",
"location": {
"lat": 24.9827922,
"lon": 121.5399175
}
}
}

GET /my_locations/_search

The pin location in the query below is the location of our company. Let’s use a geo query to find the stores within 20 meters of our company.

GET /my_locations/_search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_distance": {
"distance": "20m",
"pin.location": {
"lat": 24.9841894,
"lon": 121.539891
}
}
}
}
}
}
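As a sanity check on what the geo_distance filter should return, we can compute the distances ourselves. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, which is an assumption on my side; Elasticsearch’s own distance calculation may differ by a small amount, but not enough to change which store falls inside 20 meters here.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


COMPANY = (24.9841894, 121.539891)
STORES = {
    "7-Eleven": (24.9840705, 121.5399874),
    "Mos burger": (24.9827922, 121.5399175),
}

distances = {name: haversine_m(*COMPANY, *loc) for name, loc in STORES.items()}
within_20m = [name for name, d in distances.items() if d <= 20]

for name, d in distances.items():
    print(f"{name}: {d:.1f} m")
print(within_20m)
```

Only the 7-Eleven is close enough, so that is the single document the query should return.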

Aggregations

Elasticsearch also lets you analyze the data you query. For example, let’s find the employees named “Smith” and calculate their average age. In this case, we use aggregations.

GET /megacorp/employee/_search
{
"size": 0,
"query": {
"match": {
"last_name": "Smith"
}
},
"aggs": {
"average_of_age": {
"avg": {
"field": "age"
}
}
}
}
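For our three employees, the result of this aggregation is easy to verify by hand. Here is a minimal Python sketch of the same computation, run on an in-memory copy of the data rather than on Elasticsearch:

```python
from statistics import mean

EMPLOYEES = [
    {"last_name": "Smith", "age": 25},
    {"last_name": "Smith", "age": 32},
    {"last_name": "Fir", "age": 35},
]

# "query": keep only the Smiths; "avg": average their age field.
smith_ages = [e["age"] for e in EMPLOYEES if e["last_name"] == "Smith"]
average_of_age = mean(smith_ages)
print(average_of_age)
```

The "size": 0 in the request tells Elasticsearch to return only the aggregation result and no search hits.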

Aggregations provide an ability similar to GROUP BY in SQL: they group your data and extract statistics from it. For example, let’s find the most popular interests among our employees:

GET /megacorp/employee/_search
{
"size": 0,
"aggs": {
"all_interests": {
"terms": {
"field": "interests.keyword"
}
}
}
}

In SQL, the above aggregation is similar in concept to:

-- database = megacorp
SELECT interests, COUNT(*)
FROM employee
GROUP BY interests
ORDER BY COUNT(*) DESC
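The same grouping can be sketched in Python with a Counter over an in-memory copy of the interests. Each value in the interests array is counted separately, which is what the terms aggregation does. Sorting ties alphabetically is my assumption about the default bucket order, so treat the order of equal counts as illustrative.

```python
from collections import Counter

INTERESTS = {
    "John": ["sports", "music"],
    "Jane": ["music"],
    "Douglas": ["forestry"],
}

# One count per (employee, interest) pair, like one bucket per term.
counts = Counter(i for interests in INTERESTS.values() for i in interests)

# Highest count first; ties broken alphabetically by term.
buckets = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
print(buckets)
```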

Conclusion

In this post, we got to know Elasticsearch through hands-on searching. However, these are very basic functions. For more complicated queries, we still need to consult the official documentation. The reference on the official website, linked below, is very easy to read.

You can also find the full source code in my GitHub repo. Thanks for reading, and hopefully this tutorial is helpful for you! :D
