Elasticsearch — Intro with examples

bschandramohan
Jun 24 · 4 min read

Elasticsearch is a highly scalable open-source full-text search and analytics engine. It allows you to store, search, and analyze large volumes of data quickly and in near real time.

To learn more about Elasticsearch, the official reference guide is a great place to start: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/elasticsearch-intro.html#elasticsearch-intro

Data IN

Elasticsearch uses a data structure called an inverted index, which supports very fast full-text searches on text fields. Numeric and geo fields are stored in BKD trees. Quoting from the reference guide linked above:

An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
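To make the quoted definition concrete, here is a minimal Python sketch of an inverted index. It is only an illustration of the idea, not how Lucene implements it, and the sample documents and the function name build_inverted_index are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each unique lowercase word to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return {word: sorted(ids) for word, ids in index.items()}


# Hypothetical sample documents, not taken from the article.
docs = {
    1: "grilled chicken salad",
    2: "chicken curry",
    3: "vegetable curry",
}

index = build_inverted_index(docs)
print(index["chicken"])  # ids of documents that contain "chicken"
print(index["curry"])    # ids of documents that contain "curry"
```

Looking up a word is then a single dictionary access, which is why this structure suits full-text search.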

BKD tree: quoting from a great post that explains BKD trees: https://medium.com/@nickgerleman/the-bkd-tree-da19cf9493fb

It is a special index tree structure for searching over multi-dimensional data. This data can be anything from points in physical space to colors in a very large palette.

Original whitepaper: https://users.cs.duke.edu/~pankaj/publications/papers/bkd-sstd.pdf

Every field is indexed by default, and not necessarily just as text. Elasticsearch detects and maps booleans, integers, floating-point numbers, dates, and so on, so it can be run schemaless. However, it is recommended to map key fields explicitly, especially geo_point and geo_shape fields, which aren't auto-detected.
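As a rough sketch of what mapping those fields explicitly can look like, the Python snippet below builds the JSON body for an index-creation request. The helper build_mapping and the field name delivery_area are hypothetical; only the geo_point and geo_shape type names come from the text above.

```python
import json


def build_mapping(geo_point_fields=(), geo_shape_fields=()):
    """Build an index-creation body that declares the geo fields explicitly."""
    properties = {}
    for name in geo_point_fields:
        properties[name] = {"type": "geo_point"}
    for name in geo_shape_fields:
        properties[name] = {"type": "geo_shape"}
    return {"mappings": {"properties": properties}}


# "location" and "delivery_area" are illustrative field names.
body = build_mapping(geo_point_fields=["location"], geo_shape_fields=["delivery_area"])
print(json.dumps(body, indent=2))
```

Fields that are not listed would still be picked up by dynamic mapping, so only the fields that need a specific type have to be declared.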

Data OUT/ Search

Search capabilities are built on the Apache Lucene search engine library. Elasticsearch provides a simple, coherent REST API for managing your cluster and for indexing and searching your data.

You can use Elasticsearch's comprehensive JSON-style query language (Query DSL), or you can construct SQL-style queries to search and aggregate data natively inside Elasticsearch.

For search, you can use query context or filter context, both of which are used for matching data.

A filter answers a yes/no question (does this document match?) and does not contribute to the relevance score of the match. This is explained well on this page: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-filter-context.html

Similarly, a good one to read about SQL access: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/sql-getting-started.html
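For a feel of what SQL access looks like over REST, here is a hedged Python sketch that prepares (but does not send) a request to the _sql endpoint described in that guide. The recipe index and its name and rating fields are assumed from the examples later in this post, and the host and port are the local defaults.

```python
import json
import urllib.request


def build_sql_request(sql, base_url="http://localhost:9200"):
    """Prepare a POST to the SQL endpoint; nothing is sent until urlopen() is called."""
    payload = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_sql?format=txt",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumes a "recipe" index with "name" and "rating" fields.
request = build_sql_request("SELECT name, rating FROM recipe WHERE rating >= 8")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

With a local node running, passing the request to urllib.request.urlopen should return the result as a text table.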

Indexing and Trying search

The easiest way to install on macOS is with Homebrew: https://www.elastic.co/guide/en/elasticsearch/reference/7.1/brew.html

brew tap elastic/tap
brew install elastic/tap/elasticsearch-full
elasticsearch

If you have httpie installed (if not, you can install it with Homebrew: brew install httpie), you can just run:

  1. http localhost:9200/_cat // lists the available _cat endpoints. Note that specifying localhost is optional when using httpie

Creating an index to hold the data to search:

  1. http PUT :9200/recipe // creates the recipe index

Get/Searching data:

  1. http GET :9200/recipe/_search OR
    http GET :9200/recipe/_search q=='*' // Returns all recipes as part of the hits.hits array (defaults to the first 10 documents)
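Since the matching documents come back nested under hits.hits, a small helper can pull out just the stored documents. This Python sketch works on a trimmed, hypothetical response; the recipe contents are invented for illustration, and a real response carries more metadata.

```python
def extract_sources(response):
    """Return the _source of every hit in a search response."""
    return [hit["_source"] for hit in response.get("hits", {}).get("hits", [])]


# Hypothetical, trimmed search response (real ones include more fields).
sample_response = {
    "hits": {
        "total": {"value": 2, "relation": "eq"},
        "hits": [
            {"_index": "recipe", "_id": "1", "_score": 1.0,
             "_source": {"name": "chicken curry", "rating": 10}},
            {"_index": "recipe", "_id": "2", "_score": 1.0,
             "_source": {"name": "vegetable soup", "rating": 7}},
        ],
    }
}

for recipe in extract_sources(sample_response):
    print(recipe["name"], recipe["rating"])
```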

Filters should be used:

  1. for binary yes/no searches
  2. for queries on exact values

Queries should be used instead of filters:

  1. for full-text search
  2. where the result depends on a relevance score
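The difference between the two lists above can be sketched in a few lines of Python. This is a toy model only: the term-frequency score is a stand-in chosen for simplicity and is not the scoring Lucene actually uses, and the recipes are invented.

```python
def filter_match(doc, field, value):
    """Filter context: a plain yes/no answer, no score involved."""
    return doc.get(field) == value


def query_score(doc, field, term):
    """Query context (toy version): fraction of words in the field equal to the term."""
    words = str(doc.get(field, "")).lower().split()
    return words.count(term.lower()) / len(words) if words else 0.0


# Hypothetical recipes for illustration.
recipes = [
    {"name": "roast chicken with vegetables and rice", "rating": 7},
    {"name": "chicken curry", "rating": 10},
    {"name": "vegetable curry", "rating": 10},
]

# Exact-value check: each document is either in or out.
top_rated = [r["name"] for r in recipes if filter_match(r, "rating", 10)]

# Full-text check: matching documents are ordered by how well they match.
matches = [r for r in recipes if query_score(r, "name", "chicken") > 0]
ranked = sorted(matches, key=lambda r: query_score(r, "name", "chicken"), reverse=True)

print(top_rated)
print([r["name"] for r in ranked])
```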

Filter Example:

http GET :9200/recipe/_search query:='{"bool": { "must": { "match" : { "name": "chicken"} }, "filter": { "match": {"rating" : 10} } } }'
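If you build the same request from a program rather than the command line, the body can be assembled as a plain dictionary. The sketch below mirrors the command above; the helper name bool_query is made up for this example.

```python
import json


def bool_query(must, filter_clause):
    """Combine a scored clause (must) with an unscored clause (filter)."""
    return {"query": {"bool": {"must": must, "filter": filter_clause}}}


body = bool_query(
    must={"match": {"name": "chicken"}},      # query context: contributes to the score
    filter_clause={"match": {"rating": 10}},  # filter context: yes/no only
)
print(json.dumps(body, indent=2))
```

Keeping the scored and unscored parts as separate arguments makes it harder to put an exact-value check into the scored part by accident.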

GeoSpatial Search:

If you want to query a geolocation point, an exact match depends on the precision of both the stored point and the point being queried. Latitude and longitude can have 12 digits of accuracy, and typically we end up using 6 decimal places (as also shown in Google Maps: click on a point, then "What's here?" to get the lat/lon listed).

Elasticsearch provides mechanisms to search for a geolocation within a given distance (a radius) or a bounding box around a point, so nearby points also match. Let's try an example:
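To get a feel for what those decimal places mean on the ground, this Python sketch estimates the north-south distance covered by one unit in the last decimal place of a latitude. It assumes a spherical Earth with a mean radius of 6,371 km, so the figures are approximations.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def latitude_resolution_m(decimal_places):
    """Approximate north-south distance of one unit in the last decimal place."""
    degrees = 10 ** -decimal_places
    return math.radians(degrees) * EARTH_RADIUS_M


for places in (3, 6):
    print(places, "decimal places ~", round(latitude_resolution_m(places), 3), "m")
```

Under these assumptions, six decimal places resolve to roughly a tenth of a meter, which is why two readings of the same spot rarely match exactly.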

http PUT :9200/restaurant mappings:='{
"properties": {
"location": {
"type": "geo_point"
}
}
}'
http PUT :9200/restaurant/_doc/1 name="Sireesha's Restaurant" location:='{"lat": 37.331910, "lon": -122.023217}'

You can search for a location without it matching exactly, like below:

http GET :9200/restaurant/_search query:='{
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "1km",
"location" : {
"lat" : 37.33195,
"lon" : -122.02322
}
}
}
}
}'

The above will return the item we saved, since its location is within 1 km of the queried point.

Just for your info, this wouldn't have worked if the field hadn't been mapped as a geo_point field (the first step, defining the mappings).
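As a sanity check on why the document matches, the distance between the stored point and the queried point can be estimated with the haversine formula. This Python sketch assumes a spherical Earth and only approximates the calculation Elasticsearch performs internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Stored restaurant location and the queried point from the example above.
distance = haversine_m(37.331910, -122.023217, 37.33195, -122.02322)
print(f"{distance:.2f} m apart, within 1 km: {distance <= 1000}")
```

The two points are only a few meters apart, far inside the 1 km radius used in the query.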

The error returned is:

"root_cause": [
{
"index": "restaurant",
"index_uuid": "LXCOASGwTx2WOSo4VOFISw",
"reason": "failed to find geo_point field [location]",
"type": "query_shard_exception"
}
],
"type": "search_phase_execution_exception"

TechieConnect

My technical posts mostly on Java backend services

bschandramohan

Written by

Software Engineer working on Java based Micro-services deployed on AWS cloud. California, US resident currently — born and brought up in Bangalore, India.
