Fundamentals of Elasticsearch

Part 1 — Creating data iteratively

Introduction

I have been using Elasticsearch for a while now and this lesson will cover the fundamentals of the Elasticsearch search engine.

I have taken an iterative approach that mirrors the problems I faced when setting up an ELKB stack for internal system monitoring.

Elasticsearch has been around for a few years and is quite mature. However, it suffers from poor documentation and major versions introduce some breaking changes, which render older tutorials and StackOverflow posts obsolete. This tutorial can serve as a baseline for the current 6.1.1 release of Elasticsearch and I hope to re-test against subsequent versions for any breaking changes.

We will execute quite a few scenarios including creating and updating indices, updating mapping and running a geospatial search.

And I’ll be using my favorite cartoon series, Transformers, to play with the data.

We will run our scenarios on a standalone Elasticsearch 6.1.1 instance running on Ubuntu. In case of any problems, please let me know in the comments and I’ll fix them ASAP.

What this is not

This is not a comparison of different search frameworks or a discussion on whether it makes more sense to implement search in your current database such as PostgreSQL or MongoDB.

Prerequisites

You have some basic knowledge of making HTTP URI requests and are familiar with the concept of databases and indexes in general.

I will be running the scenarios using:

  • Elasticsearch 6.1.1 standalone version on the Windows Subsystem for Linux (Ubuntu Bash)
  • curl

You can run Elasticsearch standalone on Windows or install it as a service. You can also use the tool Postman if you like GUIs.


What is Elasticsearch

Elasticsearch is a search engine that is built around the Apache Lucene project. Lucene is built around a data structure called the inverted index, which is optimized for search.

Since Lucene itself is hard to work with, other projects were built around it to make it easier to ingest (write) data and to retrieve (find) data. Solr is another such project.

Elastic.co owns the Elasticsearch product and it also provides other products that complement it including:

  • Beats: Small lightweight processes which collect various telemetry data from servers such as RAM, CPU & network usage and Apache & NGINX logs
  • Logstash: A service that collects data from multiple sources such as Beats instances, processes and changes that data and passes it on to an Elasticsearch server
  • Kibana: Web-based visualization for Elasticsearch data

Start Elastic

Unzip the contents and change directory:
unzip elasticsearch-6.1.1.zip && cd elasticsearch-6.1.1

For good housekeeping, let’s give our instance a proper name and specify its data and log directories.

Elasticsearch is configured with command-line parameters or via its YAML configuration files. YAML is a terse and easy-to-read data format:

“(YAML) structure is shown through indentation (one or more spaces). Sequence items are denoted by a dash, and key value pairs within a map are separated by a colon.” (source)

So let’s open config/elasticsearch.yml, then uncomment and change the following lines (you can use your own names):

cluster.name: cybertron-cluster
node.name: cybertron-node-1
path.data: ./data
path.logs: ./logs

Open a terminal and start Elasticsearch with the command:

bin/elasticsearch

This will create the directories elasticsearch-6.1.1/data and elasticsearch-6.1.1/logs and start the engine. If your instance is running properly, you should see logs like the following on the terminal:

[2017-12-28T11:29:23,282][INFO ][o.e.h.n.Netty4HttpServerTransport] [cybertron-node-1] publish_address {127.0.0.1:9200}, bound_addresses {[::1]:9200}, {127.0.0.1:9200}
[2017-12-28T11:29:23,284][INFO ][o.e.n.Node ] [cybertron-node-1] started
[2017-12-28T11:29:23,355][INFO ][o.e.g.GatewayService ] [cybertron-node-1] recovered [0] indices into cluster_state

This indicates that we have 1 node called cybertron-node-1 that is listening on port 9200 on the local interface and it has found 0 existing indices (since this is its first run). You should now be ready to start running queries and analyzing results.

One of my favorite parts of Elasticsearch is that you can execute all search operations, data CRUD and system analysis through HTTP REST calls.

So let’s call the default endpoint and see if it’s all good

curl localhost:9200

This returns us the following response:

{
"name" : "cybertron-node-1",
"cluster_name" : "cybertron-cluster",
"cluster_uuid" : "rxF850LBSuuqBCT8wGQOXA",
"version" : {
"number" : "6.1.1",
"build_hash" : "bd92e7f",
"build_date" : "2017-12-17T20:23:25.338Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
},
"tagline" : "You Know, for Search"
}

This is kind of like the --version of Elasticsearch. You can see that the version of Elasticsearch is 6.1.1 and that it uses Lucene 7.1.0 as its brains.

Analyzing system health

Before we actually go ahead and play with data, let’s look at what’s happening under the hood.

Elasticsearch centralizes around the concept of a cluster. A cluster is a collection of one or more Elasticsearch instances, or nodes (like the one we ran above), which coordinate between each other to create, manage and serve data. The engine also keeps track of its own health and makes that information available to us.

First, let’s look at the cluster state

curl localhost:9200/_cluster/state?pretty

We will append ?pretty at the end of all subsequent requests for indented, readable JSON; the default JSON response is unindented (the root localhost:9200 endpoint is an exception and is pretty-printed by default).

This returns a JSON response — the interesting part is:

..
"nodes" : {
"FopioI8WQXCZbNQrZ2jpSQ" : {
"name" : "cybertron-node-1",
"ephemeral_id" : "w7ctNxkxTm6HVQn6np6-DA",
"transport_address" : "127.0.0.1:9300",
"attributes" : { }
}
},
..

This indicates that we have 1 node running in the cluster. Notice the port number 9300. This is not the port that we use to access the cluster but rather the transport port that Elasticsearch nodes use to communicate with each other within the cluster.
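
If you prefer the terser _cat APIs, the following lists the cluster’s nodes in tabular form:

curl 'localhost:9200/_cat/nodes?v'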

Another way to check system health is to check the _cluster/health endpoint:

curl localhost:9200/_cluster/health?pretty
{
"cluster_name" : "cybertron-cluster",
"status" : "green",
...
"number_of_nodes" : 1,
"number_of_data_nodes" : 1,
"active_primary_shards" : 0,
...
}

This indicates that:

  • We have a “green” or healthy cluster which does not contain any data.
  • There is 1 node that is designated to hold data (it’s empty right now)
  • There are 0 active primary shards. A shard is a horizontal partition of an index’s data; an index is split into shards so that its data can be distributed (and replicated) across nodes. A sketch of how shard and replica counts are set follows this list.
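
We won’t need to change these settings in this lesson, but as a quick sketch of where they live: shard and replica counts are index settings, supplied when an index is created. The test-index name below is just a hypothetical example:

curl -XPUT 'localhost:9200/test-index?pretty' -H 'Content-Type: application/json' -d'
{
"settings" : {
"number_of_shards" : 3,
"number_of_replicas" : 1
}
}
'

Delete it right away so the cluster stays empty for the next steps:

curl -XDELETE 'localhost:9200/test-index?pretty'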

These results will change soon as we add some data so let’s go…

Creating and updating new documents

As per Elasticsearch (I’ll shorten it to ES from now on), “a document is a basic unit of information that can be indexed”. Each index also has a type, with documents of a similar kind belonging to the same type, e.g. we could have a vehicle index with a car type and a truck type, with common attributes in each and some extended attributes in either. If you know your OOP, this sounds like inheritance.

This is how ES used to work before version 6.0.0, but we will try the same approach anyway, since many ES production instances still run older versions and quite a few of those will be looking to migrate.

Let’s look at all the indexes that currently exist by making an HTTP/REST call to the ES endpoint that’s listening for user commands.

curl localhost:9200/_cat/indices?v
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size

This shows that there are currently 0 indexes in ES and we have a clean system.

Note: All the text in normal font is the command I execute and italics is the system response — in order to try out the commands yourself, copy/paste the lines in normal font

Let’s add some data. I want to create a search app around Transformers, who are a race of alien transforming robots. They are broadly split into the good Transformers called Autobots and the bad Transformers called Decepticons.

So I would imagine that it would be nice to have an index called transformers and a type autobots and another type decepticons.

Now, we could create an index, define its mapping (its type, field data types and other ES-related metadata) and then add data, but the nice thing about ES is that you can just write the data to an index and type and it will infer the field types from the data itself. This is called dynamic mapping. It is a really nice feature but it can also lead to problems, as we will see later.

So let’s add data and let ES figure out the mapping or data types

curl -XPOST 'localhost:9200/transformers/autobots?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Optimus Prime",
"transforms_to" : "Freightliner FL86"
}'
{
"
_index" : "transformers",
"
_type" : "autobots",
"_id" : "qM5gomABYvQwn70j0LuL",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}

So you can see above that I told ES to create an index called transformers and a type called autobots, and it responded with metadata confirming the document’s creation. The _id field is a unique identifier of this document and it will be different for you.

Notice that I specified a JSON document with 2 scalar string fields: one is the name of the leader of the Autobots, Optimus Prime, and the other is the transforms_to field.

Optimus Prime transformation

Also note that I used the ?pretty modifier at the end of the command; this ensures that the response returned to us is pretty-printed.

Now let’s create a new transformer of type decepticons

curl -XPOST 'localhost:9200/transformers/decepticons?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Megatron",
"transforms_to" : "Walther P38"
}'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "Rejecting mapping update to [transformers] as the final mapping would have more than 1 type: [autobots, decepticons]"
...
"status" : 400
}

Oops, what happened here? ES returned an error with code 400, which implies “The request could not be understood by the server due to malformed syntax”. And we can see the reason: “Rejecting mapping update to [transformers] as the final mapping would have more than 1 type”.

So the same index cannot have more than 1 type. This restriction is actually new to Elasticsearch and was introduced in version 6.0.0; unfortunately there is still a lot of stale documentation and Q&A out there, including on the ES site itself, where this is not clearly spelled out, so keep it in mind. I hope ES removes types altogether since they no longer seem to serve any purpose, but this may require heavy internal refactoring.
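
You can confirm this by asking for the current mapping of the transformers index; at this point it should list only the single autobots type we created above:

curl localhost:9200/transformers/_mapping?pretty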

So let’s start over and delete the current index

curl -XDELETE  'localhost:9200/transformers?pretty'
{
"acknowledged" : true
}

We have 2 choices now

  • Create an index named transformers and type all and use a field attribute called side with the value of either decepticons or autobots
/transformers/all
{
"name" : "Optimus Prime",
"side": "autobots",
"transforms_to" : "Freightliner FL86"
},
{
"name" : "Megatron",
"side": "decepticons",
"transforms_to" : "Walther P38"
}
  • OR Create 2 indices called autobots and decepticons with type all
/autobots/all
{
"name" : "Optimus Prime",
"transforms_to" : "Freightliner FL86"
}
/decepticons/all
{
"name" : "Megatron",
"transforms_to" : "Walther P38"
}

This is a subjective choice. I prefer the latter because it seems more DRY (don’t repeat yourself) and we store less data per document, so let’s execute that.

Note: As with any other database, spend time studying the internals of Elasticsearch, since a well-designed data model goes a long way in implementing a good search product.

Let’s also add a field of type date so we can see ES dynamic mapping more clearly.

Let’s create an autobots index (with type all, which is really just a placeholder)

curl -XPOST 'localhost:9200/autobots/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Freightliner FL86"
}
'

and a decepticons index

curl -XPOST 'localhost:9200/decepticons/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Megatron",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Walther P38"
}
'
Megatron can change into a modified Walther P38

Now let’s check if the indices are in the system

curl localhost:9200/_cat/indices?v
health status index       uuid
yellow open decepticons CGsMpbBJSVepQhYgbRiDkA ...
yellow open autobots F5IJK644SIO_0GQAVVKsNA ...

I have truncated the response but this shows that there are 2 indices in the cluster.

The health indicator is yellow. Remember, earlier in the lesson we saw that the cluster health was green. The health hasn’t really eroded; yellow means the data is there but is not yet redundant (before, there was no data at all, so there was nothing to protect). It will turn green once the cluster holds backup copies of the data, called replicas, which requires more than one node. We won’t set up replicas here since that is out of scope for this lesson.
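
As an aside, and purely as a sketch: if you want a single-node cluster like ours to report green, you can tell an index not to expect any replicas at all via the index settings API. We don’t need this for the lesson.

curl -XPUT 'localhost:9200/autobots/_settings?pretty' -H 'Content-Type: application/json' -d'
{
"index" : { "number_of_replicas" : 0 }
}
'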

Now let’s search for the data itself. For this, we call the _search endpoint. We can do it in different ways, e.g. the first request below searches an entire index while the second searches a specific type within that index (redundant now that an index can only have one type, but the form still exists)

curl localhost:9200/autobots/_search?pretty=true
curl localhost:9200/autobots/all/_search?pretty=true

And this query searches all indices:

curl localhost:9200/_search?pretty=true

This returns a very rich result set — some of the important fields for us are:

{
"took" : 1,
"timed_out" : false,
...
"hits" : {
"total" : 2,
"max_score" : 1.0,
"hits" : [
{
"_index" : "decepticons",
"_type" : "all",
"_id" : "q858omABYvQwn70jobtG",
"_score" : 1.0,
"_source" : {
"name" : "Megatron",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Walther P38"
}
},
{
"_index" : "autobots",
"_type" : "all",
"_id" : "rM59omABYvQwn70jGbtm",
"_score" : 1.0,
"_source" : {
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Freightliner FL86"
}
}
]
}
}

The _source subdocument actually contains the document itself — the rest is meta-data about the search

  • _index: Name of the index the data belongs to
  • took: time in milliseconds to execute the search.
  • hits.total: total number of documents that matched the search
  • _score: how relevant this result is to the query; we didn’t specify a search query here, so every document gets the same default score of 1.0 (see the quick example after this list)
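
To see _score actually vary, you can pass a query directly in the URL (a lightweight form of the query string query); only matching documents come back, each with its relevance score:

curl 'localhost:9200/autobots/_search?q=name:optimus&pretty'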

Now let’s look at the fields. We know that Elasticsearch can dynamically create mappings, i.e. assign data types to fields by inferring them from the data format, so let’s see how good it is…

curl localhost:9200/autobots/_mapping?pretty
{
"autobots" : {
"mappings" : {
"all" : {
"properties" : {
"earth_arrival" : {
"type" : "date"
},
"name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"transforms_to" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
...

So we can see that, using dynamic mapping, ES figured out that the string data formatted as "1986-12-29T14:12:12" was actually a date, and it stored it as a date type field and not a string type field. That means we can do date-related operations here, like querying for all documents whose earth_arrival falls within December 1986.
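
As a sketch, a range query against the earth_arrival field (with illustrative date bounds) would look like this and should match the documents we added:

curl -XGET 'localhost:9200/autobots/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query" : {
"range" : {
"earth_arrival" : {
"gte" : "1986-12-01",
"lt" : "1987-01-01"
}
}
}
}
'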

Another interesting thing to note, again introduced in ES 5.0, is how string fields are typed. You can see that each one actually resolved to 2 types, text and keyword. Previously, people had problems deciding how to store string data.

In Elasticsearch, string data can be used for free-form search (e.g. search for the word “Optimus” in the name field in all documents in the autobots index) OR it can be used for exact matching and aggregations (e.g. group and count all documents which have “firearm” in the transforms_to field).

It was tricky to define mappings for one or the other, so by default ES creates both types of mappings for string fields: an analyzed text mapping and a keyword sub-field. This is not something to worry about at this stage, but it is useful to remember once you start optimizing your data. You can read more about this in the Elasticsearch documentation on text and keyword types.
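
As a rough sketch of the difference, using the data we have so far: the first request below does a full-text match against the analyzed name field, while the second aggregates on the exact values of the transforms_to.keyword sub-field.

curl -XGET 'localhost:9200/autobots/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query" : { "match" : { "name" : "optimus" } }
}
'
curl -XGET 'localhost:9200/autobots/_search?pretty' -H 'Content-Type: application/json' -d'
{
"size" : 0,
"aggs" : {
"by_vehicle" : { "terms" : { "field" : "transforms_to.keyword" } }
}
}
'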

OK, so let’s add some more data

curl -XPOST 'localhost:9200/autobots/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Bumblebee",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Volkswagen Beetle"
}
'
curl -XPOST 'localhost:9200/decepticons/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Starscream",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "F-15 Eagle"
}
'
Bumblebee and Starscream

Let’s see if the data is there; we will search both indexes separately

curl localhost:9200/autobots/all/_search?pretty=true
curl localhost:9200/decepticons/all/_search?pretty=true

Oh, I just remembered, some Transformers can transform to more than 1 object

So let’s change the field transforms_to from a string field to an array of strings

Locate the _id of a document; we will use the Optimus Prime entry

curl localhost:9200/autobots/_search?pretty
{
"took" : 2,
...
{
"_index" : "autobots",
"_type" : "all",
"_id" : "
eBkzsWAB-JHRd1260ZjV",
        "_source" : {
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : "Freightliner FL86"
}
...
}

And we pass this _id into the following REST call — this will be different for you

curl -XPOST 'localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV/_update?pretty' -H 'Content-Type: application/json' -d'
{
"doc" : {
"transforms_to" : ["Freightliner FL86", "Cybertronian truck", "COBRA Sentry & Missile System tank"]
}
}
'

Notice that we are calling the _update endpoint — if you read the ES documentation and it mentions the Update API, this is what they are talking about.

Let’s take a look at the document again — and this time we can see that the field is now an array

curl localhost:9200/autobots/_search?pretty
{
"took" : 1,
...
"_source" : {
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" :
[
"Freightliner FL86",
"Cybertronian truck",
"COBRA Sentry & Missile System tank"
]

}
}
]
}
}

Now, let’s add another element to the array

curl -XPOST 'localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV/_update?pretty' -H 'Content-Type: application/json' -d'
{
"script" : {
"source": "ctx._source.transforms_to.add(params.more_transforms)",
"lang": "painless",
"params" : {
"more_transforms" : "Peterbilt Truck"
}
}
}
'

Here, we used the _update API to execute a script in ES’s own scripting language, called Painless. We told it to take the value of the parameter params.more_transforms and append it to the transforms_to array of the document addressed by the URL; ctx._source refers to the _source of the document in the current update context.

And we search again and we get the additional element in the array

curl localhost:9200/autobots/_search?pretty
{
"took" : 1,
...
"_source" : {
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : [
"Freightliner FL86",
"Cybertronian truck",
"COBRA Sentry & Missile System tank",
"Peterbilt Truck"
]
}
...

When dynamic mapping becomes a problem..

So, now we have a new idea — we want to plot the Transformers on a map — we have the location in the form of latitude and longitude

So using the _update API, let’s add a location to one of the documents

curl -XPOST 'localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV/_update?pretty' -H 'Content-Type: application/json' -d'
{
"doc" : {
"location" : [73.06, 33.69]
}
}
'

In ES, we can define location in 2 ways:

  • As a JSON object:
"location": {
"lat": 33.69,
"lon": 73.06
}
  • Or as a GeoJSON-style point, expressed as an ordered array of 2 elements, where the first element is the longitude and the second is the latitude:
"location": [73.06, 33.69]

We are using the latter above

Let’s look at the data:

curl localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV?pretty

Or we can look at just the document body using the _source endpoint, like below

curl localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV/_source?pretty
{
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : [
"Freightliner FL86",
"Cybertronian truck",
"COBRA Sentry & Missile System tank",
"Peterbilt Truck"
],
"location" : [
73.06,
33.69
]
}

And it looks good — but there is a hidden problem

Let’s do a geolocation based search — let’s search for all autobots in a 500 km radius of a certain point. Since Optimus is within that area, we should see him there

This query style is a topic for another lesson, but essentially we are telling ES to search for all documents whose location field is within a 500 km radius of the specified lat/lon point

curl -XGET 'localhost:9200/autobots/all/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "500km",
"location" : {
"lat" : 32,
"lon" : 73
}
}
}
}
}
}
'
{
"error" : {
"root_cause" : [
{
"type" : "query_shard_exception",
"reason" : "field [location] is not a geo_point field",
"index_uuid" : "vEGUAZ5cSSOWUsAQxf_nUQ",

Oops, ES doesn’t like this. It expects the location field to be of type geo_point and not a pair of floats…

Let’s see what’s going on inside by looking at the mapping for this index

curl localhost:9200/autobots/_mapping?pretty
...
"type" : "date"
},
"location" : {
"type" : "float"
},
"name" : {
"type" : "text",
...

Oh, OK — so it’s saved as a float.

So ES read the input data, saw 2 floating-point numbers in an array and assumed we were trying to insert an array of floats; remember, this is dynamic mapping at work.

So, we would like to change the mapping — thankfully there is an API for that…

So let’s send it a mapping change request

curl -XPUT 'localhost:9200/autobots/all/_mapping?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"location" : { "type": "geo_point" }
}
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "mapper [location] of different type, current_type [float], merged_type [geo_point]"
...
"status" : 400
}

Ouch, ES doesn’t like it. It turns out ES doesn’t let you change the mapping of an existing field, in other words its data type and associated metadata, once it has been defined, whether dynamically or manually by yourself. This is actually one of the reasons why searches are so fast.

Hmmm… you know, we could delete this field and then define the new mapping and then re-add the data, that should work…

curl 'localhost:9200/autobots/all/eBkzsWAB-JHRd1260ZjV/_update?pretty' -H 'Content-Type: application/json' -d'
{
"script" : "ctx._source.remove(\"location\")"
}
'

Let’s try adding the new mapping again (the Elasticsearch documentation has a list of valid field data types)

curl -XPUT 'localhost:9200/autobots/all/_mapping?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"location" : { "type": "geo_point" }
}
}
'
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "mapper [location] of different type, current_type [float], merged_type [geo_point]"
...

Ouch, ES remembers the deleted field: removing it from the document does not remove it from the index’s mapping

So the only option is to either create a new field with a new name, or create a new index, define the mapping beforehand and then re-add the data
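
With only a handful of documents, deleting and re-inserting by hand is fine. On a real dataset you would more likely create the new index with the correct mapping first and then copy the documents across with the _reindex API. A minimal sketch, assuming a hypothetical autobots_v2 index with the geo_point mapping already exists:

curl -XPOST 'localhost:9200/_reindex?pretty' -H 'Content-Type: application/json' -d'
{
"source" : { "index" : "autobots" },
"dest" : { "index" : "autobots_v2" }
}
'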

Let’s start with a clean slate and create a new index

Let’s delete the index

curl -XDELETE localhost:9200/autobots?pretty

Let’s create the index and then define a mapping

curl -XPUT localhost:9200/autobots
curl -XPUT 'localhost:9200/autobots/all/_mapping?pretty' -H 'Content-Type: application/json' -d'
{
"properties": {
"location" : { "type": "geo_point" }
}
}
'

Or… we could do it together in one go

curl -XPUT 'localhost:9200/autobots?pretty' -H 'Content-Type: application/json' -d'
{
"mappings" : {
"all" : {
"properties" : {
"location" : { "type" : "geo_point" }
}
}
}
}
'

Let’s confirm the index exists

curl localhost:9200/_cat/indices?v

Let’s add some data using both types of location formats. I want to see a Transformer on a map; plotting the results with icons should show them at their respective locations.

curl -XPOST 'localhost:9200/autobots/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Optimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : ["Freightliner FL86", "Cybertronian truck", "COBRA Sentry & Missile System tank"],
"location" : [73.06, 33.69]
}
'
curl -XPOST 'localhost:9200/autobots/all?pretty' -H 'Content-Type: application/json' -d'
{
"name" : "Rodimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : ["Dome Zero"],
"location" : {"lat": 31.482081, "lon": 74.336324}
}
'

Let’s confirm it was saved

curl localhost:9200/autobots/all/_search?pretty

And the mapping is correct too…

curl localhost:9200/autobots/_mapping?pretty
{
"autobots" : {
"mappings" : {
"all" : {
...
"location" : {
"type" : "geo_point"
},
...

Great, we have a geo_point

Let’s do a distance search…

curl -XGET 'localhost:9200/autobots/all/_search?pretty' -H 'Content-Type: application/json' -d'
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "150km",
"location" : {
"lat" : 32,
"lon" : 73
}
}
}
}
}
}
'
{
"took" : 6,
...
"_source" : {
"name" : "Rodimus Prime",
"earth_arrival" : "1986-12-29T14:12:12",
"transforms_to" : [
"Dome Zero"
],
"location" : {
"lat" : 31.482081,
"lon" : 74.336324
}
...

And it worked!

Even though we have only 2 documents, this will scale to millions of documents and still provide super fast text and geospatial search

Elasticsearch saves the day!

Conclusion

I expected this lesson to run for only 2 or 3 minutes but it turned out much longer. Elasticsearch is a very feature-rich and complicated engine, and the only reason it’s not as popular among developers as it should be is its high barrier to entry.

If you are new to search and need a quick SaaS solution, my top recommendation is Algolia. Even if you do end up setting up your own Elasticsearch cluster in your corporate intranet or for your startup, Algolia helps you implement search use cases really well.

But Elasticsearch is my recommendation if you mainly need to:

  • set up your own search engine as part of an independently deployable product stack, or
  • build search for a company-internal project, or
  • keep your data within your own data center, or
  • use advanced search capabilities that Algolia doesn’t provide

At the end of the day, as a developer, knowing a good search engine well is a very useful addition to your toolbox.

Images reused under fair use.
