Can Elasticsearch Store Data?

Spoiler alert: it can, and I’ll show you how. But it’s not always advisable.


Do you want a terrific search engine that stores your data too? Elasticsearch will happily do it, even though some insist that it’s not a document store, let alone a data store!

Don’t listen to them. Elasticsearch is capable and reliable, and it will store your data as well as make it searchable. This is advice from someone who’s been using Elasticsearch for close to ten years.

There are two types of data you might want to store in Elasticsearch:

  1. Your JSON documents, containing numbers, lists, text, geo coordinates, and all the other formats Elasticsearch supports.
  2. Binary data

Let’s look at both of these in detail.

Storing JSON data in Elasticsearch

Let’s take a look at one of the example documents from my tutorial on Elasticsearch. This is part of the result of a search query I ran against the twitter index:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "3Vko724Bb6gkOIFPeQcs",
  "_source" : {
    "username" : "eriky",
    "post_date" : "2019-12-08T14:10:12",
    "message" : "I wrote an article on Elasticsearch"
  }
}

As you can see, the original document comes back in full inside the _source field.

If your documents are stored externally too and you use Elasticsearch purely for its indexing power, you can opt to disable the _source field. There are several reasons why you probably don’t want to disable this though:

  1. You can use the re-indexing features Elasticsearch offers. Since it has the source document at hand, it will happily re-index your data, e.g. to accommodate a change in your mapping or to consolidate multiple indexes into one.
  2. The update-by-query API only works if there is a _source document.
  3. You can use Elasticsearch’s on-the-fly highlighting feature, which works great for text search.
  4. Debugging your queries is much easier when you can look directly at the documents a query returns.
  5. Possibly in future versions: the ability to repair index corruption automatically.
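To make the first point a bit more concrete, here is a minimal Python sketch of the request body you would send to the _reindex API. The index names twitter_archive and twitter_v2 and the helper name build_reindex_body are made up for illustration, and the sketch only builds the JSON body; it doesn’t talk to a cluster.

```python
import json
from typing import List


def build_reindex_body(source_indexes: List[str], dest_index: str) -> dict:
    """Build the JSON body for a POST _reindex request.

    Passing more than one source index consolidates them into dest_index.
    """
    return {
        "source": {"index": source_indexes},
        "dest": {"index": dest_index},
    }


# Hypothetical index names: two old indexes consolidated into one new index.
body = build_reindex_body(["twitter", "twitter_archive"], "twitter_v2")
print(json.dumps(body, indent=2))
```

You would create the destination index with the new mapping first and then send this body. Elasticsearch reads every document back from _source to do the copy, which is why re-indexing isn’t available once _source is disabled.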

So with a bit of overhead in terms of storage space, you get many features in return. Unless storage is really a concern, it’s a no-brainer: you’ll want to store the source document!
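If storage does turn out to be the deciding factor, this is roughly what switching _source off looks like. It’s a minimal Python sketch that builds the body for creating an index. The field names come from the example document above, the date type for post_date and the helper name build_index_body are my assumptions, and nothing is sent to a cluster.

```python
import json


def build_index_body(store_source: bool = True) -> dict:
    """Build a create-index body that keeps or drops the _source field."""
    return {
        "mappings": {
            # "enabled": False tells Elasticsearch not to store the original JSON.
            "_source": {"enabled": store_source},
            "properties": {
                "username": {"type": "text"},
                "post_date": {"type": "date"},  # assumed type for the timestamp
                "message": {"type": "text"},
            },
        }
    }


print(json.dumps(build_index_body(store_source=False), indent=2))
```

The default in the sketch is to keep _source, which matches the advice above: only flip the switch if you have the documents stored elsewhere and can live without the features listed.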

Storing binary data in Elasticsearch

You can do this too. Elasticsearch has a binary field type, which you can use in your mapping. An example of such a mapping could be:

{
  "mappings": {
    "properties": {
      "username": {
        "type": "text"
      },
      "message": {
        "type": "text"
      },
      "profile_image": {
        "type": "binary"
      }
    }
  }
}

When putting data into such a field, you’ll need to base64 encode it:

{
  "username" : "eriky",
  "message" : "I wrote an article on Elasticsearch",
  "profile_image": "U29tZSBiaW5hcnkgYmxvYg=="
}

Don’t bother decoding it; it’s not an image. But hopefully you get the idea.
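Here is a minimal Python sketch of the encoding step, using only the standard library. The helper names are hypothetical, and the placeholder bytes are simply what the string in the example document above decodes to. In a real project you would read the bytes from a file or a sensor instead.

```python
import base64
import json


def encode_binary_field(raw: bytes) -> str:
    """Base64-encode raw bytes so they fit in a JSON string for a binary field."""
    return base64.b64encode(raw).decode("ascii")


def decode_binary_field(encoded: str) -> bytes:
    """Turn the base64 string from a retrieved document back into raw bytes."""
    return base64.b64decode(encoded)


raw = b"Some binary blob"  # placeholder payload, not a real image

doc = {
    "username": "eriky",
    "message": "I wrote an article on Elasticsearch",
    "profile_image": encode_binary_field(raw),
}

print(json.dumps(doc, indent=2))
```

The resulting doc is what you would index. When you read the document back, decode_binary_field(doc["profile_image"]) gives you the original bytes again.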

I wouldn’t use this for big files, but it’s not a bad idea for small files, especially if it reduces the technology stack you need for your project.

You might be wondering if people are actually doing this. The answer is yes. I’ve worked with clients that store close to a billion documents, each including a small (around 200 bytes) piece of binary sensor data. Again, I would not recommend it for large files, e.g. anything beyond a few hundred kilobytes. Please leave a comment if you do!
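One thing to keep in mind when sizing this up: base64 turns every 3 bytes into 4 characters, so the encoded value is about a third larger than the raw data. The small Python sketch below does the arithmetic. The function name is made up, and 300,000 bytes is just my stand-in for “a few hundred kilobytes”.

```python
def base64_encoded_size(raw_size: int) -> int:
    """Length in bytes of the padded base64 encoding of raw_size bytes."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((raw_size + 2) // 3)


for raw_size in (200, 300_000):
    encoded = base64_encoded_size(raw_size)
    ratio = encoded / raw_size
    print(f"{raw_size} bytes -> {encoded} bytes as base64 ({ratio:.2f}x)")
```

A 200-byte sensor reading grows to 268 bytes, which is nothing to worry about, while a 300,000-byte file becomes 400,000 bytes in every request. This only covers the encoded string itself; what it ends up costing on disk also depends on Elasticsearch’s compression, which the sketch ignores.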

Conclusion

Tech Explained

Understandable, practical and useful explanations of technology

Written by Erik-Jan van Baaren

A writer at heart and software/data engineer by profession. Subscribe to my low-volume newsletter at https://techexp.substack.com/
