Programming with Erik

Can Elasticsearch Store Data?

Spoiler alert: it can, and I’ll show you how. But it’s not always advisable.

Image by Tumisu from Pixabay

You want a terrific search engine, and you want to store your data in it too? Elasticsearch will happily do it, even though some insist that it’s not a document store, let alone a data store!

Don’t listen to them, because Elasticsearch is very capable and reliable, and it will store your data as well as make it searchable. This is advice from someone who’s been using Elasticsearch for close to ten years.

There are two types of data you might want to store in Elasticsearch:

  1. Your JSON documents, containing numbers, lists, text, geo coordinates, and all the other formats Elasticsearch supports.
  2. Binary data.

Let’s look at both of these in detail.

Storing JSON data in Elasticsearch

By default, Elasticsearch keeps a copy of all the JSON documents you offer it for indexing in a field called _source. You get a copy of this stored data on each query that matches the document. So yes: you are able to store your data in Elasticsearch and retrieve it too. It’s a document store as well.

Let’s take a look at one of the example documents from my tutorial on Elasticsearch. This is part of the response to a search query I ran on the twitter index:

{
  "_index" : "twitter",
  "_type" : "_doc",
  "_id" : "3Vko724Bb6gkOIFPeQcs",
  "_source" : {
    "username" : "eriky",
    "post_date" : "2019-12-08T14:10:12",
    "message" : "I wrote an article on Elasticsearch"
  }
}

As you can see, all the fields are there in the _source field.
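To make that concrete, here is a small Python sketch that pulls the stored documents back out of a search response. It uses only the standard library and works on a hard-coded response string, so no cluster is needed. The surrounding "hits" envelope is the usual shape of a search response; the function name extract_sources is made up for this illustration.

```python
import json

# A trimmed search response. The hit is the example document shown above;
# the outer "hits" -> "hits" envelope is the usual search response shape.
response_body = """
{
  "hits": {
    "hits": [
      {
        "_index": "twitter",
        "_type": "_doc",
        "_id": "3Vko724Bb6gkOIFPeQcs",
        "_source": {
          "username": "eriky",
          "post_date": "2019-12-08T14:10:12",
          "message": "I wrote an article on Elasticsearch"
        }
      }
    ]
  }
}
"""


def extract_sources(raw_response: str) -> list:
    """Return the original document (_source) of every hit (hypothetical helper)."""
    response = json.loads(raw_response)
    return [hit["_source"] for hit in response["hits"]["hits"]]


documents = extract_sources(response_body)
print(documents[0]["message"])
```

Whatever you indexed is exactly what comes back under _source, which is why Elasticsearch can double as a document store.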

If your documents are stored externally too and you use Elasticsearch purely for its indexing power, you can opt to disable the _source field. There are several reasons why you probably don’t want to disable this though:

  1. You can use the re-indexing features Elasticsearch offers. Since it has the source document at hand, it will happily re-index your data, e.g. to accommodate a change in your mapping or to consolidate multiple indexes into one.
  2. The update and update-by-query APIs only work if there is a _source document.
  3. You can use Elasticsearch’s on-the-fly highlighting feature, which works great for text search.
  4. Debugging your queries is much easier when you can look directly at the documents a query returns.
  5. Possibly in future versions: the ability to repair index corruption automatically.

So with a bit of overhead in terms of storage space, you get many features in return. Unless storage is really a concern, it’s a no-brainer: you’ll want to store the source document!
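If you do decide to turn _source off, that happens in the index mapping via the _source setting with "enabled" set to false. The sketch below only builds the request body you would send when creating the index; the helper name build_index_body and the two text fields are assumptions for illustration, not something your index has to look like.

```python
import json


def build_index_body(store_source: bool = True) -> dict:
    """Build a create-index body (hypothetical helper).

    With store_source=False the mapping disables the _source field,
    which also gives up the features listed above.
    """
    mappings = {
        "properties": {
            "username": {"type": "text"},
            "message": {"type": "text"},
        }
    }
    if not store_source:
        # Elasticsearch stores _source by default, so only the
        # "off" case needs to be spelled out in the mapping.
        mappings["_source"] = {"enabled": False}
    return {"mappings": mappings}


body = build_index_body(store_source=False)
print(json.dumps(body, indent=2))
```

Leaving store_source at its default produces a mapping without any _source entry, which is the setup this article recommends.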

Storing binary data in Elasticsearch

Some people want to store binary data in Elasticsearch. Usually not for searching, but simply to keep stuff together. An example might be a user profile and the accompanying (small) profile image.

You can do this too. Elasticsearch has a binary field type, which you can use in your mapping. An example of such a mapping could be:

{
  "mappings": {
    "properties": {
      "username": {
        "type": "text"
      },
      "message": {
        "type": "text"
      },
      "profile_image": {
        "type": "binary"
      }
    }
  }
}

When putting data into such a field, you’ll need to base64 encode it:

{
  "username" : "eriky",
  "message" : "I wrote an article on Elasticsearch",
  "profile_image": "U29tZSBiaW5hcnkgYmxvYg=="
}

Don’t bother decoding it, it’s not an image. But hopefully, you get the idea.
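Producing that Base64 string takes one call in most languages. Here is a minimal Python sketch using the standard base64 module; the helper name build_profile_document is invented for this example, and the payload is a stand-in byte string rather than a real image.

```python
import base64


def build_profile_document(username: str, message: str, image_bytes: bytes) -> dict:
    """Build a document whose binary field is Base64-encoded (hypothetical helper)."""
    # b64encode returns bytes; decode to str so the value is JSON-serializable.
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "username": username,
        "message": message,
        "profile_image": encoded,
    }


doc = build_profile_document(
    "eriky",
    "I wrote an article on Elasticsearch",
    b"Some binary blob",  # placeholder bytes, not an actual image
)
print(doc["profile_image"])  # U29tZSBiaW5hcnkgYmxvYg==

# Reading the field back is the reverse operation.
restored = base64.b64decode(doc["profile_image"])
```

In a real application you would read image_bytes from a file or an upload instead of a literal.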

I wouldn’t use this for big files, but it’s really not that bad of an idea to use it for small files, especially if that reduces the technology stack you need for your project.

You might be wondering if people are actually doing this. The answer is yes. I’ve worked with clients that store close to a billion documents with small (around 200 bytes) binary sensor data included. Again, I would not recommend it for large files, e.g. anything beyond a few hundred kilobytes. Please leave a comment if you do!
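One way to keep yourself honest about size is to check the payload before encoding it. The sketch below is only an illustration: the 300 KB limit is an arbitrary number I picked to stand in for "a few hundred kilobytes", and both function names are made up. It also shows that Base64 inflates data by roughly a third, since every 3 bytes become 4 characters.

```python
import base64

# Hypothetical cut-off standing in for "a few hundred kilobytes".
MAX_BINARY_BYTES = 300 * 1024


def encoded_length(raw_size: int) -> int:
    """Length of the padded Base64 string for raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


def encode_if_small(payload: bytes, limit: int = MAX_BINARY_BYTES) -> str:
    """Base64-encode payload, refusing anything above the limit."""
    if len(payload) > limit:
        raise ValueError(
            f"payload is {len(payload)} bytes, above the {limit}-byte limit"
        )
    return base64.b64encode(payload).decode("ascii")


# A 200-byte sensor reading grows to 268 characters once encoded.
print(encoded_length(200))
reading = encode_if_small(b"\x00" * 200)
```

Anything that trips the limit is probably better off in object storage, with only a reference kept in the Elasticsearch document.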

Conclusion

Elasticsearch will store all the data you put into it by default, so it works both as a search engine and a document store. Elasticsearch can also store binary data, albeit with a little warning: keep the size reasonable, since the internals of Elasticsearch were not built to store large binary blobs, but to allow for astonishingly fast indexing and searching.

Written by

Software developer by day, writer at night.
