Elasticsearch — Implementation at Easygo

Published in Easygo · Apr 14, 2020

At Easygo, we’ve been using Elasticsearch for multiple parts of our applications for more than a year now.

Beyond logging, we first turned to Elasticsearch as a solution to our bet lookup problem and, more recently, our Sportsbook search.

The initial problem that Elasticsearch solved for us was our bet lookup feature. As the volume of bets on our site increased, so did our data and, sadly, our response times. Slow bet lookups were a major drawback for the user experience, as verifying results and sharing bets in our live chat are core parts of our site's functionality.

Our initial approach with the bet search was to store all the bets in rolling indices.

Rolling indices are a common pattern in Elasticsearch development where new indices are created on a fixed interval (e.g. daily, monthly, yearly). Each day a new index would be created to hold that day's bets, and the previous day's index would become read-only and eventually be 'shrunk' and compressed.
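
As a rough sketch of that pattern (assuming the 7.x Python client; the index names, shard counts, and alias here are illustrative, not our exact setup):

```python
from datetime import date, timedelta

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

today = date.today()
yesterday = today - timedelta(days=1)

# Create today's index; a date-suffixed name like bets-2020.04.14 keeps
# each day's bets in its own index.
es.indices.create(
    index=f"bets-{today:%Y.%m.%d}",
    body={"settings": {"number_of_shards": 3, "number_of_replicas": 1}},
    ignore=400,  # ignore "already exists" if the daily job reruns
)

# Point a search alias at the new index so queries don't need to know
# the naming scheme.
es.indices.put_alias(index=f"bets-{today:%Y.%m.%d}", name="bets-search")

# Make yesterday's index read-only, then shrink it down to a single shard.
# (In a real cluster you would also relocate all of its shards onto one
# node before shrinking.)
old_index = f"bets-{yesterday:%Y.%m.%d}"
es.indices.put_settings(index=old_index, body={"index.blocks.write": True})
es.indices.shrink(
    index=old_index,
    target=f"{old_index}-shrunk",
    body={"settings": {"index.number_of_shards": 1}},
)
```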

This approach was initially successful, but storing millions of bets each day soon started to stretch our storage capacity despite the compression.

What we learnt was that indices containing large numbers of small documents tend not to benefit much from compression, because the compression only works well when there is a large amount of text to compress. So we had to look for different approaches.
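
For context, a compression step like the one described above might look roughly like this (a sketch, assuming the 7.x Python client and a hypothetical index name; `index.codec` is a static setting, so the index has to be closed to change it):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical name produced by the daily rollover job above.
old_index = "bets-2020.04.13"

# Switch stored fields to the DEFLATE-based best_compression codec.
es.indices.close(index=old_index)
es.indices.put_settings(index=old_index, body={"index.codec": "best_compression"})
es.indices.open(index=old_index)

# The codec only applies to newly written segments, so force-merge down to
# one segment to rewrite (and recompress) the existing data.
es.indices.forcemerge(index=old_index, max_num_segments=1)

# Compare document count and on-disk size before and after.
print(es.cat.indices(index=old_index, v=True, h="index,docs.count,store.size"))
```

For millions of tiny bet documents, the savings from a step like this were modest, which is what pushed us to rethink the approach.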

Elasticsearch positions itself as a fast text-search database, and achieving that speed involves compromises: you have to ask questions about your data and how you want to use it before you can get the most out of Elasticsearch.

With the bet search, the first question we had to ask was whether it was more important to fetch bets quickly for a limited time, or slowly forever. In other words, was it more important for a user to be able to quickly access their bets from today or this week, or to be able to fetch their bets from a year ago, but with slower response times?

In this case we decided that faster was better and that archiving would be handled outside of Elasticsearch. So we had decided we would delete old data; great, storage problem solved.

However, there was a caveat: large bets should still remain.

On the surface this seems like a straightforward problem: delete all the records whose wager amount is below our threshold. But like many things in Elasticsearch, deletion has a price.
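
For illustration, the naive version would be a delete-by-query across the bet indices; the field name and threshold here are made up:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Naive cleanup: delete every bet below a (hypothetical) wager threshold.
# This is the approach we decided against, for the reasons below.
es.delete_by_query(
    index="bets-*",
    body={"query": {"range": {"wager_amount": {"lt": 10.0}}}},
    conflicts="proceed",
)
```

Delete-by-query only marks documents as deleted; the space is reclaimed later as segments merge, which is part of why this becomes expensive at the scale of millions of documents per index.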

In Elasticsearch, deleting individual documents is not efficient, especially when you've got millions of documents per index. Keeping a small number of large bets in an index of millions of small ones would mean deleting the small bets document by document, which would choke up our cluster. Deleting all the small bets that way wasn't an option.

We had to re-evaluate how our data was structured at a higher level. We learned that deleting whole indices is much more efficient than deleting documents individually, and that segmenting our indices by the "value" of the data they contain could make deleting our less valuable data a lot simpler.

Our approach was to extend the segmentation we get from the rolling index approach by breaking our single rolling index into 3 indices: small, medium and large.

The small index, containing our lowest value bets, is rolled over every day and completely deleted after one week. The medium index is deleted after a month and the large index is never deleted.
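
A minimal sketch of how this could be wired up (again assuming the 7.x Python client; the tier thresholds, field names, and index naming are hypothetical illustrations rather than our production values, and it assumes the medium tier also rolls daily):

```python
from datetime import date, timedelta

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical tier thresholds; real values depend on the business rules.
def tier_for(wager_amount: float) -> str:
    if wager_amount < 10:
        return "small"
    if wager_amount < 1000:
        return "medium"
    return "large"

def index_for(bet: dict, created: date) -> str:
    tier = tier_for(bet["wager_amount"])
    if tier == "large":
        return "bets-large"                   # never rolled over or deleted
    return f"bets-{tier}-{created:%Y.%m.%d}"  # daily rolling index

# Writing a bet picks the index by value and date.
bet = {"bet_id": "abc123", "wager_amount": 5.0, "user_id": 42}
es.index(index=index_for(bet, date.today()), body=bet)

# Retention job: drop whole indices once they age out of their tier.
RETENTION = {"small": timedelta(days=7), "medium": timedelta(days=30)}

def purge_expired(today: date) -> None:
    for tier, keep_for in RETENTION.items():
        cutoff = today - keep_for
        for name in es.indices.get(index=f"bets-{tier}-*"):
            day = date.fromisoformat(name.rsplit("-", 1)[-1].replace(".", "-"))
            if day < cutoff:
                es.indices.delete(index=name)  # cheap: drops the whole index

purge_expired(date.today())
```

The important property is that the retention job only ever calls the delete-index API, which drops an entire index at once instead of tombstoning millions of individual documents.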

This approach allowed us to keep our fast lookups on all recently created bets and on large bets, while preventing the size of our data from growing unsustainably.

The obvious tradeoff here is that players who make small bets will not be able to access them as easily once they have been cleared from Elasticsearch, but this kind of compromise is characteristic of many of the solutions you will arrive at when developing with Elasticsearch.
