Choosing the Elastic stack as a time series database

Published in Kudos Engineering, Aug 7, 2018

With a set of features on the horizon that required time series data, and the need to improve our existing features that already used time series data, the engineering team decided to look at technology choices that could fulfil our needs.

The first step was to define our use case and requirements, and to research some alternative time series data stores. We then discussed our options and arrived at the conclusion illustrated on the whiteboard below:

Deciding which time-series databases to investigate

The output of that discussion was to conduct two spikes: the first to investigate InfluxDB and the second to investigate Elasticsearch, each along with its respective stack. Our spike template included the following questions (derived from others’ research, whose URLs are lost in the swirl of engineering work, sorry!):

Spike questions

Cost

Operational complexity

How easy is it to scale when the TSDB is running low on storage?
How easy is it to scale the performance with increased data?
Are there any operations tasks that regularly need to be carried out?

Capabilities

Does the TSDB support metrics and events?
Can metadata be associated with an event?
What is the precision i.e. the smallest increment of time between events?
What is the consistency model?
Can the data have a Time To Live?

Performance

How many events per second can be written in, for a given scale of the system?
How many events per second can be read out, for a given scale of the system?
How many bytes per point are stored after any compression that occurs? (What volume of space is required?)

Query capabilities

How is the data queried? DSL? API?
How easy is the query mechanism to use?
Can the query mechanism aggregate data? Can it do it by date?
Can we de-dupe data, e.g. if we ingested the same event twice? Is there a uniqueness constraint?
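To make the de-duplication question concrete, here is a minimal sketch of one common approach: deriving a deterministic ID from the fields that define an event's identity, so that ingesting the same event twice produces the same key. The field names (entity_id, event_type, timestamp) and both function names are hypothetical and chosen for illustration; they are not taken from our actual schema.

```python
import hashlib
import json


def event_id(event, key_fields=("entity_id", "event_type", "timestamp")):
    """Derive a stable ID from the (hypothetical) fields that define event identity."""
    identity = {field: event[field] for field in key_fields}
    # sort_keys gives a canonical serialisation, so field order cannot change the hash.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe(events):
    """Keep the last-seen copy of each event, keyed by its derived ID."""
    unique = {}
    for event in events:
        unique[event_id(event)] = event
    return list(unique.values())
```

In a store that lets the client choose the record key, such as an Elasticsearch document _id, writing each event under this derived ID means a repeated ingest overwrites the existing record rather than creating a duplicate.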

Ingestion

How is the data sent to the TSDB? API? Log scraping? Multi language clients?

Export

How could the entire TSDB be exported for use elsewhere or in another TSDB?

Maturity

How mature is the TSDB and its ecosystem? Is it likely to die soon? Does it have good support?

Community

How large and active is the community using the TSDB? If I have a problem will I be able to find someone to help me solve it?

Support

Is there any paid support for the TSDB and, if so, how much does it cost?

Security

What are the mechanisms for authentication and authorisation?
What other security implications are there?

Integration

How easy is it to integrate with other services? Is there anything specific that helps?

Visualisation

How can the data from the TSDB be visualised?
Are there any dashboards / high level visualisations?
Are the dashboards internal to the TSDB or can they be shared on a website?

InfluxDB

InfluxDB turned out not to fit our requirements because our dataset is very wide. Rather than a couple of data series with millions of points, we have almost ten million data series with fewer points each. In InfluxDB we would need one series per entity, where a series is defined as the “collection of data that share a retention policy, measurement, and tag set.” Tag sets are indexed; field sets are not. InfluxDB’s speed is based on the fact that tag sets are stored in memory, whereas the field sets are stored on disk, so memory requirements grow with the number of series. Consequently we fell into the “Probably infeasible” category of Influx’s General Hardware Advice. Compounding the problem, we would have been performing queries that Influx deems “complex”, primarily because they aggregate the data, so the number of queries we could perform at a given hardware level would be lower than Influx’s estimates.
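The series definition quoted above can be sketched in a few lines of Python. This is an illustration of how series cardinality is counted, not InfluxDB's own code: the point layout (dicts with measurement, tags and fields keys) and the function names are assumptions made for the example, and "autogen" is used only because it is the default retention policy name in InfluxDB 1.x.

```python
def series_key(point, retention_policy="autogen"):
    """Build the identity of a series: retention policy, measurement and tag set.

    Field values are deliberately ignored, because they do not define a series.
    """
    tag_set = tuple(sorted(point["tags"].items()))
    return (retention_policy, point["measurement"], tag_set)


def series_cardinality(points, retention_policy="autogen"):
    """Count the distinct series that a collection of points would create."""
    return len({series_key(point, retention_policy) for point in points})


# Hypothetical "wide" data: one tag value per entity means one series per entity.
points = [
    {"measurement": "views", "tags": {"entity_id": "a"}, "fields": {"count": 1}},
    {"measurement": "views", "tags": {"entity_id": "a"}, "fields": {"count": 5}},
    {"measurement": "views", "tags": {"entity_id": "b"}, "fields": {"count": 2}},
]
print(series_cardinality(points))
```

Here the two points for entity "a" share a series while entity "b" creates a second one, which is why a dataset with almost ten million entities implies almost ten million series.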

I have no doubt that InfluxDB is a capable time-series database, having used it previously, but for our needs it just didn’t fit.

Elasticsearch

Elasticsearch is a generalised tool, so as a sanity check we researched opinions about its use as a time series database and found the whole spectrum, ranging from “Don’t be crazy, use a specialised TSDB” to “It works great!”, including some positive views from teams at CERN who are processing a lot of data. So the next step was to validate it against our use cases.

The result was that it seemed to fit our data needs very well. Each event could be a document inside the index, each with a set of data representing its entity and metadata. The aggregations that we needed were also well supported, being one of Elasticsearch’s core features. The rest of the Elastic stack also seemed a good fit for our needs: Kibana provided easy-to-use visualisation of the data and Logstash provided us with an out-of-the-box method of ingesting the data. When compared against all of the questions in our requirements, plus a set of use cases derived from our needs, the Elastic stack fared very well.
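As a rough sketch of the event-per-document model and a by-date aggregation, the Python below builds an event document and an Elasticsearch search body as plain dicts, without calling any client library. The field names (@timestamp, entity_id, event_type, metadata), the aggregation name and both function names are hypothetical, and the term filter assumes entity_id is mapped as a keyword. The name of the date histogram interval parameter depends on the Elasticsearch version (calendar_interval from 7.2 onwards, interval before that), so it is left configurable.

```python
def build_event_document(entity_id, event_type, timestamp, metadata=None):
    """Represent one event as a single document (hypothetical field names)."""
    return {
        "@timestamp": timestamp,
        "entity_id": entity_id,
        "event_type": event_type,
        "metadata": metadata or {},
    }


def build_daily_counts_query(entity_id, start, end, interval_param="calendar_interval"):
    """Build a search body that counts one entity's events per day.

    interval_param is "calendar_interval" on Elasticsearch 7.2+ and
    "interval" on older versions.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the matching documents
        "query": {
            "bool": {
                "filter": [
                    {"term": {"entity_id": entity_id}},
                    {"range": {"@timestamp": {"gte": start, "lt": end}}},
                ]
            }
        },
        "aggs": {
            "events_per_day": {
                "date_histogram": {"field": "@timestamp", interval_param: "day"}
            }
        },
    }
```

The resulting dict can be sent as the JSON body of a search request against whichever index holds the events; each bucket in the response then carries the event count for one day.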

The spike continued on to use the data in a few test cases. The stack was easy to install on local dev laptops using the official Docker images and the use cases were all met with ease.

The final question was hosting and cost. As we’re a small team we wanted minimal ops work, preferably a hosted system, and a reasonable cost. There were a few alternatives:

Host on Kubernetes. Although we love Kubernetes for running our services, we felt that running our own Elastic stack would be too much operational work. We weighed this against the low cost if we were able to reuse our existing Kubernetes nodes (although that was an uncertainty).
Use Elasticsearch on AWS. This option took away some of the operational concerns, but not all of them. The upside was that the cost was in the middle of our options.
Use Elastic’s own hosted Elasticsearch Service on Elastic Cloud. This option required the least operational involvement, at a slightly higher cost. In its favour, the latest versions of the Elastic stack are always available, and it’s very easy to start small, at a lower cost, and scale up. This is the option we chose.

How’s it been?

At the time of writing we’ve implemented a few features using Elasticsearch as our time series database and the team has been very happy with how it’s gone. We’re ingesting our data via Logstash, which runs on Kubernetes and reads from Google PubSub via a plugin. The setup was fine, although it involved some work (hosted Logstash on Elastic Cloud would be great). Elasticsearch has performed well, and as we ramped up our usage it was easy to increase the cluster size on Elastic Cloud. We had one instance where we left the upgrade too late on our monitoring cluster, causing the auto scale to fail, but Elastic support fixed it for us pretty quickly on request. Kibana has been great for the visualisations that we need and it’s been easy to explore the data.

All in all we’d recommend trying out Elasticsearch for your time series use cases.

If you like what you read and think you could contribute to our team at Kudos then take a look at our careers page to see what roles we currently have open.