Time Series DBs are gaining momentum!
While celebrating each new year, it’s customary to look back at the year just past. The specialised DB-Engines website did the exercise earlier this year and elected PostgreSQL DBMS of the year 2017. Well deserved, PostgreSQL!
DB-Engines tries to rank every single database whatever its use case or model, so it’s no surprise that relational databases sit at the top of the ranking. There are many kinds of databases that address different use cases:
- Relational DB (PostgreSQL, MariaDB)
- Document Stores (MongoDB, CouchDB)
- Key/Value Stores (Redis, RocksDB)
- Column Stores (HBase, Cassandra)
- Search engines (Solr, Elasticsearch)
- Graph DBMS (Neo4J, JanusGraph)
- Time Series DBMS (Prometheus, InfluxDB)
- and others…
What is also interesting in the latest DB-Engines post is that Time Series DBMS scored the strongest growth of all categories for the second consecutive year.
Time Series DBs increased their score by about 70%
Among all kinds of databases, Time Series is the category with the strongest growth. This category was identified as a trend in mid-2015, and it started to take off in early 2016.
While it is the strongest trend, interest in Time Series is still young: it hasn’t yet gained the same adoption as Relational Databases and others, but it’s only a matter of time before every developer stack includes a metrics solution. In any case, the Time Series Database field is currently booming with new solutions, projects, contributions and tools. At OVH, we take part in this effort by contributing to projects like Noderig, Beamium, Warp 10, Grafana plugins, and others.
But why should you care about Time Series?
Measuring is all about taking better decisions
Time Series are used to track changes over time. They also provide a factual way to replace gut feeling with measurement.
“If we have data, let’s look at data. If all we have are opinions, let’s go with mine.”
Jim Barksdale, former Netscape CEO
I think this quote clearly captures the idea behind Time Series adoption: measuring.
To summarise Coda Hale’s presentation, measuring:
- costs nothing to add to developer code
- helps you understand how your code runs in production, and therefore your business
- improves your mental model of your business
- optimises the way you think
- helps you take better decisions
Fixing code is a business-driven decision. Keeping servers running fine is a business-driven decision. Developing a feature is a business-driven decision. Measuring application code and infrastructure helps you understand your business and enables a data-driven business, based on measurable KPIs, which in the end lets you take better decisions.
So what has changed since the early 2010s to shift the landscape so significantly?
A growing popularity!
As listed by DB-Engines, many kinds of data are actually Time Series data. A quick overview:
- sensors measurements (water cooling, presence, …)
- monitoring data (CPU, memory, disks, …)
- stock exchange (Euros, Dollars, Bitcoins)
- resource consumption (fuel, …)
- events, stats, signals (time spent, …)
- IoT data (parking occupancy, energy grid, …)
- Health (heart rate, …)
- and many others, basically any data that can be visualised as a chart
In all these domains, collecting data offers many benefits: resource optimisation, tracking, forecasting, business intelligence, …
Since the early 2010s, many things have happened. First, Graphite has matured since its inception in 2008. At the time, many Time Series use cases were implemented on top of traditional databases like MySQL or PostgreSQL, but as data volumes rapidly grew, these databases couldn’t handle this kind of workload. OpenTSDB was released in 2010 with significant improvements over Graphite and its flat data model. OpenTSDB was based on HBase, which pushed back many limits. From there, new use cases started to be implemented, mostly for monitoring purposes, and other projects were also migrated to a proper Time Series model. Then other projects entered the game: KairosDB, InfluxDB, Prometheus and Riak TS, to name a few. Today, modern Time Series platforms offer more functionality than just plotting charts, and some of them, like Warp 10, have become true analytics platforms.
Trend over Time Series DBMS popularity
Many solutions exist on the market to satisfy different criteria, and choosing the right one can be challenging:
- Distribution : Open Source, Open Core, Proprietary
- Source language : Java, Golang, Python, Erlang, …
- Versatility : distributed, standalone, embedded
- Data model : modern, flat, metric/tags, text
- Data storage : full resolution vs fixed resolution
- Supported data types : Integers, Floats, Booleans, Strings
- Query model : interactive, job, MapReduce
- Analytics : server side functions, SDK, custom
Trends are important, but they are not the only criterion; measuring the ecosystem is also interesting. The ecosystem includes aspects like library compatibility, contributions, adoption by other projects, protocol compatibility and usability, project roadmap and more.
At OVH Metrics we have extensively tested and worked with many Time Series Databases, and in this post we want to give you quick feedback and our feelings on a few of them:
According to the DB-Engines report, InfluxDB is the current leader in the Time Series Database field. This leading position is easily explained: InfluxDB has been around since late 2014, and its Open Core model, its single-binary distribution and a well-thought-out community approach have greatly helped it gain large adoption, especially for small projects.
InfluxDB has consistently improved over the years, iteratively rebuilding its storage layer to fix many early problems. Two main issues still limit its usefulness for big projects: the not-stellar performance of Kapacitor, the component used to run continuous queries (we call these Loops at OVH Metrics), and significant slowness when authentication is enabled.
Originally, InfluxDB proposed a SQL-like query syntax, which we don’t think is the best fit for the Time Series model. More recently, Influx has published an interesting language that could combine both simple queries and data flows: IFQL. We don’t yet have any feedback on using IFQL in large projects, with their associated challenges.
As the oldest player in the Time Series DB field, Graphite has lost some of its popularity, as younger tools offer more features in fancier ways. Graphite offers a rather complete set of functions to manipulate a data set, even if its original data model was rather limiting, with flat dot-separated metric names. This naming strategy makes complex queries harder to build, relying on pattern matching to select the desired series, but at the time it was fine and offered many features for small volumes.
For some years the project slowly declined, without any major evolution. But in recent months, Graphite has received a new surge of interest, with new features such as storing data with tags, joining the other modern data models.
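To illustrate the flat, dot-separated model (with hypothetical metric names), each dotted path is one series, and queries select series with wildcards in the path:

```
servers.web01.cpu.user                 # one series per dotted path
servers.web02.cpu.user
servers.*.cpu.user                     # query matching all hosts at once
aliasByNode(servers.*.cpu.user, 1)     # Graphite function: label each series by its host segment
```

Because the hierarchy is encoded in the name itself, renaming a level or adding a dimension later means rewriting every path, which is exactly what the newer tag-based models avoid.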
Inspired by Borgmon (the Google internal monitoring tool), OpenTSDB was the first open-source Time Series DB to propose a modern data model: each time series is defined by a metric name and a list of tags.
Scalability being one of the main tenets of OpenTSDB, it uses HBase as its storage engine. That means that in order to use it you need to install and manage a Hadoop cluster with HBase, so its deployment isn’t exactly easy. OpenTSDB was massively used at OVH before we switched to OVH Metrics.
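As a minimal sketch of that data model (with hypothetical metric and tag names), a data point is a metric name, a timestamp, a value and a set of key/value tags; the snippet below formats one in the classic OpenTSDB telnet `put` line:

```python
def opentsdb_put_line(metric, timestamp, value, tags):
    """Format one data point as an OpenTSDB telnet 'put' line."""
    tag_str = " ".join(f"{k}={v}" for k, v in sorted(tags.items()))
    return f"put {metric} {timestamp} {value} {tag_str}"

line = opentsdb_put_line("sys.cpu.user", 1356998400, 42.5,
                         {"host": "web01", "cpu": "0"})
print(line)  # put sys.cpu.user 1356998400 42.5 cpu=0 host=web01
```

The tags are what make the model "modern": queries can slice and aggregate across any tag (all hosts, one host, one CPU) without the rigid naming hierarchy of flat models.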
Prometheus is a Borgmon copycat and one of the main InfluxDB challengers. If the current trend continues, the Prometheus format could become a de facto monitoring standard in the foreseeable future. Evolving from a pure in-memory Time Series database, Prometheus has added a storage layer and can be used as a persistent store. Still, it isn’t available as a distributed solution, so it will neither scale to high workloads nor offer high availability.
Prometheus is more analytics-oriented than the preceding solutions, with a query language, PromQL. While PromQL is more complete than OpenTSDB’s query model, it still lacks many functions needed to serve as an analytical Time Series platform. Inside Google, many people disliked Borgmon due to its language syntax and the fact that it wasn’t a service: each team needed to deploy and manage its own Borgmon. Prometheus, as a Borgmon clone, unfortunately shares the same flaws. It’s worth noting that Google has since switched to Monarch, a metrics service like @OvhMetrics.
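To give a feel for PromQL (using `http_requests_total`, the conventional example counter from the Prometheus documentation, as a hypothetical metric):

```
# per-second HTTP request rate over the last 5 minutes, summed per instance
sum by (instance) (rate(http_requests_total[5m]))
```

A single expression combines series selection, rate computation and aggregation, which is both PromQL’s strength for monitoring and, as noted above, where it stops short of a full analytics language.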
At OVH, we integrate with the Prometheus ecosystem by providing custom exporters, and we developed the Beamium scraper, which supports pushing metrics (instead of pulling) and fine-grained authentication, and offers a DFO (Disk Fail Over) in case of network issues.
Warp 10 takes its origins from Borgmon, like OpenTSDB, but in addition to its APIs it provides a rich data manipulation framework called WarpScript, containing hundreds of functions. From this point of view, it’s the only challenger to Graphite and its large function library.
Warp 10 supports several storage engines, with a standalone version storing data in LevelDB and a distributed one relying on HBase, like OpenTSDB. Its data model being somewhat more flexible than OpenTSDB’s, Warp 10 solves the scalability issues we observed with OpenTSDB on very large datasets.
Warp 10 has an emerging community and, even if it is still a bit behind in terms of user experience, we consider it today the most mature Open Source technology for operating Time Series at scale. As one of our technologies of choice, we contribute tools (Tour, Forge) to the project to enhance the user experience.
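As a rough sketch of what WarpScript looks like (with a placeholder token and a hypothetical metric selector), the language is stack-based: each line pushes arguments and applies a function to the result of the previous one:

```
// Fetch the last hour of all series matching a class selector,
// then average each series into 1-minute buckets.
[ 'READ_TOKEN' '~sys.cpu.*' {} NOW 1 h ] FETCH
[ SWAP bucketizer.mean 0 1 m 0 ] BUCKETIZE
```

Chaining framework functions like these server-side, instead of pulling raw points into client code, is what makes it closer to an analytics platform than a plain query language.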
At Metrics, we think each of these solutions has a different value to offer, and it isn’t our role to decide what’s best for you. For a given need you might prefer the simplicity of an OpenTSDB or PromQL query, while for another you might need WarpScript for advanced analytics use cases. This is why we consider our platform to be agnostic at the query layer, and we’ve implemented several of these query models to match our customers’ needs. Currently, we support:
- PromQL / Prometheus
- WarpScript / Warp10
Being agnostic means you can push metrics with one protocol and query the same metrics with another, or query the same metrics twice with two different protocols. This way, there is no vendor lock-in.
Working on a Time Series database is hard and full of challenges:
- Write vs Read patterns
- Dealing with massive cardinality issues
- Dealing with ephemeral Time Series
- Storage Data model design vs generic Query Pattern
- Combining Real Time and Analytical workloads
We will publish further posts covering how we address these challenges.
How to start with Time Series?
If you’re convinced of the need for a Time Series solution, here are some quick first steps to get started:
- Identify your Time Series data or business KPIs
- Instrument your applications to expose these KPIs
- Choose a managed service like OVH Metrics or start with a self operated Open Source solution
- (Optional) Use Noderig or a collector (node-exporter, telegraf, scollector, …) to get infrastructure metrics
- (Optional) Setup Beamium for scraping your app and/or infrastructure metrics
- Set up a dashboard for visualising your KPIs from collected Metrics
- You can now correlate application and infrastructure metrics
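The second step ("instrument your applications to expose these KPIs") can be sketched in a few lines. This is a hand-rolled illustration, not a client library: `shop_orders_total` is a hypothetical KPI, exposed in the Prometheus text exposition format so a scraper like Beamium or Prometheus can collect it:

```python
# In-memory counter for a hypothetical business KPI: processed orders.
orders_total = {"ok": 0, "error": 0}

def process_order(ok=True):
    """Business code calls this on every order; the KPI costs one increment."""
    orders_total["ok" if ok else "error"] += 1

def exposition():
    """Render the counter in the Prometheus text exposition format."""
    lines = ["# TYPE shop_orders_total counter"]
    for status, count in sorted(orders_total.items()):
        lines.append(f'shop_orders_total{{status="{status}"}} {count}')
    return "\n".join(lines)

process_order()
process_order()
process_order(ok=False)
print(exposition())
```

In a real application you would serve `exposition()` on an HTTP endpoint (conventionally `/metrics`), or use an established client library rather than hand-rolling the format.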
Now you should have a better understanding of your business issues.
Follow us on Twitter: @OvhMetrics. Do you want to work with us on these kinds of challenges? Ping us on Twitter :)