
Time series databases vs OLAP

4 min read · Jun 10, 2019


Time series databases (TSDBs) are becoming very popular these days. Wikipedia defines a time series and a time series database as follows:

A time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time. Thus it is a sequence of discrete-time data. A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (or a datetime range).

While these definitions are accurate, they don’t paint a complete picture of a time series database. As a result, TSDBs are usually lumped in with OLAP database systems. While TSDBs can be considered a subclass of OLAP systems, their read-write patterns are distinct enough that time series databases deserve to be in their own class.

In this post, we will look at a TSDB from the standpoint of database operations (CRUD: Create, Read, Update and Delete) and see how it differs from an OLAP database.

Create

Time series databases are very write-heavy. The volume of writes can reach tens of millions per second. While OLAP databases are also write-heavy, their write rate is typically not on the order of millions per second. As a result, at large write volumes, OLAP databases tend to prefer batch ingestion over event stream ingestion.

Time series data is often machine generated, so the write volume is continuous. In an OLAP system, the write volume usually spikes at the end of a time period, such as the end of an hour, day, month or quarter.

Schemaless: Time series databases historically don’t have a schema. OLAP databases usually need an upfront schema before they ingest the data.
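To make the schemaless distinction concrete, here is a minimal in-memory sketch. Both classes (SchemalessStore and SchemaTable) and their method names are hypothetical and written only for illustration; they do not reflect the API of any particular TSDB or OLAP product.

```python
from collections import defaultdict


class SchemalessStore:
    """Hypothetical TSDB-style store: any metric name and tag set is accepted."""

    def __init__(self):
        # (metric, sorted tags) -> list of (timestamp, value)
        self.series = defaultdict(list)

    def write(self, metric, tags, timestamp, value):
        key = (metric, tuple(sorted(tags.items())))
        self.series[key].append((timestamp, value))


class SchemaTable:
    """Hypothetical OLAP-style table: columns must be declared up front."""

    def __init__(self, columns):
        self.columns = set(columns)
        self.rows = []

    def insert(self, row):
        unknown = set(row) - self.columns
        if unknown:
            raise ValueError(f"undeclared columns: {sorted(unknown)}")
        self.rows.append(row)


store = SchemalessStore()
store.write("cpu.usage", {"host": "a"}, 1, 0.5)
# A new tag shows up later; no schema change is required.
store.write("cpu.usage", {"host": "a", "region": "us"}, 2, 0.7)

table = SchemaTable(["ts", "host", "value"])
table.insert({"ts": 1, "host": "a", "value": 0.5})
```

In this sketch, inserting a row with a "region" key into the table raises a ValueError until the column is declared, whereas the schemaless store simply creates a new series.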

Read

The reads in a time series database also differ in several important ways:

In a TSDB, recent data is more important than older data, and it is read more frequently. As data gets older, it becomes exponentially less valuable, and the number of reads to it diminishes very quickly. While this behavior may also hold in OLAP systems, the value curve is not as steep as in TSDBs. For example, my older bank transaction data is still very valuable, whereas my metrics from yesterday are not as valuable.

In a TSDB, even though a lot of data is collected, only a small (often single-digit) percentage of it is ever read; the rest is effectively write-only. In an OLAP database, all the data ingested is usually consumed. For example, all your sales data is usually summed up, so it is accessed at least once.

Skewed/Hot-key reads: Since dashboarding and alerting are common use cases for time series data, the small percentage of the data that is read is read very frequently. During major incidents, when everyone is looking at the same data, this also leads to hot keys. In an OLAP system, data is accessed more uniformly, so the access patterns are not as skewed.

Low latency reads: TSDBs are queried via dashboards that need to run at interactive speeds, i.e., on the order of 100 ms. Even complex analysis of long-range data is expected to finish in a few seconds.

Mostly aggregates but point lookups when needed: Most of the queries in TSDBs are aggregates, but point reads of individual data points may also be needed.
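The two read shapes described above can be sketched with a single in-memory series kept in timestamp order. This is only an illustration under stated assumptions: the Series class and its methods are hypothetical, points are assumed to arrive in time order, and a real TSDB would add compression, indexing and persistence on top.

```python
from bisect import bisect_left, bisect_right


class Series:
    """Hypothetical single series stored as parallel, time-ordered lists."""

    def __init__(self):
        self.timestamps = []
        self.values = []

    def append(self, ts, value):
        # Assumption: points arrive in time order, as is common for metrics.
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("out-of-order write")
        self.timestamps.append(ts)
        self.values.append(value)

    def mean(self, start, end):
        """Aggregate read: average of all values with start <= ts <= end."""
        lo = bisect_left(self.timestamps, start)
        hi = bisect_right(self.timestamps, end)
        window = self.values[lo:hi]
        return sum(window) / len(window) if window else None

    def point(self, ts):
        """Point read: the value at exactly ts, or None if absent."""
        i = bisect_left(self.timestamps, ts)
        if i < len(self.timestamps) and self.timestamps[i] == ts:
            return self.values[i]
        return None


s = Series()
for ts, v in [(10, 1.0), (20, 2.0), (30, 3.0), (40, 4.0)]:
    s.append(ts, v)

window_mean = s.mean(20, 40)   # aggregate over a time range
single = s.point(30)           # lookup of one data point
```

Because the data is ordered by time, both the range aggregate and the point lookup locate their positions with a binary search rather than a full scan.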

Update

Data in a TSDB is immutable for practical purposes. More accurately, it becomes immutable over time. In the majority of cases, especially when the data is machine generated, it is immutable from the start. On rare occasions, data in a TSDB is deleted or updated, usually to fix errors, but such updates are the exception. OLAP systems don’t assume that the data is immutable.

Delete

In a TSDB, data is deleted after a certain retention period. In a system with a constant write rate, this means we delete the same amount of data every second as we write. The effective number of operations on the database is therefore about 2x the number of writes. Because data expires by age, most of it is deleted in bulk rather than one point at a time. In an OLAP system, deletes are infrequent and very low volume.
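One common way to make age-based bulk deletion cheap is to group points into time partitions and drop whole partitions once they fall outside the retention window. The sketch below illustrates that idea only; the PartitionedStore class, its parameters and the chosen sizes are hypothetical, and the post does not say which mechanism any specific TSDB uses.

```python
class PartitionedStore:
    """Hypothetical store that buckets points by time and expires whole buckets."""

    def __init__(self, partition_seconds, retention_seconds):
        self.partition_seconds = partition_seconds
        self.retention_seconds = retention_seconds
        self.partitions = {}  # partition start time -> list of (ts, value)

    def write(self, ts, value):
        start = ts - ts % self.partition_seconds
        self.partitions.setdefault(start, []).append((ts, value))

    def expire(self, now):
        """Drop every partition that lies entirely before the retention cutoff.

        Returns the number of points removed.
        """
        cutoff = now - self.retention_seconds
        expired = [
            start for start in self.partitions
            if start + self.partition_seconds <= cutoff
        ]
        dropped = 0
        for start in expired:
            dropped += len(self.partitions.pop(start))
        return dropped


# One write per second for 300 seconds, 60 s partitions, 120 s retention.
store = PartitionedStore(partition_seconds=60, retention_seconds=120)
for ts in range(300):
    store.write(ts, float(ts))

dropped = store.expire(now=300)
remaining = sum(len(points) for points in store.partitions.values())
```

Each call to expire removes entire partitions in one step, so the cost of deletion scales with the number of partitions rather than the number of points. At a steady write rate, the points dropped per unit of time match the points written, which is the 2x effect described above.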

Conclusion

One can argue that traditional OLAP systems are increasingly being optimized to support CRUD patterns similar to those of TSDBs. While that is true, OLAP systems are still designed and used to support traditional data warehouse style workloads as opposed to TSDB workloads. Until that gap is bridged, don’t you think TSDBs should be in a class of their own?
