Design of a Cost Efficient Time Series Store for Big Data

Roman Leventov
Nov 29, 2017

Design overview

Design of a time series store with three decoupled subsystems. Light blue lines show the flow of uncompressed row-oriented data, dark blue lines show the flow of compressed columnar data, and red lines show the flow of query results.

Network is the bottleneck

Partial partitioning for uneven key distributions. Each box is a partition. The partition with “other values” could have thousands of “long tail” values.
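
The partial partitioning idea in the caption above could be sketched roughly as follows. This is only an illustration, not the article's implementation: the function names, the `OTHER` label, and the `min_share` threshold are all assumptions introduced here. Keys that account for a large enough share of the rows get a partition of their own, and every remaining "long tail" key falls into one shared partition.

```python
from collections import Counter

# Hypothetical label for the shared partition holding all long-tail keys.
OTHER = "__other__"


def choose_dedicated_keys(keys, min_share):
    """Return the keys frequent enough to deserve their own partition.

    min_share is an assumed tuning knob: the minimum fraction of all rows
    a key must account for to be given a dedicated partition.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    return {key for key, count in counts.items() if count / total >= min_share}


def partition_for(key, dedicated):
    """Map a key to its own partition, or to the shared 'other values' one."""
    return key if key in dedicated else OTHER


def partition_rows(rows, key_of, min_share=0.1):
    """Group rows into per-key partitions plus one long-tail partition."""
    rows = list(rows)
    dedicated = choose_dedicated_keys((key_of(row) for row in rows), min_share)
    partitions = {}
    for row in rows:
        name = partition_for(key_of(row), dedicated)
        partitions.setdefault(name, []).append(row)
    return partitions
```

For example, with 50 rows for key `a`, 40 rows for key `b`, and 10 keys that appear once each, a `min_share` of 0.1 yields three partitions: one for `a`, one for `b`, and one shared partition holding the 10 long-tail rows. In a real system the threshold would more likely be driven by a target partition size than by a fixed fraction.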


Cloud Object Storage

Data partitions in Parquet format in HDFS


Cassandra or Scylla

Stream processing system

Computation tree


Drawbacks of the proposed time series system


