Design of a Cost Efficient Time Series Store for Big Data

Roman Leventov
Nov 29, 2017

Design overview

Design of a time series store with three decoupled subsystems. Light blue lines show the flow of uncompressed row-oriented data, dark blue lines show the flow of compressed columnar data, and red lines show the flow of query results.

Network is the bottleneck

Partial partitioning for uneven key distributions. Each box is a partition. The partition with “other values” could have thousands of “long tail” values.
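
The idea in the caption above can be sketched roughly as follows. This is a minimal illustration, not the article's implementation: the function name `assign_partitions`, the `min_share` threshold, and the `key=...` partition naming are all hypothetical, chosen only to show frequent keys getting a dedicated partition while rare keys share one "other values" partition.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable

# Hypothetical label for the shared long-tail partition.
OTHER = "other values"


def assign_partitions(
    keys: Iterable[Hashable], min_share: float = 0.05
) -> Dict[Hashable, str]:
    """Map each distinct key to a partition name.

    Keys whose share of all rows is at least `min_share` get a dedicated
    partition; every remaining key lands in the shared OTHER partition.
    The threshold rule is an assumption made for this sketch.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    mapping: Dict[Hashable, str] = {}
    for key, count in counts.items():
        if total and count / total >= min_share:
            mapping[key] = f"key={key}"
        else:
            mapping[key] = OTHER
    return mapping


# Two heavy keys plus ten rare keys that each appear once.
rows = ["a"] * 60 + ["b"] * 30 + [f"tail-{i}" for i in range(10)]
partitions = assign_partitions(rows, min_share=0.05)
```

With this input, `a` and `b` each get their own partition and all ten `tail-*` keys share the "other values" partition. In a real system the threshold would more likely be tuned against partition size than a fixed row share.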

Storage

Cloud Object Storage

Data partitions in Parquet format in HDFS

Kudu

Cassandra or Scylla

Stream processing system

Computation tree

Platform

Drawbacks of the proposed time series system


Written by Roman Leventov, software engineer and designer, author. timeandspace.io
