Image for post
Image for post
Photo by Fabrizio Verrecchia on Unsplash

VictoriaMetrics: achieving better compression than Gorilla for time series data

Everybody who read Gorilla: A fast, scalable, in-memory time series database paper from Facebook, knows that:

Is it possible to achieve better compression rates for time series data than Gorilla?

Gorilla compression algorithm

  • To group data by individual time series.
  • To sort (timestamp, value) data points for each time series by timestamp.
  • To encode subsequent timestamps using delta-of-delta encoding. This substitutes big timestamps with small deviation values for the interval between subsequent data points. The deviation values are encoded with smaller number of bits comparing to the original timestamps.
  • To XOR binary representation of subsequent floating point values. Gorilla paper clams this usually gives binary values with big number of leading and trailing zero bits, which may be effectively packed into smaller number of bits.

Gorilla gives 3x-8x compression for typical time series data, i.e. it compresses each 16-byte (timestamp, value) data point to 2-5 bytes. The compression ratio varies depending on randomness of the original data — higher randomness results in lower compression ratio.

Gorilla compression analysis

0.0, a=0000000000000000, b=3FB999999999999A, a^b=3FB999999999999A
0.1, a=3FB999999999999A, b=3FC999999999999A, a^b=0070000000000000
0.2, a=3FC999999999999A, b=3FD3333333333334, a^b=001AAAAAAAAAAAAE
0.3, a=3FD3333333333334, b=3FD999999999999A, a^b=000AAAAAAAAAAAAE
0.4, a=3FD999999999999A, b=3FE0000000000000, a^b=003999999999999A
0.5, a=3FE0000000000000, b=3FE3333333333333, a^b=0003333333333333
0.6, a=3FE3333333333333, b=3FE6666666666666, a^b=0005555555555555
0.7, a=3FE6666666666666, b=3FE9999999999999, a^b=000FFFFFFFFFFFFF
0.8, a=3FE9999999999999, b=3FECCCCCCCCCCCCC, a^b=0005555555555555
0.9, a=3FECCCCCCCCCCCCC, b=3FEFFFFFFFFFFFFF, a^b=0003333333333333
1.0, a=3FEFFFFFFFFFFFFF, b=3FF1999999999999, a^b=001E666666666666

As you can see, the majority of a^b results have small number of trailing zeros. This means that the compression ratio for typical floating point values isn’t so great as advertised in the Gorilla paper. You can verify the results and play with your own data sets on this page.

Improving time series compression

0, a=0000000000000000, b=0000000000000001, a^b=0000000000000001
1, a=0000000000000001, b=0000000000000002, a^b=0000000000000003
2, a=0000000000000002, b=0000000000000003, a^b=0000000000000001
3, a=0000000000000003, b=0000000000000004, a^b=0000000000000007
4, a=0000000000000004, b=0000000000000005, a^b=0000000000000001
5, a=0000000000000005, b=0000000000000006, a^b=0000000000000003
6, a=0000000000000006, b=0000000000000007, a^b=0000000000000001
7, a=0000000000000007, b=0000000000000008, a^b=000000000000000F
8, a=0000000000000008, b=0000000000000009, a^b=0000000000000001
9, a=0000000000000009, b=000000000000000A, a^b=0000000000000003

Now a^b results are much better from compression point of view — they contain a lot of leading zeros, so they may be effectively packed into smaller number of bits. Verify the table and play with your own data on this page.

How to convert [0.0 … 1.1] series with step 0.1 to [0 … 11] series with step 1? Multiply by 10! Any floating-point series may be converted into integer series by multiplying by 10^N, where N is the maximum number of decimal digits after the point across all the values in the time series. The only problem is the result may exceed 64 bits — default integer size used in modern computers. How to deal with it? Normalize the integer by dividing by 10^M where M is the minimum value that allows fitting all the time series values into 64 bits and removing common trailing decimal zeros.

Why multiplying and dividing by 10^N instead of 2^N as in the standard floating-point encoding scheme used in modern computers? Because we are humans and prefer rounding metric values to decimal points, not binary points :) This opens up opportunities for better compression ratio as we saw above.

Gauges and Counters

  • Gauges — arbitrary values that may go up and down at any time — memory usage, cpu usage, speed, temperature, pressure, etc.
  • Counters — non-decreasing sequences such as total requests processed, total bytes read, total kilometers passed, etc.

Counters may be converted to Gauges by applying a single delta-encoding step. It substitutes absolute value by the speed the value grows. Since the speed is usually smaller than the absolute value, this reduces the number of bits required for storing the time series, thus improving compression ratio for Counters.

Using general-purpose compression

Conclusions

  • Converting floating-point values to integer values by applying 10^X multiplier.
  • Converting Counters to Gauges by applying delta-encoding.
  • Applying general-purpose compression algorithms on top of the encoded data.

These techniques give better compression ratio for VictoriaMetrics comparing to competitors — it compresses typical node_exporter time series data to 0.4 bytes per data point. This is 10x times better than 4 bytes per data point for the same data in Prometheus, which uses the original Gorilla compression algorithm.

P.S. Stay tuned — VictoriaMetrics will be open sourced soon. Both single-node and cluster versions :) In the mean time read other articles about VictoriaMetrics.

Update: we open-sourced VictoriaMetrics!

Image for post
Image for post

Follow us on Twitter 🐦 and Facebook 👥 and join our Facebook Group 💬.

To join our community Slack 🗣️ and read our weekly Faun topics 🗞️, click here⬇

Image for post
Image for post

If this post was helpful, please click the clap 👏 button below a few times to show your support for the author! ⬇

FAUN

The Must-Read Publication for Creative Developers & DevOps Enthusiasts

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch

Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore

Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store