VictoriaMetrics: how to migrate data from Prometheus. Filtering and modifying time series.

Roman Khavronenko
6 min read · Aug 24, 2020

The previous story, VictoriaMetrics: how to migrate data from Prometheus, explains how to use the vmctl tool for migrating Prometheus snapshots into VM. But some details, such as time series filtering, remained uncovered. This story gives more details about relabeling, compression, and filtering time series by time and labels.

Photo by eflon / Attribution 2.0 Generic (CC BY 2.0)

Relabeling

Vmctl itself does not provide label management functionality because it is already supported by VM. You can define relabeling rules for all ingested metrics via the --relabelConfig flag. For example, to change the metric up{instance="localhost:9090"} to up{instance="localhost", port="9090"}, create a relabel.yml file with relabeling rules that split the instance label.
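A sketch of such rules using standard Prometheus-style relabeling syntax (the exact regex is an assumption and may need adjusting for other label values):

- source_labels: [instance]
  regex: '([^:]+):(\d+)'
  target_label: port
  replacement: '$2'
- source_labels: [instance]
  regex: '([^:]+):(\d+)'
  target_label: instance
  replacement: '$1'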

Then point VM at the configuration file:

./bin/victoria-metrics --relabelConfig relabel.yml

According to the configuration, every ingested time series that matches the specified regular expression will be automatically updated with a new port label. See for yourself by manually sending a new time series via the Prometheus import format:

curl -d 'requests_total{instance="localhost:9090"} 123' -X POST 'http://localhost:8428/api/v1/import/prometheus'

curl -G 'http://localhost:8428/api/v1/export' -d 'match=requests_total'
{"metric":{"__name__":"requests_total","instance":"localhost","port":"9090"},"values":[123],"timestamps":[1598089314604]}

So if you need to relabel time series before migration or backfilling, just configure VM with the appropriate relabeling configuration.

Filtering by time

Vmctl uses the Prometheus TSDB package for reading Prometheus snapshots. The package is well documented and simple to use. Besides reading time series from the snapshot, it also allows filtering them by time or label values.

Filtering time series by time in vmctl is controlled by the following two flags:

--prom-filter-time-start - The time filter to select timeseries with timestamp equal or higher than provided value. E.g. '2020-01-01T20:07:00Z'
--prom-filter-time-end - The time filter to select timeseries with timestamp equal or lower than provided value. E.g. '2020-01-01T20:07:00Z'

Let's try to filter time series from the snapshot used in the previous story by time:

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 --prom-filter-time-start=2020-08-02T00:00:00Z --prom-filter-time-end=2020-08-03T00:00:00Z
Prometheus import mode
Prometheus snapshot stats:
blocks found: 7;
blocks skipped by time filter: 5;
min time: 1596349620000 (2020-08-02T07:27:00+01:00);
max time: 1596607200000 (2020-08-05T07:00:00+01:00);
samples: 168012658;
series: 6943.
* Stats numbers are based on blocks meta info and don't account for applied filters.

Note that the total number of blocks in the provided snapshot is 7, but 5 of them are marked as skipped because of the specified time filter. Every block contains meta information with its min and max timestamps, so vmctl can filter out blocks that fall outside the specified time range in advance, without actually reading the data. Nevertheless, the time filter is applied again at import time to filter out samples inside the blocks as well. Because of that, the stats in the message above don't show the exact number of samples and series to be imported.
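Under the hood, the block-level check boils down to a simple interval overlap test on the block's meta timestamps. A simplified Go sketch (the types and names here are illustrative, not the actual vmctl code):

package main

import "fmt"

// blockMeta mirrors the min/max timestamps (in milliseconds) stored in the
// meta.json file of every Prometheus TSDB block.
type blockMeta struct {
	MinTime int64
	MaxTime int64
}

// overlaps reports whether a block may contain samples inside the requested
// time range; blocks that don't overlap can be skipped without reading data.
func overlaps(m blockMeta, filterMin, filterMax int64) bool {
	return m.MaxTime >= filterMin && m.MinTime <= filterMax
}

func main() {
	block := blockMeta{MinTime: 1596349620000, MaxTime: 1596607200000}
	// Filter window 2020-08-02T00:00:00Z .. 2020-08-03T00:00:00Z in milliseconds.
	fmt.Println(overlaps(block, 1596326400000, 1596412800000)) // true
}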

Filtering by labels

To filter time series in the snapshot by labels, vmctl provides two flags:

--prom-filter-label - Prometheus label name to filter timeseries by. E.g. '__name__' will filter timeseries by name.
--prom-filter-label-value - Prometheus regular expression to filter label from "prom-filter-label" flag. (default: ".*")

The snapshot we use contains a set of time series generated by recording rules. The naming convention for recording rules implies separating the aggregation level, metric, and operation with colons: level:metric:operation. Let's take advantage of that and migrate only the recording rules:

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 --prom-filter-label='__name__' --prom-filter-label-value='.*:.*'
Prometheus import mode
Prometheus snapshot stats:
blocks found: 7;
blocks skipped by time filter: 0;
min time: 1596349620000 (2020-08-02T07:27:00+01:00);
max time: 1596698684531 (2020-08-06T08:24:44+01:00);
samples: 228164332;
series: 24248.
* Stats numbers are based on blocks meta info and don't account for applied filters.
Found 7 blocks to import. Continue? [Y/n] y
7 / 7 [------------------------------------------] 100.00% 2 p/s
2020/08/23 11:08:25 Import finished!
2020/08/23 11:08:25 VictoriaMetrics importer stats:
idle duration: 291.517073ms;
time spent while importing: 3.105398727s;
total samples: 6639476;
samples/s: 2138043.00;
total bytes: 172.6 MB;
bytes/s: 55.6 MB;
import requests: 32;
import requests retries: 0;
2020/08/23 11:08:25 Total time: 4.403631893s

The import process was quite fast, and this is expected: only 6639476 samples out of 228164332 were imported, and the rest were filtered out by the specified label filter. So let's query the imported series in Grafana.

It seems like everything is correct and only recording rules were imported into VM. The same approach may be used to import a subset of time series for a specific job or for specific prefixes in the name.
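For example, a hypothetical invocation that migrates only series belonging to jobs starting with node_exporter (the label value pattern is an assumption; the flags are the same as described above):

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 --prom-filter-label='job' --prom-filter-label-value='node_exporter.*'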

Significant figures

In the previous example we imported only recording rules from the snapshot, and the number of imported samples was 6639476, which is (6639476*100) / 228164332 = 2.9% of all samples in the snapshot. Let's check the disk space occupied by VM for storing time series for recording rules only:

du -d2 -h victoria-metrics-data
0B victoria-metrics-data/snapshots
0B victoria-metrics-data/indexdb/snapshots
4.0K victoria-metrics-data/indexdb/162DDEC63073CF09
32K victoria-metrics-data/indexdb/162DDEC63073CF0A
36K victoria-metrics-data/indexdb
0B victoria-metrics-data/data/big
7.8M victoria-metrics-data/data/small
7.8M victoria-metrics-data/data
7.8M victoria-metrics-data

The compression ratio for the imported samples is 7.8MB / 6639476 = 1.2B per sample, which is not as good a result as in the previous story, where we had 0.32B per sample. From those results we know that the whole snapshot, after being imported into VM, occupies ~74MB of disk space. And now we see that 2.9% of the samples occupy 7.8MB, which is 10.5% of the snapshot. But why are recording rule samples so expensive? Let's check what is stored in one of the recording rules we imported:

curl -G 'http://localhost:8428/api/v1/export?start=1596458059&end=1596458059' -d 'match=instance:cpu_utilization:ratio_avg{job="node_exporter_1"}'
{"metric":{"__name__":"instance:cpu_utilization:ratio_avg","job":"node_exporter_1","instance":"localhost:9100"},"values":[0.05055757575781,0.05058181818236,0.05067878787913,0.05083030303065,0.05109090909107,0.05139393939439,0.05166060606074,0.05195151515192,0.05221212121262,0.05255151515177,0.05289090909092,0.05315151515134,0.05050303030279,0.04776363636338,0.044975757576],"timestamps":[1596458071478,1596458091478]}

Most compression algorithms detect patterns and sequences in data in order to encode it using fewer bits than the original representation. For example, cumulative counters can be compressed via delta encoding because every next counter value is larger than the previous one. In our case, values like 0.05055757575781 are the result of floating point arithmetic and measurement errors.
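The long decimal tails come from the way float64 arithmetic works; here is a minimal Go sketch of the effect (the values are purely illustrative):

package main

import "fmt"

func main() {
	// "Clean" decimal values like 0.1 have no exact binary representation,
	// so simple arithmetic on them already produces long decimal tails.
	a, b := 0.1, 0.2
	fmt.Println(a + b) // 0.30000000000000004, not 0.3

	sum := 0.0
	for i := 0; i < 10; i++ {
		sum += 0.1
	}
	fmt.Println(sum) // 0.9999999999999999, not 1
}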

Moreover, such values tend to show false precision, which can result in unexpected issues and overconfidence in the accuracy of the data.

Such values heavily impact the compression ratio because, from an information theory standpoint, meaningless figures still need to be stored. Even more importantly, such values are hard for a human to interpret: it is difficult to tell immediately which of 0.05055757575781 and 0.05058181818236 is bigger without spending seconds comparing digits one by one. If you ask a colleague or friend to pronounce one of the numbers above, the value will most likely be rounded to the first 3 to 5 figures, because the rest isn't really important enough to mention.

In order to improve compression, vmctl allows limiting the number of significant figures before importing. For example, the number of significant figures can be limited to 8 by specifying the flag --vm-significant-figures=8. Then the value 0.05055757575781 will be rounded to 0.050557576, and when the import process is finished, the same export query returns the following results:

curl -G 'http://localhost:8428/api/v1/export?start=1596458059&end=1596458059' -d 'match=instance:cpu_utilization:ratio_avg{job="node_exporter_1"}'
{"metric":{"__name__":"instance:cpu_utilization:ratio_avg","job":"node_exporter_1","instance":"localhost:9100"},"values":[0.050557576,0.050581818,0.050678788,0.050830303,0.051090909,0.051393939,0.051660606,0.051951515,0.052212121,0.052551515,0.052890909,0.053151515,0.05050303,0.047763636,0.044975758],"timestamps":[1596458071478,1596458091478,1596458111478,1596458131478,1596458151478,1596458171478,1596458191478,1596458211478,1596458231478,1596458251478,1596458271478,1596458291478,1596458311478,1596458331478,1596458351478]}

Limiting the number of significant figures for recording rule samples improved the compression ratio to 5.6MB / 6639476 = 0.84B per sample:

du -d2 -h victoria-metrics-data                                                                      
0B victoria-metrics-data/snapshots
52K victoria-metrics-data/indexdb/162DF6472D579ACA
4.0K victoria-metrics-data/indexdb/162DF6472D579AC9
0B victoria-metrics-data/indexdb/snapshots
56K victoria-metrics-data/indexdb
0B victoria-metrics-data/data/big
5.6M victoria-metrics-data/data/small
5.6M victoria-metrics-data/data
5.6M victoria-metrics-data

It is worth mentioning that there are real-world scenarios where high precision is very important and is measured and calculated carefully to eliminate possible errors. From my personal experience, such cases are rare and often require specific software to keep accuracy and precision at a high level. But when you need to store just ordinary time series for CPU and memory usage, applying --vm-significant-figures is safe enough and will result in a much higher compression ratio, save some extra disk space after the migration, and eliminate false precision. The most common case for using this flag is to reduce the number of significant figures for time series created by aggregations such as average, rate, etc. Please note that vmagent can also be configured with this flag for all ingested or scraped time series.


Roman Khavronenko

Distributed systems engineer. Co-founded VictoriaMetrics.