VictoriaMetrics: how to migrate data from Prometheus

Roman Khavronenko
6 min read · Aug 12, 2020

The most important thing in evaluating any new technology is the ability to compare it to something you already know. That's why evaluating VictoriaMetrics (hereinafter VM) as long-term time series storage is so easy for Prometheus users: it supports the same scraping protocol, the same configuration format, a similar query language (MetricsQL), and even more features, making it a drop-in replacement for Prometheus. And migrating historical data is an inevitable part of that process.

Photo by Craig Cloutier / CC BY-SA 2.0

Even though VM and Prometheus have a lot in common in terms of protocols and formats, the implementation is completely different. VM is a highly optimized TSDB written from scratch specifically to address performance issues when processing high volumes of time series data. It supports both "pull" and "push" protocols for data ingestion, so migrating data from Prometheus to VM boils down to converting and ingesting data via one of the supported protocols. For the migration I'll use a simple tool called vmctl which does exactly this: it reads a Prometheus snapshot and ingests the data into VM.

Prometheus snapshot

Unfortunately, I didn't find any public Prometheus data that could be used as a reference for the migration process, so I had to create my own with the following config:
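A minimal config for this kind of setup looks roughly like this (a sketch for illustration; the exact scrape targets and intervals below are assumptions, not the precise values I used):

# prometheus.yml - illustrative sketch, not the exact config used in this article
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - rules.yml

scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
  - job_name: node_exporter
    static_configs:
      - targets: ['localhost:9100']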

The rules.yml file consists of node-exporter-specific recording rules, borrowed from GitLab and node exporter examples, to generate more time series.
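To give a taste of what such recording rules look like, here is one rule in the spirit of the node exporter mixin (an illustrative example, not an exact copy of the rules I used):

# rules.yml - one illustrative recording rule; the real file contains many more
groups:
  - name: node-exporter.rules
    rules:
      - record: instance:node_cpu_utilisation:rate5m
        expr: 1 - avg without (cpu, mode) (rate(node_cpu_seconds_total{mode="idle"}[5m]))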

Before starting the migration we need to take a snapshot of the Prometheus data. The concept and importance of making snapshots before doing any manipulations with data is well described here. Prometheus was scraping data for a few days and collected about 320MB of data:

du -d1 -h snapshots
320M snapshots/20200806T072444Z-365a858149c6e2d1
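For reference, a snapshot like this is created via Prometheus's snapshot API, which requires starting Prometheus with the admin API enabled (the address below assumes a local Prometheus on its default port):

# Prometheus must be started with --web.enable-admin-api for this endpoint to work
curl -XPOST http://localhost:9090/api/v1/admin/tsdb/snapshot

The created snapshot shows up under the snapshots/ directory inside Prometheus's data directory and is safe to copy while Prometheus keeps running, since it consists of hard links rather than copies of the data blocks.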

So let’s try to migrate data from the snapshot to VM via the vmctl tool!

Vmctl

Vmctl supports data migration from InfluxDB, Prometheus and Thanos. Actually, Thanos and Prometheus use the same storage engine under the hood and, as a result, have the same data layout on disk. Since snapshots are just hard links to the data folder, it makes no difference to vmctl whether it reads a Prometheus or a Thanos snapshot.

To start the migration process we need to specify the following flags:

  • --prom-snapshot: file path to the Prometheus snapshot;
  • --vm-addr: address of the running VictoriaMetrics service (localhost:8428 by default).

I’ll be running the migration on my 13" MacBook Pro, where I have a single-node version of VM (v1.39.4), the vmctl binary (v0.1.1) and the copied Prometheus snapshot. VM is started with only one extra flag:

./bin/victoria-metrics --selfScrapeInterval=1s

The `--selfScrapeInterval` flag makes VM scrape its own metrics so we can analyze them later. Let’s run vmctl:

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1
Prometheus import mode
Prometheus snapshot stats:
blocks found: 7;
blocks skipped: 0;
min time: 1596349620000 (2020-08-02T07:27:00+01:00);
max time: 1596698684531 (2020-08-06T08:24:44+01:00);
samples: 228164332;
series: 24248.
Filter is not taken into account for series and samples numbers.
Found 7 blocks to import. Continue? [Y/n] y

As the first step, vmctl prints the snapshot stats and waits for the user's permission to proceed. We see that “a few days” of Prometheus scraping is actually 228164332 samples across 24248 series. The number of series was almost constant the whole time since there was no churn in time series. Let’s proceed by answering Y:

7 / 7 [--------------------------------------------] 100.00% 0 p/s
2020/08/11 20:06:29 Import finished!
2020/08/11 20:06:29 VictoriaMetrics importer stats:
idle duration: 3.328381194s;
time spent while importing: 1m26.482662623s;
total samples: 228164332;
samples/s: 2638266.74;
total bytes: 4.5 GB;
bytes/s: 52.3 MB;
import requests: 1045;
import requests retries: 0;
2020/08/11 20:06:29 Total time: 1m29.745175897s
./bin/vmctl prometheus 139.39s user 50.16s system 211% cpu 1:29.77 total

As the unit in the progress bar, vmctl uses Prometheus data blocks, so it may not be very responsive while migrating huge ones. This is because time series filtering is done on the fly while reading data from the snapshot, so it is hard to say in advance how much data is left to import.
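If only a subset of the snapshot is needed, vmctl accepts time- and label-based filters. A sketch of how that looks (the flag names below are as I recall them from vmctl's help output; double-check with ./bin/vmctl prometheus --help before relying on them):

# import only node_exporter series newer than Aug 4th (flag names are a best-effort recollection)
./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 \
  --prom-filter-time-start=2020-08-04T00:00:00Z \
  --prom-filter-label=job \
  --prom-filter-label-value=node_exporter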

During the migration vmctl was ingesting on average 2.6M samples/s, and the whole process took 1m29s to migrate the 320MB snapshot. More details about the printed stats and performance tuning may be found here. For now, since both vmctl and the VM service are on the same machine and no network is involved, I'll try to tweak it a bit by increasing importer concurrency:

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 --vm-concurrency=8
...
2020/08/11 20:11:40 Import finished!
2020/08/11 20:11:40 VictoriaMetrics importer stats:
idle duration: 7.245308365s;
time spent while importing: 53.683616464s;
total samples: 228164332;
samples/s: 4250166.94;
total bytes: 4.5 GB;
bytes/s: 84.2 MB;
import requests: 1046;
import requests retries: 0;
2020/08/11 20:11:40 Total time: 55.485859442s
./bin/vmctl prometheus --vm-concurrency=8 191.30s user 22.18s system 384% cpu 55.525 total

Now it is 4.2M samples per second with importer concurrency set to 8. To scale it further, I'd recommend increasing concurrency up to the number of CPU cores available to the VM service. If with this setting VM still does not utilize all available CPU cores during the import, try increasing --prom-concurrency as well, or even spawn multiple vmctl processes on separate instances to avoid resource contention.
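For example, both knobs can be raised together (the values below are illustrative; pick them based on the cores available on each side):

./bin/vmctl prometheus --prom-snapshot=snapshots/20200806T072444Z-365a858149c6e2d1 \
  --prom-concurrency=4 --vm-concurrency=8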

Let’s check the VM metrics during the import. I’ll use Grafana configured with a Prometheus datasource pointed at the VM address and the official Grafana dashboard for VM:

VictoriaMetrics stats during import. Ingestion speed.

We see that all import requests were served via the `/api/v1/import` endpoint. See more about the ways to import data into VM here.
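This is the endpoint vmctl pushes the converted data to, using VM's JSON line format. The same endpoint can be used by hand; a minimal sketch (the metric name and values are made up for illustration):

# one JSON line per series; VM accepts many such lines in a single request
curl -s 'http://localhost:8428/api/v1/import' --data-binary \
  '{"metric":{"__name__":"node_load1","job":"node_exporter","instance":"localhost:9100"},"values":[0.42],"timestamps":[1596349620000]}'

Let’s check the disk usage now: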

VictoriaMetrics stats during import. Disk usage.

So in total we see that the imported snapshot occupies about 73MB of disk space, with an average of 0.346 bytes per sample. Let’s verify this by checking the VM data folder (set by the `--storageDataPath` flag):

du -d2 -h victoria-metrics-data
424K victoria-metrics-data/cache/rollupResult
1.9M victoria-metrics-data/cache/metricName_tsid
1.8M victoria-metrics-data/cache/metricID_metricName
524K victoria-metrics-data/cache/metricID_tsid
4.6M victoria-metrics-data/cache
0B victoria-metrics-data/snapshots
0B victoria-metrics-data/indexdb/snapshots
504K victoria-metrics-data/indexdb/162A4C7CA766B94A
4.0K victoria-metrics-data/indexdb/162A4C7CA766B949
508K victoria-metrics-data/indexdb
0B victoria-metrics-data/data/big
74M victoria-metrics-data/data/small
74M victoria-metrics-data/data
79M victoria-metrics-data

The "/cache" folder contains various temporary caches for optimizing query speed which could be cleaned up at any time. It is not empty because some queries were already served for Grafana to plot the dashboard.

The "/indexdb" folder contains inverted index for all stored time series. The index is used for speeding up lookups while searching or ingesting data. The size of "/indexdb" depends on the number of unique time series stored in VM.

Folder "/data" contains the actual time series values. It is split in two subfolders: “/data/small” and “/data/big”. These are parts of VM internal structure MergeSet and define the merging stage. While accepting and processing new data, VM merges previously saved data parts into bigger ones. The merging process significantly improves compression, reading speed and reduces the number of files on disk.

As we can see, the current compression ratio is quite good compared to Prometheus:

  • (73MB + 424KB) / 228164332 = 0.32B per sample in VM
  • 320MB / 228164332 = 1.4B per sample in Prometheus

Now let’s verify that the imported data is actually accessible in VM. Since most of our scrape targets were node exporter jobs, let's install the node exporter dashboard in Grafana:

Node exporter dashboard
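Besides the dashboard, a quick spot-check can be done directly against VM's Prometheus-compatible query API (assuming VM listens on the default localhost:8428; node_load1 is just an example of a node exporter metric):

# instant query at a point in time covered by the snapshot
curl -s 'http://localhost:8428/api/v1/query?query=node_load1&time=2020-08-03T12:00:00Z'
# range query over an hour inside the imported time range
curl -s 'http://localhost:8428/api/v1/query_range?query=node_load1&start=2020-08-03T00:00:00Z&end=2020-08-03T01:00:00Z&step=60s'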

We see that metrics from the snapshot are queryable and the time range is exactly what we saw in the snapshot stats. Try it yourself and use vmctl to migrate historical data from Prometheus to VM. See also the backfilling docs for more details and common gotchas.

See more about compression improvements and time series filtering via vmctl in "Filtering and modifying time series".
