Persistent Data Structures in VictoriaMetrics (Part 1): vmagent

Zhu Jiekun
Published in DevOps.dev · 4 min read · May 3, 2024


Series Introduction

VictoriaMetrics is an open-source, high-performance time series database. It serves both as an alternative to Prometheus and as a long-term storage solution for Prometheus, and it has been adopted by many companies in production. While VictoriaMetrics provides detailed documentation and examples, there is little discussion of how it persists data on disk.

Given the increasing number of active members in the community, it would be beneficial to have a blog post that serves as a bridge between the user guide and the source code, helping individuals make more meaningful contributions.

This series aims to provide insights into how VictoriaMetrics organizes and operates on-disk data. It does not require any prior knowledge of the Go programming language. However, it is good to have a basic understanding of VictoriaMetrics’ components.

1. Persistent Data Structures in VictoriaMetrics (Part 1): vmagent
2. Persistent Data Structures in VictoriaMetrics (Part 2): vmselect

vmagent Intro

vmagent is a lightweight agent that scrapes exposed metrics, similar to Prometheus. It can also serve as a receiver, accepting time-series data in various protocols, including Prometheus remote-write and the InfluxDB line protocol.

vmagent can also modify data via relabeling before remote-writing it to VictoriaMetrics or any compatible target.

FastQueue

Now, let’s dive into vmagent. Scraped data is initially appended to the remote-write context. Once a certain amount of data has accumulated, or after a specified time interval (1 second by default), it is flushed to the FastQueue of the respective remote-write context. Each remote-write context also manages multiple worker goroutines that consume data from the FastQueue and send it to the remote-write target.

Ideally, the FastQueue sends data to the workers through a channel, a Go data structure that serves as an in-memory queue. However, with limited memory it cannot buffer large amounts of data when the remote-write target is unavailable. In such cases, the FastQueue writes the data to disk for buffering.

From this, we can conclude:

  • Each remote-write configuration is represented by a remote-write context.
  • Each remote-write context contains a FastQueue and multiple workers.
  • Each FastQueue manages an in-memory channel and an on-disk file (folder).

On-Disk Data Structure

If you start a vmagent, you will be able to see the folders it creates for each remote-write context. For example:

```
./vmagent-remotewrite-data
└── persistent-queue
    ├── 1_E3C1E1E1733E59E4
    │   ├── 0000000000000000
    │   ├── flock.lock
    │   └── metainfo.json
    └── 2_740390E5C841CCAC
        ├── 0000000000000000
        ├── flock.lock
        └── metainfo.json
```

vmagent generates the folder name based on the index (sequence) and URL of each remote-write context:

```go
...
// Hash of the remote-write `URL`, e.g. E3C1E1E1733E59E4.
h := xxhash.Sum64([]byte(URL.String()))

// Index + hash, e.g. 1_E3C1E1E1733E59E4.
queuePath := filepath.Join("./vmagent-remotewrite-data", "persistent-queue",
	fmt.Sprintf("%d_%016X", argIdx+1, h))
...
```

The folder name remains the same unless the index or URL is modified. If vmagent exits unexpectedly and is restarted, it can read from the same folder and resume processing unfinished remote-write data.

Questions:

1. What happens if vmagent is exited and restarted with one remote-write URL modified?

2. What happens if we remove one remote-write config instead of modifying it in Question 1?

3. What is the difference between removing the first remote-write config and the last remote-write config in Question 2?

Each remote-write folder contains three files: one (0000000000000000) for the data, one for the metadata (metainfo.json), and one for the lock (flock.lock).

The data file holds pending remote-write data in []byte form, but it is not just the serialized time-series data. vmagent supports both the Snappy and zstd compression algorithms; the algorithm is selected either by a flag or through automatic negotiation during startup. Before being sent to the FastQueue, the remote-write data undergoes the following steps:

  • Serialization into []byte based on the remote-write protocol.
  • Compression into a new []byte using either the Snappy compressor or the zstd compressor.

Questions:

Assuming the remote-write target does not support the zstd compression algorithm, vmagent will send the data compressed with the Snappy algorithm. Now:

1. The remote-write target goes offline for a 10-minute upgrade, and vmagent starts buffering data to local files.

2. The remote-write target completes the upgrade and now supports the zstd compression algorithm.

In this case:

1. What algorithm-compressed data will the remote-write target receive?

2. Assuming that vmagent is also restarted after the remote-write target restarts, what algorithm-compressed data will the remote-write target receive?

Further Reading

You can find the corresponding source code in the VictoriaMetrics repository on GitHub.
