Monitoring of multiple OpenShift clusters with VictoriaMetrics

Rafal Szypulka
3 min read · Jul 1, 2021


In this article, I will describe a simple setup of centralized monitoring based on VictoriaMetrics for a couple of OpenShift clusters.

VictoriaMetrics is a fast and cost-effective time-series database that supports PromQL and extends it. It can ingest data from Prometheus instances through the remote write API, and from other sources through protocols such as InfluxDB and Graphite. See the VictoriaMetrics documentation for details on the supported data inputs.

I deployed VictoriaMetrics on a single Ubuntu VM and configured remote write output of all Prometheus instances in order to forward all metrics from all my clusters to the central VictoriaMetrics instance.

VictoriaMetrics setup

VictoriaMetrics is deployed with Docker Compose on a single Ubuntu VM with 8 GB RAM, 4 CPU cores, and a 500 GB disk. Besides the core VictoriaMetrics container, I also deployed vmagent (a data collection component, used here for self-monitoring of the VictoriaMetrics server), Grafana for data visualization, and node_exporter for monitoring the local OS. You can find the Docker Compose config here.

Prometheus Operator setup

Prometheus on the OpenShift platform is managed by the Cluster Monitoring Operator and the Prometheus Operator. Customization of the OCP Prometheus is very limited and is done via the cluster-monitoring-config ConfigMap.

The configuration is simple: edit the cluster-monitoring-config ConfigMap, or create it if it doesn't already exist. Add externalLabels that identify the cluster, and define the remoteWrite endpoint, which here is our VictoriaMetrics TSDB.

apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    prometheusK8s:
      externalLabels:
        cluster: labdev
        type: dev
      remoteWrite:
        - url: "http://victoriametrics-url:8428/api/v1/write"

Deduplication

The Cluster Monitoring Operator on OpenShift creates two Prometheus replicas that act as an HA pair, and both replicas scrape the same set of targets. Once remoteWrite is enabled, each replica forwards its metrics to VictoriaMetrics. VictoriaMetrics can deduplicate such data, but only if every replica sends the same set of external labels. Prometheus Operator, however, adds a prometheus_replica label with a unique value per replica. This behavior currently cannot be changed for Prometheus managed by the OpenShift Cluster Monitoring Operator, but we can drop the label on the VictoriaMetrics side (thanks to Roman Khavronenko for pointing this out) using the --relabelConfig option and the following config:

- action: labeldrop
  regex: "prometheus_replica"
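
To see why dropping this one label is enough, here is a minimal Python sketch of the idea. It is not VictoriaMetrics code: the helper names and the sample label values (including the replica names) are made up for illustration. It only shows that two label sets that differ solely in prometheus_replica become the same series identity once that label is removed.

```python
from typing import Dict, FrozenSet, Tuple

Labels = Dict[str, str]


def drop_label(labels: Labels, name: str) -> Labels:
    """Return a copy of the label set without the given label.

    This mimics what a labeldrop rule does when its regex matches
    exactly one label name.
    """
    return {k: v for k, v in labels.items() if k != name}


def series_key(labels: Labels) -> FrozenSet[Tuple[str, str]]:
    """A series is identified by its full set of label name/value pairs."""
    return frozenset(labels.items())


# Illustrative samples: the same target scraped by both replicas.
# The replica label values are assumptions, not taken from a real cluster.
replica_0 = {"__name__": "up", "job": "kubelet", "cluster": "labdev",
             "type": "dev", "prometheus_replica": "prometheus-k8s-0"}
replica_1 = {"__name__": "up", "job": "kubelet", "cluster": "labdev",
             "type": "dev", "prometheus_replica": "prometheus-k8s-1"}

before = {series_key(replica_0), series_key(replica_1)}
after = {series_key(drop_label(labels, "prometheus_replica"))
         for labels in (replica_0, replica_1)}

print(len(before), len(after))  # 2 distinct series before, 1 after
```

With the label in place, the two replicas look like two unrelated series, so there is nothing to deduplicate. Without it, both replicas write to the same series and the duplicate samples can be collapsed.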

Note that the generally available version of Prometheus Operator, which you can deploy from the OperatorHub on OpenShift, can be configured with prometheus.prometheusSpec.replicaExternalLabelNameClear: true. This disables the prometheus_replica label, so deduplication in VictoriaMetrics should work without relabeling.

Grafana dashboards

Grafana has been configured to query data directly from the VictoriaMetrics API, which is compatible with the Prometheus query API.

OpenShift comes with several Grafana dashboards that can easily be modified to support multiple clusters. The external labels we defined in the cluster-monitoring-config ConfigMap are added to every metric forwarded to VictoriaMetrics. We can use them to configure an additional template variable in the dashboard and to filter the queries in dashboard panels. You can find an example here.
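
The same two steps (listing the available clusters, then filtering a query by one of them) can also be tried outside Grafana against the Prometheus-compatible API. The Python sketch below assumes the placeholder host victoriametrics-url:8428 from the remoteWrite example and uses the up metric only as an example; the helper names are hypothetical. In Grafana itself, the dropdown is typically filled by a variable query such as label_values(cluster), and panels add a cluster="$cluster" matcher.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder address taken from the remoteWrite example; adjust to your setup.
VM_URL = "http://victoriametrics-url:8428"


def label_values_url(base_url: str, label: str) -> str:
    """URL of the Prometheus-compatible endpoint listing all values of a label."""
    return f"{base_url}/api/v1/label/{label}/values"


def with_cluster(metric: str, cluster: str) -> str:
    """Build a selector restricted to one cluster, e.g. up{cluster="labdev"}."""
    return f'{metric}{{cluster="{cluster}"}}'


def instant_query_url(base_url: str, promql: str) -> str:
    """URL of an instant query; the PromQL expression is URL-encoded."""
    return f"{base_url}/api/v1/query?{urlencode({'query': promql})}"


def get_json(url: str, timeout: float = 10.0) -> dict:
    """Fetch a URL and decode the JSON body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def print_targets_per_cluster(base_url: str) -> None:
    """List the clusters (the dropdown values) and count 'up' series in each."""
    clusters = get_json(label_values_url(base_url, "cluster"))["data"]
    for cluster in clusters:
        query = f"count({with_cluster('up', cluster)})"
        result = get_json(instant_query_url(base_url, query))
        print(cluster, result["data"]["result"])
```

Calling print_targets_per_cluster(VM_URL) against a running instance should print one line per value of the cluster label, which is also a quick way to confirm that every cluster is actually writing to the central instance.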

OpenShift Grafana dashboard with additional cluster dropdown menu

Conclusion

My VictoriaMetrics instance is configured to keep the last 30 days of data, which amounts to about 330 billion stored data points with 2.5 million active time series. I have been running it for over two months, and it has been stable without any performance degradation. The resource usage and data compression are outstanding. VictoriaMetrics is one of the most interesting options for centralized long-term storage of Prometheus metrics.
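
As a rough sanity check, these figures can be related to each other with a little arithmetic. The sketch below assumes that the number of active series stayed constant over the whole 30-day window, which is a simplification (series come and go), so the result is only an order-of-magnitude estimate of the average interval between samples.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_sample_interval(active_series: float,
                            retention_days: float,
                            stored_points: float) -> float:
    """Average seconds between samples of one series.

    Assumes a constant number of active series over the retention window.
    """
    series_seconds = active_series * retention_days * SECONDS_PER_DAY
    return series_seconds / stored_points


# Figures from the article: 2.5 million series, 30 days, 330 billion points.
interval = average_sample_interval(2.5e6, 30, 330e9)
print(f"{interval:.1f} s")  # roughly 19.6 s between samples per series
```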

Learn how IBM can help in your journey to cloud and AI at ibm.com/garage.
