Monitor your Cloud Storage with Prometheus and the `dd` command

How to monitor your OpenStack or RHEV network storage when you don’t have direct access to the storage device.

Grafana panel for RHEV Storage

In our test lab we manage our own RHEV installation (RHEV = Red Hat Enterprise Virtualization). The setup gives us a lot of flexibility to create and destroy several dozen VMs at will to fit our needs.

However, the network storage device that provides disk storage for the VMs gives us no way to monitor its health directly, other than through the vendor’s proprietary application. I would like a single Grafana dashboard for our lab infrastructure, and the one remaining item is a network storage health panel.

While we can’t access the network storage device’s health telemetry directly, we can measure disk I/O from inside a VM, which gives us a good indirect indicator of storage performance.

Here is a summary of the recipe:

  1. Create a VM with a ‘tiny’ or ‘micro’ profile. Install your OS of choice.
  2. Install the Prometheus node_exporter on it.
  3. Set up a cron job that writes a large file at a regular interval.
  4. In the Prometheus config YAML file, create a new scrape job for the new VM (the alert rule later in this post assumes the job is named storage-perf).
  5. On your Grafana dashboard, create a graph of the endpoint’s CPU iowait.

Iowait spikes corresponding to the cron job schedule
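
Before wiring up steps 4 and 5, it can help to confirm that node_exporter on the new VM actually exposes iowait counters. The following is a minimal sketch, not part of the original setup: the helper name iowait_series is made up for illustration, and it accepts both the node_cpu metric name used in this post and the node_cpu_seconds_total name used by newer node_exporter releases.

```shell
# iowait_series: read Prometheus exposition text on stdin and print only
# the per-CPU iowait counters. The function name is hypothetical.
# Matches the older node_cpu name as well as node_cpu_seconds_total.
iowait_series() {
    grep -E '^node_cpu(_seconds_total)?\{.*mode="iowait"'
}
```

Assuming node_exporter listens on its default port 9100, something like `curl -s http://<vm-address>:9100/metrics | iowait_series` should print one line per CPU; the address placeholder is yours to fill in.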

Cron job

Make sure the file is at least as large as your VM’s RAM, and pass oflag=dsync so that dd waits until the data has actually reached the storage. Together these keep the page cache from masking slow storage.

# Create a 1GB file at minutes 0, 25 and 50 of every hour
# (*/25 restarts at the top of each hour, so the last gap is 10 minutes)
# cat /var/spool/cron/root
*/25 * * * * dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
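
If you would rather not hard-code the file size, the cron entry could call a small wrapper instead. This is only a sketch under some assumptions: the function names mem_total_mib and write_probe are invented for this example, it relies on GNU dd and a Linux /proc/meminfo, and it writes in 1 MiB blocks (one sync per block) rather than the single 1G block used above, so the I/O pattern is not identical. It also removes the file afterwards so the space is not held between runs.

```shell
# mem_total_mib [FILE]: print total RAM in MiB, read from a meminfo-style
# file (defaults to /proc/meminfo, where MemTotal is reported in kB).
mem_total_mib() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' "$meminfo"
}

# write_probe TARGET SIZE_MIB: write SIZE_MIB blocks of 1 MiB to TARGET
# with synchronized I/O, then delete TARGET. Returns dd's exit status.
write_probe() {
    target="$1"
    size_mib="$2"
    dd if=/dev/zero of="$target" bs=1M count="$size_mib" oflag=dsync 2>/dev/null
    status=$?
    rm -f "$target"
    return "$status"
}
```

A script run from cron would then source these functions and call `write_probe /tmp/test1.img "$(mem_total_mib)"`.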

Optional Alert

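I set up this rule to receive an alert when average iowait stays above 40% for 10 minutes. You can tune the averaging duration and iowait threshold to better suit your environment. Note that the rule uses the Prometheus 1.x rule syntax and the node_cpu metric name; Prometheus 2.x expects YAML rule files, and node_exporter 0.16 and later expose the same counter as node_cpu_seconds_total.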

ALERT Slow_Storage
  IF avg by (instance) (rate(node_cpu{job="storage-perf", mode="iowait"}[1m])) > 0.40
  FOR 10m
  LABELS {
    severity = "page"
  }
  ANNOTATIONS {
    summary = "CPU iowait on {{ $labels.instance }} > 40% for 10 minutes"
  }
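
To get a feel for the number the alert expression compares against 0.40, you can spot-check iowait on the VM itself. The sketch below is an approximation, not how Prometheus evaluates rate(): it takes two samples of the aggregate cpu line from /proc/stat and reports the share of elapsed CPU time spent in iowait, as a percentage. The function name iowait_pct is made up for this example.

```shell
# iowait_pct BEFORE AFTER: given two aggregate "cpu" lines from /proc/stat,
# print the percentage of CPU time spent in iowait between them.
# Field layout after the "cpu" label: user nice system idle iowait irq
# softirq steal (guest time is already included in user, so it is skipped).
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total = 0
            for (i = 2; i <= 9; i++) total += $i
            t[NR] = total      # total CPU ticks in this sample
            w[NR] = $6         # iowait ticks in this sample
        }
        END {
            dt = t[2] - t[1]
            if (dt <= 0) { print "0.0"; exit }
            printf "%.1f\n", 100 * (w[2] - w[1]) / dt
        }'
}
```

For example, capture `before=$(grep '^cpu ' /proc/stat)`, wait a minute while the dd job runs, capture `after` the same way, and call `iowait_pct "$before" "$after"`. A result above 40 corresponds roughly to the alert threshold.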

Happy monitoring!