Time of day based notifications with Prometheus and Alertmanager

Tom Fawcett
Nov 18, 2018

Prometheus + Alertmanager is a powerful combination for monitoring. One feature it lacks, however, is proper support for time of day based notifications.

Whilst this feature is still outstanding (GitHub issue), I have been working around it with the following solution.

Update: new readers may instead wish to investigate mute_time_interval
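
For readers following that pointer, here is a rough sketch of what the built-in approach might look like. It assumes a recent Alertmanager: as far as I know, the top-level time_intervals block and the location field only arrived in releases newer than the one this post was written against. The names default and quiet_hours are placeholders of mine.

```yaml
route:
  receiver: default              # placeholder root receiver
  routes:
    - matchers:
        - severity="warning"
      # Suppress notifications on this route while any listed interval is active.
      mute_time_intervals:
        - quiet_hours
receivers:
  - name: default
time_intervals:
  - name: quiet_hours            # placeholder interval name
    time_intervals:
      - times:
          # An interval cannot wrap past midnight, so the night is split in two.
          - start_time: '23:00'
            end_time: '24:00'
          - start_time: '00:00'
            end_time: '07:00'
        # Assumption: this Alertmanager version supports location;
        # without it the times are interpreted as UTC.
        location: 'Europe/London'
```

If this works in your version, the summer time arithmetic in the rest of this post is not needed, since the timezone database handles the clock changes.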

The problem

I want to prevent low priority alerts being sent at nighttime.

Prometheus has a time() function. However, it only ever returns the time in UTC. In the UK (timezone Europe/London) local time matches UTC for part of the year, so this works fine then. But once British Summer Time (BST) takes effect and clocks run one hour ahead of UTC, any solution relying solely on this function would need manual adjustment.

Fortunately, Prometheus’ Query Language (PromQL) is sufficiently powerful to remove the need for manual adjustment.

The solution

First a reminder of the definition of European Summer Time, which BST follows:

European Summer Time is the variation of standard clock time that is applied in most European countries … in the period between spring and autumn, during which clocks are advanced by one hour from the time observed in the rest of the year…

European Summer Time begins at 01:00 UTC … on the last Sunday in March and ends at 01:00 UTC … on the last Sunday in October each year

source

We can follow this definition in PromQL like so:

- record: is_european_summer_time
  expr: |
    (vector(1) and (month() > 3 and month() < 10))
    or
    (vector(1) and (month() == 3 and (day_of_month() - day_of_week()) >= 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
    or
    (vector(1) and (month() == 10 and (day_of_month() - day_of_week()) < 25) and absent((day_of_month() >= 25) and (day_of_week() == 0)))
    or
    (vector(1) and ((month() == 10 and hour() < 1) or (month() == 3 and hour() > 0)) and ((day_of_month() >= 25) and (day_of_week() == 0)))
    or
    vector(0)

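This expression works through each of the four tests in turn, returning 1 as soon as one of them is true. If none match, the final vector(0) makes it return 0.

The key trick is day_of_month() - day_of_week(). Because day_of_week() is 0 on a Sunday, this subtraction gives the date of the most recent Sunday. March and October both have 31 days, so any Sunday falling on the 25th or later must be the last Sunday of the month. A result of 25 or more therefore means the last Sunday has been reached, and a result below 25 means it has not.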

A rough verbal description of the expression:

  1. Is the month after March but before October?
  2. Is the month March, and the day after the last Sunday?
  3. Is the month October, and the day before the last Sunday?
  4. Is the month October, the day the last Sunday, and the time before 01:00? Or is the month March, the day the last Sunday, and the time after or equal to 01:00?

If none of the above are true then it is not European Summer Time.

I’ll note at this point that, though this expression has worked well for me, it could probably be refactored.

With the difficult bit out of the way, we can then use our new is_european_summer_time metric to convert the UTC function time() into a Europe/London version:

- record: europe_london_time
  expr: time() + 3600 * is_european_summer_time

We then use this to create a Europe/London version of hour():

- record: europe_london_hour
  expr: hour(europe_london_time)

And finally (for the Prometheus configuration) we create an alert that fires during our desired night period:

- alert: QuietHours
  expr: europe_london_hour >= 23 or europe_london_hour <= 6
  for: 1m
  labels:
    notification: page
    severity: critical
  annotations:
    description: 'This alert fires during quiet hours. It should be blackholed by Alertmanager.'
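
Putting the Prometheus side together, the fragments so far might sit in a single rule file like the sketch below. The group name quiet_hours is a placeholder of mine, not something from my actual setup. Keeping everything in one group is deliberate: as I understand it, rules within a group are evaluated sequentially, so listing each rule after the one it depends on should let it see a fresh value.

```yaml
groups:
  - name: quiet_hours            # placeholder group name
    rules:
      # 1. The is_european_summer_time recording rule shown earlier goes here, first.

      # 2. UTC seconds, shifted forward one hour while summer time is in effect.
      - record: europe_london_time
        expr: time() + 3600 * is_european_summer_time

      # 3. Local hour of day (0-23).
      - record: europe_london_hour
        expr: hour(europe_london_time)

      # 4. Active from 23:00 through 06:59 local time
      #    (it starts firing once the 1m "for" delay has passed).
      - alert: QuietHours
        expr: europe_london_hour >= 23 or europe_london_hour <= 6
        for: 1m
        labels:
          notification: page
          severity: critical
        annotations:
          description: 'This alert fires during quiet hours. It should be blackholed by Alertmanager.'
```

Running promtool check rules against the file is a quick way to catch indentation mistakes before reloading Prometheus.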

The final pieces of the solution are on the Alertmanager side.

We prevent the QuietHours alert from ever reaching a real receiver:

routes:
- match:
    alertname: QuietHours
  receiver: blackhole
receivers:
- name: blackhole

but still use it to inhibit, and therefore prevent the sending of, warning level alerts:

inhibit_rules:
- source_match:
    alertname: QuietHours
  target_match:
    severity: warning
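
To show where these fragments sit relative to each other, here is a minimal sketch of a complete Alertmanager configuration. The default receiver is a placeholder of mine standing in for whatever real receiver you already use. Note that routes nests under the top-level route block, whereas receivers and inhibit_rules are top-level keys.

```yaml
route:
  receiver: default              # placeholder for your real receiver
  routes:
    # Swallow the helper alert so it never notifies anyone.
    - match:
        alertname: QuietHours
      receiver: blackhole
receivers:
  - name: default                # add email/Slack/etc. settings here
  - name: blackhole              # no integrations configured, so nothing is sent
inhibit_rules:
  # While QuietHours is firing, mute every alert labelled severity: warning.
  - source_match:
      alertname: QuietHours
    target_match:
      severity: warning
```

Because QuietHours itself carries severity: critical, it does not match the inhibition target. Newer Alertmanager versions prefer the matchers syntax over match, source_match and target_match, so treat the field names here as matching the era of this post.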

Conclusion

Though Prometheus + Alertmanager may currently lack proper support for time of day based notifications, PromQL is powerful enough to compensate.

This example showed a simple inhibition based solution designed to work for timezone Europe/London. However, the principles used should transfer to other solutions and timezones.
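
As one illustration of that transfer, here is a sketch for Central European Time. It assumes the zone follows the same European Summer Time rule (which switches at 01:00 UTC everywhere) and sits one hour ahead of UTC in winter. The europe_berlin_* rule names are hypothetical.

```yaml
# Winter offset of one hour, plus a further hour during summer time.
- record: europe_berlin_time
  expr: time() + 3600 * (1 + is_european_summer_time)

# Local hour of day (0-23) for use in a QuietHours-style alert.
- record: europe_berlin_hour
  expr: hour(europe_berlin_time)
```

Timezones with different switch-over dates, such as those in North America, would need their own equivalent of the is_european_summer_time rule.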
