Configure AlertManager receivers in Verrazzano

Julian OI · Published in Verrazzano · Apr 11, 2023

Verrazzano includes Prometheus, Grafana, and AlertManager to help sysadmins identify trends that could eventually evolve into issues. Remember that in tech, everything can fail, and things can go south fast. In this blog, I will describe how to set up AlertManager to send email notifications that alert Developers and/or Operators when something has fallen out of place in your application or infrastructure.

Prometheus pulls metrics from different components in Verrazzano, including, but not limited to, Kubernetes worker nodes, Kubernetes API endpoints, and application metrics. You can define rules and be alerted when a resource's behavior breaches your business policies. When this happens, Prometheus pushes alerts to AlertManager, which in turn determines who should receive each incoming notification and how to route it appropriately. For our test case, I will show you how to configure an SMTP receiver in a Verrazzano alert notification pipeline.

Enough intro; let's get into action.

Verrazzano configures Prometheus and AlertManager using the community kube-prometheus-stack Helm chart. The chart's values are defined in your cluster's Verrazzano Custom Resource under the prometheusOperator component. The example below shows a fully working AlertManager configuration, declared as override values that are passed internally to the kube-prometheus-stack chart in Verrazzano.

apiVersion: install.verrazzano.io/v1beta1
kind: Verrazzano
metadata:
  name: verrazzano
  namespace: default
spec:
  components:
    argoCD:
      enabled: true
    prometheus:
      enabled: true
    prometheusOperator:
      overrides:
        - values:
            alertmanager:
              enabled: true
              alertmanagerSpec:
                podMetadata:
                  annotations:
                    sidecar.istio.io/inject: "false"
              config:
                global:
                  resolve_timeout: 15m
                  smtp_auth_password: <YOUR PASS>
                  smtp_auth_username: ocid1.user.oc1..aaaaaaaae
                  smtp_from: vzalerts@acme.com
                  smtp_hello: prometheus.vmi.system.XXXX
                  smtp_smarthost: smtp.email.oci.oraclecloud.com:587
                receivers:
                  - name: "null"
                  - name: oci_smtp
                    email_configs:
                      - to: sysadmin1@acme.com
                route:
                  group_by:
                    - alertname
                  receiver: oci_smtp
                  repeat_interval: 1h
                  routes:
                    - match:
                        alertname: Watchdog
                      receiver: "null"
  profile: dev

You can edit the Verrazzano CR in your cluster directly (kubectl edit vz) to match the snippet above, or apply it with kubectl apply. Either way, you will end up with a simple SMTP receiver configured in AlertManager.
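For example, assuming the snippet above is saved as vz-alertmanager.yaml (the file name is arbitrary), you could apply it and watch the AlertManager pod come up in the monitoring namespace:

# Apply the updated Verrazzano CR (file name is just an example)
kubectl apply -f vz-alertmanager.yaml

# Wait for the AlertManager pod to appear and become Ready
kubectl get pods -n verrazzano-monitoring -w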

Let’s break it into smaller chunks to explain what each section does.

First, the AlertManager component in Verrazzano 1.5 (the latest version at this time) is not enabled by default. To enable it, set "enabled: true" inside the prometheusOperator component. Then, it needs to be excluded from the Istio service mesh, as it must communicate with Prometheus directly. Setting the annotation sidecar.istio.io/inject: "false" will do the trick. By the way, Verrazzano uses the Istio service mesh with mTLS enabled to secure all traffic between applications and Verrazzano system components.

prometheusOperator:
  overrides:
    - values:
        alertmanager:
          enabled: true
          alertmanagerSpec:
            podMetadata:
              annotations:
                sidecar.istio.io/inject: "false"

Managing alert recipients in AlertManager is accomplished by defining a route section. All incoming alerts enter the root route. Then the path to a child route (and its receiver) is chosen based on a set of filters/matchers. You can define a single route block to catch all alerts, or configure multiple child route blocks with different matching criteria to deliver notifications through different channels.

In this case, I am opting for a single route block, grouped by alertname, to be sent to the receiver named oci_smtp. In the next section we have to define what type of receiver oci_smtp is. But you can probably guess based on the name.

route:
  group_by:
    - alertname
  receiver: oci_smtp
  repeat_interval: 1h
  routes:
    - match:
        alertname: Watchdog
      receiver: "null"

Note that the monitoring stack ships with a prebuilt "Watchdog" alert that fires continuously to confirm that the entire notification pipeline is working as expected. It needs to be routed somewhere. By pairing it with the "null" receiver, we prevent this always-firing alert from actually being sent out.

Receivers

We have linked a route with a receiver. Now the oci_smtp receiver's details must be described under the "receivers" section. Receivers can reach recipients by email (SMTP), SMS, Slack, or webhooks for custom alerting. We just need to declare the settings within a <service>_configs section.

In our example, the single oci_smtp receiver is defined by adding "receivers.name: oci_smtp". It is identified as an SMTP receiver by appending the "email_configs" attribute and listing one or more email recipients in "receivers.name.email_configs.to[]".

You can define multiple SMTP receivers, one per type of alert, by giving each a unique name; each receiver can have its own config.receivers.email_configs section (a hypothetical multi-receiver layout is sketched further below). You can find more details about routes, matchers, and all possible combinations in the public AlertManager documentation.

global:
  resolve_timeout: 15m
  smtp_auth_password: <YOUR PASS>
  smtp_auth_username: ocid1.user.oc1..aaaaaaaae
  smtp_from: vzalerts@acme.com
  smtp_hello: prometheus.vmi.system.XXXX
  smtp_smarthost: smtp.email.oci.oraclecloud.com:587
receivers:
  - name: "null"
  - name: oci_smtp
    email_configs:
      - to: sysadmin1@acme.com

Because we are sending notifications through a single SMTP server, all of its SMTP service properties were added to the config.global section instead of under "config.receivers.name.email_configs". The global.smtp_* attributes expect standard SMTP server properties: the SMTP user and password, an authorized/approved sender address (vzalerts@acme.com), and smtp_smarthost pointing at your SMTP server. I set those values according to my OCI Email Delivery service. I tested OCI Email Delivery because it is quick to set up and includes all the features email providers offer to send secure emails. See the Email Delivery documentation for more details.
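If you later need to split notifications across teams, a hypothetical layout could add a second receiver and a child route; the receiver name and address below are made up for illustration:

receivers:
  - name: "null"
  - name: oci_smtp
    email_configs:
      - to: sysadmin1@acme.com
  - name: dev_team_smtp            # hypothetical second receiver
    email_configs:
      - to: devteam@acme.com       # hypothetical recipient
route:
  receiver: oci_smtp               # default receiver for everything else
  group_by:
    - alertname
  routes:
    - match:
        severity: page             # send page-level alerts to the dev team
      receiver: dev_team_smtp
    - match:
        alertname: Watchdog
      receiver: "null"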

Next, configure a Prometheus rule to test alerts in Verrazzano. My suggestion is to start with a simple rule, then progressively customize it until your requirement is completely covered. With that in mind, this step configures a simple alert in Prometheus to identify resources that are unavailable. The resulting rule, defined as "expr: up == 0", uses Prometheus's own expression language, PromQL.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    release: prometheus-operator
  name: prometheus-operator-test
  namespace: verrazzano-monitoring
spec:
  groups:
    - name: mayday_mayday_server_down
      rules:
        - alert: mayday_service_down
          expr: up == 0
          for: 1m
          labels:
            severity: page
          annotations:
            summary: Mayday Mayday. Service is Down

Create a YAML file with the contents above and apply it with kubectl in your Verrazzano cluster. Make sure the rule is created in the verrazzano-monitoring namespace.
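Assuming the rule is saved as mayday-rule.yaml (hypothetical file name), that looks like:

# Create the PrometheusRule; the metadata above places it in verrazzano-monitoring
kubectl apply -f mayday-rule.yaml

# Confirm it was created
kubectl get prometheusrules -n verrazzano-monitoring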

Then, get your Prometheus endpoint from the "vz status" output. Authenticate with Verrazzano's single sign-on user and go to the Alerts section. You should see a new "mayday_service_down" alert. Initially, when no service is down, the mayday_service_down alert will be labeled as "Inactive".

To confirm that your rule syntax is valid, you can type it in the expression search bar in Prometheus. Give it a go by typing "up". Notice that each matching resource is displayed with a number next to it: "1" means the resource is running, "0" means it is not. This is why the alert created earlier looks for up == 0.
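While experimenting, you can also narrow the expression to the targets you care about; for example (the namespace label value here is just an illustration):

# Every scrape target: 1 = up, 0 = down
up

# Only targets in one namespace that are currently down
up{namespace="myapp"} == 0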

If the rule finds a match, Prometheus moves the alert to the Pending state.

An alert remains in the Pending state while the matching criteria stay valid for the configured duration and no remedial action has been taken. For the mayday_service_down alert, we set the attribute for: 1m, which gives an operator 1 minute to fix it. Go grab a coffee, or you will have to stare at your screen for 1 minute before the alert changes from "Pending" to "Firing". At this point, you should see a new email in your inbox!

Eventually, you may start getting overwhelmed by the number of emails you receive. As a next step, I would suggest adding group_interval and repeat_interval to the alertmanager.config.route spec to control the rate and number of notifications.

# How long to wait before sending a notification about new alerts that
# are added to a group of alerts for which an initial notification has
# already been sent. (Usually ~5m or more.)
[ group_interval: <duration> | default = 5m ]

# How long to wait before sending a notification again if it has already
# been sent successfully for an alert. (Usually ~3h or more).
[ repeat_interval: <duration> | default = 4h ]
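In the Verrazzano CR, those attributes sit alongside the existing route settings; a minimal sketch (the values are just a starting point, not a recommendation) could look like this:

route:
  group_by:
    - alertname
  receiver: oci_smtp
  group_interval: 5m     # batch new alerts joining an existing group
  repeat_interval: 4h    # re-send an unresolved alert at most every 4 hours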

To sum up, Verrazzano gives you the option to notify your teams when something unexpected has happened. You do not have to go through a lengthy process to install and configure additional notification software. Just use the AlertManager that is bundled with your Verrazzano cluster; it takes only a few steps for teams to start getting alerts.

Furthermore, if you need to customize rules and receivers further, remember that you can rely on the experience published by the open source communities around Prometheus, AlertManager, and related projects. Verrazzano honors those configurations because it installs open source products. It is as simple as applying the configuration in the Verrazzano Custom Resource; the Verrazzano Operator takes care of the rest. Nothing else is needed, and your team will be able to take remedial action faster.

Have a good one!

Troubleshooting

Check the prometheus-operator-kube-p-operator pod logs in the verrazzano-monitoring namespace to find out what may have been missed when configuring the service.
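For example (the exact pod name will differ in your cluster):

# Find the Prometheus Operator pod
kubectl get pods -n verrazzano-monitoring | grep prometheus-operator

# Inspect its logs for configuration or reload errors
kubectl logs -n verrazzano-monitoring <prometheus-operator-pod-name>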

Enhance Security

The most recent Prometheus Operator version added support for smtp_auth_password_file. With this attribute, AlertManager can read the SMTP password from a Kubernetes Secret.

apiVersion: v1
kind: Secret
metadata:
  name: alertmanager-config
  namespace: verrazzano-monitoring
type: Opaque
data:
  smtp_password: <base64 Encoded Password>
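Alternatively, you can create the same Secret from the command line and let kubectl handle the base64 encoding:

kubectl create secret generic alertmanager-config \
  -n verrazzano-monitoring \
  --from-literal=smtp_password='<YOUR PASS>'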

Then configure AlertManager to drop the plain-text smtp_auth_password attribute and set smtp_auth_password_file instead, mounting the Secret through alertmanagerSpec.secrets.

prometheusOperator:
  overrides:
    - values:
        alertmanager:
          enabled: true
          alertmanagerSpec:
            podMetadata:
              annotations:
                sidecar.istio.io/inject: "false"
            secrets:
              - alertmanager-config
          config:
            global:
              resolve_timeout: 15m
              smtp_auth_password_file: /etc/alertmanager/secrets/alertmanager-config/smtp_password
            ....
