How to Install Prometheus Alertmanager on Ubuntu 22.04

Ivan Cheng
Feb 21, 2023

We plan to define alerting rules in Loki and deliver the resulting alerts through Alertmanager, so this post walks through installing Alertmanager on a local machine.

Installing Alertmanager

Create a new directory to hold the Alertmanager binaries and configuration files

sudo mkdir /opt/alertmanager
cd /opt/alertmanager

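Download the release tarball from the Prometheus Alertmanager repository on GitHub (0.25.0 was the latest release at the time of writing)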

export VERSION=0.25.0
sudo curl -LO https://github.com/prometheus/alertmanager/releases/download/v$VERSION/alertmanager-$VERSION.linux-amd64.tar.gz
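
Before unpacking, it may be worth checking the archive against the published checksums. The sketch below assumes the release page also offers a sha256sums.txt file next to the tarball (worth confirming on the release page itself); verify_release is a hypothetical helper name, not something shipped with Alertmanager.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify downloaded files against a checksum list.
# Assumption: sha256sums.txt was downloaded from the same release page.
verify_release() {
  local dir="$1"                      # directory holding the tarball and the checksum list
  local sums="${2:-sha256sums.txt}"   # checksum list file name inside $dir
  # --ignore-missing skips checksum entries for files we did not download;
  # sha256sum still fails if no listed file could be verified at all.
  (cd "$dir" && sha256sum --check --ignore-missing "$sums")
}

# Example usage (needs network access):
#   cd /opt/alertmanager
#   sudo curl -LO "https://github.com/prometheus/alertmanager/releases/download/v$VERSION/sha256sums.txt"
#   verify_release /opt/alertmanager && echo "checksum OK"
```

If the check fails, the safest move is probably to delete the tarball and download it again rather than extract it.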

Extract the archive and make sure the binary is executable

sudo tar -zxvf alertmanager-$VERSION.linux-amd64.tar.gz
sudo chmod a+x /opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager

Create a symbolic link in /usr/local/bin so that the alertmanager command is available on the PATH

sudo ln -s /opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager /usr/local/bin/alertmanager

The alertmanager command should now be available; verify that it works.

alertmanager --version
alertmanager, version 0.25.0 (branch: HEAD, revision: 258fab7cdd551f2cf251ed0348f0ad7289aee789)
build user: root@abe866dd5717
build date: 20221222-14:51:36
go version: go1.19.4
platform: linux/amd64

The alertmanager.yml configuration file bundled in the extracted directory looks like this

route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'web.hook'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'

# Inhibition rules allow to mute a set of alerts given that another alert is firing.
# We use this to mute any warning-level notifications if the same alert is already critical.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    # Apply inhibition if the alertname is the same.
    # CAUTION:
    # If all label names listed in `equal` are missing
    # from both the source and target alerts,
    # the inhibition rule will apply!
    equal: ['alertname', 'dev', 'instance']

We can now run alertmanager with the following command

sudo alertmanager --config.file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
ts=2023-02-21T06:30:26.440Z caller=main.go:240 level=info msg="Starting Alertmanager" version="(version=0.25.0, branch=HEAD, revision=258fab7cdd551f2cf251ed0348f0ad7289aee789)"
ts=2023-02-21T06:30:26.440Z caller=main.go:241 level=info build_context="(go=go1.19.4, user=root@abe866dd5717, date=20221222-14:51:36)"
ts=2023-02-21T06:30:26.444Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=192.168.0.18 port=9094
ts=2023-02-21T06:30:26.447Z caller=cluster.go:681 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
ts=2023-02-21T06:30:26.534Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
ts=2023-02-21T06:30:26.535Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
ts=2023-02-21T06:30:26.551Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9093
ts=2023-02-21T06:30:26.551Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9093
...

At this point the status page is reachable at http://your_alertmanager_ip:9093/#/status
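
For a check that does not need a browser, Alertmanager also exposes a health endpoint at /-/healthy. The sketch below polls it with curl; wait_for_alertmanager is a hypothetical helper, and the default URL, attempt count, and delay are assumptions to adjust for your setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll the health endpoint until it answers or we give up.
wait_for_alertmanager() {
  local base_url="${1:-http://localhost:9093}"  # assumed default address
  local attempts="${2:-10}"                     # how many times to try
  local delay="${3:-1}"                         # seconds between tries
  local i
  for ((i = 1; i <= attempts; i++)); do
    # --fail makes curl return non-zero on HTTP errors such as 503
    if curl --silent --fail --max-time 2 "$base_url/-/healthy" >/dev/null; then
      echo "Alertmanager is healthy"
      return 0
    fi
    sleep "$delay"
  done
  echo "Alertmanager did not respond at $base_url" >&2
  return 1
}

# Example usage:
#   wait_for_alertmanager http://localhost:9093 10 1
```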

Registering the Service

Starting alertmanager by hand every time is tedious, so we can configure systemd to run it as a service.

Create a systemd unit file

sudo vi /etc/systemd/system/alertmanager.service

Add the following content

[Unit]
Description=Prometheus Alertmanager
After=network.target

[Service]
ExecStart=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager --config.file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
Restart=always

[Install]
WantedBy=multi-user.target
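
One thing to be aware of: by default Alertmanager keeps its silences and notification state in a data/ directory relative to its working directory, and the unit above does not pin that location. A possible variant, sketched below, sets it explicitly with the --storage.path flag. The /opt/alertmanager/data path and the render_unit helper are assumptions for illustration, not part of the original setup.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a unit file that pins the data directory.
# Assumption: /opt/alertmanager/data is an acceptable storage location.
render_unit() {
  local home="${1:-/opt/alertmanager/alertmanager-0.25.0.linux-amd64}"
  local data="${2:-/opt/alertmanager/data}"
  cat <<EOF
[Unit]
Description=Prometheus Alertmanager
After=network.target

[Service]
ExecStart=$home/alertmanager --config.file=$home/alertmanager.yml --storage.path=$data
Restart=always

[Install]
WantedBy=multi-user.target
EOF
}

# Example usage:
#   render_unit | sudo tee /etc/systemd/system/alertmanager.service >/dev/null
```

Keeping the data directory under /opt/alertmanager also means the removal steps later in this post would clean it up along with everything else.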

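Reload systemd so that it picks up the new unit file, then start the alertmanager service

sudo systemctl daemon-reload
sudo service alertmanager start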

Check that the alertmanager service is running

sudo service alertmanager status
● alertmanager.service - Prometheus Alertmanager
     Loaded: loaded (/etc/systemd/system/alertmanager.service; disabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-02-21 14:32:44 CST; 5s ago
   Main PID: 1190775 (alertmanager)
      Tasks: 8 (limit: 4612)
     Memory: 13.0M
     CGroup: /system.slice/alertmanager.service
             └─1190775 /opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager --config.file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml

Feb 21 14:32:44 loki systemd[1]: Started Prometheus Alertmanager.
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.363Z caller=main.go:240 level=info msg="Starting Alertmanager" version="(version=0.25.0, branch=HEAD, revision=258fab7cdd551f2cf251ed0348f0ad7289aee789)"
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.364Z caller=main.go:241 level=info build_context="(go=go1.19.4, user=root@abe866dd5717, date=20221222-14:51:36)"
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.365Z caller=cluster.go:185 level=info component=cluster msg="setting advertise address explicitly" addr=192.168.0.18 port=9094
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.370Z caller=cluster.go:681 level=info component=cluster msg="Waiting for gossip to settle..." interval=2s
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.433Z caller=coordinator.go:113 level=info component=configuration msg="Loading configuration file" file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.434Z caller=coordinator.go:126 level=info component=configuration msg="Completed loading of configuration file" file=/opt/alertmanager/alertmanager-0.25.0.linux-amd64/alertmanager.yml
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.440Z caller=tls_config.go:232 level=info msg="Listening on" address=[::]:9093
Feb 21 14:32:44 loki alertmanager[1190775]: ts=2023-02-21T06:32:44.440Z caller=tls_config.go:235 level=info msg="TLS is disabled." http2=false address=[::]:9093
...

Enable the alertmanager service so that it starts at boot

sudo systemctl enable alertmanager

Removing the Service

If you decide to remove alertmanager completely, stop the service and delete the systemd unit file.

sudo service alertmanager stop
sudo systemctl disable alertmanager
sudo rm -rf /etc/systemd/system/alertmanager.service
sudo systemctl daemon-reload
sudo systemctl reset-failed

Then delete the alertmanager directory and remove the symbolic link

sudo rm -rf /opt/alertmanager
sudo rm -rf /usr/local/bin/alertmanager

Testing Email Delivery

Update the alertmanager.yml configuration file as follows

global:
  smtp_smarthost: 'your_smtp_ip:your_port'
  smtp_from: 'your_from_mail_address'
  smtp_require_tls: false
route:
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h
  receiver: 'team-infra-mails'
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
  - name: 'team-infra-mails'
    email_configs:
      - to: 'your_infrateam_mail_address'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
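
Since a YAML mistake would keep the service from starting with the new settings, it can help to validate the file first. The amtool binary that ships in the same tarball has a check-config subcommand for this; the wrapper below is a minimal sketch, and check_alertmanager_config is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate a config file with amtool before restarting.
check_alertmanager_config() {
  local amtool_bin="$1"   # path to the amtool binary shipped in the tarball
  local config="$2"       # path to alertmanager.yml
  if [ ! -x "$amtool_bin" ]; then
    echo "amtool not found at $amtool_bin" >&2
    return 127
  fi
  # check-config parses the file and exits non-zero if it is invalid
  "$amtool_bin" check-config "$config"
}

# Example usage:
#   dir=/opt/alertmanager/alertmanager-0.25.0.linux-amd64
#   check_alertmanager_config "$dir/amtool" "$dir/alertmanager.yml"
```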

Restart the alertmanager service

sudo service alertmanager restart

We can use amtool to create a temporary alert and test the configuration

cd /opt/alertmanager/alertmanager-0.25.0.linux-amd64

./amtool --alertmanager.url=http://localhost:9093/ alert add alertname="mssql-login-failed" severity="critical" class_type="LOGIN" computer="localhost" eventID="33205" server_principal_name="ivan_cheng" source="MSSQLSERVER" statement="Login failed for user 'ivan_cheng'. Reason: Password did not match that for the login provided. [CLIENT: 192.168.0.92]"

Browse to http://your_alertmanager_ip:9093/#/alerts to confirm that the temporary alert reached alertmanager.

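If the SMTP server is configured correctly, the email should arrive shortly. With group_wait set to 30s, expect the notification roughly 30 seconds after the alert is added.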

If no email arrives, the following command searches the system log for what went wrong

tail -n 1000 /var/log/syslog | grep -E 'level=error|level=warn'
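
Because alertmanager now runs as a systemd service, the same lines are also available from the journal. The sketch below applies the same filter to journalctl output; filter_alertmanager_problems is a hypothetical helper, and the one-hour window in the usage line is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only warning and error lines from Alertmanager logs.
filter_alertmanager_problems() {
  grep -E 'level=(error|warn)'
}

# Example usage (the unit name "alertmanager" matches the service created above):
#   sudo journalctl -u alertmanager --no-pager --since "1 hour ago" | filter_alertmanager_problems
```

Reading from the journal scopes the search to this one unit, so unrelated syslog entries do not get in the way.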

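TLS is required for email by default (smtp_require_tls in the global section, require_tls per email receiver). If your SMTP server does not support STARTTLS, set it to false as in the configuration above; otherwise notifications fail with a warning like this one: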

Feb 21 08:36:26 loki alertmanager[1188131]: ts=2023-02-21T00:36:26.206Z 
caller=notify.go:732 level=warn component=dispatcher receiver=team-infra-mails integration=email[0]
msg="Notify attempt failed, will retry later"
attempts=1
err="'require_tls' is true (default) but \"your_smtp_ip:your_port\" does not advertise the STARTTLS extension"

The next post will cover Grafana Loki alerting and recording rules, and how to deliver their notifications by email and Line Notify through the Alertmanager we just installed.
