How To — Grafana Alerts on PagerDuty & Telegram for Shipchain Validators

Shipmate.FR.nl
Shipchain (un) Official Community
6 min read · Aug 25, 2020

Assuming you have followed the tutorial below and set up your own Grafana dashboard, this article covers how to set up notification channels on Telegram and PagerDuty that you can send your Grafana alerts to.

Some parameters are crucial for a Validator, such as the percentage of blocks the node is voting on (too low a percentage would lead to slashing for the Validator and its Delegators), or those related to system health (uptime, disk space, CPU/RAM usage…).

Considering the responsiveness expected from Validator node admins (a few hours), receiving these alerts by email may not be enough to get their attention in time, hence the need for more advanced notification channels.

Now let’s get started…

We will first set up the notification channels in Grafana (including how to sign up and set up PagerDuty, and how to create a Telegram bot), and then assign them to the alerts of your choice.

STEP 1: Set up a notification channel on PagerDuty

  • Download the PagerDuty app on your phone and sign up from there.
  • Sign in on your computer with that account.
  • Stick to the Free plan; it should be enough for what you need.
  • Click on your profile name in the top right-hand corner and click on “My Profile”.
  • Set up your emails and test the mobile notifications.
  • Go to Configuration > Escalation Policies and make sure to include reminders in your notifications, as per the screenshot below.
  • Don’t forget to repeat the policy X times; otherwise the notification will appear only once on your phone, with no reminder.
  • Go back to the Home Page and, in the Services section, click on New Service.
  • Give the service a name. For Grafana, choose “Use our API directly”.
  • Once created, you will find your Integration Key in the settings of your new Service.
  • Log in to Grafana.
  • Click on the bell icon in the left-hand menu.
  • Click on “Add notification channel”.
  • Select PagerDuty and the options you prefer.
  • Choose a name for the channel and paste your Integration Key in the field.
  • Click on Send Test. If everything went well, you should receive notifications on your phone app and at your email address.
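
If you want to check the Integration Key outside of Grafana, here is a minimal Python sketch. It assumes your new Service uses an Events API v2 integration, in which case the Integration Key is sent as the `routing_key` of the event. The function names, the summary text and the source name are only illustrative, and nothing is sent until you call `send_event` yourself.

```python
import json
import urllib.request

# PagerDuty Events API v2 endpoint (assumption: the Service uses a v2 integration)
EVENTS_URL = "https://events.pagerduty.com/v2/enqueue"


def build_trigger_event(integration_key, summary, source, severity="critical"):
    """Build the JSON body of a 'trigger' event (hypothetical helper)."""
    return {
        "routing_key": integration_key,  # the Integration Key from your Service
        "event_action": "trigger",
        "payload": {
            "summary": summary,    # text shown in the notification
            "source": source,      # e.g. the hostname of your node
            "severity": severity,  # critical, error, warning or info
        },
    }


def send_event(event):
    """POST the event to PagerDuty and return the decoded JSON reply."""
    request = urllib.request.Request(
        EVENTS_URL,
        data=json.dumps(event).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Placeholder key: replace it with your own before calling send_event(event)
event = build_trigger_event("YOUR_INTEGRATION_KEY", "Test alert from my node", "my-validator")
print(json.dumps(event, indent=2))
```

Calling `send_event(event)` with a real key should raise an incident on your Service, in the same way as the Send Test button in Grafana.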

STEP 2: Create a Telegram bot and a Telegram channel

Open Telegram, search for the @BotFather user and send it the following messages:

You:
/newbot
BotFather:
Alright, a new bot. How are we going to call it? Please choose a name for your bot.
You:
TypeYourBotName
BotFather:
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.
You:
TypeYourBotNameBot
BotFather:
Done! Congratulations on your new bot. You will find it at t.me/shelley_monitor_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
Use this token to access the HTTP API:
99...:AA......TvF8
Keep your token secure and store it safely, it can be used by anyone to control your bot.
For a description of the Bot API, see this page: https://core.telegram.org/bots/api

Create a channel in Telegram and name it whatever you like, e.g. “My Shipchain Node Alert” or similar.

  • Invite the bot you just created to that channel and promote it to admin.
  • Type at least one message in the channel. /!\ This is very important: without it, the next step returns an empty result.
  • Get the chat ID by opening the address below in your browser, inserting the API token provided by @BotFather a few steps above:
https://api.telegram.org/bot<YOUR API TOKEN FROM ABOVE>/getUpdates

The website will return the following code. What you are interested in is what is written after "id":

{"ok":true,"result":[{"update_id":1112223334445,
"channel_post":{"message_id":1,"chat":{"id":-<YOUR_CHAT_ID>,"title":"ShelleyMonitoring","type":"channel"},"date":1576534122,"text":"/bot","entities":[{"offset":0,"length":4,"type":"bot_command"}]}}]}
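
If you would rather not read the raw JSON by eye, the small Python sketch below pulls the chat ID out of a getUpdates reply. The function name is hypothetical, and the sample string only mimics the shape of the reply shown above, with a made-up chat ID.

```python
import json


def extract_chat_ids(get_updates_body):
    """Return {chat id: chat title} for every chat found in a getUpdates reply."""
    data = json.loads(get_updates_body)
    if not data.get("ok"):
        raise ValueError("Telegram answered ok=false, check your bot token")
    chats = {}
    for update in data.get("result", []):
        # Channel posts come under "channel_post", other chats under "message"
        post = update.get("channel_post") or update.get("message")
        if post:
            chat = post["chat"]
            chats[chat["id"]] = chat.get("title", "")
    return chats


# Sample reply with a placeholder chat ID (not a real one)
sample = (
    '{"ok":true,"result":[{"update_id":1112223334445,'
    '"channel_post":{"message_id":1,"chat":{"id":-1001234567890,'
    '"title":"ShelleyMonitoring","type":"channel"},'
    '"date":1576534122,"text":"/bot"}}]}'
)
print(extract_chat_ids(sample))
```

Paste the text returned by your browser in place of `sample` to get your own chat ID, minus sign included.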

STEP 3: Set up a notification channel on Telegram

  • Log in to Grafana.
  • Click on the bell icon in the left-hand menu.
  • Click on “Add notification channel”.
  • Select Telegram.
  • Enable or disable the settings as you prefer.
  • Fill in the Bot API Token (provided by @BotFather).
  • Add the chat ID below it (it starts with -).
  • Click on Test notification. If everything went well, you will see a new post from the bot on your Telegram channel.
  • Save it.
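
To check the token and chat ID independently of Grafana, you can post a message yourself through the Bot API `sendMessage` method. This is only a sketch: the helper names and the message text are illustrative, the token and chat ID are placeholders, and no network call is made until you call `send_test_message`.

```python
import json
import urllib.parse
import urllib.request


def build_send_message_request(bot_token, chat_id, text):
    """Prepare (without sending) a POST request to the sendMessage method."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_test_message(bot_token, chat_id, text="Test alert from my node"):
    """Send the message and return Telegram's decoded JSON reply."""
    request = build_send_message_request(bot_token, chat_id, text)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Placeholders: use the token from @BotFather and the chat ID found in STEP 2
request = build_send_message_request("YOUR_BOT_TOKEN", "-1001234567890", "Hello from my node")
print(request.full_url)
```

If `send_test_message` returns a reply with `"ok": true`, the same token and chat ID should work in the Grafana notification channel.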

STEP 4: Create your Grafana Alerts

  • Please be aware that in Grafana you can only set alerts on Charts/Graphs.
  • You will find the screen below after a right click on the chart > Edit.
  • Go to the Alert section, create a new Alert and set it up as per the screenshot below.
  • The alert below calculates every minute the average CPU load from Query A and raises an alert if the load is above 80%. It also raises an alert if the variable is not responding at all (important for Shipchain-related variables).
  • You can choose as many Send To notification channels as you have set up before.
It is good practice to try “Test rule” and check that the variable is different from (null).
Here are all the alerts I have set up on my node. A vote percentage < 80% is worth being notified of, but the really critical one is < 20%, when slashing could occur.
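
To make the logic of such a rule explicit, here is a small Python illustration of what the alert described above decides at each evaluation: take the average of the recent samples, compare it with the threshold, and treat missing data as an alert too. This is not Grafana's own code, only a sketch; the function name and the sample values are made up.

```python
from statistics import mean


def evaluate_alert(samples, threshold=80.0):
    """Return "alerting" or "ok" for one evaluation of the rule.

    samples: recent values of the query, with None for missing points.
    """
    values = [value for value in samples if value is not None]
    if not values:
        # The variable is not responding at all: raise an alert as well
        return "alerting"
    return "alerting" if mean(values) > threshold else "ok"


print(evaluate_alert([70, 90, 95]))   # average is 85, above the 80% threshold
print(evaluate_alert([50, 60, None])) # average of the available points is 55
print(evaluate_alert([None, None]))   # no data at all
```

The same logic applies to the vote percentage alerts, except that the comparison is “below” instead of “above”.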

STEP 5: Wake up in the middle of the night!

  • You can see below some examples of what push notifications from PagerDuty and via email look like.
  • Here is what the bot looks like. A potentially useful application of that channel and bot is to bring your Delegators in, so they can be notified in real time of any alert on your node and react accordingly before being slashed (which would happen if the Vote percentage < 20% alert is not addressed within 5–6 hours). Temporarily delegating to another node just before that deadline would remove any exposure for the Delegator, but would of course strongly affect the Validator, who loses the delegations (and is thus pushed to take faster corrective action next time).

You are now ready for the next article: How-to set up a Dead Man Snitch on your node. Why? Because the biggest vulnerability of the method above is that it won’t trigger anything if the Grafana service has crashed or if the node itself is shut down. One needs a different approach to cover all potential situations…

Stay tuned!

shipmateFRnl

Note: The Telegram Bot tutorial was heavily inspired by this publication:

https://gist.github.com/ilap/cb6d512694c3e4f2427f85e4caec8ad7
