How To — Dead Man Snitch & PagerDuty Alerts for Shipchain Validators

Shipmate.FR.nl
Shipchain (un) Official Community
7 min read · Aug 30, 2020

Setting up a Dead Man Snitch is critical for any system administrator for one simple reason: all other alert systems (like the Grafana ones covered in our previous article) assume that the systems they run on are ACTIVE and thus CAPABLE of sending alerts. But what if the server is completely down, or the services in charge of sending the alerts have failed? Then nothing will happen. A Dead Man Snitch addresses these situations by following the opposite concept: the server periodically sends a sign of life, confirming that everything is alright, and you are alerted when it stops doing so.

In this tutorial, we will combine two techniques covered in our two previous articles (links below): how to set up cronjobs and how to relay alerts via a PagerDuty service.

Disclaimer: contrary to previous articles, this one uses features that are *not* free (notifications after a minimum of 15 min of downtime and integration with PagerDuty). If receiving an alert by email only after up to 1h59 of downtime is sufficient for you, then you can stick to the free membership at Dead Man Snitch.

Now let’s get started…

We will first sign up for Dead Man Snitch, then set up a cronjob on your server that will ping your DMS under certain conditions. When that cronjob stops sending these pings (for whatever reason), Dead Man Snitch will alert you in the way you prefer.

STEP 1: Sign up for Dead Man Snitch

  • No rocket science here: follow the link below and sign up for Dead Man Snitch.
  • Please be aware again that this tutorial is based on the paid membership called Private Eye. Its main advantages are faster warnings (15 min intervals), push notifications on your phone, and the ability to build multiple snitches. The latter means the cost can be shared between multiple admins for up to 100 snitches, provided you trust your team members enough, as they will be able to access all email addresses and integration keys set up on the account.
  • We suggest trying the free membership first. It will still alert you within 1–2h at any email address you set up in the DMS (useful if you already pay attention to push notifications from your email app).
Do not underestimate the importance of the intervals. Check the explanation below.

STEP 2: Create a New Snitch and pick up its URL

  • Setting up a new snitch is elementary. The minimum setup is a name, an interval and an email address to receive the alert.
  • Once done, go to Setup and note down your Unique Snitch URL; you will need it in the next steps.
  • /!\ The 15 min interval is a bit misleading, since it is not the delay after which you will be warned of an incident. With this interval, the maximum downtime before you receive an alert can be up to 29 min. Indeed, the DMS checks its records and sends alerts at fixed times (18:00, 18:15, 18:30…), and only if the node was down for the entire period (meaning: down at 18:00 AND at 18:15 = ALERT!!). So, in the most extreme example, should your node stop pinging at 18:01, you will not receive an alert at 18:15 but at 18:30. Pick the free membership (60 min interval) only if you are ready to be notified after up to 1h59 of downtime.
This is what it looks like when everything’s fine. Our DMS receives a sign of life from the server every minute.
Me poking at the DMS at 7:15 after I shut down my node at 7:02, wondering why it didn’t send an alert… =)
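
To make the interval arithmetic concrete, here is a small sketch that estimates when the alert would fire. It assumes the fixed-boundary behaviour described above (checks at multiples of the interval, alert only after a full period without any ping). The function name alert_time is ours and purely illustrative; this is not something provided by Dead Man Snitch.

```shell
#!/bin/sh
# Hypothetical helper: estimate the alert time for a node that stops pinging
# at HH:MM, assuming checks happen at fixed multiples of the interval.
# Usage: alert_time HH:MM INTERVAL_MINUTES
alert_time() (
  stop=$1 interval=$2
  h=${stop%%:*}; m=${stop##*:}
  # Strip one leading zero so "08" is not read as an invalid octal number.
  h=${h#0}; m=${m#0}
  stop_min=$(( ${h:-0} * 60 + ${m:-0} ))
  # First check at or after the first missed ping...
  first=$(( (stop_min + interval - 1) / interval * interval ))
  # ...then one more full interval without any sign of life.
  alert=$(( first + interval ))
  printf '%02d:%02d\n' $(( alert / 60 % 24 )) $(( alert % 60 ))
)

alert_time 18:01 15   # prints 18:30 (29 min of downtime)
alert_time 18:01 60   # prints 20:00 (1h59 of downtime)
alert_time 07:02 15   # prints 07:30
```

The last call matches the situation in the caption above: a node shut down at 7:02 does not trigger anything at 7:15 under this model.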

STEP 3: Set up the Cronjob

  • The point of this step is to set up an automatic task, running every minute on your server, which will send a sign of life to your Dead Man Snitch. As long as it does, everything is fine. When it stops, the alert mechanism described above is activated.
  • You are free to decide on the condition under which the cronjob sends this sign of life. In this particular example, we check whether the Shipchain node is currently caught up. If it isn’t, then it is not validating blocks either, so you are at risk of being slashed and you would like to be warned. Of course, if the command returns nothing (e.g. the service of the node is down) or if the cronjob doesn’t even run (e.g. the server itself is down), then you would also like to be alerted.
sudo apt-get install jq
crontab -e
  • Do note that the first row of the crontab needs to look like the line below (you can add as many paths as you like, but make sure they are separated by ‘:’)
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/snap/bin
  • Please note that the content of this row will differ for every user. Fortunately, you can check which PATH is correct for your own installation with the command below.
type hydra
  • This command should return something like “hydra is hashed (/usr/local/bin/hydra)”. In that case, you need to have at least PATH=/usr/local/bin in the first row of the crontab.
  • Then the next row of the crontab should look like this (replace the URL with the Unique Snitch URL you noted down earlier)
*/1 * * * * hydra -o json client status | jq .is_caught_up | grep true && curl https://nosnch.in/[yourDMSlink]
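
If you prefer to keep the logic out of the crontab, the same check can be sketched as a small wrapper script. This is only an illustration under assumptions: the function names and the layout are ours, and the hydra and jq calls are simply the ones from the one-liner above. The curl options (-f, -sS, --max-time) keep cron quiet and stop a hanging request from piling up.

```shell
#!/bin/sh
# Sketch of a wrapper for the cron entry; names and layout are hypothetical.

# Succeeds only when the text printed by `jq .is_caught_up` is exactly "true".
# An empty string (node service down, hydra not found) counts as unhealthy.
node_is_caught_up() {
  [ "$1" = "true" ]
}

# Pings the snitch URL passed as $1, but only when the node is caught up.
ping_snitch_if_healthy() {
  status=$(hydra -o json client status 2>/dev/null | jq -r '.is_caught_up' 2>/dev/null)
  if node_is_caught_up "$status"; then
    # -f: fail on HTTP errors, -sS: quiet but still report errors,
    # --max-time: give up after 10 seconds instead of hanging.
    curl -fsS --max-time 10 "$1" > /dev/null
  fi
}

# Do nothing unless a snitch URL is given as the first argument.
if [ -n "${1:-}" ]; then
  ping_snitch_if_healthy "$1"
fi
```

Saved under a name of your choice and made executable, it would replace the one-liner in the crontab, with your Unique Snitch URL as its only argument. The PATH row is still required so that cron can find hydra, jq and curl.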

STEP 4: Relay Dead Man Snitch alerts into PagerDuty (optional)

  • With the Private Eye membership, the Dead Man Snitch Android app will send you a push notification, which may be sufficient for you. The rest of this tutorial is therefore optional.
  • Our main reasons to go further and leverage the integration with PagerDuty are to receive recurring alerts (every x minutes) until the issue is resolved, whereas the DMS app will send you only one, and to set up specific ringtones (what is the point of an emergency alert if you could miss it…).
  • In PagerDuty, under Integration Type, choose Dead Man’s Snitch. All other options can be kept as is.
  • For the next step, go back to deadmansnitch.com and, in Integrations, click +ADD next to PagerDuty.
  • It will redirect you to pagerduty.com and ask for your credentials and your approval to link Dead Man Snitch with your account. Go ahead.

STEP 5: Wake up in the middle of the night! (again)

  • Thanks to these alerts, you will know your server is not behaving as planned, giving you the opportunity to log in and check.
  • See below for examples of what it will look like on your phone.

This third article marks the end of a first series meant to ensure that any (every?) Shipchain validator node can be alerted in time to perform corrective actions before being slashed. The network’s security is at stake; this is a big deal.

Should you have any questions, observations or suggestions of another security measure, feel free to reach out to us on one of the Telegram channels below.

Thanks for reading!

shipmateFRnl

Note: many thanks to Leeward Bound, former CTO of Shipchain Inc. for his precious help in setting up the clever Crontab command described above.
