Security best practices for a cosmos validator

2pilot

Published in

Coinmonks

5 min read · Oct 9, 2023

General security memo

Do not manage your node as root. Use a separate user for that.
Use ssh keys instead of passwords and forbid root login over ssh.
Log and ban unauthorized login attempts. This can be done with a tool like fail2ban.
Keep your system and third-party libraries up to date with the latest security patches.

Monitoring

It is important to set up proper monitoring right from the start to avoid missing blocks and getting jailed. There are many ways to configure a monitoring system that fits your needs, so below we describe what we currently use on a daily basis.

Prometheus is a system monitoring tool that collects metrics from your hosts and lets you build graphs and alarms based on them. Your cosmos node has built-in support for useful prometheus metrics such as consensus height, validator missed blocks, mempool size and many more that can be found here. Node exporter is an excellent tool for extracting a large number of host-level metrics such as cpu load, memory pressure, disk iops, network traffic etc.
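
As a rough illustration of the prometheus part, the metrics endpoint is controlled from the [instrumentation] section of config.toml. The fragment below is a sketch that assumes a tendermint / cometbft based node. The endpoint is normally disabled in the default config, and 26660 is only the usual default port, so check both values against your own chain.

```toml
# enable prometheus metrics
# ~/.project/config/config.toml
[instrumentation]
prometheus = true
prometheus_listen_addr = ":26660"
```

Keep this port closed to the public as well and only allow your prometheus server to reach it.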

Grafana allows you to build dashboards and alarms from many different data sources, and prometheus is among the supported ones. Are you experiencing cpu spikes? Is your disk about to run out of space? Or maybe you need to understand how long your server was unreachable from the internet? All of that is very easy to spot or prevent when you have a proper dashboard configured.

PagerDuty is an incident response tool for all of your digital infrastructure. While you can forward your alarms to email / telegram / discord, it is much better to have a dedicated tool that collects and groups your alarms by severity, escalates to another person if the issue wasn’t resolved in time, and provides automatic on-call rotation between your team members. You can also define different kinds of alarms based on severity, so that an urgent alarm can bypass “Do not disturb” mode and wake you up.

Those are the basic tools that we use but here are some other services worth mentioning:

Tenderduty: monitoring for tendermint chains
Panic: monitoring and alerting for blockchains
Cosmos-exporter: useful if you need more metrics than are available from the default tendermint exporter

Ports

Use the principle of least privilege when deciding which ports to open on your validator node. Ideally you only need the p2p port ( 26656 by default ) open. The rest can be closed in your node config files:


# close rpc port to the outside ( listen on localhost only )
# ~/.project/config/config.toml
[rpc]
laddr = "tcp://127.0.0.1:26657"
cors_allowed_origins = []


# disable grpc and grpc-web ports
# ~/.project/config/app.toml
[grpc]
enable = false
address = "0.0.0.0:9090"
[grpc-web]
enable = false
address = "0.0.0.0:9091"


# disable json-rpc port ( only for evm compatible chains like zetachain or haqq )
# ~/.project/config/app.toml
[json-rpc]
enable = false
address = "0.0.0.0:10545"
ws-address = "0.0.0.0:8546"


# disable api ( lcd = api = rest )
# ~/.project/config/app.toml
[api]
enable = false
swagger = false
address = "tcp://0.0.0.0:1317"

Alternatively, you can block them with a firewall. When using a firewall, keep in mind that you also need to whitelist your ssh port ( 22 or any custom port you use ).

Check out this example of how easy it is to find validators with an open rpc port. Opening the RPC/GRPC port on a node that isn't optimized for heavy query workloads can take it offline after just a few demanding requests, which makes an attacker's life much easier.

DDoS

Even if you have properly closed all ports that are not mandatory for validation, it is still possible to spam your host through the p2p port and make it inoperative for the duration of the attack. To avoid this, you can set up your validator node to communicate only with a set of trusted sentry nodes via a direct link and make it inaccessible to the outside world.

Source — https://forum.cosmos.network/t/sentry-node-architecture-overview/454

This way it is impossible to spam your validator directly, only its sentry nodes. And it should be very easy to scale or replace them in case of such an attack.
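
A minimal sketch of what the validator side of this setup can look like in config.toml, assuming the standard tendermint / cometbft p2p options. The node ids and ip addresses are placeholders, not real values.

```toml
# validator node: only talk to your own sentry nodes
# ~/.project/config/config.toml
[p2p]
# do not discover or gossip peers, rely on the list below only
pex = false
# placeholders: replace with the node ids and private ips of your sentries
persistent_peers = "<sentry-1-node-id>@<sentry-1-private-ip>:26656,<sentry-2-node-id>@<sentry-2-private-ip>:26656"
# needed when peers are reached over private ( non-routable ) addresses
addr_book_strict = false
```

On the sentry side, the validator node id would normally be listed in private_peer_ids so that the sentries never gossip the validator address to other peers.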

Keeping all unnecessary ports closed and your validator ip hidden is very important to avoid slashing penalties and to reduce risk for the network, since having multiple nodes down at the same time is bad for network stability.

Key protection

TMKMS is the tendermint key management system. It is a separate process that extracts the signing logic from your validator node and can run on a different host. It also makes it easy to plug in various signing mechanisms, such as hardware-backed signers (for example YubiHSM 2 or Ledger devices) or a software signer.

All of those options help protect your private key if your validator host is compromised.

Horcrux is a multi-party-computation (MPC) signing service for tendermint nodes. It allows you to split your key into parts and store each part on a separate host. You can configure how many key parts are required to produce a signature with your private key, for example 2 parts out of 3 total. This means that in order to compromise your private key, an attacker needs to gain access to 2 of your hosts.
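
Both tools work as remote signers: the validator node no longer reads the key from disk and instead accepts signing connections on a dedicated address. As a hedged sketch, on the node side this usually comes down to one config.toml option. The ip and port below are placeholders, and the signer-side configuration for tmkms or horcrux is not shown here.

```toml
# accept connections from a remote signer ( tmkms / horcrux )
# ~/.project/config/config.toml
# placeholder address: use an interface and port reachable only by your signer hosts
priv_validator_laddr = "tcp://<validator-private-ip>:26659"
```

This port should only be reachable by your signer hosts, never from the public internet.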

More info about how to set up horcrux and tmkms can be found in our previous articles:

TMKMS with quark-1 (neutron) testnet (medium.com)
horcrux/docs/signing.md at main · strangelove-ventures/horcrux (github.com)

Double sign prevention

double_sign_check_height: when set in config.toml to a non-zero value like 5 / 10 / 15, your validator node will panic after a restart if it participated in consensus in the last 5 / 10 / 15 blocks. This helps avoid double signing in many cases, for example during a migration when the old process wasn’t killed properly. While it doesn’t give you 100% double sign prevention, it still covers a lot of unexpected cases. The only downside is that your node always needs to skip the configured number of blocks after a restart. This will require some additional configuration if you are using cosmovisor for upgrades.
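
For reference, the option lives in the [consensus] section of config.toml. The value 10 below is only an example taken from the range mentioned above, and 0 disables the check.

```toml
# refuse to start if this validator signed any of the last 10 blocks
# ~/.project/config/config.toml
[consensus]
double_sign_check_height = 10
```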

While double_sign_check_height is a great option to have, it is not recommended to rely on it alone for double sign prevention. The TMKMS and Horcrux tools discussed previously provide more advanced double sign prevention mechanisms.

While this article is far from a comprehensive security manual, it should give you a good starting point for protecting your validator from the major threats that are out there.
