This is a short, high-level overview of the software and hardware infrastructure at ADA Point Pool.
The overall goal of our pool is to give people a quality, low-fee pool to delegate to, one that provides the maximum return on investment (ROI) every epoch.
Our strategy for achieving that goal is to have the highest possible uptime at a competitive price per block. In other words, high reliability and no downtime: we should produce a block every time we are assigned one.
Being a highly reliable staking pool means we have to take care of many services in our stack. Let's list some: electricity, hardware components, reliable internet, backups, the operating system, custom scripts, and the Cardano Jormungandr node.
Managing all of them ourselves would be very expensive, so we follow a common industry practice to lower some of these costs: we use an IaaS provider. That way we delegate some of the responsibilities to companies whose core business is providing basic computing services to their clients.
ADA Point Pool
Custom workflow scripts
Cardano Jormungandr node
Pluses of an IaaS provider:
- no electricity downtime thanks to backup batteries and generators
- hardware is virtualized, so a failing component does not cause downtime
- data centers usually sit on, or are well connected to, the internet backbone, so network latencies are much better
- automated backups
- popular operating systems can be installed and booted quickly
- driver and hardware optimizations happen in the background
- server-grade hardware (Xeon, ECC RAM, …)
- relatively low cost for what you get
Minuses:
- not directly in line with the decentralization effort
- no customization at this level; for example, you cannot power your server with solar power
- if the reliability of these services is bad, you cannot fix it yourself
The pluses in our case outweighed the minuses.
To address the negative sides: decentralization-wise, we have a separate backup system. We have a spare IaaS provider and, if that fails, a physical machine on which we can run our nodes if the need arises. We may need it if the IaaS provider's reliability becomes bad, or because of other social pressures that come with running decentralized nodes.
ADA Point Pool services
For a good, basic, low-level, step-by-step setup you can follow Chris Graffagnino's awesome tutorial. Here we describe some of the things we do beyond that.
We are running a server version of Ubuntu 18.04.
There are other important things that pool operators have to take care of at the operating system level:
- Keep the operating system updated with the latest security patches
- Configure the firewall properly; we use ufw
- DO NOT install or add any services that are not crucial to operating the node
- Set correct operating system permissions. This limits what you can break unintentionally, and it also limits what an attacker can do on your system
- Use SSH keys to log in to the system
Another useful tip is to use a mobile SSH client on the go. We use JuiceSSH. It is really handy, especially after stability upgrades: if you get an alarm, you can quickly check that everything is in order wherever you are. In an emergency you can also do a quick fix or a revert so you don't miss any blocks.
Custom workflow scripts
There are always some tasks that devops people (sysadmins, IT) need to automate with scripts so they don't have to be repeated by hand every time. These scripts should all be tracked in a version control system like git, so that when we update them we can track changes and revert if something goes wrong. It also makes it easier to transfer them to a new computer.
Our core script responsibilities
- Spin up a new node instance just by adding a config named after its port number, for example config/nodes/3100.
- Generate the node config from a predefined peer list by pinging every server and sorting the results by latency.
- Check historic hashes for any currently running node. For example, check whether the hash from 5 blocks ago matches the Cardano explorer node or any other node.
- Check whether a node's last block hash has stopped changing.
- Gracefully kill a node process.
- A custom “state” script that reports a one-word state and knows how to describe internal node errors to other scripts.
- Leader block schedule dates.
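As an illustration of the latency-sorting step, here is a minimal Python sketch. It is not our actual script: it measures TCP connect time instead of an ICMP ping, and the names `tcp_latency` and `sort_peers_by_latency` are made up for this example.

```python
import socket
import time
from typing import Callable, Iterable, List, Optional, Tuple

Peer = Tuple[str, int]  # (host, port); a hypothetical representation of a peer


def tcp_latency(peer: Peer, timeout: float = 2.0) -> Optional[float]:
    """Return the TCP connect time to the peer in seconds, or None if unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection(peer, timeout=timeout):
            return time.monotonic() - start
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return None


def sort_peers_by_latency(
    peers: Iterable[Peer],
    measure: Callable[[Peer], Optional[float]] = tcp_latency,
) -> List[Peer]:
    """Drop unreachable peers and return the rest, fastest first."""
    timed = [(measure(peer), peer) for peer in peers]
    reachable = [(latency, peer) for latency, peer in timed if latency is not None]
    reachable.sort(key=lambda item: item[0])
    return [peer for _, peer in reachable]
```

The `measure` parameter is injectable so the sorting logic can be tested without touching the network. The sorted list would then be written into the node config as the trusted peer list.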
Note: All our scripts are stateless, so we can compose them as we want. The data we need is stored separately in files. We save:
- Last start time for each node: to know how long we have been bootstrapping (the “Restarting” state)
- Block hashes for each node: to compare older hashes when our node is ahead of the one we are comparing it to
- Storage size for each node while bootstrapping (“Restarting”): to see whether the size is increasing during bootstrap
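To show why it helps to keep older block hashes, here is a small Python sketch of a fork check. It assumes the saved hashes are keyed by block height, which is an assumption made for this example, and `same_chain` is a hypothetical name. The default depth of 5 mirrors the "5 blocks ago" example above.

```python
from typing import Dict, Optional


def same_chain(
    ours: Dict[int, str],
    reference: Dict[int, str],
    depth: int = 5,
) -> Optional[bool]:
    """Compare block hashes `depth` blocks below the highest height both nodes recorded.

    Returns True if the hashes match, False if they differ (likely a fork),
    and None if there is no height at which a comparison is possible.
    """
    common = set(ours) & set(reference)
    if not common:
        return None
    height = max(common) - depth
    if height not in ours or height not in reference:
        return None
    return ours[height] == reference[height]
```

Because the comparison starts from the highest height that both nodes have recorded, it still works when our node is a few blocks ahead of the reference node.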
From scripts like these we compose more complex behaviors, such as operational scripts:
- If the block hash has not changed for n seconds, restart the node
- If the node is on a different fork, restart the node and optionally purge the whole storage
- A start script that updates and sorts trusted peers. A good guard here is to check that you don't unintentionally run two leaders with the same script. Another guard should live in the script that promotes the leader through the API.
- A stop script that knows how to gracefully kill the node process
- A min script that tells us how many minutes we have until the next block we need to produce
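The first two rules above can be sketched as one small decision function. This is only an illustration: the `NodeCheck` structure, the action names and the 300-second threshold are assumptions for the example, not values from our scripts.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeCheck:
    seconds_since_hash_change: float  # how long the last block hash has stayed the same
    on_same_chain: Optional[bool]     # None when no fork comparison was possible


def decide_action(
    check: NodeCheck,
    stuck_after: float = 300.0,  # the "n seconds" from the rule; 300 is an assumed value
    purge_on_fork: bool = False,
) -> str:
    """Return "none", "restart" or "restart-and-purge" for one node."""
    if check.on_same_chain is False:
        # The node is on a different fork: restart, optionally purging storage.
        return "restart-and-purge" if purge_on_fork else "restart"
    if check.seconds_since_hash_change >= stuck_after:
        # The block hash has not changed for too long: the node is stuck.
        return "restart"
    return "none"
```

Keeping the decision separate from the code that actually restarts the node makes it easy to test and to log what the automation would have done.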
Cron is a program that triggers scripts periodically, which is exactly what it was made for. You should therefore use cron jobs, or something similar, for any process that has to repeat.
Our cron job automation makes sure the scripts run every 10 seconds to check that everything is in order and to take appropriate action. In the past we added some extra checks when closing in on a block, for extra safety.
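Standard cron schedules jobs at one-minute granularity, so a 10-second interval needs a little extra work. One possible approach, sketched here in Python as an assumption rather than a description of our setup, is to let cron start a script every minute and have that script run the check six times, 10 seconds apart. The name `run_for_one_minute` is hypothetical.

```python
import time
from typing import Callable


def run_for_one_minute(
    check: Callable[[], None],
    interval: float = 10.0,
    window: float = 60.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `check` every `interval` seconds within one cron window.

    Returns the number of times `check` was called.
    """
    start = clock()
    runs = int(window // interval)
    for i in range(runs):
        # Schedule against the start time so slow checks do not accumulate drift.
        delay = (start + i * interval) - clock()
        if delay > 0:
            sleep(delay)
        check()
    return runs
```

The `clock` and `sleep` parameters exist only so the loop can be tested without waiting a real minute.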
The basic script API on our system is similar to any resource/entity API:
start 3100 — run node on port 3100 from config 3100
start — starts all nodes in config/nodes/* folder
stop 3100 — stop node on port 3100
stop — stops all nodes // custom check if leader and block is close by
stats 3100 — shows stats for node running on port 3100
stats — shows all nodes stats
state 3100 — one word state “Off”, “Error”, “Restarting”, “Running”
state — show state of all nodes
uptime 3100 — show uptime for node port 3100
uptime — show uptime of all nodes
This takes care of the basic node operations.
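To make the one-word state idea concrete, here is a minimal Python sketch of how such a state could be derived. The three inputs and the order in which they are checked are assumptions for this example; the real script may look at different signals.

```python
STATES = ("Off", "Error", "Restarting", "Running")


def node_state(process_alive: bool, bootstrapping: bool, api_responds: bool) -> str:
    """Map a few hypothetical health signals to a one-word node state."""
    if not process_alive:
        return "Off"
    if bootstrapping:
        # The process is up but still syncing, so it cannot produce blocks yet.
        return "Restarting"
    if not api_responds:
        return "Error"
    return "Running"
```

Because the result is a single word, other scripts can branch on it without parsing logs or node output themselves.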
Monitoring means recording parameters across the stack so we can make informed decisions about how to improve it further and bring it even closer to our strategy of having awesome uptime.
For example, if we record the node's state at a resolution of 10 s, we can see what the node was doing when its block time came, and perhaps see what went wrong if the block was not produced.
As the example above shows, this is one of the most important parts of the stack even though it is not directly linked to uptime. It is more of a feedback loop that lets us improve overall system stability.
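A short Python sketch of that lookup: given state samples recorded every 10 seconds, find the state the node was in at a given block time. The sample format (timestamp, state) and the name `state_at` are assumptions for this example.

```python
import bisect
from typing import List, Optional, Tuple

Sample = Tuple[float, str]  # (unix timestamp, one-word state)


def state_at(samples: List[Sample], when: float, max_age: float = 10.0) -> Optional[str]:
    """Return the latest state recorded at or before `when`.

    `samples` must be sorted by timestamp. Returns None if there is no such
    sample or if the latest one is older than `max_age` seconds.
    """
    times = [t for t, _ in samples]
    i = bisect.bisect_right(times, when)
    if i == 0:
        return None
    recorded_at, state = samples[i - 1]
    if when - recorded_at > max_age:
        return None
    return state
```

For example, with samples at 0, 10, 20 and 30 seconds, a block scheduled at second 25 is matched to the sample taken at second 20.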
There are multiple levels of monitoring:
- Our IaaS provider supplies basic hardware and network logs.
- The operating system has its own logs.
- We provide software logs for the node and the scripts.
- The Jormungandr node has its own logs, configured in the config file or passed directly through the command line interface.
Our software logs are normal files with some specific formatting.
We also use Grafana + Prometheus. There you can represent your logs visually and add alerts if needed.
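One way to get node states into Prometheus is to render them in the Prometheus text exposition format and let an exporter serve the result. The Python sketch below shows only the rendering step. The metric name `pool_node_state` and its labels are made up for this example, and how the output is served is left open.

```python
from typing import Dict

STATES = ("Off", "Error", "Restarting", "Running")


def render_metrics(states: Dict[int, str]) -> str:
    """Render per-node states as Prometheus text-format gauge samples.

    `states` maps a node port to its one-word state. For each node, the
    sample matching its current state is 1 and all others are 0.
    """
    lines = [
        "# HELP pool_node_state Current state of each node (1 = active state).",
        "# TYPE pool_node_state gauge",
    ]
    for port in sorted(states):
        for state in STATES:
            value = 1 if states[port] == state else 0
            lines.append(f'pool_node_state{{port="{port}",state="{state}"}} {value}')
    return "\n".join(lines) + "\n"
```

With one sample per state, a Grafana panel can show how long each node spent in each state, and an alert can fire when the "Running" sample stays at 0 for too long.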
Cardano Jormungandr node
The last service in our stack is the node software itself. Its name is Jormungandr, and it is built and maintained by IOHK. The Cardano network depends on this node software, as the whole network is just a network of Jormungandr nodes.
In the future IOHK, and maybe some other companies, will provide other node software. They already have a Haskell version of the node in the pipeline, which is awesome, as we have a lot of experience with functional programming and are big fans of it.
The current Jormungandr node is written in Rust, which is another awesome language. Its main advantage, in our opinion, is that safe Rust makes it practically impossible to corrupt memory through bad pointers, which is, by the way, one of the main problems with software nowadays. Even Microsoft has said that about 70% of the security vulnerabilities it fixes are memory safety issues, and we will increasingly see memory-safe languages used in the future.
The node software was the biggest obstacle to achieving our planned uptime in the first ~30 epochs. There were still some bugs, and the most annoying were:
- Node gets stuck after some time. Action: restart needed
- Node is on a forked chain. Action: restart needed
- Node bootstrapping takes long or gets stuck. Action: restart needed, check peers
- Node drifts (blocks not updating) very often. Action: remove public_id, public_address, listen_address from config
Updating your node to the latest version!!
We are always checking the Jormungandr GitHub repository and release logs. Keeping the node updated is the best strategy for improving node uptime. Node stability improved greatly with the 0.8.6 release.
Maybe the image above will look normal in a couple of weeks, but getting a node uptime of more than 6 hours before 0.8.6 was really hard work.
In the early stages we were even testing custom builds of the master branch just to improve stability!
Hot bootstrapping, trusted peer nodes
Looking at the most common node issues listed above, we can work out the second most important strategy.
To achieve the best uptime you need a fast bootstrap, so that when the node fails for some reason you can restart it and have it up within a minute. This was really important before 0.8.6. The trick is to have a list of fast private bootstrap nodes, not just the list of common nodes that everyone else bootstraps from.
These common public nodes are overloaded, and it is up to chance whether you boot in 6, 12, 16 or 25 minutes. There is more bad news about public nodes.
Your node fails =>
There is a high chance the network has some problems =>
That means other nodes are failing =>
That means they all want to bootstrap from public nodes =>
Bootstrapping is really slow =>
So when you need public nodes the most, chances are you will have the most problems with them.
Therefore we spun up another virtual instance and added 3 nodes with public_id. There is one problem with these nodes: we noticed that nodes with public_id defined are much more unstable and drift faster.
These 3 nodes are defined as trusted peers and bootstrapping nodes for each other. This works great, but you should keep a fallback to some public trusted peers in case all 3 fall out of sync at the same time and a node has no peer to bootstrap from.
All 3 nodes are trusted peers for the main node, which runs on a separate computer and cannot be influenced by them. Since nodes with public_id are a bit unstable at the moment, you should not run them on the same machine as the main node for now.
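The "private peers first, public peers as fallback" idea can be sketched in a few lines of Python. The function name, the `max_public` limit and the addresses in the usage example are placeholders for illustration, not our real peers.

```python
from typing import Iterable, List


def build_trusted_peers(
    private: Iterable[str],
    public_fallback: Iterable[str],
    max_public: int = 2,
) -> List[str]:
    """Return all private peers first, then up to `max_public` public peers.

    Duplicates are dropped, keeping the first occurrence.
    """
    ordered: List[str] = []
    for peer in private:
        if peer not in ordered:
            ordered.append(peer)
    added = 0
    for peer in public_fallback:
        if added >= max_public:
            break
        if peer not in ordered:
            ordered.append(peer)
            added += 1
    return ordered


# Usage with placeholder addresses from documentation IP ranges.
peers = build_trusted_peers(
    ["/ip4/192.0.2.10/tcp/3101", "/ip4/192.0.2.10/tcp/3102", "/ip4/192.0.2.10/tcp/3103"],
    ["/ip4/198.51.100.7/tcp/3000", "/ip4/198.51.100.8/tcp/3000", "/ip4/198.51.100.9/tcp/3000"],
)
```

Keeping the private peers at the front of the list reflects the preference for the fast bootstrap nodes, while the few public entries cover the case where all private nodes are out of sync at once.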
You should never install compilers and tooling on your production servers. Why? Compiling is very compute-intensive, and you could miss a block if you are compiling when it is your turn to produce one. Instead, compile your binaries on a machine with the exact same operating system and the same basic instruction set, and transfer them to the live server. If you use one of the most common architectures, you can also just take the binaries from the release page. For example, for the GitHub tag 0.8.6, see the assets at the bottom of the page.
A couple more thoughts about software maintenance.
We believe it is always best to keep things as simple as possible. Do not create 4 tiers of caching with different settings. And if something brings no improvement, or only a minor one, throw it away even if you spent a lot of time on it.
Test on testing machines. Never change things in production that have not been tested properly. Why change something that works as it should? The only exception is an emergency, but even then you have to think things through and have a fallback plan in case things do not behave as you planned.
We predict that in the future the node software will be mature enough not to need much of the tooling described here. Some parts will still be nice to have, like fast bootstrapping, which works as a caching system. The barrier to entering the pool operator game will be much lower, and fees should be smaller than they are now. We will also have predefined Docker images, so a pool should be a few clicks away.
/remindme 1 month from now
We are really happy with our current architecture and with the fact that, objectively, we have achieved our defined goal: a high-quality pool at low fees. In the future we will write more about specific sections of this article, and more about financial incentives for pools and other game theory.
About the author: my nick is Gwearon. I am a computer programmer with about two decades of programming experience. My interests are functional programming, smart contracts and math.
Let us know if you find any mistakes.
Happy node operating,