Delegate Update: Failover Script

In recent weeks, all delegates have had a hard time keeping their nodes in sync because of problems in the network. Some delegates are currently working with the ARK Dev team to identify and fix these problems.

The problem is that finding and fixing these issues takes a lot of time. Until the network is stable again, every delegate has to keep their delegate forging, whether by building their own failover script or by some other means. That’s why we developed our own failover script and node setup, which we want to introduce in this article. With this failover script we can promise that we’ll miss at most 2 blocks when we fall out of sync for any reason.

Server Setup

We’ve improved our server setup so that we can run this failover script alongside our websites (calculator and pool). The setup is as follows:

  • master (runs the failover script and communicates with the relays)
  • relay-1 (runs the ARK node that is currently forging)
  • relay-2 (runs an ARK node that acts as a normal relay in the network)
  • relay-3 (runs the ARK node that serves the calculator and the pool website)

The master node itself has modest specifications (1 vCore, 1 GB RAM, 20 GB SSD). The 3 relays all have the same setup (4 vCores, 4 GB RAM, 40 GB SSD).

This setup is also well suited to node updates. When a node needs to be updated, we update the non-forging relay first. Once everything works fine, we switch forging over to it and update the other relay.

The Script

The script is quite simple right now. It opens an SSH connection to the currently forging relay (relay-1) and checks the ark.log every second. When the log shows the message “Blockchain not ready to receive block”, the script detects it and sets the delegate secret on the second relay (relay-2), which is in sync with the network. This step takes only 2 seconds, after which the second relay is the forging delegate.
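
To make the detection step more concrete, here is a minimal Python sketch of the idea, not our actual script. It assumes the log lines arrive as an iterable of strings (for example from a `tail` on ark.log run over SSH), and the names `is_out_of_sync`, `watch` and `on_failover` are made up for illustration. Only the log message itself is taken from the description above.

```python
from typing import Callable, Iterable

# Log message quoted in the article; every other name here is hypothetical.
OUT_OF_SYNC_MARKER = "Blockchain not ready to receive block"


def is_out_of_sync(line: str) -> bool:
    """Return True when a log line carries the out-of-sync message."""
    return OUT_OF_SYNC_MARKER in line


def watch(lines: Iterable[str], on_failover: Callable[[str], None]) -> bool:
    """Scan log lines in order and trigger the failover once, on the first match.

    `lines` can be any iterable of strings, such as the output of a tail
    on ark.log read over SSH (assumed here, not implemented).
    Returns True if a failover was triggered, False if the input ended first.
    """
    for line in lines:
        if is_out_of_sync(line):
            # In a real setup this callback would set the delegate secret
            # on the standby relay; here it is just a placeholder hook.
            on_failover(line)
            return True
    return False
```

Keeping the matching logic separate from the SSH transport makes it possible to test the trigger with plain lists of strings before pointing it at a live relay.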

The next step is to rebuild the relay (relay-1) that fell out of sync. We download the current snapshot, import it, rebuild the database and restart the node process. The relay is then back in sync with the network and waits for the next failover switch.
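
The rebuild is a fixed sequence in which each step only makes sense if the previous one succeeded. A rough Python sketch of that ordering could look like the following. The function `run_rebuild` and the step names are hypothetical, and the actual commands for each step are deliberately left out because they depend on the node installation.

```python
from typing import Callable, List, Tuple

# A step is a (name, action) pair; the action returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_rebuild(steps: List[Step]) -> List[str]:
    """Run the steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully, so the
    caller can tell how far the rebuild got.
    """
    done: List[str] = []
    for name, action in steps:
        if not action():
            break
        done.append(name)
    return done
```

A caller would pass the four steps in order (download snapshot, import snapshot, rebuild database, restart node) and treat the relay as ready for the next switch only if all four names come back.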

Once both steps have finished, both relays are working again. At the moment we then tell the script that the roles (forging and relay) have been switched and restart it, so that it starts checking whether the new forging relay (relay-2) falls out of sync.
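
The role switch itself is just a swap of two labels. As an illustration only, it could be modelled like this in Python. The `Roles` class and the `swap` function are hypothetical names, and only the relay names come from our setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Roles:
    forging: str  # relay that currently holds the delegate secret
    standby: str  # synced relay that is ready to take over


def swap(roles: Roles) -> Roles:
    """Return the roles after a failover: the standby forges, the old forger stands by."""
    return Roles(forging=roles.standby, standby=roles.forging)


# Before the failover relay-1 forges; afterwards relay-2 does.
before = Roles(forging="relay-1", standby="relay-2")
after = swap(before)
```

Storing the roles as explicit state like this would let the script keep running after a switch instead of being reconfigured and restarted by hand.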

Next Steps

The problem we’re facing right now is that we can switch the forging delegate within 2 seconds, but the new node has to wait a whole forging round before it is activated as a forging delegate. We’re trying to fix that, but right now it seems that we can’t do anything about it. We’ll get in contact with the ARK Devs on this topic. This means that we’ll “only” miss 2 blocks when we get out of sync.

We’ll improve the failover script and maybe make it open source. We’re not sure yet whether we should, because every delegate should know what they’re doing, and the script is still an alpha version in “testing”.

TL;DR

  • new server setup (4 servers: 3 running nodes and 1 master node)
  • new failover script (maximum 2 blocks missed when we get out of sync)
  • trying to miss only 1 block when we get out of sync, but it seems we can’t do anything about it
  • thinking about making the failover script open source

Vote for reconnico!

Proposal: https://forum.ark.io/topic/382/reconnico-community-developer-and-ark-enthusiast-90-profit-share-8-development-2-cost-coverage-devnet-supporter

Website: https://pool.reconnico.com/