The LiquidEOS team has been working on strategies to significantly increase the stability of the EOS mainnet. Gaps in block production should not happen, and below we propose some initial solutions. We have identified three main issues affecting the reliability of EOS block production:
- The entire network suffers six seconds of “downtime” whenever a scheduled block producer (BP) fails to produce. This may occur for a number of reasons: misconfiguration, a crashed computer or process, a BP that was promoted from standby before it was ready, or a node that is not yet synced with the blockchain.
- There is a lot of manual syncing between block producers when updating blacklist entries and other configurations that are vital for reliable block production. Syncing manually creates room for error, and the effects of missing a blacklist entry could be worse than not producing.
- The handoff between BPs is sometimes slow because of the arbitrary nature of the schedule, which doesn’t take into account the geographical distances or network latencies between BPs. This is part of the reason producing BPs occasionally miss some blocks at the beginning of their six-second cycles.
Proposed Stability Solutions
The LiquidEOS team recently created a working group to evaluate these issues and contributed two solutions that could greatly improve these situations if adopted by the block producers.
The Watchdog: https://github.com/bancorprotocol/eos-bp-watchdog
This is a simple but very important tool that limits the potential damage to the mainnet when a BP stops producing. A block producer who runs the Watchdog tool will be automatically removed from the schedule if the system identifies an issue with their production.
Once the problem is fixed, the block producer can re-register themselves and return to the schedule, all with minimal impact on the continuity of the chain.
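To make the idea concrete, here is a minimal sketch of the kind of check a watchdog could run. It is not the code from the repository above: the class name, the `max_silent_rounds` threshold, and the rule "unregister after roughly one and a half silent rounds" are assumptions made for illustration. The only facts it relies on are the EOS block timing (0.5-second blocks, 12 consecutive blocks per turn, 21 active producers).

```python
from dataclasses import dataclass
from typing import Optional

BLOCK_INTERVAL_S = 0.5   # EOS block time
BLOCKS_PER_TURN = 12     # consecutive blocks per producer turn (six seconds)
SCHEDULE_SIZE = 21       # active producers in the schedule

# One full pass through the schedule: 0.5 * 12 * 21 = 126 seconds.
ROUND_S = BLOCK_INTERVAL_S * BLOCKS_PER_TURN * SCHEDULE_SIZE


@dataclass
class Watchdog:
    """Hypothetical watchdog state for a single producer account."""
    producer: str
    max_silent_rounds: float = 1.5          # assumed threshold, not from the real tool
    last_seen_s: Optional[float] = None     # timestamp of our last produced block

    def observe_block(self, block_producer: str, timestamp_s: float) -> None:
        # Record the time whenever a block signed by our producer is seen.
        if block_producer == self.producer:
            self.last_seen_s = timestamp_s

    def should_unregister(self, now_s: float) -> bool:
        # Without a baseline we cannot tell whether production has stopped.
        if self.last_seen_s is None:
            return False
        return now_s - self.last_seen_s > self.max_silent_rounds * ROUND_S
```

When `should_unregister` returns true, a real deployment would take the producer out of the schedule, for example by pushing the system contract's `unregprod` action, and the operator would re-register after fixing the problem.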
The Heartbeat: a plugin that sends a “heartbeat transaction” every couple of minutes with lots of useful data, including all the configurations and information that BPs are currently exchanging manually in off-chain groups. This also ensures that a standby BP is ready to produce in case they suddenly enter the top 21.
We also developed a contract that keeps the data accessible in a table, and a dApp for viewing the information easily. Both are available on GitHub.
Heartbeat Viewer: Note that the hardware specs portion of the data is provided voluntarily by each BP and is not validated by LiquidEOS or anyone else in any way. It is intended to replace the current manual, voluntary process. Because it can be spoofed, it should not be used as a validation source. We are working on ways to validate this information by benchmarking the active producer.
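As an illustration of how on-chain heartbeat data could replace manual blacklist syncing, the sketch below compares a digest of each BP's blacklist and flags the producers that disagree with the majority. The digest format, the function names, and the idea of reporting a hash rather than the full list are assumptions for this example, not a description of the actual heartbeat payload.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, List


def blacklist_digest(accounts: Iterable[str]) -> str:
    """Order-independent SHA-256 digest of a blacklist (hypothetical format)."""
    canonical = "\n".join(sorted(set(accounts)))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def out_of_sync(heartbeats: Dict[str, str]) -> List[str]:
    """Return producers whose reported digest differs from the most common one.

    `heartbeats` maps a producer name to the blacklist digest it reported.
    """
    if not heartbeats:
        return []
    majority_digest, _ = Counter(heartbeats.values()).most_common(1)[0]
    return sorted(p for p, d in heartbeats.items() if d != majority_digest)
```

A BP that appears in the returned list would know to refresh its blacklist without anyone having to compare entries by hand. Like the hardware specs, a self-reported digest can be spoofed, so it is a coordination aid rather than proof.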
Adoption and System Contracts
If all BPs install The Heartbeat, it can be used by the system contract to decide whether a standby BP is ready to produce and avoid including them in the schedule if they are not.
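The readiness rule could look something like the following sketch. It models the logic in Python only for readability (the system contract itself is not written in Python), and the function name, the 300-second freshness window, and the input shapes are all assumptions.

```python
from typing import Dict, List


def select_schedule(
    ranked_by_votes: List[str],
    last_heartbeat_s: Dict[str, float],
    now_s: float,
    max_age_s: float = 300.0,   # assumed freshness window
    size: int = 21,
) -> List[str]:
    """Pick the top `size` producers by votes, skipping any without a fresh heartbeat."""
    schedule: List[str] = []
    for producer in ranked_by_votes:
        seen = last_heartbeat_s.get(producer)
        # A producer is considered ready only if it has reported recently.
        if seen is not None and now_s - seen <= max_age_s:
            schedule.append(producer)
        if len(schedule) == size:
            break
    return schedule
```

With this rule, a standby BP that has not reported recently is passed over, and the next-ranked producer with a fresh heartbeat takes the slot instead.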
To solve the issue of delayed handoffs between BPs, we also added BP-to-BP latency data to the heartbeat transaction, which will allow the system contract to calculate the ideal schedule automatically.
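One way to picture a latency-aware schedule is as an ordering of producers that keeps each handoff short. The sketch below uses a simple greedy nearest-neighbor heuristic, which is a stand-in chosen for brevity and does not guarantee the optimal order; the function names and the latency table format are assumptions, and the actual calculation in a system contract may work differently.

```python
from typing import Dict, List, Tuple

Latencies = Dict[Tuple[str, str], float]


def _latency(latency_ms: Latencies, a: str, b: str) -> float:
    # Treat measurements as symmetric; a missing pair counts as unreachable.
    return latency_ms.get((a, b), latency_ms.get((b, a), float("inf")))


def greedy_order(producers: List[str], latency_ms: Latencies) -> List[str]:
    """Order producers so each handoff goes to the nearest not-yet-placed producer."""
    remaining = sorted(producers)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        # Break ties by name so the result is deterministic.
        nxt = min(remaining, key=lambda p: (_latency(latency_ms, order[-1], p), p))
        remaining.remove(nxt)
        order.append(nxt)
    return order


def cycle_latency(order: List[str], latency_ms: Latencies) -> float:
    """Total handoff latency around the schedule, including the wrap-around."""
    return sum(
        _latency(latency_ms, order[i], order[(i + 1) % len(order)])
        for i in range(len(order))
    )
```

Comparing `cycle_latency` for the greedy order against an arbitrary order gives a rough sense of how much handoff time a latency-aware schedule could save.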
The team at LiquidEOS strongly believes that introducing these strategies into the block production process will greatly increase the stability of the EOS mainnet and produce a faster and more efficient blockchain for the community.
We welcome feedback on our proposed solutions, and created the EOS Reliability Tools Working Group to invite ideas and further discussion about how to resolve these and other issues: https://t.me/EOSBPReliabilityTools
Big thanks to the following people for helping build, test or support these tools:
Tal Muskal - LiquidEOS
Nate D - Aloha EOS