Distributed Validator Technology on Eth2
Mara Schmiedt & Collin Myers | February 2021
- DVT (fka SSV), a joint research effort supported by the research team and a grant from the Ethereum Foundation, seeks to bring better infrastructure resilience to the eth2 Beacon Chain
- The solution aims to offer an active-active redundancy configuration for institutional providers, staking pool operators, and at-home validators to mitigate potential failures at the Validator Client (VC) and Beacon Node (BN) level
- Testing of the first Go-based DVT implementation, set up by the Blox.io team, is already underway. The core team plans to release the audited, open-source implementation of the middleware in Q2-Q3 of this year.
- Check out the Eth2 Calculator to learn more about the economics of a DVT configuration for your at-home or cloud-based validator setup.
Read our previous article on DVT (fka SSV) on ETH2 here.
Eth2 Validator Responsibilities & Potential Failure Modes
The key areas of responsibility for validators on the eth2 network, which also determine their reward performance and operational risk, can largely be grouped into two categories: Liveness and Safety.
- Do not go offline. If you go offline, make sure you aren’t offline when everyone else is.
- Do not produce slashable attestations or blocks through double-signing. Make sure you do not run multiple instances with the same key.
- Keep your validator private key safe. Protect against potential compromise.
Today, different potential modes of validator failures are mitigated through different solutions.
Protection against liveness failures at the Beacon Node and Validator Client level is primarily achieved through redundancy in components such as hardware provider, hardware location, cloud provider, cloud location, and validator client(s). Because anti-correlation penalties in Eth2 are designed to multiply slashing penalties when a group of validators commits similar offenses at the same time, diversified configurations and redundancy are particularly important for validators of all sizes.
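To make the anti-correlation effect concrete, here is a minimal sketch modeled on the proportional slashing penalty in the beacon chain specification. The function name, the example balances, and the default multiplier of 3 are assumptions for illustration; the multiplier has been set to different values at different protocol stages, so check the current spec before relying on any number.

```python
GWEI_PER_ETH = 10**9
EFFECTIVE_BALANCE_INCREMENT = 1 * GWEI_PER_ETH  # penalties are rounded down to whole-ETH increments


def correlation_penalty(effective_balance, slashed_balance_sum, total_balance, multiplier=3):
    """Penalty (in Gwei) for one slashed validator.

    slashed_balance_sum: total effective balance slashed in the surrounding window.
    total_balance: total active effective balance on the network.
    multiplier: assumed value for illustration only.
    """
    # The share of stake slashed together is scaled up, but capped at 100%.
    adjusted = min(slashed_balance_sum * multiplier, total_balance)
    numerator = effective_balance // EFFECTIVE_BALANCE_INCREMENT * adjusted
    return numerator // total_balance * EFFECTIVE_BALANCE_INCREMENT


if __name__ == "__main__":
    total = 1_000_000 * GWEI_PER_ETH   # hypothetical network size
    stake = 32 * GWEI_PER_ETH
    for slashed_eth in (32, 100_000, 400_000):
        penalty = correlation_penalty(stake, slashed_eth * GWEI_PER_ETH, total)
        print(slashed_eth, "ETH slashed together ->", penalty // GWEI_PER_ETH, "ETH penalty")
```

In this sketch a lone slashing rounds down to no correlation penalty, while a validator slashed alongside a large fraction of the network loses its entire effective balance. This penalty is applied in addition to the initial slashing penalty, which is not modeled here.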
Most providers leverage active-passive redundancy configurations to protect against service faults in one or more infrastructure components of their primary setup. In this type of configuration, one or more secondary nodes serve as a backup that takes over when a node failure is detected. This type of protection is difficult to implement and requires extensive expertise and testing if you are an average at-home validator or operate a trustless staking pool. In addition, the current specifications and implementations limit the sub-component level at which redundancy can be introduced. Validator Clients, for example, need to communicate with dedicated, client-specific Beacon Node instances, which makes active failover between nodes that run different clients impossible.
Slashing protection already exists at the local level, where it is enabled by default in existing validator clients such as Prysm. It can also be implemented remotely. Unfortunately, neither approach protects against faulty configurations in which multiple validator instances are run with the same validator key, or in which local slashing protection is deactivated for performance gains. Faulty setups of this kind have been one of the root causes of recent slashing events on the eth2 chain.
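The limitation described above can be illustrated with a minimal sketch of local attestation slashing protection. The class and field names are hypothetical, and real clients persist this history in a database. Only the two slashable attestation conditions (double vote and surround vote) are modeled.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attestation:
    source_epoch: int
    target_epoch: int
    signing_root: str


@dataclass
class SlashingProtection:
    """Hypothetical in-memory signing history for one validator key."""
    signed: list = field(default_factory=list)

    def is_safe(self, new: Attestation) -> bool:
        for old in self.signed:
            # Double vote: same target epoch, different attestation data.
            if old.target_epoch == new.target_epoch and old.signing_root != new.signing_root:
                return False
            # Surround vote: the new attestation is surrounded by an old one...
            if old.source_epoch < new.source_epoch and new.target_epoch < old.target_epoch:
                return False
            # ...or the new attestation surrounds an old one.
            if new.source_epoch < old.source_epoch and old.target_epoch < new.target_epoch:
                return False
        return True

    def sign_if_safe(self, new: Attestation) -> bool:
        if not self.is_safe(new):
            return False
        self.signed.append(new)
        return True


if __name__ == "__main__":
    instance_a = SlashingProtection()
    instance_b = SlashingProtection()  # second instance with the same key, separate history
    print(instance_a.sign_if_safe(Attestation(1, 2, "root-x")))  # signed
    print(instance_a.sign_if_safe(Attestation(1, 2, "root-y")))  # refused: double vote
    print(instance_b.sign_if_safe(Attestation(1, 2, "root-y")))  # signed: B cannot see A's history
```

The last call shows the failure mode: each instance is individually safe, yet together they produce a slashable double vote because their histories are not shared.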
To date, there has not been a single, out-of-the-box solution that effectively mitigates all of the aforementioned validator failure modes. The release of the Secret Shared Validators (SSV) middleware, now known as DVT, changes this.
Distributed Validator Technology (DVT)
DVT (fka SSV) middleware aims to provide a high degree of infrastructure resilience against potential failures at both the Validator Client (VC) and Beacon Node (BN) level on the eth2 network. You can think of it as a large multi-sig for consensus duties on the Ethereum blockchain. For the short take check out this presentation, or read on.
DVT consists of 4 key components: Distributed Key Generation, Shamir Secret Sharing of BLS Signatures, Secure Multi-Party Computation and the DVT BFT Consensus Layer. The end result of these 4 pillars can be seen in the graphic below, which represents a DVT validator on Eth2 utilizing a 3-of-4 threshold signature across a configuration of 4 key-sharing VCs and 4 BNs. The number of BNs and the number of VCs depend on the requirements and need not be the same.
Distributed Key Generation (DKG)
Distributed key generation is a cryptographic process in which multiple parties contribute to the calculation of a shared public and private key pair. Each participant in this distributed ceremony owns a portion of the private key, which prevents any single party from having direct access to, or control of, the entire private key.
DKG is a core primitive for DVT as it enables multiple machines to work together with the same validator private key. In addition, it also builds the foundation for the threshold scheme (e.g. 3 of 4) that will be used for a given DVT set-up.
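As a rough illustration of the idea, the sketch below has every party pick its own random polynomial and hand each peer one evaluation of it. Each party's final key share is the sum of the evaluations it received, and the group key is the sum of the constant terms, which no party ever sees. This is a simplified, dealerless scheme without the verification steps a production DKG needs. The small prime field and all function names are assumptions for illustration; a real implementation would work in the BLS12-381 scalar field.

```python
import secrets

PRIME = 2**127 - 1  # toy field modulus (assumption), not the BLS12-381 group order


def random_poly(threshold):
    # A degree (threshold - 1) polynomial; coefficient 0 is this party's secret contribution.
    return [secrets.randbelow(PRIME) for _ in range(threshold)]


def eval_poly(coeffs, x):
    # Horner evaluation modulo PRIME.
    result = 0
    for c in reversed(coeffs):
        result = (result * x + c) % PRIME
    return result


def run_dkg(n, threshold):
    # Each of the n parties generates its own polynomial locally.
    polys = [random_poly(threshold) for _ in range(n)]
    # Party j's key share is the sum of what every party evaluated at x = j.
    shares = {j: sum(eval_poly(p, j) for p in polys) % PRIME for j in range(1, n + 1)}
    return polys, shares


def joint_polynomial(polys):
    # Only used to check the result; in a real ceremony nobody computes this.
    return [sum(column) % PRIME for column in zip(*polys)]


if __name__ == "__main__":
    polys, shares = run_dkg(n=4, threshold=3)
    joint = joint_polynomial(polys)
    print(all(eval_poly(joint, j) == shares[j] for j in shares))
```

The check confirms that the summed shares lie on a single joint polynomial, so they behave exactly like 3-of-4 Shamir shares of a key that was never assembled in one place.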
Shamir Secret Sharing of BLS Signatures
Secret sharing is a mechanism by which a secret (e.g. a private key) is split and distributed across different participants so that each participant holds a share of that secret. To reconstruct the secret, a predefined threshold of shares needs to be combined (e.g. 3 of 4); fewer shares than the threshold cannot be used to reconstruct the secret.
Secret sharing will be primarily used for the management of a given validator key between VCs in a DVT configuration. This is a core primitive as it enables active redundancy across instances and enhanced key security.
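A minimal sketch of 3-of-4 Shamir secret sharing follows. The secret is the constant term of a random polynomial, each share is one point on that polynomial, and Lagrange interpolation at zero recovers the secret from any threshold-sized subset. The toy prime field and function names are assumptions for illustration, and a dealer who knows the whole secret is assumed here.

```python
import secrets

PRIME = 2**127 - 1  # toy field modulus (assumption)


def split_secret(secret, n, threshold):
    # Random polynomial of degree threshold - 1 with f(0) = secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = {}
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation
            y = (y * x + c) % PRIME
        shares[x] = y
    return shares


def reconstruct(shares):
    # Lagrange interpolation evaluated at x = 0.
    secret = 0
    for xi, yi in shares.items():
        num, den = 1, 1
        for xj in shares:
            if xj != xi:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    key = 123456789
    shares = split_secret(key, n=4, threshold=3)
    subset = {x: shares[x] for x in (1, 3, 4)}
    print(reconstruct(subset) == key)
```

In a DVT setting the shares would not be recombined like this during normal operation; the reconstruction step is shown only to demonstrate the threshold property.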
BLS signature schemes provide the signing primitive within a blockchain protocol, allowing anyone to verify that a signer is authentic. Validators use them to sign messages, and the resulting signatures are then aggregated and verified at scale. This enables a full Proof-of-Stake system with a massive number of validators to function efficiently in production. BLS signature schemes are used by modern blockchains such as Eth2, Chia, Filecoin, and Algorand.
BLS signatures are deterministic in the sense that they depend only on the message and the signer’s key, which means any two BLS signatures on a given message with the same key are identical, making multi-party computation much easier (discussed later).
A primary benefit of BLS signatures is their additive property, which makes them friendly to aggregation. Multiple signatures can be combined into one, so messages can be signed without reconstructing the private key or running a full MPC.
A base layer blockchain such as Eth2 that uses a BLS signature scheme allows the nodes of a DVT configuration to work together and sign messages as a group in an efficient manner.
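The additive property can be illustrated with a deliberately insecure toy model. Here a "signature" is simply key times hash(message) in a prime field, standing in for the elliptic-curve scalar multiplication that real BLS uses. Because the operation is linear, partial signatures made with key shares can be combined with Lagrange coefficients into the same signature the full key would have produced. Everything in this sketch (the field, the fixed polynomial, the function names) is an assumption for illustration and must not be used as actual cryptography.

```python
import hashlib

Q = 2**127 - 1  # toy field modulus (assumption)


def hash_to_scalar(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % Q


def toy_sign(key: int, message: bytes) -> int:
    # Stand-in for BLS signing; linear in the key, like the real scheme.
    return key * hash_to_scalar(message) % Q


def lagrange_at_zero(xs):
    coeffs = {}
    for xi in xs:
        num, den = 1, 1
        for xj in xs:
            if xj != xi:
                num = num * (-xj) % Q
                den = den * (xi - xj) % Q
        coeffs[xi] = num * pow(den, -1, Q) % Q
    return coeffs


def aggregate(partials):
    # Combine partial signatures; the private key is never reconstructed.
    coeffs = lagrange_at_zero(list(partials))
    return sum(coeffs[x] * sig for x, sig in partials.items()) % Q


def key_share(x: int) -> int:
    # Fixed example polynomial f(x) = 1234 + 55x + 7x^2, so the full key is f(0) = 1234 (3-of-4).
    return (1234 + 55 * x + 7 * x * x) % Q


if __name__ == "__main__":
    msg = b"attestation for slot 42"
    partials = {x: toy_sign(key_share(x), msg) for x in (1, 3, 4)}
    print(aggregate(partials) == toy_sign(1234, msg))
```

Any three of the four shares yield the identical aggregate, which is also why determinism matters: every quorum arrives at the same signature bytes.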
Secure Multi-Party Computation (MPC)
MPC allows for multiple participants to compute a function of their inputs in a privacy-preserving way. Applying MPC to secret sharing allows for secret shares to be distributed amongst participants, to perform decentralized computation of these inputs and generate the secret-shared output without reconstructing the secret on a single device.
MPC will be primarily used to enable distributed network instances and/or operators to securely coordinate the key generation and reconstruction ceremony in a decentralized capacity on separate machines. MPC also allows for the computation of the proof of custody required in later phases of Eth2.
Consensus (Istanbul BFT)
The final coordination mechanism required to enable a DVT configuration is a local consensus algorithm, used amongst Beacon Nodes, that utilizes threshold signatures. The consensus algorithm provides fault tolerance between the different beacon nodes. A leading candidate for this is Istanbul BFT (IBFT), a deterministic, leader-based consensus algorithm that can tolerate faults in up to ⅓ of the nodes (more precisely, f faulty nodes out of 3f + 1).
In a threshold cryptosystem, information is protected by distributing the private key among a cluster of fault-tolerant computers: anyone can use the public key, but no single party holds the corresponding private key. As a result, to sign a message a predetermined threshold of participants must cooperate in the signature protocol (such as 3 of 4, 5 of 7, or 7 of 10).
Utilizing a threshold signature scheme allows a private key to be split into a specified number of shares amongst a group of network operators, with a corresponding threshold for how many share signatures are required to command a validator (i.e. to sign and send messages to the beacon chain). In this context, no individual network operator has unilateral control over an Eth2 validator. This creates a trustless signature process and removes the single point of failure that exists in the industry today.
Istanbul Byzantine Fault Tolerance
IBFT uses a group of network operators to determine whether a proposed block is suitable for addition to the chain. One beacon node of the DVT configuration is selected as the leader and is responsible for proposing the block and sharing it with the other beacon nodes. If a supermajority (>66%) of the nodes deem the block to be valid, it is added to the blockchain.
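The supermajority rule can be written down as a small helper. The sketch below assumes the commonly used BFT sizing, where a cluster of n nodes tolerates floor((n - 1) / 3) faults and needs ceil(2n / 3) matching votes. Whether a given DVT implementation uses exactly this quorum formula is an assumption, and the function names are hypothetical.

```python
def max_faulty(n: int) -> int:
    # Largest f such that n >= 3f + 1.
    return (n - 1) // 3


def quorum(n: int) -> int:
    # ceil(2n / 3) using integer arithmetic.
    return (2 * n + 2) // 3


def has_supermajority(votes: int, n: int) -> bool:
    return votes >= quorum(n)


if __name__ == "__main__":
    for n in (4, 7, 10):
        print(f"{n} nodes: tolerates {max_faulty(n)} faulty, needs {quorum(n)} votes")
```

For clusters of 4, 7, and 10 nodes this gives quorums of 3, 5, and 7, which lines up with threshold configurations such as 3 of 4, 5 of 7, and 7 of 10.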
In the event that the chosen leader goes offline during the consensus process, IBFT will undergo a fast leader change and choose another DVT node to take on that role. In Eth2, slot times are 12 seconds, and to be on par with current Eth2 clients the DVT should be able to receive a new block at the start of a slot and produce an attestation to it within 4 seconds. However, aggregators on Eth2 wait 8 seconds to collect attestations, which sets a ceiling on how long the selection of an additional leader may take before the DVT receives an inclusion delay of 1. IBFT is able to come to finality in 3 rounds, which requires each round of IBFT to be completed in under 1 second, assuming we want to protect against the potential of one leader falling out.
Infrastructure Providers — Enabling Active-active Redundancy Configurations
Infrastructure providers that currently serve retail and/or institutional staking needs tend to offer cloud provider and region-based redundancy, as well as multi-client support for an added layer of resiliency against faults caused by client bugs.
As previously discussed, redundancy is, in most cases, achieved through active-passive cluster configurations that consist of multiple redundant VC and/or BN nodes. However, as the name “active-passive” implies, not all nodes in this configuration are active. In such configurations the passive (i.e. failover) nodes serve as a backup that is ready to take over as soon as the active (a.k.a. primary) node experiences disruptions. Creating this type of configuration to support multi-client redundancy in a single deployment is difficult given current technical restrictions.
DVT configurations allow for active-active cluster redundancy across all layers of the infrastructure’s sub-components including active redundancy across different validator clients enabled through the DVT API. The main purpose of an active-active cluster is to achieve minimal service disruptions and enhanced resiliency against various types of node failures.
Furthermore, it creates a potential pathway to enabling more dynamic deployments that can be configured to operate across different provider and/or in-house setups depending on customer diversification needs and/or risk tolerance.
Decentralized Staking Pools — Preventing Single Operator Failures
Today, the most prominent implementation for Eth2 staking pools relies on a single operator architecture, in which a single operator with a single validator key operates a given staking pool, which is normally 32 ETH. It is common for these networks to have programmatic limits on the number of pools any one network operator can run and a shared withdrawal key that utilizes threshold signing.
When users deposit into the staking pool, Ether accumulates in a smart contract on the ETH1 chain and is then matched to a whitelist of validator addresses round-robin style in chunks (for example, Lido does 32 ETH chunks). If there are 4 different network operators in a pool network, the distribution algorithm can distribute stake first to the operator who has the least allocated to their node, subject to the network operator pool limit (described above).
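The distribution logic described above can be sketched as follows. This is an illustrative model only: the function name, the tie-breaking rule, and the example figures are assumptions, and it does not represent the contract logic of any specific pool.

```python
CHUNK_ETH = 32  # one validator's worth of stake


def allocate(pending_eth, allocated, limit_per_operator):
    """Assign 32 ETH chunks to the operator with the least stake, respecting a per-operator cap.

    allocated: mapping of operator name -> ETH already assigned.
    Returns the updated mapping and the ETH left waiting in the contract.
    """
    allocated = dict(allocated)  # do not mutate the caller's mapping
    while pending_eth >= CHUNK_ETH:
        eligible = [op for op, amount in allocated.items()
                    if amount + CHUNK_ETH <= limit_per_operator]
        if not eligible:
            break  # every operator is at its limit
        # Least-allocated operator first; ties are broken by name (assumption).
        target = min(eligible, key=lambda op: (allocated[op], op))
        allocated[target] += CHUNK_ETH
        pending_eth -= CHUNK_ETH
    return allocated, pending_eth


if __name__ == "__main__":
    current = {"A": 64, "B": 0, "C": 32, "D": 32}
    result, leftover = allocate(100, current, limit_per_operator=96)
    print(result, leftover)
```

Note that each 32 ETH chunk still lands with exactly one operator, which is the single operator risk the next paragraph discusses.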
For users depositing less than 32ETH there is a high probability that their stake will be supported by a single operator and therefore introduce the risk of single operator failure. In addition to single operator failure, all pool participants have an increased probability of experiencing correlated downtime or slashing across the different pools, in the event that a >⅓ proportion of pool operators have similar set-ups and client configurations.
Existing Eth2 staking pools can utilize a DVT configuration to add provider diversification compared to existing single operator setups, further protecting depositors against accidental slashing and downtime.
It is important to note that the migration to this type of configuration for existing pools prior to a 2-way Eth2 with existing deposits is not possible. Given the current roadmap of the Beacon Chain, existing staking pools who aspire to be decentralized or trustless will need to go through this product evolution.
At-Home Validators — Improving Infrastructure Resiliency
At-home validators that operate their own infrastructure without the support of a dedicated provider often lack the means or technical capabilities to implement multi-level redundancy and additional security measures in their existing configurations.
The DVT middleware allows at-home validators to distribute their validator signing power across a distributed set of active-active redundant nodes, dramatically decreasing the risk of validator failure and associated downtime penalties.
Set-up for at-home validators could take place through an open-source, DappNode-like package that would simplify the set-up process and come with a maturing community to turn to if users run into difficulties.