DVT landscape part 1

MarkoInEther
14 min read · Dec 12, 2022

Separating ETH stakes into juicy cuts…

DVT stands for Distributed Validator Technology. It lets you split your Ethereum validator into several parts and distribute those to different operators, who then jointly run it. This makes your validator decentralised & resilient: it does not go offline or get slashed when a minority of operators misbehaves.

For a more in-depth read on DVT, check out What is DVT and why do we need it or see its implications in The future of DVT…🚀.

Current Landscape

At the time of writing, there are two industry leaders developing DVT solutions, ssv.network and obol.tech. Both are currently on testnet with plans of mainnet launch in 2023.
In this article, I will compare and contrast these solutions, highlighting their similarities & differences. Recently, the founders of both teams put out very good articles explaining their respective design decisions: Obol’s Designing Non-Correlation: Deep Dive Into DVT and Charon’s Architecture and SSV’s Designing SSV: A Path to distributed staking infra for Ethereum.

Acknowledgements

Great thanks to the founders, Alon from SSV and Oisin from Obol for their valuable inputs & feedback regarding architecture & design of their solutions.
For general feedback my thanks go to Isaac and Jorge from Nethermind, Ariel from Blox and last but certainly not least to Robert for stylistic and grammar corrections.

Disclosure: I am currently working as a contributor for SSV DAO. I have strived to represent both solutions to the best of my ability but am cognizant of my inherent bias.

Table of Contents

· Architecture
Overview
Approaches
· Networking
Obol
SSV
To summarise
· Software Stack
Consensus
Client software
SSV
Obol
· Key management
SSV
Obol
· Risks, Edge cases, and how to avoid them
Loss of validation key
Leakage of validation key
· Tokenomics
· To Conclude

Terminology

  • Validator: A node in the Ethereum network that stakes 32 ETH and participates in the network consensus, thus earning staking rewards.
  • Staker: A user renting validation services from operators. Usually the owner of the Ethereum validators.
  • Operator: An entity running a validation service, i.e. running Ethereum validators on the staker’s behalf.
  • Cluster: A group of operators running one or more validators together.
  • Validation key: The key used for signing validator duties.
  • Partial key: A share of the validation key used by an operator to sign validator duties. A threshold of the resulting partial signatures must be aggregated to reconstruct a valid signature.

Architecture

Both SSV and Obol are under heavy development, and the current state of both networks is not final. Thus, depending on when you are reading this, the following info may no longer be accurate. I will update the blog on a periodic basis to reflect reality.

Overview

In Obol, operators run validators in isolated clusters using Obol’s middleware client combined with a regular validator client. So that operators can coordinate around setting up a new cluster, Obol developed a dedicated website called the DV Launchpad. Unlike SSV, Obol does not have its own token. To facilitate payments, Obol is developing a set of smart contracts that will allow rewards to be distributed directly from withdrawals once these are enabled. Until then, operators and stakers must deploy custom smart contracts to handle payments or handle them off-chain.

SSV network consists of a set of on-chain smart contracts, the native SSV token, and SSV validator clients connected via a permissionless public network. Smart contracts handle the discovery, coordination, and transfers between stakers and operators. The SSV token is used for payments. Operators run SSV validator clients that connect together via a P2P pub/sub network. They dynamically form clusters based on stakers’ requirements expressed via smart contracts and perform validator duties on their behalf. The payments for their services are done in SSV tokens and are facilitated by the SSV smart contract.

Approaches

Seemingly the same goal lends itself to two quite different approaches in architecture and design decisions regarding both tech stacks. There is much that SSV and Obol have in common, but I will focus on the differences between them for greater clarity. It is illuminating to look at their websites.
SSV’s punchline is “Distributed Validator Infrastructure for Developers” and Obol’s is “Distributed Validators for Ethereum”. Attentive readers will notice that the main difference is SSV’s focus on “Infrastructure for Developers”.
To achieve this goal of becoming a decentralised infrastructure for staking applications, SSV decouples staking into capital allocation (stakers provide ETH) and validation services (operators perform validation duties).

Obol, on the other hand, puts heavy emphasis on non-correlation, making different clusters completely isolated from each other. This approach seems to be suited for home stakers and SaaS operators, who can now decentralise their infrastructure.

Let’s look at their engineering focus and design decisions:

SSV consists of a public operator network to which anyone can outsource their validation duties by interacting with the SSV smart contract. Basically, you control your validators but let the network run them for you.

The upside of SSV having a public network is that stakers can provide their ETH, and operators can offer their services, without any prior communication & coordination. This approach offers considerable flexibility both for operators, who do not need to coordinate to form clusters, and for stakers, who do not need to communicate with operators or pick predefined clusters. They can pick any operators, and the cluster is created automatically.
This allows for smart contract-only staking, making it usable for DAOs, dApps, and other automated solutions.

Obol does not have a public network but rather uses isolated operator clusters because of its heavy emphasis on non-correlation. This has the benefit of avoiding any network-wide issues and should require less bandwidth, since operators only communicate with the nodes in their own cluster and neither receive nor broadcast third-party messages.

Networking

Obol

In Obol, the validator clusters are isolated from each other and do not pass any messages to other clusters. This means there is no danger of a network split since there is no network, only isolated clusters.

This much simpler networking topology of isolated clusters is easier to reason about and should require less bandwidth. The drawback of this approach is less redundancy and resiliency. If an operator temporarily loses direct connection to his cluster peers, he will not be able to get the undecided duties and sign them, since no other operator outside the cluster knows the current state.

This is in contrast to SSV, where an operator can be served current undecided duties by any other operator in his subnetwork (subscribed to the same topic). In Obol, operators are not expected to run a large number of clusters since users will be mainly choosing between predefined ones. However, if this turns out not to be the case and an operator needs to run a large number of different clusters, they will have to maintain a large number of open P2P connections, which may become unwieldy.

Also, the tradeoff for having all clusters isolated is that operators need to coordinate with both other operators and stakers before running one. It is also not possible for stakers to choose specific operators without their explicit agreement to create a new cluster or to change current cluster operators. However, changing operators is on the roadmap so stay tuned.

SSV

For all payments, key distribution, and operator management SSV uses a set of smart contracts. This leverages the immutable and uncensorable properties of Ethereum, ensuring that validator data needed by operators is impossible to censor and that there are no communication bottlenecks and single points of failure in the system. Operators need to listen to these contracts which provide them with all the necessary data (partial keys, validator activation status, cluster changes) to run validators while streaming payments for their services.

When an operator is instructed by the smart contract to run a new validator, they download all the data from the chain and coordinate with other operators via a public network that all operators are a part of. This operator network uses a pub-sub protocol on top of P2P. Peer-to-peer communication between the nodes is encrypted and signed, making it easily verifiable and attributable. Pub-sub stands for publish-subscribe networking: operators publish new messages and subscribe only to the relevant topics (subnets) they want to receive messages from. To be specific, every time a new validator is added to the network, it is deterministically assigned to one of 128 consensus subnets based on its public key. Thus, an operator subscribes to a subnet only if it operates a validator that is assigned to it. In the previous version of SSV, there was a special subnet for each operator. This old approach was problematic, since for each new validator a new subnet had to be created, producing a network with many small subnets and unreliable message propagation across the network.
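To make the subnet assignment more concrete, here is a minimal Python sketch. The article only states that the assignment is deterministic, based on the validator's public key, and targets one of 128 subnets. The specific mapping used here (SHA-256 of the key, reduced modulo 128) and the function names are my own assumptions for illustration, not the mapping SSV actually implements.

```python
import hashlib

SUBNET_COUNT = 128  # number of consensus subnets mentioned in the article


def subnet_for_validator(pubkey_hex):
    """Map a validator public key to a subnet index in [0, SUBNET_COUNT).

    ASSUMPTION: hashing with SHA-256 and reducing modulo the subnet count
    is an illustrative choice, not the real SSV mapping.
    """
    if pubkey_hex.startswith("0x"):
        pubkey_hex = pubkey_hex[2:]
    digest = hashlib.sha256(bytes.fromhex(pubkey_hex)).digest()
    return int.from_bytes(digest, "big") % SUBNET_COUNT


def subnets_for_operator(validator_pubkeys):
    """Return the set of subnets an operator would subscribe to.

    An operator only joins the subnets of the validators it runs.
    """
    return {subnet_for_validator(pk) for pk in validator_pubkeys}
```

Because the mapping depends only on the public key, every operator of a given validator computes the same subnet without any extra coordination.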

In the V2 design, all messages are propagated by all subnet participants, making the network more robust in two crucial ways: (1) it does not require a direct P2P connection amongst cluster nodes, and (2) it makes DoS attacks harder, since there is no direct link between a message and its sender.

There is also one special “decided subnet” that all operators subscribe to. It contains all fully signed messages (consensus subnets contain partially signed ones) and allows for more robustness and faster syncing times.

This more mesh-like design allows for greater flexibility. Stakers can choose and change operators on-the-fly, without any coordination. Operators can join the network and start offering their services without any prior coordination with stakers or themselves. Stakers are not forced to predefined clusters but can choose any operators they wish.

Another important benefit of having a public network is the observability we know and love from blockchains. Dashboards, analytic tools, and other valuable resources can be built using this data, allowing for easy monitoring and rating of all the operators. This may seem like a small thing, but it can bring huge benefits down the line for stakers and the whole ecosystem. It can allow for high-trust anon operators, where the trust is based on their performance alone. This is something we are missing dearly in the current staking landscape, where “trust me, we have a good reputation” is the default modus operandi.

Having a public network also allows for dynamic cluster creation, removing the centralization bottleneck and allowing the network to adapt to operators being compromised, going rogue, or being coerced by powerful actors. You can easily switch operators fully automatically without any need for coordination.

The potential drawback of this solution also comes from this greater flexibility. In a public mesh-like network, it is possible to have network-wide issues, such as network splits, or network-wide attacks, such as an amplification attack affecting the whole network.
The bandwidth load will also be higher due to network coordination messages and the greater number of messages passed.

To summarise

Both SSV and Obol are bringing robustness and fault tolerance to Ethereum staking. Obol heavily focuses on non-correlation and separation of concerns, both on the network and software levels. SSV focuses on creating a permissionless and distributed staking network for outsourcing staking duties and creating an open marketplace for staking services.

Software Stack

Consensus

Both SSV and Obol use the same QBFT mechanism with committees of 3f+1 total operators. This algorithm tolerates up to f of the operators being Byzantine without affecting consensus. In practice, it makes for clusters of 4 (1 Byzantine), 7 (2 Byzantine), 10 (3 Byzantine), etc. This consensus mechanism has the restriction of requiring a leader for each voting round. Thus, if the leader is not available, there will be a small time lag in performing validator duties. This mechanism is not developed by the DVT teams, and there is already work being done on a new leaderless mechanism, which is on the roadmap to be implemented into SSV once ready.
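The cluster sizes above follow directly from the 3f+1 rule. The short Python sketch below reproduces them; the function names are hypothetical, and the quorum of 2f+1 is the standard BFT quorum for a committee of 3f+1, which matches the threshold of three for a four-operator cluster used later in this article.

```python
def cluster_size(f):
    """Total operators needed to tolerate f Byzantine operators (n = 3f + 1)."""
    return 3 * f + 1


def quorum(f):
    """Matching messages needed to decide: the standard 2f + 1 BFT quorum."""
    return 2 * f + 1


def max_faulty(n):
    """Largest number of Byzantine operators a cluster of n can tolerate."""
    return (n - 1) // 3


for f in range(1, 4):
    n = cluster_size(f)
    print(f"f={f}: cluster of {n}, quorum {quorum(f)}, tolerates {max_faulty(n)}")
```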

Client software

SSV

SSV uses a custom validator client built with the go-eth2 client library. This gives it better flexibility, since the validator client can be adjusted based on the network’s needs and does not need to sync with another client software to perform validator duties. Besides performing regular validator duties, it handles changes in validator clusters, adding and removing operators.
To secure the validator keys, it will use a remote signer architecture, allowing for complete separation between validation duties and signing, and enabling third-party signers such as Web3Signer.

The benefit of this solution compared to middleware is that there is no additional software that needs to be run to coordinate with the validator node, hence less overhead and potentially better performance.

Obol

Obol runs as a middleware, meaning it is a separate software run in parallel with a standard validator client. This has the benefit of separating concerns of message aggregation and signing.
Thus, even if buggy, the Obol middleware cannot cause a slashing event on its own, since it does not hold the signing key. Slashing can still take place, but only if both the validator client and the middleware are buggy.
Another benefit is less development overhead, leaving the maintenance of the signing duties and slashing database to the teams building the validator clients.
Their client is currently compatible with Teku and Lighthouse validators. Also, there is currently an issue regarding signature aggregation timing that should be fixed with this PR.

Key management

SSV

SSV currently handles key management by letting the staker split the validation key into parts. These are encrypted to the public keys of the chosen operators and submitted on-chain by calling registerValidator() on the SSV smart contract.
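To give an intuition for what splitting a key into parts with a threshold means, here is a toy Python sketch of Shamir secret sharing. It is an illustration under stated assumptions, not SSV's implementation: the field modulus (the Mersenne prime 2^127 - 1), the function names, and the integer secret are all chosen for readability. Real validator key shares live in the BLS12-381 scalar field, and operators never recombine the key itself; they aggregate partial signatures instead. Encryption to operator public keys is omitted.

```python
import secrets

# ASSUMPTION: toy field modulus (a Mersenne prime), not the BLS12-381 field.
PRIME = 2**127 - 1


def split_secret(secret, threshold, n_shares):
    """Split `secret` into n_shares points; any `threshold` of them recover it."""
    if not 0 < threshold <= n_shares:
        raise ValueError("threshold must be between 1 and n_shares")
    if not 0 <= secret < PRIME:
        raise ValueError("secret must be a field element")
    # Random polynomial of degree threshold-1 with the secret as constant term.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation at x
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def reconstruct(shares):
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


# Four operators, any three of which suffice (the 3-of-4 setup in this article).
shares = split_secret(123456789, threshold=3, n_shares=4)
print(reconstruct(shares[:3]) == 123456789)
```

With fewer than three shares the polynomial is underdetermined, which is why losing or leaking a minority of partial keys neither kills nor compromises the validator.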

This offers large robustness benefits for the network and its operators since all validator data can be fully reconstructed from the blockchain alone. Thus, in a doom’s day scenario, even if the operator loses all its data and all the validator keys, they can reconstruct everything just from the on-chain data, requiring them to only safely backup and hold their one operator key.

This makes the life of an operator much easier and the key management much safer since they don’t need to back up every new validator partial key and send its copy over the internet to a different secure location for safekeeping.

SSV is also working on implementing zero-coordination Distributed key generation (DKG). This will allow the validator key to be created & managed directly by the operators and will remove the requirement for the staker to perform the key splitting and custody. This DKG will be fully automagic and will not require any manual coordination from either stakers or operators.

This unlocks new use cases, such as completely trustless liquid staking derivatives or smart-contract-only staking services essential for DAOs and other purely on-chain entities.

Obol

Obol runs its clusters only under the DKG paradigm. Their setup currently requires manual discovery and coordination of all parties running the cluster and a higher standard for running an operator node since neither the staker holds the key nor is it available on the blockchain.

DKG comes with the huge benefit of not requiring any party to have control of the validation keys, but also with a larger responsibility in terms of key management. In Obol’s design, neither does the staker hold the keys nor are they available on-chain, which puts the full responsibility for storing and backing up validator keys on the operators. This should not pose any problems under normal circumstances but makes the system more reliant on trust from a security perspective.

Risks, Edge cases, and how to avoid them

Loss of validation key

In Obol, if 2 partial keys were to be lost at the same time (n=4) due to a flooded data centre or whatever else and there is no backup, the validators affected are effectively dead, since the key is unrecoverable and there is no one else holding it.

There is currently no way to fully mitigate this risk other than choosing your operator cluster very wisely and requiring operators to hold keys in multiple locations, have strong backups and recovery processes in place. This risk would be eliminated if & when a proposal to trigger voluntary exit via smart contract is implemented (0x03 withdrawal credentials or similar). This will allow the end user to exit his validator directly from the execution layer.

In SSV this is not an issue, since the full validator key is both held by the owner of the validator and can be retrieved by the operators from the blockchain. Thus, the loss of 2 or even all the key parts is not an issue in SSV. If you use DKG in SSV, the full key will no longer be held by the staker, but the encrypted key shares will still be retrievable from the blockchain. Thus, operators can lose all of their validator keys with no effect; they only need to take good care of their operator key.

Leakage of validation key

To put things into context: today no staking service uses DVT, so each one holds the full validation key, and if that key gets compromised, it could result in bribery attacks or slashing. With DVT, the attacker always needs to get his hands on a threshold of partial keys, which makes the event much less likely. To illustrate the point, if we think that there is a 1/1000 chance that a key gets compromised in the next 5 years, the chance of compromising the lowest possible threshold (3) is 1/1000³ = 1/10⁹, a million times less likely than the status quo and a very, very small number.
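The arithmetic behind this estimate can be checked in a few lines of Python. The sketch below assumes, as the illustration does, that each key is compromised independently with probability 1/1000; the function names are hypothetical. It also shows a slightly more careful figure, the chance that at least 3 of the 4 keys in a cluster leak, which is about four times larger but still of the same order of magnitude.

```python
from fractions import Fraction
from math import comb


def p_specific_keys(k, p):
    """Probability that k specific, independent keys are all compromised."""
    return p**k


def p_at_least(k, n, p):
    """Probability that at least k of n independent keys are compromised."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = Fraction(1, 1000)  # assumed per-key compromise probability over 5 years

print(p_specific_keys(3, p))      # three given keys all leak
print(p_at_least(3, 4, p))        # any three (or all four) of four keys leak
print(p / p_specific_keys(3, p))  # improvement factor over a single full key
```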

Granted, this presumes that key leakage events are uncorrelated, which may not be the case. A zero-day code vulnerability in any part of the software stack could affect all nodes running it. To prevent this from happening, both teams push towards client diversity by making their solutions compatible with different execution clients, beacon and validator nodes.

Both focus on separating private key management from other parts of the stack. Obol does this by not keeping the keys in their middleware and letting the validator client handle them and SSV by using a remote signer, a special software enclave that handles signing duties and already has multiple implementations.

Putting encrypted key shares on-chain brings a huge amount of robustness to the SSV system, as stated in the previous section, but there is a tradeoff in the form of the attack vector described below. Given the low probability of this attack succeeding, the tradeoff seems a reasonable one.

There is an attack vector where an attacker would try to get a hold of the old operator’s private keys. This may be easier to do for inactive operators since they do not have much incentive to secure their old keys at that point. If an attacker were successful they could decrypt all the partial keys held by this operator.

Still, this would need to be done for at least three operators to have a chance of success, and they would have had to run clusters together in the past in order for the attacker to reconstruct the full key.

Even if an attacker manages to do so, this validator would still need to be active at the time of the attack, which is most probably not the case since we are talking about old inactive operator keys that are no longer being used.

However, the prudent thing for a staker to do is to pick at least two highly reputable operators that have a good track record and can be expected to have good security practices and operate long into the future.

This edge case could be avoided altogether by being mindful of this fact and exiting the validator whenever a staker is forced to replace at least a threshold of its operators (currently three) due to their inactivity.

A more realistic scenario stakers should be mindful of is changing all operators in a cluster at once, as this effectively puts the key into the hands of both the old and the new cluster. To prevent this problem altogether, always change strictly fewer operators than the consensus threshold (currently three). Replacing one or two operators is safe and should be more than sufficient to switch a non-performing operator. If more than two need to be switched, it would be advisable to exit the validator and re-enter with a new one.
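The rule of thumb above is simple enough to express as a check. The following Python sketch is hypothetical (neither protocol exposes such a function as far as this article describes) and simply encodes the advice: a change of operators is considered safe only if the number of operators being removed is strictly smaller than the consensus threshold.

```python
def is_safe_replacement(old_operators, new_operators, threshold=3):
    """Return True if fewer than `threshold` operators are being replaced.

    The departing operators still know their old partial keys, so keeping
    their number below the threshold means they cannot sign on their own.
    """
    replaced = set(old_operators) - set(new_operators)
    return len(replaced) < threshold


cluster = {"op1", "op2", "op3", "op4"}

# Swapping two operators keeps the departing group below the threshold.
print(is_safe_replacement(cluster, {"op1", "op2", "op5", "op6"}))

# Swapping three or more does not: exit and re-enter with a new validator.
print(is_safe_replacement(cluster, {"op1", "op5", "op6", "op7"}))
```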

It is important to keep things in perspective and not go overboard when thinking about the edge cases, since both of these solutions offer an orders-of-magnitude improvement over the status quo.

Tokenomics

At the time of writing, Obol has no token, hence this part will cover only SSV. Also, NOTE that SSV tokenomics is currently in active development and will most probably undergo important changes.

The SSV token serves three main functions, payments for validation services, a network fee, and governance:

  1. Facilitating payments from stakers to operators for validation services. Since withdrawals are not yet enabled, it is currently not possible to pay the operator from the staking rewards directly.
  2. Network fee. This is a part of the validation service fee paid by the stakers. It goes directly to the DAO treasury and is used to fund the protocol development.
  3. Governance token. It serves as a standard governance token, with similar mechanics as can be seen in many other protocols.

For more information, check SSV tokenomics documentation.

To Conclude

To conclude, I would like to stress that both of these solutions are an orders-of-magnitude improvement over the status quo. I hope I have been able to illuminate the main differences in this article.

In Part 2, I will focus on more practical questions, such as how you can currently use each of the protocols, what their current state of development is, and when to expect mainnet.

MarkoInEther

Passionate about pushing my limits, self experimenting, bio-hacks, complex and distributed systems, mechanism design.