Eth2 Testnet Updates
More than double the validators expected at mainnet launch already running in the testnet
At the time of writing, our eth2 testnet for the Prysm client has more than twice the number of genesis validators there will be at the launch of the eth2 mainnet. That is, Prysm comfortably handles more than enough validators to launch the system without hiccups. We now have 35,061 active validators and 1,462 more pending activation into the blockchain.
Awesome tools such as https://eth2stats.io, created by the Alethio team, have given users insight into how their nodes are faring compared to others, and the fantastic block explorers https://beaconcha.in and https://beacon.etherscan.io have added even more useful features for seeing which validator your eth1 deposit information corresponds to.
First voluntary exit included on the blockchain
One of the key features of eth2 phase 0 is that validators will be able to voluntarily stop their duties and withdraw their balance onto a shard for transferability and usage down the line. This “exit” operation is generated by a validator and then processed by the chain, eventually being immutably stored in a block. We have finished implementing the entire voluntary exits workflow in Prysm, and the first ever exit occurred in our testnet in block 124256. The validator with index `0` in the validator registry exited the chain and was able to withdraw its validating ETH approximately 2 days after the exit was completed.
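At a high level, the exit object is tiny: just an epoch and a validator index, wrapped with a BLS signature. The sketch below shows the shape of the data and the withdrawal-delay arithmetic in Go; the type and constant names mirror the phase 0 spec, not Prysm’s actual generated protobuf types.

```go
package main

import "fmt"

// VoluntaryExit mirrors the eth2 phase 0 spec object: just an epoch and a
// validator index (field names follow the spec, not Prysm's protobuf types).
type VoluntaryExit struct {
	Epoch          uint64 // earliest epoch at which the exit can be processed
	ValidatorIndex uint64
}

// SignedVoluntaryExit wraps the exit with the validator's BLS signature.
type SignedVoluntaryExit struct {
	Exit      VoluntaryExit
	Signature [96]byte // BLS12-381 signature bytes (left zeroed in this sketch)
}

// withdrawableEpoch computes when an exited balance becomes withdrawable:
// the exit epoch plus MIN_VALIDATOR_WITHDRAWABILITY_DELAY (256 epochs in
// the mainnet config).
func withdrawableEpoch(exitEpoch uint64) uint64 {
	const minValidatorWithdrawabilityDelay = 256
	return exitEpoch + minValidatorWithdrawabilityDelay
}

func main() {
	exit := SignedVoluntaryExit{Exit: VoluntaryExit{Epoch: 100, ValidatorIndex: 0}}
	fmt.Println("validator", exit.Exit.ValidatorIndex,
		"withdrawable at epoch", withdrawableEpoch(exit.Exit.Epoch))
}
```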
We consider this an important milestone because it gets us closer and closer to the real thing, allowing users to experience the full extent of the features available in phase 0 of eth2.
Massive improvements on sync
From 0.3 blocks per second to more than 20 blocks per second
The moment we launched our mainnet-capable testnet, we weren’t certain how syncing would perform, expecting users to surface a number of critical bottlenecks. Sure enough, we went from 0.3 blocks per second during the first week all the way to more than 20 blocks per second on our consumer laptops after a few recent improvements. There is still a long way to go, but we’re confident we’ll eventually reach the point where the bottlenecks feel non-existent.
After radically improving our caching, we created background workers that offload repetitive computation into a single goroutine, allowing us to further scale our validator count and survive difficult scenarios such as a high volume of skipped slots. A big thing holding us back from 100 blocks per second or more is the lack of immutable, native types in Go, which forces us to copy huge amounts of data for some of the most mundane operations. Several of our users were even seeing 6 GB+ of memory usage in their beacon node, making it completely unrealistic for the typical staker. Needless to say, our top priority as we become more production ready is to ensure we are competitive with user expectations for running a node in eth2, and we’ll use the awesome user feedback we’ve received so far to reach that goal.
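The “single background worker” pattern mentioned above is plain Go: serialize repetitive, cacheable work behind one goroutine and a channel so concurrent callers never duplicate it. A minimal, self-contained sketch (the squaring stand-in and all names are ours, not Prysm’s code):

```go
package main

import "fmt"

// request asks the worker to compute something expensive for a given input
// and carries a channel for the reply.
type request struct {
	input int
	reply chan int
}

// startWorker launches a single goroutine that serializes an expensive,
// repetitive computation (here just squaring) behind a channel, so callers
// never duplicate the work concurrently and results can be memoized safely
// without locks.
func startWorker() chan<- request {
	reqs := make(chan request)
	go func() {
		cache := make(map[int]int) // memoize results across requests
		for r := range reqs {
			v, ok := cache[r.input]
			if !ok {
				v = r.input * r.input // stand-in for a costly computation
				cache[r.input] = v
			}
			r.reply <- v
		}
	}()
	return reqs
}

func main() {
	w := startWorker()
	reply := make(chan int)
	w <- request{input: 7, reply: reply}
	fmt.Println(<-reply) // 49
}
```

Because only one goroutine ever touches the cache, the map needs no mutex, and repeated requests for the same input are answered without recomputation.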
Merged Code, Pull Requests, Issues
New, Super-Optimized Fork Choice Rule
We have adopted Protolambda’s optimized fork choice algorithm, called proto-array. Previously, we used the naive spec implementation plus a few minor optimization tweaks, but it was simply not enough to sustain 30k validators: we saw CPU utilization go as high as 140%.
As we looked for a new fork choice optimization, the proto-array version caught our eye because it was simple and easy to reason about. With the help of Paul Hauner (Sigma Prime) and Proto himself, we were able to implement the new proto-array fork choice within a week. The end result was nothing short of impressive: CPU utilization went down to 40%, and determining the chain head is now just an O(1) lookup. The proto-array implementation has been merged to master and is gated behind the feature flag `--proto-array-forkchoice`. Please give it a try!
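For the curious, the core idea of proto-array is to keep blocks in an insertion-ordered array with parent links stored as indices, propagate vote weights to parents in one backward pass, and cache a best-descendant pointer per node so finding the head is a single array lookup. The following is a heavily simplified sketch of that idea, not Prysm’s implementation (which also handles justification checks, pruning, and incremental weight deltas):

```go
package main

import "fmt"

// node is one entry in the proto-array: blocks stored in insertion order,
// with parent links as indices into the same slice.
type node struct {
	parent         int // index of the parent node, -1 for the root
	weight         uint64
	bestChild      int // child with the highest subtree weight, -1 if none
	bestDescendant int // head of the best chain through this node
}

// applyScoreChanges walks the array backwards, adding each node's subtree
// weight to its parent and updating best-child/best-descendant pointers.
// Because children always appear after their parents in the array, one
// reverse pass suffices.
func applyScoreChanges(nodes []node) {
	for i := len(nodes) - 1; i > 0; i-- {
		p := nodes[i].parent
		nodes[p].weight += nodes[i].weight
		if nodes[p].bestChild == -1 || nodes[i].weight > nodes[nodes[p].bestChild].weight {
			nodes[p].bestChild = i
			nodes[p].bestDescendant = nodes[i].bestDescendant
		}
	}
}

// head returns the canonical head: follow the justified node's cached
// best-descendant pointer, a single O(1) array lookup.
func head(nodes []node, justified int) int {
	return nodes[justified].bestDescendant
}

func main() {
	// A tiny fork: 0 -> 1 -> 3 (weights 1 and 5) versus 0 -> 2 (weight 3).
	nodes := []node{
		{parent: -1, bestChild: -1, bestDescendant: 0},
		{parent: 0, weight: 1, bestChild: -1, bestDescendant: 1},
		{parent: 0, weight: 3, bestChild: -1, bestDescendant: 2},
		{parent: 1, weight: 5, bestChild: -1, bestDescendant: 3},
	}
	applyScoreChanges(nodes)
	fmt.Println(head(nodes, 0)) // the heavier branch through node 1 wins: 3
}
```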
End to end tests now include feature configurations + test initial sync
Prysm has been E2E testing every PR for a few weeks now, although the tests were limited mainly to checking finality and that the chain runs for a few epochs. Ivan has toughened the E2E tests to make sure validators are all acting properly and being rewarded, using our beacon chain API. They also now exercise any new feature flags (experimental features enabled through flags) that benefit from testing at runtime.
There is also a new E2E sync test that makes sure initial sync reaches the same head as the beacon nodes it’s connected to. All these changes ensure that our sync works as expected and that Prysm as a whole ships fewer bugs to production!
Validator slashing protection completed and running in Prysm
While slashings very rarely occur under normal runtime conditions, we have implemented simple local slashing protection in our validator client. It’s currently experimental, so enable it with the `--protect-proposers` and `--protect-attesters` flags. The validator client database will be saved under your configured `--datadir`.
With this protection, a validator will not sign a slashable message according to its local signing history. This won’t protect against incidents like running two validator clients concurrently for the same public key, but it certainly will protect against a faulty or malicious beacon node trying to request a slashable message from your client.
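As a rough illustration of what “not signing a slashable message based on local history” means for attestations: the client keeps the (source, target) epochs of everything it has signed and refuses to sign a double vote or a surround vote. A simplified sketch with hypothetical names (the real check runs against the validator client’s database):

```go
package main

import "fmt"

// attestationRecord is the minimal history a validator client keeps per key:
// the source and target epochs of every attestation it has signed.
type attestationRecord struct {
	source, target uint64
}

// isSlashableAttestation reports whether signing a new (source, target) pair
// would conflict with local history: a double vote (same target, presumably
// different data) or a surround vote in either direction. The names here are
// illustrative; Prysm's --protect-attesters flag enables an equivalent check.
func isSlashableAttestation(history []attestationRecord, source, target uint64) bool {
	for _, prev := range history {
		if prev.target == target { // double vote for the same target epoch
			return true
		}
		if source < prev.source && target > prev.target { // new surrounds prev
			return true
		}
		if prev.source < source && prev.target > target { // prev surrounds new
			return true
		}
	}
	return false
}

func main() {
	history := []attestationRecord{{source: 2, target: 3}}
	fmt.Println(isSlashableAttestation(history, 1, 4)) // true: would surround (2, 3)
	fmt.Println(isSlashableAttestation(history, 3, 4)) // false: safe to sign
}
```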
Using a new, custom state data structure for improving beacon node memory usage
The big focus for the team this week was creating a custom data structure for the beacon chain state that gives us better immutability, handles required copying, and speeds up state root computations. Previously, we used a giant protobuf struct that brought us little benefit, as it was fully mutable. Instead, we took advantage of Go’s features and created a data structure with unexported fields, allowing them to be accessed only via copying. That is, if you want to fetch state.FinalizedCheckpoint(), the getter returns a full copy of that information alone, without resorting to expensive generic copying from functions such as proto.Clone. We also refactored how the state is mutated so that computing the state root is far more efficient than before.
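The pattern itself is standard Go: unexported fields make the struct immutable from outside its package, and getters return copies of only the field requested. A toy sketch of the idea, with illustrative field and type names rather than Prysm’s actual state type:

```go
package main

import "fmt"

// Checkpoint is an (epoch, root) pair, as in the beacon state.
type Checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

// BeaconState keeps its fields unexported, so code outside the package
// cannot mutate them directly; all access goes through copying getters and
// explicit setters. (Illustrative sketch; the real state has many fields.)
type BeaconState struct {
	slot                uint64
	finalizedCheckpoint Checkpoint
}

// FinalizedCheckpoint returns a copy of just the finalized checkpoint,
// avoiding an expensive deep copy of the whole state (e.g. via proto.Clone).
func (b *BeaconState) FinalizedCheckpoint() Checkpoint {
	return b.finalizedCheckpoint // value type, so this is returned by copy
}

// SetFinalizedCheckpoint is the single mutation path; a real implementation
// would also invalidate any cached state root here.
func (b *BeaconState) SetFinalizedCheckpoint(c Checkpoint) {
	b.finalizedCheckpoint = c
}

func main() {
	st := &BeaconState{slot: 42}
	cp := st.FinalizedCheckpoint()
	cp.Epoch = 999 // mutating the copy...
	fmt.Println(st.FinalizedCheckpoint().Epoch) // ...leaves the state untouched: 0
}
```

Funneling every write through a setter is what makes it practical to track which parts of the state changed and recompute only the affected pieces of the state root.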
Unfortunately, moving from directly accessing data structure fields to a pattern of getters and setters was not easy at all. We had to refactor almost our entire beacon-chain folder in Prysm, leading to a week of headaches and difficult bugs. After that gargantuan effort, the feature is complete, and we are consistently seeing higher blocks per second during sync and much lower resource consumption than before. We believe this is a step in the right direction and will further optimize the immutability of this structure in future PRs.
Persisting eth1 data deposits to disk to prevent recomputing them on node restarts
In our previous testnets, whenever we started up our node we would request all deposits from the start of the deposit contract’s deployment. This was not an issue, as we usually had only around 700 deposits in our minimal testnet. However, after upgrading our testnet to match the mainnet version of the spec, the number of deposits to process at startup grew to nearly 25,000. This required an alternative approach: instead of re-requesting deposits from scratch each time the node started, we now save the deposits at checkpoints of every 100 logs. This lets us start up quickly and request only the eth1 blocks and deposit logs since our previous checkpoint, shortening node startup time with the mainnet spec by a great deal.
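The checkpointing reduces startup work from "all logs since contract deployment" to "logs since the last checkpoint". A minimal sketch of the bookkeeping, with hypothetical names (the real code persists checkpoints to the beacon node’s database):

```go
package main

import "fmt"

// checkpointInterval matches the cadence described above: persist progress
// every 100 deposit logs (the constant name is illustrative).
const checkpointInterval = 100

// depositCache tracks how many eth1 deposit logs have been processed and
// the last persisted checkpoint, so a restart can resume from lastSaved
// instead of re-requesting everything from log 0.
type depositCache struct {
	logsProcessed uint64
	lastSaved     uint64 // log index of the most recent on-disk checkpoint
}

// processLog handles one eth1 deposit log, persisting a checkpoint every
// checkpointInterval logs; save stands in for a database write.
func (c *depositCache) processLog(save func(uint64)) {
	c.logsProcessed++
	if c.logsProcessed%checkpointInterval == 0 {
		c.lastSaved = c.logsProcessed
		save(c.logsProcessed)
	}
}

func main() {
	c := &depositCache{}
	saves := 0
	for i := 0; i < 250; i++ {
		c.processLog(func(uint64) { saves++ })
	}
	// After 250 logs: checkpoints written at 100 and 200, resume point 200.
	fmt.Println(saves, c.lastSaved)
}
```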
The first, end-to-end slashing event triggered on an eth2 testnet
Slasher has been at work within Prysm for some time, and the client has detected quite a few slashing offenses in the old testnet. Running it on our mainnet-config testnet with over 20,000 validators didn’t break anything, but memory requirements did get rather high at ~16 GB. We still have several optimizations to add that can make it much lighter, but at this stage we are focusing on making it fast and accurate.
Our hash slinging slasher has already detected a few slashing offenses on our latest testnet, and we now need to include the detected events in a block. This milestone is exciting because it completes our slasher implementation’s life cycle, satisfying all the requirements to handle bad actors in real conditions. Preston has already built a modified validator that can act maliciously and try to break the network. Seeing it get slashed and kicked out is going to be exciting progress!
Handling an even larger number of validators and beacon nodes in the network
Now that we’ve stress tested handling the mainnet number of validators in Prysm as part of our testnet, we need to simulate a more “real world” environment with many beacon nodes distributed across the network. Our nodes currently have anywhere from 40 to over 110 peers, although the network mesh is pretty centralized, as our nodes are some of the first to be discovered after users dial our bootnodes. Instead, we want a more distributed yet nearly fully connected network that can maintain resilience. We might end up seeing far more forking scenarios than we expect, and we’re excited to see the power of the fork choice rule in Prysm, redesigned by Terence, handle them.
Interested in Contributing?
We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).
Official, Prysmatic Labs Ether Donation Address
Official, Prysmatic Labs ENS Name