Eth 2.0 Dev Update #44 — “More Optimizations”
Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.
🆕 New Teammate
We are happy to announce that, after receiving a very talented batch of applicants, we have hired a new full-time teammate at Prysmatic: Victor Farazdagi!
Victor previously earned his master’s degree from Georgia Tech, worked as a lead Golang engineer at Status, and has experience extending the go-ethereum client’s LES and Whisper protocols. We believe his skill set will raise the quality of our Prysm client and bring new perspectives to challenging problems in the eth2 research space. Welcome, Victor!
📝 Merged Code, Pull Requests, and Issues
BeaconState redesign: partial copy and shared references
Beacon state management has been the root of many of our performance bottlenecks since we configured the testnet to run at mainnet scale. As such, we have been working diligently to shave off kilobytes of redundantly allocated data to improve sync times, CPU load, and overall memory usage. Go, the programming language Prysm is written in, has no concept of fully immutable data structures. This means that two or more copies of an object may reference the same underlying data, so when you mutate what you believe is your own copy, you might also be mutating another copy in some unrelated routine! The simplest way around this is to deep copy the data structure, but that becomes expensive for the beacon state: it is relatively large, keeps growing over time, and is copied dozens of times across various routines. So we have designed a mechanism that uses a hybrid of deep copy, shallow copy, and copy-on-write strategies to share beacon states across multiple routines with minimal overhead.
This feature, along with the recent PR that fully rewrote beacon state management, has dramatically improved memory and CPU usage. Check out the full design doc and implementation PRs #4699 and #4785.
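To make the copy-on-write idea concrete, here is a minimal Go sketch for a single field. It is not Prysm’s BeaconState code: the `State` type, its lone `balances` slice, and the `sharedRef` reference counter are hypothetical simplifications. `Copy` is a cheap shallow copy that shares the slice, and a copy only pays for a deep copy the first time it writes while the data is still shared.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedRef counts how many State values point at the same balances slice.
// Hypothetical helper for illustration; not Prysm's type.
type sharedRef struct {
	mu   sync.Mutex
	refs uint
}

func (r *sharedRef) add() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs++
}

func (r *sharedRef) release() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.refs--
}

func (r *sharedRef) count() uint {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.refs
}

// State is a toy stand-in for a beacon state with a single large field.
type State struct {
	mu       sync.RWMutex
	balances []uint64
	ref      *sharedRef
}

func NewState(balances []uint64) *State {
	return &State{balances: balances, ref: &sharedRef{refs: 1}}
}

// Copy is cheap: the new State shares the balances slice (shallow copy)
// and only bumps the reference count.
func (s *State) Copy() *State {
	s.mu.RLock()
	defer s.mu.RUnlock()
	s.ref.add()
	return &State{balances: s.balances, ref: s.ref}
}

func (s *State) Balance(i int) uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.balances[i]
}

// SetBalance implements copy-on-write: if the slice is still shared, this
// State takes a private deep copy before mutating, so other copies never
// observe the write.
func (s *State) SetBalance(i int, v uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.ref.count() > 1 {
		private := make([]uint64, len(s.balances))
		copy(private, s.balances)
		s.ref.release()
		s.balances = private
		s.ref = &sharedRef{refs: 1}
	}
	s.balances[i] = v
}

func main() {
	original := NewState([]uint64{32, 32, 32})
	// The copy shares the slice; nothing is allocated for balances yet.
	cp := original.Copy()
	// The first write detaches cp with its own copy of the slice.
	cp.SetBalance(0, 31)
	fmt.Println(original.Balance(0), cp.Balance(0)) // 32 31
}
```

The trade-off is that every write path has to go through a method that checks the reference count, but reads and copies stay allocation-free until someone actually mutates shared data.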
Comprehensive storage improvements
Prysm has an experimental flag that greatly improves sync times by keeping more states in memory rather than on disk. One major bug we found was that we kept holding many of these states in memory after initial sync was complete. These states only serve a purpose during initial sync and often accounted for gigabytes of allocated RAM. This issue was easy to resolve once we discovered it.
We have also observed that the states cached in memory can sometimes grow large enough to exhaust the syncing node’s memory. This issue is tracked in issue 4813 and should be resolved soon. Once it is, we can enable this in-memory state management during initial sync for all nodes by default. For now, it is only enabled with the feature flag `--initial-sync-cache-state`.
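The two failure modes described above (a cache that outlives initial sync, and a cache that grows without bound) can be illustrated with a small Go sketch. This is not Prysm’s implementation or the fix for issue 4813: the `syncStateCache` type, its first-in-first-out eviction policy, and the capacity are hypothetical choices made for brevity, and the sketch assumes it is used from a single goroutine.

```go
package main

import "fmt"

// syncStateCache keeps recent states in memory during initial sync.
// Hypothetical type: states are keyed by a 32-byte root and stored as
// opaque serialized bytes. Not safe for concurrent use.
type syncStateCache struct {
	capacity int
	order    [][32]byte          // insertion order, oldest first
	states   map[[32]byte][]byte // root -> serialized state
}

func newSyncStateCache(capacity int) *syncStateCache {
	if capacity < 1 {
		capacity = 1
	}
	return &syncStateCache{capacity: capacity, states: make(map[[32]byte][]byte)}
}

// Put stores a state and evicts the oldest entry once the cache is full,
// so the cache cannot grow without bound and exhaust the node's memory.
func (c *syncStateCache) Put(root [32]byte, state []byte) {
	if _, ok := c.states[root]; ok {
		c.states[root] = state
		return
	}
	if len(c.order) >= c.capacity {
		oldest := c.order[0]
		c.order = c.order[1:]
		delete(c.states, oldest)
	}
	c.order = append(c.order, root)
	c.states[root] = state
}

func (c *syncStateCache) Get(root [32]byte) ([]byte, bool) {
	s, ok := c.states[root]
	return s, ok
}

func (c *syncStateCache) Len() int { return len(c.states) }

// Clear drops every cached state. Calling it when initial sync completes
// lets the garbage collector reclaim the memory instead of the node holding
// states it no longer needs.
func (c *syncStateCache) Clear() {
	c.order = nil
	c.states = make(map[[32]byte][]byte)
}

func main() {
	cache := newSyncStateCache(2)
	for i := byte(1); i <= 3; i++ {
		cache.Put([32]byte{i}, []byte{i})
	}
	_, ok := cache.Get([32]byte{1})
	fmt.Println(cache.Len(), ok) // 2 false: the oldest state was evicted
	cache.Clear()
	fmt.Println(cache.Len()) // 0
}
```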
Fork choice radical improvements
As mentioned in our previous update, switching from a naive fork choice implementation to the proto array fork choice implementation has significantly improved fork choice performance and lowered CPU utilization. Given these improvements, we have made proto array the default fork choice in the Prysm beacon node. See #4778 for further details. In the process, we also removed 4,000 lines of code. Hooray for code health!
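For readers unfamiliar with proto array, here is a simplified Go sketch of its core idea, not Prysm’s code. Blocks live in a flat slice where parents always precede children, so vote changes can be propagated to ancestors with one backwards pass, and the head becomes a single lookup of the justified block’s best descendant. The names are hypothetical, and the sketch deliberately omits justification/finalization viability checks, pruning, and the conversion of validator votes into per-node deltas.

```go
package main

import "fmt"

const none = -1

// node is one block in the proto array. Parents always sit at lower indices
// than their children because blocks are appended in arrival order.
type node struct {
	root           string
	parent         int
	weight         int64
	bestChild      int
	bestDescendant int
}

type protoArray struct {
	nodes   []*node
	indices map[string]int // block root -> position in nodes
}

func newProtoArray() *protoArray {
	return &protoArray{indices: make(map[string]int)}
}

// insert appends a block whose parent (if known) is already in the array.
func (p *protoArray) insert(root, parentRoot string) {
	parent := none
	if i, ok := p.indices[parentRoot]; ok {
		parent = i
	}
	p.indices[root] = len(p.nodes)
	p.nodes = append(p.nodes, &node{root: root, parent: parent, bestChild: none, bestDescendant: none})
}

// applyScoreChanges takes one vote delta per node. The first backwards pass
// adds each delta to its node and pushes it to the parent, so every ancestor
// accumulates its subtree's weight. The second pass runs once all weights
// are final and refreshes each parent's best child and best descendant.
func (p *protoArray) applyScoreChanges(deltas []int64) {
	for i := len(p.nodes) - 1; i >= 0; i-- {
		n := p.nodes[i]
		n.weight += deltas[i]
		if n.parent != none {
			deltas[n.parent] += deltas[i]
		}
	}
	for i := len(p.nodes) - 1; i >= 0; i-- {
		if parent := p.nodes[i].parent; parent != none {
			p.maybeUpdateBest(parent, i)
		}
	}
}

// maybeUpdateBest makes childIdx the parent's best child if it outweighs the
// current one (ties go to the higher root), or refreshes it if it already is.
func (p *protoArray) maybeUpdateBest(parentIdx, childIdx int) {
	parent, child := p.nodes[parentIdx], p.nodes[childIdx]
	bestDesc := childIdx
	if child.bestDescendant != none {
		bestDesc = child.bestDescendant
	}
	if parent.bestChild != none && parent.bestChild != childIdx {
		cur := p.nodes[parent.bestChild]
		if child.weight < cur.weight || (child.weight == cur.weight && child.root <= cur.root) {
			return
		}
	}
	parent.bestChild = childIdx
	parent.bestDescendant = bestDesc
}

// head is a single lookup: the best descendant of the justified block.
func (p *protoArray) head(justifiedRoot string) string {
	n := p.nodes[p.indices[justifiedRoot]]
	if n.bestDescendant == none {
		return n.root
	}
	return p.nodes[n.bestDescendant].root
}

func main() {
	p := newProtoArray()
	p.insert("A", "")
	p.insert("B", "A")
	p.insert("C", "A")
	p.insert("D", "B")
	// 3 votes land on C and 2 on D.
	p.applyScoreChanges([]int64{0, 0, 3, 2})
	fmt.Println(p.head("A")) // C
	// The 3 voters move from C to D.
	p.applyScoreChanges([]int64{0, 0, -3, 3})
	fmt.Println(p.head("A")) // D
}
```

Keeping best-child updates in a separate pass ensures every sibling comparison sees final weights for the epoch rather than a mix of old and new ones.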
In parallel, we have been making micro-improvements in our blockchain service code to make better use of the new fork choice implementation. For example, a node often checks whether an object is valid by checking whether its corresponding block has been successfully processed. The naive way is to check whether the block exists in the DB. A lesser-known but more efficient way is to check whether the block exists in the fork choice store, which is ~10x faster than the DB lookup. See #4821.
BenchmarkHasBlockDB-12 381 ns/op
BenchmarkHasBlockForkChoiceStore-12 40.0 ns/op
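The difference comes down to an in-memory map lookup versus a database read. Below is a minimal Go sketch of the pattern; `forkChoiceStore`, `blockDB`, and `hasProcessedBlock` are hypothetical names, not Prysm’s API, and falling back to the database for blocks that are no longer in the fork choice store (for example, old blocks pruned after finalization) is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// forkChoiceStore is a hypothetical in-memory index of processed block
// roots, standing in for the fork choice store's node lookup.
type forkChoiceStore struct {
	mu      sync.RWMutex
	indices map[[32]byte]uint64 // block root -> node index
}

// HasNode is a read-locked map lookup: no disk access or deserialization.
func (s *forkChoiceStore) HasNode(root [32]byte) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.indices[root]
	return ok
}

// blockDB is a hypothetical interface for the on-disk block database.
type blockDB interface {
	HasBlock(root [32]byte) bool
}

// hasProcessedBlock tries the cheap in-memory check first and only falls
// back to the database when the store does not know the block.
func hasProcessedBlock(store *forkChoiceStore, db blockDB, root [32]byte) bool {
	if store.HasNode(root) {
		return true
	}
	return db.HasBlock(root)
}

// mapDB is a trivial in-memory blockDB used only for this example.
type mapDB map[[32]byte]bool

func (m mapDB) HasBlock(root [32]byte) bool { return m[root] }

func main() {
	store := &forkChoiceStore{indices: map[[32]byte]uint64{{1}: 0}}
	db := mapDB{{2}: true}
	fmt.Println(
		hasProcessedBlock(store, db, [32]byte{1}),
		hasProcessedBlock(store, db, [32]byte{2}),
		hasProcessedBlock(store, db, [32]byte{3}),
	) // true true false
}
```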
Better attesting summary reporting
Stakers often ask, “Why aren’t my validators making $$?” That is a fair question, but also a hard one to answer without digging into the details of how the validators attested. We updated the validator performance API to include useful information such as the `inclusion_slots` and `correctly_voted` fields. This lets stakers query information such as the slot at which a validator’s attestation was included in the chain, or whether the validator correctly voted for source, target, and head. We started by updating the API definition in #103.
We then updated the API implementation to support this feature in #4845.
For example, the validator client now logs:
“[2020-02-12 12:48:00] INFO validator: Previous epoch voting summary correctlyVotedHead=false correctlyVotedSource=true correctlyVotedTarget=true epoch=7619 inclusionDistance=1 inclusionSlot=243747 pubKey=0x875ce51a3821e5a2”
Which can be read as:
“Validator 0x875ce51a3821e5a2’s attestation from the previous epoch was included in the block at slot 243747; it took 1 slot for this attestation to get included. The attestation correctly voted for source and target, and incorrectly voted for head.”
Another log shows a summary across all validators:
“[2020-02-12 12:48:00] INFO validator: Previous epoch aggregated voting summary attestationInclusionPercentage=0.64 correctlyVotedHeadPercentage=0.81 correctlyVotedSourcePercentage=1.00 correctlyVotedTargetPercentage=1.00 epoch=7619”
Which can be read as:
“Out of all the validators in this client, 64% had their attestations included in blocks. Of the attestations included in blocks, 100% voted correctly for source and target, and 81% voted correctly for head.”
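The aggregated numbers can be derived from per-validator records like the first log line. Here is a minimal Go sketch of that arithmetic; the `voteSummary` and `epochSummary` structs and the `aggregate` function are hypothetical, not Prysm’s API types. It follows the reading given above: the inclusion rate is taken over all validators, while the correctness rates are taken over included attestations only.

```go
package main

import "fmt"

// voteSummary mirrors the per-validator fields shown in the voting summary
// log. Hypothetical struct for illustration; not Prysm's API type.
type voteSummary struct {
	Included             bool
	InclusionSlot        uint64
	InclusionDistance    uint64
	CorrectlyVotedSource bool
	CorrectlyVotedTarget bool
	CorrectlyVotedHead   bool
}

// epochSummary holds the aggregated rates as fractions in [0, 1].
type epochSummary struct {
	Inclusion, Source, Target, Head float64
}

// aggregate computes the inclusion rate over all validators and the
// correctness rates over included attestations only.
func aggregate(votes []voteSummary) epochSummary {
	var included, source, target, head int
	for _, v := range votes {
		if !v.Included {
			continue
		}
		included++
		if v.CorrectlyVotedSource {
			source++
		}
		if v.CorrectlyVotedTarget {
			target++
		}
		if v.CorrectlyVotedHead {
			head++
		}
	}
	var s epochSummary
	if len(votes) > 0 {
		s.Inclusion = float64(included) / float64(len(votes))
	}
	if included > 0 {
		n := float64(included)
		s.Source = float64(source) / n
		s.Target = float64(target) / n
		s.Head = float64(head) / n
	}
	return s
}

func main() {
	votes := []voteSummary{
		{Included: true, InclusionSlot: 100, InclusionDistance: 1, CorrectlyVotedSource: true, CorrectlyVotedTarget: true},
		{Included: true, InclusionSlot: 101, InclusionDistance: 2, CorrectlyVotedSource: true, CorrectlyVotedTarget: true, CorrectlyVotedHead: true},
		{Included: false},
		{Included: false},
	}
	s := aggregate(votes)
	// inclusion=0.50 source=1.00 target=1.00 head=0.50
	fmt.Printf("inclusion=%.2f source=%.2f target=%.2f head=%.2f\n", s.Inclusion, s.Source, s.Target, s.Head)
}
```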
Safer concurrency in Prysm
There are thousands of things happening in a beacon node every second: validators submitting blocks, peers requesting and receiving data via p2p, fork choice being applied, tons of reads and writes to caches and the database, etc. Go is an excellent language for blockchain clients because of its powerful concurrency primitives such as channels and goroutines, which make it easy to spawn concurrent processes that communicate with each other. A common requirement when designing concurrent processes is to guard certain values with locks when reading or writing them. Unfortunately, excessive locking can lead to higher latency and other problems if the code is poorly designed. We have been working hard to make concurrency in Prysm safer by using locks carefully where we need them and ensuring there are no race conditions throughout our code base. We improved concurrent access in fork choice in #4784 and attestation concurrency in #4833, which we believe makes Prysm safer under high load from many validators per beacon node.
Block tree visualization
Team member Terence has implemented a real-time Graphviz visualization of Prysm’s fork choice votes. This tool has been great for debugging and for satisfying our immense curiosity about the current health of the testnet. Visit http://localhost:8080/tree with your local beacon chain node running to see this block tree for yourself!
Slashing operations pool for Prysm beacon nodes
Our testnet previously could not include slashings in blocks, but that is about to change thanks to our recent slasher progress! Beacon chain nodes now have a slashing pool that proposers can query for any pending slashings that need to be put into a block.
Our slasher is currently undergoing a reorganization (more details later), but once that is complete we will see beacon nodes adding slashings to the pool for insertion into blocks!!
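Conceptually, a slashing pool is a concurrency-safe set of pending operations that proposers drain when building a block. The Go sketch below is hypothetical and not Prysm’s pool: `proposerSlashing` keeps only the offending validator index, and deduplicating by validator index is a simplification. The per-block limit of 16 used in the example is the phase 0 spec’s `MAX_PROPOSER_SLASHINGS`.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// proposerSlashing is a simplified placeholder that only records the
// offending proposer's validator index.
type proposerSlashing struct {
	ProposerIndex uint64
}

// slashingPool is a hypothetical pool of pending slashings that block
// proposers can query; it is not Prysm's actual implementation.
type slashingPool struct {
	mu      sync.Mutex
	pending map[uint64]proposerSlashing
}

func newSlashingPool() *slashingPool {
	return &slashingPool{pending: make(map[uint64]proposerSlashing)}
}

// Insert adds a slashing unless one for the same validator is already
// pending, since a validator only needs to be slashed once.
func (p *slashingPool) Insert(s proposerSlashing) bool {
	p.mu.Lock()
	defer p.mu.Unlock()
	if _, ok := p.pending[s.ProposerIndex]; ok {
		return false
	}
	p.pending[s.ProposerIndex] = s
	return true
}

// Pending returns at most limit slashings, ordered by validator index so
// the result is deterministic.
func (p *slashingPool) Pending(limit int) []proposerSlashing {
	p.mu.Lock()
	defer p.mu.Unlock()
	out := make([]proposerSlashing, 0, len(p.pending))
	for _, s := range p.pending {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].ProposerIndex < out[j].ProposerIndex })
	if len(out) > limit {
		out = out[:limit]
	}
	return out
}

// MarkIncluded drops slashings that have made it into a block.
func (p *slashingPool) MarkIncluded(indices ...uint64) {
	p.mu.Lock()
	defer p.mu.Unlock()
	for _, i := range indices {
		delete(p.pending, i)
	}
}

func main() {
	pool := newSlashingPool()
	pool.Insert(proposerSlashing{ProposerIndex: 42})
	pool.Insert(proposerSlashing{ProposerIndex: 7})
	fmt.Println(pool.Insert(proposerSlashing{ProposerIndex: 42})) // false: already pending
	fmt.Println(pool.Pending(16))                                  // [{7} {42}]
	pool.MarkIncluded(7, 42)
	fmt.Println(len(pool.Pending(16))) // 0
}
```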
🔜 Upcoming work
Real slashing occurring in the Prysm testnet
We’re super eager to ship our slasher implementation, which is essentially a watchtower for beacon nodes: it inspects every block and vote that comes through the node and can discern slashable events. If one is found, it relays a packaged proposer_slashing or attester_slashing object back to the beacon node so it can be applied to the offending validators via the state transition function. Unfortunately, slashing detection is still super expensive, and our current slasher design needs a fair bit of work before we are comfortable running it in the testnet. We are now working through a tracking issue to improve the slasher via careful refactoring, testing, and benchmarking.
We believe within the next two weeks we will be able to confidently start including slashing operations in beacon blocks in our running testnet.
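The rule a slasher enforces for attestations is defined in the phase 0 spec as `is_slashable_attestation_data`: a double vote (two different votes for the same target epoch) or a surround vote (one vote’s source-to-target span strictly contains the other’s). Below is a Go transcription of that check over simplified containers; the struct definitions are trimmed for illustration, and `slashable` and `vote` are hypothetical helpers. The check itself is cheap; presumably the expense lies in comparing every incoming vote against each validator’s voting history.

```go
package main

import "fmt"

// checkpoint and attestationData follow the shape of the phase 0 spec
// containers, simplified for illustration.
type checkpoint struct {
	Epoch uint64
	Root  [32]byte
}

type attestationData struct {
	Slot            uint64
	Index           uint64
	BeaconBlockRoot [32]byte
	Source          checkpoint
	Target          checkpoint
}

// isSlashableAttestationData mirrors the spec rule: a double vote or a vote
// by d1 that surrounds d2.
func isSlashableAttestationData(d1, d2 attestationData) bool {
	doubleVote := d1 != d2 && d1.Target.Epoch == d2.Target.Epoch
	surroundVote := d1.Source.Epoch < d2.Source.Epoch && d2.Target.Epoch < d1.Target.Epoch
	return doubleVote || surroundVote
}

// slashable checks both orderings, since either vote may be the one doing
// the surrounding.
func slashable(a, b attestationData) bool {
	return isSlashableAttestationData(a, b) || isSlashableAttestationData(b, a)
}

// vote builds a minimal attestation with the given source/target epochs and
// a target root distinguished by one byte.
func vote(source, target uint64, root byte) attestationData {
	return attestationData{
		Source: checkpoint{Epoch: source},
		Target: checkpoint{Epoch: target, Root: [32]byte{root}},
	}
}

func main() {
	fmt.Println(slashable(vote(1, 5, 1), vote(1, 5, 2))) // true: double vote
	fmt.Println(slashable(vote(2, 4, 1), vote(1, 5, 1))) // true: surround vote
	fmt.Println(slashable(vote(1, 2, 1), vote(2, 3, 1))) // false: consecutive votes
	fmt.Println(slashable(vote(1, 5, 1), vote(1, 5, 1))) // false: identical vote
}
```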
Comprehensive bug fix roundup, optimizations, and multiclient testnet work
We need multiclient to be stable before we can consider a mainnet launch of eth2, and with various teams improving their production readiness, moving to the latest spec versions, and aligning on networking, we believe the time is right to ramp up the effort. We opened a tracking issue to get all the missing TODOs done before focusing on a multiclient effort over the coming weeks. It is important not only for another client to sync with us, but also for that client to produce blocks and successfully contribute toward finalizing the chain. Having an evenly distributed multiclient testnet in terms of nodes and validators would also be a worthy goal before the real launch, and we’ll be working carefully with other teams to achieve it.
More memory optimizations
We are looking to apply more optimizations to lower memory and CPU usage when running Prysm nodes. The biggest bottleneck we are working to solve is using more shared references for our commonly used arrays: constantly copying large arrays increases the memory required to run the beacon node and can lead to frequent OOMs. Another approach we are looking at is pooling and reusing memory from objects that are no longer in use, so that we do not need to allocate more memory when creating new objects. Since the Go runtime does not release memory back to the OS immediately, frequently creating objects with large memory footprints can lead to OOMs; it would be better to recycle memory that is already allocated. Expect more updates on this next time!
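Go’s standard library already offers one mechanism for this kind of recycling: `sync.Pool`. Whether Prysm will use it is not stated above, so treat the following as a hypothetical sketch. The `rootsPool` buffer type is an assumption; its capacity of 8192 matches the length of the beacon state’s block and state root vectors (`SLOTS_PER_HISTORICAL_ROOT` in the mainnet config).

```go
package main

import (
	"fmt"
	"sync"
)

// rootsPool recycles [][32]byte buffers instead of allocating a fresh one
// each time. Storing *[][32]byte rather than the slice itself avoids an
// extra allocation when a buffer is put back into the pool.
var rootsPool = sync.Pool{
	New: func() interface{} {
		buf := make([][32]byte, 0, 8192)
		return &buf
	},
}

// getRoots returns an empty buffer, reusing a released one if the pool
// still holds it and allocating a new one via New otherwise.
func getRoots() *[][32]byte {
	return rootsPool.Get().(*[][32]byte)
}

// putRoots resets the length (keeping the capacity) and hands the buffer
// back for reuse. The caller must not touch buf afterwards.
func putRoots(buf *[][32]byte) {
	*buf = (*buf)[:0]
	rootsPool.Put(buf)
}

func main() {
	buf := getRoots()
	*buf = append(*buf, [32]byte{1}, [32]byte{2})
	fmt.Println(len(*buf)) // 2
	putRoots(buf)

	// May or may not be the same backing array; either way it is empty.
	reused := getRoots()
	fmt.Println(len(*reused), cap(*reused)) // 0 8192
	putRoots(reused)
}
```

One caveat of this design: the runtime may discard pooled items during garbage collection, so `sync.Pool` reduces allocation churn rather than guaranteeing reuse.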
Miscellaneous
New awesome features added to Bitfly’s eth2 block explorer for the Prysm testnet
The awesome team at Bitfly has been working hard on adding new features to their fantastic ETH2.0 block explorer. They now support several charts about the beacon chain, including great stats on block proposal history, active validator count, and more! They also have a great real-time block visualization and a validator leaderboard! 🎉🎉🎉 Thank you, Bitfly!
New awesome features added to Etherscan’s eth2 block explorer for the Prysm testnet
Another excellent block explorer for our Prysm testnet is Etherscan! They’ve added a really useful tracking site for ETH1 deposits, very neat for anyone curious about the kinds of deposits being made regularly.
On top of this, they have a dope finality tracker that records finality history, so it’s easy to tell when finality or performance issues occur. Very useful for our testnet while we’re working on fixing current finality issues :). It also shows the gaps in finality at any given point in the chain. Thanks, Etherscan!
See us at ETHDenver! 👋
Several members of the team will be around ETHDenver this weekend. Come find Raul, Preston, Terence, and Ivan during the event! Let us know if you plan to build something ETH2-related so you can claim the additional bounties sponsored by the Ethereum Foundation!
Interested in Contributing?
We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).
Check out our contributing guidelines and our open projects on GitHub. Each task and issue is grouped into the Phase 0 milestone along with the specific project it belongs to.
As always, follow us on Twitter or join our Discord server and let us know what you want to help with.
Official, Prysmatic Labs Ether Donation Address
0x9B984D5a03980D8dc0a24506c968465424c81DbE
Official, Prysmatic Labs ENS Name
prysmatic.eth