Eth 2.0 Dev Update #48 — “Eth2 Topaz Testnet Going Strong”

Raul Jordan
Prysmatic Labs
7 min read · Apr 24, 2020

Our biweekly updates written by the entire Prysmatic Labs team on the Ethereum Serenity roadmap.

🆕 Topaz Eth2 Testnet Recap

Topaz Testnet’s Successful Genesis Launch Event!

Our new Eth2 public testnet, the Topaz Testnet, had a very successful launch, with a ton of community members involved in making it a reality. The event was livestreamed on YouTube from a Zoom call, and attendees received proof of attendance tokens (POAP) for being at the Topaz launch event! The testnet was publicly available for anyone to join. Once the genesis validator threshold was reached, there was a 24-hour waiting period, until midnight of the next day, before genesis, the same as in the real mainnet launch of eth2.

At the start of the chain, we controlled only around 67% of the network, much less than in our previous testnet releases. Additionally, the Topaz testnet is special because it shares the exact same parameter configuration as mainnet: 32 ETH deposits, the same time-based parameters, and more. Block explorers such as Etherscan (https://beacon.etherscan.io) and Bitfly (https://beaconcha.in) updated their portals for Topaz.

We are very grateful for the overwhelming support we received for this event, as it marks an important milestone for Eth2. This testnet is the basis for multiclient experimentation, although it is not the multiclient testnet. There will be another testnet restart, coordinated by multiple client teams, in which various clients will have equal participation at genesis. That is as close to the real thing as we can get before launch.

Validator Participation at 97%, Only 1 Finality Incident

Topaz has been running with remarkable stability and a level of participation far beyond what we imagined. We control only 59% of the testnet, and we are seeing 97.4% of active validators consistently proposing blocks and voting on them. We are seeing nodes with over 300 active peers in the network, while we run a total of only 8 nodes internally at Prysmatic Labs. The one incident that caused finality downtime occurred when many nodes died because of an experimental feature that was enabled by default. We quickly resolved it by disabling the feature and notifying our users to update their nodes. Soon after, the chain reached an all-time high in participation.

Consensus Bug in Topaz When Performing Interop Testing

While Topaz has been running, we have been working extensively on the side with the Sigma Prime team on getting their Lighthouse client to interoperate with Prysm. After attempting chain sync with the testnet, Sigma Prime and EF researcher Protolambda discovered a state root failure, which typically points to a consensus bug. Upon further investigation, we found that Prysm had an order-of-operations bug in its rewards/penalties computation logic, which caused the two clients to arrive at different states after block processing. Prysm was the source of the bug, meaning that Lighthouse will not be able to sync with the Topaz testnet unless we coordinate a hard fork.

Given the gravity of the situation, we think it makes the most sense to focus on short-lived, private testnets for interoperability testing so that we can iron out any consensus bugs. At the moment, we do not want to restart the Topaz testnet just to fix this bug. Instead, we will be working hard on the side to ensure client compatibility is top-notch. Once those items are resolved, we will announce a scheduled restart of the network in which Lighthouse will also participate.

The EF Research team quickly stepped up to improve the rewards portions of the spec so that there is more test coverage for these types of functions. Unfortunately, functions involving rewards and penalties have many potential edge cases that are very hard to exhaust through unit tests.

📝 Merged Code, Pull Requests, and Issues

Memory leak in node identified and patched

An unfortunate consequence of the initial Topaz launch was the huge increase in memory usage that nodes were seeing over the course of a few days. We tried pinning down the issue, but it was difficult to diagnose. Over time, we realized that nodes running with the `--disable-new-state-mgmt` flag performed far better in memory usage than regular nodes. The new state management feature was causing a significant memory burden because it kept many copies of the beacon state in memory without properly removing them from a cache. After disabling this feature, all of our prod nodes went back down to < 1 GB of memory, and users' nodes also stopped hogging computer resources. After further investigation, the team traced the root cause of the memory leak to unnecessary state copies made when verifying an incoming attestation. In the eth2 world, attestations arrive far more frequently than blocks. If the node needs to copy the beacon state for every incoming attestation just to verify it, the node will soon run out of memory. The issue has been resolved by #5584, and `--enable-new-state-mgmt` is OK to use again.

Better handling of eth1 chain downtime

We provide all users running our testnet with access to our own eth1 nodes running the Goerli testnet, which is used to onboard validators into our eth2 testnet. Nodes need constant access to eth1 so they can track the chain logs and block heights of deposits made to the eth2 deposit contract. Validators include the latest eth1 block hash and other metadata as part of blocks in eth2, and this data undergoes a voting process to determine the best eth1 block. If there is any downtime from eth1, our validators would not be able to propose blocks successfully, making eth1 a single point of failure. If this happens in reality, we should instead include some random data in the eth2 block. This way, the chain will not stall and validators will still get rewarded, but the eth1 data voting process will be halted until the eth1 chain is back up. Our teammate Preston worked on resolving this issue, and the fix is now included in the latest master branch of Prysm.

Revamped the documentation portal for Prysm

Our documentation portal became one of our biggest action items during the Topaz launch. It was critical for us to have clear support for the various operating systems used to run Prysm. As part of our knowledge base, we also made our internal incident reports and testnet postmortems publicly available in our docs portal. We also now have revised instructions for the various operating systems, including Docker instructions for Windows. Thanks to Celeste, our docs expert, we continue to grow our knowledge base for anyone running Prysm and getting involved in eth2.

🔜 Upcoming Work

Getting rid of the archival service in beacon nodes

With the capability of generating a historical state at any arbitrary slot, we are excited to finally get rid of the archival service. The PR for this is pending more testing. As soon as we are satisfied with the performance, the PR will be merged and the archival service will be deprecated in the canonical code base. This radical improvement will make the code base cleaner and require fewer services at runtime.

Eth2 API standardization for mainnet

Over the last year, we have collected user feedback and suggestions around the need for ETH2 data API access. In this time, we have launched our v1 alpha version of EthereumAPIs, which has enabled block explorers, staking pools, and data-focused developers to access the information they need to build on phase 0. As we work towards our v1 beta and the mainnet launch of phase 0, our team has been carefully collaborating with other teams and users to find an excellent minimal API that fulfills the needs of data consumers in ETH2. If you have feedback, comments, concerns, or suggestions around the Prysm API, let us know in our Discord or on GitHub.

Interested in Contributing?

We are always looking for devs interested in helping us out. If you know Go or Solidity and want to contribute to the forefront of research on Ethereum, please drop us a line and we’d be more than happy to help onboard you :).

Check out our contributing guidelines and our open projects on GitHub. Each task and issue is grouped into the Phase 0 milestone along with the specific project it belongs to.

As always, follow us on Twitter or join our Discord server and let us know what you want to help with.

Official, Prysmatic Labs Ether Donation Address

0x9B984D5a03980D8dc0a24506c968465424c81DbE

Official, Prysmatic Labs ENS Name

prysmatic.eth
