Taraxa Testnet Update
Pre-launch node testing in progress: debugging the testnet synchronization.
As our pre-launch testing moves forward, we’re seeing more performance issues than core consensus issues, which is very good news. We were able to observe these synchronization problems largely thanks to your active involvement in running Taraxa nodes, which generated a lot of data that helped us identify and debug them. As of the time of writing, we’ve restarted the testnet and are now debugging the node synchronization and stalling issues, with a few big patches coming up this week.
Two main causes of the node downtime surfaced last week:
- A bug introduced in the last patch that caused forking in the consensus. It was a very simple fix, and we’ll keep an eye on it to make sure it doesn’t occur again. This fix is in the current image as of this writing.
- A node synchronization performance issue: we observed that the consensus nodes were being overwhelmed by sync requests. The fix has two parts. First, we need to maintain a block-tx relationship in the DB, which accelerates DB access during synchronization by over 20x. Second, we need a variety of common-sense rules to keep nodes from being overwhelmed during syncing (e.g., limiting the number of syncing nodes per node and throttling the syncing bandwidth). These fixes are not yet in, but we have been working on them since diagnosing the problems in a series of latency characterizations.
- Debugged and investigated the testnet PBFT chain forking issue.
- Implemented a PBFT cert voted value that never terminates: a single variable, the last cert voted value, replaces the cert voted values kept for all rounds. Once a node cert votes a value, it continues to vote on that value until it is added into the chain.
- Fixed a bug in PBFT: the last cert voted value is not always equal to the previous round's next voted value, because the previous round's next voted value could be NULL BLOCK HASH.
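
The "sticky" cert voted value described in the last two items can be illustrated with a minimal Python sketch. This is not the Taraxa node code: the class `PbftVoter`, its method names, the string representation of hashes, and the `NULL_BLOCK_HASH` placeholder are all assumptions made for illustration. It only shows the idea of keeping one last cert voted value rather than one per round, and why that value can differ from the previous round's next voted value.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical placeholder for the "no block" hash; the real encoding may differ.
NULL_BLOCK_HASH = "0" * 64


@dataclass
class PbftVoter:
    """Illustrative voter that keeps a single last cert voted value."""
    chain: List[str] = field(default_factory=list)
    last_cert_voted_value: Optional[str] = None

    def choose_cert_vote(self, prev_round_next_voted_value: str) -> Optional[str]:
        """Return the value to cert vote in this round, or None to abstain."""
        # Once a value has been cert voted, keep voting on it in every round,
        # whatever the previous round's next voted value was.
        if self.last_cert_voted_value is not None:
            return self.last_cert_voted_value
        # The previous round's next voted value may be NULL_BLOCK_HASH,
        # in which case there is nothing to cert vote yet.
        if prev_round_next_voted_value == NULL_BLOCK_HASH:
            return None
        self.last_cert_voted_value = prev_round_next_voted_value
        return self.last_cert_voted_value

    def on_block_added_to_chain(self, block_hash: str) -> None:
        """Record a block added to the chain and release the sticky value."""
        self.chain.append(block_hash)
        # Assumption: the sticky value is cleared only when that same
        # value is the one added into the chain.
        if block_hash == self.last_cert_voted_value:
            self.last_cert_voted_value = None


voter = PbftVoter()
voter.choose_cert_vote("aa")             # first cert vote, value becomes sticky
voter.choose_cert_vote(NULL_BLOCK_HASH)  # still votes on "aa"
voter.on_block_added_to_chain("aa")      # value is on chain, sticky value cleared
```

In this sketch, a round whose previous next voted value is NULL BLOCK HASH does not overwrite the stored value, which is the situation the bug fix above refers to.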