Recommended inaugural producer settings for the EOSIO 2.0.3 upgrade

eos sw/eden
Feb 26, 2020


Starteos has helped us translate this article into Chinese; it can be found here.

Introduction

Version 2.0.0 of EOSIO was released roughly six weeks ago. Since then, several block producers and standbys have tested and worked in collaboration with block.one to ensure that the EOS mainnet can be upgraded in a safe and secure manner, without risking overrunning slower nodes with blocks that are too heavy (in terms of CPU and NET usage), which could make it more expensive to keep nodeos synced with the chain.

The two parameters discussed in this post make it possible for producers to increase and decrease the weight of their blocks at will. With 2.0.4 it is highly likely that we will receive even more enhancements that will help producers produce fuller blocks, send out their blocks faster, and produce blocks immediately when exhausted.

These are excellent enhancements, and they will allow producers with faster CPUs to produce blocks of the same weight faster than producers with slower CPUs. This means that there is now yet another profit incentive for producers to upgrade their hardware: producing blocks faster means that they can be released earlier, which ultimately leads to fewer blocks being forked off.

Methodology

The following tests were done mostly in collaboration with Atticlab, Flytomars and Eoscannon.
We have been testing different configuration settings while keeping an eye on the number of actions processed by each producer over a period of time. The intent of these tests was primarily to find recommendations for how to set the following parameters when upgrading a producer node to 2.0.3:

  • cpu-effort-percent
  • last-block-cpu-effort-percent
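
In config.ini terms, the parameters under test look like the snippet below. This is a sketch only: the values are placeholders, and our understanding is that the percentages refer to the share of each 0.5-second block interval the producer spends filling the block, with the last-block setting applying to the final block before handing over to the next producer.

# share of each block interval spent producing (controls block weight)
cpu-effort-percent = 40
# same, but for the last block of the producer's round
last-block-cpu-effort-percent = 20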

We kept track of the number of micro forks (dropped blocks in the handover from Producer A to Producer B) as well as latencies for received blocks. We analyzed the forks to identify why they occurred.

A couple of nodes were spun up on old hardware to see how those machines could keep up with the incoming blocks while we adjusted the CPU effort on the producer nodes. This was an attempt to make sure that we would not overload dapps, exchanges and researchers with blocks they could not process without expensive upgrades on their side (keep in mind that these parameters can be changed if such a problem were to arise).

For the same reason we also enabled the deprecated v1 history plugin (see Table 3) without filters on one of the old machines (this allows a node to handle requests to /v1/history/get_controlled_accounts and /v1/history/get_key_accounts, which are needed, for example, by wallets).
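
For reference, enabling that deprecated v1 history interface amounts to adding plugin lines along these lines to config.ini (a sketch, assuming the standard eosio plugin names; "without filters" simply means we did not add any filter-on/filter-out entries):

# deprecated v1 history plugin plus its HTTP API (serves the /v1/history/... endpoints)
plugin = eosio::history_plugin
plugin = eosio::history_api_plugin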

Hardware used

  • Atticlab's producer on an Intel i9-9900K @ 5.1 GHz
  • E5-2660 @ 2.7 GHz (released 8 years ago)
  • X5670 @ 3.2 GHz (released 10 years ago)

Results

The results shown below come from the machine equipped with dual Intel Xeon X5670 CPUs. We see similar differences when syncing on both of the older machines (E5-2660 and X5670).

Table 1: How many seconds it took to sync 1,000 blocks (from publicly available peers) while our nodeos was behind the head block and working hard to catch up. Keep in mind that two new blocks are created every second, so a node only gains on the head if it can sync 1,000 blocks in under 500 seconds.

Once nodeos had caught up with the head block, we decided to look at the latency when receiving blocks from a specific producer. This way we could work together with that producer while keeping track of the latency of the incoming blocks. Using the eosmechanics benchmark, we identified the producer with the fastest CPU:

Fig 1: This image shows how much time it took each producer to process a specific benchmark action (referred to as eosmechanics). The four fastest producers are running wasm-runtime = eos-vm-jit, and Atticlab has the fastest CPU.

In the table below we are looking at the latency for receiving blocks from Atticlab.

Table 2: The numbers in this table are block-receive latencies, in milliseconds.

The last-block-cpu-effort-percent was set to 20 for all these tests:

As expected, the time it takes to receive blocks is highly dependent on the weight of the blocks.

Testing with the deprecated v1 history plugin:

Table 3: Latencies from the 8-year-old E5-2660 CPU running at 2.7 GHz. Enabling the history plugin does not seem to increase the latency to dangerous levels, even on such old hardware.

We seem to get a bigger performance increase from enabling eos-vm-oc while the nodeos instance is following along at the head than while it is syncing to catch up.

Using the X5670 to produce on the Jungle testnet yielded the following results, as per the eosmechanics benchmark:

Table 4: The most interesting thing to us here (aside from just how much faster eos-vm-jit is than wabt) was that we got better performance from wabt on 1.8.12 than on 2.0.3.

Summary

Setting cpu-effort-percent to around 40% seems to be very safe, and even at 50% things should work well as long as slow machines are configured with eos-vm-oc-enable = 1. No other producer has as fast a CPU as Atticlab, so it is most likely safe for most producers to go to 60% or even 70% (this can be increased over time).
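
For a slow, non-producing machine that only needs to stay synced, the relevant config.ini lines would look roughly like this (a sketch based on the settings named above; values should be tuned per machine):

# fast baseline runtime
wasm-runtime = eos-vm-jit
# eos-vm-oc tier-up, recommended here only for non-producing nodes (see the note further down)
eos-vm-oc-enable = 1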

There are other optimizations that could be configured to help slow machines stay synced with the head block, depending on individual needs and use-cases.

We have seen that going to 2.0.3 will help mitigate the micro forks, as well as allow us to stabilize and optimize the performance of the chain as a whole. This is quite straightforward to understand, since running wasm-runtime = eos-vm-jit allows lowering cpu-effort-percent while producing blocks of the same weight. These blocks can then be released earlier, which gives the next producer more time both to receive and to process the incoming blocks.
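
As a rough illustration of why this helps (assuming cpu-effort-percent is interpreted as the share of the 0.5-second block interval spent producing, as sketched earlier):

0.80 x 500 ms = 400 ms spent producing, ~100 ms left for the block to propagate and be validated
0.50 x 500 ms = 250 ms spent producing, ~250 ms left for the block to propagate and be validated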

The sum of these results makes us believe that it is time for the producers on the EOS mainnet to upgrade to 2.0.3 and run wasm-runtime = eos-vm-jit, as long as we lower cpu-effort-percent and last-block-cpu-effort-percent from their default value of 80.
A lot of performance should be gained from this.

Today we observe that an overwhelming share of the actions are handled by just a few of the producers. The upgrade to 2.0.3, and later to 2.0.4, should even this out, increasing both the performance and the stability of the chain.

Fig 2: This graph (generated from data taken from Hyperion, the open-source full-history solution developed by EOSRio) shows the number of actions processed into blocks per producer over the last 12 hours at the time of this writing.

Going forward it will be very easy to keep track of and adjust the individual performance of each producer, to make sure that we even out the workload without producing blocks that are too heavy, as long as we make sure to monitor producers as they switch over (which the team from sw/eden will be happy to help with). We are happy to generate reports on performance, help with optimizing configurations, or just answer questions.

Keep in mind not to enable eos-vm-oc on the producers.

With the release of 2.0.3 we feel confident that this upgrade can now be handled in a secure manner, and that we can increase both the stability and the performance of the EOS mainnet.

It is exciting to see what new performance records can be reached after a successful upgrade to 2.0.3, and how well the new functionality in the 2.0.4 release will help increase the stability even more.

To configure your node you could start with the values below.

cpu-effort-percent = 50
last-block-cpu-effort-percent = 20
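
Combined with the runtime recommended in this post, a fuller (hedged) sketch of the relevant producer-side section of config.ini might look like:

wasm-runtime = eos-vm-jit
cpu-effort-percent = 50
last-block-cpu-effort-percent = 20
# note: eos-vm-oc-enable is left out on producing nodes, as mentioned above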

Eosswedenorg
sw/eden — South West Eden
Block Producer Located in Sweden.
Telegram: https://t.me/eossweden
Website: https://eossweden.org/
Twitter: https://twitter.com/EosSweden

