Elrond improvement: Change in Consensus and Randomness Source
After the initial proposal that came with the Elrond Whitepaper almost one year back, in May 2018, there have been multiple improvements discussed internally and built into the source code. Most of these improvements, which could have been nice updates for the Whitepaper, have somehow landed in a pending updates list which we never got to share.
Now, as we are getting closer to releasing the first version of the testnet, we thought it would be a good moment to finally present some of these improvements. One of them, as the title implies, is about the changes we’ve done to the design of our consensus and randomness source. These came to solve some limitations of the previous solution and give a speed boost to the consensus operation, which in turn allows us to further increase the in-shard security.
Before jumping to the problem, let’s name some parameters as I will be using these a lot:
With previous setup, if the consensus block proposer selects a malicious node as signer among the n signers, that node can affect the system’s liveness, by choosing not to sign, and thus aborting the consensus for that round.
The liveness problem comes from the constraint that the consensus result (block signature) is also used as a randomness source, and thus it needs to satisfy several other properties like: unbiasability, and unpredictability for it to be a good candidate.
This was previously mitigated economically, and also by creating local rating maps, where each validator rates the other validators it has previously worked in consensus with. The local rating would allow a consensus group leader to have some insight on the probability of the consensus round being successfully finalized, before selecting and broadcasting the signers’ bitmap, evidently by choosing the best known peers.
How is liveness affected?
For clarification, the consensus operation flow with Belare Neven multi-signature (a Schnorr multi-signature scheme) is explained below (there are some changes in the implementation vs whitepaper which are covered here). The algorithm is presented from a high level perspective, the formal description can be found in the link above or in our whitepaper.
- Leader creates block with transactions and broadcasts this block to consensus group members
- Each member validates the block and if block is valid, they broadcast the hash of their commitment (commitment will be used for signing)
- Leader selects from among the received commitment hashes a subset of at least n and creates a bitmap for his selection, where B[i]=1 if the ith member of the consensus group has sent the commitment hash and B[i]=0 otherwise. Leader then broadcasts this bitmap. If not enough commitment hashes received after t time, the leader aborts the consensus.
- Upon receiving the bitmap, each member selected in the bitmap broadcasts their commitment corresponding to the commitment hash sent at step 2.
- Upon receiving all commitments in bitmap B[i], if all commitments are valid when checked against the corresponding commitment hashes, each member aggregates the commitments in B and then creates a partial signature based on this aggregated commitment, broadcasting it to all members of the consensus group.
- Upon receiving all signature shares corresponding to signers selected in B (verifying each signature share), each member can aggregate these signatures and add the signature to the block, broadcasting the block to the entire shard.
The problem with this algorithm is that after step 3, any member selected in the bitmap B can cause the consensus to be aborted by choosing not to participate in one of the next consensus steps (we disregard here the economic penalties).
The liveness could be improved by picking a different member from the consensus group in the bitmap to replace a non responsive previously selected member, and start again from step 2, but afterwards there is still the same problem as any of the members can cause the consensus to be aborted. Restarting the consensus after step 2 is not an option since the private keys of any validator that sent their signature share in two retrials can be easily computed.
The signature share of one signer is computed as below:
Where r is the private key chosen as commitment secret corresponding to the commitment sent (point on the curve), H(<L’>||PK||R||m) — let’s call it h1, can be calculated by leader and any other member of the consensus group that received all commitments in bitmap B, and sk is the signer’s private key.
If we restart the consensus group from step 3 or above, it means that we work with the same commitment and implicitly the same commitment secrets.
In this case the same signer would provide
Where H(<L2’>||PK||R2||m) — let’s call it h2 can still be easily calculated by leader and all members that received all commitments for the new configuration. In this case subtracting the signatures would leak the private key of the signer.
With previous solution, the result of the multi-signature scheme was important because it was used for:
- Demonstrating that the consensus was reached, as at least n members of the consensus group have approved the block by signing it;
- Random seed for the next consensus group selection (random number is used in calculating the next members of the consensus group) which brings further requirements:
- It needs to be unbiased (leader chooses which commitment hashes to use but does not know any relevant information that can help him influence the end result at this point);
- It needs to be unpredictable — multi-signature result is unpredictable, and on the system level it has a 1D local predictability as once the block is signed and propagated, every node knows the members of the current round consensus group.
With current solution seems like it is hard to improve the liveness without changing the block signature scheme.
If we do not use the block multi-signature as a random seed for the next block, then we have more options of demonstrating that a consensus was reached. Below the new solution for consensus and randomness source is presented.
We propose using the BLS multi-signature scheme. This multi-signature scheme has a biasable result, as the leader can aggregate any subset of n signatures, but since we won’t use the result as a random seed and only to prove the consensus was reached, this would be sufficient. Even though it is more time consuming on both signing and verification than Schnorr signatures, it reduces the communication rounds, so it actually reduces considerably the time spent in consensus.
How is this different?
For clarification, the consensus operation flow with BLS multi-sig is explained below from a high level perspective.
- Leader creates block with transactions and broadcasts this block to consensus group members
- Each consensus group member validates the block, and if block is valid creates a BLS signature and sends this back to the leader.
- Leader selects from among the received signatures a subset of at least n and creates a bitmap for his selection, where B[i]=1 if the ith member of consensus group was selected and B[i]=0 otherwise. Leader aggregates the signatures and broadcasts the block attaching the bitmap and signature to it (B, aggSig). It must also sign the end result, just to “seal” the configuration for (B, aggSig) before broadcasting.
In this case we have several advantages in comparison to previous model, among which:
- No single validator can cause the consensus to be aborted, except if that one is the leader, but there is no incentive to do this, the leader would only lose the fee => we gain real byzantine fault tolerance;
- Reduce the consensus communication rounds from 5 to 2 which makes the consensus much faster (interactivity is much reduced) and again reduces possibility of abort due to connectivity, latency, etc => we gain performance.
Randomness source calculation
Signing blocks with BLS multi-signature means that we can no longer use the aggregated signature as an unbiasable random number seed, since the leader has a high leverage on what the end result could be. For example, out of the group members signatures it can select any subset of size n that is more advantageous to aggregate, this gives it a combination of cSize taken n options to choose from.
In this case we can try to find another source of randomness that we can create with little effort and has the properties we mentioned before:
- It needs to be unbiasable;
- It needs to be unpredictable.
And in addition to this, the two properties need to be provable.
BLS signatures as candidate
BLS single signature (a summary here) seems to share the same property with deterministic k generation ECDSA, in that signing the same message with the same private key always produces the same result.
At the same time the scheme looks very simple:
Given private key sk, message m, and a hashing function H that hashes to points on G1:
In this case, there is no biasable parameter that could be changed such that the signature would still be verifiable but would give multiple options to the signer on the end result.
How we could use this with the proposed consensus model:
- New consensus group is selected using a hash of the randomness source taken from the previous block header (a BLS signature, or in case of the genesis block a known seed).
- Chosen leader signs the previous randomness source with BLS single signature to generate the new randomness source, creates a block with transactions, adds the new randomness source in the block header and broadcasts this block to consensus group members;
- Each member validates the block, also validating that the new randomness source is a signature verifiable with the leader’s public key on the old randomness source. If both are valid, it creates a BLS signature on the proposed block and sends this back to leader;
- Leader selects from among the received signatures a subset of at least n and creates a bitmap for his selection, where B[i]=1 if the ith member of consensus group was selected and B[i]=0 otherwise. Leader aggregates the signatures and attaches the bitmap and signature to the block. It must also sign the end result, just to “seal” the configuration for (B, aggSig) before propagating the resulted block through gossip inside the shard.
The evolution of the randomness source in each round can be seen as an unbiasable and verifiable blockchain, where each new randomness source can be linked to the previous randomness source.
The newly proposed consensus model with the BLS single signature scheme for randomness, is the chosen solution for improving the liveness of our consensus and generating unbiased random numbers, with a low degree of predictability (one round). In order to have the random seed uniformly distributed we can apply on top a hashing function and use instead of the BLS signature the hash of this signature to select the next consensus group.
This would mean that we have a consensus that uses aggregated BLS multi-signature as block signature to prove the consensus was reached, and single BLS signature from the leader of the consensus over the last randomness source providing the next randomness source. The transactions would still use Schnorr signing, as it is much faster to verify than BLS (one order of magnitude faster).
What we gain is a faster consensus, with 2 rounds of communication instead of 5, of complexity O(n) instead of O(n²) since leader needs to broadcast to the other members and the other members only need to send back to leader their replies. This means that we can increase even further the consensus group size (e.g 61 members should be easy) and as effect improving the shard security, as probability for malicious majority falls to ~1/10⁹ per round with 61 members in the consensus group.
The consensus also becomes more fault tolerant, improving liveness, since it does not matter which signatures the leader chooses as long as there are n of them.
The randomness source becomes unbiasable, either by leader or any other consensus member, since only the leader of a consensus can provide the next randomness source, but this result is fixed and only depends on the leader’s private key and the previous randomness source.
Because the consensus becomes faster with fewer interactivity rounds, we have more options for the economic model. For example we could increase the rewards per validator per block with number of signatures a leader aggregates in the BLS block multi-signature. This would have been hard with previous solution, but now becomes easy.