Implementing Casper FFG in Geth: Mitigating spam votes with NULL_SENDER

Casper, Ethereum’s Proof-Of-Stake (PoS) consensus protocol, plays a vital role in the Ethereum scaling roadmap. Our team signed up to implement the first phase of the protocol, Casper FFG, in Geth, as spec’ed in EIP 1011.

Based on EIP 1011, we broke down the work into 3 pieces:

  • Casper contract deployment
  • Support pooling and executing vote transactions sent as NULL_SENDER
  • Incorporate Casper’s block finality into client’s fork choice rule

Since EIP 1011 is still being developed, we decided to tackle the most stable part of the spec, NULL_SENDER, first. But before diving in further, let’s talk about Casper vote transactions.

What is a Casper vote transaction?

While still relying on the current PoW mechanism to generate blocks, Casper FFG introduces two new components to provide deterministic finality:

  • A set of validators voting to decide which checkpoint block they want justified. Once > 2/3 of the validators agree, a checkpoint block is considered justified in the blockchain. This justified block will be finalized in the next round of voting if > 2/3 of validators agree again. See Casper FFG paper for the exact rules of finalizing checkpoints.
  • A Casper smart contract that verifies validators’ signatures and keeps track of validators’ votes
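The "> 2/3" threshold in the list above can be sketched as a small check. This is only an illustration, assuming every validator counts equally (the Casper FFG paper defines the actual weighting and finalization rules); `hasSupermajority` and its parameters are hypothetical names, not part of Geth or the Casper contract.

```go
package main

// hasSupermajority reports whether strictly more than 2/3 of the validators
// voted for a checkpoint. Both arguments are plain counts (an assumption made
// for this sketch). Cross-multiplying avoids the rounding that integer
// division would introduce.
func hasSupermajority(votes, validators uint64) bool {
	if validators == 0 {
		return false // no validator set, nothing can be justified
	}
	return 3*votes > 2*validators
}
```

With 100 validators, 67 votes pass the check while 66 do not, and exactly 2 out of 3 is not enough because the inequality is strict.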

A validator votes by sending a transaction to Casper’s contract address, invoking the vote method with arguments describing the checkpoint block to finalize. This transaction is a Casper vote transaction.

Since these votes determine the finality of the blockchain, they should be processed in a timely manner. Specifically, these votes should be included in the current block ASAP and should never be blocked by normal transactions due to gas price and/or account nonce ordering.

Another consideration is to allow validators to employ a more sophisticated signature scheme than what the current Ethereum blockchain can support natively. That means Casper will verify a validator’s signature inside the contract, be it Multisig M-of-N ECRECOVER, threshold signature, or a plain vanilla signature. With that in mind, it seems redundant for the validator to sign the vote transaction itself. As a result, EIP 1011 specifies that all Casper vote transactions should be sent as a NULL_SENDER.

What is NULL_SENDER?

The use of NULL_SENDER was first introduced in EIP 86 and later EIP 208. The main motivation is, quoting Vitalik from EIP 208:

Implements a set of changes that serve the combined purpose of “abstracting out” signature verification and nonce checking, allowing users to create “account contracts” that perform any desired signature/nonce checks instead of using the mechanism that is currently hard-coded into transaction processing.

Geth client implementation should consider a transaction to be a Casper vote when it meets the following conditions:

  • tx.To() == CASPER_ADDR
  • first 4 bytes of tx.Data() == 0xe9dc0614 (identifying bytes of the vote method)
  • tx.data.R == tx.data.S == 0
  • tx.data.V == chain ID
  • tx.Nonce() == tx.GasPrice() == tx.Value() == 0

Once a vote transaction is identified, the sender of this transaction is set to NULL_SENDER, i.e. its from address is set to 0xffffffffffffffffffffffffffffffffffffffff (20 bytes of 0xff).

What’s good about NULL_SENDER?

Below is a diagram describing the transaction pool inside the Geth client.

[Figure: Geth transaction pool data structure]

From the programmers’ point of view, NULL_SENDER allows us to easily separate processing of votes from normal transactions. (See bottom of the diagram above) We can collect all Casper votes under one address and process them without considering gas price and account nonce ordering.

Without NULL_SENDER, votes would compete with normal transactions in two ways. If an external account sends both normal and vote transactions from the same address, a Casper vote is blocked until every normal transaction with a lower nonce has been processed. Also, when choosing transactions for the current block, Geth picks the highest gas price across all accounts. If a Casper vote’s gas price is too low, it will be buried deep in the price heap and will not be included ASAP.

Essentially, delaying Casper vote processing delays the finality of the blockchain, and we don’t want any of that. So NULL_SENDER is good for that.
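To make the separation concrete, here is a rough sketch of ordering pending transactions so that votes bypass price ordering. It is not how Geth's pool is actually structured: `PoolTx` and `orderForBlock` are hypothetical, and the sketch deliberately ignores the per-account nonce ordering that the real pool enforces for normal transactions.

```go
package main

import "sort"

// PoolTx is a hypothetical, flattened view of a pending transaction.
type PoolTx struct {
	From     string
	Nonce    uint64
	GasPrice uint64
	IsVote   bool // true when the transaction was identified as a Casper vote
}

// orderForBlock places votes first, in arrival order, and then normal
// transactions by descending gas price. Votes never enter the price sort,
// so a zero gas price cannot bury them.
func orderForBlock(pending []PoolTx) []PoolTx {
	var votes, normal []PoolTx
	for _, tx := range pending {
		if tx.IsVote {
			votes = append(votes, tx)
		} else {
			normal = append(normal, tx)
		}
	}
	sort.SliceStable(normal, func(i, j int) bool {
		return normal[i].GasPrice > normal[j].GasPrice
	})
	return append(votes, normal...)
}
```

Because every vote shares the NULL_SENDER address, the vote list can be kept under a single key and handled independently of the per-account queues.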

So what’s the problem?

During our implementation, we constantly found ourselves asking:

What if the network is flooded with invalid votes? The miner will never be able to generate a block!

This is because EIP 1011 also specifies the following rules that keep us from running wild with NULL_SENDER:

  • Only include “valid” vote transactions
  • Track cumulative gas used by votes separately from cumulative gas used by normal transactions via vote_gas_used
  • Total vote_gas_used of vote transactions cannot exceed the block_gas_limit, independent of gas used by normal block transactions

These rules imply that the miner can spin its wheels executing spam votes endlessly if it doesn’t place a cap on vote processing. This is because a miner is encouraged to include as many votes as possible until vote_gas_used reaches block_gas_limit, but a failed vote doesn’t count towards vote_gas_used.
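The problem and one possible guard can be sketched as follows. This is an illustrative loop under our own assumptions, not code from Geth or EIP 1011: `Vote`, `includeVotes`, and the `maxAttempts` cap are hypothetical, and the `Valid` field stands in for the outcome of actually executing the vote in the EVM.

```go
package main

// Vote is a hypothetical pending vote.
type Vote struct {
	GasUsed uint64
	Valid   bool // stands in for the result of EVM execution
}

// includeVotes walks the pending votes and returns the indexes of the votes
// it included, the cumulative vote gas, and how many votes it executed.
// Failed votes add nothing to voteGasUsed, so without maxAttempts the only
// bound on wasted work would be the number of pending votes.
func includeVotes(pending []Vote, blockGasLimit uint64, maxAttempts int) (included []int, voteGasUsed uint64, attempts int) {
	for i, v := range pending {
		if attempts >= maxAttempts {
			break // cap reached: stop executing and let the block be produced
		}
		attempts++
		if !v.Valid {
			continue // failed vote: executed, dropped, and no gas counted
		}
		if voteGasUsed+v.GasUsed > blockGasLimit {
			break // vote gas budget exhausted
		}
		voteGasUsed += v.GasUsed
		included = append(included, i)
	}
	return included, voteGasUsed, attempts
}
```

With five spam votes queued ahead of two valid ones and `maxAttempts` set to 6, the loop executes six votes and includes only the first valid one; the cap trades a delayed vote for a bounded block production time.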

Note that the attack surface of a Casper vote is the same as a normal transaction. But spamming the Ethereum network with votes is slightly cheaper and more annoying because:

  • It’s free to generate a vote. One only needs to set the RSV values and the to address of the transaction accordingly. No signature is required. tx_data can be filled with garbage.
  • You don’t pay any gas for a vote to be included in a block.
  • Geth client is inclined to process votes at a higher priority than normal transactions to speed up the finality of the blockchain.

We know that the Geth team has a production-quality coding mindset and it is rather difficult to contribute below-the-bar code. To shorten the development cycle, we initiated discussions with the authors of EIP 1011 in different venues and ultimately with Vitalik himself.

What did we do?

We proposed an alternate approach to generate Casper votes that will discourage spam votes while allowing for relatively easy vote processing in the client implementation. The gist of it is:

  • A validator signs a vote transaction with its private key. This is simply the signature of the vote sender, and does NOT affect the signature verification scheme inside the Casper contract.
  • A validator uses a dedicated private key for vote transactions and not any other transactions.
  • Charge just enough gas for votes to discourage spam.

The complete proposal is reproduced at the end of this article. Our main arguments are:

  • Paying gas for votes, however little, discourages spam.
  • Validators are incentivized to keep a dedicated private key for voting and not other normal transactions, because they will be penalized if they don’t vote promptly.

So, what happened?

Even though our proposal was not adopted, we received very constructive feedback and reached a highly agreeable outcome with regard to mitigating spam votes.

EIP 1011 developers agreed to provide two contract methods for the client to validate votes prior to EVM execution: validate_vote_signature and votable. As their names suggest, these methods will catch all scenarios in which a Casper vote can fail. Upon receiving a vote transaction, the client will run the vote through these methods and throw it away if it’s invalid. Thus at EVM execution time, all votes should be valid.
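On the client side, this pre-validation step might look like the sketch below. The `CasperReader` interface is a hypothetical read-only binding: the method names mirror validate_vote_signature and votable, but their Go signatures (a raw vote message in, a boolean out) are our assumption. `stubCasper` exists only so the sketch can be exercised without a real contract.

```go
package main

// CasperReader is a hypothetical read-only view of the Casper contract.
type CasperReader interface {
	ValidateVoteSignature(voteMsg []byte) bool
	Votable(voteMsg []byte) bool
}

// admitVote reports whether a vote passes both contract-side checks and may
// therefore be placed in the client's vote queue.
func admitVote(c CasperReader, voteMsg []byte) bool {
	return c.ValidateVoteSignature(voteMsg) && c.Votable(voteMsg)
}

// filterVotes keeps only the incoming votes that pass admitVote.
func filterVotes(c CasperReader, incoming [][]byte) [][]byte {
	var queue [][]byte
	for _, msg := range incoming {
		if admitVote(c, msg) {
			queue = append(queue, msg)
		}
	}
	return queue
}

// stubCasper is a stand-in for illustration: a vote "passes" a check when
// the corresponding byte of the message is non-zero.
type stubCasper struct{}

func (stubCasper) ValidateVoteSignature(m []byte) bool { return len(m) > 0 && m[0] != 0 }
func (stubCasper) Votable(m []byte) bool               { return len(m) > 1 && m[1] != 0 }
```

Only the votes returned by `filterVotes` would reach EVM execution; everything else is dropped on receipt.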

The arguments against our proposal are:

  • The client has to do an extra ECRECOVER
  • Paying gas is essentially lowering the validator’s reward
  • Using NULL_SENDER allows the contract to prevent normal transactions from calling the Casper contract’s vote method

We are somewhat convinced.

Programmers’ rants

But of course we are only 98% happy. As defensive coders, it doesn’t sit well with us that we have to completely rely on a contract to filter out spam messages.

NULL_SENDER, other than saving an ECRECOVER, serves little purpose in the client implementation. It adds an extra unnecessary layer to identify a Casper vote. We now have two ways in the code to identify Casper votes:

  • whether the vote transaction meets Casper vote criteria (RSV, to_addr, gas_price/nonce/value)
  • whether the sender of the transaction is NULL_SENDER

On top of that, we cannot assume that NULL_SENDER will only ever be used for Casper votes, but we digress.

But we are making progress!

We are happy that we reached a very reasonable outcome with EIP 1011 developers and were able to get past ourselves and make progress. The most important thing is to speed up Casper development so we can quickly move to a world where PoS rules and Ethereum scaling efforts succeed.


https://hackmd.io/s/ryH-SKfJm

Problem Statement

Casper-FFG introduces a new concept of vote messages. While structured the same way as normal messages, a vote message is a special protocol message generated and sent by Casper validators. These vote messages determine the finality of the blockchain and therefore are processed differently from normal messages.

In practice, a normal transaction is executed by the miner based on its gasPrice and the correct order of the account nonce. A vote transaction should be executed without censorship and has a higher priority than normal transactions. If we treat these two types of transactions the same way, normal transactions could block vote transactions from being processed. This is not the desired outcome.

Background

The AMIS team is trying to implement Casper-FFG on the geth client per EIP1011. Our understanding is that by using NULL_SENDER, Casper votes are not subject to price/nonce ordering like normal transactions.

Issue

When implementing the NULL_SENDER mechanism in the geth client, we found the issue of spam votes difficult to mitigate. Zero-gas votes encourage spam. Benign validators that are hacked or use poorly-written code may misbehave. As a result, the client implementation needs to code against spam votes.

Specifically, the client cannot execute vote transactions without placing a cap. Because a failed vote doesn’t count towards vote_gas_used, a miner can find itself executing failed votes indefinitely and, as a result, unable to produce the next block.

We found ourselves asking the contract developers for a public function votable() to allow the client to validate a vote a priori. The client can then place only valid votes in its vote queue. These votes will later be EVM-executed and included in the next block.

However, this does NOT free us completely from worrying about spam votes. Let’s suppose there are N ways a vote can fail in the Casper contract, and the client is able to screen M scenarios. There are still N-M ways that a vote can fail at EVM execution time. A block will still be delayed if

  • The network is spammed with votes that fail in those N-M ways
  • EVM keeps executing these failed votes without counting them towards vote_gas_used or placing a cap

Even if the Casper contract guarantees votable() to capture all N failure modes, in the name of defensive programming where we don’t fully trust contract correctness, the client implementation will still want to cap vote processing to a fixed number or count failed votes towards vote_gas_used.
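The second option, counting failed votes towards vote_gas_used, can be sketched like this. It is a hypothetical illustration of the idea, not behaviour specified by EIP 1011; `ExecutedVote` and `processCountingFailures` are names we made up for the example.

```go
package main

// ExecutedVote is a hypothetical record of one vote after EVM execution.
type ExecutedVote struct {
	GasUsed uint64
	Failed  bool
}

// processCountingFailures charges every executed vote, failed or not,
// against the vote gas budget. The budget itself then bounds how much work
// spam votes can force on the miner, with no separate cap needed.
func processCountingFailures(votes []ExecutedVote, blockGasLimit uint64) (included, executed int, voteGasUsed uint64) {
	for _, v := range votes {
		if voteGasUsed+v.GasUsed > blockGasLimit {
			break // budget exhausted, including gas burned by failed votes
		}
		executed++
		voteGasUsed += v.GasUsed
		if !v.Failed {
			included++
		}
	}
	return included, executed, voteGasUsed
}
```

The trade-off is that a burst of spam can use up the budget and crowd out valid votes for that block, but block production itself is never stalled.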

Proposal

We think it is simpler if we can disincentivize spam votes altogether. We propose the following:

  • A validator signs a vote transaction with its private key. This is simply the signature of the vote sender, and does NOT affect the validation code scheme in the Casper contract.
  • A validator uses a dedicated private key for vote transactions and not any other transactions.
  • Charge just enough gas for votes to discourage spam, or alternatively charge gas for failed votes only.
  • A vote transaction can be identified by:
      • to == CASPER_ADDR
      • transaction data starts with vote bytes 0xe9dc0614

We recognize that in this scheme, if a validator uses the same private key for sending both normal and vote transactions, the votes can be blocked for having a higher nonce. But we believe that validators are disincentivized to do so since they will be slashed if they fail to vote promptly.

Apart from removing the error case of spam votes, this approach also makes the client implementation simpler in two ways:

  • We don’t have to special case NULL_SENDER in various places of the code
  • We don’t have to assume that NULL_SENDER will never be used for scenarios other than Casper votes, an assumption that gnaws at a developer