Making Sense of Ethereum Nonce(sense)

We expect users to send multiple Kin transactions in a row, and we don’t expect them to understand blockchain or to monitor their transactions manually. While testing our Kin wallet SDK (Android and iOS), we hit a problem when sending multiple transactions one after the other. During one particular period, all of our transactions got stuck: we couldn’t see them as pending, and eventually they were cancelled. As it turned out, several of those transactions had been sent with the same nonce.

What is nonce?

Nonce can mean two things in Ethereum:

  • Proof of work nonce: A meaningless value in a block that can be adjusted in order to try to satisfy the proof of work condition. This is the essence of mining. This value makes satisfying “proof of work” hard computational work that depends on luck.
  • Account nonce: A transaction counter in each account that prevents replay attacks. Without it, a transaction sending 20 coins from A to B could be replayed over and over by B to continually drain A’s balance.

Our problem is related to the latter nonce — the transaction counter. When making a transaction in Ethereum, a consecutive number should be attached to each transaction on the same account. Each node will process transactions from a specific account in a strict order according to the value of its nonce.

Therefore, failing to increment this value correctly can result in different kinds of errors. For instance, let’s say the latest transaction nonce was 121:

  • Reusing nonce: if we send a new transaction for the same account with a nonce of 121 or below, the node will reject it.
  • “Gaps”: if we send a new transaction with a nonce of 123 or higher, the transaction will not be processed until the gap is closed, i.e. until a transaction with nonce 122 has been processed.
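The two failure modes above can be sketched as a small check. This is a simplified model of the rule as described here, not the logic of any real node; the function name classify_nonce and its return strings are made up for illustration.

```python
def classify_nonce(last_nonce: int, tx_nonce: int) -> str:
    """Classify a transaction nonce against the last processed nonce.

    Simplified model: the only nonce a node will process right away
    is last_nonce + 1.
    """
    expected = last_nonce + 1
    if tx_nonce < expected:
        # Nonce was already consumed by an earlier transaction.
        return "rejected: nonce already used"
    if tx_nonce > expected:
        # Node holds the transaction until the missing nonces arrive.
        return "held: waiting for the gap to close"
    return "accepted"


if __name__ == "__main__":
    for nonce in (121, 122, 123):
        print(nonce, classify_nonce(121, nonce))
```

With the latest nonce at 121, only 122 is accepted straight away.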

Determining nonce for transactions

As mentioned earlier, we noticed that we were sending transactions that have the same nonce. To explain how this happened, let’s first understand how we increment the nonce value in our mobile SDKs. We are using the Geth mobile method (EthereumClient.getPendingNonceAt) that takes the public address of an account as a parameter. This method is eventually mapped to the eth_getTransactionCount JSON-RPC API method with a block parameter of “pending”. Remember, a nonce is the transaction count in this context.
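For reference, the underlying JSON-RPC call looks roughly like the sketch below. It only builds the request body and parses a response, so it runs without a node; the helper names and the sample address are placeholders, not part of our SDK.

```python
import json


def build_pending_nonce_request(address: str, request_id: int = 1) -> str:
    """Build the JSON-RPC body for eth_getTransactionCount with 'pending'."""
    payload = {
        "jsonrpc": "2.0",
        "method": "eth_getTransactionCount",
        "params": [address, "pending"],
        "id": request_id,
    }
    return json.dumps(payload)


def parse_nonce_response(body: str) -> int:
    """Extract the transaction count (a hex quantity) from a response body."""
    response = json.loads(body)
    if "error" in response:
        raise RuntimeError(response["error"])
    return int(response["result"], 16)


if __name__ == "__main__":
    # Placeholder address, for illustration only.
    print(build_pending_nonce_request("0x" + "ab" * 20))
    print(parse_nonce_response('{"jsonrpc": "2.0", "id": 1, "result": "0x7a"}'))
```

The returned count doubles as the next nonce to use, because nonces start at zero.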

A pending block is the current block that a node is mining, which has yet to be confirmed and propagated through the blockchain network. Blocks are limited in size, so when a pending block is full, any other transactions that arrive at the node are kept in a special area of the node’s memory, known as the transaction pool or txpool.

eth_getTransactionCount with a ‘pending’ parameter takes the latest nonce from the last mined block and adds transactions residing in the pending block, while ignoring any transactions that might stand waiting in the txpool.

Now it became clear to us what had caused the problem: we had hit a traffic peak on the Ethereum test network. Our transactions were queued in txpool and could not enter the pending block for a long time. Because the node only considered the mined blocks and the pending block, querying it for the transaction count returned a nonce we had already used. As long as our earlier transactions stayed in txpool and had not moved on to the pending block, every new transaction was given that same nonce. Eventually, when our transactions progressed through the txpool queue and were processed, they were rejected because they used the same nonce.
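The failure can be reproduced with a toy model. The sketch below is not Geth code; ToyNode, its fields and send_many are hypothetical and only mimic the behavior described above, where the pending nonce counts mined and pending-block transactions but ignores txpool.

```python
class ToyNode:
    """Toy node: the pending nonce ignores whatever is queued in txpool."""

    def __init__(self, mined_count: int, block_capacity: int):
        self.mined_count = mined_count        # our transactions in mined blocks
        self.block_capacity = block_capacity  # free slots in the pending block
        self.pending_block = []               # our nonces in the pending block
        self.txpool = []                      # our nonces queued in txpool

    def get_pending_nonce(self) -> int:
        # Mined transactions plus pending-block transactions; txpool is skipped.
        return self.mined_count + len(self.pending_block)

    def send(self, nonce: int) -> None:
        if len(self.pending_block) < self.block_capacity:
            self.pending_block.append(nonce)
        else:
            self.txpool.append(nonce)  # block is full, so the transaction queues


def send_many(node: ToyNode, count: int) -> list:
    """Send `count` transactions, asking the node for a nonce each time."""
    used = []
    for _ in range(count):
        nonce = node.get_pending_nonce()
        node.send(nonce)
        used.append(nonce)
    return used


if __name__ == "__main__":
    print(send_many(ToyNode(mined_count=122, block_capacity=3), 3))  # quiet network
    print(send_many(ToyNode(mined_count=122, block_capacity=0), 3))  # congested
```

On the quiet network the nonces come out as 122, 123, 124; on the congested one all three transactions get 122.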

Possible solutions

1. Doing the heavy work on the client side: Find the last proper nonce on the client side.

Using Ethereum JSON-RPC, the closest we can get to our pending transactions is to fetch the pending block with eth_getBlockByNumber and iterate over all of its transactions. This method has no filter, so we must search for transactions from our address ourselves. Let’s leave aside the fact that the block may include dozens of transactions (which we would have to download and parse on the mobile client). We would simply end up with the same result as eth_getTransactionCount, because we would still be looking only at the pending block.
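As a rough sketch, the client-side scan could look like this, assuming the block was fetched with full transaction objects so that each entry carries "from" and a hex "nonce". The function name and the sample data are invented for illustration.

```python
from typing import List


def pending_nonces_for(block: dict, address: str) -> List[int]:
    """Return the sorted nonces of transactions in `block` sent by `address`."""
    wanted = address.lower()
    return sorted(
        int(tx["nonce"], 16)
        for tx in block.get("transactions", [])
        if tx.get("from", "").lower() == wanted
    )


if __name__ == "__main__":
    ours = "0x" + "ab" * 20  # placeholder address
    sample_block = {
        "transactions": [
            {"from": ours, "nonce": "0x7b"},
            {"from": "0x" + "cd" * 20, "nonce": "0x1"},
            {"from": ours.upper().replace("0X", "0x"), "nonce": "0x7a"},
        ]
    }
    print(pending_nonces_for(sample_block, ours))
```

Anything still sitting in txpool never shows up in this list, which is why the approach does not help.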

Another option for getting the true state of the nonce is to access the entire txpool, search it for our transactions and work out the correct next nonce. To do this, we can use the txpool-related Ethereum APIs that are part of the ‘Management APIs.’

The problem with this solution is that it requires us to enable this special access API at the node level. We have data/processing concerns here as well, as txpool can be very large to download and parse at the mobile client.
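To make the idea concrete, here is a minimal sketch that derives the next nonce from a txpool dump. It assumes the shape returned by Geth’s txpool_content method (two sections, "pending" and "queued", each mapping a sender address to its transactions keyed by decimal nonce); the function name and sample data are hypothetical.

```python
def next_nonce_from_txpool(content: dict, address: str, confirmed_count: int) -> int:
    """Return the first nonce at or above `confirmed_count` not present in txpool."""
    wanted = address.lower()
    seen = set()
    for section in ("pending", "queued"):
        for sender, txs in content.get(section, {}).items():
            if sender.lower() == wanted:
                # Keys are nonces written as decimal strings.
                seen.update(int(nonce) for nonce in txs)
    nonce = confirmed_count
    while nonce in seen:
        nonce += 1
    return nonce


if __name__ == "__main__":
    ours = "0x" + "ab" * 20  # placeholder address
    sample = {
        "pending": {ours: {"122": {}, "123": {}}},
        "queued": {ours: {"125": {}}},
    }
    print(next_nonce_from_txpool(sample, ours, confirmed_count=122))
```

In the sample, 122 and 123 are already taken and 125 is queued behind a gap, so the sketch returns 124 and fills the gap rather than extending past it.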

2. Enhancing server (node) side: change node side logic to look over all txpool instead of the pending block.

There’s an open issue at go-ethereum to change this behavior, and in the parity node, there’s a custom solution — a non-standard API parity_nextNonce that returns the next available nonce both from pending block and txpool. There’s also an ongoing discussion to redefine ‘pending’ for several other JSON-RPC methods so that it also includes txpool.

Conclusion

While it seems like a reasonable solution, using a nonce that is derived from txpool has its own risks. In a scenario where we send multiple transactions, we can get to a state where several of them sit in txpool at once. If one of these transactions is invalid and the node rejects it, a gap in the nonce sequence might be created. This may lead to hanging transactions that are not processed until we fill that gap. This can be very hard to discover and deal with on the client side.

Another unclear area is what happens when nodes synchronize. When nodes announce new transactions to other nodes, and the other nodes merge this new information into their own pools, what happens to the latest nonce? What happens when one transaction is sent to node A while the second one is sent to node B? The consequences are unclear.

With respect to these problems, a future solution might include a throttling mechanism that will prevent the client from sending additional transactions when specific conditions are met.

Considering our product scope for IPLv2 (abstracting complex details and giving the user a seamless experience where he or she doesn’t need to know what a nonce is), we decided not to deal with these problems until we can spend more time producing an optimal solution.
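Purely as an illustration of what such a throttle could look like, the sketch below caps the number of unconfirmed transactions a client keeps in flight. The condition (a fixed cap), the class name SendThrottle and its methods are all assumptions made for this example; we have not implemented this.

```python
class SendThrottle:
    """Hypothetical client-side throttle: limit unconfirmed transactions."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = set()  # nonces sent but not yet confirmed

    def try_send(self, nonce: int) -> bool:
        """Register `nonce` as sent, or refuse if the cap is reached."""
        if len(self.in_flight) >= self.max_in_flight:
            return False
        self.in_flight.add(nonce)
        return True

    def confirm(self, nonce: int) -> None:
        """Mark `nonce` as confirmed, freeing a slot."""
        self.in_flight.discard(nonce)


if __name__ == "__main__":
    throttle = SendThrottle(max_in_flight=2)
    print(throttle.try_send(122), throttle.try_send(123), throttle.try_send(124))
    throttle.confirm(122)
    print(throttle.try_send(124))
```

A cap of one would serialize sends entirely, which sidesteps duplicate nonces at the cost of making the user wait for each confirmation.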
