Optimistic Execution landing in the Cosmos SDK

Cosmos SDK
The Interchain Foundation
4 min read · Sep 28, 2023


Faster and more efficient block production

Optimistic Execution enables Cosmos SDK chains to build blocks with ground-breaking velocity, pushing the limits of block times and efficiency.

Optimistic Execution (OE) refers to the ability to process a block ahead of time in order to have a response ready when the consensus layer requests it. It’s optimistic because we start executing the block even though it can still be rejected at a later stage, which we expect to be rare.

In load tests on Sei Network, Optimistic Execution cut the block time by ~300ms. On fast block-producing networks like Injective, OE can help the network generate sub-1-second blocks without any extra effort.

The OE implementation has landed in the Cosmos SDK and will ship in v0.50 (the next release). Please check out Cosmos SDK’s RFC005 for more information.

Introducing Optimistic Execution

Before Optimistic Execution, a Cosmos SDK app would execute the block and its transactions in the very last step, ABCI FinalizeBlock. A FinalizeBlock call therefore runs for as long as the block takes to execute, and the application only received the complete block proposal in that last step, after all of the voting had occurred.

Cosmos SDK — ABCI++ flow without optimistic execution.

With the introduction of ABCI++, the application layer now receives the block proposal before the voting period commences. This can be used to optimistically execute the block proposal in parallel with the voting process, thus reducing the block time.

Cosmos SDK — ABCI++ flow with optimistic execution.

How fast is faster?

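Assuming the average voting period takes P and the average block execution takes Q, this would reduce the average block time by P + Q - max(P, Q), which is the same as min(P, Q): whichever of the two is shorter is hidden behind the other.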

Sei Network reported P=~600ms and Q=~300ms during a load test, meaning that optimistic execution could cut the block time by ~300ms.
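As a quick check of that arithmetic, here is a minimal Go sketch that plugs in the Sei figures. The helper name savedBlockTime is hypothetical and not part of the Cosmos SDK; it only evaluates the formula above.

```go
package main

import (
	"fmt"
	"time"
)

// savedBlockTime returns P + Q - max(P, Q), which equals min(P, Q):
// the shorter of the two phases is fully hidden behind the longer one.
// Hypothetical helper for illustration only.
func savedBlockTime(voting, execution time.Duration) time.Duration {
	longer := voting
	if execution > longer {
		longer = execution
	}
	return voting + execution - longer
}

func main() {
	p := 600 * time.Millisecond // average voting period reported by Sei
	q := 300 * time.Millisecond // average block execution reported by Sei
	fmt.Println(savedBlockTime(p, q)) // prints 300ms
}
```

Note that the saving can never exceed the voting period: if execution takes longer than voting, only the voting time is recovered.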

We hope to get better statistics as more chains start to use Optimistic Execution.

How does it work?

CometBFT v0.38 introduced ABCI++, which adds several new calls between the consensus layer and the app layer. These calls bring a whole new set of features that apps can take advantage of.

One of these new calls is ProcessProposal, which allows us to do application-dependent work on a proposed block. This means we get early access to the full contents of the proposed block.

Now that we have the block we can proceed to optimistically execute it. We do this by starting a goroutine in ProcessProposal that executes FinalizeBlock ahead of time.

Meanwhile, CometBFT starts the pre-commit round. How long this takes depends on the size of the validator set, whether the chain has vote extensions enabled, and the nodes’ network latency.

Once CometBFT calls FinalizeBlock on our app, we wait for the goroutine to finish and return the result. In the best-case scenario, the goroutine has already finished and we can return its result right away, spending practically no time in this function call.
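The happy path described above can be sketched in a few lines of Go. This is a toy model under stated assumptions, not the Cosmos SDK implementation: the app and response types and the executeBlock function are hypothetical stand-ins, and the sketch assumes the block that gets finalized is the one that was proposed, so abort handling is left out.

```go
package main

import "fmt"

// response is a hypothetical stand-in for the FinalizeBlock response.
type response struct{ txResults []string }

// app is a toy application, not the Cosmos SDK's BaseApp.
type app struct {
	done chan struct{} // closed when the optimistic execution finishes
	res  response      // written by the goroutine before done is closed
}

// executeBlock stands in for real block execution.
func executeBlock(txs []string) response {
	out := make([]string, len(txs))
	for i, tx := range txs {
		out[i] = "ok:" + tx
	}
	return response{txResults: out}
}

// ProcessProposal starts executing the proposed block in a goroutine
// and returns immediately, so voting can proceed in parallel.
func (a *app) ProcessProposal(txs []string) {
	a.done = make(chan struct{})
	go func() {
		defer close(a.done)
		a.res = executeBlock(txs)
	}()
}

// FinalizeBlock waits for the optimistic execution and returns its result.
// If the goroutine has already finished, the receive returns right away.
func (a *app) FinalizeBlock() response {
	<-a.done
	return a.res
}

func main() {
	a := &app{}
	a.ProcessProposal([]string{"tx1", "tx2"})
	// ... the consensus layer votes here while the block executes ...
	fmt.Println(a.FinalizeBlock().txResults) // prints [ok:tx1 ok:tx2]
}
```

Closing the done channel after writing the result is what makes it safe for FinalizeBlock to read the result without a lock.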

If the block is rejected by the majority after we have already started the Optimistic Execution, we abort, discard the results, and start over.

When does an abort happen? How does it work?

We call for abort in two places:

  • Anytime ProcessProposal is called we abort any running OE process unconditionally.
  • In FinalizeBlock we check whether the hash in the current request (RequestFinalizeBlock) matches the hash of the block being executed in the running OE. If they don’t match, we abort and discard the result.

In both cases, the app will continue to work as expected without errors or panics.
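The two abort rules can be written down as a pair of small predicates. The sketch below is illustrative only: the running type and both function names are hypothetical, and the real implementation also has to signal the goroutine and wait for it, which is omitted here.

```go
package main

import (
	"bytes"
	"fmt"
)

// running describes the optimistic execution in flight, if any.
// Hypothetical type for illustration.
type running struct {
	hash []byte // hash of the block being executed optimistically
}

// abortOnProcessProposal: a new proposal always aborts whatever is running.
func abortOnProcessProposal(r *running) bool {
	return r != nil
}

// abortOnFinalizeBlock: abort only when the block being finalized differs
// from the block being executed optimistically.
func abortOnFinalizeBlock(r *running, requestHash []byte) bool {
	return r != nil && !bytes.Equal(r.hash, requestHash)
}

func main() {
	r := &running{hash: []byte("block-A")}
	fmt.Println(abortOnProcessProposal(r))                    // prints true
	fmt.Println(abortOnFinalizeBlock(r, []byte("block-A")))   // prints false
	fmt.Println(abortOnFinalizeBlock(r, []byte("block-B")))   // prints true
	fmt.Println(abortOnFinalizeBlock(nil, []byte("block-A"))) // prints false
}
```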

When an abort happens, the function that triggers it may incur some added delay, because we wait for the aborted execution to complete (more about this in the Future improvements section). If the abort happens in FinalizeBlock, we then process the block synchronously, as was done before OE was implemented.

Note: These should be taken as exceptional cases, and we don’t expect this to happen often.

Worst case scenario

The worst-case scenario is when a node reaches FinalizeBlock while executing the wrong block. In this case the app will abort the running OE and start over with the received block, resulting in a significantly higher processing time than for previous blocks.

Future improvements

The team is actively exploring improvements to the Optimistic Execution implementation. A primary focus is reducing the time it takes to abort a running execution.

As explained in the previous sections, before starting a new OE goroutine we need to abort the previous one and wait for it to complete. The team will study several ideas to reduce this wait time as much as possible:

  • Make modules aware of a cancellable context, so when the execution is aborted any transaction being executed can return ASAP. Currently, execution can only stop in between transactions.
  • Cache all the necessary context for block execution, which will enable the app to start a new OE goroutine without having to wait for the previous (aborted) one to finish.
  • Better handling and abort of the goroutine (no mutex, only channels).
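To illustrate why the first idea would shorten the abort wait, the Go sketch below compares checking for cancellation only between transactions with checking inside each transaction. It is a toy model, not a proposed SDK design: transactions are modelled as a number of steps, and runBetweenTxs, runCancellable and cancelAfter are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
)

// runBetweenTxs checks for cancellation only between transactions,
// so a transaction that has started always runs all of its steps.
// Each entry in txs is the number of steps in that transaction.
func runBetweenTxs(ctx context.Context, txs []int, step func()) int {
	done := 0
	for _, steps := range txs {
		if ctx.Err() != nil {
			return done
		}
		for i := 0; i < steps; i++ {
			step()
			done++
		}
	}
	return done
}

// runCancellable checks for cancellation before every step, which is
// what a context-aware module could do inside a transaction.
func runCancellable(ctx context.Context, txs []int, step func()) int {
	done := 0
	for _, steps := range txs {
		for i := 0; i < steps; i++ {
			if ctx.Err() != nil {
				return done
			}
			step()
			done++
		}
	}
	return done
}

// cancelAfter returns a context and a step function that cancels the
// context during the n-th step, simulating an abort mid-transaction.
func cancelAfter(n int) (context.Context, func()) {
	ctx, cancel := context.WithCancel(context.Background())
	count := 0
	return ctx, func() {
		count++
		if count == n {
			cancel()
		}
	}
}

func main() {
	txs := []int{5, 5} // two transactions of five steps each

	ctx, step := cancelAfter(3)
	fmt.Println(runBetweenTxs(ctx, txs, step)) // prints 5

	ctx, step = cancelAfter(3)
	fmt.Println(runCancellable(ctx, txs, step)) // prints 3
}
```

With the abort arriving during the third step, the first variant still finishes the whole first transaction, while the second stops as soon as the next step is about to start.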

Contribute to the discussion around the improvements to Optimistic Execution in the Cosmos SDK.


Cosmos SDK
The Interchain Foundation

The world's most popular framework for building application-specific blockchains. https://github.com/cosmos/cosmos-sdk