Towards Tendermint Core 1.0

Our roadmap for the next 18 months

Tess Rinearson
Tendermint Blog
Jan 28, 2021

Tendermint Core is nearly seven years old, and is used to secure billions of dollars across several significant blockchain ecosystems. And yet it’s still considered pre-1.0 software.

Our goal is to fix that.

Although pre-1.0 numbering has allowed core development teams a lot of flexibility, it’s become obvious that this isn’t fair to our users. We can’t use version numbers to signal breaking changes, and our interfaces and encoding systems have often shifted out from underneath our users.

Tendermint Core 1.0’s chief concern is stability. It must be stable enough that people can (continue to) build production software on it confidently. This requires an increased level of clarity and rigidity around all the interfaces exposed by Tendermint Core. It also requires us to build confidence in Tendermint Core’s performance and correctness through investments in performance-oriented refactors and increased, more sophisticated testing.

This post outlines the things we consider necessary for a 1.0 release. However, please be aware that “1.0” has a specific meaning here. Although people sometimes expect software to be feature complete or otherwise “done” when it reaches 1.0, that is not our intention with bringing Tendermint Core to 1.0. Rather, the goal is simply stability.

Releasing Tendermint Core 1.0 means that we are providing stronger stability guarantees to our users. Reaching 1.0 also means that we can rely more easily on conventions like semantic versioning to help communicate and enforce our stability guarantees.

When thinking about the requirements for 1.0, we began to think of new features and improvements as a kind of Maslow’s hierarchy combined with a food pyramid, with the most substantial and most boring goals at the bottom, and the shinier and smaller changes at the top. Something like this:

[Figure: the roadmap pyramid. Caption: Don’t read too much into this.]

We’re likely to break these changes down into a pair of releases: 0.35 and 1.0. There are more notes on how that might happen at the end of the post. Also: As of January 2021, some of these projects are already underway, or even largely complete. I’ve marked those projects with an asterisk.

Interface stability 🍞

API stability

The goal here is to provide Tendermint Core users with guarantees about the ways that publicly exposed APIs will (and won’t) change during future minor releases. Once we’ve done this, we’ll also finally have a versioning scheme that actually conforms to semantic versioning!

To do this, we’ll define deliberate public interfaces across the Go, RPC, and P2P APIs, and deprecate and/or unexpose all other interfaces, perhaps through the use of internal packages. This gives us the flexibility to change those interfaces in the future, without necessitating a major release.
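To make the semantic versioning point concrete, here is a minimal sketch of the rule a downstream user could apply once the scheme holds. The `Version` type and `MayBreak` function are hypothetical names for illustration, not part of Tendermint Core, and pre-release tags are ignored.

```go
package main

import "fmt"

// Version is a hypothetical, simplified semantic version (no pre-release tags).
type Version struct {
	Major, Minor, Patch int
}

// MayBreak reports whether an upgrade from v to next is allowed to contain
// breaking changes under semantic versioning.
func MayBreak(v, next Version) bool {
	if v.Major == 0 || next.Major == 0 {
		// Semver makes no stability promises for 0.y.z releases.
		return true
	}
	// From 1.0.0 onward, only a major version bump may break users.
	return next.Major != v.Major
}

func main() {
	fmt.Println(MayBreak(Version{0, 34, 0}, Version{0, 35, 0}))
	fmt.Println(MayBreak(Version{1, 0, 0}, Version{1, 1, 0}))
	fmt.Println(MayBreak(Version{1, 4, 2}, Version{2, 0, 0}))
}
```

Before 1.0 every upgrade falls into the first branch, which is why version numbers alone cannot currently signal a breaking change.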

ABCI updates

Tendermint Core 1.0 needs to have a stable Application-Blockchain Interface (ABCI) which won’t shift out from underneath applications that rely on it. We expect we’ll work closely with Sikka to incorporate the work from their ABCI++ proposal.

Details of Sikka’s proposal for ABCI++

gRPC interface for `privval`*

The interface between Tendermint and privval (“private validators,” local implementations of validators that can perform signatures) should use gRPC.

Evaluate gRPC, and possibly adopt for RPC layer

We’ve explored a variety of options for our RPC layer, ranging from jsonrpc to gRPC. There’s been some enthusiasm from users around gRPC, because it gives us some nice things “for free,” but it also comes with tradeoffs in complexity and can require additional tooling.

As part of our 1.0 push, we’ll determine if gRPC is the right framework for our RPC layer; and if so, we’ll implement it. This work will begin with an RFC, and we’ll seek further input from community members and users. If this RFC is accepted, we’ll write a transition plan for the RPC layer and execute it.

Don’t panic

Tendermint Core is currently pretty liberal with its use of panics, which, in the worst case, can cause a node to crash. Through scout coding, we’ve worked to replace panics with errors that are passed back to callers; unfortunately, this can mean changing Go APIs as function signatures add errors as return parameters.

Before releasing Tendermint Core 1.0, we’ll make a concerted effort, likely including an internal audit, to root out any remaining inappropriate panics before stabilizing the API.
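As an illustration of why this work touches the Go API, here is a minimal sketch of the same check written both ways. The function names and the `ErrEmptyBlock` error are hypothetical and do not come from the Tendermint Core codebase.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrEmptyBlock is a hypothetical sentinel error used only in this sketch.
var ErrEmptyBlock = errors.New("block has no transactions")

// firstTxPanicky shows the old style: bad input crashes the whole process
// unless some caller recovers.
func firstTxPanicky(txs [][]byte) []byte {
	if len(txs) == 0 {
		panic("block has no transactions")
	}
	return txs[0]
}

// firstTx shows the new style. The signature gains an error return value,
// which is a breaking change for every existing caller.
func firstTx(txs [][]byte) ([]byte, error) {
	if len(txs) == 0 {
		return nil, ErrEmptyBlock
	}
	return txs[0], nil
}

func main() {
	if _, err := firstTx(nil); err != nil {
		fmt.Println("handled:", err)
	}
	tx, _ := firstTx([][]byte{[]byte("a=1")})
	fmt.Printf("%s\n", tx)
}
```

Because the second form changes the function signature, it makes sense to finish this kind of cleanup before the Go API is declared stable.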

Versioned documentation*

Different versions of Tendermint Core have different documentation for their APIs, their specifications, and more. One of our priorities this year is to ensure that documentation and specifications are accurately versioned and accessible.

Block protocol stability 🧀

The 1.0 release will be the last opportunity to make block protocol-breaking changes until the next major release (i.e., 2.0). We will take care to evaluate and include any high-to-medium priority block protocol-breaking changes in releases up to 1.0.

Soft chain upgrades*

Historically, Tendermint-based chains must hard fork in order to upgrade to a new block protocol-breaking version. This can be a painful process: All nodes must coordinate a halt height/time, where application state is exported into a “new” chain. In theory, this means that networks will undergo a complicated coordinated process, and lose transaction history on the new chain. In practice, this means that many networks instead choose to forgo new features for long periods of time.

Tendermint Core 1.0 will reduce the requirement for such hard forks, chiefly by expanding the kinds of changes that can be accomplished through soft forks.

Versioned, unified protocol specification*

The Tendermint specification (at https://github.com/tendermint/spec) is shared between the Go and Rust implementations, and at times, it’s divergent. Before any kind of Tendermint 1.0, we’ll have a spec that is cohesive, comprehensive, unified, and versioned to match a specific version of the Tendermint Core implementation in Go. It will likely also undergo release processes, as something of an upstream dependency of the implementations.

Adopt ZIP 215*

Different Ed25519 implementations verify signatures in subtly different ways. This will pose a problem for future implementations of Tendermint which may use different Ed25519 libraries. ZIP 215, from the Zcash Foundation team, “settles the situation by explicitly defining the Ed25519 validity criteria and changing them to be compatible with batch validation.”

By the time we release Tendermint Core 1.0, it will use ed25519-zebra for signature validation.

Evaluate proposer-based BFT timestamps, and possibly implement

Tendermint has a long history with timestamps. Initially, Tendermint used proposer-based non-BFT time: Blocks uncritically used the timestamp of the node that proposed said block. Currently, Tendermint uses median BFT time: The timestamp of a block is the median of the timestamps provided by all the validators that voted for it. Ultimately, we suspect we’d like to return to proposer-based timestamps, for advantages in both signature aggregation and IBC security. But this time, we’d like that timestamp to be Byzantine fault tolerant.

This will be a collaboration with the research team at Informal Systems, who will begin with a proposed specification change for proposer-based BFT timestamps.

Consider reducing validator address redundancy, and possibly implement

Validator addresses are currently included in blocks in two places: Votes and CommitSigs. This is redundant, and increases block size; but it’s also nice for debugging. For Tendermint 1.0, we’ll need to make a conclusive decision about whether or not to keep this redundancy.

Evaluate immediate execution, and possibly implement

The team at LazyLedger has pointed out that Tendermint’s “delayed execution” model is a prime breeding ground for off-by-one errors, and that it makes certain kinds of fee models difficult or impossible to implement. We would like to evaluate the possibility of transitioning to an “immediate execution” model wherein the AppHash included in blocks points to the current state rather than the last state.

This change would be hugely breaking, and it will likely be an exploratory collaboration between the LazyLedger and Tendermint Core teams. Similar advantages may also be covered by the ABCI++ proposal mentioned above.
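The difference between the two models can be sketched with a toy chain. Everything here is hypothetical: `applyBlock` stands in for the application's state transition, and `headerAppHashes` only shows which state hash each block header would carry under each model.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// applyBlock is a toy state transition: the new app hash commits to the
// previous app hash and to the block's transactions.
func applyBlock(prev [32]byte, txs []string) [32]byte {
	h := sha256.New()
	h.Write(prev[:])
	for _, tx := range txs {
		h.Write([]byte(tx))
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// headerAppHashes returns the AppHash that each block header would carry.
// With immediate=false (delayed execution), header i carries the state hash
// from before block i's transactions ran. With immediate=true, it carries
// the hash from after they ran.
func headerAppHashes(genesis [32]byte, blocks [][]string, immediate bool) [][32]byte {
	hashes := make([][32]byte, len(blocks))
	state := genesis
	for i, txs := range blocks {
		next := applyBlock(state, txs)
		if immediate {
			hashes[i] = next
		} else {
			hashes[i] = state
		}
		state = next
	}
	return hashes
}

func main() {
	var genesis [32]byte
	blocks := [][]string{{"a=1"}, {"b=2"}, {"c=3"}}
	delayed := headerAppHashes(genesis, blocks, false)
	immediate := headerAppHashes(genesis, blocks, true)
	// Under delayed execution, every header lags one block behind.
	for i := 0; i+1 < len(blocks); i++ {
		fmt.Println(delayed[i+1] == immediate[i])
	}
}
```

The one-block lag in the delayed model is where the off-by-one errors tend to come from: anything that reads a header has to remember that its AppHash describes the previous block's result.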

Consider custom block header data, and possibly implement

The LazyLedger team has also requested a way to include custom information in block headers. We think this is probably useful for other teams, as well, and we’ll evaluate the possibility of adding a new metadata field to block headers, along with a new block preprocessing phase that allows applications to write to this field. This will almost certainly overlap with the ABCI++ proposal.

Performance 🍎

Redesign and refactor mempool

Last year, the community ran an incentivized testnet called Game of Zones. This was an excellent preview of the kinds of performance demands that Tendermint-based networks will see as they scale, and we learned that the current Tendermint mempool will suffer if you put it under that kind of load.

As various Tendermint-based networks grow, we’ll need a more sophisticated mempool that can load shed and prioritize transactions. We’ll also be seeking feedback from users to ensure that the new mempool can support a wider and more sophisticated range of application needs.
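As a rough illustration of "load shed and prioritize," here is a minimal sketch of a bounded pool that evicts its lowest-priority transaction when full. It is not the design of the new mempool, which had not been written at the time of this post; the `Tx` and `Mempool` types and the idea of an application-assigned integer priority are assumptions of this example.

```go
package main

import (
	"container/heap"
	"fmt"
	"sort"
)

// Tx is a hypothetical transaction with an application-assigned priority.
type Tx struct {
	ID       string
	Priority int64
}

// txHeap is a min-heap on Priority, so the cheapest transaction to drop is
// always at index 0.
type txHeap []Tx

func (h txHeap) Len() int            { return len(h) }
func (h txHeap) Less(i, j int) bool  { return h[i].Priority < h[j].Priority }
func (h txHeap) Swap(i, j int)       { h[i], h[j] = h[j], h[i] }
func (h *txHeap) Push(x interface{}) { *h = append(*h, x.(Tx)) }
func (h *txHeap) Pop() interface{} {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// Mempool holds at most capacity transactions.
type Mempool struct {
	capacity int
	txs      txHeap
}

// NewMempool returns an empty pool with the given capacity.
func NewMempool(capacity int) *Mempool { return &Mempool{capacity: capacity} }

// Add inserts tx and reports whether it was accepted. When the pool is full
// it sheds load: the lowest-priority transaction is evicted if tx outranks
// it, otherwise tx itself is rejected.
func (m *Mempool) Add(tx Tx) bool {
	if m.capacity <= 0 {
		return false
	}
	if len(m.txs) < m.capacity {
		heap.Push(&m.txs, tx)
		return true
	}
	if tx.Priority <= m.txs[0].Priority {
		return false
	}
	m.txs[0] = tx
	heap.Fix(&m.txs, 0)
	return true
}

// Reap returns the pooled transactions, highest priority first.
func (m *Mempool) Reap() []Tx {
	out := append([]Tx(nil), m.txs...)
	sort.SliceStable(out, func(i, j int) bool { return out[i].Priority > out[j].Priority })
	return out
}

func main() {
	mp := NewMempool(2)
	for _, tx := range []Tx{{"a", 1}, {"b", 5}, {"c", 1}, {"d", 9}} {
		fmt.Println(tx.ID, mp.Add(tx))
	}
	fmt.Println(mp.Reap())
}
```

A real mempool would also need to handle ordering between transactions from the same sender, expiry, and rechecking after each block; this sketch only shows the eviction rule.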

Note that we’ve discussed the possibility of creating a “pluggable” mempool, and we’ve decided not to include this in the roadmap for Tendermint Core 1.0: We’ll have more confidence in the requisite public interfaces after the 1.0 stability push.

Order-preserving database keys*

The block store uses an alphabetical ordering for keys, which makes it impossible to do efficient range scans. We’ll need to change this encoding to something that uses numerical ordering instead.
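Here is a minimal sketch of the problem and one possible fix. The `H:` key prefix and both encoder functions are assumptions of this example, not the block store's real key layout. The fix shown is a fixed-width big-endian integer, so that comparing keys byte by byte agrees with comparing heights numerically.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"sort"
)

// textKey embeds the height as decimal text. The "H:" prefix is hypothetical.
func textKey(height uint64) []byte {
	return []byte(fmt.Sprintf("H:%d", height))
}

// orderedKey embeds the height as 8 big-endian bytes, so byte-wise
// comparison of keys matches numeric comparison of heights.
func orderedKey(height uint64) []byte {
	key := make([]byte, 10)
	copy(key, "H:")
	binary.BigEndian.PutUint64(key[2:], height)
	return key
}

// sortByKey returns the heights in the order a key-value store would
// iterate them, that is, sorted by the bytes of their encoded keys.
func sortByKey(heights []uint64, encode func(uint64) []byte) []uint64 {
	out := append([]uint64(nil), heights...)
	sort.Slice(out, func(i, j int) bool {
		return bytes.Compare(encode(out[i]), encode(out[j])) < 0
	})
	return out
}

func main() {
	heights := []uint64{9, 10, 100, 2}
	fmt.Println(sortByKey(heights, textKey))    // [10 100 2 9]
	fmt.Println(sortByKey(heights, orderedKey)) // [2 9 10 100]
}
```

With the text encoding, a scan over heights 2 through 10 cannot be served by one contiguous iteration, because those keys are not adjacent in the store.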

Benchmarks, scalability and performance measurements

Did performance really improve if it was never measured?

As part of our performance push, we’ll need to more aggressively track and test our performance as networks scale to large numbers of transactions, blocks, and validators. This will help us build an awareness of the bottlenecks in Tendermint Core, and prevent performance regressions in future versions of Tendermint. We’ll also work with high-load networks to ensure that our software can handle their load.

Correctness 🥬

End-to-end testing*

We’d like to have an end-to-end testing suite that runs automatically against master every night and exercises a range of permutations: configuration and genesis parameters, network topologies, Byzantine behaviors, and chaos testing.
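One way to picture "configuration permutations" is as a Cartesian product over test dimensions. The sketch below is only an illustration under that assumption; the `Option` type, the `Permutations` function, and the dimension names and values are hypothetical and are not taken from the actual test suite.

```go
package main

import "fmt"

// Option is one hypothetical test dimension and the values it can take.
type Option struct {
	Name   string
	Values []string
}

// Permutations returns the Cartesian product of all option values, as one
// map per generated testnet configuration.
func Permutations(opts []Option) []map[string]string {
	configs := []map[string]string{{}}
	for _, opt := range opts {
		var next []map[string]string
		for _, cfg := range configs {
			for _, v := range opt.Values {
				// Copy the partial config before extending it.
				c := make(map[string]string, len(cfg)+1)
				for k, val := range cfg {
					c[k] = val
				}
				c[opt.Name] = v
				next = append(next, c)
			}
		}
		configs = next
	}
	return configs
}

func main() {
	opts := []Option{
		{"topology", []string{"single", "quad", "large"}},
		{"database", []string{"goleveldb", "badgerdb"}},
		{"misbehavior", []string{"none", "double-prevote"}},
	}
	for _, cfg := range Permutations(opts) {
		fmt.Println(cfg)
	}
}
```

The full product grows quickly as dimensions are added, so a nightly run would probably sample from it rather than run every combination.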

Advanced testing techniques*

We’d like to take advantage of the recent advances in distributed systems testing and verification, and use these tools to build more confidence in the correctness of our Tendermint implementation. We’d also like to start taking advantage of the work that the Informal Systems research team has done with formal verification, conformance testing, and model-based testing. Ultimately, our advanced testing could take the form of model-based testing, Jepsen tests, Elle tests, Twins tests, or some combination of the above.

A recent talk from the Informal Systems team about model-based testing

Security 🥩

Evidence handling follow-ups

Tendermint v0.34 introduced a new way of handling evidence. As a follow-up to this release, we want to “harden” the evidence flow, including increased end-to-end and unit testing for evidence. We’ll also almost certainly conduct an audit with the research team at Informal Systems, similar to the one that was conducted for IBC.

Additionally, we’ll want to look more closely at amnesia evidence: We shipped the new light client without automatic handling for amnesia evidence from light clients. That is, although light clients can identify and record evidence of an attempted amnesia attack, that evidence doesn’t actually identify the misbehaving actor(s). An upcoming release will examine these attacks more carefully and find a way to pass sufficient information to applications, such that they can follow some protocol to hold misbehaving nodes accountable for amnesia attacks.

Cross-upgrade accountability

When doing a chain upgrade, evidence from an old chain should be persisted so that nodes can be held accountable for misbehavior prior to the upgrade. We’ll begin by assessing the current limitations around accountability, work with the IBC team to understand cross-chain implications, and write and circulate an RFC.

Peer-to-peer refactor*

A number of security issues have cropped up lately around vectors within the peer-to-peer (P2P) layer. Refactoring the P2P layer has been a “wishlist” item for a long time: The highly entangled nature of the various P2P components makes it difficult to reason about or safely change, and these security issues help provide more urgency around tackling this.

Our goal is a simpler P2P layer with a more straightforward architecture that can be more easily patched, and with fewer DoS vectors and other security vulnerabilities.

For the 1.0 release, this refactor has been carefully scoped to internal improvements and new transport support; we do not plan to make disruptive protocol changes before 1.0 and certainly not before the foundational changes are made.

Usability improvements 🍩

Although many of the aforementioned projects will improve user experience, there are a few features we’re considering expressly to smooth out the usability.

Light client usability improvements

It’s not always straightforward to start a new light client: Users have to find P2P endpoints and node IP addresses for witnesses, as well as trusted headers. We plan to help people do this in a trust-minimized (i.e., decentralized) way.

Although these features haven’t been fully designed or scoped yet, we know that we want to make it incredibly easy to start a new light client, and that this will be important for Tendermint Core 1.0.

Blockchain reactor cleanup

We currently have three versions of the blockchain reactor (v0, v1, v2), all of which are responsible for fetching and applying blocks. This is confusing to users, who don’t always know which version to use. We would like to get everyone on a single version, most likely v2, which has been designed so that it can be formally verified. Ultimately, we’d like to have a single, well-tested, formally verifiable reactor that performs the blockchain reactor functionality. Perhaps it will even have a more obvious name than “blockchain”!

Appendix

Release schedule

There’s likely more here than can fit into a single release, so we envision a split that looks something like the following. Note that the larger features are sometimes themselves split across the releases, often into planning/design phases, where decisions can be made, and then implementation phases.

First Release (0.35)

  • Versioned spec
  • Versioned documentation
  • Versioned protocols and data types
  • Documentation outlining the plans for API stability and ABCI updates
  • RFCs written for proposed ABCI updates
  • gRPC interface for privval
  • Decision (RFC and ADR) for gRPC as our RPC layer
  • Soft upgrades planning
  • Evidence handling follow-ups
  • E2E testing
  • Immediate execution (evaluation)
  • Custom block header data (evaluation)
  • Advanced testing techniques
  • P2P refactor design phase 1
  • Deprecate blockchain reactors v0 and v1 (make v2 the default)

Second Release (1.0)

  • Proposer-based timestamps
  • Mempool redesign and refactor
  • Validator address redundancy reduction
  • Stabilized APIs (Go, P2P, RPC)
  • ABCI++ / stabilized ABCI
  • “Don’t panic”
  • Light client usability improvements
  • gRPC layer implemented (follow-up from decision made during 0.35 cycle)
  • P2P refactor (follow-up from designs written during 0.35 cycle)
  • Remove blockchain reactors v0 and v1

Excluded features

A good plan is as much about what you won’t do, as it is about what you will. To that end, we’ve decided to postpone or drop the following features:

  • Consensus reactor refactor (see ADR-030), postponed
  • Signature aggregation (BLS signatures), postponed
  • Data storage improvements (see #4567 and #4630), postponed
  • Vote proposal signing follow-ups (see #2546), postponed
  • Canary testing, postponed
  • Randomized proposer selection, postponed
  • Custom indexers, excluded
  • Validator key rotation, excluded, and can be handled at the application level

Psst. If you made it this far, you’re probably pretty interested in Tendermint Core. If you’d like to work on it full time, we’re hiring.

Tess Rinearson, VP of Engineering, Tendermint Core. (Previously: @Chain, @Medium.)