Tendermint Core State Sync for Developers
Tendermint Core 0.34 introduces support for state sync. This allows a new node to join a network by fetching a snapshot of the application state at a recent height instead of fetching and replaying all historical blocks. Since the application state is generally much smaller than the blocks, and restoring it is much faster than replaying blocks, this can reduce the time to sync with the network from days to minutes.
This article will provide a brief overview of the Tendermint state sync protocol for application developers. For more details, please refer to the ABCI application guide and the ABCI reference documentation.
Cosmos SDK 0.40 will include automatic support for state sync, so developers using it will not need to implement the state sync protocol described in this article themselves. A separate blog post will also be published describing how node operators can make use of state sync to sync new nodes with Cosmos SDK applications.
State sync snapshots
A guiding principle while designing Tendermint state sync was to give applications as much flexibility as possible. As such, Tendermint does not care what snapshots contain, how they are taken, or how they are restored. It is only concerned with discovering existing snapshots in the network, fetching them, and passing them to the application via ABCI. Tendermint uses light client verification to check the final app hash of a restored application against the chain app hash, but any further verification must be done by the application itself during restoration, if necessary.
Snapshots consist of a set of binary chunks in an arbitrary format. Chunks cannot be larger than 16 MB, but otherwise there are no restrictions on them. Snapshot metadata, as exchanged via ABCI and P2P, contains the following fields:
- Height (uint64): the height at which the snapshot was taken.
- Format (uint32): an arbitrary application-specific format identifier (e.g. version).
- Chunks (uint32): the number of binary chunks in the snapshot.
- Hash (bytes): an arbitrary snapshot hash for comparing snapshots across nodes.
- Metadata (bytes): arbitrary binary snapshot metadata for use by applications.
The format field allows applications to change their snapshot format in a backwards-compatible manner, by providing snapshots in multiple formats, and choosing which formats to accept during restoration. This is useful e.g. when changing serialization or compression formats, as nodes may be able to provide snapshots to peers running older versions, or make use of old snapshots when starting up with a newer version.
The hash field contains an arbitrary snapshot hash. Snapshots that have identical metadata fields (including hash) across nodes are considered identical, and chunks will be fetched from any of these nodes. The hash cannot be trusted, and is not verified by Tendermint itself, but this guards against inadvertent nondeterminism in snapshot generation, and may be verified by the application if desired.
The metadata field can contain any arbitrary metadata needed by the application. For example, the application may want to include chunk checksums to discard damaged chunks, or Merkle proofs to verify each chunk individually against the chain app hash. In Protobuf-encoded form, snapshot metadata messages cannot exceed 4 MB.
Taking and serving snapshots
To enable state sync, some nodes in the network must take and serve snapshots. When a peer is attempting to state sync, an existing Tendermint node will call the following ABCI methods on the application to provide snapshot data to this peer:
- ListSnapshots: returns a list of available snapshots, with metadata.
- LoadSnapshotChunk: returns binary chunk data.
Snapshots should typically be generated at regular intervals rather than on-demand: this improves state sync performance since snapshot generation can be slow, and avoids a denial-of-service vector where an adversary floods a node with such requests. Older snapshots can usually be removed, but it may be useful to keep at least the two most recent to avoid deleting the previous snapshot while a node is restoring it.
It is entirely up to the application how to take snapshots, but it should strive to satisfy the following guarantees:
- Asynchronous: snapshotting should not halt block processing, and it should therefore happen asynchronously, e.g. in a separate thread.
- Consistent: a snapshot should be taken at a single isolated height, and should not be affected by concurrent writes e.g. due to block processing in the main thread.
- Deterministic: snapshot chunks and metadata should be identical (at the byte level) across all nodes for a given height and format, to ensure good availability of chunks.
As an example, this can be implemented as follows:
- Use a data store that supports transactions with snapshot isolation, such as RocksDB or BadgerDB.
- Start a read-only database transaction in the main thread after committing a block.
- Pass the database transaction handle into a newly spawned thread.
- Iterate over all data items in a deterministic order (e.g. sorted by key).
- Serialize data items using e.g. Protobuf, and write them to a byte stream.
- Hash the byte stream, and split it into fixed-size chunks of e.g. 10 MB.
- Store the chunks in the file system as separate files.
- Write the snapshot metadata to a database or file, including the byte stream hash.
- Close the database transaction and exit the thread.
Applications may want to take additional steps as well, such as compressing the data, checksumming chunks, generating proofs for incremental verification, and removing old snapshots.
To state sync a new Tendermint node, the operator must enable state sync in the node configuration, and provide a set of RPC servers for light client verification along with a trusted block height and hash, and a trusting period during which validators can be punished for misbehavior (e.g. 2 weeks for the Cosmos Hub):
enable = true
rpc_servers = “rpc.a.com:26657,rpc.b.org:26657”
trust_height = 2568653
trust_hash = “D915BA86676F490AE93E8988EA5B6CD0A5FC473CE3484E912FF3041FD5753D30”
trust_period = “336h”
The trusted hash must be obtained from a trusted source, e.g. a block explorer, but the RPC servers do not need to be trusted. Tendermint will use this to obtain trusted app hashes from the blockchain in order to verify restored application snapshots. This app hash and the corresponding height are the only pieces of information that can be trusted when restoring snapshots, everything else can be forged by adversaries.
When Tendermint starts up, it will check whether the local node has any state (i.e. LastBlockHeight is 0), and if it doesn’t then it will begin discovering snapshots via the P2P network. These snapshots will be provided to the local application via the following ABCI calls:
- OfferSnapshot(snapshot, apphash): offers a discovered snapshot to the application.
- ApplySnapshotChunk(index, chunk, sender): applies a snapshot chunk.
Discovered snapshots are offered to the application and it can respond by e.g. accepting the snapshot, rejecting it, rejecting the format, rejecting the senders, aborting state sync, and so on.
Once a snapshot is accepted, Tendermint will fetch chunks from across available peers, and apply them sequentially to the application, which can choose to accept the chunk, refetch it, reject the snapshot, reject the sender, abort state sync, and so on.
Once all chunks have been applied, Tendermint will call the Info ABCI method on the application, and check that the app hash and height correspond to the trusted values from the chain. It will then switch to fast sync to fetch any remaining blocks (if enabled), before finally joining normal consensus operation.
How snapshots are actually restored is entirely up to the application, but will generally be the inverse of how they are generated. Note, however, that Tendermint only verifies snapshots after all chunks have been restored, and does not reject any P2P peers on its own. As long as the trusted hash and application code is correct it is not possible for an adversary to cause a state synced node to have incorrect state when joining consensus, but it is up to the application to counteract state sync denial-of-service e.g. by implementing incremental verification and rejecting invalid peers.
Note that state synced nodes will have a truncated block history starting at the height of the restored snapshot, and there is currently no block backfill (although this may be added in the future). Networks should consider the broader implications of this, and may want to make sure at least a few archive nodes retain the complete block history, e.g. for auditability and backup.
State sync will greatly improve the experience of joining a network, reducing the time required to sync a node by several orders of magnitude, and affording developers significant flexibility in how to implement it for their own applications. We hope this article has been useful in outlining how to implement state sync, and recommend reading the ABCI application guide and ABCI reference for more details.