Validator’s Note 13 — Following Celestia’s Data Flow

Youngbin Park
DSRV
May 15, 2023

Disclaimer: This article is for informational purposes only and should not be taken as financial advice. No information contained within this article is a recommendation to invest in any of the assets mentioned. All investors are advised to thoroughly conduct their own research before making any financial decisions.

Guess what? Celestia is currently running its incentivized testnet program, the Blockspace Race! It has been running since March 7th, with Phase 5 currently underway. DSRV started out participating as a validator, but since Celestia employs various types of nodes, we decided to deploy our own rollup and run a light node as well. This let us trace how user data travels from validator nodes through bridge nodes to light nodes as a means of ensuring data availability.

To preface

Celestia is a modular blockchain that takes on the roles of the Consensus and DA layers, storing data from execution layers like rollups and guaranteeing its availability. Modular blockchains separate each function performed by the chain into individual layers, unlike monolithic chains (like Ethereum) where a single blockchain takes care of everything. The modular stack divides into four parts:

  1. Execution layer that executes all valid transactions
  2. Settlement layer that verifies the proof of transactions executed on the execution layer
  3. Consensus layer that confirms the order of transactions
  4. Data Availability (DA) layer that guarantees the availability of executed transaction data
Source: https://docs.celestia.org/concepts/how-celestia-works/monolithic-vs-modular/

As a result, Celestia is composed of two networks with different types of nodes. The Consensus layer is a PoS network based on the Cosmos SDK, and employs validator nodes and consensus full nodes (RPC nodes). Bridge nodes connect the Consensus layer and the DA layer, while the DA layer has light nodes that perform DAS (Data Availability Sampling), as well as full storage nodes (more on DAS below).

Let’s dive into the data flow

1. Deploying a Sovereign Rollup

Celestia has a tool called Rollkit, which essentially allows anyone to deploy their own Sovereign Rollup. A Sovereign Rollup is an independent rollup that combines the Execution and Settlement layers, i.e., the rollup executes its own transactions and verifies its own proofs, while relying on the DA layer to guarantee data availability.

To better understand this, DSRV used Ignite CLI and Rollkit to deploy a Sovereign Rollup chain that uses Celestia as the DA layer. This rollup provides a random value based on block time, validator hash, and app hash. To store its data, the rollup periodically sends a transaction called PayForBlob to Celestia, as the node logs below show:

9:01AM INF successfully submitted Rollkit block to DA layer daHeight=491158 module=BlockManager rollkitHeight=254
9:02AM INF successfully submitted Rollkit block to DA layer daHeight=491161 module=BlockManager rollkitHeight=255
9:02AM INF successfully submitted Rollkit block to DA layer daHeight=491164 module=BlockManager rollkitHeight=256
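
For reference, a blob can also be submitted by hand. At the time of writing, celestia-node exposes a REST gateway on port 26659 (the same port queried later in this post), and a PayForBlob can be posted to its submit_pfb endpoint. The Go sketch below assumes that endpoint and its field names from the era's docs, so treat it as illustrative rather than canonical:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// submitPFB posts a PayForBlob request to a local celestia-node gateway.
// The endpoint and field names are assumptions based on the gateway API
// available during the Blockspace Race; adjust them to your node version.
func submitPFB(namespaceID, dataHex string) error {
	body, err := json.Marshal(map[string]any{
		"namespace_id": namespaceID, // 8-byte namespace, hex-encoded
		"data":         dataHex,     // blob payload, hex-encoded
		"gas_limit":    80000,
		"fee":          2000,
	})
	if err != nil {
		return err
	}
	resp, err := http.Post("http://localhost:26659/submit_pfb", "application/json", bytes.NewReader(body))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	out, _ := io.ReadAll(resp.Body)
	fmt.Println(string(out)) // response includes the tx hash and inclusion height
	return nil
}

func main() {
	if err := submitPFB("5c2c1304e6caea61", "deadbeef"); err != nil {
		panic(err)
	}
}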

Because Celestia is a DA layer shared by multiple rollups, each rollup must be able to distinguish and retrieve its own data from everything being stored. To do this, Celestia adds a field called NamespaceID to its Merkle trees. Rollups send PayForBlob transactions tagged with their NamespaceID and can later retrieve their stored data from Celestia using this identifier.

Source: https://celestia-rollup-explorer.bharvest.io/rollups/5c2c1304e6caea61
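
Conceptually, a Namespaced Merkle Tree (NMT) is an ordinary Merkle tree whose leaves are sorted by namespace and whose inner nodes also record the minimum and maximum namespace they cover, so a single proof can show both a rollup's data and the fact that nothing in its namespace was omitted. Here is a minimal, illustrative sketch of that node-hashing rule in Go (the real implementation lives in celestiaorg/nmt):

package main

import (
	"crypto/sha256"
	"fmt"
)

// node is an NMT node annotated with the namespace range it covers.
type node struct {
	minNs, maxNs []byte
	hash         []byte
}

// leafNode hashes one namespaced data share into a leaf.
func leafNode(ns, data []byte) node {
	h := sha256.Sum256(append(append([]byte{0x00}, ns...), data...))
	return node{minNs: ns, maxNs: ns, hash: h[:]}
}

// parentNode combines two children. The parent's hash commits to the
// children's hashes and their namespace ranges, which is what lets a proof
// demonstrate that a namespace query returned the complete range.
func parentNode(l, r node) node {
	buf := []byte{0x01}
	for _, b := range [][]byte{l.minNs, l.maxNs, l.hash, r.minNs, r.maxNs, r.hash} {
		buf = append(buf, b...)
	}
	h := sha256.Sum256(buf)
	return node{minNs: l.minNs, maxNs: r.maxNs, hash: h[:]} // leaves sorted by namespace
}

func main() {
	a := leafNode([]byte("nsAAAAAA"), []byte("rollup A block 254"))
	b := leafNode([]byte("nsBBBBBB"), []byte("rollup B block 97"))
	root := parentNode(a, b)
	fmt.Printf("root %x covers namespaces [%s, %s]\n", root.hash[:4], root.minNs, root.maxNs)
}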

2. Generating blocks on the Consensus layer

The PayForBlob tx is then included in a block on the Consensus layer. The proposer first separates the executable part and the blob part of the transaction, then generates an NMT (Namespaced Merkle Tree) over the data. Finally, it performs the erasure coding needed for DAS (Data Availability Sampling) and produces a DAH (Data Availability Header).

Source: https://docs.celestia.org/concepts/how-celestia-works/data-availability-layer/

DAS is a method of proving that data is available by verifying only a small part of it. To make this possible, erasure coding is used so that all of the data can be recovered even if part of it is lost. 2D Reed-Solomon (RS) encoding, the type of erasure coding used by Celestia, can recover all of the data (including any lost parts) as long as roughly 75% of the extended data is intact.

Source: https://docs.celestia.org/concepts/how-celestia-works/data-availability-layer/
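
To see where that 75% figure comes from: each row and column of the extended square can be rebuilt from any half of its elements, so a malicious block producer must withhold at least (k+1)² of the (2k)² total shares to make the data unrecoverable. The ratio (k+1)²/(2k)² approaches 1/4 as k grows, which is why keeping roughly 75% of the extended data intact is enough to recover everything.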

In 2D RS encoding, the original data is divided into k x k chunks and extended with parity data into 2k x 2k chunks. 4k Merkle roots are then generated, one for each of the 2k rows and 2k columns, and the Merkle root over these 4k roots is included in the block header as data_hash. As for the process of validator nodes reaching consensus on the proposed block, it is the same as on a typical Tendermint-based chain. The block containing our transaction was created as shown below.

Source: https://testnet.mintscan.io/celestia-incentivized-testnet/blocks/491158
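
To make the header commitment concrete, here is a compact Go sketch of how the 4k row and column roots relate to data_hash. The binary Merkle tree is deliberately simplified (pairing and leaf-hashing details differ in celestia-app), so read it as a shape, not an implementation:

package main

import (
	"crypto/sha256"
	"fmt"
)

// merkleRoot folds a list of hashes into a single root.
// (A stand-in for the real binary Merkle tree; odd elements are duplicated.)
func merkleRoot(hs [][]byte) []byte {
	for len(hs) > 1 {
		var next [][]byte
		for i := 0; i < len(hs); i += 2 {
			j := i + 1
			if j == len(hs) {
				j = i
			}
			h := sha256.Sum256(append(hs[i], hs[j]...))
			next = append(next, h[:])
		}
		hs = next
	}
	return hs[0]
}

func main() {
	k := 2 // original square is k×k, extended square is 2k×2k
	var rowRoots, colRoots [][]byte
	for i := 0; i < 2*k; i++ {
		// In reality each of these is the NMT root of one row/column of shares.
		r := sha256.Sum256([]byte(fmt.Sprintf("row-%d", i)))
		c := sha256.Sum256([]byte(fmt.Sprintf("col-%d", i)))
		rowRoots = append(rowRoots, r[:])
		colRoots = append(colRoots, c[:])
	}
	// The DAH carries all 4k roots; data_hash is a single root over them.
	dataHash := merkleRoot(append(rowRoots, colRoots...))
	fmt.Printf("data_hash = %x\n", dataHash[:8])
}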

3. Bridge node generates an ExtendedHeader

When a new block is created on the Consensus layer, the bridge node connects to a validator node or consensus full node to receive it. The bridge node then performs basic validation and builds an ExtendedHeader by attaching the DAH, Validator Set, and Commit. Armed with the ExtendedHeader, light and full nodes can reference the DAH to perform DAS and recover the block data. When the bridge node propagates the ExtendedHeader to the network, all connected peers (light and full nodes) receive and sample it.

Source: https://github.com/celestiaorg/celestia-node/blob/main/header/header.go#L40-L45
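
The linked definition looks roughly like this (field types abbreviated; Commit and ValidatorSet come from the Tendermint types, the DAH from celestia-app):

// ExtendedHeader wraps the raw Tendermint header with everything a DA-layer
// node needs in order to verify new blocks and perform DAS.
type ExtendedHeader struct {
	RawHeader    `json:"header"`
	Commit       *core.Commit               `json:"commit"`
	ValidatorSet *core.ValidatorSet         `json:"validator_set"`
	DAH          *da.DataAvailabilityHeader `json:"dah"`
}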

Therefore, since bridge nodes must stay connected to a consensus node to keep receiving the block data they serve for DAS, it is recommended that validators run a validator node and a bridge node together, which is what we're currently doing! This ensures that bridge nodes can smoothly receive new blocks and forward them to the DA layer. In our case, the ExtendedHeader stored by our bridge node for block 491158 was as follows:

2023-05-15T09:01:31.964Z	INFO	header/store	store/store.go:353	new head	{"he

4. DAS via Light Node and Full Storage Node

After downloading the ExtendedHeader from the bridge node, the light node picks random coordinates in the extended data square and requests the corresponding data shares (with Merkle proofs against the DAH) from the bridge node, verifying that the DAH was generated correctly. The DA layer contains not just light nodes but also full storage nodes. A full storage node samples like a light node, but downloads enough shares to recover and store the entire extended block. If the data it recovers does not match the DAH, it generates a Bad Encoding Fraud Proof (BEFP) and stops all operations. Light nodes receive the BEFP propagated by the full storage node and verify that the proof is correct; if it is valid, they also halt all activity. If DAS completes without a BEFP, we can be confident the data is available.
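
In pseudocode terms, a light node's sampling loop boils down to something like the Go sketch below. The getShare and verifyShare helpers are hypothetical placeholders for the networking and Merkle-proof logic in celestia-node's das package:

package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// Share stands in for one chunk of the extended data square plus its proof.
type Share struct{ Data, Proof []byte }

// Placeholder helpers: in celestia-node these involve a network request to a
// bridge/full node and a Merkle proof check against the DAH's row root.
func getShare(height uint64, row, col int) (Share, error) { return Share{}, nil }
func verifyShare(s Share, rowRoot []byte) bool            { return true }

// sample draws n random coordinates from the 2k×2k extended square; if every
// sampled share is returned and verifies against the DAH, confidence that the
// whole block is available is roughly 1-(3/4)^n.
func sample(height uint64, squareSize, n int, rowRoots [][]byte) (bool, error) {
	for i := 0; i < n; i++ {
		r, _ := rand.Int(rand.Reader, big.NewInt(int64(squareSize)))
		c, _ := rand.Int(rand.Reader, big.NewInt(int64(squareSize)))
		sh, err := getShare(height, int(r.Int64()), int(c.Int64()))
		if err != nil {
			return false, err // share missing: availability cannot be confirmed
		}
		if !verifyShare(sh, rowRoots[r.Int64()]) {
			return false, fmt.Errorf("share (%d,%d) failed verification", r, c)
		}
	}
	return true, nil
}

func main() {
	ok, err := sample(491158, 4, 16, make([][]byte, 4))
	fmt.Println(ok, err)
}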

$ curl -X GET http://localhost:26659/data_available/491158
{"available":true,"probability_of_availability":"0.9899774042423815"}

Conclusion

Let’s briefly summarize the process we saw above.

  1. The rollup uses a PayForBlob transaction to request Celestia to store its data.
  2. The proposer on the consensus layer generates a DAH through erasure coding and creates a new block containing the transaction.
  3. The bridge node receives the new block and DAH, generates an ExtendedHeader, and propagates it to the DA network.
  4. Light nodes and full storage nodes perform DAS; if a full storage node finds incorrectly encoded data, it generates a BEFP.
  5. If DAS completes without a BEFP, the rollup can be confident its data is available.

We hope this helped decipher some of the workings behind data availability on Celestia, and a big shout-out to all who helped make the Blockspace Race a success!

Written by
Youngbin Park, Research Engineer, DSRV Validator Team (Twitter @bin0_0bin)

Reviewed by
Hyunggi Kim, Software Engineer and Validator, DSRV Validator Team (Twitter @HgKim00)

Edited by
Domitille Colin, Brand Communications Manager (Twitter @domitille_marie)
