Celestia: The General-Purpose Data Availability Layer of the Decentralized Internet

We’re thrilled to support Celestia as the incredibly talented team breaks new ground in solving the real bottleneck of scaling: data availability

Georg (Juri) Stricker
Signature Ventures Blog
Mar 19, 2021


Celestia (previously LazyLedger) is building a scalable, general-purpose data availability layer for the decentralized internet of the future. While many scalability solutions are tied to a specific Blockchain, Celestia tackles the issue from the ground up by focusing on the real bottleneck: data availability. This approach makes large amounts of data accessible, ultimately scaling the entire decentralized Blockchain ecosystem. We recently participated in their $1.5M seed round alongside Interchain Foundation, Binance Labs, Maven 11, KR1, P2P Capital and others.

In 2017, Blockchain briefly broke into mainstream discussion. While a few were excited by the opportunities a decentralized world presents and happily put everything on the Blockchain, most were cautious or outright skeptical of the promises made by a technology that can only process one block of transactions at a time. At the end of November 2017, Dapper Labs released its Blockchain-based game CryptoKitties, which played a big part in congesting the Ethereum Blockchain, accounting for almost 12% of all transactions. This not only seemed to prove the skeptics right; it also accidentally created the most popular image of the scalability problem that Blockchain still faces today.

Fast forward two and a half years and CryptoKitties have found a new home on Dapper Labs’ own Blockchain, Flow: a testament to the fact that Ethereum might not be the “World Computer” after all, and that different applications might require different Blockchains for optimal performance. Meanwhile, the Decentralized Finance (DeFi) ecosystem on Ethereum has proliferated, with decentralized exchanges (DEX) and stablecoins leading the pack. In fact, the most prominent DEX, Uniswap, alone accounts for about 20% of Ethereum’s transaction fees. The ability to send US Dollars (in the form of stablecoins) to any wallet within seconds and to trade tokens without intermediaries, combined with essentially infinite flexibility to program capital allocation, shows what a digitally native financial system could look like. The only problem: success creates demand for block space, and on the Blockchain, block space is notoriously scarce and expensive. This is the scalability problem.

Scalability and Throughput

Scalability is often confused with an increase in throughput. If one machine can process and validate x transactions (i.e. produce blocks), then adding a second machine doubles the number of processed transactions. This increases throughput two-fold, but with a commensurate two-fold increase in costs (hardware, energy, etc.). Scalability, on the other hand, is the ability to increase throughput with a sub-linear increase in costs; phrased differently, the same machine would be able to process and validate more transactions. Many early Blockchains like EOS or TRON, which at their core were built around a different block interval or number of consensus nodes, claimed to be more scalable than Ethereum. However, as John Adler (Chief Research Officer at Celestia) showed in his presentation at ETHDenver last year (at around the 2:30 mark of the video), this has nothing to do with scalability. In fact, EOS has a higher throughput than Ethereum because each node is required to run more powerful hardware. Likewise, increasing the block size does not increase scalability, since the transactions in the larger blocks still have to be processed just like those in the smaller blocks before, so the cost of verifying the chain grows along with its throughput. Directly changing such a parameter of the Blockchain to increase its transaction capacity is also known as layer-1 scaling. All in all, this is not to say that increasing throughput is bad (on the contrary), but increasing scalability is significantly harder and more beneficial.
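To make the distinction concrete, here is a small Python sketch that compares how throughput grows against how cost grows. The `Deployment` type, the `is_scaling` helper and all numbers are hypothetical and chosen only for illustration; none of them are measurements of EOS, TRON or Ethereum.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    """Hypothetical setup: throughput in transactions per second and running
    cost in arbitrary units (hardware, energy, ...)."""
    tps: float
    cost: float


def cost_per_tx(d: Deployment) -> float:
    return d.cost / d.tps


def is_scaling(before: Deployment, after: Deployment) -> bool:
    """True only if throughput grew while cost grew sub-linearly, i.e. the
    cost per transaction went down. Growing both by the same factor is
    merely a throughput increase."""
    throughput_factor = after.tps / before.tps
    cost_factor = after.cost / before.cost
    return throughput_factor > 1 and cost_factor < throughput_factor


baseline = Deployment(tps=15, cost=100)
two_machines = Deployment(tps=30, cost=200)      # 2x throughput for 2x cost
stronger_nodes = Deployment(tps=150, cost=1000)  # more powerful hardware per node
sublinear = Deployment(tps=150, cost=120)        # hypothetical: 10x throughput for 1.2x cost

for name, d in [("two machines", two_machines),
                ("stronger nodes", stronger_nodes),
                ("sub-linear cost", sublinear)]:
    print(f"{name}: scaling={is_scaling(baseline, d)}, cost/tx={cost_per_tx(d):.2f}")
```

Only the last, hypothetical setup lowers the cost per transaction, which is what the text means by scalability; the first two simply buy throughput at a proportional price.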

On the other hand, more recent Blockchain projects like Cosmos, Polkadot or even Ethereum 2.0 seem to take a more promising route. In contrast to Blockchains that popped up within months during 2017/2018, their development, including prior research, has often taken years, and they have been rolled out gradually, adding functionality over time. Above all, this approach reflects a recognition of the sheer complexity of a Blockchain that has to deliver decentralization and security at scale. Apart from scaling approaches directly on the Blockchain (layer-1), most research is concentrated on layer-2 scaling. Layer-2 solutions are also called “off-chain” because they are plugged into the Blockchain without modifying its core mechanics. Roughly, layer-2 scalability concepts can be categorized into two camps:

  • State channels: These are Blockchain interactions between network participants that are never recorded on the chain except for their start and end states. As a result, state channels inherit the security model of the underlying Blockchain and do not require additional validators. The downside is that they are limited to a fixed set of participants and are capital inefficient, as they have to be pre-paid in full in order to be used. Bitcoin’s Lightning Network, which facilitates payments, is the most prominent state-channel implementation.
  • Sidechains: These are separate chains that are compatible with the mainchain but have their own security model and block parameters. Calling one chain the “side” and the other the “main” is purely functional, since in theory both are independent Blockchains. However, they are optimized for different tasks, with the sidechain regularly writing to the mainchain, which is responsible for the final settlement of transactions.
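The state-channel trade-off can be illustrated with a toy two-party payment channel in Python. This is a hypothetical sketch, not the Lightning Network protocol: signatures, dispute handling and timeouts are omitted, and the `PaymentChannel` class and its methods are invented for illustration. It shows the two properties from the list above: only the opening and closing states reach the chain, and payments can never exceed the funds locked up front.

```python
class PaymentChannel:
    """Toy two-party payment channel (hypothetical; signatures, disputes and
    timeouts are omitted)."""

    def __init__(self, alice_deposit: int, bob_deposit: int):
        # On-chain record 1: both parties lock their funds up front.
        self.on_chain_log = [("open", {"alice": alice_deposit, "bob": bob_deposit})]
        self.balances = {"alice": alice_deposit, "bob": bob_deposit}
        self.version = 0  # number of off-chain state updates so far
        self.closed = False

    def pay(self, sender: str, receiver: str, amount: int) -> None:
        """Off-chain balance update: nothing is written to the chain."""
        if self.closed:
            raise RuntimeError("channel is closed")
        if sender not in self.balances or receiver not in self.balances:
            raise ValueError("only the two channel participants can transact")
        if amount <= 0 or amount > self.balances[sender]:
            raise ValueError("payment exceeds the sender's locked funds")
        self.balances[sender] -= amount
        self.balances[receiver] += amount
        self.version += 1

    def close(self) -> None:
        # On-chain record 2: only the final balances are settled.
        self.on_chain_log.append(("close", dict(self.balances)))
        self.closed = True


channel = PaymentChannel(alice_deposit=50, bob_deposit=50)
for _ in range(10):
    channel.pay("alice", "bob", 3)
channel.close()
print(len(channel.on_chain_log), channel.version, channel.balances)  # 2 10 {'alice': 20, 'bob': 80}
```

Ten payments produced only two on-chain records, but Alice could never have paid Bob more than the 50 units she locked when opening the channel.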

Many state-channel solutions were developed with specific use cases in mind (above all payments) while chasing Visa’s capacity in transactions per second, which made payment channels seem like a promising approach. However, their ongoing rollout into production is gradually showing that they cannot easily be generalized beyond payments, because applications have to reason explicitly about the channels, which adds application-specific development complexity. Moreover, they are capital inefficient, as each channel has to be funded in advance in order to be used.

Sidechains were another way people started to think about scalability. Not every transaction is equally important to everyone in the network, so we can create chains with a specific purpose (e.g. an app or a process) that are anchored to the mainchain and have their own parameters (block size, block time, etc.). Funds can then be moved to the sidechain, used there and settled back on the mainchain. However, this concept by itself does not increase scalability: because transactions are processed in parallel, throughput grows with the number of sidechains, while settlement is still bound by the throughput of the mainchain. Moreover, a transaction that appears “approved” on a sidechain can still be rolled back at some point in the future because of malicious behavior. Even worse, funds can potentially be stolen, since sidechains rely on their own independent security model and do not benefit from that of the mainchain. Initially, the difficulty of realizing the benefits of a sidechain while devising an architecture that handles these risks made sidechains rather unappealing compared to state channels, all the more so since the basic concept offers no obvious scalability advantage.
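The gap between sidechain throughput and settlement can be put into numbers with a deliberately simplified model. It assumes, hypothetically, that every settlement back to the mainchain (e.g. a withdrawal) costs one mainchain transaction and that mainchain capacity is the only limit; the function name and the figures are illustrative only.

```python
from typing import Sequence, Tuple


def sidechain_throughput(sidechain_tps: Sequence[int], mainchain_tps: int) -> Tuple[int, int]:
    """Return (execution throughput, settlement throughput) under a simplified
    model in which each settlement costs one mainchain transaction."""
    execution = sum(sidechain_tps)              # sidechains process transactions in parallel
    settlement = min(execution, mainchain_tps)  # settling back is capped by the mainchain
    return execution, settlement


# Four hypothetical sidechains, each as fast as a hypothetical mainchain.
print(sidechain_throughput([15, 15, 15, 15], mainchain_tps=15))  # (60, 15)
```

Adding sidechains multiplies the first number, but the second stays pinned to the mainchain, which is the sense in which the basic concept raises throughput without delivering scalability.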

Rollups and the Future of Scaling

Ethereum in particular has amassed a lot of research around scalability, and it comes as little surprise that most upcoming solutions are released for Ethereum first. Moreover, given Ethereum’s general-purpose computation model, most research has gone into sidechain-related projects, as their architecture generalizes more easily than that of state channels. In fact, almost all current scaling solutions are sidechains in the form of Rollups or Plasma.

Figure 1: Rollups. Transactions in a sidechain are batched into a Rollup and published on the mainchain (blocks in the center) by the respective block producers. In the case of ZK-Rollups, a validity proof is also published. Anyone can verify the transactions by checking the validity proof, or challenge an invalid transaction (red blocks) by submitting a fraud proof.

Rollups in particular have taken the community by storm. Loosely speaking, they offer a mechanism to put snapshots of a sidechain on the mainchain at regular intervals with minimal trust assumptions. Technically, a Rollup is a type of sidechain with a two-way peg to the mainchain, where transactions are regularly combined into a single hash (“rolled up”) and committed to a smart contract on the mainchain. The transactions of the Rollup block are also published to the mainchain, but not executed there. Additionally, Rollups are “trust-minimized”, which means that funds on the sidechain cannot be stolen by the sidechain’s block producers, even if they all collude. This is due to the interoperability mechanism between mainchain and sidechain, which essentially defines the two types of Rollups out there:
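The commitment step can be sketched in Python as follows. A Merkle root over the batch stands in for the “single hash”, and a `RollupContract` class stands in for the mainchain smart contract; both are hypothetical simplifications, since real Rollups use their own encodings, tree constructions and contracts. The point is that the contract stores the commitment and the raw transaction data but never executes the transactions.

```python
from __future__ import annotations

import hashlib
import json


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle root over hashed leaves; an odd node is paired with
    itself (one of several possible conventions)."""
    if not leaves:
        return sha256(b"")
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def encode(tx: dict) -> bytes:
    # Canonical encoding so every verifier hashes exactly the same bytes.
    return json.dumps(tx, sort_keys=True).encode()


class RollupContract:
    """Hypothetical mainchain contract: stores commitments and data, never executes."""

    def __init__(self) -> None:
        self.batches: list[dict] = []

    def submit_batch(self, txs: list[dict], claimed_state_root: str) -> bytes:
        data = [encode(tx) for tx in txs]
        root = merkle_root(data)
        # The raw data is published (kept available) alongside the commitment,
        # but no transaction is executed on the mainchain.
        self.batches.append({"tx_root": root, "data": data,
                             "claimed_state_root": claimed_state_root})
        return root


txs = [{"from": "alice", "to": "bob", "amount": 5},
       {"from": "bob", "to": "carol", "amount": 2}]
contract = RollupContract()
root = contract.submit_batch(txs, claimed_state_root="placeholder")
print(root.hex())
```

Changing any single transaction changes the root, so the stored commitment pins down exactly which data the block producer published.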

  • Optimistic Rollups: Only the bare minimum of information is published to the mainchain, without any proof, and further action is required only in the event of malicious behavior (hence “optimistic”). Bad transactions (and thus bad Rollup blocks) can be detected and corrected by anyone through a fraud proof (a more recent version of the fraud-proof paper is also available) within a time window after a Rollup block has been committed to the mainchain.
  • Zero-Knowledge Rollups (ZK-Rollups): A validity proof is published to the mainchain together with the Rollup and can be used to verify that the transactions in the Rollup are correct. Constructing the validity proof is more complicated and resource-intensive. However, once the proof is committed, the finality of the Rollup block depends only on the finality of the mainchain.
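What an optimistic-Rollup verifier does with the published data can be sketched as re-executing the batch and comparing the result with the state root the block producer claimed. Everything here is a hypothetical simplification: the balance-transfer `apply_batch` function stands in for a Rollup’s state transition function, the state root is a plain SHA-256 hash of the whole state rather than a Merkle tree, and `CHALLENGE_PERIOD` is an arbitrary placeholder for the fraud-proof window. Note that the check needs the transactions themselves, which is exactly why data availability matters.

```python
from __future__ import annotations

import hashlib
import json

CHALLENGE_PERIOD = 100  # mainchain blocks; hypothetical placeholder for the fraud-proof window


def state_root(balances: dict[str, int]) -> str:
    # Simplification: hash the whole state instead of building a Merkle tree.
    return hashlib.sha256(json.dumps(balances, sort_keys=True).encode()).hexdigest()


def apply_batch(balances: dict[str, int], txs: list[dict]) -> dict[str, int]:
    """Hypothetical Rollup state transition function: plain balance transfers."""
    new = dict(balances)
    for tx in txs:
        if tx["amount"] <= 0 or new.get(tx["from"], 0) < tx["amount"]:
            raise ValueError(f"invalid transaction: {tx}")
        new[tx["from"]] -= tx["amount"]
        new[tx["to"]] = new.get(tx["to"], 0) + tx["amount"]
    return new


def should_submit_fraud_proof(pre_state: dict[str, int], txs: list[dict],
                              claimed_root: str) -> bool:
    """Re-execute the published batch and compare with the producer's claim."""
    try:
        correct_root = state_root(apply_batch(pre_state, txs))
    except ValueError:
        return True  # the batch contains an invalid transaction
    return correct_root != claimed_root


def is_final(submitted_at: int, current_block: int) -> bool:
    # Without a successful challenge, the Rollup block is final once the window has passed.
    return current_block - submitted_at > CHALLENGE_PERIOD


pre = {"alice": 10, "bob": 0}
txs = [{"from": "alice", "to": "bob", "amount": 4}]
honest_root = state_root(apply_batch(pre, txs))
forged_root = state_root({"alice": 0, "bob": 10})  # producer claims Bob received everything
print(should_submit_fraud_proof(pre, txs, honest_root))  # False
print(should_submit_fraud_proof(pre, txs, forged_root))  # True
```

A ZK-Rollup replaces this after-the-fact re-execution with a validity proof checked up front, which is why it needs no challenge window.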

Data Availability: The Layer That Ties Everything Together

The main difference between Rollups and classic sidechains is that in Rollups all transactions are published to the mainchain but not validated, and therefore not executed, there. This resolves two problems: a.) the reliance on a (semi-)trusted intermediary between side- and mainchain, and b.) the need to execute transactions on the mainchain (they are executed only on the sidechain), which is the resource-intensive part. Once all the data is published on-chain, anyone can check the validity of the Rollup block. This is called data availability, and it turns out to be easier to scale than execution.

Data availability is concerned with ensuring that all data that should have been published is actually published and available. This problem is so essential that without solving it Rollups wouldn’t work at all, since both fraud proofs and validity proofs rely on all the data being available. But data availability also has another important implication: not all nodes in a Blockchain network are equal, nor should they be. In order to build applications on the Blockchain that everyone can easily use, it is important to extend the security and decentralization of a Blockchain to lightweight devices such as smartphones. Such nodes are called light clients, in contrast to full nodes, which check every transaction and require stationary hardware. Light clients are particularly vulnerable: since they only download a small amount of data, they have to rely on full nodes to trust the Blockchain. A potential solution to this problem was proposed by Mustafa Al-Bassam (CEO of Celestia), Alberto Sonnino and Vitalik Buterin in the form of data availability proofs. Together with fraud proofs, these can be used to assure a light client that the current state of the Blockchain it is seeing is indeed correct. To be precise, they guarantee that the data behind the Blockchain is available, which ensures that a fraud proof can be constructed to detect invalid transactions.
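To see why data availability proofs let light clients get by with little data, the following Python sketch estimates how many random samples a light client would need. It assumes the two-dimensional Reed-Solomon construction from the data availability proofs paper as we understand it: the k × k block data is extended to a 2k × 2k square, and an adversary must withhold at least (k + 1)² of those shares to make the block unrecoverable. Sampling with replacement is used as a simplifying approximation, and the function names are hypothetical.

```python
import math


def min_withheld_fraction(k: int) -> float:
    """Smallest fraction of the 2k x 2k extended square that must be withheld
    to make the block unrecoverable: (k + 1)^2 of (2k)^2 shares."""
    return (k + 1) ** 2 / (2 * k) ** 2


def detection_probability(k: int, samples: int) -> float:
    """Chance that at least one random sample hits a withheld share
    (sampling with replacement, a simplifying approximation)."""
    return 1 - (1 - min_withheld_fraction(k)) ** samples


def samples_needed(k: int, confidence: float) -> int:
    """Fewest samples giving at least the requested detection probability."""
    miss = 1 - min_withheld_fraction(k)
    return math.ceil(math.log(1 - confidence) / math.log(miss))


for k in (32, 128, 512):
    print(k, samples_needed(k, 0.99), round(detection_probability(k, 15), 4))
```

Because the withheld fraction never drops below a quarter of the extended square, the required number of samples stays small regardless of block size, which is what makes the approach attractive for light clients.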

Combining the Blockchain with the concepts of sidechains, Rollups, data availability proofs and/or validity proofs adds up to a significant gain in throughput. Additionally, Rollups offer some great scalability advantages through transaction compression or signature aggregation. There is only one key piece they all rely on: data availability.

A Scalable General-Purpose Data Availability Layer to Power the Internet of Blockchains

Proofs can only be constructed if all data is available, and thus Rollups only work if all data is available. Last year, Vitalik Buterin published what in his opinion is the mid- and long-term roadmap for Ethereum: a Rollup-centric Ethereum in which Ethereum acts as the scalable data availability layer for the connected Rollups. However, as Mustafa points out, from the perspective of the entire Blockchain ecosystem this is a rather narrow vision, as it focuses on Ethereum only.

Figure 2: Celestia. Current Blockchain architecture involves a mainchain that combines the consensus and execution layer. Thus smart contracts built on top are using the same execution environment despite potentially different requirements (left). Celestia separates the consensus layer (data availability layer) from the execution layer. This way smart contracts are free to choose their execution environment while benefiting from the scalable consensus layer of Celestia.

There is a case to be made that the future of Blockchain lies in a network of Blockchains rather than a single general-purpose Blockchain powering all applications. The reason is that different types of applications require different execution environments optimized for their needs. This is why core financial infrastructure differs from the infrastructure used by web applications, which in turn differs from the infrastructure used for AI and Machine Learning research. In Ethereum, Rollups (and thus applications) are ultimately bound by their environment, which at its core is the Ethereum Virtual Machine (EVM). While this can be optimal for some applications, it covers only a small part of the possible design space, which currently consists mostly of financial applications.

Just like different programming languages and frameworks offer each developer the choice of the best possible environment, applications running on Blockchain should be able to choose their execution environment without limitations. If applications can be placed on a separate Rollup based on their needs then the only thing missing is the scalable data availability layer powering them. This is Celestia (previously LazyLedger), the general-purpose data availability layer of the decentralized internet.

It is best described in their own words: Similar to how cloud services like Amazon Web Services (AWS) made it possible to launch new virtual servers with their own operating systems within seconds by using the same physical server, Celestia is a decentralized project that aims to make it possible to launch decentralized Blockchains quickly by using the same consensus layer.

From a technical perspective, Celestia is a stripped-down layer 1 that only does the core things a layer 1 needs to do, in a scalable way: order transactions and make their data available. Celestia only orders and publishes whatever arbitrary data developers throw at it; it does not perform any computation on the data.

Because there is no on-chain smart contract environment in Celestia, all execution happens off-chain, using optimistic Rollups. While every other layer 1 follows the “world computer” paradigm, in which the chain provides both consensus and execution, Celestia’s goal is to make the Blockchain stack more modular by decoupling consensus and execution. Developers can therefore define their own execution layers. At its core, Celestia relies on two key technologies: optimistic Rollups and data availability proofs. For a deep dive into data availability proofs and the technology behind Celestia, please refer to John Adler’s whiteboard session on YouTube (part 1 and part 2).
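The separation of ordering and execution can be illustrated with a toy model in Python. `DataAvailabilityLayer` only orders blobs and keeps them retrievable, while two hypothetical execution layers interpret their own blobs in completely different ways. The `app_id` tags and all class and function names are assumptions made for this sketch; they are not Celestia’s actual data format or API, and the sketch leaves out data availability proofs and fraud proofs entirely.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAvailabilityLayer:
    """Toy 'stripped-down layer 1': orders opaque blobs and keeps them
    retrievable, but never interprets or executes them."""
    blocks: list[list[tuple[str, bytes]]] = field(default_factory=list)

    def publish_block(self, blobs: list[tuple[str, bytes]]) -> int:
        # Ordering is the only logic here: blobs keep the order they arrive in.
        self.blocks.append(list(blobs))
        return len(self.blocks) - 1

    def blobs_for(self, app_id: str) -> list[bytes]:
        """All data tagged with `app_id`, in the order the layer fixed."""
        return [data for block in self.blocks for (tag, data) in block if tag == app_id]


def counter_rollup(blobs: list[bytes]) -> int:
    # Hypothetical execution layer 1: each blob is an integer to add.
    return sum(int(b) for b in blobs)


def kv_rollup(blobs: list[bytes]) -> dict[str, str]:
    # Hypothetical execution layer 2: "key=value" writes, last write wins.
    state: dict[str, str] = {}
    for b in blobs:
        key, value = b.decode().split("=", 1)
        state[key] = value
    return state


da = DataAvailabilityLayer()
da.publish_block([("counter", b"5"), ("kv", b"color=red"), ("counter", b"7")])
da.publish_block([("kv", b"color=blue"), ("counter", b"-2")])
print(counter_rollup(da.blobs_for("counter")))  # 10
print(kv_rollup(da.blobs_for("kv")))            # {'color': 'blue'}
```

Neither execution layer needed the data layer to understand its transaction format, which is the kind of freedom of execution environment the modular design aims for.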

Deeply rooted in the Blockchain research and development community, the incredibly talented team behind Celestia is leading the charge towards a new architecture of the decentralized internet. The three founders Mustafa Al-Bassam, Ismail Khoffi and John Adler have years of experience in developing decentralized and permissionless systems and have pioneered sustainable scaling and optimistic Rollups. Additionally, they have assembled a great team of engineers and advisors to help them realize their vision. We at Signature Ventures are excited to back such exceptional founders and are grateful for having been part of that journey from day one.
