Ethereum’s New Data Economy

Forthcoming mainnet upgrades suggest a future of incentivized data preservation and a shared responsibility to encode the past

Takens Theorem
Etherscan Blog


Ethereum’s core devs are already approaching another major upgrade to mainnet. This upgrade will center on Ethereum Improvement Proposal #4844 (EIP-4844). They’ve designated a new portmanteau, “Dencun,” to refer to this upgrade (combining “Deneb” and “Cancun,” for updates to the consensus and execution layers, respectively).

EIP-4844 may bring down transaction costs on mainnet, but its focus is on reducing fees for Ethereum’s second layers (see my post here about L2s). Its approach is all about data: the EIP will improve the way in which L2s encode data on mainnet. L2s currently devote much of their fees to posting data to Ethereum mainnet (as transaction calldata) so that their ledgers can be validated. That demand for block space also raises fees for everyone else on mainnet. You can see this on Etherscan’s “gas guzzler” list here: L2s such as zkSync and Arbitrum often account for 5%–10% of mainnet fees.

Example gas guzzlers on July 2, 2023, with zkSync and Arbitrum near the top

EIP-4844 is therefore significant. In this upgrade, users of Ethereum (such as L2s) will be able to encode so-called blobs of data. Blobs are attached to a new transaction type, and they will be cheaper because nodes only need to keep the data for a limited window (4096 epochs, roughly 18 days, in the current consensus-layer specs). There will be a second fee market on mainnet for the cost of committing blobs on the Beacon chain (the consensus layer). Blob fees will follow a supply-and-demand dynamic similar to the one EIP-1559 introduced for regular gas (see here for a great summary). All this complexity (including fascinating details about the blob data itself) is by design; it is meant to bring Ethereum closer to future scaling upgrades. And L2s can use these cheaper blobs to validate their ledgers.

Some early discussions of the blobs
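To make the EIP-1559-like dynamic a little more concrete, here is a minimal Python sketch of blob pricing, modeled on the `fake_exponential` pricing rule in the EIP-4844 specification. The constant values (a target of 3 blobs per block, `BLOB_BASE_FEE_UPDATE_FRACTION = 3338477`) are my assumption of the Dencun-era parameters and could be retuned in later upgrades; treat this as an illustration, not a client implementation.

```python
# Minimal sketch of EIP-4844 blob pricing. Constant values are assumed
# (Dencun-era: target of 3 blobs per block) and may be retuned later.
GAS_PER_BLOB = 2**17  # blob gas consumed by one blob
TARGET_BLOB_GAS_PER_BLOCK = 3 * GAS_PER_BLOB
MIN_BASE_FEE_PER_BLOB_GAS = 1  # wei
BLOB_BASE_FEE_UPDATE_FRACTION = 3338477


def fake_exponential(factor: int, numerator: int, denominator: int) -> int:
    """Approximate factor * e ** (numerator / denominator) with a Taylor
    series in integer arithmetic, so every client computes the same result."""
    i = 1
    output = 0
    numerator_accum = factor * denominator
    while numerator_accum > 0:
        output += numerator_accum
        numerator_accum = (numerator_accum * numerator) // (denominator * i)
        i += 1
    return output // denominator


def next_excess_blob_gas(parent_excess: int, parent_blob_gas_used: int) -> int:
    """Excess grows when a block uses more blob gas than the target and
    shrinks (never below zero) when it uses less."""
    return max(0, parent_excess + parent_blob_gas_used - TARGET_BLOB_GAS_PER_BLOCK)


def blob_base_fee(excess_blob_gas: int) -> int:
    """Price per unit of blob gas in wei, exponential in the accumulated excess."""
    return fake_exponential(
        MIN_BASE_FEE_PER_BLOB_GAS, excess_blob_gas, BLOB_BASE_FEE_UPDATE_FRACTION
    )


if __name__ == "__main__":
    # Simulate a run of full blocks (6 blobs each, twice the target).
    excess = 0
    for block in range(1, 6):
        excess = next_excess_blob_gas(excess, 6 * GAS_PER_BLOB)
        # Scale up to show the multiplier, since the minimum fee is only 1 wei.
        multiplier = fake_exponential(10**6, excess, BLOB_BASE_FEE_UPDATE_FRACTION) / 10**6
        print(f"block {block}: excess={excess}, fee multiplier ~{multiplier:.3f}")
```

With these constants, each block that is full (double the target) pushes the blob fee up by roughly 12.5%, mirroring the maximum per-block change under EIP-1559.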

But EIP-4844 introduces for the first time a big idea in Ethereum’s future updates: transient data.¹ This upgrade got me thinking about its implications. Other planned protocol changes also have this property of temporary data on chain. A bird’s-eye view over the planned upgrades reveals that data is an important part of Ethereum’s future. Or, put differently, the absence of data is an important part of that future.

Let’s consider some other examples. I’ll focus on NFTs to illustrate what data temporariness means for the future. Despite concerns about transience, this series of upgrades represents a growing data economy for Ethereum.

Pruning Historical Data: EIP-4444

I’m especially curious about the implications for applications that make use of on-chain data. In particular, there is a growing landscape of NFTs that use on-chain data storage. On-chain NFTs store their data on chain so that the asset (artwork, PFP, etc.) is, purportedly, forever: you can always retrieve it from the chain.

Hundreds of NFT projects are now fully on chain; see 0xchain.art for an authoritative list

But these upgrades and the temporariness of chain data raise important questions. There are legitimate concerns about how the data will be stored and made available.

Consider another major improvement proposal: EIP-4444, which may be implemented in the coming year or two. The idea of this proposal is pretty simple: Ethereum nodes will no longer be required to hold onto historical records older than one year. This includes block headers, block bodies (with their transactions and calldata), and receipts. Pruning can impact applications that make use of historical data, such as market analysis or economic research. It can also impact some NFT projects. For example, some prominent NFT projects store their code or data in calldata. You can see this on Etherscan too. Here’s the C code to generate one of 0xDEAFBEEF’s archetypal projects, Synth Poems. It is in the calldata of this transaction (its hash is recoverable from contract functions here):

This code would be needed to rebuild the hypnotizing audiovisual experiences of 0xDEAFBEEF’s pieces. Under EIP-4444, nodes could delete this calldata because it is more than two years old. (That means that even if you spun up a node yourself in the future, you could not sync this data from other nodes.)

Still frame from a Synth Poem.
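As a rough illustration of what "beyond one year" means, the sketch below checks whether a block (and the calldata in it) would fall outside the EIP-4444 retention window. It assumes the `HISTORY_PRUNE_EPOCHS = 82125` parameter from the EIP-4444 draft, which works out to exactly 365 days of 12-second slots; the helper name is hypothetical, and using wall-clock time instead of epochs is a simplification.

```python
from datetime import datetime, timedelta, timezone

# Assumed EIP-4444 parameter: about one year of beacon chain epochs.
HISTORY_PRUNE_EPOCHS = 82125
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12

HISTORY_WINDOW = timedelta(
    seconds=HISTORY_PRUNE_EPOCHS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
)


def history_may_be_pruned(block_time: datetime, now: datetime) -> bool:
    """Hypothetical helper: True if a block (header, body with its calldata,
    receipts) is older than the retention window, so nodes need not keep it."""
    return now - block_time > HISTORY_WINDOW


if __name__ == "__main__":
    now = datetime(2023, 7, 2, tzinfo=timezone.utc)
    print(HISTORY_WINDOW.days)  # 365
    print(history_may_be_pruned(now - timedelta(days=730), now))  # True: ~2 years old
    print(history_may_be_pruned(now - timedelta(days=30), now))   # False
```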

An important distinction here is between transaction data and contract storage. Because 0xDEAFBEEF’s code lives in calldata, it is at risk under EIP-4444. Calldata is readable by the EVM only while its own transaction executes; later contract calls cannot access it. Afterward, calldata survives only as part of the historical transaction record, which full nodes obtain when they sync the chain (but which the EVM itself cannot read). EIP-4444 would allow this record to be pruned after a year.

By contrast, projects that use contract storage preserve their data in the contract itself, where the EVM can read it at any time. These values are part of Ethereum’s state, so they are not at risk from EIP-4444. Avastars and CyberBrokers exemplify this storage pattern. These NFT projects have a beautiful, intricately layered set of functions that assemble SVG artwork from data in contract storage (see my blog post here for detail). You can still see the layers that Avastars encoded years ago by querying its contract on Etherscan.

Rendering Avastar #1 by querying its contract storage
Avastar #1: fully on-chain, in contract storage
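The calldata-versus-storage distinction can be captured in a deliberately simplified toy model: a node keeps history (past transactions, including their calldata) separately from state (current contract storage), and EIP-4444-style pruning only touches the former. The class, method, and contract names below are hypothetical and exist only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToyNode:
    """Deliberately simplified node with two separate data stores."""

    # History: (block number, contract, calldata) for every past transaction.
    history: list[tuple[int, str, bytes]] = field(default_factory=list)
    # State: (contract, slot) -> value, i.e. current contract storage.
    state: dict[tuple[str, int], bytes] = field(default_factory=dict)

    def apply_tx(self, block: int, contract: str, calldata: bytes, store: bool) -> None:
        # Calldata is always recorded in history; the EVM could read it only
        # while this transaction executed.
        self.history.append((block, contract, calldata))
        # Only if the contract copies the bytes into storage do they become state.
        if store:
            self.state[(contract, 0)] = calldata

    def prune_history(self, cutoff_block: int) -> None:
        # EIP-4444-style pruning: drop transactions older than the cutoff.
        # State is left untouched.
        self.history = [tx for tx in self.history if tx[0] >= cutoff_block]


if __name__ == "__main__":
    node = ToyNode()
    node.apply_tx(100, "calldata_art", b"generator source", store=False)
    node.apply_tx(101, "storage_art", b"svg layers", store=True)
    node.prune_history(cutoff_block=200)
    print(node.history)  # []
    print(node.state)    # {('storage_art', 0): b'svg layers'}
```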

Other planned upgrades imply that contract storage is not entirely safe either. It may succumb to a later upgrade of Ethereum that involves state expiry.

Purge of the State

At this point, you may ask why the absence of data is so important to Ethereum’s future. A compelling case is made in a Bankless episode with Vitalik. The interview is somewhat dated, but the content has aged extremely well and remains a crystal-clear discussion of many roadmap features.

At about 40:00 in this interview, Vitalik summarizes the challenges that data will pose for those who wish to contribute to Ethereum’s security, for example by running a node. If Ethereum scales under the current data model, it could produce petabytes of data per year. That would be prohibitive for most participants, who would be expected to fully sync this ever-growing chain.

Chain size is already considerable; see Etherscan’s charts

The concern about data also applies to the state of the Ethereum blockchain itself, that is, to the contract storage mentioned earlier. The idea of state expiry also draws on the fact that historical data has a simpler trust model (the past is “easier to prove”). So why not prune the state itself?

A state expiry proposal does just this (it is currently an early “proto-EIP,” which you can read here). Under it, nodes could also prune state that has not been accessed for a period of time. The effects of this are non-trivial. For example, state holds the balances of every ERC-20 token, and expiry would affect all NFT projects, because state stores the URI pointers to every NFT asset. For on-chain NFTs it is arguably worse: all the metadata and the piece itself would be subject to expiry. (That means that after state expiry, if you spun up a node, the expired state of your projects might not be accessible from it either.)
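Here is a toy model of one commonly discussed shape of state expiry, assuming a simple period-based rule: a slot stays active while it was touched in the current or previous period, and anything older is dropped by nodes. The period length, key names, and class are hypothetical, and real proposals operate on state trees rather than a flat dictionary.

```python
from __future__ import annotations

PERIOD_LENGTH = 1_000  # hypothetical expiry period, in blocks


def period_of(block: int) -> int:
    return block // PERIOD_LENGTH


class ToyExpiringState:
    """Simplified state expiry: a slot stays active while it was touched in the
    current or the previous period; anything older is dropped by nodes."""

    def __init__(self) -> None:
        # key -> (value, period in which the slot was last touched)
        self.slots: dict[str, tuple[bytes, int]] = {}

    def write(self, key: str, value: bytes, block: int) -> None:
        self.slots[key] = (value, period_of(block))

    def touch(self, key: str, block: int) -> bytes:
        """Read a slot and refresh it. Raises KeyError if it already expired."""
        value, _ = self.slots[key]
        self.slots[key] = (value, period_of(block))
        return value

    def expire(self, block: int) -> dict[str, bytes]:
        """Drop slots untouched for more than one full period and return them."""
        current = period_of(block)
        dropped = {k: v for k, (v, p) in self.slots.items() if current - p > 1}
        for key in dropped:
            del self.slots[key]
        return dropped


if __name__ == "__main__":
    state = ToyExpiringState()
    state.write("erc20:balance:alice", b"\x64", block=0)      # period 0
    state.write("nft:metadata:1", b"<svg ...>", block=1_500)  # period 1
    print(state.expire(block=2_500))  # period 2: only the balance expires
```

The dictionary returned by `expire` stands in for the data that someone else (an archive, a service, or the owner) would have to keep if it is ever to be recovered.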

The New Data Economy

The blobs of EIP-4844 are temporary. This bridge between L1 mainnet and L2s lasts only a few weeks, after which Beacon chain nodes need not hold onto the blobs. Where will blobs go? Will they be needed in audits or analysis? Under EIP-4444, historical data is pruned after a year, and state expiry will involve a similar timeline for pruning state. It is a future of “temporary data.”
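The blob retention window is specified in epochs rather than days. Assuming the Deneb consensus parameter `MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096` with 32 slots of 12 seconds per epoch, the arithmetic below shows it works out to somewhat less than a month. The `blob_may_be_pruned` helper is a hypothetical simplification.

```python
SLOTS_PER_EPOCH = 32
SECONDS_PER_SLOT = 12
# Assumed Deneb consensus parameter: minimum epochs nodes serve blob sidecars.
MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS = 4096

RETENTION_SECONDS = MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS * SLOTS_PER_EPOCH * SECONDS_PER_SLOT
RETENTION_DAYS = RETENTION_SECONDS / 86_400


def blob_may_be_pruned(blob_slot: int, current_slot: int) -> bool:
    """Hypothetical helper: True once a blob is older than the retention window."""
    blob_epoch = blob_slot // SLOTS_PER_EPOCH
    current_epoch = current_slot // SLOTS_PER_EPOCH
    return current_epoch - blob_epoch > MIN_EPOCHS_FOR_BLOB_SIDECARS_REQUESTS


if __name__ == "__main__":
    print(f"{RETENTION_DAYS:.1f} days")  # 18.2 days
```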

To observers, this may seem concerning, especially if you’re into projects that rely heavily on historical data or contract storage (which is, arguably, everything, and perhaps most starkly on-chain NFTs).

But this transient data approach is a necessary one. The chain is getting too heavy. It is the “deadweight of history,” as Vitalik has described it. But this presents new challenges of data preservation, recovery, analysis and so on. And challenges present opportunities. With EIP-4844, we get a new fee market baked into the blob transaction type. EIP-4444 and state expiry present new opportunities for other markets, too. Here are a few ideas.

Centralized services

The obvious choice for maintaining both historical data and state data is centralized services. Vitalik mentions Etherscan and other approaches in his interview, too (including Beaconscan). These services have an incentive to maintain the data because they monetize it. This will become more important for Ethereum after the so-called “Purge,” with EIP-4444 and state expiry. Tools like Etherscan are already routinely described as critical infrastructure; in the coming era of transient data, their importance will grow.

The Purge, as part of Vitalik’s roadmap diagram

Incentivizing distributed data preservation

Another approach to storing historical and state data is a distributed system (akin to IPFS) built around Ethereum. The Portal Network aims to create a peer-to-peer network of lightweight clients that share the data load, so that history remains accessible in much the same way as through current APIs. The Graph is a prominent data-indexing protocol that many hope will approximate a fully decentralized preservation system, incentivized by participation in governance and paid data usage.

The Graph’s subgraph explorer; delicious mounds of chain data
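DHT-based designs spread this load by placing each piece of history in the same ID space as the nodes, with each node storing the content that falls within some radius of its own ID under XOR distance. The Portal Network follows a Kademlia-style approach along these lines; the sketch below is only a simplified illustration, and the content-key format, node IDs, and radius are hypothetical rather than the Portal Network’s actual encoding.

```python
from __future__ import annotations

import hashlib


def to_id(data: bytes) -> int:
    """Map bytes into a 256-bit ID space shared by nodes and content."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def responsible_nodes(content_key: bytes, node_ids: list[int], radius: int) -> list[int]:
    """Nodes whose ID lies within `radius` (XOR distance) of the content ID
    are the ones expected to store and serve that piece of history."""
    cid = to_id(content_key)
    return [n for n in node_ids if xor_distance(n, cid) <= radius]


if __name__ == "__main__":
    # Hypothetical node IDs and content key, for illustration only.
    nodes = [to_id(f"node-{i}".encode()) for i in range(20)]
    key = b"block-body:17000000"
    holders = responsible_nodes(key, nodes, radius=2**254)
    print(f"{len(holders)} of {len(nodes)} nodes store {key!r}")
```

A larger radius means each node stores more and the data is replicated more widely; no single node has to hold all of history.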

State maintenance services

The next two ideas pertain to state expiry and present more interesting possibilities. Under state expiry, you could keep a storage slot active by touching it regularly, so that it remains in the active state. One could imagine new contract functionality that routinely “pings” another contract to keep certain state alive. A customer could register with a state-maintenance service that uses an emerging standard to “ping” all contracts created by a given wallet. For a small fee, the service could be “loaded” with a subscription lasting decades into the future (akin to an ENS registration). It could be decentralized, too, using a system of contracts, and customers could routinely check that the system is working. If it is not, they could seek another service or set up a scheduled job themselves to call a “maintenance” contract.
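The scheduling side of such a maintenance service could be as simple as the sketch below, which decides which contracts to touch in the current period. It assumes the toy expiry rule that state untouched for more than one full period expires; `Subscription` and `contracts_to_ping` are hypothetical names, and in practice each "ping" would be an on-chain transaction, which this sketch does not model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Subscription:
    """Hypothetical prepaid maintenance plan, "loaded" like an ENS registration."""
    owner: str
    contracts: list[str]
    paid_through_period: int


def contracts_to_ping(
    subs: list[Subscription], last_touched: dict[str, int], current_period: int
) -> list[str]:
    """Return contracts a maintenance service should touch this period.

    Assumes the toy rule that state untouched for more than one full period
    expires, so touching everything once per period keeps it alive.
    """
    due: list[str] = []
    for sub in subs:
        if current_period > sub.paid_through_period:
            continue  # lapsed: the owner must renew or ping the contracts themselves
        for contract in sub.contracts:
            if last_touched.get(contract, -1) < current_period:
                due.append(contract)
    return due


if __name__ == "__main__":
    subs = [
        Subscription("alice", ["art", "game"], paid_through_period=40),
        Subscription("bob", ["token"], paid_through_period=3),
    ]
    last = {"art": 5, "game": 4, "token": 4}
    print(contracts_to_ping(subs, last, current_period=5))  # ['game']
```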

State maintenance monetizes the “state tree” more fully. Some may worry that it is yet another fee for users, like the lamentable Apple peripherals whose small costs add up. But the counterargument is that data preservation is expensive, especially when it is in some tension with securing the blockchain. Data maintenance services therefore let users pay for the privilege of preservation, and let validators and other participants focus on consensus and security.

State recovery services

In that Bankless discussion, Vitalik emphasized that history is unlikely to be lost. With the services described above, we could expect several tools, some more centralized than others, to preserve historical and state data redundantly. But even without these tools, if you keep information about the storage in your contracts, you could still recover it. State recovery could be a service too, offering point-and-click tools plus standards and practices for preserving the history that matters to you. You could then bring that personally held data to a service, upload it, and submit a proof that restores the expired state.

There can be fun and fulfillment in recovery; see MoonCats!

In a write-up on state expiry, Vitalik shares a wonderful thought experiment about Alice, for whom working on a smart contract is one of her passions (see the “Epoch 13” section). She travels, and other events in her life keep her away from the contract for some time. Its storage is pruned from the tree. Vitalik describes how she hunts for witnesses with enough information to recover her beloved contract.

Vitalik’s little thought experiment about state expiry
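The "witness" in this story is a proof linking Alice’s saved data to a compact commitment (a root hash) that the chain still retains. Ethereum today uses Merkle Patricia tries, and state expiry designs have been discussed alongside Verkle trees, so the binary SHA-256 Merkle tree below is a simplified stand-in; the function names and slot contents are hypothetical.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling is on the left."""
    proof: list[tuple[bytes, bool]] = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    """Recompute the path to the root; a match proves the leaf was committed."""
    node = h(leaf)
    for sibling, sibling_is_left in proof:
        node = h(sibling + node) if sibling_is_left else h(node + sibling)
    return node == root


if __name__ == "__main__":
    slots = [b"slot0:owner", b"slot1:score", b"slot2:palette"]
    root = merkle_root(slots)        # all the chain keeps after expiry
    witness = merkle_proof(slots, 2) # what Alice has to find or keep herself
    print(verify(slots[2], witness, root))  # True
```

The chain only needs the 32-byte root; whoever holds the slot data and its sibling hashes can prove that data was part of the expired state.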

Conclusion

Ethereum has to keep its consensus mechanism secure and efficient amid what we hope will be a massive increase in future use. That goal is in tension with the wonderful yet plentiful data that the blockchain creates. Forthcoming upgrades will bring a new era of “temporary data,” but they will also introduce new and interesting economic possibilities for the maintenance, recovery, and curation of blockchain data.

Here’s the rendering code for the Art Blocks project Symbol 1 by Emily Weil. A beautiful quinean project; the code is the work. That code sits in storage. But in the coming years, it may not. The future data economy may help to preserve and recover it.

Symbol 1 #96, owned by me
Symbol 1 script (in 2 parts) in Art Blocks storage

Endnotes

  1. Dencun will also likely include EIP-1153, which proposes new transient storage opcodes with very interesting computational implications: another ingredient of transient data.
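For contrast with persistent storage, here is a toy model of EIP-1153’s semantics: values written with TSTORE behave like storage within a transaction but are discarded when it ends. The class and method names are hypothetical, and the sketch ignores details such as reverts and call contexts.

```python
from __future__ import annotations


class ToyContractStorage:
    """Toy model of one contract's storage alongside EIP-1153 transient storage."""

    def __init__(self) -> None:
        self.persistent: dict[int, int] = {}  # SSTORE/SLOAD: part of state
        self.transient: dict[int, int] = {}   # TSTORE/TLOAD: per transaction

    def sstore(self, key: int, value: int) -> None:
        self.persistent[key] = value

    def sload(self, key: int) -> int:
        return self.persistent.get(key, 0)  # unset slots read as zero

    def tstore(self, key: int, value: int) -> None:
        self.transient[key] = value

    def tload(self, key: int) -> int:
        return self.transient.get(key, 0)

    def end_transaction(self) -> None:
        # Transient storage is discarded at the end of every transaction.
        self.transient.clear()
```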

Further Materials

  1. Anthony Sassano just discussed EIP-4844 again in a Daily Gwei episode, including an update on devnet testing.
  2. Recently updated article on statelessness on ethereum.org.
  3. Great recent summary of EIP-4844 by Christine Kim at Galaxy, including interesting detail about the life of a blob.

About

I am on Twitter. I spend a lot of my time on creative data visualization projects, including several fully on-chain works like the_coin, one of the first NFT projects that let owners update contract storage to modify their NFTs (hence my interest in state expiry).

Disclosures: I own and create NFTs and sometimes hold the projects that I mention. For example, I own some Avastars. I love them. I was not paid for this post. I wrote it for fun. I hope it was interesting.
