A Quest for Permanent Data Storage on Web 3

Jack Boyuan Xu
Published in Sign · Jul 27, 2022

A programming language is considered Turing complete if it can perform any general-purpose computation, given enough time and memory. To be Turing complete, a system must be able to remember the “state” of its progress through a computation.

In a computer program, state is kept in memory during runtime, or in non-volatile storage between runs so that data persists. DApps are also computer programs; they just have astronomically high execution and storage costs. Some have estimated that storing 500KB of data on the Ethereum blockchain costs about $20k. The exact figure depends heavily on gas and ETH prices, but it gives the right idea: it is simply impractical to store any meaningful amount of data on blockchains such as Ethereum.
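
To see where figures like this come from, here is a back-of-the-envelope sketch. It assumes each fresh 32-byte storage slot costs 20,000 gas (the `SSTORE` cost of writing a non-zero value into an empty slot) and ignores the base transaction fee, calldata, and cold-access surcharges. The gas price and ETH price are hypothetical inputs, not market data.

```python
# Back-of-the-envelope cost of storing raw bytes in Ethereum contract storage.
# Assumption: 20,000 gas per fresh 32-byte slot (SSTORE zero -> non-zero);
# base transaction cost, calldata, and cold-access surcharges are ignored.
SSTORE_NEW_SLOT_GAS = 20_000
WORD_SIZE_BYTES = 32
GWEI_PER_ETH = 10**9


def sstore_gas(num_bytes: int) -> int:
    """Gas needed to write num_bytes into fresh storage slots."""
    words = (num_bytes + WORD_SIZE_BYTES - 1) // WORD_SIZE_BYTES  # round up to whole slots
    return words * SSTORE_NEW_SLOT_GAS


def storage_cost_usd(num_bytes: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Convert the gas estimate to USD for a given (hypothetical) gas price and ETH price."""
    eth = sstore_gas(num_bytes) * gas_price_gwei / GWEI_PER_ETH
    return eth * eth_usd


if __name__ == "__main__":
    # 500 KB at an assumed 50 gwei and $1,500/ETH
    print(sstore_gas(500_000))                   # 312500000
    print(storage_cost_usd(500_000, 50, 1_500))  # 23437.5
```

Because the result scales linearly with both the gas price and the ETH price, published estimates for the same 500KB vary widely, so the $20k figure is best read as an order of magnitude.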

Yet, we must store data, sometimes in large amounts, to keep track of state. Since on-chain storage is too costly, we have to look for off-chain options.

Centralized Off-chain Storage

There are a few factors to consider when storing data off-chain, the first being whether we wish to use centralized storage.

Centralized storage isn’t bad. In fact, it is almost always cheaper, more familiar to work with, and more performant than its decentralized counterpart. Its two main problems are a lack of transparency and concerns about data availability. Data stored on centralized servers is usually accessed through an endpoint, and anyone who has worked as a backend developer knows how trivial it is to modify the data returned for any given request. The process is entirely opaque: the end user has no way of knowing whether the data returned is the data they expected. Furthermore, if the hosting site or owner decides to pull the plug, all the data will perish and there is nothing anyone can do about it. Both are crucial issues that most people in the NFT space tend to overlook. For example, BAYC NFT metadata is stored on their own servers, so the server administrator can rug-pull anyone at any time by swapping out the metadata or deleting it outright, yet this is rarely discussed.
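
This opacity is what content addressing removes. As a minimal sketch, assuming the client has obtained the expected SHA-256 digest from somewhere it already trusts (for example, a hash committed on-chain when a token was minted; `expected_hex` is a hypothetical input), it can detect a swapped response on its own. A plain centralized endpoint gives the client nothing to compare against.

```python
import hashlib


def matches_digest(data: bytes, expected_hex: str) -> bool:
    """Return True only if data hashes to the digest the client already trusts."""
    actual_hex = hashlib.sha256(data).hexdigest()  # hexdigest() is lowercase
    return actual_hex == expected_hex.lower()


if __name__ == "__main__":
    original = b'{"name": "Token #1", "trait": "gold"}'
    trusted = hashlib.sha256(original).hexdigest()  # e.g. recorded at mint time
    served = b'{"name": "Token #1", "trait": "bronze"}'  # silently modified by the server
    print(matches_digest(original, trusted))  # True
    print(matches_digest(served, trusted))    # False
```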

Lastly, centralized storage is antithetical to the ethos of Web 3. It just seems a bit odd when a supposedly decentralized application keeps its state on a centralized platform.

Decentralized Off-chain Storage

When it comes to decentralized off-chain storage, there are many options, the most popular being IPFS, Filecoin, and Arweave. Each has its pros and cons, but they are not equally balanced. Let’s take a look!

IPFS is popular, but a bad idea

IPFS is often considered the earliest and most widespread form of decentralized storage in the Web 3 world. Anyone can run their own IPFS node and start storing and serving data for users all around the globe. The network itself is completely free of charge and requires no registration of any kind. Sounds nice, right? However…

There is no free lunch.

When a file is uploaded to an IPFS node, it exists only on that node. There is no automatic replication built into the protocol, so other nodes must make a conscious decision to replicate (pin) a specific piece of data. Yet when everything is free of charge and there is no incentive beyond moral principle, why would anyone do that? This isn’t a huge problem for frequently accessed data, because nodes that retrieve it keep a cached copy, at least until they garbage-collect it. But for data that is rarely accessed, IPFS functions just like a centralized storage repository.
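
A toy model makes the consequence concrete. It is not the IPFS implementation: the `Node` class and `available` helper below are hypothetical and capture only the two behaviours described above, namely that a retrieving node keeps an unpinned cached copy, and that unpinned copies disappear when a node garbage-collects or goes offline.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical IPFS-like node: pinned blocks plus a garbage-collectable cache."""
    name: str
    online: bool = True
    pinned: set = field(default_factory=set)
    cached: set = field(default_factory=set)

    def has(self, cid: str) -> bool:
        return self.online and (cid in self.pinned or cid in self.cached)

    def fetch(self, cid: str, network: list) -> bool:
        """Retrieve cid from any peer; a successful fetch leaves a cached copy here."""
        if self.has(cid):
            return True
        if any(peer.has(cid) for peer in network if peer is not self):
            self.cached.add(cid)
            return True
        return False

    def garbage_collect(self) -> None:
        self.cached.clear()  # unpinned data is fair game for GC


def available(cid: str, network: list) -> bool:
    return any(node.has(cid) for node in network)


if __name__ == "__main__":
    uploader, reader = Node("uploader"), Node("reader")
    network = [uploader, reader]
    uploader.pinned.add("cid-metadata")

    reader.fetch("cid-metadata", network)      # accessed data gains a second copy
    uploader.online = False
    print(available("cid-metadata", network))  # True, served from reader's cache

    reader.garbage_collect()                   # nobody else pinned it
    print(available("cid-metadata", network))  # False
```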

Because IPFS can be used without any form of authentication or payment, it has gained and maintained tremendous popularity, especially since the NFT boom of the past year or so. A quick YouTube search shows that almost all NFT tutorials use IPFS to store metadata. However, few ask what happens to metadata files when the nodes holding them go offline. Some would point to IPFS pinning services, which provide data persistence for a monthly fee. But if we take a step back and look at pinning services such as Fleek* and Pinata, are they really that different from a Web 2 storage stack such as AWS S3? They actually run AWS servers behind the scenes, and Fleek even provides an S3-compatible API, so what is the point of using them at all? Just to abuse the IPFS branding and ethos?

*Fleek claims to also make use of Filecoin, but in my personal experience, nothing I have uploaded to Fleek has ever made its way into the Filecoin network.

Filecoin has incentives, but isn’t permanent

The lack of incentives in IPFS is well known to its creator, Protocol Labs. Thus Filecoin was born: an incentive network built on top of IPFS. Filecoin introduces a storage market in which miners charge a fee in exchange for their storage space and retrieval bandwidth; an agreement between a client and a miner is officially called a storage deal. The problem is that these fees are recurring, because every deal lasts only for a fixed term. If the creator of an NFT collection simply stops paying for metadata storage, their deals lapse and the data can be purged. NFT owners are powerless and receive no forewarning in this situation.
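
Because every deal has a finite term, keeping data alive means signing new deals before the old ones lapse. The sketch below works in Filecoin’s 30-second chain epochs. The 540-day maximum deal term is an assumption based on network parameters around the time of writing and may have changed since; the helper names are hypothetical.

```python
# Filecoin chain epochs are 30 seconds long.
EPOCH_SECONDS = 30
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS  # 2,880

# Assumption: maximum storage deal term around mid-2022; check current network parameters.
MAX_DEAL_DAYS = 540


def deal_active(start_epoch: int, duration_days: int, current_epoch: int) -> bool:
    """A provider is only obliged to store the data while the deal is active."""
    end_epoch = start_epoch + duration_days * EPOCHS_PER_DAY
    return start_epoch <= current_epoch < end_epoch


def deals_needed(target_days: int, max_deal_days: int = MAX_DEAL_DAYS) -> int:
    """Consecutive deals (i.e. payments) required to cover target_days."""
    return -(-target_days // max_deal_days)  # ceiling division


if __name__ == "__main__":
    print(EPOCHS_PER_DAY)                   # 2880
    print(deals_needed(10 * 365))           # 7 deals to cover ~10 years
    print(deal_active(0, 180, 180 * 2880))  # False: first epoch after expiry
```

Permanence on Filecoin is therefore a matter of someone continuing to pay; once the last deal expires, nothing obliges any provider to keep the data.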

Filecoin is also quite difficult to use directly in a user-facing application. It works better as backend middleware than embedded in client-side frontend applications, which make up a significant portion of Web 3 projects.

Arweave works great, with caveats

Arweave is a decentralized file storage blockchain. In contrast, IPFS is not a blockchain at all, and while Filecoin and Sia (which underpins Skynet, another popular storage platform) do run their own blockchains, the stored files themselves live off-chain with the providers that agreed to hold them. Because Arweave stores data on its blockchain, all the properties of a blockchain carry over directly, including tamper resistance and permanent transactions. Miners are financially incentivized to store as much of the blockweave (Arweave’s blockchain) as they can. Compared to IPFS, miners on Arweave are incentivized to replicate data. Compared to Filecoin, storage only needs to be paid for once and can thus be considered permanent. Accessing data on Arweave is also much faster than retrieving data directly from Filecoin, which requires 1–5 hours to unseal, although Filecoin uses IPFS as a cache layer for quick access.
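
The pay-once model shows up directly in the HTTP API that Arweave gateways expose: `GET /price/{bytes}` returns the one-time fee, in winston (10^12 winston = 1 AR), for storing that many bytes. The sketch below assumes the public `arweave.net` gateway and uses only Python’s standard library; the helper names are hypothetical, and only the pure helpers run offline, while the live call needs network access.

```python
from decimal import Decimal
from urllib.request import urlopen

GATEWAY = "https://arweave.net"  # assumed public gateway
WINSTON_PER_AR = 10**12


def price_url(num_bytes: int, gateway: str = GATEWAY) -> str:
    return f"{gateway}/price/{num_bytes}"


def winston_to_ar(winston: int) -> Decimal:
    """Use Decimal so large winston amounts convert without float rounding."""
    return Decimal(winston) / Decimal(WINSTON_PER_AR)


def fetch_price_ar(num_bytes: int) -> Decimal:
    """One-time fee to store num_bytes; the endpoint replies with a plain integer."""
    with urlopen(price_url(num_bytes), timeout=10) as resp:
        return winston_to_ar(int(resp.read().decode().strip()))


if __name__ == "__main__":
    print(price_url(200_000))            # https://arweave.net/price/200000
    print(winston_to_ar(1_500_000_000))  # 0.0015
    # print(fetch_price_ar(200_000))     # live call, needs network access
```

The fee is paid once at upload time, in contrast with the recurring Filecoin deals discussed above.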

So everything looks good. What’s the catch? Well, Arweave isn’t without its problems.

  • Uploading to Arweave requires payment in AR tokens, which are quite difficult to obtain as a US citizen.
  • Then there is the problem of Arweave being an actual blockchain. One of the properties of a blockchain is its block time: ~10 minutes for Bitcoin, ~13 seconds for Ethereum, and ~2 minutes for Arweave. Two minutes is entirely reasonable, and actually quite aggressive given that Arweave blocks can be up to 1GB in size, but it is problematic for user experience. A user who only wants to upload a small amount of data, say a 200KB PDF file, but has to wait 2 minutes for a single confirmation may well never use the product again.

Arweave is inarguably the closest thing to a usable permanent storage solution on Web 3, yet its UX issues bar it from mass adoption. But hey, I wouldn’t be writing any of this if I didn’t have a solution to present, right?

Bundlr.Network: Arweave’s L2

Enter Bundlr, a project that aims to push Arweave toward mass adoption. Its founder saw all of the issues with Arweave mentioned above and set out to address them, and the team launched its testnet in April 2022.

  • To address the first issue, Bundlr accepts payment in 14 different cryptocurrencies, including ETH, MATIC, AVAX, BNB (on BSC), and more.
  • To solve the annoyance of waiting for block confirmations, Bundlr provides instant transaction finality on Arweave. In other words, uploaded data will be instantly available on the official Arweave gateway instead of having to wait for 2 minutes.
  • As for the remaining problem of exposing private keys when signing Arweave uploads from the client, Bundlr does not even require the user to have an Arweave private key. It abstracts away everything Arweave-related on the client side and uses injected EVM, Solana, NEAR, or Polkadot Web 3 providers for authentication and payment.

It’s not all sunshine and rainbows, though. Bundlr is still extremely early-stage and has some UX hurdles, such as requiring users to go through multiple MetaMask popups before they can upload data, as well as compatibility issues with Gnosis Safe; both are critical for us at EthSign. In addition, we use Biconomy meta-transactions throughout the application to sponsor gas fees, so our users don’t need to own any cryptocurrency to use EthSign, which removes a huge barrier to entry. Unfortunately, Bundlr does not yet support this, and after a month-long trial run we decided to look elsewhere.

Arseeding by everFinance: A robust Arweave light gateway

Arseeding is an end-to-end storage solution built on Arweave. Unlike Bundlr, which is a frontend solution, Arseeding also provides an Arweave light gateway that we can run and host ourselves. Here are some of the reasons we eventually decided to move to Arseeding:

  • Arseeding’s gateway accepts payloads submitted via HTTP endpoints, which reduces dependencies and makes the entire process far more flexible for us.
  • We can pay the upload fees on behalf of our users, so they only need to provide a single signature in their browser. Users no longer need to deposit upload fees and maintain a separate balance, which reduces complexity and the number of interactions.
  • The gateway can bundle multiple uploads together into a single Arweave transaction to reduce fees.
  • There is an API key system to restrict upload access to our gateway.

Arseeding is a very enticing solution because, on top of being relatively easy to use from a developer standpoint, it lets us make several UX improvements. It’s very easy to get up and running on an AWS EC2 instance and requires minimal maintenance. Adopting Arseeding opens up many possibilities for EthSign that we are excited to explore.

Conclusion

The quest for permanent data storage on Web 3 has proven arduous, but not impossible. As the Arweave ecosystem matures further, I firmly believe its practicality and efficiency will make it the first-choice decentralized storage solution across Web 3. The ecosystem is full of very talented and dedicated developers, so I’m excited to see the improvements they make to the L1 and L2 down the line. Until then, the quest continues.

Access EthSign Here
Twitter | Gitbook | Discord | Youtube

Jack Boyuan Xu

Co-founder & Tech Lead @ EthSign. Blockchain Lecturer @ USC.