The Third Dimension: Long-Term Data-Availability

Pier Two
8 min read · Sep 3, 2024

Introduction

Since we began interacting over the internet, we have built increasingly complex databases and made them interact with each other in increasingly complex ways. However, what is known as the internet today is only one step in a sweeping and still-evolving definition of data.

Considering that packet-switched networking, the infrastructural backbone of the internet, grew out of defence-funded research during the Cold War, it is difficult to compare what was conceived then with what is used now by billions of people each day. Along this path there have been three distinct phases, each transforming how we store, connect to, interact with, and utilise data. Plenty has been said about these transformations to explain how the consumer’s experience of the internet has changed, often summarised as:

  • web 1 (static data);
  • web 2 (interactive data); and
  • web 3 (owned data).

There has been great debate about these developments in the consumer experience. However, little has been broadly discussed about a question that has almost entirely been the domain of information technology companies: how is data stored?

As with the internet, the development of databases has seen significant shifts that can be categorised into three phases, which have largely synchronised with the web’s evolution. Within this article, those categories of how data is stored and interacted with are defined as:

  • db 1 (centralised data);
  • db 2 (distributed data); and
  • db 3 (public data).

Ethereum has long stood out on the frontier of web 3 and db 3, offering ownership of public data with confidentiality and integrity built in. However, to become a complete information system that can realise a future as expansive as web/db 3, one crucial aspect remains underdeveloped: data availability. In addition, future infrastructure such as light clients, which are integral to many other aspects of Ethereum, relies on these developments to realise its full potential.

This article explores the challenges and prospects of ensuring long-term data availability in Ethereum, highlighting the work of projects such as Covalent and their Ethereum Wayback Machine. The infrastructure required to bring these changes about will be immense. Alongside investigating these challenges, the article proposes possible solutions and offers a snapshot of what a web/db 3 future holds.

Evolution of Data Systems

Towards web 3

Characterised by static web pages and a read-only experience, some of the first sites publicly available on the World Wide Web presented information in a manner and volume previously possible only in the world’s largest libraries. But beyond dedicated volunteers, there were few people or resources able to guarantee the accuracy or quality of these web 1 sites.

Further development allowed progression to web 2, introducing dynamic content, user-generated data, and social interactions, all in (mostly) real time. This emphasised a read-write paradigm, but one where access rights were strictly controlled and where single points of failure could render entire systems unusable overnight. A recent example is the July 2024 CrowdStrike outage that afflicted many users globally. The capacity to recover from what was a reasonably minor error was hamstrung by the inflexibility of existing infrastructure for managing dynamic operating systems across millions of devices.

Now there is an ongoing shift towards web 3, focusing on introducing decentralisation and trustlessness (i.e. no single point of failure) to combat the inherent flaws of web 2, enabled by blockchain technology. Ethereum, as a leading platform in this space, provides a decentralised infrastructure that ensures data integrity and confidentiality. With Ethereum, single points of failure have been removed and any individual has the right to read or write data; to host it from their own device or to participate in a wider network where their rights are guaranteed by two-way social contracts with every other user.

These social contracts rely not only on the soundness of the code that upholds the “law” of Ethereum, but also on the active participation of users towards the public good. Unless a participant hosts data themselves, there is no guarantee that their access to it will endure, which is where the need for a complete information system comes in.

Towards db 3

In parallel with the internet’s evolution, database systems have also progressed. Traditional centralised databases came about in the form of db 1, representing the capacity to hold pointers to data in a computer’s memory and even to write new data at those pointers. This allowed for the existence of web 1 and facilitated the growth of web 2, but it soon became apparent that db 1 was unable to scale to keep up with active public use.

As such, db 2 followed and allowed a network of centralised databases to become a distributed database, often referred to broadly as the “cloud”. Improving data redundancy and read-write speeds to the magnitudes necessary for vibrant networks, db 2 allows you to view this article from anywhere around the world with millisecond latency, regardless of whether the power is out at the main server. But you would never be able to read this article without the dedicated services of many underlying pieces of infrastructure that can fail or be revoked at any time: certificate servers and cloud providers, to name a few.

Blockchain-based systems represent a significant paradigm shift. Moving on from distributed databases, db 3 expands to support public data controlled by individual users. Pairing with web 3’s capacity to allow for ownership of data, blockchains like Ethereum ensure that once data is written it cannot be altered, thereby guaranteeing data integrity. Data integrity, however, does not guarantee data availability, and ensuring availability over the long term stands as perhaps the final challenge. Unlike db 2, which supports a distributed system with large built-in redundancies that keep service uptime close to continuous, db 3 offers no similar guarantees for its data. This isn’t to say the problem is necessarily specific to db 3, but rather that a complete information system able to support a “public-owned” future is a challenge that, if planned for and solved now, will bring about untold opportunities.
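
The gap between integrity and availability can be made concrete with a toy model. The sketch below is an illustration only and not Ethereum’s actual data structures: it assumes SHA-256 as a stand-in hash function, and the names `GENESIS`, `build_commitments` and `audit` are hypothetical. Each commitment chains a payload to everything written before it, so any alteration is detectable, yet nothing in the commitments can bring back a payload that nobody kept.

```python
import hashlib
from typing import Dict, List, Tuple

GENESIS = "0" * 64  # hypothetical starting value for the chain


def digest(data: bytes) -> str:
    """SHA-256 hex digest, standing in for a blockchain's hash function."""
    return hashlib.sha256(data).hexdigest()


def build_commitments(payloads: List[bytes]) -> List[str]:
    """Chain each payload to its predecessor: c_i = H(c_{i-1} + H(payload_i))."""
    commitments = []
    prev = GENESIS
    for payload in payloads:
        prev = digest((prev + digest(payload)).encode())
        commitments.append(prev)
    return commitments


def audit(commitments: List[str], store: Dict[int, bytes]) -> Tuple[List[int], List[int]]:
    """Return (tampered, missing) payload indices for a given data store.

    The commitments are assumed to be retained by everyone; the store holds
    whatever payloads this particular participant still has.
    """
    tampered, missing = [], []
    prev = GENESIS
    for i, expected in enumerate(commitments):
        payload = store.get(i)
        if payload is None:
            missing.append(i)  # integrity intact, data unavailable
        elif digest((prev + digest(payload)).encode()) != expected:
            tampered.append(i)  # data present but altered
        prev = expected
    return tampered, missing
```

In this model an altered payload shows up as tampered, which is the integrity guarantee at work. A deleted payload shows up only as missing: the commitment still proves what the data was, but it cannot restore it, which is the availability problem this article is concerned with.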

Challenges to Data Availability

While Ethereum’s architecture guarantees data integrity and confidentiality, data availability remains an aspect still to be addressed. In a blockchain, users are intended to maintain a copy of the header of every block, a set that grows continually over time. Even assuming that storing these block headers does not become difficult, the question remains of how to maintain all the data that is not stored in block headers. As the blockchain expands, maintaining such extensive records moves increasingly out of reach of most users, however much they may wish to do so. Applications for blockchains, and the data those applications require, have expanded at a far greater rate than storage has. The problem that emerges is this: if only a few users have this capacity, can the data truly be described as “public-owned” through the lens of web/db 3? Instead, a mechanism is required that either:

  • performs these duties as a public good; or
  • incentivises these duties as a responsibility of all capable users (those with sufficient computational resources) to ensure long-term data availability is possible.
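
The split between block headers and the data they commit to can be sketched as follows. This is a simplified illustration under stated assumptions: it uses a binary SHA-256 Merkle tree, whereas Ethereum uses Keccak-256 and Merkle Patricia tries, and the function names are hypothetical. A user holding only the root (as a header would) can verify a record that someone else supplies along with a proof, but cannot produce that record themselves.

```python
import hashlib
from typing import List, Tuple


def sha(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: List[bytes]) -> List[bytes]:
    """Hash adjacent pairs; the caller guarantees an even number of nodes."""
    return [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: List[bytes]) -> bytes:
    """Root that a header-only user would store for a block's records."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    level = [sha(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: List[bytes], index: int) -> List[Tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the bool is True if the sibling is on the left."""
    level = [sha(leaf) for leaf in leaves]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: List[Tuple[bytes, bool]], root: bytes) -> bool:
    """Check a supplied record against the stored root using only the proof."""
    node = sha(leaf)
    for sibling, sibling_is_left in proof:
        node = sha(sibling + node) if sibling_is_left else sha(node + sibling)
    return node == root
```

Note that `merkle_proof` needs the full list of records, so only a participant who stores the block’s data can build one. The header-only user runs `verify` alone and depends entirely on such participants continuing to exist, which is why one of the two mechanisms above is needed.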

Possible Solutions

Existing Projects

Pier Two has long aimed to support the infrastructure and incentive patterns that provide for a healthy Ethereum ecosystem and lead to long-term data availability, not only as a supporter of non-custodial staking but also as a continued participant in Chainlink’s oracle network, furthering the move towards web/db 3.

Projects like Chainlink aim to enhance data availability in blockchain systems by developing reliable data-delivery infrastructure. Chainlink’s oracle network has been a critical piece of the puzzle for ensuring data availability in Ethereum. By providing secure and reliable access to off-chain data, participants in this network have been integral to enhancing the functionality of Ethereum’s smart contracts and improving data retrieval processes.

A large share of this earlier work, centred on efficient APIs and big data strategies, has been pioneered by projects like Covalent, which focuses on providing its GoldRush.dev API to blockchain data and on addressing the data availability issue described above. Notably, many of Covalent’s solutions aggregate data from various blockchains, including Ethereum, and offer a comprehensive interface for querying blockchain data. This approach simplifies data retrieval, making it more accessible for developers and businesses. By building on and supporting Covalent’s infrastructure, participants in the wider db 3 networks can work together to improve data availability.

Ethereum Wayback Machine

The Ethereum Wayback Machine, an archive project, has been one of the most significant contributions from Covalent and is fundamental to the long-term viability of web/db 3 systems. It offers a philosophical and technical parallel to the original and famous Wayback Machine, which has long confronted the same challenges for web/db 2, and it aspires to archive all Ethereum data. Technically, it addresses the problem of data availability by employing a distributed network of servers and robust archiving protocols, exposed through its vast API. Philosophically, it addresses the problem by easing accessibility and by lending its voice to a movement to expand the db 3 mission that has been started. Covalent is a fundamental component of ensuring this.

By calling upon and promoting collaboration with skilled operators and data providers who are incentivised to lend their high-quality services, the Ethereum Wayback Machine will only continue to grow. Such partnerships will continue to improve the reliability of this archive network and provide the necessary tools to ensure a future complete information system. It is important to emphasise that the Ethereum Wayback Machine currently stands as one of the most integral steps towards a fully integrated web/db 3 future that allows for a completely new public-owned data paradigm.
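
Why more operators improve the reliability of an archive network can be shown with a back-of-the-envelope calculation. The sketch below is not a description of how the Ethereum Wayback Machine works. It assumes every operator holds a full copy of the data and that operators fail independently of one another, which real networks only approximate, and the function names and figures are hypothetical.

```python
def combined_availability(per_host: float, replicas: int) -> float:
    """Probability that at least one of `replicas` independent hosts is online.

    Assumes each host is online with probability `per_host`, independently.
    """
    if not 0.0 <= per_host <= 1.0:
        raise ValueError("per_host must be between 0 and 1")
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return 1.0 - (1.0 - per_host) ** replicas


def replicas_needed(per_host: float, target: float) -> int:
    """Smallest number of independent hosts whose combined availability meets `target`."""
    if not 0.0 < per_host <= 1.0:
        raise ValueError("per_host must be in (0, 1]")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    count = 0
    while combined_availability(per_host, count) < target:
        count += 1
    return count


# Hypothetical figures: hosts that are each online only half of the time.
print(combined_availability(0.5, 3))  # 0.875
print(replicas_needed(0.5, 0.99))     # 7
```

Under these assumptions, even unreliable hosts combine into a reliable archive once there are enough of them, which is one way to read the case for incentivising many operators rather than relying on a few.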

Public-Owned Future

Achieving long-term data availability is part of the final frontier for developing a comprehensive information system. Such a system would combine the strengths of blockchain technology, namely confidentiality and integrity, with robust data availability solutions. This would create an environment where data is not only secure and immutable but also easily accessible and retrievable over the long term. To put this into context, if long-term data availability is fully realised, it would be possible for articles such as this to be published securely by any user across the world, to be shared with any readers they wish, and to be stored for as long as readers exist. Ethereum would become a whole new internet, a web/db 3 evolution that allows for ownership of your data and guarantees that it will remain as public or private as you desire.

For now, though, long-term data availability remains a critical challenge for Ethereum and other blockchain platforms. By drawing on the expertise and innovations of organisations like Covalent, the Ethereum Wayback Machine can continue to develop as a robust solution to ensure that data remains accessible and retrievable over the long term. Integral parts of Ethereum’s future infrastructure, including light clients, will both increase the network’s reliance on data availability and expose the true advantages of such an information system. A truly complete information system has not been possible before, but that possibility is now closer than ever.

Pier Two

Enterprise-grade infrastructure for institutional clients. Non-custodial ETH Staking. Secure Node & Validation Services. Creating the light client Lantern (C#).