Decentralized Storage: The Backbone of the Third Web

Published in ConsenSys Media, Jun 30, 2016

Since the World Wide Web hit the mainstream in 1994, we’ve seen the network expand to encompass almost every aspect of human life. The underlying infrastructure of the Internet and the services built on top of it are interdependent, one informing the other as new use cases and technologies arise. There have been two clear generations in the services and structure of the web thus far, but today we are moving into a third.

In this article we will look at what characterized Web 1.0 and 2.0 and what may characterize Web 3.0. We will then look at the storage technologies that are likely to form its backbone: the decentralized storage network IPFS and its incentive platform Filecoin, and Swarm, an emerging Ethereum-oriented storage platform that uses IPFS.

Web 1 & 2

Web 1.0 was the birth of a new idea. If we can connect all of the computers in the world through a global network, the Internet, then we should be able to make the collective content pool universally accessible. For this mass of data to be usable it needed to be indexed and browsable. This necessity was behind the innovation that led to the first generation of the World Wide Web.

Web 2.0 took this new global resource, a universally accessible pool of content, and began plugging things into it. Programs could connect and use the Web as a way to store information and communicate with each other. Central intermediaries, today’s Googles and Facebooks, assumed the roles of data silos and switchboards, offering scalable resources and routing traffic. While these new corporations have changed the way we live and provide amazing services, they also leverage their centralized position for profit and power.

In their orange paper, “Swap, Swear and Swindle: Incentive System for Swarm”, Viktor Trón, Aron Fischer, Dániel A. Nagy, Zsolt Felföldi and Nick Johnson write that:

Context-sensitive targeted advertizing offered a Faustian bargain to content producers. As in ‘We give you scalable hosting that would cope with any traffic your audience throws at it, but you give us substantial control over your content; we are going to track each member of your audience and learn — and own — as much of their personal data as we can, we are going to pick who can and who cannot see it, we are going to proactively censor it and we may even report on you, for the same reason.’ Thus, millions of small content producers created immense value for very few corporations, getting only peanuts (typically, free hosting) in exchange.

As the Web grew, we reached scaling limits. Central nodes required ever more bandwidth to cope with ever increasing data flows. Over time, as things were shuffled around, links broke and content was lost, vanishing unsearchably into the mass of information.

To compound these problems, security never achieved a level appropriate for the new communication and commerce services provided through the Web. The client-server model relies on a system of digital certificates, which are issued by third parties to secure connections. The problem is that if the third party is compromised, so, potentially, are all the connections made using its certificates.

According to Juniper Research, “the rapid digitisation of consumers’ lives and enterprise records will increase the cost of data breaches to $2.1 trillion globally by 2019, increasing to almost four times the estimated cost of breaches in 2015.” Thus, the scene is set for a paradigm shift, and a cluster of new technologies is emerging. They promise to solve the problems plaguing the existing system and create a new way of using the web.

Web 3.0

Following the trend set by earlier iterations, the idea of Web 3.0 posits a change in the way content and programs interact. If central intermediaries like Facebook and Google are cut out of the picture, many of the problems we have today will go with them. Instead, content addressing and related techniques will allow content and programs to link to one another directly and in a more robust fashion. Blockchain technologies like the digital currency Bitcoin and the smart contract platform Ethereum use public key cryptography to secure the connection between programs and protect data.

This is an alternative to the centrally issued SSL certificates used today. Because there is no central intermediary routing traffic, connections can dynamically find the most efficient pathway through the internet and route around congestion or damage.

Blockchains were designed for financial transactions, though. They are not suited to storing and relaying the volume of information required to replace a central server. BitTorrent is a popular solution that has excellent storage and scaling characteristics. However, navigating the Distributed Hash Table used to index content on the network can take seconds. This kind of latency is fine for large file transfers but no good for datacenter use cases.

IPFS

Described by Viktor Trón as “the lego kit for the third web,” IPFS is a new system for storing data on a large number of computers. It is transport layer agnostic, meaning that it can communicate through TCP, uTP, UDT, QUIC, Tor, and even Bluetooth.

Instead of a central server, a peer to peer network is used to establish connections. Public key cryptography is built into the node addressing system and content addressing is used to index content. Both node and content addresses are stored in a decentralized naming system called IPNS.

Node addressing and connection security

Nodes in the peer to peer network each hold private keys and release public keys, just like in Bitcoin or Ethereum.
Node addresses are derived by hashing their public keys, which allows connections to be verified through message signing.
Their public keys can be used to encrypt data before it is transferred, preventing interception and theft.

Solutions to the security issues of today’s web are built into this addressing system. There is no need for a trusted central certificate issuer to provide connection verification tools and all connections can easily be encrypted by default. No more SSL.
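
The address derivation described above can be illustrated with a minimal sketch. It assumes SHA-256 as the hash and uses random bytes as a stand-in for a real public key; the function names `node_id` and `key_matches_id` are hypothetical and are not part of any IPFS API. Real nodes would also sign messages with the matching private key, which is left out here because it requires a cryptographic library.

```python
import hashlib
import os


def node_id(public_key: bytes) -> str:
    """Derive a node address by hashing the node's public key (SHA-256 assumed)."""
    return hashlib.sha256(public_key).hexdigest()


def key_matches_id(claimed_id: str, public_key: bytes) -> bool:
    """Check that the key a peer presents hashes to the address we dialed."""
    return node_id(public_key) == claimed_id


# Stand-ins for real public key bytes (assumption: a real node would
# generate a key pair with a cryptographic library instead).
alice_key = os.urandom(32)
mallory_key = os.urandom(32)

alice_id = node_id(alice_key)

honest = key_matches_id(alice_id, alice_key)      # Alice presents her own key
impostor = key_matches_id(alice_id, mallory_key)  # Mallory claims Alice's address
```

Because the address is a hash of the key, a peer cannot claim someone else's address without presenting a key that hashes to it, so no third party is needed to vouch for the binding.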

Content addressing

A content address is derived by hashing a piece of content.
That content address is then hashed again to derive a key name.
The key name is associated with a human readable name in IPNS (IPFS’ address registry).

In today’s web, if a file is moved, all links to that file need to be updated if they are to resolve. Because IPFS addresses are derived from the content they refer to, if the content still exists anywhere on the network, links will always resolve. This removes any need for duplication of content, except to improve persistence or to scale up serving capacity.
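
As a rough illustration of the three steps listed above, the following toy model stores content under its own hash and maps a human readable name to it. It is a sketch under stated assumptions, not how IPFS or IPNS is implemented: SHA-256 is assumed as the hash, the in-memory dictionaries stand in for the network, and `ToyStore`, `content_address` and `key_name` are hypothetical names.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Step 1: derive the content address by hashing the content."""
    return hashlib.sha256(data).hexdigest()


def key_name(address: str) -> str:
    """Step 2: hash the content address again to derive a key name."""
    return hashlib.sha256(address.encode("ascii")).hexdigest()


class ToyStore:
    """In-memory stand-in for the network and its name registry."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}  # content address -> content
        self.keys: dict[str, str] = {}      # key name -> content address
        self.names: dict[str, str] = {}     # human readable name -> key name

    def put(self, data: bytes) -> str:
        address = content_address(data)
        self.blocks[address] = data  # identical content lands on the same address
        return address

    def publish(self, name: str, address: str) -> None:
        """Step 3: associate the key name with a human readable name."""
        key = key_name(address)
        self.keys[key] = address
        self.names[name] = key

    def get(self, address: str) -> bytes:
        data = self.blocks[address]
        # Anyone can verify the content against the address they asked for.
        if content_address(data) != address:
            raise ValueError("content does not match its address")
        return data

    def resolve(self, name: str) -> bytes:
        return self.get(self.keys[self.names[name]])


store = ToyStore()
addr = store.put(b"hello, third web")
store.publish("greeting", addr)
```

Here `store.resolve("greeting")` returns the original bytes regardless of which node holds them, and storing the same bytes twice yields the same address rather than a second copy.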

However, for a decentralized storage system to grow to replace the current model, it needs a way to incentivise the storage and serving of content.

Filecoin is one prospective solution, being developed by Protocol Labs. Swarm is another, being developed by the Ethereum Foundation. Both projects make use of IPFS technology but have different philosophies on how to incentivize participation.

Filecoin

Filecoin is being developed by Protocol Labs, the same entity that is developing IPFS. While the specification is still evolving on a whiteboard somewhere in Palo Alto, it will likely take design cues from Bitcoin and related cryptocurrencies.

Filecoin will use an established consensus process that already secures a financial network, most likely the Ethereum public network. By requiring nodes to solve puzzles based on randomly selected data chunks, a Proof of Work algorithm can be built that rewards the nodes that store more data chunks and have better connectivity. Tools for adding redundancy and the ability to select nodes based on reputation, whether that be tracked within the protocol or outside it, will address the problem of persistent storage. This is an indication of the direction Filecoin is moving in, but the protocol is still in development.
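
Since the Filecoin specification is not final, the following is only a toy illustration of the general idea of a puzzle tied to a randomly selected chunk, not Filecoin's actual algorithm. It assumes a simple challenge-response in which the checker still holds the data, uses SHA-256, and introduces the hypothetical helpers `make_node` and `audit`.

```python
import hashlib
import os
import random


def make_node(stored: dict[int, bytes]):
    """Build a node that answers challenges from whatever chunks it still holds."""
    def answer(index: int, nonce: bytes) -> str:
        chunk = stored.get(index, b"")  # a dropped chunk cannot be reproduced
        return hashlib.sha256(nonce + chunk).hexdigest()
    return answer


def audit(chunks: list[bytes], node, index: int) -> bool:
    """Challenge a node on one chunk; a fresh nonce prevents replayed answers."""
    nonce = os.urandom(16)
    expected = hashlib.sha256(nonce + chunks[index]).hexdigest()
    return node(index, nonce) == expected


chunks = [b"chunk-0", b"chunk-1", b"chunk-2", b"chunk-3"]
honest = make_node(dict(enumerate(chunks)))
lazy = make_node({0: chunks[0], 1: chunks[1]})  # silently dropped chunks 2 and 3

# The challenged chunk is picked at random, so a node cannot know in
# advance which chunks it could safely discard.
index = random.randrange(len(chunks))
honest_ok = audit(chunks, honest, index)
```

A node that keeps every chunk passes every audit, while one that discards chunks fails whenever a missing chunk is selected, so storing more of the data raises the expected reward.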

Swarm

Swarm was conceived of as a storage protocol tailored for interoperation with the Ethereum smart contract ecosystem. Like Filecoin, it will piggyback on Ethereum’s consensus process in order to provide a decentralized alternative to our existing client/server infrastructure. Incentivising persistent storage is a challenge, however. The downside of a node deleting data and losing some income is potentially much less significant than a user losing his or her valuable data.

Swarm takes the approach of rewarding nodes for serving content. Because frequently requested content is more profitable to store than rarely requested content, rewarding nodes only for recall would incentivise the trashing of rarely accessed data. Failure to store every last piece of a large data set can result in the entire set being rendered useless, so in these cases a solution must exist to balance this downside asymmetry.

Using content recall as the base reward mechanism and distributing content randomly among nodes, weighted for location, puts Swarm in a good place to start solving the persistence problem:

Nodes offering “promissory” storage, or storage with a promise of persistence, must first post a security deposit covering the time for which they are offering storage.
If data is lost during this period, the bond is forfeited.

The smart contract infrastructure of Ethereum automates this whole process, making the “upload and forget” experience seamless.
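
The deposit mechanism can be modelled in a few lines. This is a plain Python sketch of the bookkeeping only, not Swarm's contract code and not Solidity; the class names, the integer timestamps and the `report_loss` and `release` methods are all hypothetical, and proving that data was actually lost is assumed to happen elsewhere.

```python
from dataclasses import dataclass


@dataclass
class StoragePromise:
    node: str
    deposit: int      # security deposit posted up front
    expires_at: int   # end of the promised storage period
    forfeited: bool = False


class DepositLedger:
    """Toy ledger tracking deposits for promissory storage."""

    def __init__(self) -> None:
        self.promises: dict[str, StoragePromise] = {}  # content address -> promise

    def register(self, address: str, node: str, deposit: int, expires_at: int) -> None:
        if deposit <= 0:
            raise ValueError("a promise must be backed by a positive deposit")
        self.promises[address] = StoragePromise(node, deposit, expires_at)

    def report_loss(self, address: str, now: int) -> int:
        """Forfeit the bond if data is lost during the promised period."""
        promise = self.promises[address]
        if now <= promise.expires_at and not promise.forfeited:
            promise.forfeited = True
            return promise.deposit  # amount taken from the node
        return 0

    def release(self, address: str, now: int) -> int:
        """Return the deposit once the period has ended without a loss."""
        promise = self.promises[address]
        if now > promise.expires_at and not promise.forfeited:
            del self.promises[address]
            return promise.deposit  # amount returned to the node
        return 0


ledger = DepositLedger()
ledger.register("addr-a", node="node-1", deposit=100, expires_at=10)
ledger.register("addr-b", node="node-2", deposit=50, expires_at=10)

lost = ledger.report_loss("addr-a", now=5)  # node-1 lost data mid-period
kept = ledger.release("addr-b", now=11)     # node-2 kept its promise
```

The point of the design is that the deposit makes losing data costly for the node as well as for the user, which narrows the asymmetry described earlier.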

Conclusion

This has been a brief introduction to one part of the Web 3.0 vision. Distributed networking and storage are coming fast and promise solutions that could save the global economy trillions annually. The Web today needs a new security model and an architecture designed around contemporary use cases. Swarm and IPFS represent the most ambitious solutions to this problem. However, there are some others worth noting: Sia and Storj are two decentralized storage options that are approaching maturity, and it would be a disservice not to mention them here.

As the global infrastructure adapts to the new demands we are putting on it, unforeseen opportunities will open before us. New tools will change not only the way we work, play and live our lives, but also the way we organize ourselves in groups. The real questions we must ask are: How will the world of Web 3.0 differ from the world of 2.0, and how will this technology penetrate beyond the cultures which created it?

Juan Benet of Protocol Labs and Viktor Trón of the Swarm team consulted on this piece. For follow up, I recommend the Swarm orange paper “Swap, Swear and Swindle”, this talk from the Swarm team, the IPFS white paper, as well as this talk by Juan. The aim of this article is to serve as an introduction to those resources.