The Inter-Planetary File System

Background

As the internet has gotten smarter and more pages share common resources, building web applications is less like coding and more like putting Legos together. With software like WordPress and Drupal, you need not touch a single line of code to start building your own website. Cloud computing and storage services such as those offered by Amazon Web Services and Google’s Firebase allow you to serve your content without having to think once about infrastructure or scalability. As such, creating content online is more accessible than ever. But the current model has several looming problems, including inefficient allocation (redundant copies of data) and limited fault tolerance (availability of data).

In the old days of the internet, web content was stored on private servers at your place of work or even in your own home. If that location lost connectivity, your site was out of commission. If you wanted your content to be available through any storm, you had to manage fallback servers in geographically distant locations. Today, services like AWS and Google Firebase do that for you. You upload your content to the provider and they manage the proper allocation of resources to keep your site up and running.

Still, these companies rely on semi-centralized data centers. There is a bottleneck (a wide one, but still) as these servers can only handle so much processing, storage, and bandwidth. There are also political concerns, as these companies may be economically incentivized to censor data. The Inter-Planetary File System, or IPFS project, is attempting to solve these issues with a peer-to-peer protocol for serving data on the web. Let’s look a little bit deeper at what this entails.

IPFS combines ideas from two extremely important technologies: BitTorrent peer-to-peer filesharing and Git version control. Both have proven to be extremely robust and are in use across the globe today.

To be very meta, you can check out BitTorrent on GitHub.

BitTorrent: Peer-to-Peer Filesharing

Unfortunately, BitTorrent has gained quite a bit of infamy as a medium for sharing copyrighted material. However, the protocol itself was designed as an efficient peer-to-peer filesharing method. In BitTorrent, files are broken down into small, easy-to-share pieces. Each piece is given a unique identifier, called a hash. Hashes are computed from the content of the data in such a way that, for all practical purposes, two pieces share the same hash only if they are identical. This makes it immediately obvious if a piece of data has been corrupted or tampered with: just check whether the hash of the data you receive is the same as the hash you requested. The hash therefore serves as a unique, global identifier for that piece of information. When you want to download a torrent, you collect the hashes of all the pieces of data that you need, usually from a trusted entity called a BitTorrent tracker. You then broadcast these hashes to the BitTorrent network. Peers who have the data corresponding to a hash will then send it to you. You can then verify that the data received matches the hash, put all the pieces together, and voilà! You have the desired file.
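To make the hash-and-verify idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not BitTorrent's actual wire format: the tiny piece size, the choice of SHA-256, and every function name are made up for the example.

```python
import hashlib

PIECE_SIZE = 4  # bytes; tiny on purpose (hypothetical value for illustration)


def split_into_pieces(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Break a file's bytes into fixed-size pieces (the last may be shorter)."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def piece_hash(piece: bytes) -> str:
    """Identify a piece by a digest of its content (SHA-256 is an assumption)."""
    return hashlib.sha256(piece).hexdigest()


def verify_piece(piece: bytes, expected_hash: str) -> bool:
    """Accept a received piece only if it hashes to the value we asked for."""
    return piece_hash(piece) == expected_hash


def reassemble(received: dict[str, bytes], ordered_hashes: list[str]) -> bytes:
    """Verify every piece, then join the pieces in their original order."""
    for h in ordered_hashes:
        if not verify_piece(received[h], h):
            raise ValueError(f"piece {h[:8]} failed verification")
    return b"".join(received[h] for h in ordered_hashes)


original = b"hello, swarm!"
pieces = split_into_pieces(original)
hashes = [piece_hash(p) for p in pieces]   # the list you would collect up front
received = dict(zip(hashes, pieces))       # pieces as delivered by peers
restored = reassemble(received, hashes)
```

If a peer sent different bytes for one of the hashes, `verify_piece` would return `False` and `reassemble` would refuse to build the file.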

The reason BitTorrent is so efficient is that you don’t have to wait for some central file server to be available to receive the data you want. Nodes in the network can store bits and pieces of files, and if one peer is taking too long to provide the data you need, you can just switch to another peer who has the same data. Instead of downloading one big file from a single location and hoping your connection isn’t interrupted, you download bits and pieces from many different sources as they become available. Additionally, since the hashes of identical files are identical, you never need to store the same data more than once.
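The two benefits described here, switching to another peer and never storing the same data twice, can be sketched in a few lines of Python. The `Peer` class and `fetch` function are hypothetical names invented for this illustration, and peers are simulated as in-memory dictionaries rather than real network connections.

```python
import hashlib
from typing import Optional


def content_hash(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Peer:
    """Hypothetical peer: an in-memory store keyed by content hash."""

    def __init__(self, name: str, online: bool = True):
        self.name = name
        self.online = online
        self.store: dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        key = content_hash(data)
        self.store[key] = data  # identical data maps to the same key, so it is kept once
        return key

    def get(self, key: str) -> Optional[bytes]:
        return self.store.get(key) if self.online else None


def fetch(key: str, peers: list[Peer]) -> tuple[bytes, str]:
    """Try each peer in turn; return the first copy that matches the hash."""
    for peer in peers:
        data = peer.get(key)
        if data is not None and content_hash(data) == key:
            return data, peer.name
    raise LookupError(f"no peer could provide {key[:8]}")


alice = Peer("alice", online=False)  # has the data but is unreachable
bob = Peer("bob")
carol = Peer("carol")

key = alice.add(b"piece-1")
bob.store[key] = b"corrupted"        # bob serves bad data under this hash
carol.add(b"piece-1")
carol.add(b"piece-1")                # same bytes again: nothing new is stored

data, source = fetch(key, [alice, bob, carol])
```

Here the request skips the offline peer, rejects the corrupted copy, and ends up served by `carol`, whose store still holds a single entry.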

Git: Version Control Software

Version control software allows you to “go back in time” when editing a file. This is done by means of committing changes to files. You can open a text document, make some changes, and then commit those changes. Or you can create some new files, delete some unneeded ones, and edit a few here and there before committing. Git never forgets these commits, so you can always go back to a previous one. Think of it as being able to press Ctrl+Z on an entire directory of files.

A version-controlled directory is called a repository. The naive way to implement version control would be to save a complete copy of the repository at every commit. That would create enormous repositories very rapidly! Git is smarter: it identifies every version of a file by the hash of its content, so a file that has not changed between commits is stored only once, and it can compress similar versions of a file against one another to save more space. Any committed version of a file can then be rebuilt exactly, all the way back to the initial creation of the repository.

To be more technical, Git builds a directed acyclic graph in which every node is identified by a Merkle hash, that is, a hash computed from the node’s own content together with the hashes of the nodes it points to. The hash of a commit therefore uniquely identifies how to “walk” the graph in order to rebuild each file as it was when the commit was made.
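A toy version of such a hash-linked graph fits in a few lines of Python. This is a simplified sketch and not Git's real object encoding: the `blob:`, `tree:`, and `commit:` prefixes, the use of SHA-256, and the function names are all assumptions made for the example.

```python
import hashlib
from typing import Optional


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def blob_hash(content: bytes) -> str:
    """Leaf node: the hash of one file's content."""
    return h(b"blob:" + content)


def tree_hash(files: dict[str, bytes]) -> str:
    """Directory node: hash over the sorted (name, blob hash) entries."""
    entries = "".join(
        f"{name}:{blob_hash(content)}\n" for name, content in sorted(files.items())
    )
    return h(b"tree:" + entries.encode())


def commit_hash(files: dict[str, bytes], parent: Optional[str], message: str) -> str:
    """Commit node: hash over the tree hash, the parent commit hash, and a message."""
    body = f"tree {tree_hash(files)}\nparent {parent}\n{message}"
    return h(b"commit:" + body.encode())


v1 = {"README": b"hello", "main.py": b"print(1)"}
v2 = {"README": b"hello", "main.py": b"print(2)"}  # only main.py changed

c1 = commit_hash(v1, None, "initial commit")
c2 = commit_hash(v2, c1, "change output")
```

The unchanged `README` keeps the same blob hash in both versions, while the edit to `main.py` changes the tree hash and, through it, the commit hash. Because `c2` also includes `c1`, a single commit hash pins down the entire history behind it.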

IPFS: A Decentralized, Global Repository for the Entire Internet

The IPFS project is an attempt to move from the current state of the internet — centralized entities sharing large files over potentially unreliable connections — to a new internet, where swarms of peers share small pieces of globally-indexed data.

In the current model, you type a human-readable address into your browser. That address is then converted into an IP address, which is a unique address assigned to a computer on the internet that you want to connect to. You then establish a connection with that computer and request any desired data. In the new model, you don’t type in the address of the computer you want to connect to, but instead you provide the address of the data itself. Then anyone who is connected to you and has that data can provide it. If none of your peers have the data, they can re-broadcast the request to their peers, and so on.
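The request-and-re-broadcast behavior described above can be sketched as a search that spreads outward through connected peers. This mirrors the simplified description in this article rather than IPFS's actual routing; the `Node` class, the `request` function, and the `max_hops` limit are hypothetical names chosen for the example.

```python
import hashlib
from collections import deque
from typing import Optional


def content_address(data: bytes) -> str:
    """The address of the data is a hash of the data itself."""
    return hashlib.sha256(data).hexdigest()


class Node:
    """Hypothetical peer with a local store and a list of connected peers."""

    def __init__(self, name: str):
        self.name = name
        self.peers: list["Node"] = []
        self.store: dict[str, bytes] = {}

    def connect(self, other: "Node") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def publish(self, data: bytes) -> str:
        address = content_address(data)
        self.store[address] = data
        return address


def request(start: Node, address: str, max_hops: int = 3) -> Optional[tuple[bytes, str]]:
    """Ask peers, then their peers, and so on, up to max_hops away."""
    seen = {start.name}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        data = node.store.get(address)
        if data is not None and content_address(data) == address:
            return data, node.name  # verified copy found
        if hops < max_hops:
            for peer in node.peers:
                if peer.name not in seen:
                    seen.add(peer.name)
                    queue.append((peer, hops + 1))
    return None


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
a.connect(b)
b.connect(c)
c.connect(d)

address = d.publish(b"<html>hello</html>")
result = request(a, address)               # reaches d, three hops away
too_far = request(a, address, max_hops=2)  # gives up before reaching d
```

Note that `a` never needs to know which computer holds the page. It asks only for the address of the data, and any node along the way that held a verified copy could have answered.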

Instead of the internet being a roughly tree-like structure with individual branches routing through large trunks, it becomes a strongly-clustered network of peers. This isn’t to say that trees cannot form in the new network, but the IPFS protocol allows for more general network topologies and overall higher fault tolerance.

I highly recommend you look into the IPFS project. If you’re a tech-minded individual, you can even download and run the test version of IPFS yourself. Even if you’re not so techy, I expect IPFS will play a major role in the near future of the internet, and understanding the basics will put you ahead of the curve in today’s rapidly changing world.