IPFS in a Nutshell
IPFS (InterPlanetary File System) is a p2p distributed file system that seeks to connect heterogeneous devices using the same file system. It aims to provide a web that is both decentralized and permanent, offering benefits such as high throughput, DDoS protection, no single point of failure (due to its decentralized nature), a content-addressed storage model and a versioned file system.
The IPFS protocol can be partitioned into the following layers.
From bottom up:
- Network: network stack for communication between peers, such as transport protocols, reliability mechanisms, NAT traversal techniques, integrity checks, authentication
- Routing: routing system that finds other peers’ network addresses and peers who store particular data
- Block Exchange: organization of data distribution (peers communicate what data they can provide and what data they are looking for)
- Object Merkle DAG: a Merkle DAG in which objects are linked using cryptographic hashes of their identities, providing benefits such as tamper resistance, deduplication and the ability to build a variety of data structures on top of it
- Files: modelled on top of the DAG, files are represented as a set of objects, featuring a versioning system similar to Git’s
To avoid reinventing the wheel, the IPFS stack builds on a variety of successful projects. The following sections describe the implementation of each protocol layer.
Before looking at the implementations, some basic knowledge of a few key technologies is helpful.
“A HT (hash table) uses a hash function to compute an index into an array of buckets or slots, from which the desired value can be found.”
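The quoted definition can be sketched in a few lines of Python. This is only a minimal illustration using separate chaining; the class name `HashTable` and the bucket count are made up for the example, and real implementations also resize their bucket array.

```python
class HashTable:
    """A tiny hash table using separate chaining (one list per bucket)."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def _index(self, key):
        # The hash function maps the key to an index into the bucket array.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))

    def get(self, key):
        for existing_key, value in self.buckets[self._index(key)]:
            if existing_key == key:
                return value
        raise KeyError(key)


table = HashTable()
table.put("ipfs", "InterPlanetary File System")
print(table.get("ipfs"))
```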
Distributed Hash Tables
A DHT (distributed hash table) is a HT that maps keys to peers on a distributed network (instead of to an index in a table), so that any participating node can retrieve a value by providing the associated key.
Successful DHT projects include:
- Kademlia DHT: provides efficient lookups that contact on average about log2(n) nodes, and offers resistance to various attacks
- Coral DSHT (the ‘s’ in DSHT stands for ‘sloppy’): extends Kademlia by storing addresses of peers who provide data (instead of addressing the data directly) and introduces clusters
- S/Kademlia DHT: extends Kademlia with Sybil attack prevention
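To make the idea of mapping keys to peers more concrete, here is a rough sketch of the XOR distance metric that Kademlia uses to decide which peers are "closest" to a key. The function names and the use of SHA-1 to derive IDs are assumptions for this example; a real Kademlia node also maintains k-buckets and performs iterative lookups, which are left out here.

```python
import hashlib


def make_id(name):
    """Derive a 160-bit integer ID from a name (hypothetical helper)."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")


def xor_distance(a, b):
    """Kademlia's distance between two IDs is their bitwise XOR."""
    return a ^ b


def closest_peers(key, peers, k=3):
    """Return the k peer IDs with the smallest XOR distance to the key."""
    return sorted(peers, key=lambda peer: xor_distance(peer, key))[:k]


peers = [make_id("peer-%d" % i) for i in range(20)]
key = make_id("some-file")
for peer in closest_peers(key, peers):
    print(hex(peer))
```

A value would then be stored on (or looked up from) the peers returned by `closest_peers`, so every node can find it by computing the same distances.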
Block Exchanges — BitTorrent
BitTorrent is a p2p filesharing system that coordinates swarms (networks of untrusting peers) so that they can share files with each other. It uses a quasi tit-for-tat strategy (contributing nodes are rewarded while leeching nodes are punished) and tracks the availability of file pieces, sending the rarest ones first. The latter enables even non-seeding peers to trade pieces with each other and takes load off the seeding peers.
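The rarest-first idea can be illustrated with a short sketch. This is a simplification under made-up names (`rarest_first`, `peer_pieces`); actual BitTorrent clients combine it with choking, random first pieces and other heuristics.

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Order the needed pieces so that the least available ones come first.

    needed:      set of piece indices we still lack
    peer_pieces: dict mapping a peer name to the set of pieces it holds
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces)
    # Pieces nobody holds cannot be requested right now.
    candidates = [piece for piece in needed if availability[piece] > 0]
    # Sort by availability; ties are broken by piece index.
    return sorted(candidates, key=lambda piece: (availability[piece], piece))


swarm = {"a": {0, 1}, "b": {1, 2}, "c": {1}}
print(rarest_first({0, 1, 2, 3}, swarm))  # [0, 2, 1]
```

Piece 1 is held by all three peers, so it is requested last; piece 3 is skipped because no peer in the swarm has it yet.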
Version Control Systems — Git
VCSs (version control systems) track file changes and enable efficient distribution of different file versions. Git is built on a Merkle DAG (directed acyclic graph): a structure similar to a Merkle tree, except that it is deduplicated, does not need to be balanced, and its non-leaf nodes can contain data.
Git manages files (blob), directories (tree) and changes (commit) as immutable objects, which are content-addressed.
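Content addressing in Git can be reproduced in a few lines: the ID of a blob is the SHA-1 hash of a small header plus the file contents. The helper name `git_blob_id` is made up for this sketch, and it only covers blobs in SHA-1 repositories.

```python
import hashlib


def git_blob_id(content):
    """Compute the object ID Git assigns to a blob with the given bytes."""
    # Git hashes "blob <size>\0" followed by the raw content.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same content always yields the same ID, regardless of the file name.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The result should match what `git hash-object` prints for a file with the same contents.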
Self-Certified Filesystems — SFS
Remote filesystems are addressed using the following scheme:
# Location ~ server network address
# HostID := hash(PublicKey, Location)
/sfs/<Location>:<HostID>
Therefore, the name of an SFS file system certifies its server. Users can secure the traffic by verifying the server’s public key and negotiating a shared secret.
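As a rough illustration of how a path can certify its server, the following sketch builds such a path and checks a presented public key against it. The choice of SHA-256, the plain concatenation of key and location, and all function names are assumptions for this example; the actual SFS encoding differs.

```python
import hashlib


def host_id(public_key, location):
    """HostID := hash(PublicKey, Location), here SHA-256 over both (assumed)."""
    return hashlib.sha256(public_key + location.encode()).hexdigest()


def sfs_path(public_key, location):
    return "/sfs/{}:{}".format(location, host_id(public_key, location))


def certifies(path, public_key):
    """Check that the key presented by a server matches the HostID in the path."""
    # Split at the last colon so that the location may itself contain one.
    location, _, claimed = path[len("/sfs/"):].rpartition(":")
    return claimed == host_id(public_key, location)


path = sfs_path(b"server-public-key", "sfs.example.org")
print(certifies(path, b"server-public-key"))    # True
print(certifies(path, b"attacker-public-key"))  # False
```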
Libp2p provides a high-level, developer-friendly, extensible set of modules that implement network protocols suitable for p2p communication. The following IPFS layers are implemented using libp2p:
- Block Exchange
Each node has a NodeId, which is the cryptographic hash of its public key. When peers initialize a connection, they exchange their public keys and check whether hash(other.PublicKey) equals other.NodeId. If it does not, the connection is terminated.
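The check described above can be sketched as follows. SHA-256 and the function names are assumptions made for this example; IPFS itself encodes hashes in a self-describing format, which is omitted here.

```python
import hashlib
import hmac


def node_id(public_key):
    """Derive a NodeId as the hash of the public key (SHA-256 assumed)."""
    return hashlib.sha256(public_key).digest()


def verify_peer(claimed_node_id, public_key):
    """Return True if hash(other.PublicKey) equals other.NodeId."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(node_id(public_key), claimed_node_id)


other_key = b"other-peer-public-key"
other_id = node_id(other_key)

if not verify_peer(other_id, other_key):
    raise ConnectionError("NodeId does not match public key, terminating")
print("peer verified")
```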
The network stack features:
- transport protocols: any transport protocol can be used, such as TCP, UDP and WebSockets, as well as more exotic ones such as QUIC and WebRTC
- reliability: can be provided using BitTorrent’s uTP or SCTP
- connectivity: ICE NAT traversal techniques
- integrity: can be checked using a hash checksum
- authenticity: can be checked (optionally) using HMAC with the sender’s public key
In order to find other peers’ addresses and peers who can serve particular objects, a DSHT is used. Small values (<= 1KB) are stored directly on the DHT; for larger values, only references are stored, namely the NodeIds of peers who can serve the block.
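The distinction between inline values and provider references can be sketched like this. A single in-memory class stands in for the whole distributed table, and the names (`ToyRoutingTable`, `publish`, `lookup`) are made up for the example.

```python
INLINE_LIMIT = 1024  # values up to 1KB are stored directly


class ToyRoutingTable:
    """In-memory stand-in for the DSHT; not distributed at all."""

    def __init__(self):
        self.values = {}     # key -> small value stored inline
        self.providers = {}  # key -> set of NodeIds that can serve the block

    def publish(self, key, data, node_id):
        if len(data) <= INLINE_LIMIT:
            self.values[key] = data
        else:
            # Too large: only remember who can serve it.
            self.providers.setdefault(key, set()).add(node_id)

    def lookup(self, key):
        if key in self.values:
            return ("value", self.values[key])
        return ("providers", set(self.providers.get(key, set())))


table = ToyRoutingTable()
table.publish("small", b"x" * 10, "node-a")
table.publish("large", b"x" * 5000, "node-a")
table.publish("large", b"x" * 5000, "node-b")
print(table.lookup("small")[0])  # value
print(table.lookup("large")[0])  # providers
```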
For data distribution, IPFS uses BitSwap, a protocol inspired by BitTorrent. BitSwap requires each peer to provide a want_list (the data the peer is looking for) and a have_list (the data the peer can serve). BitSwap acts as a persistent marketplace where peers can exchange their data.
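A minimal sketch of how two peers could match their lists is shown below. Only the field names want_list and have_list come from the text above; the `Peer` class and the `tradeable_blocks` function are hypothetical, and the ledger and strategy parts of BitSwap are not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    want_list: set = field(default_factory=set)  # block IDs the peer is looking for
    have_list: set = field(default_factory=set)  # block IDs the peer can serve


def tradeable_blocks(me, other):
    """Return (blocks I can send to other, blocks I can receive from other)."""
    can_send = me.have_list & other.want_list
    can_receive = other.have_list & me.want_list
    return can_send, can_receive


alice = Peer(want_list={"b1"}, have_list={"b2", "b3"})
bob = Peer(want_list={"b2"}, have_list={"b1"})
print(tradeable_blocks(alice, bob))  # ({'b2'}, {'b1'})
```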
Object Merkle DAG
A Merkle DAG is used for storing and distributing data blocks quickly and robustly. Objects are linked using cryptographic hashes of the targets. With its architecture being similar to Git’s, the Merkle DAG provides content addressing, tamper resistance (since content is verifiable by its checksum) and deduplication (objects with the same contents are treated as the same object).
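These three properties can be demonstrated with a toy object store. The JSON serialization, SHA-256 and the class name `MerkleStore` are assumptions for this sketch and do not reflect the actual IPFS object format.

```python
import hashlib
import json


class MerkleStore:
    """Toy content-addressed store: an object's ID is the hash of its bytes."""

    def __init__(self):
        self.objects = {}

    def put(self, data, links=()):
        # Links are the IDs (hashes) of other objects, so a parent's ID
        # depends on the content of everything it points to.
        encoded = json.dumps({"data": data, "links": sorted(links)},
                             sort_keys=True).encode()
        object_id = hashlib.sha256(encoded).hexdigest()
        self.objects[object_id] = encoded  # same content -> same key (dedup)
        return object_id

    def verify(self, object_id):
        """Tamper check: recompute the hash and compare it with the ID."""
        return hashlib.sha256(self.objects[object_id]).hexdigest() == object_id


store = MerkleStore()
leaf = store.put("hello")
same_leaf = store.put("hello")      # deduplicated, no new object is stored
root = store.put("dir", links=[leaf])
print(leaf == same_leaf, len(store.objects), store.verify(root))  # True 2 True
```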
Through its flexible API, the IPFS Merkle DAG allows for storing data in custom formats that IPFS itself might not even understand. A variety of data structures can be modelled using the IPFS Merkle DAG: key-value stores, RDBMSs, linked data triple stores (?), linked document publishing systems, linked communication platforms and cryptocurrency blockchains.
The filesystem itself is also modelled on top of the merkle DAG. Since the object model is very close to Git’s, a conversion between the two systems is possible. The supported object types are:
- block (a variable-size block of data)
- list (a collection of blocks or other lists)
- tree (a collection of blocks, lists and other trees)
- commit (a snapshot in the version history of a tree)
While objects in the Merkle DAG are immutable, naming is mutable. Further, human-friendly names are enabled by various techniques such as peer links (public, node-scoped aliases that point to another node’s public key), DNS, name shortening services and proquint pronounceable identifiers.
IPFS is an ambitious project that introduces the vision of a new decentralized internet infrastructure and offers p2p applications solutions for organizing their data. It has the potential to:
“push the web to new horizons, where publishing valuable information does not impose hosting it on the publisher but upon those interested, where users can trust the content they receive without trusting the peers they receive it from, and where old but important files do not go missing.”
My opinion on IPFS, regarding replacing the centralized internet with a decentralized counterpart
Despite being convinced that IPFS will successfully serve as the backbone for many p2p use cases, I’m not sure whether it can eventually replace the whole World Wide Web. In recent years, we’ve been moving away from storing our data on our devices towards storing it in the cloud. Although research keeps giving us storage media with bigger capacity, I feel like many users still struggle with a lack of free disk space on their devices. In a fully decentralized web according to IPFS’s vision, each node would be required to provide some of its precious disk space to store parts of the network’s data.
Originally published at takahser.github.io.