What is IPFS?

Munish Kohli
8 min read · Dec 27, 2021

IPFS stands for the InterPlanetary File System. It’s a technology that aims to change the way content on the internet is addressed and accessed. Today, the most widely used protocol for accessing the web is HTTP.

Let’s travel a little bit into history. HTTP stands for Hypertext Transfer Protocol. It’s a common language for transferring information between clients and servers, and it gave birth to the modern web. HTTP came out of the World Wide Web project proposed by Tim Berners-Lee and Robert Cailliau at CERN in 1990.

HTTP allowed the exchange of hypermedia. Hypermedia is a non-linear medium of information that includes not only plain text but also audio, video, graphics, and hyperlinks. Multimedia is the broader term: the same mix of media, minus the hyperlinks. Non-linear media, such as a hard drive or a compact disc, let you jump straight to any part of the content. Linear media, such as audio/video cassettes and magnetic tape data storage, have to be read in sequence and are largely things of the past. HTTP adopted its data representation from another very successful standard called MIME, which stands for Multipurpose Internet Mail Extensions. MIME was invented to allow emails to carry multimedia information.

MIME extends the email format carried by SMTP, the Simple Mail Transfer Protocol. SMTP, used for sending and receiving email, was proposed in 1982 and runs on top of TCP/IP. TCP/IP, developed by Robert E. Kahn together with Vint Cerf, was born out of a project called ARPANET, funded by the US government agency ARPA, the Advanced Research Projects Agency. ARPANET became a reality in 1969 when four US research sites were linked. Under the hood of ARPANET was the idea of exchanging information between computers in the form of packets. A packet, in networking language, is a small unit of data exchanged between two computers. Packets are routed through different paths and reassembled at the destination. TCP/IP, which grew out of ARPANET, became the underlying language of the Internet.

Tim Berners-Lee envisioned the web as decentralized, with every machine in the world participating. HTTP, coupled with the browser, was a big contributor to the internet revolution: it created a whole new set of businesses and had an enormous social impact. But somewhere along the way the internet became more and more centralized, as a few players (Yahoo, Google, Facebook, Microsoft, and others) started controlling most of the traffic.

A large share of information exchange got tied to the client-server model. Consumers of the internet have less control over their own data, and large players have more. Government agencies in rogue countries can, on their own, shut down a server at any time and deny citizens their legal right to access a piece of information.

These days Tim Berners-Lee is leading Solid, a project backed by his company Inrupt. Its goal is to radically change the way the current web works and allow us to take true ownership of our data.

Fast forward to 2015: enter IPFS.

Juan Benet, a Stanford computer science graduate who migrated from Mexico at the age of 13, gave the world IPFS. IPFS plays a role similar to HTTP, but instead of using a location to find objects, it addresses them by their content and uses a peer-to-peer (P2P) model, rather than client-server, to exchange information. Any node participating in an IPFS network can host a website, document, or any other file and let other users share it. This means no central control and no single point of failure.

P2P vs. the Centralized Internet

In the current centralized internet model, content is accessed based on its location. For example, a request for www.someabcsite.com sent from a browser gets translated into an IP address, and the request is routed to a central location that sends the response. If the site is rich in content, a good amount of data travels between your machine and that server every time, which can waste bandwidth. This is not the case with IPFS. Instead of going to a central server, a request can be fulfilled by any node that has the content, ideally a nearby one. Once you download the content to your computer, you are both a server and a client. The idea behind IPFS is to ask not where the content is, but what content you want. That’s why IPFS is called “The Distributed, Permanent Web”. Here is a nice video by Juan Benet on why IPFS exists.

Content addressing in IPFS

With HTTP, files are referred to by their location. In IPFS, they are referred to by their content. Let’s say I want to refer to some document. I cryptographically hash the document, and the hash becomes a small, secure fingerprint of the whole document. For example, one of the IPFS drafts written by Juan Benet can be accessed at https://ipfs.io/ipfs/QmV9tSDx9UiPeWExXEeH6aoDvmihvx6jD5eLb4jbTaKGps. The hash at the end of this URL is the address of the draft, also called its Content Identifier (CID); the ipfs.io part is only a public gateway that lets a regular browser reach the IPFS network. The CID itself contains no IP address. Instead of referring to the server where the file is located, you are referring to the file itself. When you request this hash, the network is asked which node has the content. The network connects you to such a node and the file gets downloaded. The future may not be far off when URLs are replaced by hashes.
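To make the fingerprint idea concrete, here is a minimal Python sketch of fetching by hash and checking what comes back. It rests on assumptions: a plain SHA-256 hex digest stands in for a real CID, and the function names and the two in-memory "peers" are hypothetical, not part of any IPFS API.

```python
import hashlib


def content_address(data: bytes) -> str:
    """Return a hex SHA-256 digest, used here as a stand-in for a CID."""
    return hashlib.sha256(data).hexdigest()


def fetch_and_verify(address: str, peer_storage: dict) -> bytes:
    """Fetch bytes by address from a peer and check they match the address."""
    data = peer_storage[address]
    if content_address(data) != address:
        raise ValueError("content does not match the requested address")
    return data


document = b"IPFS draft, version 3"
address = content_address(document)

# An honest peer serves the exact bytes; a tampering peer serves altered bytes.
honest_peer = {address: document}
tampering_peer = {address: b"IPFS draft, version 3 (modified)"}

verified = fetch_and_verify(address, honest_peer)
```

Because the address is derived from the bytes, it does not matter which peer answers: altered content fails the check, so `fetch_and_verify(address, tampering_peer)` raises a `ValueError`.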

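Every CIDv0 content address in IPFS starts with Qm. This is because files and objects are hashed with SHA-256, wrapped in the Multihash format, and encoded in Base58. Newer CIDv1 addresses use a different encoding and typically start with “baf”.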
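The "Qm" prefix can be reproduced with a short Python sketch. It assumes the SHA-256 multihash layout (a 0x12 byte for the hash function, a 0x20 byte for the 32-byte digest length, then the digest) and Base58 with the Bitcoin alphabet. The function names are made up for illustration. Note that a real IPFS node hashes a wrapped DAG node rather than the raw file bytes, so the result will not match what IPFS reports for the same file.

```python
import hashlib

BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58 (Bitcoin alphabet)."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    # Each leading zero byte is represented by the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def sha256_multihash(data: bytes) -> bytes:
    """Prefix the digest with the hash function code and the digest length."""
    digest = hashlib.sha256(data).digest()
    return bytes([0x12, len(digest)]) + digest


def toy_cid_v0(data: bytes) -> str:
    """Hypothetical helper: Base58 of the SHA-256 multihash of raw bytes."""
    return base58_encode(sha256_multihash(data))


cid = toy_cid_v0(b"hello world")
print(cid[:2], len(cid))  # the first two characters are "Qm"; the length is 46
```

The two fixed prefix bytes are what force every such identifier to begin with "Qm", whatever the content is.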

IPFS-Under the Hood

Beneath IPFS is an amalgamation of proven technologies: Merkle-DAG, Distributed Hash Table (DHT), BitTorrent, and Git.

One of the core principles of IPFS is how it structures and links data. Its data model, called InterPlanetary Linked Data (IPLD), is built on the Merkle-DAG. The Merkle tree is named after the American computer scientist Ralph Merkle, and DAG stands for Directed Acyclic Graph. Why a Merkle structure? Because it is a proven technology within P2P systems and blockchains: it makes data tamper-evident and lets anyone verify its integrity. This data structure also has a useful property called deduplication: identical content is stored only once. IPFS stores data in blocks. Each block has a size of about 256 KB.
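Deduplication falls out of content addressing almost for free. Here is a minimal sketch, assuming an in-memory dictionary as the block store and a SHA-256 hex digest as the key; the `BlockStore` class is hypothetical and not an IPFS API.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store: each block is keyed by its own hash."""

    def __init__(self):
        self.blocks = {}

    def put(self, block: bytes) -> str:
        key = hashlib.sha256(block).hexdigest()
        # Storing the same bytes again reuses the existing entry.
        self.blocks.setdefault(key, block)
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = BlockStore()
first = store.put(b"same bytes")
second = store.put(b"same bytes")
third = store.put(b"different bytes")

print(first == second, len(store.blocks))  # True 2
```

Two identical blocks map to the same key, so the store holds two entries, not three.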

Here is an example of how a file is organized in IPFS using a Merkle tree. Let’s say you want to store an MS Word document in IPFS. The total size of the document is 1,698 KB. Since each block can only contain about 256 KB, the file is divided into 7 blocks (1698/256 ≈ 6.6, rounded up). Each block is hashed to get its CID. Those hashes are then hashed together to get a single hash, the root hash. All block hashes are linked to the root hash in a tree-like structure. You provide this root hash to let someone read the contents of the document. The root block refers to the sub-nodes, or leaf nodes. There can be sub-nodes of sub-nodes and so on, which makes the structure look like a Merkle tree.
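The arithmetic and the hashing steps in this example can be sketched in Python. This is a simplification under stated assumptions: the file is a placeholder of 1,698 KB of zero bytes, the block size is exactly 256 × 1024 bytes, and the root is a plain SHA-256 over the concatenated leaf hashes. Real IPFS builds proper DAG nodes with links instead. All function names are illustrative.

```python
import hashlib
import math

BLOCK_SIZE = 256 * 1024  # about 256 KB per block


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Cut the data into consecutive blocks; the last one may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def block_hash(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def root_hash(leaf_hashes: list) -> str:
    """Hash the ordered list of leaf hashes to get one root hash."""
    joined = "".join(leaf_hashes).encode("ascii")
    return hashlib.sha256(joined).hexdigest()


file_bytes = bytes(1698 * 1024)  # placeholder file: 1,698 KB of zero bytes
blocks = split_into_blocks(file_bytes)
leaves = [block_hash(block) for block in blocks]
root = root_hash(leaves)

expected_blocks = math.ceil(len(file_bytes) / BLOCK_SIZE)
print(len(blocks), expected_blocks)  # 7 7
```

Changing a single byte in any block changes that block's hash and therefore the root hash, which is what makes the root a fingerprint of the whole file.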

The Merkle-DAG is closely related to a fundamental part of blockchains: Bitcoin employs a Merkle tree to maintain its transactions. In practice, when you add a file to IPFS, IPLD does all of this behind the scenes. The job of IPFS is to bring all the sub-contents together when a file is requested. A common user will hardly ever interact with the Merkle-DAG directly.

Routing and Searching

IPFS uses a Distributed Hash Table (DHT) for routing and for locating the file requested by a node. The DHT is updated when new data is added and when nodes join or leave the network. It is consulted to find which nodes have the content. For example, when the content for hash QmV9tSDx9UiPeWExXEeH6aoDvmihvx6jD5eLb4jbTaKGps is needed, the DHT is searched, the address of a node holding it is found, and the content is delivered. According to the white paper, small values (1 KB or less) are stored directly on the DHT. For larger values, the DHT stores references to the NodeIds of peers who can serve the block. Under the hood, the IPFS DHT design draws on Kademlia, S/Kademlia (an extension of Kademlia), and Coral. Here is a nice article to get more info on DHTs.
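A toy version of the lookup can be sketched in Python. It borrows one idea from Kademlia, the XOR distance between IDs, and simplifies everything else: here a single object can see every node, whereas a real DHT node only knows a few peers and looks records up iteratively. The `ToyDHT` class and the node names are hypothetical.

```python
import hashlib


def key_id(name: str) -> int:
    """Map a node name or content hash onto the same 256-bit ID space."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the XOR of two IDs, read as an integer."""
    return a ^ b


class ToyDHT:
    def __init__(self, node_names):
        self.node_ids = {name: key_id(name) for name in node_names}
        # Each node keeps its own slice of the table: content key -> providers.
        self.records = {name: {} for name in node_names}

    def closest_node(self, content_key: str) -> str:
        target = key_id(content_key)
        return min(self.node_ids,
                   key=lambda name: xor_distance(self.node_ids[name], target))

    def provide(self, content_key: str, provider: str) -> None:
        """Record on the closest node that `provider` can serve this content."""
        holder = self.closest_node(content_key)
        self.records[holder].setdefault(content_key, set()).add(provider)

    def find_providers(self, content_key: str) -> set:
        holder = self.closest_node(content_key)
        return self.records[holder].get(content_key, set())


dht = ToyDHT(["node-a", "node-b", "node-c", "node-d"])
cid = "QmV9tSDx9UiPeWExXEeH6aoDvmihvx6jD5eLb4jbTaKGps"
dht.provide(cid, "node-c")
providers = dht.find_providers(cid)
print(providers)  # {'node-c'}
```

The record for a hash always lands on the node whose ID is closest to that hash, so anyone who knows the hash can work out where to ask, without a central index.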

Data Exchange

IPFS uses Bitswap as its data exchange model. Bitswap manages requesting blocks from, and sending blocks to, other peers in the network. It serves two main purposes. The first is to get the blocks that the local node has requested; it knows where to ask because the DHT provides the routes. The second is to send blocks to other nodes that want them.

Bitswap is a message-based protocol, as opposed to request-response. All messages contain want lists or blocks. Upon receiving a want list, a node should consider sending out the wanted blocks if it has them. Upon receiving blocks, a node should send out a notification called a ‘Cancel’, signifying that it no longer wants those blocks. At the protocol level, Bitswap is very simple. Here is one nice article on Bitswap.
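The want list, block, and Cancel flow can be sketched with two in-memory peers in Python. This is only an illustration of the message pattern described above: the `Peer` class and its method names are hypothetical, there is no networking, and real Bitswap also tracks sessions and decides which peers to serve first.

```python
import hashlib


def block_key(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.blocks = {}        # block hash -> bytes held locally
        self.want_list = set()  # block hashes this peer still wants

    def handle_want_list(self, wanted: set) -> dict:
        """Return the wanted blocks that this peer actually has."""
        return {key: self.blocks[key] for key in wanted if key in self.blocks}

    def handle_blocks(self, received: dict) -> set:
        """Store verified wanted blocks and return the hashes to cancel."""
        cancels = set()
        for key, data in received.items():
            if key in self.want_list and block_key(data) == key:
                self.blocks[key] = data
                self.want_list.discard(key)
                cancels.add(key)
        return cancels


alice, bob = Peer("alice"), Peer("bob")
chunk_one, chunk_two = b"chunk one", b"chunk two"
alice.blocks[block_key(chunk_one)] = chunk_one

# Bob wants both chunks, but Alice only has the first one.
bob.want_list = {block_key(chunk_one), block_key(chunk_two)}
sent = alice.handle_want_list(bob.want_list)
cancels = bob.handle_blocks(sent)
```

After the exchange Bob holds the first chunk, sends a Cancel for it, and keeps the second chunk on his want list until some other peer can serve it.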

Conclusion

  1. Participation in IPFS is free
  2. Identical content has the same hash everywhere in the network, so IPFS never stores it under two different addresses. It can also keep versions of a file: if a file is changed 3 times, each of the 3 versions gets its own hash and can be traced back for as long as some node keeps it
  3. Just as an IP address is hard to remember, so is an IPFS hash. On the web, a human-readable name (e.g. www.yahoo.com) typed into the browser gets translated into an IP address. In IPFS, IPNS comes to the rescue by providing a stable name that points to an IPFS hash
  4. Storing even small files on the Ethereum blockchain can cost a lot of gas, resulting in costly transactions. IPFS can complement Ethereum smart contracts by storing the files on its network while Ethereum stores only the hash. Like a blockchain, IPFS provides a degree of data immutability, because changing the content changes its hash
  5. There are interesting slides which describe the transition from Web 1.0 to 3.0, the world of blockchains, and IPFS
  6. IPFS uses a Merkle-DAG to structure large files. See https://flyingzumwalt.gitbooks.io/decentralized-web-primer/content/ipfs-dag/lessons/files-as-dags.html
  7. For what happens when you add a file to IPFS, see this article: https://medium.com/textileio/whats-really-happening-when-you-add-a-file-to-ipfs-ae3b8b5e4b0f
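Points 2 and 3 can be pictured together with a small Python sketch: every version of a file gets its own immutable hash, and one mutable name points at the latest. This is a loose illustration only. Real IPNS names are derived from a public key and their records are signed, none of which is modeled here; the `ToyNameRecord` class is hypothetical.

```python
import hashlib


class ToyNameRecord:
    """Mutable name that points at the latest of several immutable hashes."""

    def __init__(self):
        self.history = []  # content hashes, oldest first

    def publish(self, content: bytes) -> str:
        address = hashlib.sha256(content).hexdigest()
        self.history.append(address)
        return address

    def resolve(self) -> str:
        """Return the hash of the most recently published version."""
        return self.history[-1]


site = ToyNameRecord()
v1 = site.publish(b"my site, version 1")
v2 = site.publish(b"my site, version 2")
v3 = site.publish(b"my site, version 3")

print(site.resolve() == v3, len(site.history))  # True 3
```

The name stays the same while the hash behind it changes, and the older hashes remain valid addresses for the older versions.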

IPFS is a fantastic platform for hosting files in a decentralized way, with far less worry about DDoS attacks and server problems. It just works, and it’s ideal for static websites.

Hope you liked the article. It would be nice if you left feedback and a couple of claps. It will encourage me to keep writing further articles.

Munish Kohli

Technology Enthusiast | Business & Data Analyst | Machine Learning | Big Data | Blockchain | IBM Hyperledger | Ethereum Smart Contracts