How IPFS works

Jul 5, 2018

This article explains the concepts behind how IPFS works. It is intended to clear up common doubts about IPFS.

IPFS stands for InterPlanetary File System. IPFS acts as a decentralized source of data. There is a thin line between a distributed and a decentralized data source. Distributed means that the processing is shared across multiple nodes, but the decisions may still be centralized and use complete knowledge of the system. Decentralized means that there is no single point where decisions are made. Every node decides its own behaviour, and the behaviour of the system as a whole is the aggregate of those decisions.

Decentralized systems are therefore a subset of distributed systems.

Why IPFS?

Let's take an example that is rather far-fetched but very relevant to our case.

Suppose humanity has colonised Mars and the first person on Mars tries to access internet services hosted on Earth. Say it takes him approximately 1 hour to load a news website. What if another person tries? He ends up waiting another hour, and so on.

But if IPFS is used, the second person on Mars can retrieve the content almost instantly from the first person who fetched it. From there onwards, data can spread like wildfire on Mars.

Installing and Running an IPFS node

Running an IPFS node is not a big task. Download the distribution for your OS and, once it is installed, run the following commands in a terminal:

ipfs init
ipfs daemon

ipfs init creates a unique PeerId for your node. (This is important and will be used later.)

As soon as the IPFS daemon starts, the following services become available on these ports:

Port 4001: The IPFS swarm runs here. It connects you to other nodes in the network.
Port 5001: You can view the network stats by visiting http://localhost:5001/webui
Port 8080: This is the gateway through which files are accessed. To view any content on the IPFS network, simply visit http://localhost:8080/ipfs/<<hash of content>> in your browser.
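To make the gateway port concrete, here is a minimal Python sketch that builds the gateway URL for a hash and, optionally, reads the content behind it. The helper names gateway_url and fetch_from_gateway are made up for this example, and the fetch only works if a daemon is running locally on the default gateway port 8080.

```python
from urllib.request import urlopen

GATEWAY_PORT = 8080  # default gateway port mentioned above


def gateway_url(content_hash, host="localhost", port=GATEWAY_PORT):
    """Build the local gateway URL for a piece of content (hypothetical helper)."""
    return f"http://{host}:{port}/ipfs/{content_hash}"


def fetch_from_gateway(content_hash, timeout=30):
    """Read the raw bytes behind a hash; needs `ipfs daemon` to be running."""
    with urlopen(gateway_url(content_hash), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # "<<hash of content>>" is a placeholder, not a real hash.
    print(gateway_url("<<hash of content>>"))
```

Only gateway_url is pure string formatting; fetch_from_gateway performs a real HTTP request and will fail if no node is listening.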

The intent of this article is not to teach how to run and use IPFS; you can find many free tutorials online. The intent here is to explain the underlying principle in a simple way.

How IPFS stores data?

When you add content to the IPFS network, the data is split into chunks of 256 KB. Each chunk is identified by its own hash. These chunks are then distributed to the nodes on the network whose peerId is closest to the chunk's hash.
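The splitting and hashing step can be sketched in a few lines of Python. This is only an approximation of the idea: it uses a plain hex SHA-256 digest as the hash, whereas real IPFS hashes have their own encoding, and the function names are invented for this example.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # 256 KB, the chunk size described above


def split_into_chunks(data, chunk_size=CHUNK_SIZE):
    """Cut the data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hash(chunk):
    """Hex SHA-256 digest, standing in for the real IPFS hash format."""
    return hashlib.sha256(chunk).hexdigest()


# 1 MB of non-repeating bytes as a stand-in for a real file.
data = b"".join(i.to_bytes(4, "big") for i in range(262144))

chunks = split_into_chunks(data)
hashes = [chunk_hash(c) for c in chunks]

print(len(chunks), "chunks")
for h in hashes:
    print(h[:16], "...")
```

Because a hash depends only on the bytes of its chunk, two identical chunks always get the same hash, which is what lets the network identify content by hash alone.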

This can be confusing, so let's go through a small illustration.

Let us assume that there are 4 nodes with peerIds 6789, 789a, 89ab and 9abc respectively.
We try to add a file named something.mp4 (size = 1 MB). Your node first calculates the hash of the file, say 7abc. The file is also broken into 4 chunks of 256 KB each, and your node calculates the hash of each chunk, say 7aaa, 8abc, 9a23 and 5bcd.
Your node then sends each chunk to the node whose peerId is numerically closest to the chunk's hash. In our example, the chunk with hash 7aaa is closest to peerId 789a, so this chunk is sent to the node with peerId 789a.
Similarly, all the chunks are sent and their addresses are updated in the DHT (distributed hash table).
Lastly, the root hash of the object, i.e. 7abc, is stored together with the hashes it links to, i.e. 7abc → [7aaa, 8abc, 9a23, 5bcd]. (The root hash can be stored anywhere; in this example it is assumed to be stored on our own node.)
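The illustration above can be reproduced with a short Python sketch. It treats the four-character IDs as hexadecimal numbers and picks the peer with the smallest absolute difference, which is how this article's simplified model defines "closest"; the real network is Kademlia-based and measures closeness with an XOR metric instead. The function name closest_peer is made up for this example.

```python
PEER_IDS = ["6789", "789a", "89ab", "9abc"]
CHUNK_HASHES = ["7aaa", "8abc", "9a23", "5bcd"]
ROOT_HASH = "7abc"


def closest_peer(content_hash, peer_ids):
    """Pick the peer whose ID is numerically nearest to the hash."""
    target = int(content_hash, 16)
    return min(peer_ids, key=lambda peer: abs(int(peer, 16) - target))


# Toy DHT: chunk hash -> peerId of the node holding that chunk.
dht = {h: closest_peer(h, PEER_IDS) for h in CHUNK_HASHES}

# The root object only lists the chunks it links to, in order.
root_links = {ROOT_HASH: CHUNK_HASHES}

for chunk, peer in dht.items():
    print(chunk, "->", peer)
```

With the values from the example this prints 7aaa -> 789a, 8abc -> 89ab, 9a23 -> 9abc and 5bcd -> 6789, so every node ends up holding exactly one chunk.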

Refer to the hand-drawn diagrams for further detail.

How data is divided and stored. The root hash is assumed to be stored on your node. It is, however, stored in the same way as the chunks, so it could be on any node, including yours.

On the node where the root hash is stored. In this example we assume that the root hash is stored on your node. The root hash is stored in the same way as the data chunks.

How data is retrieved?

On the IPFS network, a file is identified solely by its hash, in our case 7abc.

Once the user requests the file, the request travels through the DHT to the nodes where the hash is stored. If the object links to other chunks (as in our case), those are looked up in the same way. Once all the chunks are obtained, they are simply concatenated to rebuild the original object.
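The lookup-and-concatenate flow can be simulated in memory with plain Python dictionaries. Everything here is a toy: the peer names, the round-robin placement of chunks and the way the root hash is computed are assumptions made for this sketch, not how the real client does it.

```python
import hashlib


def sha256_hex(data):
    """Hex SHA-256 digest, standing in for the real IPFS hash format."""
    return hashlib.sha256(data).hexdigest()


# Toy network: peerId -> {hash: stored value}; the DHT maps hash -> peerId.
nodes = {"peer-a": {}, "peer-b": {}}
dht = {}


def put(peer_id, key, value):
    """Store a value on a peer and record its location in the DHT."""
    nodes[peer_id][key] = value
    dht[key] = peer_id


def add_file(data, chunk_size):
    """Split data into chunks, spread them over the peers, return the root hash."""
    peers = sorted(nodes)
    links = []
    for n, start in enumerate(range(0, len(data), chunk_size)):
        chunk = data[start:start + chunk_size]
        key = sha256_hex(chunk)
        put(peers[n % len(peers)], key, chunk)  # round-robin, for simplicity
        links.append(key)
    root_hash = sha256_hex("".join(links).encode())
    put(peers[0], root_hash, links)  # the root object is just a list of links
    return root_hash


def retrieve(content_hash):
    """Follow the DHT to the node holding the hash and rebuild the content."""
    value = nodes[dht[content_hash]][content_hash]
    if isinstance(value, list):  # a root object: fetch every linked chunk
        return b"".join(retrieve(link) for link in value)
    return value  # a plain chunk


root = add_file(b"hello interplanetary world", chunk_size=8)
print(retrieve(root))  # b'hello interplanetary world'
```

The caller only ever needs the root hash; the locations of the chunks are discovered through the DHT at retrieval time.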

NOTE: For the sake of understanding, the explanation has been kept simple. The actual implementation may be slightly different.

FAQ

Here I would like to answer some common questions that have been asked. (Drop your questions in the comments and I shall try to answer them.)

Is my data permanent on the IPFS network?

Ans: Yes and no. If the file you added to the IPFS network is not accessed by many people, it fades away. Your data needs to be popular on the network to stay on it. If you never want your data to fade out of the IPFS network, you must pin it on your node. Pinning ensures that at least your node keeps that data.

It is also advisable to run ipfs-cluster if you want your data to be persistent.
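The effect of pinning can be pictured with a toy model of a node's local store. The ToyNode class below is purely illustrative and is not part of any IPFS library; in the real client the corresponding commands are ipfs pin add and ipfs repo gc.

```python
class ToyNode:
    """Very small model of a node's local store with pinning."""

    def __init__(self):
        self.blocks = {}     # hash -> data held locally
        self.pinned = set()  # hashes protected from clean-up

    def add(self, content_hash, data):
        """Keep a copy of some data locally (not pinned by default here)."""
        self.blocks[content_hash] = data

    def pin(self, content_hash):
        """Mark a hash so that clean-up never removes it."""
        self.pinned.add(content_hash)

    def garbage_collect(self):
        """Drop every block that is not pinned; return the removed hashes."""
        removed = [h for h in self.blocks if h not in self.pinned]
        for h in removed:
            del self.blocks[h]
        return removed


node = ToyNode()
node.add("7abc", b"my own file")
node.add("5bcd", b"something fetched from another node")
node.pin("7abc")

removed = node.garbage_collect()
print(removed, sorted(node.blocks))  # ['5bcd'] ['7abc']
```

Only the pinned hash survives the clean-up, which is the guarantee described above: pinning keeps the data on your node, and says nothing about what other nodes keep.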

Is there any access control for data?

Ans: No. Anyone who knows the hash can access your data. To prevent this, you can encrypt your data before adding it, using a combination of symmetric and asymmetric encryption.

Can I remove my data from IPFS?

Ans: No. If the data becomes unpopular on the network, it fades away. But if someone else hosts your data by pinning it on their node, you can't delete it.

Written by Akshay Meher