Entering the world of IPFS

IPFS, the InterPlanetary File System, is awesome for storing data in a decentralised fashion, and it's one of the most commonly used tools for storing data within the blockchain space. Storing data within a blockchain is expensive and slow; storing it in IPFS is free and fast. IPFS also avoids duplicating data: content is addressed by its hash, and hashing the same data twice gives back the same hash, so identical content is only stored once.
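To make the "same data, same hash" idea concrete, here is a minimal sketch of a content-addressed store. It is a toy model, not how IPFS is implemented: the `ContentStore` class is a made-up name, and plain SHA-256 hex digests stand in for the real multihash-based identifiers that IPFS produces.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the data itself."""

    def __init__(self):
        self.blocks = {}

    def add(self, data: bytes) -> str:
        # Assumption: a plain SHA-256 hex digest stands in for a real IPFS hash.
        key = hashlib.sha256(data).hexdigest()
        # Same data -> same key, so adding it again never creates a second copy.
        self.blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


store = ContentStore()
first = store.add(b"hello ipfs")
second = store.add(b"hello ipfs")       # identical content
other = store.add(b"something else")    # different content
```

Adding the same bytes twice returns the same key and leaves a single stored copy, which is the deduplication behaviour described above.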

IPFS seems to work really well. You can add data and it stays in this global network of nodes. Anyone with the hash can access the data (note that all data on IPFS is public unless you run your nodes in a private IPFS network), and the same data is never duplicated. The question that you might be asking at this point is, “Can I replace my database with IPFS and become fully decentralised?”

While IPFS is great at storing your data, it gives no guarantee that the data stays available if your IPFS node goes down. Confused? Imagine this. You start your IPFS node, run it for a month and keep very important data in it. Then your machine breaks. You're thankful it's a public network, so other people will have your data, right? Not quite. IPFS never guaranteed your data would always be available. No other node was interested in it, so no one pinned your data except your own loyal node. Consequently, your data is lost, since no other node has it.

This is where Filecoin comes in. Filecoin incentivizes other nodes to store your data in case your node goes down, in exchange for a small fee on your end, so you avoid losing your data. Filecoin is awesome, let's use it! Well, it's not there yet… At the time of writing, Filecoin is under development and will be available at some point in the future. For now, we have to work with current solutions, which include:

  1. Running multiple IPFS nodes
  2. Backing up the ~/.ipfs folder
  3. Running IPFS Cluster
  4. Using something else (e.g. Storj)

There are more solutions than these, but the one I want to talk about is IPFS Cluster. IPFS Cluster lets you run many nodes that each replicate the data, and it keeps the nodes synchronised using Raft consensus. Let's look at how many nodes we need to keep the data safe. The general failure case is that a node goes down or loses its internet connection. If we assume that at most t nodes can be down at the same time, we need at least t+1 nodes to keep the data available. In simple terms, there should always be at least 1 node up and running even if all the others are down. So if you think 5 nodes can never be down at once, then your risk factor t is 4 and you run 5 nodes. Note that this is about keeping a copy of the data reachable; Raft itself needs a majority of the cluster peers online to agree on new pin requests.
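The node-count arithmetic can be written down as a small sketch. The function names are made up for illustration. `raft_majority` is not from the reasoning above; it reflects the general property of Raft that a majority of peers must be reachable to commit a new entry, which is a stricter requirement than just keeping one copy of the data alive.

```python
def nodes_needed(max_failures: int) -> int:
    """Smallest cluster size that leaves one node up when max_failures are down."""
    if max_failures < 0:
        raise ValueError("max_failures must be non-negative")
    return max_failures + 1


def data_available(total_nodes: int, nodes_down: int) -> bool:
    """Data stays reachable as long as at least one replica is still online."""
    return total_nodes - nodes_down >= 1


def raft_majority(total_nodes: int) -> int:
    """Peers that must be online for Raft to commit a new entry (e.g. a pin)."""
    return total_nodes // 2 + 1


cluster_size = nodes_needed(4)  # risk factor t = 4
```

With t = 4 this gives a 5-node cluster: the data survives 4 simultaneous failures, but under the Raft assumption the cluster would only accept new pins while at least 3 of the 5 peers are up.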

IPFS Cluster uses IPFS nodes to store data and is built on top of standard IPFS. Using the cluster therefore does not mean running a different kind of IPFS node, but the exact same node you would run if you were using plain IPFS. IPFS Cluster has a proxy node which stores all pin requests and implements the Raft algorithm (a leader-based approach).

The idea is that you interact with the IPFS Cluster proxy node, which handles synchronisation and backups, so you can treat it exactly like an IPFS node. IPFS Cluster nodes exchange messages with each other, such as requests to pin a hash or heartbeat messages used to detect that a node is down. One of the biggest advantages of this approach is that once a node comes back up, it can receive all the messages it missed and be up to date in no time.
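The catch-up behaviour can be illustrated with a toy model of an ordered pin log. This is only a sketch of the general idea of replaying missed log entries, not the actual IPFS Cluster implementation; `PinLog`, `Peer`, `broadcast` and the `cid-*` strings are all hypothetical names.

```python
class PinLog:
    """Ordered list of pin requests, as a leader might keep it."""

    def __init__(self):
        self.entries = []

    def append(self, cid: str) -> None:
        self.entries.append(cid)

    def since(self, index: int) -> list:
        # Every entry from position `index` onwards.
        return self.entries[index:]


class Peer:
    def __init__(self, name: str):
        self.name = name
        self.online = True
        self.pinned = []  # hashes this peer has pinned, in log order

    def catch_up(self, log: PinLog) -> int:
        # Replay every entry this peer has not applied yet.
        missed = log.since(len(self.pinned))
        self.pinned.extend(missed)
        return len(missed)


def broadcast(log: PinLog, peers: list, cid: str) -> None:
    """Record a pin request and deliver it to every peer that is online."""
    log.append(cid)
    for peer in peers:
        if peer.online:
            peer.catch_up(log)


log = PinLog()
a, b = Peer("a"), Peer("b")
peers = [a, b]

broadcast(log, peers, "cid-1")
b.online = False                 # peer b goes down
broadcast(log, peers, "cid-2")
broadcast(log, peers, "cid-3")
b.online = True                  # peer b comes back
replayed = b.catch_up(log)       # b receives the two requests it missed
```

Because the log is ordered, a returning peer only needs to know how far it got before going down in order to request everything after that point.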

Fig.1 IPFS Cluster Architecture

That's awesome! Let's use IPFS for everything? Not so fast!

Well, while IPFS is awesome in many ways (there is even a repo), it has its problems too. An IPFS node can get quite hungry in terms of memory usage as it continuously connects to more and more nodes, not to mention the amount of bandwidth it consumes! 1 GB of memory is not enough for an IPFS node: the operating system (Ubuntu) kills the IPFS daemon once it starts using too much memory, and all you will see when you look at it is the word “Killed”. It is as devastating as it sounds, and if your application depends on IPFS being up full time then you need a machine with more memory and bandwidth capacity. However, there are other ways of solving this problem. (Note that this is a problem with the IPFS node and not with IPFS Cluster.)

Limiting the number of nodes that IPFS connects to can resolve this problem, and this can be done in one of two ways (everything discussed here lives in the IPFS config file found at ~/.ipfs/config).

  1. Manually: by changing the config file to disable MDNS discovery or by setting its interval very high (not recommended: it essentially isolates your node from the rest of the network, and although other IPFS nodes may still be able to get content from your node, the probability is very low).
  2. By using the IPFS connection manager to limit the number of connections your node makes. This has three attributes:
     - Max Connections:
    The maximum number of connections your node will allow; it starts dropping connections after that.
     - Min Connections:
    The minimum number of connections your node will try to maintain.
     - Interval:
    The amount of time after which the max and min constraints are checked. For example, if it's a minute, then every minute your node will ensure that Min Connections ≤ number of peers (ipfs swarm peers | wc -l) ≤ Max Connections.
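As a rough sketch of the second option, the helper below sets connection manager limits on an IPFS config loaded as a Python dict. The key names (`Swarm.ConnMgr` with `Type`, `LowWater`, `HighWater`, `GracePeriod`) are the ones go-ipfs has used in its config file. Mapping them onto the min, max and interval attributes above is my assumption: the go-ipfs docs describe `GracePeriod` as how long a new connection is protected from being closed, so check the docs for your version. The function names and the sample values are hypothetical.

```python
import copy
import json


def limit_connections(config: dict, low: int, high: int, grace: str = "20s") -> dict:
    """Return a copy of an IPFS config dict with connection manager limits set."""
    if not 0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    swarm = updated.setdefault("Swarm", {})
    swarm["ConnMgr"] = {
        "Type": "basic",
        "LowWater": low,        # assumed to correspond to "Min Connections"
        "HighWater": high,      # assumed to correspond to "Max Connections"
        "GracePeriod": grace,   # see the caveat in the paragraph above
    }
    return updated


def count_peers(swarm_peers_output: str) -> int:
    """Count peers in the text printed by `ipfs swarm peers` (one address per line)."""
    return sum(1 for line in swarm_peers_output.splitlines() if line.strip())


# Hypothetical starting config; a real one has many more sections.
example = limit_connections({"Identity": {"PeerID": "hypothetical-peer-id"}}, low=50, high=100)
print(json.dumps(example["Swarm"], indent=2))
```

To apply something like this for real, you would load ~/.ipfs/config with `json.load`, pass it through `limit_connections`, write it back and restart the daemon. `count_peers` does the same job as `ipfs swarm peers | wc -l` if you capture the command's output in a script.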

Most of the problems with IPFS can be solved by updating the config file or reading the docs. If you have any questions or suggestions, please comment below or message me on Twitter @aliazam2251.