The complete beginner’s guide to IPFS

Myroslav Golub
Published in VirtusLab
9 min read · Feb 3, 2020
IPFS is a cloud of nodes that communicate with each other in a decentralized way. Photo by James Wainscoat on Unsplash

IPFS stands for “Interplanetary File System”. Conceptually, it can be described as a cross between WWW, Git and BitTorrent. Despite the pretentious name, it can be used for down-to-earth purposes.

IPFS makes it extremely easy to publish your content and make it accessible anywhere, anytime. On the other hand, the publishing doesn’t happen instantly — it often takes some time (minutes to hours) before others can see what you published.

The obvious way to make use of IPFS is file sharing. This is also the only use that this article describes in detail, but IPFS is capable of much more. In fact, there are plenty of applications that use IPFS in some way.

There are several distributions of IPFS and multiple ways to install them. This guide will use the Kubo command line application, which supports all mainstream platforms. The command interface is extensive and well documented. Don’t let the long list of commands scare you — you’ll need just a few simple ones to start using IPFS.

Before actually using it, you’ll need to have a peer ID generated. Open a terminal and enter this command:

ipfs init

The meaning of that command and its output will be explained later in more detail; for now you can just continue to the next step.

Publishing a file

Let’s try publishing this PNG image. Save it as ipfs-logo.png, then open a terminal in the directory where it's located, and enter the following command:

ipfs add ipfs-logo.png

The output will look like this:

added QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6 ipfs-logo.png

The long string starting with Qm... is called a multihash (current IPFS documentation refers to it as a content identifier, or CID). It is a unique identifier derived from the contents of the file. It will always be the same for the same file, regardless of when or how many times the file is republished.
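To illustrate what "derived from the contents" means, here is a minimal Python sketch of the multihash idea: a SHA-256 digest prefixed with two bytes that name the hash function and the digest length, then base58-encoded. The helper names are hypothetical. The result will not match the output of ipfs add, because IPFS hashes its own internal representation of the file rather than the raw bytes, but it does show why the identifier starts with Qm and why identical content always yields the same string.

```python
import hashlib

# Base58 alphabet (the Bitcoin variant), as used by "Qm..." identifiers.
BASE58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58; leading zero bytes become '1' characters."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = BASE58_ALPHABET[remainder] + encoded
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + encoded


def toy_multihash(content: bytes) -> str:
    """Hypothetical helper: base58 SHA-256 multihash of raw bytes."""
    digest = hashlib.sha256(content).digest()
    # 0x12 is the multihash code for SHA-256, 0x20 the digest length (32 bytes).
    return base58_encode(bytes([0x12, 0x20]) + digest)


# The same bytes always produce the same identifier.
print(toy_multihash(b"hello"))
print(toy_multihash(b"hello") == toy_multihash(b"hello"))
```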

Retrieving the file

Getting the file back is just as simple:

ipfs get QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6 --output out.png

The optional flag --output allows you to specify the name for the downloaded file.

Publishing a directory

You can also publish a whole directory at once, including its files and nested directories. To do that, add the -r flag (short for --recursive) to the add command. Place the logo file into a directory called logo, then execute:

ipfs add -r logo

The output will look like this:

added QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6 logo/ipfs-logo.png
added QmU1muwAeYjHX1kUnYEXPWEhnFxcVGS6wv8tggoHLHkm3f logo

The bottommost line contains the identifier of the directory. It can be retrieved using ipfs get, in just the same way as a single file.

Each file inside the directory can be referenced by its path relative to the parent directory. However, each file has also been assigned an individual identifier, so a particular file can be referenced by its multihash without any directory context. In our example, these commands should yield the exact same file, without retrieving the whole directory:

ipfs get QmU1muwAeYjHX1kUnYEXPWEhnFxcVGS6wv8tggoHLHkm3f/ipfs-logo.png
ipfs get QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6 --output ipfs-logo2.png
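The reason both forms reach the same file can be sketched with a toy content-addressed store in Python. This is a simplified model, not how IPFS is implemented: identifiers are plain hex SHA-256 digests, a directory is stored as a listing of names and child identifiers, and only one path level is resolved. All names are hypothetical.

```python
import hashlib


def toy_id(data: bytes) -> str:
    """Hypothetical content identifier: hex SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


def add_file(store: dict, content: bytes) -> str:
    """Store file bytes under their own identifier."""
    file_id = toy_id(content)
    store[file_id] = content
    return file_id


def add_directory(store: dict, files: dict) -> str:
    """Store a directory as a mapping of name -> child identifier."""
    listing = {name: add_file(store, content) for name, content in files.items()}
    # The directory identifier is derived from its sorted listing.
    serialized = "\n".join(f"{name} {cid}" for name, cid in sorted(listing.items()))
    dir_id = toy_id(serialized.encode())
    store[dir_id] = listing
    return dir_id


def resolve(store: dict, path: str):
    """Resolve '<id>' or '<id>/<name>' to the stored object."""
    root, _, name = path.partition("/")
    node = store[root]
    return store[node[name]] if name else node


store = {}
dir_id = add_directory(store, {"ipfs-logo.png": b"fake image bytes"})
file_id = toy_id(b"fake image bytes")

# Both routes lead to the same stored bytes.
print(resolve(store, f"{dir_id}/ipfs-logo.png") == resolve(store, file_id))
```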

Tips

  • If you want the published file to keep its name after downloading, place it inside a directory and publish that directory. Actually, you can just add the flag -w to ipfs add, and the directory wrapper will be created automatically.
  • When publishing a directory, the hidden files (those that have names starting with .) will be omitted. If you want to include them, add the flag -H (short for --hidden).

Joining the swarm

You have just published some files to the local IPFS storage and retrieved them. But your files are not available to the rest of the world yet. To achieve that, you need to start an IPFS node:

ipfs daemon

The node works both as a client and a server. It will establish connections with a number of other nodes and exchange the information about the available content. You can check which nodes you are connected to by typing

ipfs swarm peers

The output will contain many lines that look like this (the actual addresses and hashes may be different):

/ip4/99.7.131.248/tcp/4001/ipfs/Qmf3KKqHdL1fUDiguRsTojXBBrqR94yx4EUd8EesXogcSs
/ip6/2604:a880:cad:d0::17:2001/tcp/4001/ipfs/QmUR8d2WLbNcAFRMWn3SMdBRDhJugZUezfwLkDYti3Gc3w

Each line is a multiaddress of an IPFS node. It consists of a location in the IP network (address and port), as well as a unique peer identifier. The address of the node might change (this happens as your laptop travels with you from home to office to café and so on), but the peer ID always stays the same.
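The structure of a multiaddress can be made visible with a few lines of Python. This is a rough sketch that assumes every protocol name is followed by exactly one value, which holds for the ip4, ip6, tcp and ipfs components shown above but not for every multiaddress format. The function name is hypothetical.

```python
def parse_multiaddr(address: str) -> list:
    """Split '/proto/value/proto/value' into (protocol, value) pairs.

    Assumption: each protocol carries exactly one value.
    """
    parts = address.strip("/").split("/")
    if len(parts) % 2 != 0:
        raise ValueError(f"unexpected multiaddress layout: {address}")
    return list(zip(parts[0::2], parts[1::2]))


example = "/ip4/99.7.131.248/tcp/4001/ipfs/Qmf3KKqHdL1fUDiguRsTojXBBrqR94yx4EUd8EesXogcSs"
for protocol, value in parse_multiaddr(example):
    print(protocol, value)
```

The last pair is the peer ID, which stays the same even when the network location in the first two pairs changes.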

Saving data

No single node can possibly keep all the data that have ever been published. This means that your node may sometimes choose to throw away some of the data. This also means that you cannot totally rely on your peers: if nobody is interested in keeping your data, it might well disappear from the network.

To protect the data object (such as a file) from disappearing, you can pin its identifier. This will make sure that the data is not deleted when your local node decides to free some space.
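The relationship between pinning and freeing space can be sketched as a toy model in Python. The class below is hypothetical and far simpler than a real node's storage; it only shows that cleanup removes everything that is not pinned.

```python
class ToyBlockStore:
    """Hypothetical model of a local store with pinning."""

    def __init__(self):
        self.blocks = {}     # identifier -> data
        self.pinned = set()  # identifiers protected from cleanup

    def put(self, block_id: str, data: bytes, pin: bool = False) -> None:
        """Store a block, optionally pinning it right away."""
        self.blocks[block_id] = data
        if pin:
            self.pinned.add(block_id)

    def pin(self, block_id: str) -> None:
        """Protect an already stored block from cleanup."""
        if block_id not in self.blocks:
            raise KeyError(f"block {block_id} is not stored locally")
        self.pinned.add(block_id)

    def collect_garbage(self) -> list:
        """Delete every unpinned block and return the removed identifiers."""
        removed = [block_id for block_id in self.blocks if block_id not in self.pinned]
        for block_id in removed:
            del self.blocks[block_id]
        return removed


node = ToyBlockStore()
node.put("QmPinnedExample", b"keep me", pin=True)
node.put("QmCachedExample", b"only cached")
print(node.collect_garbage())
print(sorted(node.blocks))
```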

The files that you have added are automatically pinned, so let’s pin something that you don’t have yet:

ipfs pin add /ipfs/QmNhFJjGcMPqpuYfxL62VVB9528NXqDNMFXiqN5bgFYiZ1/its-time-for-the-permanent-web.html

The output should be like this:

pinned Qmcx3KZXdANNsYfSRU1Vu4pchM8mvYXH4N8Zwdpux57YNL recursively

By the way, this operation has also downloaded the data onto your computer (how else could you be sure that it never disappears?). Retrieving it should be lightning fast now:

ipfs get Qmcx3KZXdANNsYfSRU1Vu4pchM8mvYXH4N8Zwdpux57YNL -o article.html

Tips

  • If you need to pin large amounts of data or make sure the data is available even when you’re offline, consider using commercial services that pin your data for you, such as Pinata or Eternum.
  • So, can you use IPFS to back up your files? Yes, you can — if there is at least one remote node that would retrieve your data and wouldn’t discard it later. The commercial pinning services do just that.

How the right data is found

Even though the file was only 26 kB in size, the last operation might have taken some time. This happens because the data has to be located before it can be downloaded.

The requested block of data could be stored on any node in the global IPFS network. Your local node is likely unable to keep direct connections to every other server, or keep track of every block that is added elsewhere; that’s why finding the right node can take some time.

The information about which node stores which blocks is organized as a distributed hash table, which is split across the nodes just like the data itself. When you search for the data identified by a certain hash, your node first has to locate the node that contains that block. The node sends a request to some of its direct peers, and if one of the peers happens to store the requested block, the search ends there. If a peer doesn’t have the data, it sends the same request to its own peers, and so on until the keeper of the data block is found.
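The peer-by-peer search described above can be sketched as a breadth-first traversal over a toy network in Python. This is a deliberate simplification with hypothetical names: the real distributed hash table is based on Kademlia and steers each query toward nodes whose IDs are close to the requested key, instead of visiting peers blindly.

```python
from collections import deque


def find_provider(peers: dict, blocks: dict, start: str, block_id: str):
    """Search outward from `start` for a node that stores `block_id`.

    peers:  node -> list of directly connected nodes
    blocks: node -> set of block identifiers stored on that node
    Returns (node, hops) for the first provider found, or None.
    """
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if block_id in blocks.get(node, set()):
            return node, hops
        for neighbour in peers.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None


# Toy network: "you" know a and b; the block lives two hops away on d.
toy_peers = {"you": ["a", "b"], "a": ["c"], "b": ["d"], "c": [], "d": []}
toy_blocks = {"d": {"QmExampleBlock"}}
print(find_provider(toy_peers, toy_blocks, "you", "QmExampleBlock"))
```

Each additional hop adds a network round trip, which is why a block stored far from your direct peers takes longer to locate.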

The nodes in the network are arranged in such a way that this process has very little overhead, and the whole network can be traversed in a matter of minutes. However, this means that in the worst case it may take many minutes to complete the search. This is especially true for the data that has been just published, and it can be outright annoying when you know it was published by your colleague sitting right next to you. Fortunately, it is possible to bypass the global search if you already know where to look.

Remember the Qm... hash that was printed out by the init command? Well, that was the peer ID of your node. If you haven't written it down, don't worry: it will be displayed along with some more information when you enter the command

ipfs id

The output will be a JSON object with several fields. For now, the important one is the node ID.
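If you want to process that JSON in a script, a minimal Python sketch could look like this. It assumes the object contains an "ID" field and an "Addresses" list, and the sample values are placeholders rather than real output. The function name is hypothetical.

```python
import json

# Placeholder standing in for the text printed by `ipfs id` (assumed layout).
SAMPLE_OUTPUT = """
{
  "ID": "QmExamplePeerId",
  "Addresses": [
    "/ip4/127.0.0.1/tcp/4001/ipfs/QmExamplePeerId",
    "/ip4/192.0.2.10/tcp/4001/ipfs/QmExamplePeerId"
  ]
}
"""


def summarize_identity(raw_json: str):
    """Return (peer ID, list of multiaddresses) from the JSON text."""
    info = json.loads(raw_json)
    # Treat a missing or null address list as empty.
    return info["ID"], info.get("Addresses") or []


peer_id, addresses = summarize_identity(SAMPLE_OUTPUT)
print(peer_id)
for address in addresses:
    print(address)
```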

If you get to know the ID of the node that must contain the data you need, you can skip the lengthy search process by establishing a direct connection with that node. To do this, execute the command

ipfs swarm connect /ipfs/Qm...

replacing Qm... with the node's ID.

In order to connect to a new peer, your IPFS node will have to search for it first. This step can be avoided too, if you know a full multiaddress of the remote node. In that case you can invoke the same command with the full multiaddress as an argument, e.g. like this:

ipfs swarm connect /ip4/<IP address>/tcp/<port number>/ipfs/<peer ID>
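Filling in that template by hand is error-prone, so here is a small Python sketch that assembles the multiaddress from its parts. The function name is hypothetical, and it covers only the TCP form shown above; the standard ipaddress module is used to pick ip4 or ip6 and to reject malformed addresses.

```python
import ipaddress


def build_multiaddr(ip: str, port: int, peer_id: str) -> str:
    """Assemble '/ip4|ip6/<IP>/tcp/<port>/ipfs/<peer ID>'."""
    # ip_address() raises ValueError if `ip` is not a valid address.
    version = ipaddress.ip_address(ip).version  # 4 or 6
    if not 0 < port < 65536:
        raise ValueError(f"invalid TCP port: {port}")
    return f"/ip{version}/{ip}/tcp/{port}/ipfs/{peer_id}"


print(build_multiaddr("99.7.131.248", 4001, "Qmf3KKqHdL1fUDiguRsTojXBBrqR94yx4EUd8EesXogcSs"))
```

The resulting string can be passed as the argument to ipfs swarm connect.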

IPFS can work over multiple network protocols, and a node usually listens on several network interfaces (at least the external interface and the loopback). That’s why a node usually has several multiaddresses with slightly different formats.

Each one includes the peer ID, but also information about the ways it can be accessed (e.g. an IPv4 address and port).

You can also find the addresses of a node by its peer ID:

ipfs dht findpeer Qm...

When the peer connection won’t work

It can happen that your colleague at the desk next to yours has published a fresh data block, but you don’t seem able to retrieve that block, nor even establish a direct connection to their IPFS node. The most likely cause of these issues is network connectivity, for example a firewall that prevents computers in the same network from talking to each other. Here are the steps that can help you determine the source of the problem.

Try to access the data through a WWW gateway. For example, the IPFS logo mentioned at the beginning of this article can be found at this URL:

https://ipfs.io/ipfs/QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6

For newly published data, this usually takes several minutes. However, if the request times out, this means one of two things:

  • the node(s) that used to store the data block are currently off-line, or
  • the node that has published the block is cut off from the rest of the network, perhaps by a network firewall.
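The gateway URL follows a simple pattern, which can be sketched in Python as follows. The function name and its parameters are hypothetical; the default gateway is the one used in the URL above.

```python
from urllib.parse import quote


def gateway_url(content_id: str, path: str = "", gateway: str = "https://ipfs.io") -> str:
    """Build '<gateway>/ipfs/<identifier>[/<path inside a directory>]'."""
    url = f"{gateway.rstrip('/')}/ipfs/{content_id}"
    if path:
        # Percent-encode unusual characters but keep '/' as a separator.
        url += "/" + quote(path.strip("/"))
    return url


# The logo by its own identifier, and the same file via its directory.
print(gateway_url("QmbYq2pMi91Xd5Hu6Z1edrvP4BwJXCH9HhRX8Tk99DJWG6"))
print(gateway_url("QmU1muwAeYjHX1kUnYEXPWEhnFxcVGS6wv8tggoHLHkm3f", "ipfs-logo.png"))
```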

If the gateway was able to fetch the data, but your local IPFS node is not, test whether you can connect to the address and port of the peer using telnet. If the connection cannot be established, you’re out of luck. The data exchange is still possible, but not in every case, and only through a third node which both you and your peer can access. There are two things you can do to resolve the problem:

  • talk to your network administrator to allow IPFS connections within the local network;
  • move your data outside of the restricted network by hosting it at one of the pinning services.
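The telnet check mentioned above can also be scripted. The following Python sketch only tests whether a TCP connection to the peer's address and port can be opened; it does not speak the IPFS protocol, so a successful result shows that the port is reachable and nothing more. The function name is hypothetical.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, can_connect("192.0.2.10", 4001) would test the peer's address and port, where both values are placeholders to be replaced with the ones from the peer's multiaddress.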

Think before you publish!

By publishing data to IPFS, you hand it over to entities over which you don’t have control. As a consequence, there is no mechanism to retract published data — every chunk of it becomes literally public. The data might disappear over time if nobody pins it, but a sufficiently popular piece may stay in the network forever.

Although the decentralized way of storing and transferring data defeats some kinds of censorship, it does not hide the fact that you have downloaded the prohibited information, nor does it obscure your identity. In fact, an IPFS node advertises its address and the list of data blocks it possesses. In order to find the dissenters, it is enough for the authorities to query for a known block hash and receive a list of network addresses.

Conclusion

The Interplanetary File System is not meant to replace an actual file system, nor can it replace a general-purpose web server. It is definitely not unlimited free storage, and it doesn’t serve as an anonymous proxy. Although it is officially still in the beta stage of development, IPFS is already a robust, field-tested tool, and the remaining issues are actively being solved.
