Introduction to IPFS: Run Nodes on Your Network, with HTTP Gateways

How to install IPFS nodes across your VPS network and configure your own Gateways

You’ve heard of the IPFS distributed file system and want to start adopting the technology in your stack — after all, a distributed, decentralised future is one that we will inevitably come to see. The industry is making fast progress along the path to achieving that goal in the software space.

IPFS, or the InterPlanetary File System, was launched in 2015 with the goal of building a distributed internet to replace the centralised HTTP web we have come to know and use today.

The capabilities of the underlying IPFS network allow us to use the filesystem as a global CDN for web assets, with very fast delivery. Each node we set up can act as an access point to the filesystem.

IPFS was designed to distribute files on an interplanetary scale (literally), but HTTP was not. This is why each IPFS node needs to act as an access point, somewhere in the world, to serve traffic in that geographic location.

With this in mind, what this article will cover more specifically is:

  1. Installing the IPFS software on your servers. A CentOS7 VPS will be used, but documentation for any Linux distribution is provided.
  2. Configuring the IPFS repository to be stored on a mounted volume. Why? You can configure your IPFS repository to be as big as you need, therefore it may be wise to dedicate a Volume whose size you can adjust as and when you need to. In this article I’ll refer to Digital Ocean Block Storage Volumes, but again, you are not limited to this solution.
  3. Opening required firewall ports. Firewalld will be configured to open the necessary ports of the VPS in order for the network to interact with other peers.
  4. Running IPFS in the background using Supervisord. Supervisord is a process manager that I use a lot — it will ensure that IPFS starts at system boot, and will restart the process should it crash or become interrupted. We will also need to define an environment variable in the process configuration, so this will be covered too.
  5. Opening your IPFS node’s HTTP Gateway. With the node successfully running, we will then stop it, and amend some configuration options to allow HTTP requests to access IPFS content from your node.
  6. Repeating the process for your other VPSs, and connecting your nodes as peers. Doing this will provide direct links between your IPFS nodes. With your nodes being aware of each other, the IPFS protocols will be able to optimise data transfer between them.

IPFS Gateways

A Gateway allows access to data from the IPFS network via HTTP requests. By default an IPFS gateway is configured on port 8080, where content can then be fetched from.

More specifically, data would be fetched from your VPS via: http://<your_ip_address>:8080/ipfs/<content_hash>.

Notice that this traffic is unencrypted by default.

An NginX proxy pass, for example, could handle encrypted requests to your IPFS node, combined with configuring CORS in your IPFS config file to only allow certain domains to access your gateway.
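As a sketch of that idea (the domain, certificate paths, and upstream address here are placeholders and assumptions, not a definitive setup), an NginX server block terminating TLS and proxying to the local gateway might look like this:

```nginx
# Hypothetical reverse proxy for an IPFS gateway; replace the domain and
# certificate paths with your own.
server {
    listen 443 ssl;
    server_name ipfs.example.com;

    ssl_certificate     /etc/letsencrypt/live/ipfs.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/ipfs.example.com/privkey.pem;

    location /ipfs/ {
        # Forward requests to the local IPFS gateway on port 8080
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

With a proxy like this in front of the node, the gateway itself can stay bound to 127.0.0.1 rather than being exposed directly on all interfaces.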

You may be aware of the fact that Protocol Labs provide a gateway themselves, at gateway.ipfs.io, allowing anyone to fetch content from the IPFS network, for example: https://gateway.ipfs.io/ipfs/<content_hash>. This is great for demonstration purposes, but in reality we are relying on a third party to host our HTTP access point into the IPFS network. This access point could experience downtime. It could become overloaded. It’s a single point of failure.

Available Gateways

Are there other gateways available online? Yes, there are services such as Infura.io that provide an IPFS gateway as a service. This is useful, especially if you need to offload traffic from your servers onto other services in the event your network becomes overloaded. But again, we are subject to the same limitations mentioned above.

Personally speaking, I, as an optimist for a decentralised web, would like an internet where we are not relying on a handful of companies to manage our mission-critical endpoints. At the same time, reality dictates that centralised services will more than likely be a necessity for commercial apps today. For this reason, strategically speaking, implementing distributed solutions alongside centralised services is the safest and most viable way to build apps today. As infrastructure evolves, the importance of one will begin to outweigh the other.

Onto IPFS

What we can do now is set up every IPFS node we install to also be a gateway. IPFS nodes are clients as well as hosts; they host and serve data simultaneously, the process of which is determined largely by the underlying Bitswap protocol.

This article is not a technical talk on the underlying IPFS protocols; refer to the whitepaper to learn about how the system operates.

Let’s move onto installing IPFS and configuring your repository.

Installing IPFS

Installing IPFS is a straightforward process. The reference implementation of the IPFS protocol is written in Go, hence the package we need is named go-ipfs.

Note: At the time of this writing there is a very interesting version of IPFS being built in JavaScript. This version is not quite ready for production environments, but you can learn, contribute, or simply keep an eye on it at https://github.com/ipfs/js-ipfs.

Option 1: Installing Manually

Run the following commands to install go-ipfs on any Linux OS:

cd ~
wget https://dist.ipfs.io/go-ipfs/v0.4.18/go-ipfs_v0.4.18_linux-amd64.tar.gz
tar xvfz go-ipfs_v0.4.18_linux-amd64.tar.gz
cd go-ipfs
sudo ./install.sh
ipfs help

Running the above will download the go-ipfs package (v0.4.18 here), unpack it using tar and then run install.sh to move the ipfs binary to a directory on your OS path. For CentOS7, this was /usr/local/bin.

For the most up-to-date distribution check out https://dist.ipfs.io/#go-ipfs. What you need for your VPS is the Linux Binary amd64 build.

Running ipfs help will output the CLI options, verifying IPFS installed correctly.

Option 2: Install using ipfs-update

We can also install and update IPFS using another program designed specifically for these tasks, namely ipfs-update.

Find the latest version at https://dist.ipfs.io/#ipfs-update.

Run the following to install this package and then use it to install the latest ipfs version:

cd ~
wget https://dist.ipfs.io/ipfs-update/v1.5.2/ipfs-update_v1.5.2_linux-amd64.tar.gz
tar xvfz ipfs-update_v1.5.2_linux-amd64.tar.gz
cd ipfs-update
sudo ./install.sh
ipfs-update versions
ipfs-update install latest

Once installed, running ipfs-update versions lists the available versions to us. ipfs-update install latest will install the latest go-ipfs version.

Initialising IPFS

The next step is to initialise an IPFS repository: the directory that hosts your IPFS configuration and datastore. The datastore is where our node hosts its share of network data. Initialising a repository generates a key pair and an identity hash specific to your node, among other data.

Initialising an IPFS repo can be done with ipfs init. Doing so will generate an IPFS repo with a standard default configuration file, the values of which do not assume that we are running IPFS on a server.

Once initialised, this config file is saved as config in your repo root directory, available for us to amend as and when the need arises.

This configuration file determines a lot about your IPFS repository, including our bootstrap nodes, peer list, data storage options, CORS configuration, and more. Familiarise yourself with the config file at https://github.com/ipfs/go-ipfs/blob/v0.4.15/docs/config.md.

As mentioned at the top of the config documentation, we can utilise the --profile flag with ipfs init to quickly set configurations optimised for a certain environment. For a server, we can use ipfs init --profile server.
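For context, the server profile changes defaults that are unhelpful in a datacentre. As an abbreviated sketch (worth checking against your own generated file rather than taking as definitive), the resulting config contains entries along these lines, disabling local network discovery and filtering private address ranges:

```json
"Discovery": {
  "MDNS": {
    "Enabled": false
  }
},
"Swarm": {
  "AddrFilters": [
    "/ip4/10.0.0.0/ipcidr/8",
    "/ip4/172.16.0.0/ipcidr/12",
    "/ip4/192.168.0.0/ipcidr/16"
  ]
}
```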

By default your repository is generated at ~/.ipfs. This is not very flexible on a VPS, therefore we will set the repository to be stored on a mounted XFS formatted volume. If you do not wish to do this, skip to the next section.

Mounting a Block Storage Volume

If you wish to set up such a volume with Digital Ocean, refer to the Creating and Attaching a Volume section of Digital Ocean's Block Storage documentation.

The process is straightforward. Remember to attach the Volume to your Droplet and run the commands Digital Ocean provide before continuing.

Let’s create a directory named .ipfs on the Volume to host the IPFS repo:

mkdir /mnt/<name_of_your_volume>/.ipfs

IPFS needs to know that we want to initialise the repo at this location. To do this, set the IPFS_PATH environment variable to the directory:

export IPFS_PATH=/mnt/<name_of_your_volume>/.ipfs

Finally, initialise the repo:

ipfs init --profile server

You should get output verifying a successful initialisation, including your peer identity hash. Make a note of this hash in a secure place.

At this point you may wish to browse your config file. You can do so with:

ipfs config show
or
less /mnt/<name_of_your_volume>/.ipfs/config

ipfs config show requires the IPFS_PATH environment variable to be set, otherwise the config will not be found. For this reason, using less may be more practical on subsequent SSH sessions.
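Alternatively, you can persist the variable in your shell profile so every new SSH session picks it up. A minimal sketch, assuming a placeholder volume name you must substitute with your own:

```shell
# Placeholder volume name; substitute the name of your mounted volume.
VOLUME_NAME="my_volume"

# Append the export to ~/.bashrc so future sessions define IPFS_PATH
echo "export IPFS_PATH=/mnt/${VOLUME_NAME}/.ipfs" >> ~/.bashrc

# Confirm the line was appended
tail -n 1 ~/.bashrc
```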

Opening required firewall ports

Open the following ports to get IPFS communication working without issue: 4001 (the swarm port, used for peer-to-peer traffic) and 8080 (the HTTP gateway). Note that the API port, 5001, should not be opened to the public.

sudo firewall-cmd --zone=public --add-port=4001/tcp --permanent
sudo firewall-cmd --zone=public --add-port=8080/tcp --permanent
sudo systemctl reload firewalld
sudo firewall-cmd --zone=public --permanent --list-ports

I opted to use firewalld for my CentOS VPS. List your public zoned ports after reloading the service to verify they are correctly configured.

Running IPFS in the Background

Before configuring the HTTP gateway of our IPFS node, let’s run it to check it is working, and then hand the process over to Supervisord to run it in the background.

Simply running ipfs daemon will start your IPFS node, which will keep running in your terminal window as long as the window is open and your connection to the server is maintained.

Now is a good time to familiarise yourself with the daemon CLI options. Check them out with ipfs daemon --help. There are some very useful tips below the OPTIONS section, listing some common configuration options and things to be aware of.

Running CLI commands can update your configuration file, as can manually opening the file with your favourite editor. Use whichever method you prefer.

If you have the daemon running, press CTRL+C to stop it now.

Setting up IPFS Daemon as Supervisord Process

Next we will run the IPFS daemon as a background process using Supervisord.

If you are not familiar with Supervisord, a robust process manager, information can be found at http://supervisord.org. The easiest way to install the program is with pip or easy_install:

easy_install supervisor
or
pip install supervisor

Once installed, create the configuration file with the following:

echo_supervisord_conf | sudo tee /etc/supervisord.conf

Note that piping into sudo tee is used here because sudo does not apply to a shell redirection like >.

Supervisord will not necessarily start at boot on its own; a pip install does not register a system service, so you may need to hook it into your init system yourself. For now, start supervisord manually for the first time:

sudo supervisord
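If you want Supervisord itself to launch at boot, a minimal systemd unit along these lines can be used. This is a sketch, and the binary paths are assumptions; verify yours with which supervisord and which supervisorctl:

```ini
# /etc/systemd/system/supervisord.service (hypothetical paths; verify with `which supervisord`)
[Unit]
Description=Supervisor process control system
After=network.target

[Service]
# -n keeps supervisord in the foreground, which systemd's default service type expects
ExecStart=/usr/bin/supervisord -n -c /etc/supervisord.conf
ExecStop=/usr/bin/supervisorctl shutdown
Restart=on-failure

[Install]
WantedBy=multi-user.target
```

Once the file is in place, enable it with sudo systemctl enable supervisord.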

And add an IPFS process at the bottom of the configuration file:

sudo vi /etc/supervisord.conf
[program:ipfs]
environment=IPFS_PATH=/mnt/<name_of_volume>/.ipfs
command=ipfs daemon

Notice environment. Here we are again defining the IPFS_PATH environment variable so the following ipfs daemon command knows where the repo is located.

Now, to start IPFS, use the supervisorctl CLI to reread and update our Supervisord process list:

sudo supervisorctl reread && sudo supervisorctl update

IPFS should now be running. Check with:

sudo supervisorctl status

At this point we have a working IPFS node contributing to the entire network!

But at this point you cannot yet access content via HTTP requests directly from your node. Some configuration options need to be amended to make this possible. Let’s do this next.

Stop the IPFS process before continuing:

sudo supervisorctl stop ipfs

Configure Your IPFS HTTP Gateway

Open your config file and change the following:

StorageMax: The default size is 10GB, but you can amend this value to suit your resources. Change it to 1GB, for example, if you need to:

"Datastore": {
  "StorageMax": "1GB"
}

The larger your repository, the more CPU and bandwidth your node will consume. You may wish to experiment with what size works best for your environment using system monitoring tools.

Change your Gateway address (found under the Addresses section of the config) so the gateway listens on all interfaces rather than localhost only:

#change this
"Gateway": "/ip4/127.0.0.1/tcp/8080"
#to this
"Gateway": "/ip4/0.0.0.0/tcp/8080"

Amend your Gateway configuration to match the following:

"HTTPHeaders": {
"Access-Control-Allow-Headers": [
"X-Requested-With",
"Access-Control-Expose-Headers",
"Range"
],
   "Access-Control-Expose-Headers": [
"Location",
"Ipfs-Hash"
],
   "Access-Control-Allow-Methods": [
"POST",
"GET"
],
   "Access-Control-Allow-Origin": [
"*"
],
   "X-Special-Header": [
"Access-Control-Expose-Headers: Ipfs-Hash"
]

},
"RootRedirect": "",
"Writable": true,
"PathPrefixes": [],
"APICommands": []

The HTTPHeaders values above are what I amended from the default configuration. Note that "Writable": true allows clients to add content to your node via HTTP POST; set it to false if you do not want that behaviour.

Save these changes and start IPFS with sudo supervisorctl start ipfs. Your Gateway should now be accessible via HTTP requests. Try it now — attempt to load a cat picture in your browser with the following URL:

http://<your_ip_address>:8080/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg

If the adorable kitten loaded, your gateway is working.

With your gateway open and accepting HTTP requests, you can now repeat this process for your other VPSs.

Connecting your Nodes as Peers

It is fascinating to see which peers your node is connected to. Within seconds of starting your daemon you’ll have a large list of peers communicating with your node. View them with the following command:

ipfs swarm peers

To connect your VPS nodes together as peers, use the following command on each of them:

ipfs swarm connect <address>

The address format needed is an IPFS multiaddr. This is an example taken from the IPFS docs:

/ip4/104.131.131.82/tcp/4001/ipfs/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ

The IP address (104.131.131.82) and the trailing peer identity hash are the two values to swap out for your VPS IP address and peer identity hash respectively.
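To make the format concrete, here is a sketch that assembles a multiaddr from placeholder values; substitute your own VPS IP address and the peer identity hash noted earlier:

```shell
# Placeholder values; use your VPS IP address and your node's peer identity hash.
PEER_IP="203.0.113.10"
PEER_ID="QmYourPeerIdentityHash"

# Build the multiaddr in the /ip4/<ip>/tcp/4001/ipfs/<peer_id> format
MULTIADDR="/ip4/${PEER_IP}/tcp/4001/ipfs/${PEER_ID}"
echo "$MULTIADDR"

# On your other node, you would then run:
# ipfs swarm connect "$MULTIADDR"
```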

To disconnect from your peers, use the disconnect command:

ipfs swarm disconnect <address>

Disconnecting from peers may only be temporary: if IPFS needs to communicate with that peer again in the future, it will simply reconnect.

Check out the IPFS CLI documentation at https://docs.ipfs.io/reference/api/cli/.

Conclusion

At this point your network will have live IPFS nodes with gateways to the system’s content. You have strengthened the network as a singular global entity, while also gaining your own access points into the IPFS ocean of hashed content.

This article aims to serve as a hands-on introduction to utilising IPFS on a server. I will write more insights in the future; we have only scratched the surface of the IPFS ecosystem here.

At this introductory stage, here are some things to consider as you evolve your IPFS infrastructure:

  • Limit domains that access your node, and set up an encrypted proxy pass to handle HTTPS connections to IPFS content. Consider how your production apps will fetch data from IPFS — which node will they communicate with?

  • Check out the other libraries that complement IPFS. The GitHub homepage is at https://github.com/ipfs. The ecosystem is large and growing quickly — familiarise yourself with what is being worked on from GitHub.
  • The JavaScript IPFS HTTP client is an interesting library allowing you to interact with IPFS directly from a JavaScript environment. Check that out at https://github.com/ipfs/js-ipfs-http-client.

One interesting project being developed to run on top of js-ipfs is the OrbitDB distributed database. To familiarise yourself with it and start using it, read my article on the package.

Let me know how you are utilising IPFS! I am interested to hear about your use cases.