A Weekend With IPFS

Will Hill
Published in A Weekend With
12 min read, Jul 5, 2017

IPFS is a distributed file system; the name is an acronym for “InterPlanetary File System”. Adding a file makes it available to the machines in the connected swarm, which can fetch and keep copies of it. When you add a file you receive an address. This address points to the file itself, not to the server where the file is located.
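As a rough illustration of addressing by content rather than by location, the sketch below hashes some bytes with sha256sum and treats the digest as the address. This is a simplification: real IPFS addresses are multihashes computed over the chunked content, so these values will not match anything that ipfs add prints, and the content_address helper is a name made up for this example.

```shell
#!/bin/sh
# Simplified stand-in for content addressing. The helper name is made up for
# this example, and IPFS itself uses multihashes rather than bare SHA-256 hex.
content_address() {
    # Hash whatever arrives on stdin and print only the hex digest.
    sha256sum | awk '{print $1}'
}

first=$(printf 'hello swarm' | content_address)
second=$(printf 'hello swarm' | content_address)
changed=$(printf 'hello swarm!' | content_address)

# Identical bytes give the identical address, wherever they are stored.
echo "first copy:  $first"
echo "second copy: $second"
# Changing a single byte gives a different address.
echo "changed:     $changed"
```

Because the address depends only on the bytes, any machine holding the same bytes can answer a request for that address.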

IPFS also has fixed gateways: machines connected to IPFS that are part of the swarm and help facilitate the publishing and fetching of files in the swarm. These gateways also have a JavaScript API that allows front-end applications to interact with IPFS directly, again without a server.

By handing the network over to the community, we remove the friction caused by a single entity’s obligation to appease stakeholders or pressure applied by various forms of authority.

It’s pretty cool stuff and it’s very simple once you make the shift of thinking about the file system in a slightly different way.

The plan for the weekend is to set up a virtual private server as an IPFS gateway to host a little website with a domain like http://ipfs.domain.com

This article is aimed at the professional or hobbyist developer interested in the distributed web and we will be covering…

  • Setting up IPFS on a server to create a gateway
  • Keeping the gateway running as a service
  • Getting files from and publishing files to IPFS
  • Pointing a domain name at a public gateway and at our gateway
  • Setting up SSL with Let’s Encrypt for nginx
  • Reverse proxying the gateway with nginx, re-routing requests from a web-facing port to the internal port that IPFS is running on
  • Running a quick and dirty speed test comparing HTTP2 to IPFS
  • Configuring caching for our new IPFS gateway

Over at Digital Ocean I made a new Ubuntu droplet. I then followed the ssh key setup guide and added the public SSH key. After the droplet was built I got an IP address which we can now connect to with:

ssh root@128.199.236.232

This IP address is the one I got for this droplet; replace it throughout this guide with the one you get. I will also be using a domain I own, “bkawk.com”, throughout this guide.

After typing “yes” when prompted, we are connected to the droplet. Let’s make sure everything is up to date.

sudo apt-get update
sudo apt-get upgrade -y

Next, download the latest version of IPFS (you can check the available versions here).

wget https://dist.ipfs.io/go-ipfs/v0.4.8/go-ipfs_v0.4.8_linux-amd64.tar.gz

Unpack and move into the extracted folder to install:

tar xvf go-ipfs_v0.4.8_linux-amd64.tar.gz
cd go-ipfs
sudo ./install.sh

Check the version and let’s initialize ipfs:

ipfs version
ipfs init

Make a note of the output. The peer identity identifies the peer, as opposed to the content that the peer will publish. We will need it again later.

initializing IPFS node at /root/.ipfs
generating 2048-bit RSA keypair...done
peer identity: QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV
to get started, enter:
ipfs cat /ipfs/QmVLDAhCY3X9P2uRudKAryuQFPM5zqA3Yij1dY8FpGbL7T/readme

For a little bit of joy, you can do as it suggests and run the below which will show you the IPFS readme document.

ipfs cat /ipfs/QmVLDAhCY3X9P2uRudKAryuQFPM5zqA3Yij1dY8FpGbL7T/readme

All good so far, but we want to make sure it stays this way. To keep IPFS running at all times, we should run the daemon in the background as a service. Let’s move to the systemd directory and make a service file:

cd /lib/systemd/system/
nano ipfs.service

and copy in

[Unit]
Description=ipfs daemon
[Service]
ExecStart=/usr/local/bin/ipfs daemon
Restart=always
User=root
Group=root
[Install]
WantedBy=multi-user.target

then press Ctrl + X, confirm with Y and Enter to save it, and then reload systemd and enable the service with:

systemctl daemon-reload
systemctl enable ipfs.service

Start it up and let’s have a look and see what magic happened.

systemctl start ipfs
journalctl -u ipfs -n20

You will see the last 20 lines of logs from IPFS. The output is below; make a note of the address that the gateway is listening on.

-- Logs begin at Sun 2017-04-02 02:42:26 UTC, end at Sun 2017-04-02 02:42:59 UTC. --
Apr 02 02:42:34 ipfs systemd[1]: Started ipfs daemon.
Apr 02 02:42:35 ipfs ipfs[1420]: Initializing daemon...
Apr 02 02:42:35 ipfs ipfs[1420]: Adjusting current ulimit to 2048...
Apr 02 02:42:35 ipfs ipfs[1420]: Successfully raised file descriptor limit to 2048.
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/10.15.0.5/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/127.0.0.1/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/128.199.236.232/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/128.199.236.232/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip6/::1/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: API server listening on /ip4/127.0.0.1/tcp/5001
Apr 02 02:42:45 ipfs ipfs[1420]: Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Apr 02 02:42:45 ipfs ipfs[1420]: Daemon is ready
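If you would rather not pick the gateway address out by eye, something like the following could pull it from the log. It is only a sketch: the two helper names are made up here, and the parsing assumes the log line looks exactly like the sample above and that the address has the simple /ip4/address/tcp/port shape.

```shell
#!/bin/sh
# Helper names here are made up for this sketch; they are not part of IPFS.

# Read daemon log lines on stdin and print the multiaddr the gateway is bound to.
gateway_multiaddr() {
    awk '/Gateway/ && /listening on/ {print $NF}'
}

# Turn a multiaddr such as /ip4/127.0.0.1/tcp/8080 into 127.0.0.1:8080.
# Only handles the simple /ip4/<address>/tcp/<port> shape seen in the log above.
multiaddr_to_hostport() {
    echo "$1" | awk -F/ '{print $3 ":" $5}'
}

# Sample lines copied from the log output above.
sample_log='Apr 02 02:42:45 ipfs ipfs[1420]: API server listening on /ip4/127.0.0.1/tcp/5001
Apr 02 02:42:45 ipfs ipfs[1420]: Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080'

addr=$(printf '%s\n' "$sample_log" | gateway_multiaddr)
echo "gateway multiaddr: $addr"
echo "gateway host:port: $(multiaddr_to_hostport "$addr")"
```

On the server itself you would pipe the real log in instead of the sample, for example journalctl -u ipfs -n20 | gateway_multiaddr. The host and port are what nginx will need later when we proxy requests to the gateway.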

Let’s make sure it comes back up on reboot and then reboot to check it

systemctl enable ipfs
reboot

Log back in and check it’s active with

ssh root@128.199.236.232   
systemctl status ipfs

Let’s tidy up by removing the files we downloaded and unpacked

rm -rf go-ipfs/
rm go-ipfs_v0.4.8_linux-amd64.tar.gz

Now, check that we are connecting to other peers in the swarm

ipfs swarm peers

and you should see a big old list of peers! Yay! Awesome! The list is long and may scroll over a few pages.

Other members of the swarm will also be able to issue the same command and see our gateway in the list.

Let’s get a picture of a cat from the swarm. Someone else added this picture to IPFS and made it public for us to use, at the address:

https://ipfs.io/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg

The `cat` command (not to be confused with the cat in the picture) just prints out the contents of a file. Redirecting that output with `>` writes it into cat.jpg, and that’s how the picture is saved. Note that we are not getting the picture of a cat from any one particular server.

ipfs cat /ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg >cat.jpg

then let’s check we really got it

ls

and boom, there is the cat picture in the file system!

And have a look at the picture in a browser:

https://ipfs.io/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg

Ahh how cute it is!

Time to publish!

Just like when we set up Apache or nginx for web hosting, I’m going to make a folder with the same name as the domain that will point at it, so I’m naming this folder “ipfs.bkawk.com”. You can call it whatever you want, but later, when you have many sites, a different name may leave you unsure which domain points at which folder.

cd ../var
mkdir www && cd www
mkdir ipfs.bkawk.com && cd ipfs.bkawk.com

Ok, let’s make a really cool website. It all has to be static so nothing more than good old HTML, JavaScript and CSS

nano index.html

and copy in or make something better:

<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>A weekend with IPFS</title>
</head>
<body>
<h1>IPFS Magic! Yay!</h1>
<p>You got it running in a weekend! Woo hooo!</p>
<img src="cat.jpg" alt="A cat">
</body>
</html>

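The page references cat.jpg, so first copy the picture we saved earlier into this folder (for example with `cp ~/cat.jpg .`, assuming you saved it in your home directory). Then we can publish by jumping out of the folder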

cd ../ && ls

You should just see the “ipfs.bkawk.com” folder we made before. Now we want to add the entire folder and its contents to IPFS.

ipfs add -r ipfs.bkawk.com

The -r flag will recursively add every file in the folder, and we get the output

added Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u ipfs.bkawk.com/cat.jpg
added Qmd4kHTYwyZHZcDihK5JxrAVRwPb9FjDztMDknhPng7BM3 ipfs.bkawk.com/index.html
added QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED ipfs.bkawk.com

IPFS generates a hash for each file that was added. The last line gives you the hash of the folder itself, which is your site hash. This is the one we are interested in.

QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED

We are going to use the public gateway at https://ipfs.io to view our site.

https://ipfs.io/ipfs/QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED/

Let’s see if we can browse our site on the link above. How exciting!

Wow, magic!!! But hang on: if I change the site, that hash is going to change, and if I link a domain to the hash, the link is going to break each time I change the site. What we need to do is link the site’s hash to the peer identity.

ipfs name publish QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED

and you get the 2 hashes below, which are now linked together. The first hash is your peer identity, which we saved earlier, and the second is your site hash.

Published to QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV: /ipfs/QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED

Linking a peer identity to a file or folder uses IPNS.

IPNS can be thought of in the same way as DNS: a domain name does not change and can be pointed at any IP address. With IPNS we have a peer identity that does not change and can be linked to any file or folder.
Notice that the link below uses “ipns” (interplanetary name system) and not “ipfs” (interplanetary file system) as our earlier links did.

https://ipfs.io/ipns/QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV/

So even if we change the website this link is always going to be good. And to change the website we just need to do…

ipfs add -r ipfs.bkawk.com
ipfs name publish <THE HASH THAT WE GOT>
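Those two steps could be wrapped in a small script so the hash never has to be copied by hand. This is a minimal sketch under a few assumptions: root_hash and republish are names made up here, and the parsing relies on ipfs add -r printing its "added" lines in the format shown earlier, with the top-level folder last.

```shell
#!/bin/sh
# root_hash and republish are helper names made up for this sketch.

# Read `ipfs add -r` output on stdin and print the hash from the last
# "added" line, which is the hash of the top-level folder.
root_hash() {
    awk '$1 == "added" {hash = $2} END {print hash}'
}

# Add a folder and point our peer identity at the new root hash.
republish() {
    site_dir=$1
    hash=$(ipfs add -r "$site_dir" | root_hash)
    if [ -z "$hash" ]; then
        echo "no hash returned for $site_dir" >&2
        return 1
    fi
    ipfs name publish "$hash"
}

# Dry run of the parsing step against the output shown earlier.
sample_output='added Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u ipfs.bkawk.com/cat.jpg
added Qmd4kHTYwyZHZcDihK5JxrAVRwPb9FjDztMDknhPng7BM3 ipfs.bkawk.com/index.html
added QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED ipfs.bkawk.com'

printf '%s\n' "$sample_output" | root_hash
```

On the server, with the daemon running, you would call republish ipfs.bkawk.com from the folder that contains the site.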

Mental note: objects added through ipfs add are pinned recursively by default. Pinning is IPFS’s way of ensuring that garbage collection does not remove the objects you want to keep.

Let’s make that URL look a bit nicer. Wherever you purchased your domain, the registrar will give you a control panel in which you can edit your DNS records. In the DNS, let’s add a TXT record to ipfs.bkawk.com with the value below and wait for it to propagate.

dnslink=/ipns/QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV
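To avoid typos in the record, and to check it once it has propagated, a couple of small helpers can build and read back the value. The helper names are made up for this sketch, and the read-back step assumes the lookup tool prints TXT values wrapped in double quotes, as dig +short does.

```shell
#!/bin/sh
# Helper names are made up for this sketch.

# Build the TXT record value for a given peer identity.
dnslink_record() {
    printf 'dnslink=/ipns/%s\n' "$1"
}

# Read TXT values on stdin (dig prints them wrapped in double quotes) and
# print the path that any dnslink entry points at.
dnslink_target() {
    tr -d '"' | sed -n 's/^dnslink=//p'
}

peer_id=QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV
record=$(dnslink_record "$peer_id")
echo "TXT value to enter: $record"

# Simulate what a lookup would return once the record has propagated.
printf '"%s"\n' "$record" | dnslink_target
```

Against the live domain, the check would look like dig +short TXT ipfs.bkawk.com | dnslink_target, which should print the /ipns/ path once the record is visible.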

Now we can go to

https://ipfs.io/ipns/ipfs.bkawk.com

Better, but still not the best. Let’s add an A record that points at the IP address of ipfs.io, which we can find with:

ping ipfs.io

The result is

104.236.151.122

So let’s point the A record for ipfs.bkawk.com at 104.236.151.122 and again wait for it to propagate.

For anything more than just testing, you will need some added protection from services like Incapsula or Cloudflare, and you may want to add your own rules to nginx.

This could alternatively be done with a CNAME for ipfs.bkawk.com pointing at gateway.ipfs.io, which would avoid hard-coding any IP address.

Now we can browse to ipfs.bkawk.com

Just to make sure the server isn't being relied upon, go ahead and power it down, and again request the domain ipfs.bkawk.com and you’ll find it’s still there!

So long as people keep requesting it, it will stay there. If nobody requests it for a while, the other nodes’ garbage collectors will get rid of it. If you keep your server up, then it will always be there.

Cache me outside!

Is it even possible to add caching to IPFS?

Upon checking the network tab in Chrome Dev Tools, you can see the big difference: the HTTP2 site is cached, so the second page load takes just 134 milliseconds, whereas the IPFS site is not cached and still takes 2 seconds.

Can we use nginx to proxy the ipfs gateway?

Let’s try! Back in the configuration:

nano /etc/nginx/sites-available/default

and change the location block to

proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;

and test and restart

sudo nginx -t
sudo systemctl restart nginx

We now have the site being served from our own gateway over IPFS with SSL but still no caching!

Let’s try and get the nginx reverse proxy to cache the IPFS response properly. This isn’t going to make the initial page request load any faster, but all subsequent requests will be incredibly fast, as they will be served from the local disk.

nano /etc/nginx/sites-available/default

Remove this line

proxy_cache_bypass $http_upgrade;

Add these lines

proxy_cache STATIC;
proxy_cache_valid 200 1d;
proxy_cache_use_stale error timeout invalid_header updating
http_500 http_502 http_503 http_504;

Add this expires block above the existing server block

# Expires map
map $sent_http_content_type $expires {
default off;
text/html epoch;
text/css max;
application/javascript max;
~image/ max;
}

In the server block add

expires $expires;

Next, edit the nginx.conf

sudo nano /etc/nginx/nginx.conf

Under http, add

proxy_cache_path  /data/nginx/cache  levels=1:2    keys_zone=STATIC:10m   inactive=24h  max_size=1g;

Uncomment

gzip_vary on;

Add

gzip_min_length  1100;
gzip_buffers 4 32k;
gzip_types text/plain application/x-javascript text/xml text/css;

Then test

sudo nginx -t

The test may fail because some folders don’t exist, so go ahead and create the ones it complains about by hand with the command below (the -p flag also creates any missing parent folders), then test again.

sudo mkdir -p <folder name>

Then restart again…

sudo systemctl restart nginx

Yay! Back in the network tab of Chrome Dev Tools, the HTML files are checking in with the server and getting a 304 Not Modified, and the image is getting a 200 from memory cache and loading lightning fast.

Performance insights are maxed out!

The final nginx config files are in the gist below

The final version with the caching gateway running from the Digital Ocean droplet is online for a while here https://ipfs2.bkawk.com/. If you want to compare it to the non-caching you can visit the slower public gateway at https://ipfs.io/ipns/ipfs.bkawk.com

Speed comparison between IPFS and HTTP2

Below is the gist for installing HTTP2 and SSL so we can make this comparison.

To run a fair test, use the 4 locations at https://tools.pingdom.com.

Results

Time for first load — lower is better

These times look pretty close for first-time loads. The times will vary depending on network conditions, but this is a quick and dirty snapshot.
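For a rough check from your own machine, curl can report how long a single request takes. This is only a sketch and not how the numbers above were measured: time_url is a name made up here, and it times one resource rather than a full page load with its image.

```shell
#!/bin/sh
# time_url is a helper name made up for this sketch.

# Fetch a URL, discard the body, and print the URL with the total time in seconds.
time_url() {
    total=$(curl -s -o /dev/null -w '%{time_total}' "$1")
    echo "$1 $total"
}

# Example calls (run each a few times, since the first request is usually the slowest):
# time_url https://ipfs2.bkawk.com/
# time_url https://ipfs.io/ipns/ipfs.bkawk.com
```

Like the Pingdom results, the figures depend heavily on where you run it and on network conditions at the time.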

The site we are testing is just one HTML file and one image. HTTP2’s biggest selling point is that it allows the browser to download many files at the same time over a single connection.

The DNS lookup time is going to be just the same, and the time to get the file is the same, as both have the file available. The big speed question is whether IPFS can really get the files closer to me, compared to the speed benefits of HTTP2.

If the files were on a CDN and there were a lot of them being served over HTTP2, IPFS wouldn't stand a chance, which leaves its only selling point being that IPFS is “distributed”.

But what if our gateway goes offline?

Well, it’s down to DNS filters to detect if our gateway is offline and re-route traffic to another gateway. Check out the video titled “NS1 Managed DNS Demo” here. The video also covers geographical routing, which lets you set up multiple gateways around the world and have the user pull from the geographically closest one, giving you more redundancy.

Conclusion

Users demand speed, especially on mobile. Any site with plenty of files needed for that initial load like React/Vue/Polymer is still going to be a lot faster over HTTP2 on a big CDN.

The libertarian-loving crypto hippy communities, who love beating the decentralized drum, are all up in IPFS. I can hear them crying about CDNs already: “It’s a central point of failure!”, “A corporate entity controls it and therefore controls your content!”, “Free the internet!”

Wipe the tears for a moment, if that CDN goes down for any reason, or Evil Corp. decides to censor your dodgy site by taking it down, the managed DNS can be configured to start serving the IPFS files from our gateway and/or a public gateway for redundancy until you get the HTTP2 server back up again and you can still bang the distributed drum.

NS1 is a well-respected managed DNS provider. The video below gives a walkthrough of the control panel that you would use to configure your DNS to detect when a node goes offline and re-route around it. You can also configure the DNS to send the user to the geographically closest node. They also have a free tier.

In a world where seconds count, IPFS is a great fallback option, but I wouldn't risk losing users to slower or lumpy speeds compared to the blistering speed and security of a larger, more predictable cloud service, which also protects you from DDoS attacks, pushes your content out to the edges of the web and, most importantly, pushes all the files to the browser at the same time.

Contact me on Twitter

Still hankering for more distributed web goodness? Check out how to run IPFS in the browser with browserify. I can imagine the user adding a file to IPFS locally, then just pinning it on the server to ensure we keep a copy of it.

Or take a look at A Weekend With GUN, a distributed real-time graph database.
