A Weekend With IPFS
IPFS is a distributed file system; the name is an acronym for “InterPlanetary File System”. Adding a file makes it available to every machine in the connected swarm, and any peer that fetches it can serve it on to others. When you add a file you receive an address. This address points to the file itself, not to the server where the file is located.
IPFS also has fixed gateways: machines that are part of the swarm and help facilitate the publishing and fetching of files. These gateways also have a JavaScript API allowing front-end applications to interact with IPFS directly, again without a server.
It’s pretty cool stuff, and it’s very simple once you start thinking about the file system in a slightly different way.
The plan for the weekend is to set up a virtual private server as an IPFS gateway to host a little website with a domain like http://ipfs.domain.com
This article is aimed at the professional or hobbyist developer interested in the distributed web and we will be covering…
- Setting up IPFS on a server to create a gateway
- Keeping the gateway running as a service
- Getting files from and publishing files to IPFS
- Pointing a domain name at a public gateway and at our gateway
- Setting up SSL with Let’s Encrypt for nginx
- Reverse proxying the gateway with nginx to re-route requests from a web-facing port to the internal port that IPFS is running on
- Quick and dirty speed test comparing HTTP2 to IPFS
- Caching config for our new IPFS gateway.
Over at Digital Ocean I made a new Ubuntu droplet. I then followed the ssh key setup guide and added the public SSH key. After the droplet was built I got an IP address which we can now connect to with:
ssh root@128.199.236.232
This IP address is the one I got for this droplet; replace it throughout this guide with the one you get. I will also be using a domain I have, “bkawk.com”, throughout this guide.
Type “yes” when prompted and we are connected to the droplet. Let’s make sure everything is up to date.
sudo apt-get update
sudo apt-get upgrade -y
Next, download the latest version of IPFS (you can check the available versions here).
wget https://dist.ipfs.io/go-ipfs/v0.4.8/go-ipfs_v0.4.8_linux-amd64.tar.gz
Unpack and move into the extracted folder to install:
tar xvf go-ipfs_v0.4.8_linux-amd64.tar.gz
cd go-ipfs
sudo ./install.sh
Check the version and let’s initialize ipfs:
ipfs version
ipfs init
Make a note of the output. The peer identity identifies the peer itself, as opposed to the content that the peer will publish. We will need it again later.
initializing IPFS node at /root/.ipfs
generating 2048-bit RSA keypair...done
peer identity: QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV
to get started, enter:
    ipfs cat /ipfs/QmVLDAhCY3X9P2uRudKAryuQFPM5zqA3Yij1dY8FpGbL7T/readme
For a little bit of joy, you can do as it suggests and run the command below, which will show you the IPFS readme document.
ipfs cat /ipfs/QmVLDAhCY3X9P2uRudKAryuQFPM5zqA3Yij1dY8FpGbL7T/readme
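If you lose your note of the peer identity, there is no need to run init again. Here is a minimal sketch that reads it back, assuming `ipfs id` prints JSON with an "ID" field on its own line. The function name peer_id is my own invention, not something IPFS ships.

```shell
# Print this node's peer identity by pulling the "ID" field out of `ipfs id`.
# Assumes the JSON output contains a line shaped like:  "ID": "Qm...",
peer_id() {
  # Splitting on double quotes makes the key field 2 and the value field 4.
  ipfs id | awk -F'"' '$2 == "ID" { print $4 }'
}

# Usage on the server:
#   peer_id
```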
All good so far, but we want to make sure it stays this way. To keep IPFS running at all times, we should run the daemon in the background as a service. Let’s move to the systemd directory and make a service file:
cd /lib/systemd/system/
nano ipfs.service
and copy in
[Unit]
Description=ipfs daemon

[Service]
ExecStart=/usr/local/bin/ipfs daemon
Restart=always
User=root
Group=root

[Install]
WantedBy=multi-user.target
Press Ctrl + X, then Y and Enter to save it, and then reload systemd and enable the service with:
systemctl daemon-reload
systemctl enable ipfs.service
Start it up and let’s have a look and see what magic happened.
systemctl start ipfs
journalctl -u ipfs -n20
You will see the last 20 lines of logs from IPFS. The output is below; make a note of the address that the gateway is listening on.
-- Logs begin at Sun 2017-04-02 02:42:26 UTC, end at Sun 2017-04-02 02:42:59 UTC. --
Apr 02 02:42:34 ipfs systemd[1]: Started ipfs daemon.
Apr 02 02:42:35 ipfs ipfs[1420]: Initializing daemon...
Apr 02 02:42:35 ipfs ipfs[1420]: Adjusting current ulimit to 2048...
Apr 02 02:42:35 ipfs ipfs[1420]: Successfully raised file descriptor limit to 2048.
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/10.15.0.5/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/127.0.0.1/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/128.199.236.232/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip4/128.199.236.232/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: Swarm listening on /ip6/::1/tcp/4001
Apr 02 02:42:45 ipfs ipfs[1420]: API server listening on /ip4/127.0.0.1/tcp/5001
Apr 02 02:42:45 ipfs ipfs[1420]: Gateway (readonly) server listening on /ip4/127.0.0.1/tcp/8080
Apr 02 02:42:45 ipfs ipfs[1420]: Daemon is ready
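The addresses in these logs are multiaddrs rather than URLs, and we will want the gateway one in URL form when we put nginx in front of it. A rough sketch of pulling it out of the logs and converting it, assuming the log wording shown above and only the /ip4/<host>/tcp/<port> shape. Both function names are made up for this example.

```shell
# Read daemon log lines on stdin and print the gateway's multiaddr,
# which is the last field of the "Gateway (readonly) server listening on" line.
gateway_multiaddr() {
  awk '/Gateway \(readonly\) server listening on/ { print $NF }'
}

# Convert a multiaddr such as /ip4/127.0.0.1/tcp/8080 into an HTTP URL.
# Anything that is not /ip4/<host>/tcp/<port> prints nothing.
multiaddr_to_url() {
  printf '%s\n' "$1" | awk -F/ '$2 == "ip4" && $4 == "tcp" { print "http://" $3 ":" $5 }'
}

# Usage on the server:
#   multiaddr_to_url "$(journalctl -u ipfs -n20 | gateway_multiaddr)"
```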
Let’s make sure it comes back up on reboot and then reboot to check it
systemctl enable ipfs
reboot
Log back in and check it’s active with
ssh root@128.199.236.232
systemctl status ipfs
Let’s tidy up by removing the files we downloaded and unpacked
rm -rf go-ipfs/
rm go-ipfs_v0.4.8_linux-amd64.tar.gz
Now, check that we are connecting to other peers in the swarm
ipfs swarm peers
and you should see a big old list of peers! Yay! Awesome!
Other members of the swarm will also be able to issue the same command and see our gateway in the list.
Let’s get a picture of a cat from the swarm. Someone else added this picture to IPFS and made it public for us to use at the address:
https://ipfs.io/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg
The ipfs cat command (not to be confused with the cat picture) just prints out the contents of a file. Redirecting that output with `>` writes it into the cat.jpg file, and that’s how it’s saved.
Notice that we are not getting the picture of a cat from any one particular server.
ipfs cat /ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg >cat.jpg
then let’s check we really got it
ls
and boom, there is the cat picture in the file system!
And have a look at the picture in a browser:
https://ipfs.io/ipfs/QmW2WQi7j6c7UgJTarActp7tDNikE4B2qXtFCfLPdsgaTQ/cat.jpg
Ahh how cute it is!
Time to publish!
Just like when we set up Apache or nginx for web hosting, I’m going to make a folder with the same name as the domain that will be pointed at it. So I’m naming this folder “ipfs.bkawk.com”. You can call it whatever you want, but later, when you have many sites, you may not know which domain is pointing at which folder.
cd /var
mkdir www && cd www
mkdir ipfs.bkawk.com && cd ipfs.bkawk.com
Ok, let’s make a really cool website. It all has to be static so nothing more than good old HTML, JavaScript and CSS
nano index.html
and copy in or make something better:
<!DOCTYPE html>
<html>
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width,initial-scale=1">
<title>A weekend with IPFS</title>
</head>
<body>
<h1>IPFS Magic! Yay!</h1>
<p>You got it running in a weekend! Woo hooo!</p>
<img src="cat.jpg" alt="A cat fetched from IPFS">
</body>
</html>
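The page references cat.jpg, so make sure a copy of the picture we downloaded earlier is in this folder too. Then we can publish by jumping out of the folder: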
cd ../ && ls
You should just see the “ipfs.bkawk.com” folder we made before. Now we want to add the entire folder and its contents to IPFS.
ipfs add -r ipfs.bkawk.com
The -r flag will recursively add every file in the folder, and we get the output
added Qmd286K6pohQcTKYqnS1YhWrCiS4gz7Xi34sdwMe9USZ7u ipfs.bkawk.com/cat.jpg
added Qmd4kHTYwyZHZcDihK5JxrAVRwPb9FjDztMDknhPng7BM3 ipfs.bkawk.com/index.html
added QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED ipfs.bkawk.com
IPFS generates a hash for each file that was added. The last line is the hash of the folder itself, your site hash, and this is the one we are interested in.
QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED
We are going to use a public gateway, https://ipfs.io, to view our site.
https://ipfs.io/ipfs/QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED/
Let’s see if we can browse our site on the link above. How exciting!
Wow, magic!!! But hang on: if I change the site, that hash is going to change, and if I link a domain to the hash it’s going to break each time I change the site. What we need to do is link the site’s hash to the peer identity.
ipfs name publish QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED
and you get the two hashes below, which are now linked together. The first hash is your peer identity, which we saved earlier, and the second is your site hash.
Published to QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV: /ipfs/QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED
Linking a peer identity to a file or folder uses IPNS.
IPNS can be thought of in the same way as DNS: a domain name does not change and can be pointed at any IP address. With IPNS we have a peer identity that does not change and can be pointed at any file or folder.
Notice that in the link below we are now using “ipns” (InterPlanetary Name System) and not “ipfs” (InterPlanetary File System) as we did before.
https://ipfs.io/ipns/QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV/
So even if we change the website, this link is always going to be good. And to change the website we just need to do…
ipfs add -r ipfs.bkawk.com
ipfs name publish <THE HASH THAT WE GOT>
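Those two steps are easy to roll into one so you never have to copy the hash by hand. A minimal sketch, assuming the "added <hash> <path>" output format shown earlier, where the last line is the top-level folder. The function name publish_site is hypothetical.

```shell
# Add a folder to IPFS recursively, then publish its root hash under this
# node's peer identity. Relies on the last "added" line being the folder itself.
publish_site() {
  site_dir=$1
  # Keep overwriting h so that it ends up holding the hash from the last line.
  site_hash=$(ipfs add -r "$site_dir" | awk '$1 == "added" { h = $2 } END { print h }')
  if [ -z "$site_hash" ]; then
    echo "no hash returned for $site_dir" >&2
    return 1
  fi
  ipfs name publish "$site_hash"
}

# Usage, from the folder that contains the site folder:
#   publish_site ipfs.bkawk.com
```

The IPNS link stays the same after each run; only the site hash it points at changes.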
Mental note: objects added through ipfs add are pinned recursively by default. IPFS pinning is a way to ensure garbage collection does not remove the objects you want to keep.
Let’s make that URL look a bit nicer. Wherever you purchased your domain, the registrar will give you a control panel where you can edit your DNS records. In the DNS, let’s add a TXT record for ipfs.bkawk.com with the value below and wait for it to propagate.
dnslink=/ipns/QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV
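Rather than refreshing the browser while waiting for propagation, you can ask DNS directly. A small sketch, assuming dig is installed (on Ubuntu it comes with the dnsutils package). The two function names are my own.

```shell
# Build the TXT record value for a given IPNS name (here, the peer identity).
dnslink_value() {
  printf 'dnslink=/ipns/%s\n' "$1"
}

# Succeed if the domain's TXT records include the expected dnslink value.
# dig prints TXT data wrapped in double quotes, so strip them before comparing.
dnslink_is_live() {
  dig +short TXT "$1" | tr -d '"' | grep -qxF "$(dnslink_value "$2")"
}

# Usage:
#   dnslink_is_live ipfs.bkawk.com QmeQe5FTgMs8PNspzTQ3LRz1iMhdq9K34TQnsCP2jqt8wV && echo "propagated"
```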
Now we can go to
https://ipfs.io/ipns/ipfs.bkawk.com
Better, but still not the best. Let’s add an A record pointing at the IP address of ipfs.io, which we can find with:
ping ipfs.io
the result is
104.236.151.122
So let’s point the A record for ipfs.bkawk.com at 104.236.151.122 and again wait for it to propagate.
For anything more than just testing you will need some added protection from services like Incapsula or CloudFlare and maybe add your own rules to nginx.
This could alternatively be done with a CNAME record pointing ipfs.bkawk.com at gateway.ipfs.io, which would avoid hard-coding any IP address.
Now we can browse to ipfs.bkawk.com
Just to make sure the server isn't being relied upon, go ahead and power it down, then request the domain ipfs.bkawk.com again and you’ll find it’s still there!
So long as people keep requesting it, it will stay there. If nobody requests it for a while, the other nodes’ garbage collectors will get rid of it. If you keep your server up, it will always be there.
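Our own copy survives garbage collection because ipfs add pinned it. If you want to double check that, here is a quick sketch using ipfs pin ls, assuming its output is one "<hash> recursive" pair per line. The function name is_pinned is made up.

```shell
# Succeed if the given hash is in this node's list of recursive pins.
is_pinned() {
  # awk exits 0 only if some line's first field matches the wanted hash.
  ipfs pin ls --type=recursive | awk -v want="$1" '$1 == want { found = 1 } END { exit !found }'
}

# Usage on the server:
#   is_pinned QmV8G4EzLq9AMvrw7f9kdjzdPsGefyjrCp6hnP7urWa8ED && echo "site is pinned"
```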
Cache me outside!
Is it even possible to add caching to IPFS?
Upon checking the network tab in Chrome Dev Tools, you can see the big difference. The HTTP2 site is caching, so the second page reload takes just 134 milliseconds whereas the IPFS site still takes 2 seconds.
Can we use nginx to proxy the ipfs gateway?
Let’s try! Back in the configuration…
nano /etc/nginx/sites-available/default
and change the location block to
proxy_pass http://127.0.0.1:8080;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection 'upgrade';
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
and test and restart
sudo nginx -t
sudo systemctl restart nginx
We now have the site being served from our own gateway over IPFS with SSL but still no caching!
Let’s try to get the nginx reverse proxy to cache the IPFS response properly. This isn’t going to make the initial page request load faster, but all subsequent requests will be incredibly fast as they will be coming from the local disk.
nano /etc/nginx/sites-available/default
Remove this line
proxy_cache_bypass $http_upgrade;
Add these lines
proxy_cache STATIC;
proxy_cache_valid 200 1d;
proxy_cache_use_stale error timeout invalid_header updating
http_500 http_502 http_503 http_504;
Add this expires block above the existing server block
# Expires map
map $sent_http_content_type $expires {
default off;
text/html epoch;
text/css max;
application/javascript max;
~image/ max;
}
In the server block add
expires $expires;
Next, edit the nginx.conf
sudo nano /etc/nginx/nginx.conf
Under http, add
proxy_cache_path /data/nginx/cache levels=1:2 keys_zone=STATIC:10m inactive=24h max_size=1g;
Uncomment
gzip_vary on;
Add
gzip_min_length 1100;
gzip_buffers 4 32k;
gzip_types text/plain application/javascript application/x-javascript text/xml text/css;
Then test
sudo nginx -t
The test may fail because some folders don’t exist, so go ahead and create the ones it complains about by hand with the command below, then test again.
mkdir -p <folder name>
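The folder it usually complains about is the one named in proxy_cache_path. A small sketch that reads the path out of the config instead of retyping it, assuming the directive sits on a single line with the path as its first argument. The function name cache_dirs is hypothetical.

```shell
# Print the directory named by each proxy_cache_path directive in a config file.
cache_dirs() {
  awk '$1 == "proxy_cache_path" { print $2 }' "$1"
}

# Usage on the server (creates each directory, including missing parents):
#   for dir in $(cache_dirs /etc/nginx/nginx.conf); do sudo mkdir -p "$dir"; done
```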
Then restart again…
sudo systemctl restart nginx
Yay! Back in the network tab of Chrome Dev Tools, the HTML files are checking in with the server and getting a 304 Not Modified, and the image is getting a 200 from memory cache and loading lightning fast.
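You can also check the caching headers from the command line without a browser. A rough sketch, assuming curl is installed. The function name cache_headers is my own, and the header list is simply the handful I find useful here.

```shell
# Print the status line and the cache-related response headers for a URL.
cache_headers() {
  # -I asks for headers only; tr strips the carriage returns HTTP puts on each line.
  curl -sI "$1" | tr -d '\r' | grep -iE '^(HTTP/|expires:|cache-control:|etag:)'
}

# Usage:
#   cache_headers https://ipfs2.bkawk.com/cat.jpg
#   cache_headers https://ipfs2.bkawk.com/
```

If the expires map is doing its job, the image should come back with a far-future Expires header while the HTML is marked no-cache.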
Performance insights are maxed out!
The final nginx config files are in the gist below
The final version, with the caching gateway running from the Digital Ocean droplet, is online for a while at https://ipfs2.bkawk.com/. If you want to compare it to the non-caching version, you can visit the slower public gateway at https://ipfs.io/ipns/ipfs.bkawk.com
Speed comparison between IPFS and HTTP2
Below is the gist for installing HTTP2 and SSL so we can make this comparison.
To run a fair test, use the 4 locations at https://tools.pingdom.com.
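For a rough sanity check from your own machine, curl can report how long a fetch took. This is only a sketch and no substitute for the multi-location test: it measures a single file from a single place, and curl has no browser cache, so repeat runs only show what the server side is caching. The function names are made up.

```shell
# Print the total time in seconds that curl took to fetch a URL, discarding the body.
fetch_time() {
  curl -s -o /dev/null -w '%{time_total}\n' "$1"
}

# Fetch each URL three times so first and repeat fetches can be compared.
compare_urls() {
  for url in "$@"; do
    for run in 1 2 3; do
      printf '%s run %s: %ss\n' "$url" "$run" "$(fetch_time "$url")"
    done
  done
}

# Usage:
#   compare_urls https://ipfs2.bkawk.com/ https://ipfs.io/ipns/ipfs.bkawk.com
```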
Results
These times look pretty close for first-time loads. The times will vary depending on network conditions, but this is a quick and dirty snapshot.
The site we are testing is just one HTML file and one image. HTTP2’s biggest selling point is that it allows the browser to download many files at the same time over a single connection.
The DNS lookup time is going to be just the same, and the time to get the file is the same, as both have the file available. The big speed question is whether IPFS can really get the files closer to me, compared to the speed benefits of HTTP2.
If the files were on a CDN and there were a lot of them being served over HTTP2, IPFS wouldn’t stand a chance, leaving its only selling point that IPFS is “distributed”.
But what if our gateway goes offline?
Well, it’s down to DNS filters to detect if our gateway is offline and re-route traffic to another gateway. Check out the video titled “NS1 Managed DNS Demo” here. The video also covers geographical routing, which allows you to set up multiple gateways around the world and have the user pull from the geographically closest one, giving you more redundancy.
Conclusion
Users demand speed, especially on mobile. Any site that needs plenty of files for that initial load, like a React/Vue/Polymer app, is still going to be a lot faster over HTTP2 on a big CDN.
The libertarian-loving crypto hippy communities, who love beating the decentralized drum, are all up in IPFS. I can hear them crying about CDNs already: “It’s a central point of failure!”, “A corporate entity controls it and therefore controls your content!”, “Free the internet!”
Wipe the tears for a moment. If that CDN goes down for any reason, or Evil Corp. decides to censor your dodgy site by taking it down, the managed DNS can be configured to start serving the IPFS files from our gateway and/or a public gateway for redundancy until you get the HTTP2 server back up again, and you can still bang the distributed drum.
NS1 is a well-respected managed DNS provider. The video below gives a walkthrough of the control panel that you would use to configure your DNS to detect a node going offline and re-route traffic. You can also configure the DNS to send the user to the geographically closest node. They also have a free tier.
In a world where seconds count, IPFS is a great fallback option, but I wouldn't risk losing users over slower or lumpy speeds compared to the blistering speeds and security of a larger, more predictable cloud service, which is also protecting you from DDoS attacks, pushing your content out to the edges of the web and, most importantly, pushing all the files to the browser at the same time.
Still hankering for more distributed web goodness? Check out how to run IPFS in the browser with browserify. I can imagine the user adding a file to IPFS locally, then just pinning it on the server to ensure we keep a copy of it.
Or take a look at A Weekend With GUN, a distributed real-time graph database.