Learn to securely share files on the blockchain with IPFS!

Coral Health
Feb 20, 2018 · 10 min read

Before reading this article, we recommend reading our previous post “Code your own blockchain in less than 200 lines of Go!”.

Interest in the blockchain has hit feverish levels lately. While much of the buzz has been around applications of the blockchain such as cryptocurrencies and ICOs, the technology itself is just as exciting. The blockchain provides a democratized trust and validation protocol that has already disrupted banking and is on the verge of overhauling healthcare, financial services, social apps and more.

However, from a technological perspective, the blockchain is not without its warts. Current proof-of-work consensus mechanisms have slowed transaction speeds to near-crippling levels. Waiting for Bitcoin transactions to complete makes the platform nearly unusable to some, and CryptoKitties almost brought the Ethereum network to a grinding halt.

This makes storing data or large files on the blockchain a non-starter. If the blockchain can barely sustain small strings of text that simply record a balance transfer between two parties, how on earth are we ever going to store large files or images on the blockchain? Are we just going to have to be OK with limiting the utility of the blockchain to things that can only be captured in tiny text strings?

Enter IPFS

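IPFS, the InterPlanetary File System, is a peer-to-peer protocol for storing and sharing files. You can think of it as being similar to BitTorrent: it’s a decentralized way of storing and referring to files, but it refers to files by their hashes, which gives you more control and allows for much richer programmatic interactions.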

Here is the basic IPFS workflow:

  1. John wants to upload a PDF file to IPFS
  2. He puts his PDF file in his working directory
  3. He tells IPFS he wants to add this file, which generates a hash of the file (you can tell it’s IPFS because the hash always starts with Qm…)
  4. His file is available on the IPFS network

Now suppose John wants to share this file with his colleague Mary through IPFS. He simply tells Mary the hash from Step 3 above. Then steps 1–4 work in reverse for Mary: all she needs to do is request the hash from IPFS and she gets a copy of the PDF file. Pretty cool.
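
The “Qm…” prefix from Step 3 is not a coincidence. Here is a minimal Python sketch of where it comes from. It is an illustration, not how ipfs add is implemented: it takes a SHA-256 digest, prepends the two multihash bytes 0x12 (meaning SHA-256) and 0x20 (meaning a 32-byte digest), and Base58-encodes the result. Real IPFS hashes a chunked, wrapped representation of the file rather than its raw bytes, so the strings printed here will not match what ipfs add reports. The function names are our own.

```python
import hashlib

# Bitcoin-style Base58 alphabet (no 0, O, I or l).
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as Base58, treating them as one big-endian integer."""
    number = int.from_bytes(data, "big")
    encoded = ""
    while number > 0:
        number, remainder = divmod(number, 58)
        encoded = ALPHABET[remainder] + encoded
    # Each leading zero byte is written as the first alphabet character.
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * leading_zeros + encoded


def qm_style_hash(content: bytes) -> str:
    """Illustrative only: multihash prefix + SHA-256 digest, Base58-encoded."""
    digest = hashlib.sha256(content).digest()
    multihash = bytes([0x12, 0x20]) + digest  # 0x12 = sha2-256, 0x20 = 32 bytes
    return base58_encode(multihash)


if __name__ == "__main__":
    # Made-up contents standing in for John's PDF.
    print(qm_style_hash(b"John's lab result"))
    print(qm_style_hash(b"John's lab result, edited"))
```

Both printed strings start with Qm, but they differ from each other: identical content always produces the same hash, and any change to the content produces a different one. That is why the hash alone is enough for Mary to ask for exactly the file John added.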

Security Hole

There is an obvious security hole here. Anyone who has the hash of the PDF file can retrieve it from IPFS. Unless we do something to protect them first, sensitive files like health records or images are a poor fit for IPFS.

Enter Asymmetric Encryption

Let’s revise our workflow to include encryption and decryption:

  1. John wants to upload a PDF file to IPFS but only give Mary access
  2. He puts his PDF file in his working directory and encrypts it with Mary’s public key
  3. He tells IPFS he wants to add this encrypted file, which generates a hash of the encrypted file
  4. His encrypted file is available on the IPFS network
  5. Mary can retrieve the file and decrypt it, since she holds the private key that matches the public key used to encrypt it
  6. A malicious party cannot decrypt the file because they lack Mary’s private key
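
To make the public/private key idea concrete, here is a toy sketch in Python using textbook RSA with tiny primes. It is for intuition only and is not what GPG does internally: GPG encrypts the file with a one-time symmetric key and then encrypts that key with the recipient’s public key, and real keys are thousands of bits long. The primes, names and message below are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Toy RSA with tiny primes. Never use numbers this small for real data.
P, Q = 61, 53
N = P * Q                    # modulus, shared by both keys
PHI = (P - 1) * (Q - 1)
E = 17                       # public exponent
D = pow(E, -1, PHI)          # private exponent (needs Python 3.8+)

MARY_PUBLIC_KEY = (E, N)     # Mary can hand this to anyone
MARY_PRIVATE_KEY = (D, N)    # Mary keeps this to herself


def encrypt(message: bytes, public_key: Tuple[int, int]) -> List[int]:
    """Encrypt each byte separately with the public key."""
    exponent, modulus = public_key
    return [pow(byte, exponent, modulus) for byte in message]


def decrypt(ciphertext: List[int], private_key: Tuple[int, int]) -> bytes:
    """Recover the original bytes with the matching private key."""
    exponent, modulus = private_key
    return bytes(pow(value, exponent, modulus) for value in ciphertext)


if __name__ == "__main__":
    locked = encrypt(b"lab result", MARY_PUBLIC_KEY)
    print(locked)                              # numbers, not readable text
    print(decrypt(locked, MARY_PRIVATE_KEY))   # the original message
```

John only ever needs MARY_PUBLIC_KEY to lock the file. Turning the numbers back into the message requires MARY_PRIVATE_KEY, which never leaves Mary’s computer.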

The Blockchain

Of particular importance is the block diagram from “Code your own blockchain in less than 200 lines of Go!”, in which each block stores a BPM (pulse rate) reading.

Pay attention to the BPM part. This kind of simple text recording is all the blockchain can really handle today. This is why cryptocurrencies are a good fit for the blockchain. All you need to record is the sender, recipient and amount of Bitcoin (or Ether, etc.) being transferred. Because all these hashes need to be calculated and verified to preserve integrity of the chain, the blockchain is horrible, absolutely horrible at storing files or large amounts of data in a block.

This is why IPFS is so powerful when coupled with the blockchain. Instead of BPM above, we simply store the hash of the IPFS file! This is really cool stuff. We keep the simplicity of data that’s required on the blockchain but we get to enjoy the file storage and decentralized peer-to-peer properties of IPFS! It’s the best of both worlds. Since we also added security with asymmetric encryption (GPG), we have a very elegant way of “storing”, encrypting, and sharing large data and files on the blockchain.

[Figure: Revised block diagram]

A real world application would be storing references to our health or lab records in each block. When we get a new lab result, we simply create a new block that refers to an encrypted image or PDF of our lab result that sits in IPFS.
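
Here is a minimal sketch of such a block. It is modeled loosely on the block from our previous post, with an ipfs_hash field where BPM used to be, and it is written in Python here rather than Go. The field names, helper functions and timestamps are assumptions made for illustration.

```python
import hashlib
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    index: int
    timestamp: str
    ipfs_hash: str   # takes the place of BPM: points at the encrypted file
    prev_hash: str
    hash: str = ""


def calculate_hash(block: Block) -> str:
    """SHA-256 over the block's fields, excluding its own hash."""
    record = f"{block.index}{block.timestamp}{block.ipfs_hash}{block.prev_hash}"
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def generate_block(previous: Block, ipfs_hash: str, timestamp: str) -> Block:
    """Create the next block, linked to the previous one by its hash."""
    block = Block(previous.index + 1, timestamp, ipfs_hash, previous.hash)
    block.hash = calculate_hash(block)
    return block


def is_chain_valid(chain: List[Block]) -> bool:
    """Check every block after the first against its predecessor."""
    for previous, current in zip(chain, chain[1:]):
        if current.index != previous.index + 1:
            return False
        if current.prev_hash != previous.hash:
            return False
        if current.hash != calculate_hash(current):
            return False
    return True


if __name__ == "__main__":
    genesis = Block(0, "2018-02-20T00:00:00Z", "", "")
    genesis.hash = calculate_hash(genesis)
    lab_result = generate_block(
        genesis,
        "QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL",  # hash used later in this post
        "2018-02-20T00:05:00Z",
    )
    print(is_chain_valid([genesis, lab_result]))
```

The block stays tiny no matter how large the lab result is, because only the 46-character reference is hashed into the chain. The file itself lives in IPFS.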

Enough talk already. Show me how to do this!

In this tutorial, we will:

  • Set up GPG
  • Set up IPFS
  • Encrypt a file with someone else’s public key
  • Upload the encrypted file to IPFS
  • Download the file from another computer (or Virtual Machine) and make sure only the privileged party can decrypt and view it

Things you’ll need

  • A second computer or a Virtual Machine instance. The second computer simulates a person with whom you want to securely share your files.
  • A test file. We recommend downloading this, which is a sample PDF lab result. This is the exact type of sensitive, personal data we need to protect and since we’re a healthcare company, it’s a nice example. Put this file in your working directory.

That’s it! Let’s get started.

Setup

Let’s download GPG on both our main and secondary computers.

Follow the instructions in this article for your OS. On Mac, the easiest way is to open your terminal and run brew install gnupg, assuming Homebrew is installed.

Generate a key on each of your computers after GPG installation. Use the following steps:

Run gpg --gen-key, follow the prompts and pick the default options. Make sure to remember, or securely store, the passphrase you choose to protect the key, as well as the name and email address you enter.

You’ll get to a stage where gpg asks you to do some random things to generate entropy. I just typed a bunch of random characters until the process finished and gpg printed a success message.

After the key has been generated on the second computer, we need to add that key to the keyring of the first computer, so we can encrypt files that only the second computer can decrypt.

Export your public key on your second computer into an armored blob, using the email address you chose when creating the key (you@example.com below is a placeholder):

gpg --export --armor you@example.com > pubkey.asc

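Move the pubkey.asc file you just created to your first computer. A public key is not secret, but you do need to be sure the key you import is really the one you exported, so use a channel you trust. A USB stick is better than sending it over email.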

Once the pubkey.asc file is on your first computer, in your working directory, import it into your keyring like this:

gpg --import pubkey.asc

You can check that it was imported correctly with gpg --list-keys. My second computer’s key was created under the name Cory Heath, and it shows up in the list.

Great! We’re done with GPG setup. Let’s move on to IPFS.

IPFS

Follow the instructions here to download and install IPFS for your OS on both computers. Once you’ve done that, run ipfs init on each computer to initialize IPFS, then ipfs daemon to start the daemon.

Nice! We’ve set everything up. Let’s get to encrypting and uploading our PDF file to IPFS.

Encryption

Let’s encrypt that file (I renamed it myriad.pdf since the lab result was produced by Myriad Genetics) using the public key of the 2nd computer (in my case, named Cory Heath).

gpg --encrypt --recipient "Cory Heath" myriad.pdf

If you check your directory now with ls, you’ll see a new encrypted file named myriad.pdf.gpg.

Only the holder of the second computer’s private key can decrypt and read this file. Try it! Email it to a friend: try as they might, they won’t be able to open it, even if they rename it back to myriad.pdf.

We’ve got our encrypted file now. Let’s upload it to IPFS!

Uploading to IPFS

ipfs add myriad.pdf.gpg

The output is a single line of the form added <hash> <filename>.

The Qm... string is the hash of the file. You can send this to your friend or anyone to whom you wish to give access so they can download it from IPFS.

Let’s double check that our node is holding the file with ipfs pin ls. The hash of our file appears in the list of pinned objects, which means our node keeps a copy and serves it to the IPFS network while the daemon is running.

Downloading from IPFS

In our case, instead of a second computer we’re using an Ubuntu VM with Vagrant. This is not a requirement.

On your second computer, use the same hash to download from IPFS the encrypted file that your first computer added:

ipfs get QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL

When the download succeeds, IPFS saves the file in your current directory under a name equal to its hash.

Decryption

Let’s give it a try.

Decrypt the downloaded file, writing the output to myriad.pdf:

gpg --decrypt QmYqSCWuzG8Cyo4MFQzqKcC14ct4ybAWyrAc9qzdJaFYTL > myriad.pdf

Moment of truth:

Let’s open this file. If all went well, we should be able to see it on our second computer. The open command below is for macOS; on Linux, such as an Ubuntu VM, use xdg-open instead.

open myriad.pdf

TADA! We successfully downloaded, decrypted and opened our file which was stored fully encrypted on IPFS, protected from anyone who shouldn’t have access!
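
If you want to script the round trip, here is a minimal Python sketch that wraps the same gpg and ipfs commands used above. It assumes both tools are installed and on your PATH, that the recipient’s public key is already in your keyring, and that the IPFS daemon is running. The function names are our own, and we have not tested this against every gpg or ipfs version. The -Q flag asks ipfs add to print only the final hash.

```python
import subprocess
from typing import List


def encrypt_command(path: str, recipient: str) -> List[str]:
    # gpg writes the encrypted copy to <path>.gpg next to the original.
    return ["gpg", "--encrypt", "--recipient", recipient, path]


def add_command(path: str) -> List[str]:
    # -Q (--quieter) makes ipfs print only the final hash.
    return ["ipfs", "add", "-Q", path]


def get_command(ipfs_hash: str, output_path: str) -> List[str]:
    return ["ipfs", "get", ipfs_hash, "--output", output_path]


def decrypt_command(encrypted_path: str, output_path: str) -> List[str]:
    return ["gpg", "--output", output_path, "--decrypt", encrypted_path]


def run(command: List[str]) -> str:
    """Run a command, raise if it fails, and return its trimmed stdout."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout.strip()


def share(path: str, recipient: str) -> str:
    """Sender side: encrypt for the recipient, add to IPFS, return the hash."""
    run(encrypt_command(path, recipient))
    return run(add_command(path + ".gpg"))


def receive(ipfs_hash: str, output_path: str) -> None:
    """Recipient side: fetch the encrypted file, then decrypt it."""
    encrypted_path = ipfs_hash + ".gpg"
    run(get_command(ipfs_hash, encrypted_path))
    run(decrypt_command(encrypted_path, output_path))
```

On the first computer you would call share("myriad.pdf", "Cory Heath") and send the returned hash to the recipient. On the second computer, receive(that_hash, "myriad.pdf") fetches and decrypts the file. gpg may still prompt for a passphrase, or ask for confirmation if the recipient’s key is not marked as trusted.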

Recap and Next Steps

Let’s do a quick review of what we did:

  • Recognized that the blockchain is pretty bad at storing large volumes of data and files
  • Got IPFS up and running, connected to the network
  • Secured sensitive files using GPG and stored them on IPFS
  • Understood hashing in IPFS and how we can store the hashes on the blockchain to combine the strengths of the blockchain with distributed file storage

Where you take what you learned here is completely up to you. There are many places to branch off from this. Consider deploying these examples to live servers that act as your own IPFS nodes to store important files. The drawback to IPFS is that a file stays available only while at least one online node holds a copy: if your files aren’t very popular and you stop your node, they effectively disappear from the IPFS network. You can prevent this by spinning up cloud servers as IPFS nodes and pinning your files on them (ipfs pin add followed by the hash), so you host the files yourself until other nodes become interested and start storing them.

Check out our previous “Code your own blockchain” tutorials, Parts 1, 2, 3 and 4. Once you’ve gone through those, try integrating IPFS and blockchain with your own large, encrypted files. You can also learn about Byzantine fault tolerance, Turing completeness and other advanced blockchain concepts here. If you’re so inclined, here’s how to start your own Hyperledger blockchain and here’s how to build a DApp on Hyperledger.

To learn more about Coral Health and how we’re using the blockchain to advance personalized medicine research, visit our website.
