How Private is IPFS?
A Misconception of Privacy
While IPFS itself is a public network, a common misconception we see from Pinata users is that IPFS is private if they don’t explicitly share the hashes (also known as CIDs) for content they’ve stored.
Unfortunately this isn’t the case.
When you add content to the IPFS network, the node storing that content returns a hash that can later be given to any IPFS node to retrieve the original content. Without understanding the internal workings of IPFS, it might seem like this hash behaves like a private link: if the hash isn’t shared with the public, then nobody would know it exists.
For now, this somewhat works, but not because IPFS is inherently private. Rather, it works because IPFS is still young and people haven’t yet implemented tools for monitoring the network.
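To see why a hash makes a poor secret, it helps to remember that it is derived entirely from the content. The sketch below computes a CIDv1 for a single raw block using only the Python standard library. The function name `raw_cid_v1` is made up for this example, and the result should only match what a real node reports for small files added with CIDv1 and raw leaves, since larger files are chunked and wrapped differently.

```python
import base64
import hashlib


def raw_cid_v1(content: bytes) -> str:
    """Return the CIDv1 (raw codec, sha2-256) for a single block of bytes."""
    digest = hashlib.sha256(content).digest()
    # multihash = <hash function code 0x12 = sha2-256><digest length 0x20><digest>
    multihash = bytes([0x12, 0x20]) + digest
    # CIDv1 = <version 0x01><content codec 0x55 = raw><multihash>
    cid_bytes = bytes([0x01, 0x55]) + multihash
    # The multibase prefix "b" means lowercase base32 without padding.
    encoded = base64.b32encode(cid_bytes).decode("ascii").lower().rstrip("=")
    return "b" + encoded


if __name__ == "__main__":
    note = b"my unlisted document"
    print(raw_cid_v1(note))
    # Anyone holding the same bytes derives the same identifier.
    print(raw_cid_v1(note) == raw_cid_v1(b"my unlisted document"))
```

Because the identifier is a pure function of the bytes, keeping it unshared only hides it from people who have neither the content nor a view of the network traffic that mentions it.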
How IPFS Hashes Become Public
IPFS, like many distributed data storage technologies, uses what’s called a Distributed Hash Table (a DHT for short).
In practice, this means that when an IPFS node pins new content, it announces that it has the content to all of the peers it’s connected to. It does this so that the rest of the IPFS network knows where to find that content. The more peers the original node sends content announcements to, the more discoverable that content is.
To most of the world, these content announcements happen behind the scenes and are just part of how the IPFS network works. However, depending on a company’s business model, these announcements might be quite valuable, and such a company would be incentivized to record as many of them as possible.
For companies looking to keep track of this data, all that’s required is a slight modification to their IPFS nodes. This tweak would simply add code that logs each DHT announcement instead of letting DHT records expire like they normally do.
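To make that "slight modification" concrete, here is a toy sketch of the idea. It is not code from any real IPFS implementation: `ProviderStore`, `LoggingProviderStore`, and the 24-hour expiry window are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ProviderStore:
    """Toy provider table: remembers who announced which CID, for a limited time."""
    ttl_seconds: float = 24 * 60 * 60  # assumed expiry window, for illustration only
    records: dict = field(default_factory=dict)  # (cid, peer_id) -> expiry timestamp

    def on_announcement(self, cid: str, peer_id: str, now: float) -> None:
        self.records[(cid, peer_id)] = now + self.ttl_seconds

    def expire(self, now: float) -> None:
        # Normal behaviour: drop records once their lifetime has passed.
        self.records = {k: exp for k, exp in self.records.items() if exp > now}


@dataclass
class LoggingProviderStore(ProviderStore):
    """Same table, plus a permanent log that survives expiry."""
    log: list = field(default_factory=list)  # entries are (timestamp, cid, peer_id)

    def on_announcement(self, cid: str, peer_id: str, now: float) -> None:
        self.log.append((now, cid, peer_id))  # the one-line tweak
        super().on_announcement(cid, peer_id, now)


if __name__ == "__main__":
    store = LoggingProviderStore(ttl_seconds=60)
    store.on_announcement("example-cid", "peer-A", now=0)
    store.expire(now=120)
    print(store.records)  # the live table has forgotten the announcement
    print(store.log)      # the log has not
```

The point of the sketch is how little changes: the node still behaves normally toward its peers, so nothing about the logging is visible from the outside.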
Since this strategy requires logging nodes to be connected to announcing nodes, it is most effective when attempting to log announcements coming from specific nodes. In order to log the entire IPFS network, you’d need to maintain a connection to every IPFS node that’s storing content. Such a feat would require a huge network of extremely powerful nodes spread all around the world.
Users Can Be Tracked Too
It’s not just the host nodes that can be tracked. Users requesting content can be tracked, too! When a user running a node requests content from the network, each node they’re connected to receives a message asking for that content.
Just as content announcements can be logged, content requests can be logged as well. In fact, if a node doesn’t have the requested content, that node will relay the request to other nodes in an attempt to find it, which means an even greater number of nodes have the potential to log the information.
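A toy model can show how quickly a single request spreads under the behaviour described above. This is a simplified simulation, not how any real IPFS protocol is implemented: the network layout, the function name `nodes_that_see_request`, and the `max_hops` limit are all assumptions for illustration.

```python
from collections import deque


def nodes_that_see_request(peers: dict, requester: str, holders: set, max_hops: int = 3) -> set:
    """Return every node that receives the request in this toy model.

    peers maps a node name to the list of nodes it is connected to.
    A node that holds the content answers and does not relay further.
    """
    seen = set()
    queue = deque((peer, 1) for peer in peers.get(requester, []))
    while queue:
        node, hops = queue.popleft()
        if node in seen or node == requester:
            continue
        seen.add(node)  # this node could log (requester, cid) here
        if node in holders or hops >= max_hops:
            continue
        for neighbour in peers.get(node, []):
            queue.append((neighbour, hops + 1))
    return seen


if __name__ == "__main__":
    network = {
        "user": ["a", "b"],
        "a": ["user", "c"],
        "b": ["user", "d"],
        "c": ["a", "host"],
        "d": ["b"],
        "host": ["c"],
    }
    print(sorted(nodes_that_see_request(network, "user", holders={"host"})))
```

In this small network the user is connected to only two peers, yet five nodes end up in a position to record the request.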
How Can IPFS Be Private?
Let’s talk about a few ways that you can use IPFS while still remaining private. Each method has its own pros and cons.
Private IPFS Networks
Private IPFS networks provide the greatest level of privacy from the outside world. A private IPFS network behaves the same as the public network, except that participants can only communicate with other nodes inside that same private network. This means that only nodes in the private network will be able to see things like content announcements and content requests.
In order to connect to a private IPFS network, nodes will need that network’s private access key (a shared secret often called a swarm key). When running a private network, be careful who you give this key to. If it falls into the wrong hands, your network security and privacy will be compromised.
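As a rough sketch of what such a key looks like: in the go-ipfs (Kubo) implementation, the access key is a pre-shared key stored in a small text file, typically named swarm.key and placed in each node’s repository directory. The snippet below generates the contents of such a file. The function name is made up for this example, and you should check your node’s documentation for where the file belongs and how to enable private networking.

```python
import secrets


def generate_swarm_key() -> str:
    """Build the text of a swarm key file containing a random 32-byte pre-shared key."""
    key_hex = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    # Header line, encoding line, then the key itself.
    return f"/key/swarm/psk/1.0.0/\n/base16/\n{key_hex}\n"


if __name__ == "__main__":
    print(generate_swarm_key(), end="")
```

Every node in the private network needs an identical copy of this file, which is exactly why distributing it carelessly undermines the whole setup.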
Content Encryption
Applications that want to keep using the public IPFS network may wish to encrypt content before uploading it. Encrypted content can still be tracked, but without the decryption key, the content will be unreadable.
For applications using encryption, it’s important to keep in mind that because the content is public, malicious nodes could save encrypted content in the hope that they either obtain the decryption keys or that the encryption mechanism is broken at some point in the future. For this reason, it’s important to guard decryption keys closely and to never upload content that could have disastrous consequences if the encryption is ever broken.
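The overall flow is: encrypt locally, upload only the ciphertext, and keep the key somewhere that never touches the network. The sketch below illustrates that flow with a one-time pad, chosen only because it needs nothing beyond the Python standard library. It is not a recommendation: a real application would more likely use a vetted cryptography library with an authenticated cipher. The function names are made up for this example.

```python
import secrets
from typing import Tuple


def encrypt(plaintext: bytes) -> Tuple[bytes, bytes]:
    """Return (key, ciphertext). The key is random, as long as the data, and single-use."""
    key = secrets.token_bytes(len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, key))
    return key, ciphertext


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    """Recover the plaintext. Only someone holding the key can do this."""
    if len(key) != len(ciphertext):
        raise ValueError("key and ciphertext must be the same length")
    return bytes(c ^ k for c, k in zip(ciphertext, key))


if __name__ == "__main__":
    key, ciphertext = encrypt(b"quarterly numbers, do not share")
    # Only `ciphertext` would be added to IPFS; `key` stays off the network.
    print(decrypt(key, ciphertext))
```

Whatever cipher is used, the ciphertext and its hash are still visible to the network, so the announcements and requests for it can be logged like any others.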
Public Gateways
I’ve talked about public gateways a bit in the past. While they definitely have their own issues concerning scalability, that doesn’t mean gateways are all bad. In the context of privacy, they remain a useful tool for users looking to hide their identity while requesting content on the IPFS network.
Requesting content through a public gateway allows a user to retrieve content from the IPFS network without running their own node. While the rest of the network can still see the gateway requesting the content, the request will simply appear as one of many requests coming through that gateway.
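In code, a gateway request is just an ordinary HTTP request to a URL that embeds the hash. The sketch below assumes a path-style gateway and uses https://ipfs.io as a default purely for illustration. The function names are made up for this example, and the gateway you choose is up to you.

```python
from urllib.request import urlopen


def gateway_url(cid: str, gateway: str = "https://ipfs.io") -> str:
    """Build a path-style gateway URL of the form <gateway>/ipfs/<cid>."""
    return f"{gateway.rstrip('/')}/ipfs/{cid}"


def fetch_via_gateway(cid: str, gateway: str = "https://ipfs.io", timeout: float = 30.0) -> bytes:
    """Download content over HTTP; the gateway's node talks to the IPFS network for us."""
    with urlopen(gateway_url(cid, gateway), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # "example-cid" is a placeholder, not a real hash.
    print(gateway_url("example-cid"))
```

From the IPFS network’s point of view the request comes from the gateway’s node. From the gateway’s point of view, though, it comes from your IP address.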
Keep in mind public gateways could be tracking public IPs and the content those IPs request. If you’re concerned about public gateways tracking your information, you’ll have to look further to preserve privacy while using public gateways.
Conclusion
When using IPFS, it’s important to realize that you’re utilizing a public network. Anything that happens on the public IPFS network is technically trackable, and people uploading content should do so with the understanding that their content can be viewed, monitored, or saved by anybody.
That being said, there are ways to preserve privacy on IPFS, each with its own pros and cons. Private IPFS networks, content encryption, and gateway usage are all valuable tools for increasing privacy while using IPFS. As with most technologies, everybody’s needs will differ, so it’s important that applications weigh the pros and cons of each privacy approach against their specific user needs.