Dedicated IPFS Networks
Scaling Through Network Specialization
Scalability. It’s something we’re constantly thinking about at Pinata. With IPFS’s massive growth and adoption, we need more and more ways to optimize content discoverability via IPFS’s DHT (Distributed Hash Table). In this post, we will take a look at a couple of challenges faced by IPFS and a potential solution for projects facing performance issues right now.
A Lot of Nodes
The first scalability challenge with content discoverability has to do with the number of nodes in the IPFS network. An individual IPFS node frequently announces content to other nodes in the network. This allows other nodes to point to the node hosting that content if anybody ever asks about it, as shown below:
This works great when there’s a reasonable number of nodes in the IPFS network. However, as the number of nodes in the network increases, the network tends to slow down. This is because all of the IPFS nodes are using the same DHT to locate data. With the growth of IPFS nodes on the public network, content discovery has started to look more like the diagram below than the one above:
This node congestion slows the DHT’s ability to find the node that has the content. Additionally, not all nodes are equally stable. Many of the nodes that join the network frequently go offline or become unresponsive. This creates additional delays when attempting to route information through the IPFS network. However, the number of nodes in the network isn’t the only scalability challenge. The amount of content on those nodes is also a challenge.
A Lot of Content
The second reason IPFS can struggle with efficient content discoverability is the amount of content being announced by nodes in the network. Announcing each piece of content takes a little bit of time, and nodes want to announce all of the content they have. As nodes store more and more content, the DHT records for that content start to expire before the node is able to re-announce it. As shown below, this backs up the content announcement queue and can lead to a situation where a node can only keep the network aware of a fraction of its content at any given time. So, how does the network scale with a growing number of nodes and a growing amount of content?
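To make the announcement backlog a little more concrete, here is a minimal sketch of a deliberately simplified model: a node announces its items one at a time, and each DHT record stays alive for a fixed lifetime. The function name, the serial announcement assumption, and the numbers in the usage example are all illustrative assumptions, not measurements of any real IPFS node.

```python
def advertised_fraction(num_items: int,
                        seconds_per_announce: float,
                        record_ttl_seconds: float) -> float:
    """Estimate the fraction of a node's content that has a live DHT record.

    Simplified model (an assumption, not how any specific IPFS
    implementation schedules its work): items are announced serially,
    and each record expires record_ttl_seconds after it was announced.
    """
    if num_items == 0:
        return 1.0
    # How many announcements fit inside one record lifetime.
    announcements_per_ttl = record_ttl_seconds / seconds_per_announce
    # The node can never keep more records alive than it has items.
    return min(1.0, announcements_per_ttl / num_items)


# Hypothetical numbers: 1 second per announcement, 24 hour record lifetime.
small_node = advertised_fraction(1_000, 1.0, 86_400)
large_node = advertised_fraction(200_000, 1.0, 86_400)
print(f"small node: {small_node:.0%}, large node: {large_node:.0%}")
```

Under these assumed numbers, the small node keeps everything discoverable, while the large node can only keep a bit under half of its content announced at any given time.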
Private IPFS networks are fairly simple in nature. Instead of connecting to the main IPFS network, participants in a private IPFS network will only connect to the nodes in that network. Access to these networks is controlled by a private “swarm key” that each member in the network must possess in order to take part.
To learn more about setting up a private IPFS network, visit these instructions on GitHub.
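As a rough illustration of the swarm key mentioned above, the sketch below generates one in the three-line text layout commonly used for go-ipfs private networks (a protocol header, an encoding line, and 32 random bytes in hex). The function name is hypothetical, and the exact file location and format should be checked against the instructions for your IPFS version.

```python
import secrets


def generate_swarm_key() -> str:
    """Return the contents of a swarm key file as text.

    Assumes the commonly used layout: a header line, a '/base16/'
    encoding line, and a 256-bit random key written as 64 hex characters.
    """
    key_hex = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    return f"/key/swarm/psk/1.0.0/\n/base16/\n{key_hex}\n"


if __name__ == "__main__":
    # Every node in the private network needs an identical copy of this
    # file, typically saved as 'swarm.key' in the node's IPFS repository
    # directory.
    print(generate_swarm_key(), end="")
```

Anyone holding the same key can join the network, so how the file is distributed determines whether the network stays private.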
To this point, most private networks have been thought of as a way to add privacy to content. However, they become much more powerful when thought about in the context of network scalability.
Most applications using the public IPFS network don’t actually care about content hosted by nodes other than their own. And, likewise, most other nodes don’t care about their content.
By using IPFS’s ability to set up private networks, we can set up “dedicated” networks that consist only of nodes for a specific application’s data. These networks can be private through safeguarding the private access key that allows other nodes to join the network. Or, they can be “public” dedicated networks.
A “public” dedicated network is a private IPFS network with a publicly known access key, so anybody can join it. The difference is that these networks are separated from the main IPFS network in order to specialize. Nodes that don’t care about the data on these networks have no reason to join them because doing so would be a waste of resources.
By separating themselves from the main IPFS network, a dedicated IPFS network can realize significant performance gains. The main reason for these performance gains is that nodes in the network will be connected to most, if not all, of the other nodes in the network at all times.
How does this help with performance?
In a previous post, Speeding Up IPFS Through Swarm Connections, we talked about how being directly connected to a host node allows a requesting node to instantly locate content. This essentially bypasses the DHT lookup process.
This same functionality applies with dedicated IPFS networks. Since most nodes are going to be connected to each other at all times, content discovery happens near instantaneously throughout the entire network. This means no more lengthy lookup times.
This also solves the issue of too much content being announced from a single node. When directly connected to a host node, it doesn’t matter whether or not that node has had the time to announce that content to the DHT. By being directly connected to the host node, any requesting nodes are still able to instantly discover the content. But, how do we serve the content to the end user if they don’t have an IPFS node?
IPFS gateways can be run on dedicated IPFS networks just like they are on the public network. In fact, gateways work exactly the same with dedicated networks as they do with the main public IPFS network. The only difference is that instead of searching the public IPFS network for data, a gateway on a dedicated network will only search for content on nodes in the dedicated network.
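From the client’s point of view, the only thing that changes is which gateway host it talks to. The sketch below builds a path-style gateway URL; the hostname and the content identifier are placeholders invented for illustration, not real endpoints.

```python
def gateway_url(gateway_base: str, cid: str, path: str = "") -> str:
    """Build a path-style gateway URL of the form <base>/ipfs/<cid>[/<path>]."""
    base = gateway_base.rstrip("/")
    suffix = "/" + path.lstrip("/") if path else ""
    return f"{base}/ipfs/{cid}{suffix}"


# Hypothetical gateway host for a dedicated network and a placeholder CID.
DEDICATED_GATEWAY = "https://gateway.example.com"
print(gateway_url(DEDICATED_GATEWAY, "QmExampleCid", "images/logo.png"))
```

Because the URL shape is the same as on a public gateway, an application can move between a public and a dedicated gateway by changing only the base address.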
This becomes a huge selling point for applications that rely heavily on gateways to retrieve content. Instead of a user waiting through a lengthy content discovery process each time they want to consume data, they can find it near instantly from a dedicated network gateway.
Real World Examples
Let’s say that we have three decentralized networks, as detailed below:
- A Decentralized Autonomous Organization that stores and shares all of its files through IPFS
- A network for sharing scientific data over IPFS
- A decentralized exchange that uses IPFS for its order books
With dedicated IPFS networks, each of these examples can make sure that their data flows as efficiently as possible through their networks without dealing with the outside network noise that they don’t care about. But, what if you want to switch between dedicated networks?
Switching Between Networks
Another advantage of using a dedicated IPFS network is the ability to transition to the public network at any time. If, at any point, a node or network of nodes decides that its data should become part of the public IPFS network, all that’s required is a few small configuration changes and a node restart. When switching back to the public IPFS network, all of a node’s data will remain safe and unchanged. The only change is that the node will now be participating in the public IPFS DHT again.
This allows for a relatively easy transition path for applications that may choose the dedicated network approach for performance in the near future, but ultimately would like to participate in the public network as it becomes more performant.
The same concept works for switching to a different dedicated network. If a node wants to join a different dedicated network, it can simply change its node configuration to point to the new dedicated network.
Dedicated IPFS networks provide a lot of power for scaling an application’s usage of IPFS. But, that doesn’t make them right for every project.
These are important things to keep in mind when considering a dedicated network.
Every node that wants to participate in a dedicated network needs to be properly configured to do so. This can become a user experience problem if not properly accounted for.
By default, IPFS nodes are configured to bootstrap themselves to public IPFS bootstrapping nodes run by Protocol Labs. To successfully run a dedicated IPFS network, dedicated bootstrapping nodes will need to be run and nodes will need to be configured to connect to them on startup.
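One way to picture the bootstrap change is as an edit to the node’s JSON configuration, which keeps its bootstrap peers in a top-level "Bootstrap" list of multiaddresses. The sketch below swaps that list for a dedicated network’s own bootstrap nodes. The function name, the sample config, and the peer address are hypothetical; the `ipfs bootstrap` subcommands of the IPFS CLI are the more usual way to make the same change on a real node.

```python
import json


def set_bootstrap_peers(config_text: str, peers: list[str]) -> str:
    """Return a copy of an IPFS JSON config with its Bootstrap list replaced.

    All other settings are left untouched.
    """
    config = json.loads(config_text)
    config["Bootstrap"] = list(peers)
    return json.dumps(config, indent=2)


# Hypothetical, heavily trimmed config and a made-up dedicated bootstrap node.
sample_config = json.dumps({
    "Bootstrap": ["/dnsaddr/public-bootstrap.example.org/p2p/QmPublicPeerId"],
    "Identity": {"PeerID": "QmSelf"},
})
dedicated_peers = ["/ip4/10.0.0.5/tcp/4001/p2p/QmHypotheticalPeerId"]
print(set_bootstrap_peers(sample_config, dedicated_peers))
```

Restoring the original public bootstrap list is the same operation in reverse, which is part of what keeps switching between networks cheap.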
Public Data Access
Nodes participating in a dedicated IPFS network won’t be able to access any data on the public IPFS network. If a network relies on data from the public network that won’t be hosted by users on their dedicated network, this could pose a problem.
Likewise, nodes on the public network, such as public gateways, won’t be able to access data that’s hosted on a dedicated IPFS network.
Network Size / Peer Counts
One of the biggest performance boosts of a dedicated network comes from the ability of each peer to maintain a connection to every other peer. If a dedicated network becomes too large, or if nodes participating in the dedicated network don’t have powerful enough hardware to connect to a lot of nodes, the performance boost could be minimal.
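The cost of keeping every peer connected to every other peer grows quickly with network size. As a back-of-the-envelope sketch (the function name and the example sizes are illustrative), a full mesh of n nodes needs n - 1 connections per node and n(n - 1)/2 connections overall:

```python
def mesh_connections(num_nodes: int) -> tuple[int, int]:
    """Return (connections per node, total connections) for a full mesh."""
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    per_node = num_nodes - 1
    total = num_nodes * (num_nodes - 1) // 2
    return per_node, total


for n in (10, 100, 1_000):
    per_node, total = mesh_connections(n)
    print(f"{n} nodes: {per_node} connections per node, {total} in total")
```

A ten-node network is trivial to keep fully connected, while at a thousand nodes each peer has to hold 999 open connections, which is where modest hardware starts to struggle.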
Dedicated IPFS networks provide a powerful tool for scaling when an application doesn’t need to participate in the public IPFS network. Leveraging dedicated IPFS networks can provide faster content discovery times for dedicated network nodes. This leads to a better and more stable user experience.
While dedicated IPFS networks provide many advantages, they won’t be a good solution for every project. The considerations above should be weighed before transitioning a project to a dedicated IPFS network. If you have questions about setting up a dedicated IPFS network, be sure to ask in our community Slack.