The IPFS Gateway Problem
A Bottleneck Between Web2 and Web3
What are IPFS Gateways?
IPFS gateways are how web users retrieve content on the IPFS network without running their own IPFS node. Gateways allow web clients to ask for content that resides on the IPFS network.
Content can be retrieved from an IPFS gateway by visiting a link that’s typically formatted like this:
The “Qma6e8dovfLyiG2UUfdkSHNPAySzrWLX9qVXb44v1muqcp” part of the link above is the content identifier (CID) that’s being asked for. It’s important to realize that a user could retrieve that same image by visiting any public gateway, not just one run by Pinata. For example, try retrieving that CID at the following public gateways:
Notice how you retrieve the same image from each gateway? That’s because the CID provided to each gateway was the same. When you visited each gateway, it searched the IPFS network until it found the content for the CID “Qma6e8dovfLyiG2UUfdkSHNPAySzrWLX9qVXb44v1muqcp”. Once that content was found, the gateway provided it back to you in your web browser.
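To make the "same CID, different gateway" idea concrete, here is a minimal Python sketch that builds gateway links for one CID. It assumes the common path-style layout, `https://<gateway host>/ipfs/<CID>`. The helper name `gateway_url` is made up for this example, and `gateway.example.com` is a placeholder host rather than a real gateway.

```python
def gateway_url(gateway_host: str, cid: str) -> str:
    """Build a path-style gateway URL: https://<host>/ipfs/<CID>."""
    return f"https://{gateway_host}/ipfs/{cid}"


# The CID discussed above.
CID = "Qma6e8dovfLyiG2UUfdkSHNPAySzrWLX9qVXb44v1muqcp"

# ipfs.io is a public gateway; gateway.example.com is a placeholder host.
GATEWAYS = ["ipfs.io", "gateway.example.com"]

# Only the host changes between links; the CID part stays identical.
urls = [gateway_url(host, CID) for host in GATEWAYS]
for url in urls:
    print(url)
```

Every link ends in the same `/ipfs/<CID>` path, which is why each gateway hands back the same content.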
Gateways provide a stop-gap between Web2 and Web3.
Without using an IPFS gateway, the only way to access content on the IPFS network is by running your own IPFS node. Unless you have an application that natively runs an IPFS node for every user, applications using IPFS will point users to a gateway when they need to retrieve content on IPFS.
An often under-recognized benefit of IPFS is its ability to provide CDN-like functionality when retrieving content from the IPFS network. I talk about this in my previous article: The IPFS Cloud.
Instead of having to provide the IPFS network with the location of whatever content is being retrieved, like with HTTP, you simply tell the IPFS node, “Find me the content for this CID”. This means that the IPFS node will retrieve the content from whatever node can provide it the fastest.
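Here is a rough Python sketch of that idea: asking for content by what it is rather than where it lives. This is a toy model, not how IPFS is implemented. The `Node` class, the `fetch` function, and the latency numbers are invented for illustration, and a plain SHA-256 hex digest stands in for a real CID (real CIDs use a different encoding).

```python
import hashlib


def content_id(data: bytes) -> str:
    # Stand-in for a CID: the hex SHA-256 digest of the bytes.
    return hashlib.sha256(data).hexdigest()


class Node:
    """A toy peer that stores blocks keyed by their content identifier."""

    def __init__(self, name: str, latency_ms: int):
        self.name = name
        self.latency_ms = latency_ms  # hypothetical latency to the requester
        self.blocks = {}

    def add(self, data: bytes) -> str:
        cid = content_id(data)
        self.blocks[cid] = data
        return cid


def fetch(cid: str, peers: list):
    """Take the content from the fastest peer that has it, then verify it."""
    holders = [p for p in peers if cid in p.blocks]
    if not holders:
        raise KeyError(f"no peer has {cid}")
    fastest = min(holders, key=lambda p: p.latency_ms)
    data = fastest.blocks[cid]
    # Any copy is acceptable because it can be checked against the identifier.
    if content_id(data) != cid:
        raise ValueError("content does not match its identifier")
    return fastest.name, data


near = Node("near", latency_ms=10)
far = Node("far", latency_ms=120)
cid = far.add(b"hello ipfs")
near.add(b"hello ipfs")  # same bytes, so same identifier

source, data = fetch(cid, [far, near])
print(source)  # near
```

The requester never names a server. It names the content, and the nearest copy wins.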
Why Are Gateways a Problem?
When an IPFS node retrieves content, that content is temporarily cached on the receiving node. As more nodes request that content, the speed at which that content can be retrieved increases dramatically. Unfortunately, IPFS gateways negate this benefit. When using a gateway to retrieve content, a significant delay gets introduced to the IPFS retrieval process.
In the image above, imagine that the user and the node with the content are in the same city and the gateway is in a completely different city. Green arrows indicate the successful request and return of a piece of content. Red arrows indicate that a request went out but the node didn’t have the desired piece of content so nothing was returned. The length of the arrows roughly indicates the amount of time each request takes.
With Local IPFS Node
In the first scenario, the user has their own IPFS node locally running on their computer. When the user requests a piece of content from the IPFS network, their node will be able to retrieve that content quickly as their node is in the same city as the node hosting the content. Great!
With a Gateway
In the second scenario, the user doesn’t have their own IPFS node running locally. Instead, the user has to ask the gateway to find a piece of content for them. That gateway will then retrieve the piece of content from whatever node is closest to it and then that content will be sent back to the user.
It’s important to notice that in the gateway scenario the request takes significantly longer. Instead of being able to quickly retrieve the content from the node which is fastest, the user has to reach out to the gateway and then wait for the gateway to find the content and retrieve it. Both of these steps take significantly longer because the gateway is in a completely different city than the requesting user and the hosting node. Not great!
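A quick back-of-the-envelope model shows why the detour hurts. The numbers below are made up purely for illustration (10 ms one way within a city, 80 ms one way between cities), and the model ignores the failed lookups and any time spent searching the network.

```python
# Hypothetical one-way network delays, in milliseconds.
SAME_CITY_MS = 10
CROSS_CITY_MS = 80


def local_node_latency() -> int:
    # The user's own node talks to the content node in the same city:
    # one request out, one response back.
    return 2 * SAME_CITY_MS


def gateway_latency() -> int:
    # The user talks to a gateway in another city (round trip), and the
    # gateway talks to the content node back in the user's city (round trip).
    user_to_gateway = 2 * CROSS_CITY_MS
    gateway_to_node = 2 * CROSS_CITY_MS
    return user_to_gateway + gateway_to_node


print(local_node_latency())  # 20
print(gateway_latency())     # 320
```

The exact figures don’t matter. What matters is that the gateway path pays the long-distance cost twice, while the local node pays the short-distance cost once.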
By utilizing a gateway to retrieve content on IPFS, we’re recreating the same Web2 architecture that IPFS aims to disrupt.
Except, instead of the data being directly on the server we’re talking to, the gateway has to first retrieve the content from the IPFS network before serving it back to the user.
Granted, if a piece of content is popular enough, the gateway node will likely be caching that content so future requests don’t take as long. However, the first request after each garbage collection will still incur noticeable delays in content retrieval.
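The caching behavior described above can be sketched in a few lines of Python. This is a simplified model, assuming a gateway that drops its entire cache on garbage collection; the `GatewayCache` class and its method names are hypothetical, and a real node’s garbage collector is more selective than this.

```python
class GatewayCache:
    """Toy gateway: serves from cache when it can, otherwise fetches."""

    def __init__(self, fetch_from_network):
        self._fetch = fetch_from_network  # the slow path
        self._cache = {}

    def get(self, cid: str):
        if cid in self._cache:
            return self._cache[cid], "cache hit"
        data = self._fetch(cid)  # slow: has to go out to the network
        self._cache[cid] = data
        return data, "network fetch"

    def garbage_collect(self) -> None:
        # Simplification: throw away everything that was cached.
        self._cache.clear()


# A dictionary stands in for the rest of the IPFS network.
network = {"example-cid": b"some popular content"}
gateway = GatewayCache(network.__getitem__)

statuses = [gateway.get("example-cid")[1], gateway.get("example-cid")[1]]
gateway.garbage_collect()
statuses.append(gateway.get("example-cid")[1])
print(statuses)  # ['network fetch', 'cache hit', 'network fetch']
```

The third request is slow again even though the content is popular, because the cached copy was thrown away in between.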
Similar to how the server is a bottleneck in modern client / server architecture, we’re turning the gateway into a huge bottleneck when retrieving content on IPFS. Instead of being able to rely on many different machines to request and deliver content, we’re forcing everything to run through one machine. The more users that rely on a single gateway to retrieve content, the harder it is for that gateway to keep up with demand. Which brings us to another problem plaguing IPFS gateways.
Tragedy of the Common Gateway
- Hosting our own gateway lets us configure things so that Pinata users always have a reliably fast way to access their content.
- Hosting our own gateway helps the IPFS ecosystem as a whole.
However, doing so introduces a problem:
As a gateway’s capabilities increase, it becomes more likely that the rest of the IPFS network will rely on it.
A prime example of this is the Cloudflare gateway, which recently banned video streaming. It is unknown why Cloudflare took the content down. However, it’s apparent that Cloudflare’s public gateway was generating enough traffic that it became a problem. Such a problem raises an interesting question:
How do you ensure that a public IPFS gateway continues to run without issue?
To maintain reliable service to users, you have to bulk up the gateway to keep up with demand. However, it becomes unsustainable long-term for companies providing the service to subsidize the whole network. If you run the “best” public gateway, you’re enticing public users to choose your gateway over somebody else’s. More users who aren’t paying for the gateway means more overhead. More overhead requires more resources. Pretty soon you’re experiencing a classic case of “Tragedy of the commons”.
One way to solve this is to remove functionality from your gateway. This is the approach that Cloudflare has chosen to take. In Cloudflare’s case, it makes sense because they only intended for users to host their own websites on IPFS. It was not intended to be used for video streaming.
Another way to approach it is by restricting access to your gateway. This is something we haven’t seen yet, but a company could put its gateway behind user authentication. With this setup, a company could make sure that only its own users incur gateway costs and could adjust its business model to account for those costs.
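As a thought experiment, an authenticated gateway could look something like the Python sketch below. Nothing here describes an existing gateway. The `API_KEYS` table, the bearer-token scheme, and the `handle_request` function are all assumptions made for illustration.

```python
import hmac

# Hypothetical table of paying users and their API keys.
API_KEYS = {"alice": "alice-secret-key"}


def is_authorized(headers: dict) -> bool:
    """Accept only requests carrying a known key as a Bearer token."""
    scheme, _, token = headers.get("Authorization", "").partition(" ")
    if scheme != "Bearer" or not token:
        return False
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(token.encode(), key.encode())
        for key in API_KEYS.values()
    )


def handle_request(headers: dict, cid: str, fetch):
    """Return (status, body); only authorized users trigger a retrieval."""
    if not is_authorized(headers):
        return 401, b"authentication required"
    return 200, fetch(cid)


def fake_fetch(cid: str) -> bytes:
    # Stand-in for retrieving the content from the IPFS network.
    return b"content for " + cid.encode()


print(handle_request({}, "example-cid", fake_fetch)[0])  # 401
print(handle_request({"Authorization": "Bearer alice-secret-key"},
                     "example-cid", fake_fetch)[0])      # 200
```

Because anonymous requests are rejected before any retrieval happens, the costly part of the gateway only runs for users the company can bill.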
The Ideal Web Without IPFS Gateways
The ideal solution to the gateway problem is pretty simple. Users need to run their own nodes instead of relying on public gateways. However, making that solution a reality is incredibly difficult.
As great as it would be to have every web user download IPFS and run a node on their computer from the command line, it’s probably not going to happen. For technical users, such an ask is just bad user experience. For non-technical users, such an ask is often impossible.
Luckily, projects like Siderus Orion combined with IPFS-Companion make the process more user-friendly. However, for the majority of the public, the hassle of downloading two programs to consume content on IPFS is not worth it. Remember how consuming Flash content was such a hassle? There’s a reason the world moved to HTML5 content.
One solution to this hassle is for applications using IPFS to directly spin up a JS-IPFS node inside their web application. This approach allows developers to hide all of the IPFS technical stuff behind the scenes while allowing users to interact with the IPFS network without relying on public gateways. Carson Farmer, of Textile, wrote a great tutorial on how to include a JS-IPFS node in a web app. The main downside to this approach is that each website that uses IPFS has to spin up its own node, which can lead to performance issues.
As I’ve mentioned in previous articles,
Major browser support will be necessary for IPFS to truly succeed as a web protocol.
Major web browser support will hide IPFS from the end user while still providing the benefits of a p2p web experience in an optimized manner. Because IPFS is still a young protocol, major browser support will take time. However, it’s still something worth striving for if we want to utilize IPFS to its full potential.
IPFS gateways are a bridge for the adoption and usability of IPFS. Unfortunately, IPFS gateways also negate some of the main benefits IPFS provides. Primarily, gateways prevent users from benefiting from the peer-to-peer nature of IPFS.
The solution is to have more users running their own IPFS nodes. However, this becomes quite challenging when accounting for user experience. But just because something is challenging doesn’t make it impossible. As the IPFS ecosystem continues to evolve, nodes are getting closer and closer to users, and the knowledge needed to use IPFS keeps shrinking.