Problems with Quasar based publish-subscribe systems in peer-to-peer storage networks

Braydon Fuller
Sep 22, 2017 · 8 min read


Background

Since I joined Storj Labs about a year ago, I’ve been working with the team on implementing a peer-to-peer storage network as described in the Storj whitepaper (v1 and v2) along with its improvement proposals. Great progress has been made; however, there is still a lot of work remaining to achieve the goal as introduced in the Storj whitepaper:

“A peer-to-peer cloud storage network implementing end-to-end encryption would allow users to transfer and share data without reliance on a third party data provider.”

As it currently stands, Storj Labs runs api.storj.io, a third-party provider that manages metadata and monitors the uptime of files hosted among peers on the network. It’s also possible for people to run a Bridge themselves; however, there is currently a bootstrapping problem and a fair amount of manual work required of Bridge operators. I have drafted a few improvement proposals in that area: SIP7, for Bridges to be discovered by Farmers, and SIP8, to address some of the issues with automating payments to Farmers from a Bridge. Furthermore, I think that progress around scalable payments on Ethereum, such as the Raiden Network (inspired by Bitcoin’s Lightning Network), will have a positive effect on the growth of new Bridges. Such a payment system can be used for instant incentives for bandwidth and file transfer. I believe these to be important areas of research to achieve the goal of peer-to-peer cloud storage.

For roughly my first eight months at Storj Labs, we focused on improving the reliability of downloading files on a peer-to-peer network. When I first joined, the protocol had frequent incompatible changes and version 1.0.0 was still in the works; files over a few hundred megabytes had around a 50% probability of being retrievable, even immediately after an upload; larger 10+ gigabyte files had a near 0% probability of being retrievable; mirroring was a manual process triggered on the client side; and erasure encoding of files wasn’t yet implemented. I had been sold on the idea of decentralized storage as more reliable, but there we were with disastrously less reliability. We remained steadfast and continued towards the goal.

It was very exciting when we finally achieved reliable downloads. Months of hard work had gone into experimenting with different techniques, and we finally found the right combination and had a path forward for further improvements.

Storj is now at protocol version 1.2.0, and the probability of files being retrievable is approaching 99% (I can go into more detail on this later). Monitoring of Farmers storing data has been added. In the event that a Farmer goes offline, the data that was stored on that Farmer will be mirrored to an existing Farmer from the other Farmers also storing the data. Mirrors are also created automatically at the initial upload. Erasure encoding of files before they are uploaded into the network has also been implemented; this provides the ability to recover shards in the event that all mirrors for a shard disappear. This addition produced the most dramatic increase in the reliability of downloading files: in testing, 1GB to 20GB files remained retrievable for up to 90 days (when the contracts expired). Using Reed-Solomon for erasure encoding did, however, come with disadvantages, primarily the quadratic scaling with shard number and size; it’s an area where further research would be greatly beneficial. The combination of mirroring and erasure encoding solved most of our issues with the persistence of stored data. The next largest issue was with uploads: we were unable to get storage offers reliably, and often not at all.
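
To illustrate why erasure encoding helps so much, here is a small TypeScript sketch of the recoverability math: with k data shards and m parity shards, any k of the k + m shards are enough to reconstruct the file. The k, m, and per-shard availability values below are hypothetical, not Storj’s actual parameters.

```typescript
// Recoverability of an erasure-coded file: with k data shards and m parity
// shards, any k of the (k + m) shards are enough to reconstruct the file.
// The k, m, and per-shard availability values are hypothetical, not Storj's
// actual parameters.

function binomial(n: number, r: number): number {
  let result = 1;
  for (let i = 0; i < r; i++) {
    result = (result * (n - i)) / (i + 1);
  }
  return result;
}

// Probability that at least k of the n = k + m shards are still retrievable,
// assuming each shard is independently available with probability p.
function recoveryProbability(k: number, m: number, p: number): number {
  const n = k + m;
  let total = 0;
  for (let available = k; available <= n; available++) {
    total += binomial(n, available) * p ** available * (1 - p) ** (n - available);
  }
  return total;
}

// Example: 20 data shards, 10 parity shards, 90% per-shard availability.
console.log(recoveryProbability(20, 10, 0.9)); // prints a value very close to 1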

The Problem with Uploads

The Storj network has a publish-subscribe system built into it between participants of the Kademlia Distributed Hash Table. Kademlia DHT is a way to give structure to a distributed network: each node keeps a routing table organized by XOR distance, with neighbors being those that are “closer”. Please read the Kademlia whitepaper for more details. The publish-subscribe system works by having each node keep track of what its neighbors are interested in using a bloom filter. When a node receives a PUBLISH message, it relays that message to the three closest neighbors that are interested in the topic, and this repeats for six hops, so the message will be relayed a maximum of 3^6, or 729, times. This method of publish-subscribe is called Quasar and is based on a paper describing a publish-subscribe system for social networks. Please read the Quasar whitepaper for more details.

Quasar PUBLISH message routing graph with only 4 hops and without errors in propagation
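
To make the relay rule concrete, here is a minimal TypeScript sketch of that logic; the types and names are hypothetical and not taken from the actual kad-quasar implementation.

```typescript
// Sketch of the Quasar-style relay rule described above (hypothetical types;
// not the actual kad-quasar implementation).

interface PublishMessage {
  topic: string;
  contract: unknown;
  ttl: number;                                  // remaining hops, starts at 6
}

interface Neighbor {
  id: string;                                   // node ID (hex)
  isInterestedIn(topic: string): boolean;       // bloom-filter membership check
  send(message: PublishMessage): void;
}

const RELAY_FANOUT = 3;   // relay to the three closest interested neighbors
const MAX_HOPS = 6;       // so at most 3^6 = 729 relays per publication

function relayPublish(message: PublishMessage, neighbors: Neighbor[]): void {
  if (message.ttl <= 0) {
    return; // hop limit reached; stop relaying
  }
  const next: PublishMessage = { ...message, ttl: message.ttl - 1 };
  neighbors
    .filter((n) => n.isInterestedIn(message.topic))
    .slice(0, RELAY_FANOUT)                     // assumes neighbors are XOR-sorted
    .forEach((n) => n.send(next));
}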

Late in 2016, just prior to when I joined, the team had spent a lot of time debugging an issue that was manifesting as a “502 — Bad Gateway”. For more information about this, please read the blog post “How To DDoS Yourself”. They eventually tracked it down to an issue within the Quasar publish-subscribe system: an unintentional, self-imposed DDoS attack caused by a near-endless number of PUBLISH messages being relayed through the network. The mechanism that determined when a message should stop being relayed was based on seconds, and the number was set too high. It was eventually resolved by decreasing the TTL and then moving to a number of hops instead of time. Furthermore, in reaction to this, the Kademlia DHT and Quasar parts of the Bridge were separated into their own services; this is now called Complex and has Landlord and Renter processes with RabbitMQ for messaging. The separation was made so that any DDoS attack, or other issue, on the Kademlia DHT or Quasar would not completely lock up other services.

When I joined, my first project was working on Complex. The goal was to have many nodes share the same public identity using hierarchical deterministic keys, known as SIP32 and named after BIP32 from Bitcoin. In hindsight, I’ve started to see Complex as having added a lot of unnecessary complexity (pun intended) to the project, with too many layers, sockets, queues, logs, proxies, and load balancers, any of which can have a problem and cause a collapse in basic functionality. It made debugging arduous, requiring tenacity to dig through log files for the root cause of issues, selectively eliminating each link in the chain of events as the cause.

The Quasar publish-subscribe system is what Renters use during uploads to advertise the need to store data to the nodes able to store it, a.k.a. Farmers. Farmers subscribe to topics to which Renters send PUBLISH messages containing contracts to store data. A Farmer will then look at the contract and send an OFFER message to the Renter. Once received, the Renter sends a CONSIGNMENT message to the Farmer to get a token to upload the data to that Farmer. All of that has to happen within the timespan of an HTTP request. If any part of that process goes wrong, the request will time out and, after several retries, lead to the error message: “Unable to receive storage offer”.
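
A rough TypeScript sketch of that negotiation from the Renter’s side is below; the transport interface and the timeout budget are hypothetical, but the ordering and the deadline are the point.

```typescript
// Sketch of the upload negotiation described above, from the Renter's side.
// The interface and timeout are hypothetical; the real flow spans the Bridge,
// Complex, and the Farmer, but the ordering and deadline are the same idea.

interface Offer { farmerId: string; contract: unknown; }
interface ConsignmentToken { token: string; farmerId: string; }

interface RenterTransport {
  publish(topic: string, contract: unknown): void;       // PUBLISH via Quasar
  nextOffer(timeoutMs: number): Promise<Offer>;          // wait for an OFFER
  consign(offer: Offer): Promise<ConsignmentToken>;      // CONSIGN for an upload token
}

// Everything here has to complete before the client's HTTP request times out;
// otherwise the Bridge answers with "Unable to receive storage offer".
async function negotiateUpload(
  transport: RenterTransport,
  topic: string,
  contract: unknown,
): Promise<ConsignmentToken> {
  transport.publish(topic, contract);
  const offer = await transport.nextOffer(10_000); // hypothetical 10 s budget
  return transport.consign(offer);
}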

We ran coordinated stress tests on uploads with the Storj community to gather data and identify points of failure. I added logging to track requests through each step, from the Bridge, Landlord, RabbitMQ, and Renter to the Farmer. I initially found issues in the time it took a Farmer to respond with an OFFER after receiving a PUBLISH message; we fixed that by including the contact details in the PUBLISH message to avoid the recursive FIND_NODE query. This improved the reliability considerably. I also noticed that the problem wasn’t a lack of OFFER messages in response to a PUBLISH message. Farmers had space available; it was just that the OFFER messages were arriving 20+ seconds later, which was too late for the HTTP request to respond to the client, leading to a timeout. I then added a lookup to a cache of OFFERs before sending out another PUBLISH message, which improved the reliability a bit more. However, it still wasn’t enough: we were still getting errors with uploads and the “Unable to receive storage offer” error, just a bit less frequently. Further digging was needed to identify the root of the issue; the patches were not enough on their own.
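
Here is a rough sketch of that OFFER cache, with hypothetical names and a made-up freshness window; the idea is simply to reuse a late OFFER from an earlier PUBLISH instead of publishing again.

```typescript
// Sketch of the OFFER-cache optimization mentioned above: before sending a new
// PUBLISH, check whether a usable OFFER for this topic arrived late from an
// earlier publication. Names and shapes are hypothetical.

interface CachedOffer { farmerId: string; contract: unknown; receivedAt: number; }

const OFFER_TTL_MS = 60_000; // hypothetical freshness window

const offerCache = new Map<string, CachedOffer[]>();

// A late OFFER arrives after its original request already timed out: keep it.
function cacheLateOffer(topic: string, offer: CachedOffer): void {
  const offers = offerCache.get(topic) ?? [];
  offers.push(offer);
  offerCache.set(topic, offers);
}

// Reuse a fresh cached OFFER instead of publishing again, if one exists.
function takeCachedOffer(topic: string, now = Date.now()): CachedOffer | undefined {
  const fresh = (offerCache.get(topic) ?? []).filter(
    (o) => now - o.receivedAt < OFFER_TTL_MS,
  );
  offerCache.set(topic, fresh);
  return fresh.shift(); // consume one offer, leaving the rest cached
}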

The Root Issue

I tracked down the issue by measuring the time delta between each step, and the largest delta was between the time a Renter sent a PUBLISH message and the time a reply with an OFFER was received. We had already ruled out the issues between the time a Farmer received a PUBLISH message and when an OFFER was sent, so the last remaining link was the relay of PUBLISH messages between Farmers, that is, the Quasar publish-subscribe system.
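
The approach amounted to recording a timestamp at each step and comparing the deltas; a simplified sketch, with illustrative step names, looks like this:

```typescript
// Sketch of the timing breakdown used to find the bottleneck: record a
// timestamp at each step of a request and report the deltas between steps.
// Step names are illustrative, not the actual log fields.

type Step = 'publishSent' | 'publishReceived' | 'offerSent' | 'offerReceived';

function stepDeltas(timestamps: Record<Step, number>): Record<string, number> {
  const order: Step[] = ['publishSent', 'publishReceived', 'offerSent', 'offerReceived'];
  const deltas: Record<string, number> = {};
  for (let i = 1; i < order.length; i++) {
    deltas[`${order[i - 1]} -> ${order[i]}`] =
      timestamps[order[i]] - timestamps[order[i - 1]];
  }
  return deltas;
}

// In our case the largest delta sat between publishSent and offerReceived,
// pointing at the Quasar relay path between Farmers.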

Quasar publish-subscribe does not work as a secure, reliable, and scalable system for publishing storage contracts in a peer-to-peer network. There are several issues:

  • General latency when relaying messages across the network, even when everything is working correctly.
  • Participants can unintentionally overload their systems, causing bottlenecks that lead to large delays in relaying PUBLISH messages.
  • Participants in the network can intentionally refuse to relay the message so as to give themselves a better chance to store the data by creating a smaller pool of OFFERs.
  • Participants can, without much cost, attack Renters by “surrounding” them and refusing to relay their PUBLISH messages, cutting them off or severely slowing their ability to store data. This is known as an eclipse attack.
  • Farmers that are closer in XOR distance to a Renter have the first opportunity to respond, before they relay the PUBLISH message onward. This can be gamed to turn the structure against itself.
  • Shards with similar hashes tend to land in the same neighborhood of Farmers; this is by design of the Kademlia DHT. However, Farmers can brute-force identities that all sit in the same neighborhood and gain a higher chance of receiving data with a specific range of hashes. They can then upload shards with similar hashes themselves, which can be used to provide false reports, ones that appear true, to a Bridge (see the sketch after this list).
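
To show how cheap that last attack is, here is a TypeScript sketch of brute-forcing an identity into a target neighborhood by XOR distance. In practice a node ID is derived from a key pair, so an attacker generates keys until the derived ID is close enough; the ID length and prefix size here are illustrative, not the protocol’s actual values.

```typescript
// Sketch of the XOR-distance gaming described above: keep generating node IDs
// until one lands "close" to a target shard-hash prefix, putting the node in
// that shard's neighborhood. In the real protocol the ID comes from a key
// pair, so this loop would generate keys instead of raw random bytes.

import { randomBytes } from 'crypto';

// Kademlia-style XOR distance between two equal-length IDs.
function xorDistance(a: Buffer, b: Buffer): Buffer {
  const out = Buffer.alloc(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] ^ b[i];
  }
  return out;
}

// Brute-force an identity whose first `prefixBytes` bytes match the target,
// i.e. an ID that is very close to the target in XOR distance.
function bruteForceNearbyId(target: Buffer, prefixBytes: number): Buffer {
  for (;;) {
    const candidate = randomBytes(target.length);
    const distance = xorDistance(candidate, target);
    if (distance.subarray(0, prefixBytes).every((byte) => byte === 0)) {
      return candidate; // ~65,536 tries on average for a 2-byte prefix
    }
  }
}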

The Solution

For those reasons, I’ve worked with the team and community to begin moving away from using Quasar for publishing contracts to Farmers. We have implemented a fix for these issues, known as SIP6. Bridges, a.k.a. Renters, will contact Farmers directly, with Bridges keeping a pool of Farmer contacts to select from using a variety of metrics. Farmers will contact Bridges directly to let them know that they exist, where they can be found, and whether they have space available. This can be expanded later; for example, to establish a market where participants connect based on agreed storage prices and preferences.
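
A minimal sketch of that contact pool is below; the metrics and data shapes are hypothetical, not SIP6’s exact selection criteria.

```typescript
// Sketch of the SIP6 direct-contact model described above: the Bridge keeps a
// pool of Farmers that have announced themselves and picks candidates by
// simple metrics. The metrics and shapes are hypothetical.

interface FarmerContact {
  nodeId: string;
  address: string;           // where the Farmer said it can be reached
  freeSpaceBytes: number;    // space the Farmer reported as available
  lastSeen: number;          // last announcement time (ms since epoch)
  responseTimeMs: number;    // observed responsiveness
}

class FarmerPool {
  private contacts = new Map<string, FarmerContact>();

  // Farmers contact the Bridge directly to announce themselves.
  announce(contact: FarmerContact): void {
    this.contacts.set(contact.nodeId, contact);
  }

  // Pick Farmers with enough space, preferring recently seen, responsive ones.
  select(shardSizeBytes: number, count: number): FarmerContact[] {
    return [...this.contacts.values()]
      .filter((c) => c.freeSpaceBytes >= shardSizeBytes)
      .sort((a, b) => (b.lastSeen - a.lastSeen) || (a.responseTimeMs - b.responseTimeMs))
      .slice(0, count);
  }
}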

These changes are now included in the latest release of Storjshare. Once there is significant adoption, Bridges can activate SIP6 and start using the new commands to improve the reliability and scalability of uploads; time will tell the effectiveness of the solution. I’m interested to see it in action and to make further improvements.

These changes will also pave the way to reduce unnecessary complexity within the system. Bridges will be able to communicate directly with Farmers using a request-response pattern via JSON-RPC, and Farmers will communicate directly with Bridges using a request-response pattern via a REST API. Not much functionality from the Kademlia DHT will still be used, unless it finds a new purpose. It’s likely that Complex will no longer be necessary, and what remains can be folded back into the Bridge.
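
A sketch of the two directions of communication, with a placeholder endpoint path and JSON-RPC method name rather than the actual SIP6 commands:

```typescript
// Sketch of the two request-response directions described above. The endpoint
// path and the JSON-RPC method name are placeholders, not the real SIP6
// command names.

// Farmer -> Bridge: announce itself and its available space over the REST API.
async function announceToBridge(bridgeUrl: string, nodeId: string, freeSpaceBytes: number) {
  await fetch(`${bridgeUrl}/contacts`, {               // hypothetical endpoint
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ nodeId, freeSpaceBytes }),
  });
}

// Bridge -> Farmer: a direct JSON-RPC request instead of a Quasar PUBLISH.
async function sendContract(farmerAddress: string, contract: unknown) {
  const response = await fetch(farmerAddress, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      jsonrpc: '2.0',
      id: 1,
      method: 'ALLOC',     // placeholder method name
      params: [contract],
    }),
  });
  return response.json();
}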

The peer-to-peer storage network could eventually end up having an unstructured organization similar to this graph:

Proposed network structure drawing with Bridges and Farmers with SIP6


Braydon Fuller

Protocol Engineer at Purse working on Bcoin. Previously at Storj and BitPay. I have worked w/ decentralized systems since 2014, and the web since 1999.