Decentralized Cloud Storage: Fact vs Fiction
I’m a huge fan of decentralized cloud storage. The concept is simply beautiful. And, in fact, we built our product with the understanding that decentralized cloud storage will probably be the future.
However, I’ve heard a lot of BAD arguments in favor of decentralized cloud storage. And I thought it might be worthwhile to dispel some of these so the community can grow.
Bad Argument #1
“Decentralized is better because it’s encrypted”
The argument generally takes the following form: Decentralized cloud storage is secure because the client software encrypts the data BEFORE uploading it to the “cloud”. Therefore the person/people storing your data cannot read it. And, it’s somehow impossible for non-decentralized solutions to do the exact same thing because… reasons.
It’s true that Dropbox can read your data. As can Google Drive, and many more. However, there are quite a few cloud storage providers that already do client-side encryption, and thus cannot read your data. Storm4 does this, and so does SpiderOak, to name just two.
Now, one can still argue that decentralized storage providers split your files into many different pieces, so any particular host only sees a portion of your (encrypted) file. But notice how VERY different this argument is from the original. You’re no longer arguing secure vs non-secure. Now you’re arguing secure vs secure, and the argument essentially distills into a claim that non-decentralized encryption carries a non-negligible risk that one’s data could be decrypted. In other words, a claim that the crypto can be broken. And it doesn’t take much to counter such an argument: decentralized storage relies on the very same kind of client-side encryption, so if the crypto can be broken, it’s in trouble too.
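To make the “split into pieces” idea concrete, here is a minimal sketch, assuming the data has already been encrypted client-side (random bytes stand in for the ciphertext, since the encryption itself isn’t the point here). It splits the ciphertext into `k` shards plus one XOR parity shard, so each host holds only a fragment and any single lost shard can be rebuilt. The function names are hypothetical, and XOR parity is a deliberately simplified stand-in for the Reed-Solomon-style erasure codes that production decentralized systems such as Sia and Storj use.

```python
from __future__ import annotations

import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_into_shards(ciphertext: bytes, k: int) -> list[bytes]:
    """Split ciphertext into k equal-size data shards (zero-padding the tail)
    and append one XOR parity shard, giving k + 1 shards in total."""
    shard_len = -(-len(ciphertext) // k)  # ceiling division
    padded = ciphertext.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for shard in shards[1:]:
        parity = xor_bytes(parity, shard)
    return shards + [parity]


def recover_missing(shards: list[bytes | None]) -> list[bytes]:
    """Rebuild at most one missing shard (marked None) by XOR-ing the rest,
    which works because all k + 1 shards XOR together to zero."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("XOR parity can only recover a single lost shard")
    present = [s for s in shards if s is not None]
    result = list(shards)
    if missing:
        rebuilt = present[0]
        for shard in present[1:]:
            rebuilt = xor_bytes(rebuilt, shard)
        result[missing[0]] = rebuilt
    return result


# Stand-in for data that was already encrypted on the client.
ciphertext = secrets.token_bytes(1000)
stored: list[bytes | None] = list(split_into_shards(ciphertext, k=4))  # 5 hosts, 250 bytes each
stored[2] = None                                                       # one host goes offline
restored = recover_missing(stored)
original = b"".join(restored[:4])[: len(ciphertext)]  # drop any padding
```

Note that sharding changes how much each host sees and how the data survives host failures; it does not change the strength of the encryption, which is exactly why the debate becomes secure vs secure.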
If you happen to be a fanboy of decentralized everything (like me), you’re probably thinking of a dozen other arguments in your head right now. And screaming them at me. So it’s important to remind you of the purpose of this article: to dispel the BAD arguments (so the community stops parroting them), and get us to the GOOD arguments (so the community can start using them instead).
Bad Argument #2
“It’s way cheaper”
It’s TRUE (at least for Sia). Sia is cheaper. And, of course, it’s not a bad argument to say something is cheaper. But unfortunately most people back this up by comparing Sia to AWS S3 Standard storage. I keep hearing people say, “It costs over $20 per terabyte/month in AWS S3, but only $2 per terabyte/month in Sia.” Yeah, sure, if you want to compare apples to oranges.
But a fairer comparison is probably AWS S3 Glacier storage vs Sia. Why? Because Sia is being used almost exclusively for backup purposes right now. And this is because, among other reasons, the minimum file size for Sia is 40 MB. That’s not a problem if you’re storing zipped directory backups. And thus if we want to compare prices to AWS, it makes more sense to compare against their backup/archival storage tier.
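To see why a 40 MB minimum pushes Sia toward backups, here is a rough cost sketch. It assumes, as a simplification, that any file smaller than 40 MB is billed as a full 40 MB, that 1 GB equals 1000 MB, and it uses the ~$2 per 1000 GB/month Sia figure from the price list below; `billed_mb` and `monthly_cost` are hypothetical helpers, not part of any Sia tool.

```python
MIN_FILE_MB = 40              # Sia's minimum file size, per the article
SIA_PRICE_PER_1000_GB = 2.00  # ~$2 per 1000 GB / month, per the article


def billed_mb(file_size_mb: float) -> float:
    """Simplifying assumption: files below the minimum are billed at the minimum."""
    return max(file_size_mb, MIN_FILE_MB)


def monthly_cost(file_sizes_mb: list[float]) -> float:
    """Monthly cost in dollars, treating 1 GB as 1000 MB."""
    total_gb = sum(billed_mb(size) for size in file_sizes_mb) / 1000
    return total_gb * SIA_PRICE_PER_1000_GB / 1000


many_small_files = monthly_cost([1.0] * 1000)  # 1 GB stored as 1,000 files of 1 MB
one_zip_archive = monthly_cost([1000.0])       # the same 1 GB zipped into one archive
print(f"${many_small_files:.3f} vs ${one_zip_archive:.3f} per month")
```

Under these assumptions, a gigabyte stored as a thousand 1 MB files costs 40 times as much as the same gigabyte zipped into a single archive.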
- AWS S3 Glacier Storage: ~$3.75 for 1000 GB / month
- Sia: ~$2 for 1000 GB / month
Sia is still cheaper. But it’s no longer 10 times cheaper. When compared like this, it looks more like one pays a premium to use the mature, battle-tested S3 system.
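The arithmetic behind “no longer 10 times cheaper” is simple division; this sketch just divides each AWS price by Sia’s, using the figures quoted above. It treats “over $20” as exactly $20, so the first ratio is a lower bound; `times_cheaper` is a hypothetical helper.

```python
# Prices in dollars per 1000 GB per month, as quoted in the article.
PRICES = {
    "AWS S3 standard": 20.00,  # "over $20", so this ratio is a lower bound
    "AWS S3 Glacier": 3.75,
}
SIA = 2.00


def times_cheaper(other_price: float, sia_price: float = SIA) -> float:
    """How many times cheaper Sia is than a competitor charging `other_price`."""
    return other_price / sia_price


for name, price in PRICES.items():
    print(f"Sia vs {name}: {times_cheaper(price):.1f}x cheaper")
# Sia vs AWS S3 standard: 10.0x cheaper
# Sia vs AWS S3 Glacier: 1.9x cheaper
```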
Bad Argument #3
“It’s a Dropbox killer”
In the early days, I would hear the community use terms like “Dropbox killer”. But as the decentralized providers matured, they shifted their marketing pitch toward competing with the likes of AWS S3, Azure, etc. And the community has mostly followed suit, but I think the old ethos persists. In other words, I still hear people say they’re waiting for Sia to create a Dropbox-like experience. And herein lies the problem.
AWS S3 is not Dropbox. And Amazon does not have a Dropbox competitor. Why is that?
Well, first of all, Dropbox ran atop AWS S3 until fairly recently. And many other cloud storage apps still run atop S3. So in one sense this is simply Amazon sticking to its strengths: providing the backend services while others write the frontend apps. But what I’m hinting at is deeper than this:
Storage does NOT equal Sync
AWS S3 is a storage system. And Sia is a storage system. And a storage system is NOT a sync system.
Consider, for a moment, the simplest sync system. You have 2 desktop computers, and there’s a folder that’s supposed to be synced between them, with a sync app running on each machine to do the work. So on computer A you drop a file into the folder, and you expect it to appear on computer B relatively quickly. Here’s what is generally required to make this happen:
- computer A needs to upload that file to the cloud
- then computer B needs to be notified that something changed
- but who notifies computer B?
- well, a sync server has to notify computer B
- which means there needs to be a server somewhere that: knows the file was changed, knows which users are interested in the file, knows which devices the user(s) are logged into & knows how to send some type of push notification to those device(s)
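The server-side bookkeeping in that last bullet can be sketched as a pair of lookup tables plus a push step. This is a minimal in-memory sketch, not any real product’s design: the `SyncServer` class, its method names, and the `outbox` list (standing in for a real push channel) are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class SyncServer:
    """Hypothetical in-memory sync server: it knows which users care about
    each file and which devices each user is logged into."""

    def __init__(self) -> None:
        self.watchers: defaultdict[str, set[str]] = defaultdict(set)  # path -> user ids
        self.devices: defaultdict[str, set[str]] = defaultdict(set)   # user id -> device ids
        self.outbox: list[tuple[str, str]] = []  # (device id, path) notifications sent

    def share(self, path: str, user: str) -> None:
        """Record that `user` is interested in `path`."""
        self.watchers[path].add(user)

    def login(self, user: str, device: str) -> None:
        """Record that `user` is logged in on `device`."""
        self.devices[user].add(device)

    def file_changed(self, path: str, source_device: str) -> None:
        """Called after `source_device` finishes uploading `path` to storage."""
        for user in self.watchers[path]:
            for device in self.devices[user]:
                if device != source_device:  # the uploader already has the file
                    self.push(device, path)

    def push(self, device: str, path: str) -> None:
        # Stand-in for a real push channel (WebSocket, mobile push service, ...).
        self.outbox.append((device, path))


server = SyncServer()
server.share("/docs/report.txt", "alice")
server.login("alice", "computer-A")
server.login("alice", "computer-B")
server.file_changed("/docs/report.txt", source_device="computer-A")
```

After the upload from computer A, the only notification queued is for computer B. None of this state lives in the storage system; it’s all extra backend work.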
But wait, we’re not done yet! Because computer B may have been offline. So if computer B comes online at a later date:
- computer B needs to quickly discover what’s changed
- which means there needs to be a log regarding these changes
- which means there needs to be a sync server storing these logs
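A change log like this is often modeled as an append-only sequence with a cursor: each device remembers the last sequence number it saw and asks for everything newer when it reconnects. Here is a minimal in-memory sketch of that approach; the `ChangeLog` and `Change` names and the action strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    seq: int     # sequence number assigned by the server, starting at 1
    path: str
    action: str  # e.g. "modified" or "deleted"


@dataclass
class ChangeLog:
    """Hypothetical append-only change log kept by the sync server."""

    entries: list[Change] = field(default_factory=list)

    def append(self, path: str, action: str) -> int:
        seq = len(self.entries) + 1
        self.entries.append(Change(seq, path, action))
        return seq

    def changes_since(self, cursor: int) -> tuple[list[Change], int]:
        """Return every change after `cursor` plus the new cursor to remember.
        Sequence number n is stored at index n - 1, so this is a simple slice."""
        new = self.entries[cursor:]
        return new, (new[-1].seq if new else cursor)


log = ChangeLog()
log.append("/a.txt", "modified")
log.append("/b.txt", "modified")
_, cursor_b = log.changes_since(0)              # computer B syncs, then goes offline
log.append("/a.txt", "deleted")                 # a change happens while B is offline
missed, cursor_b = log.changes_since(cursor_b)  # B comes back online
```

When computer B reconnects, it receives only the one change it missed (the deletion) instead of rescanning the whole folder.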
But wait, we’re still not done! Because what happens if computer A & computer B both attempt to make different changes to the same file at the same time?
- there needs to be a “truth system”
- for example, computer A uploads its changes first, and the change is accepted
- computer B uploads its changes afterwards, and the changes are rejected
- and thus computer B needs to figure out how to merge the changes, or otherwise resolve the conflict(s)
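One common way to build such a truth system is optimistic concurrency: each upload states which version it was based on, and the server accepts it only if that is still the latest version. This is a minimal sketch under that assumption; `TruthServer`, `Accepted`, and `Rejected` are hypothetical names, and merging is left to the client as described above.

```python
from __future__ import annotations

import threading
from dataclasses import dataclass


@dataclass
class Accepted:
    new_version: int


@dataclass
class Rejected:
    current_version: int  # the client should fetch this version, merge, and retry


class TruthServer:
    """Hypothetical authority that accepts an upload only if it was based on
    the latest version of the file (optimistic concurrency control)."""

    def __init__(self) -> None:
        self._versions: dict[str, int] = {}
        self._lock = threading.Lock()

    def commit(self, path: str, base_version: int) -> Accepted | Rejected:
        with self._lock:  # check and bump atomically, even with concurrent uploads
            current = self._versions.get(path, 0)
            if base_version != current:
                return Rejected(current)
            self._versions[path] = current + 1
            return Accepted(current + 1)


server = TruthServer()
server.commit("/notes.txt", base_version=0)             # file created -> version 1
a_result = server.commit("/notes.txt", base_version=1)  # computer A uploads first
b_result = server.commit("/notes.txt", base_version=1)  # computer B, same base
```

Computer B’s rejection carries the current version number, which tells it exactly what to download and merge before retrying.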
And it’s worth pointing out that AWS S3 has no built-in support for such a truth system (at least as of this writing).
Now, obviously, there are multiple ways to achieve these engineering tasks. But the point remains: storage does not equal sync. Getting to sync requires solving a LOT of other engineering requirements on the backend. In other words:
It’s easy to layer single-computer backup on top of a cloud storage system. But a cloud storage system is just one component (of many) required to implement sync.
What I’m trying to say is that the community probably shouldn’t be expecting Sia or Storj to implement a Dropbox-like app. It’s far more likely that other development teams will fill that role, and will support one or more decentralized cloud storage backends. In fact, that’s what we hope to do with Storm4.
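Here is a sketch of what “supporting one or more backends” might look like from the sync app’s side: the app depends only on a tiny storage interface, so a centralized or decentralized backend (or several mirrored together) can be plugged in. Every class here is hypothetical; `InMemoryBackend` stands in for a real S3 or Sia client, whose actual APIs are not shown.

```python
from __future__ import annotations

from typing import Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface the sync app needs from any storage system."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Stand-in for a real backend client (an S3 bucket, a Sia node, ...)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]  # raises KeyError if missing


class MirroredBackend:
    """Writes every blob to several backends; reads from the first that has it."""

    def __init__(self, *backends: StorageBackend) -> None:
        self._backends = backends

    def put(self, key: str, data: bytes) -> None:
        for backend in self._backends:
            backend.put(key, data)

    def get(self, key: str) -> bytes:
        for backend in self._backends:
            try:
                return backend.get(key)
            except KeyError:
                continue
        raise KeyError(key)


class SyncApp:
    """Sync logic (versions here; change logs, notifications and conflict
    handling in a real app) sits on top of a pluggable storage backend."""

    def __init__(self, backend: StorageBackend) -> None:
        self.backend = backend
        self.latest: dict[str, int] = {}

    def upload(self, path: str, data: bytes) -> int:
        version = self.latest.get(path, 0) + 1
        self.backend.put(f"{path}@v{version}", data)  # immutable, versioned blob
        self.latest[path] = version
        return version

    def download(self, path: str) -> bytes:
        return self.backend.get(f"{path}@v{self.latest[path]}")


primary, secondary = InMemoryBackend(), InMemoryBackend()
app = SyncApp(MirroredBackend(primary, secondary))
app.upload("/notes.txt", b"first draft")
app.upload("/notes.txt", b"second draft")
```

The design mirrors the argument above: the storage backend only needs `put` and `get`, while everything that makes sync hard (versions, change logs, notifications, conflicts) lives in the app layer.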