Distributed Storage: Superior to Cloud in several ways

Sculptex
Aug 29, 2019

This is a follow up to one of my previous articles, on Distributed Storage and Computing (link at bottom).

Cloud Storage

Since storage devices such as HDDs are prone to failure, cloud storage providers inevitably have to duplicate any user data stored with them to provide any decent level of reliability (redundancy).

If the duplicate data is held at the same data center (DC), then a network failure affecting that DC can still prevent access to the data. If the data is duplicated across several DCs, there is still the challenge of rerouting traffic to an alternative DC while ensuring that the alternative DC also has the latest version of the data.

Data Access speeds are typically limited to the network speeds between the user and the cloud DC.

Unless the user utilises client-side encryption, the files are typically stored ‘as is’ on the server. So anyone who gains sufficient access privileges on that server can access the files.

Typically, a cloud storage provider offers around 99.9% availability. Here is the current AWS S3 Service Level Agreement.

Monthly Uptime % — Service Credit %
>= 99.9% — nil
>= 99.0% — 10%
>= 95.0% — 25%
< 95.0% — 100%

So even the 99.9% figure should be taken with a pinch of salt. At 99.9% availability, up to about 43 minutes of downtime in a 30-day month earns no credit at all, yet the losses from even that much downtime could be devastating. The 100% service credit only kicks in once availability falls below 95%, that is, after more than a day and a half (36 hours) of downtime in the month!
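To make the arithmetic behind those figures explicit, here is a minimal Python sketch. It assumes a 30-day month and uses the credit tiers from the table above; the function names are hypothetical and are not part of any AWS tooling.

```python
def allowed_downtime_minutes(uptime_percent, days=30):
    """Minutes of downtime in a month of `days` days at the given uptime percentage."""
    minutes_in_month = days * 24 * 60
    return minutes_in_month * (100 - uptime_percent) / 100


def service_credit_percent(uptime_percent):
    """Service credit for a monthly uptime, following the tiers in the table above."""
    if uptime_percent >= 99.9:
        return 0
    if uptime_percent >= 99.0:
        return 10
    if uptime_percent >= 95.0:
        return 25
    return 100


for uptime in (99.9, 99.0, 95.0):
    minutes = allowed_downtime_minutes(uptime)
    print(f"{uptime}% uptime: about {minutes:.1f} minutes "
          f"({minutes / 60:.1f} hours) of downtime per 30-day month")
```

The 95% row works out to 2,160 minutes, which is the 36 hours mentioned above.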

Distributed alternatives

By utilizing Erasure-Coding (EC) techniques, users can simultaneously access multiple storage nodes and construct/reconstruct the data from them.

In a common EC 10/16 configuration, the data is spread across 16 nodes but only 10 of them need to be accessed to reconstruct it. So the storage overhead is 60%, which compares favourably with the 200% overhead of typical duplication (two extra full copies).
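The "any 10 of 16" property can be illustrated with a toy code. The sketch below is a Reed-Solomon style construction over the prime field of size 257, chosen only because it fits in a few lines; it is an assumption for illustration and not the scheme that Sia, 0Chain or any other platform actually uses. It needs Python 3.8+ for the modular inverse via `pow`.

```python
P = 257  # smallest prime above 255, so every byte value is a field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through `points` (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, -1, P)) % P
    return total


def encode(data, n):
    """Treat the k data bytes as polynomial values at x = 0..k-1 and emit n shares (x, y)."""
    k = len(data)
    points = list(enumerate(data))
    return [(x, data[x] if x < k else _interpolate(points, x)) for x in range(n)]


def decode(shares, k):
    """Rebuild the k original bytes from any k distinct shares."""
    points = shares[:k]
    return bytes(_interpolate(points, x) for x in range(k))


stripe = b"0123456789"            # k = 10 data bytes
shares = encode(stripe, 16)       # one share per node, n = 16
survivors = shares[6:]            # pretend the first 6 nodes are offline
print(decode(survivors, len(stripe)) == stripe)
```

A real system would encode large blocks rather than single bytes and would use a field in which every share fits in a byte, but the recovery principle is the same.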

Access speeds are potentially much faster than cloud because multiple nodes are accessed in parallel, although for users with slower connections (home, wifi, mobile) the bottleneck is likely to be on the user side.

With EC, only a partial, encoded segment of a file is present on any one server node at a time, so a compromised server exposes only that segment rather than the whole file.

Although distributed storage providers may not provide SLAs in the same format as Amazon S3 above, calculations can be made to determine the probability that enough nodes are available to reconstruct the user data.
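One way such a calculation can be made is with a simple binomial model. The sketch below assumes that every node is up independently with the same probability, which real networks will not satisfy exactly, and the 95% per-node figure is a made-up input for illustration rather than a measured value for any platform.

```python
from math import comb


def unavailability(k, n, node_availability):
    """Probability that fewer than k of n nodes are up, assuming independent nodes."""
    p = node_availability
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))


# Hypothetical per-node availability of 95% for each EC ratio mentioned in this article.
for k, n in ((10, 15), (10, 16), (10, 30)):
    print(f"EC {k}/{n}: P(data unavailable) = {unavailability(k, n, 0.95):.3e}")
```

Summing the failure cases directly, rather than computing one minus the success probability, keeps precision when the result is very close to zero.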

Comparing Distributed Solutions

Sia

Sia storage defaults to an EC 10/30 ratio, which only requires 10 out of the 30 nodes to be online/available. The reason this ratio is so large is that many of the nodes are either:
a) just regular computer users sharing spare space on their hard disk or
b) run by computer personnel utilizing (temporarily) spare resources available to them.

Unfortunately, since the reward for being a Sia miner is currently so low*, there is little reason for loyalty when a more profitable opportunity comes along. Using the default 10/30 gives a 200% overhead, so this is no more efficient than the typical duplication used by cloud providers.

* On 16th August, Sia introduced an “Estimation Algorithm Change”. Nothing changed except the way average miner rewards were calculated. The figure now represents “only hosts proven to be reliable”, and there was an instant threefold increase in price (which kinda proves my point above about why so many nodes are required to provide redundancy).

Initial write speeds appear good, but the process of creating redundancy (encoding across 30 nodes) seems to take significantly longer.

Read access speeds are generally very good, providing sufficient nodes are available from which to reconstruct the data.

I cannot find published figures for reliability and availability for Sia storage, but with the default EC 10/30 ratio it should be well above the 99.9% Amazon AWS S3 figure above, provided enough hosts stay online.

NOTE: There is a significant barrier to getting started with Sia, which is that you need to fully sync the entire blockchain before you can use it. Additionally, there appears to be no free allowance, so you also need to obtain tokens to test its suitability for your purpose. Several companies offer storage services built on the Sia platform which, for example, remove the need to sync the entire blockchain. They typically charge several times the going rate for Sia storage and may even introduce a level of centralization, which is what users are trying to avoid by choosing a distributed offering in the first place.

0Chain dStorage

0Chain specifies enterprise-grade server hardware for its nodes, including the special type of miner (named ‘blobbers’) that provides the dStorage. The entire reward structure is geared towards maximum availability and the fastest response times, since these earn the highest rewards. The blobbers stake ZCN tokens to show their commitment to providing the service, and a portion of these tokens can be forfeited as a penalty.

The miners on the network periodically (and randomly) challenge the blobbers to ensure both accuracy and availability of data held. Any blobbers that fail these challenges are penalized. By maximising rewards to good performers and penalizing poor performers, a high quality network is always ensured.

Access speeds are excellent: the user's device is basically reconstructing the data from the 10 fastest responders (all of which are using enterprise quality hardware). Writes are committed and witnessed on the blockchain in seconds, so there is no lag waiting for full redundancy to be established.
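The "fastest responders" idea can be sketched as follows: ask every node at once and stop as soon as enough answers have arrived. This is only an illustration using Python's asyncio with simulated delays; `fetch_share` is a hypothetical stand-in for a network request and nothing here reflects the actual 0Chain SDK.

```python
import asyncio


async def fetch_share(node_id, delay):
    """Hypothetical stand-in for requesting one encoded share from a storage node."""
    await asyncio.sleep(delay)  # simulated network latency
    return node_id


async def first_k(delays, k):
    """Query every node in parallel and return the ids of the k fastest responders."""
    tasks = [asyncio.create_task(fetch_share(i, d)) for i, d in enumerate(delays)]
    fastest = []
    for finished in asyncio.as_completed(tasks):
        fastest.append(await finished)
        if len(fastest) == k:
            break
    # The slower nodes are no longer needed, so cancel their requests.
    for task in tasks:
        task.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return fastest


# Four simulated nodes with different latencies (seconds); only two shares are needed.
print(asyncio.run(first_k([0.2, 0.01, 0.1, 0.02], 2)))
```

With this approach the download time is set by the k-th fastest node rather than by the slowest one.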

For a dStorage setup with an EC 10/15 ratio, the boffins at 0Chain have calculated 14 nines of availability (99.999999999999%), which is many orders of magnitude better than any cloud provider! The overhead to achieve this is only 50%, so it is much more efficient than duplication.
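The overhead figures quoted in this article for the different EC ratios all follow from one formula, (n - k) / k. The short sketch below checks them, and treats duplication as keeping two extra full copies (three in total), which is my assumption about what the 200% figure refers to.

```python
def storage_overhead_percent(k, n):
    """Extra storage, as a percentage of the original data, when k pieces are expanded to n."""
    return 100 * (n - k) / k


configs = (
    ("EC 10/15", 10, 15),
    ("EC 10/16", 10, 16),
    ("EC 10/30", 10, 30),
    ("two extra full copies", 1, 3),  # replication viewed as a 1-of-3 scheme
)
for label, k, n in configs:
    print(f"{label}: {storage_overhead_percent(k, n):.0f}% overhead")
```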

dStorage prices can fluctuate, but are recommended to be set to 50% of AWS prices. By comparison with other distributed storage offerings such as Sia, 0Chain dStorage may seem a little more expensive but now you can see why; it achieves the optimum balance of a high performance network while ensuring high availability and maximum efficiency.

Other distributed Offerings

There are many other distributed offerings in progress, such as Storj, which also provide the key distributed features described above. None of the platforms I looked at currently seems to be in a state where it can be tested in the wild and offers significant improvements over the Sia platform.

IPFS and Torrent based distributed solutions have other drawbacks, including no guarantee of availability, limited privacy and difficulty in ensuring quality storage nodes (as with Sia above). Although incentive layers such as Filecoin are aiming to address these issues, I have not seen anything that comes close to being regarded as Enterprise Grade.

Conclusion

Both Sia and 0Chain dStorage (plus potentially several of the other emerging distributed platforms not detailed that use EC techniques) can claim superiority to typical Cloud storage in several aspects:

  • Speed (by parallel downloading)
  • Accessibility (redundancy)
  • Cost (typically <= 50% of AWS S3)
  • Security (data is encoded)

For reasons outlined above, I think 0Chain dStorage is the only solution that currently demonstrates a Distributed Storage Platform worthy of being called ‘Enterprise Grade’. (Storj Tardigrade is supposedly targeted at Enterprise; it will be interesting to see how it compares to 0Chain dStorage when it becomes accessible * see comments).

Here is a link to my previous article which includes some additional analysis on Cloud Storage https://link.medium.com/ab48LQ1IpZ. Since that article, the 0Chain network has progressed into Alphanet and developers have had SDK access to test and develop on the platform. The distributed computing element referenced in that article that was originally planned for 0Chain has since been dropped from the roadmap and the team have instead concentrated on the dStorage element along with several other innovative protocols.

About The Author

(I am a blockchain enthusiast, fairly active in the 0Chain telegram with ambassador status. My comments and views are not necessarily those of the 0Chain team)

References:

https://aws.amazon.com/s3/sla/

https://sia.tech

https://siastats.info

https://0chain.net

https://t.me/Ochain
