How the Digital World of Data Storage is Changing Fast - The Biggest Decentralized Disruptors of Today

How we store, access, and utilize data rules much of our lives today. Many of us spend much of our day digitally accessing data. The data we access is mostly stored on a central server. This centralized business model rules the world of data storage today. With the spread of decentralized innovation, some interesting alternatives have arisen.

Blockchain technology has revolutionized the idea of what money is and how we use it. For data storage, however, writing directly to a blockchain has drawbacks: transactions can be slow to confirm, and immutability is not always a benefit. It can also be costly, with prices reaching several dollars to store a kilobyte (KB) of data in an Ethereum smart contract.

However, other decentralized and distributed technologies are being built to disrupt the model of how we store and access data. Some of the most exciting alternatives include peer-to-peer file systems, decentralized cloud storage, and distributed database technology.

Peer-to-Peer File Systems:

Created by Protocol Labs, the InterPlanetary File System (IPFS) aims to address some of the drawbacks of the Hypertext Transfer Protocol (HTTP), which rules the web today. It proposes a different model based on a peer-to-peer network.

Protocol Labs is an R&D company for network protocols and a former Y Combinator startup. All of Protocol Labs' work is open source.

The centralized model HTTP is based on is not very efficient, as it relies on a single server for the download of a file. IPFS is based on a different model that aims to be faster and more efficient. It is a peer-to-peer hypermedia protocol.

There are two key properties of note in IPFS. Firstly, it is a distributed file system. Secondly, it is a versioned file system that not only stores files, but also tracks their versions over time. The combination of these two properties is the new model which IPFS proposes and has a lot of benefits when compared to HTTP.

HTTP is a request-response protocol. Every computer sends its own request, connects to the server, and retrieves the data. The physical proximity of computers accessing the same content is irrelevant in HTTP.

How does IPFS work? When files are added to IPFS, they are given a cryptographic hash. This means that when you look up a file, you are asking the network to find nodes storing the content behind the unique hash. The unique hash acts as an address, similar to how you would type in a web address in HTTP. A decentralized naming system called IPNS can also be applied so that a single, stable name keeps pointing to the latest version of a file even as its hash changes.
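The idea of using a hash as an address can be sketched in a few lines. This is a minimal illustration, not the IPFS implementation: the `ContentStore` class is hypothetical, and plain SHA-256 hex digests stand in for the identifiers IPFS actually produces.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is the SHA-256 hash of the data."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # The address is derived from the content itself, not from a location.
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blocks[address]
        # Re-hash on retrieval so corrupted or tampered content is detected.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentStore()
addr = store.add(b"hello ipfs")
same_addr = store.add(b"hello ipfs")    # identical content, identical address
other_addr = store.add(b"hello ipfs!")  # any change yields a different address
```

Because the address is computed from the bytes, two people adding the same file arrive at the same address, and anyone receiving the file can check that it is what they asked for.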

So instead of using a location address as in HTTP, IPFS uses a representation of the content (the cryptographic hash) to access the actual content. By using this system, IPFS leverages physical proximity. If 100 students in a biomed class are all told to access the same webpage, via HTTP they all need to individually connect to the central server. This can cause several issues, the main one being that you are completely dependent on this one server for your ability to access the content.

With IPFS, peers can pull data from one another. All that is required is that you access the starting point of the data, which is the cryptographic hash. This is how IPFS leverages the physical proximity and enables a more efficient system than HTTP.
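A rough sketch of the classroom scenario, under stated assumptions: the `Peer` class, the `distance_ms` field, and the `fetch` function are all hypothetical, and real IPFS peer discovery is far more involved. The point is only that any peer holding the content can serve it, and that the hash lets the requester verify what it receives.

```python
import hashlib


def address_of(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


class Peer:
    def __init__(self, name: str, distance_ms: int):
        self.name = name
        self.distance_ms = distance_ms  # assumed measure of network proximity
        self.blocks = {}                # address -> content held by this peer


def fetch(address: str, peers):
    """Ask the closest peers first and accept only content matching the hash."""
    for peer in sorted(peers, key=lambda p: p.distance_ms):
        data = peer.blocks.get(address)
        if data is not None and address_of(data) == address:
            return peer.name, data
    raise LookupError(f"no peer holds valid content for {address}")


classmate = Peer("classmate", distance_ms=2)
origin = Peer("origin-server", distance_ms=120)
page = b"<h1>Biomed lecture notes</h1>"
addr = address_of(page)
origin.blocks[addr] = page
classmate.blocks[addr] = page  # a nearby student already has the page

source, data = fetch(addr, [origin, classmate])
```

Here the request is served by the nearby classmate rather than the distant server, and a peer returning altered content would simply be skipped.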

Because data is addressed by its cryptographic hash, websites can be hosted on the client side, even from within browsers. Client-side refers to the operations that are performed by the client in a client-server relationship. In HTTP, the client sends a request to the central server for content. In IPFS, the client looks up the cryptographic hash, which leads to the actual content hosted by another client in the network.

Blockchains also use cryptographic hashes. In IPFS, the cryptographic hashes are used so that when you ask the network for a file you can find which node is storing it. In a blockchain, the cryptographic hash is used to link each new block to the previous one, so that if a node were to alter the ledger, every block after the point of alteration would also have to be changed. This is one of the properties that makes blockchain technology secure.
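The linking of blocks by hash can be illustrated with a toy chain. This is a sketch only: the block layout and the function names are assumptions made for the example, and real blockchains add consensus, signatures, and much more.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64


def block_hash(index, data, prev_hash):
    payload = json.dumps({"index": index, "data": data, "prev": prev_hash},
                         sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def build_chain(records):
    chain, prev = [], GENESIS_PREV
    for i, data in enumerate(records):
        h = block_hash(i, data, prev)
        chain.append({"index": i, "data": data, "prev": prev, "hash": h})
        prev = h
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev = GENESIS_PREV
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev"])
        if block["prev"] != prev or block["hash"] != expected:
            return block["index"]
        prev = block["hash"]
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dan 1"])
chain[1]["data"] = "bob pays carol 200"  # tamper with the middle block
tampered_at = first_invalid_block(chain)
```

Even if the attacker also recomputes the hash of the altered block, the next block still points at the old hash, so the break simply moves one block further along.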

With similarities to blockchain technology, IPFS also brings on some of the benefits of decentralization. One of the main benefits of decentralization is that there is no single point of failure. In HTTP, if the server goes down, the content cannot be accessed. In IPFS, the nodes are decentralized and the failure of one does not represent the failure of the whole network.

Another benefit of decentralization is censorship resistance. In a server-client model, the central server can easily censor or refuse to hold your content. In IPFS, content is censorship resistant as long as at least one node in the network is willing to host it.

One of the drawbacks of IPFS is that it can serve only static files. This limits the applications of IPFS-hosted content. For example, the hosted content will be unable to take requests from a user and generate a response based on the data in the server. Imagine a user looking to see which seats are still available for the latest ballet. This kind of activity is not possible in IPFS.

[Figure: Example of dynamic data]

Another drawback of IPFS is that you need to stay online to share your file. Because the cryptographic hash only points to where the data can be found, you must stay online to ensure that your data remains accessible, unless other nodes also hold a copy.

On a final note, IPNS names are not very user friendly, although there is work being done on this. They can be made more user friendly using the Domain Name System (DNS), but this introduces an external point of failure. An IPNS name will look something like the following.

[Figure: IPNS example]

For readers who would like to delve further into the technological components that make up IPFS, the table below lists the primary components along with their corresponding functions.

IPFS is putting forward an interesting alternative with some advantages over the request-response model of HTTP. The peer-to-peer protocol brings on some key benefits associated with decentralization such as no single point of failure, and censorship resistance. The leveraging of physical proximity is another aspect which the HTTP model fails to utilize.

Decentralized Cloud File Storage:

For those of you who have used Dropbox, decentralized cloud storage can be thought of as the same, except that the content is stored on users' hard drives instead of on a central server. Users essentially rent out their hard drive space, and many cryptocurrencies have been developed to facilitate this business model.

Decentralized cloud storage aims to bring the benefits of decentralization such as no single point of failure and censorship resistance. In the typical centralized model such as services run by Google, Amazon, and Dropbox, access to your data can be restricted if there is a problem with the server. The privacy of your data is also questionable in these models. A decentralized model has no single point of failure providing you with unrestricted access to your data and uses encryption to enhance privacy.

However, it can also bring on the drawbacks of decentralization, such as scalability issues. In terms of cost, simply storing on the blockchain will be expensive, but many of these decentralized cloud storage projects have developed a free market model which makes storage more affordable. You may say to yourself that you can use Google Drive for free, but you need to consider what kind of privacy tradeoffs you are making. In the centralized model, the companies have control of your files along with the ability to access them.

The vision of Ethereum as a world computer requires a storage layer. While smart contracts deal with decentralized logic and Whisper deals with decentralized messaging, Swarm is the vision for decentralized storage.

[Figure: Graph from Ethereum Blog]

Similar to IPFS, Swarm addresses files by the hash of their content. Where it differs from IPFS is that it is actually immutable due to being built on the Ethereum blockchain. IPFS is editable, as it uses a distributed hash table and a version control system.

Furthermore, in IPFS, the hash points to where you can locate the data whereas in Swarm, the hash points to the actual data itself.

How does Swarm actually work? It has built-in incentives, which can be split into two main categories: bandwidth incentives and storage incentives.

Bandwidth incentives are designed through the Swarm Accounting Protocol (SWAP), where each node in the network sets a highest price per chunk of data along with an offered chunk price. The highest price per chunk is how much you are prepared to pay to receive a chunk of data, and the offered chunk price is the price at which you will deliver a chunk of data. Each node keeps a balance of payments with its peers, and nodes that go too far into debt will suffer a bad reputation, which provides the incentive for the model to work.
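The balance-of-payments idea can be sketched as a small ledger kept by each node. Everything here is hypothetical: the `SwapLedger` class, the `debt_limit` parameter, and the reading of the offered price as what a node charges per delivered chunk are assumptions made for illustration, not Swarm's actual protocol.

```python
class SwapLedger:
    """Per-peer balance sheet kept by a single node (illustrative only)."""

    def __init__(self, max_price: int, offered_price: int, debt_limit: int):
        self.max_price = max_price          # most we will pay per chunk received
        self.offered_price = offered_price  # what we charge per chunk delivered
        self.debt_limit = debt_limit        # assumed threshold for bad standing
        self.balances = {}                  # peer -> amount the peer owes us

    def delivered_chunk_to(self, peer: str) -> None:
        # We served a chunk, so the peer's debt to us grows.
        self.balances[peer] = self.balances.get(peer, 0) + self.offered_price

    def received_chunk_from(self, peer: str, price: int) -> bool:
        if price > self.max_price:
            return False  # too expensive, so we decline the chunk
        # We consumed a chunk, so we now owe the peer (negative balance).
        self.balances[peer] = self.balances.get(peer, 0) - price
        return True

    def peers_in_bad_standing(self):
        return sorted(p for p, owed in self.balances.items()
                      if owed > self.debt_limit)


ledger = SwapLedger(max_price=5, offered_price=3, debt_limit=6)
for _ in range(3):
    ledger.delivered_chunk_to("bob")       # bob takes chunks without paying
accepted = ledger.received_chunk_from("carol", price=4)
rejected = ledger.received_chunk_from("carol", price=10)
```

In this sketch a peer that keeps consuming chunks without settling eventually crosses the debt limit and shows up in `peers_in_bad_standing()`.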

Storage incentives are based on the idea that nodes willing to provide long-term storage put up a security deposit. If they are challenged and it is proven that they do not hold the data, they lose the security deposit.
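A simplified sketch of such a challenge follows. It assumes the challenger still has the original data and uses a random nonce so that the node cannot answer from a cached hash; the names `StorageNode`, `prove`, and `challenge` are hypothetical, and Swarm's real scheme is not described in enough detail here to reproduce.

```python
import hashlib
import os


def prove(nonce: bytes, data: bytes) -> str:
    # Answering requires the full data, since the nonce changes every time.
    return hashlib.sha256(nonce + data).hexdigest()


class StorageNode:
    def __init__(self, deposit: int):
        self.deposit = deposit
        self.stored = {}

    def respond(self, key: str, nonce: bytes):
        data = self.stored.get(key)
        return None if data is None else prove(nonce, data)


def challenge(node: StorageNode, key: str, original: bytes, penalty: int) -> bool:
    nonce = os.urandom(16)
    passed = node.respond(key, nonce) == prove(nonce, original)
    if not passed:
        node.deposit = max(0, node.deposit - penalty)  # forfeit the deposit
    return passed


document = b"long-term archive"
honest = StorageNode(deposit=100)
honest.stored["doc"] = document
cheater = StorageNode(deposit=100)  # promised to store the file but dropped it

honest_ok = challenge(honest, "doc", document, penalty=100)
cheater_ok = challenge(cheater, "doc", document, penalty=100)
```

The honest node keeps its deposit, while the node that discarded the data cannot produce a valid answer and loses it.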

Although it brings the benefits of decentralization, Swarm will also inherit the limitations of the Ethereum blockchain, as it plans to be deeply integrated with the blockchain. One of the biggest limitations of the Ethereum blockchain is throughput, with a maximum capacity of about 15 transactions per second (tps).

Another project built on the Ethereum blockchain is Storj. Storj is open source software which anyone is free to fork. It was created by Storj Labs, a for-profit company that runs its own version, which has developed a strong user base. The company charges for the use of its version, with users deriving value from the network effect provided.

The key technological components of Storj include encryption, file sharding, and a distributed hash table. The aim of Storj is to be faster, cheaper, and more private.

File sharding requires the files to be split into smaller pieces called shards when uploaded using Storj. The privacy benefit of this is that no single entity owns your entire file. Furthermore, only the uploader knows the location of the shards. A private key is required to access the files with it being near impossible to locate the shards without this key. The data is also encrypted before being uploaded and can only be read by users with the private key further enhancing privacy.
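The sharding step can be sketched as follows. This is an illustration under assumptions rather than Storj's implementation: the `upload` and `download` functions and the in-memory `hosts` dictionary are hypothetical, the manifest plays the role of the location information only the uploader holds, and the encryption step described above is omitted to keep the example dependency-free.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def split(data: bytes, shard_size: int):
    return [data[i:i + shard_size] for i in range(0, len(data), shard_size)]


def upload(data: bytes, hosts: dict, shard_size: int = 4):
    """Spread shards across hosts; return the manifest only the uploader keeps."""
    names = sorted(hosts)
    manifest = []
    for i, shard in enumerate(split(data, shard_size)):
        host_name = names[i % len(names)]  # simple round-robin placement
        shard_id = hashlib.sha256(shard).hexdigest()
        hosts[host_name][shard_id] = shard
        manifest.append((host_name, shard_id))
    return manifest


def download(manifest, hosts: dict) -> bytes:
    def fetch(entry):
        host_name, shard_id = entry
        return hosts[host_name][shard_id]

    # Shards are requested from several hosts at once; map() preserves order.
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(fetch, manifest))


hosts = {"host_a": {}, "host_b": {}, "host_c": {}}
original = b"decentralized storage"
manifest = upload(original, hosts)
restored = download(manifest, hosts)
```

Without the manifest, a host sees only a handful of anonymous fragments, and with it the uploader can pull all fragments in parallel.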

The benefits of file sharding in Storj are two-fold: it enhances privacy and also makes accessing the content faster. Shards of a single file can be retrieved from many sources at the same time, making retrieval faster than fetching one large file from a centralized server. The process can be seen below.

[Figure: Storj Review]

Storj puts in place incentives to ensure the safety of the data and the proper running of the network. Periodic audits are conducted, and micropayments are received for successfully passing the audits.

Storj is one of the most widely adopted decentralized cloud storage projects. Protocol Labs, the company behind IPFS, is developing its own decentralized cloud storage project, Filecoin, and envisions a free market model making storage rates more affordable. Storj is already demonstrating the effect of a free market model on rates. Storj is also the name of the cryptocurrency used to participate in the ecosystem.

What are some of the limitations of the model Storj is applying? Imagine you store a file using Storj, the data is split into shards that are sent to different hard drives, and one of these hard drives goes down. This would make the content inaccessible. For this reason, redundant shards known as parity shards need to be sent to other hard drives. If you overdo this, it will slow down the network and raise costs. The goal is enough redundancy to keep the data safe without paying for more than is needed. Users are free to choose the number of parity shards they want, but regardless, as time goes on, the probability of some hosts losing shards rises.
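To make the redundancy tradeoff concrete, here is a sketch using a single XOR parity shard. This is a swapped-in textbook technique chosen for brevity, not necessarily the scheme Storj uses, and it assumes all shards have the same length and that at most one shard is lost.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards):
    """XOR all equal-length shards together into one parity shard."""
    parity = bytes(len(shards[0]))
    for shard in shards:
        parity = xor_bytes(parity, shard)
    return parity


def recover(shards, parity):
    """Rebuild the single missing shard (marked as None) from the rest."""
    missing = shards.index(None)
    rebuilt = parity
    for i, shard in enumerate(shards):
        if i != missing:
            rebuilt = xor_bytes(rebuilt, shard)
    return missing, rebuilt


shards = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(shards)       # stored on a fourth hard drive

damaged = [b"AAAA", None, b"CCCC"]  # the drive holding shard 1 went down
index, rebuilt = recover(damaged, parity)
```

One extra shard protects three data shards against a single failure. Surviving more simultaneous failures requires more redundancy, which is exactly the cost and speed tradeoff described above.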

One other weak point is that Storj users pay for the space they rent, so if they disappear, the hosts renting out that space may no longer be paid for the content they are storing.

Another decentralized cloud storage option and effective competitor to Storj is Sia. It is a peer-to-peer storage ecosystem where anyone can rent or use spare hard drive space. Siacoin is the cryptocurrency for the ecosystem. It is what is used to pay for renting spare hard drive space.

In Sia, you need to know both the person and the hash to access the file which is one extra piece of information required when compared to IPFS.

Sia has its own blockchain. While Storj addresses scalability issues using file sharding, Sia is experimenting with the method used to achieve consensus across the network. The consensus mechanism for Sia is proof-of-storage, whereas proof-of-work is utilized on the Ethereum blockchain. This experimentation with a different consensus mechanism may help address some of the scalability and sustainability issues associated with proof-of-work.

If any issues arise in the storage of data, or hosts do not perform as they have indicated, the hosts will be penalized. This provides a monetary incentive for the ecosystem to function as it should. Penalties are specified in a file contract, which is created each time a storage operation takes place. Contracts are stored on Sia’s own blockchain.

Some of the key selling points for Sia are its privacy-enhancing technology and its free market structure. You can see some of the advantages it holds in terms of decentralization and costs over centralized structures in the table below. Although it has not reached the same level of adoption as Storj, Sia still offers lower rates for large amounts of data storage when compared to the centralized data storage market leaders.

One of the benefits of decentralization that Sia doesn’t bring is censorship resistance. Hosts can refuse to store some data. This contributes to the ethical element of the Sia ecosystem, with hosts having the right to reject illegal or questionable data.

In terms of throughput and scalability, the effectiveness of Sia relies upon the efficacy of proof-of-storage as a consensus mechanism. Experimenting with consensus mechanisms is still a nascent science and proof-of-work is largely considered the most secure. Time will tell whether proof-of-storage is a viable alternative.

Decentralized cloud storage is utilizing cryptocurrency and blockchain technology to provide different models for data storage. Filecoin will be the decentralized cloud storage option for IPFS. It is envisioned to act as a free market, but it remains to be seen whether it will really function this way. Storj is the closest comparison in the cryptocurrency space, with the widest adoption to date. Storj addresses throughput limitations through its use of file sharding and also has some strong privacy-enhancing benefits.

Distributed Database Technology:

Modeled on blockchain technology, BigchainDB aims to be a distributed database with the properties of a blockchain. BigchainDB is structured to be immutable, decentralized, and to provide a platform for developers to build applications on.

It is highly debatable whether BigchainDB is decentralized. BigchainDB is built upon a RethinkDB cluster. RethinkDB prioritizes availability over consistency for its nodes. Transactions will be processed by the first available node, and consistency will be achieved at a later stage.

This system architecture actually results in BigchainDB failing to achieve Byzantine-fault tolerance. One malicious node can crash the whole network.

BigchainDB is sacrificing security and decentralization for throughput. This is consistent with the scalability trilemma, which holds that a decentralized system in which every node processes every transaction can achieve only two of three properties: decentralization of block production, scalability, and safety of the network.

With normal blockchains, there are throughput issues which make them problematic to use for data storage. BigchainDB is more like a private blockchain that provides some blockchain-like elements with increased throughput and low latency. The real question is whether it is worth the security and decentralization tradeoffs and, if these are sacrificed, whether we would not be better off using a centralized system in the first place. The table below shows how BigchainDB would justify its use case over traditional databases and blockchains.

Despite the debates regarding decentralization and security, BigchainDB is being actively built upon. One protocol built upon BigchainDB is Ocean, which is a tokenized ecosystem for data storage and computation.

Ocean Protocol envisions many use cases for its ecosystem. The goal is to unlock the huge amounts of data being generated in order to impact the artificial intelligence (AI) industry. With only a small number of companies having access to the right data along with the AI capabilities, these companies have been placed in an extremely powerful position.

AI advances six times faster when large amounts of data are available, and by unlocking large amounts of data, Ocean Protocol envisions AI impacts across a multitude of industries including autonomous vehicles, medical data, and computer vision. One advantage Ocean Protocol notes is that many data owners don’t trust centralized exchanges with their data. The blockchain properties of BigchainDB may help in this regard, as a transparent and immutable ledger will be used for transactions.

Data Storage Today, Tomorrow, & Beyond:

Centralized servers have long ruled the data storage landscape. Many services are free to use but you need to consider the privacy tradeoffs you are making. Decentralized models look to improve some of the inefficiencies of the centralized model and also offer privacy enhancing benefits.

Whether decentralized technology can be adopted at scale in the data storage industry remains to be seen. Different models being proposed include peer-to-peer file systems, decentralized cloud storage, and distributed database technology. One of the key limitations is the tradeoff which occurs between scalability, security, and decentralization. To increase throughput, distributed systems often need to sacrifice security and decentralization. Within the cryptocurrency space, different solutions being proposed include file sharding and altering consensus mechanisms.

A lot of advantages are offered by these decentralized systems over their centralized counterparts. Centralized models still rule the landscape today but with more and more users and developers experimenting with distributed systems, the future of data storage may look a lot more decentralized.