It is time for our data to be decentralized

We have all used “The Cloud.” It is a magical place where you save all the photos of your family, your friends, and your cute cat. As you may have noticed by watching the news, this magical place isn’t very secure or private. It is just a server data center owned by Google, Amazon, or Apple, and it isn’t that magical at all.

Cloud storage was a significant shift in the way we save our data and use the internet. It means renting space on someone else’s machine and accessing that machine over the internet. What if, instead of renting space from Jeff Bezos, you rented space from Bob, who lives hundreds of miles from you? What if your data were fully encrypted so only you could see it? This idea is called decentralized storage, and it is becoming a reality.

The Current Process

Objectives

The internet was designed to be a decentralized, free exchange of information across the world. It was built on many protocols that were adopted over time. One of these is HTTP (HyperText Transfer Protocol), which was designed to fetch resources, such as HTML documents, from servers. The protocol is based around requests (from clients) and responses (from servers). HTTP is stateless: the protocol itself carries no memory from one request to the next. Every request is therefore independent of the others, and every interaction starts with a request from the client (your browser).
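The request-and-response cycle can be sketched with Python's standard library. This is only an illustration: the handler class, the paths, and the throwaway server on the loopback address are assumptions made for the example. Note that the handler builds each response purely from the request it just received and keeps nothing between calls.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoPathHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers each GET using only that request."""

    def do_GET(self):
        body = f"you asked for {self.path}".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


def fetch_paths(paths):
    """Start a local server, request each path, return (status, body) pairs."""
    server = HTTPServer(("127.0.0.1", 0), EchoPathHandler)  # port 0 = any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler makes sure requests go straight to the local server.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        results = []
        for path in paths:
            with opener.open(f"http://127.0.0.1:{port}{path}") as response:
                results.append((response.status, response.read().decode()))
        return results
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for status, body in fetch_paths(["/cats", "/dogs"]):
        print(status, body)
```

Each call to the server stands alone: the second request succeeds or fails regardless of what the first one asked for.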

A high-level example of how the internet currently works is as follows. Say you want to visit Amazon.com. You open your internet browser and type in Amazon’s URL (Amazon.com). The browser sends a request (using HTTP and other protocols) to Amazon’s server. To establish this connection, the request travels from your router to your Internet Service Provider (ISP) and on to Amazon’s ISP. Amazon then finds the web page files and sends them through its ISP to your ISP, and eventually through your router to your computer. As we all know, this process happens remarkably fast.

In this system, files are organized by physical location. If you save your cat pictures on Google Drive, then they are stored in the same data center(s) as everyone else’s photos and documents that they have saved to Google Drive. Let’s take a look at some of the negative aspects of the current system.

Cons

Cloud storage and HTTP are good at finding files and sending them from the server to the requesting client. However, as the way people use the internet changes, more issues arise in the current system.

1. Data Privacy –

Turn on any news station, and it won’t be long before you see a story about data privacy. The companies that store your data can, in many cases, legally search through and share your personal data. As we use the internet in more personal ways, this centralized control of data becomes a big issue. There is now a much greater need for data to be private than there ever was before.

2. Inefficiency with multiple calls –

The following example is often used to describe the inefficiency of the HTTP protocol and location-based organization. “Let’s say you are sitting in a lecture hall, and the professor asks you to go to a specific website. Every student in the lecture makes a request to that website and are given a response. This means that the same exact data was sent individually to each student in the room. If there are 100 students, then that’s 100 requests and 100 responses. This is obviously not the most efficient way to do things. Ideally, the students will be able to leverage their physical proximity to more efficiently retrieve the information they need.” — Karan Kwarta

3. ISP –

Every user depends on their ISP to connect them to the proper servers and move the data quickly. ISPs are chokepoints in the current system, as around twenty large companies control the market. Net neutrality and government blocking of websites are further concerns, because ISPs have control over what we see on the internet. The internet was created to be a decentralized exchange of information, but as the system grows older this exchange has become more and more controlled.

4. Centralized Data Centers –

To make creating a website simple, many people use large central providers, like AWS, to store their website data and files. If there were an attack on AWS, we would lose access to a large portion of the internet. Similarly, if there were an attack on Google Drive, you could lose your cat photos.

The hope for the internet was a free exchange of data. As with most capitalistic systems, a few players control a majority of the data, and many chokepoints have been established. This is not the way people imagined the internet would work, but we haven’t given up hope yet.

Decentralized Storage

Objectives

What does decentralized mean? Simply put, it means that no one entity controls the system, in this case the data. Thus, decentralized storage does not rely on Google, Amazon, or any other large internet corporation to save your data. In a decentralized storage system, only the owner of the data has access to it, and the data is distributed across a decentralized network. Decentralized storage projects use open-source code, meaning that anyone can inspect how the data is being processed and protected.

Decentralized storage is based on two fundamental principles: peer-to-peer networking and data encryption. They work together in the following way. When you want to save a photo of your cats to the decentralized storage network, your file is encrypted, split into multiple shards, and stored across many different hosts in the network. Sounds pretty simple, right?
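The encrypt, shard, and distribute sequence can be sketched in a few lines of Python. This is a toy under stated assumptions, not how any particular project does it: `toy_encrypt` XORs the file with a SHAKE-256 keystream purely to stay inside the standard library and is not a secure cipher (a real client would use a vetted authenticated cipher), and the host names and function names are hypothetical.

```python
import hashlib
import os


def toy_encrypt(data, key, nonce):
    """XOR data with a SHAKE-256 keystream. Toy stand-in for a real cipher."""
    if not data:
        return b""
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(a ^ b for a, b in zip(data, keystream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself


def split_into_shards(blob, shard_count):
    """Cut the encrypted blob into shard_count consecutive pieces."""
    size = -(-len(blob) // shard_count)  # ceiling division
    return [blob[i * size:(i + 1) * size] for i in range(shard_count)]


def store(photo, key, hosts):
    """Encrypt locally, then hand one shard to each (hypothetical) host."""
    nonce = os.urandom(16)
    shards = split_into_shards(toy_encrypt(photo, key, nonce), len(hosts))
    placement = dict(zip(hosts, shards))  # host name -> shard it holds
    return nonce, placement


def retrieve(nonce, key, placement, hosts):
    """Collect the shards in their original order and decrypt them."""
    blob = b"".join(placement[host] for host in hosts)
    return toy_decrypt(blob, key, nonce)


if __name__ == "__main__":
    key = os.urandom(32)  # stays on the owner's machine
    hosts = ["bob", "alice", "carol"]
    nonce, placement = store(b"photo of my cat", key, hosts)
    print(retrieve(nonce, key, placement, hosts))
```

No single host sees more than an encrypted fragment, and only someone holding the key can turn the reassembled fragments back into the photo.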

Decentralized storage pairs naturally with blockchain. Both operate in a similar fashion, using many nodes distributed across a large population. The aim is decentralization, rather than storing all of your data in a single data center in, say, Iowa.

Pros

1. Privacy

Because all of your data is encrypted and sharded across many nodes, it is very difficult for anyone else to see it. Only you, the holder of the private key, can pull all the data back together and read it. Sorry Amazon, you can’t see it.

2. Security

By storing data across many nodes, there is a lower chance of files being lost. Files are saved redundantly so that if one node goes down, there are other nodes that you can access to retrieve your data.

3. Cost

Saving files in decentralized storage can be more cost-effective because of the efficiencies that come with it. Instead of Google having to maintain thousands of servers (keeping them cool, updating them, and securing their physical location), the data is stored on devices that are cared for by their individual owners.

4. Speed

Parts of the data are downloaded from different locations simultaneously rather than all from one data source. This can allow for faster download speeds.
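A minimal sketch of a parallel download, assuming the client already knows which host holds which piece. The in-memory `HOSTS` dictionary stands in for real network peers, and every name in it is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "hosts", each holding one numbered piece of a file.
HOSTS = {
    "host-a": {0: b"The quick "},
    "host-b": {1: b"brown fox "},
    "host-c": {2: b"jumps."},
}


def fetch_shard(location):
    """Pretend to download one piece; returns (piece index, bytes)."""
    host, index = location
    return index, HOSTS[host][index]


def download(locations):
    """Fetch all pieces concurrently, then stitch them together in order."""
    with ThreadPoolExecutor(max_workers=len(locations)) as pool:
        pieces = dict(pool.map(fetch_shard, locations))
    return b"".join(pieces[index] for index in sorted(pieces))


if __name__ == "__main__":
    print(download([("host-c", 2), ("host-a", 0), ("host-b", 1)]))
```

With real peers, each `fetch_shard` call would wait on a different network connection, so the total time is closer to that of the slowest piece than to the sum of all of them.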

Current Projects

Below are five projects that have been working to create decentralized storage systems. They go about it in different ways, but the underlying principles of peer-to-peer networking and data encryption run through them all.

IPFS

InterPlanetary File System (IPFS) is a peer-to-peer file sharing system created by Protocol Labs. The project describes itself as developing a protocol to replace HTTP, though IPFS may prove to be a complement to HTTP rather than strictly a replacement. HTTP uses a request-response exchange to send relatively small amounts of data over the internet. IPFS instead moves IPFS objects through the network. These objects are better suited to larger amounts of data and keep track of file versions (similar to Git).

IPFS’s objective is to connect all computers in the network to the same system of files. Distributed websites are one use case: they have no origin server to talk to and run entirely on the client side. Such a website wouldn’t be downloaded from one server; it would be downloaded in smaller chunks simultaneously from many nodes.

Data stored with the IPFS protocol is spread across many nodes in the network. IPFS does not encrypt content by default, so private files should be encrypted before they are added. Nodes only store data that they are interested in, plus some indexing data that helps find the other related chunks. The data isn’t addressed by location; it is addressed by a hash of its content. Instead of asking what is at a location, you ask where the content with a given hash can be found. IPFS (and many other decentralized storage systems) uses distributed hash tables to locate all of the chunks of data and piece them back together. Because a content hash changes whenever the content changes, IPNS provides stable names that can be pointed at the latest version, and DNSLink can map a human-readable domain name onto them, which makes data easier to find. Once data has been added and other nodes hold copies, you cannot force it to be deleted; it stays available for as long as at least one node keeps it.
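Addressing data by its content rather than by its location can be illustrated with a toy store. This is a sketch under simplifying assumptions, not the IPFS implementation: a plain SHA-256 hex digest stands in for a real IPFS content identifier, a Python dictionary stands in for the distributed hash table, and the class and node names are made up for the example.

```python
import hashlib


class ToyContentStore:
    """Toy model: look data up by the hash of its content, not by location."""

    def __init__(self):
        self.providers = {}  # content id -> nodes holding it (stand-in for a DHT)
        self.blocks = {}     # (node, content id) -> stored bytes

    @staticmethod
    def content_id(data):
        # Simplification: real IPFS identifiers are not bare hex digests.
        return hashlib.sha256(data).hexdigest()

    def add(self, node, data):
        """Store data on a node and announce that the node provides it."""
        cid = self.content_id(data)
        self.blocks[(node, cid)] = data
        self.providers.setdefault(cid, set()).add(node)
        return cid

    def get(self, cid):
        """Ask 'who has this content?' and verify whatever comes back."""
        for node in sorted(self.providers.get(cid, ())):
            data = self.blocks[(node, cid)]
            if self.content_id(data) == cid:  # the id doubles as a checksum
                return data
        raise KeyError(cid)


if __name__ == "__main__":
    store = ToyContentStore()
    cid = store.add("node-a", b"cat.jpg bytes")
    store.add("node-b", b"cat.jpg bytes")  # same content, same id
    print(cid, sorted(store.providers[cid]), store.get(cid))
```

Because the identifier is derived from the bytes themselves, identical files added on different nodes share one identifier, and a reader can check that what it received is what it asked for.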

Filecoin is a companion project from Protocol Labs: a token-based, peer-to-peer storage marketplace that builds on IPFS. Users who are willing to store data are paid in Filecoin for their storage space.

IPFS is an ambitious project that could redesign the way we use the internet and store data.

Swarm

Swarm is a serverless peer-to-peer storage and content distribution system built on Ethereum. Swarm is very similar to IPFS. Just as Filecoin rewards nodes for storing IPFS data, Swarm uses Ethereum to incentivize nodes to give resources to the network. Swarm’s objective is to provide infrastructure for Dapp developers.

Swarm currently relies on the Ethereum blockchain, and the team is developing ways to use the protocol independently of it. It uses the Ethereum Name Service to create human-readable names for the stored data.

Before the data is stored across the network, it is encrypted on your local node. You hold the private key to decrypt your data, and that key is never shared with the rest of the network. Like IPFS, Swarm promotes resistance to DDoS attacks, and data cannot be removed once it is stored. Swarm is also working on implementing other services, such as private encrypted messaging between nodes.

Sia

Sia’s decentralized storage is slightly different from the previous two projects. Sia is built on its own blockchain, which supports a marketplace using Siacoin. Sia aims to replace centralized cloud storage such as AWS, and the project advertises a price point below 10% of AWS’s.

The basis of Sia is smart contracts that connect renters and hosts. Smart contracts allow services to be provided and payments to be processed without a trusted third party. Hosts must put up Siacoin as collateral for the data they are storing, which incentivizes them to maintain the data. They then submit periodic storage proofs that are saved on the Sia blockchain. Every time a host publishes a proof, it is paid an allotted share of the total promised Siacoin payment.
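The text above does not say how such a proof is constructed. One widely used way to prove possession of a piece of data without sending all of it is a Merkle proof, sketched below as an illustration only. The function names are hypothetical, and the sketch leaves out details a production scheme needs, such as distinguishing leaf hashes from internal hashes.

```python
import hashlib


def sha256(data):
    return hashlib.sha256(data).digest()


def merkle_root_and_proof(segments, index):
    """Return the Merkle root and the sibling hashes proving segments[index]."""
    level = [sha256(segment) for segment in segments]
    proof = []
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # odd count: pair the last hash with itself
        sibling = index ^ 1          # the neighbour in the same pair
        proof.append((level[sibling], sibling < index))
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(root, segment, proof):
    """Rebuild the path from one segment up to the root and compare."""
    node = sha256(segment)
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == root


if __name__ == "__main__":
    segments = [b"segment-%d" % i for i in range(5)]
    root, proof = merkle_root_and_proof(segments, 3)  # challenge: segment 3
    print(verify(root, segments[3], proof))           # host still has the data
    print(verify(root, b"something else", proof))     # host lost or altered it
```

The renter only needs to remember the root. A host that is challenged on a randomly chosen segment can answer only if it still holds that segment, and the proof stays small however large the file is.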

When a renter wants to store data using Sia, their files are encrypted and divided into 30 segments before uploading. Each segment is sent to a different host, so no single host is a point of failure. Sia uses Reed-Solomon erasure coding, which means that any 10 of the 30 segments are enough to recover all of the data. Sia provides a secure way for people to rent out storage and an alternative to large centralized cloud storage.
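The 10-of-30 figures translate into concrete numbers. The sketch below assumes that each segment is one tenth the size of the encrypted file and that hosts go offline independently of one another; the per-host uptime values are hypothetical and are not measurements of the Sia network.

```python
from math import comb


def availability(total_segments, needed_segments, host_uptime):
    """Probability that at least needed_segments of the hosts are online."""
    return sum(
        comb(total_segments, online)
        * host_uptime ** online
        * (1 - host_uptime) ** (total_segments - online)
        for online in range(needed_segments, total_segments + 1)
    )


TOTAL, NEEDED = 30, 10               # figures quoted above for Sia
storage_overhead = TOTAL / NEEDED    # bytes stored per byte of original data
tolerated_failures = TOTAL - NEEDED  # hosts that can vanish without data loss

if __name__ == "__main__":
    print("overhead:", storage_overhead, "tolerated failures:", tolerated_failures)
    for uptime in (0.5, 0.8, 0.95):  # hypothetical per-host uptimes
        print(uptime, availability(TOTAL, NEEDED, uptime))
```

The trade-off is that the network stores three times the size of the file, and in exchange the file survives the loss of up to 20 of its 30 hosts.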

Storj

Like Sia, Storj is a decentralized cloud storage platform. Storj is a suite of software and smart contracts that protects data from censorship, monitoring, and downtime. Again like Sia, Storj uses a peer-to-peer network to connect renters and hosts (farmers). The Storj system is based around renters and farmers periodically checking in with each other to make sure that both are online. Storj uses the Storj token to facilitate its marketplace.

SAFE Network (Maidsafe)

The Secure Access For Everyone (SAFE) network is an open-source, decentralized data storage and communications network that uses the spare computing resources of its users. Like Sia and Storj, the SAFE network uses a coin, MaidSafeCoin, to create a marketplace of resources for its users.

The SAFE network stores data in nodes called vaults and creates human-readable addresses through which the data can be accessed. The network splits the data into chunks with a maximum size of 1 MB, and a datamap is created to point to each chunk of the data. All data is encrypted before uploading. The SAFE network can also be used to send secure messages between nodes.
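The chunk-and-datamap idea can be sketched as follows. This is a simplification that leaves out the encryption step and does not reproduce the SAFE network's actual format: naming each chunk by its SHA-256 digest is an assumption made for the example, as are the function names.

```python
import hashlib

MAX_CHUNK = 1024 * 1024  # 1 MB upper bound per chunk, as described above


def build_data_map(data, max_chunk=MAX_CHUNK):
    """Split data into chunks and record, in order, a name for each chunk."""
    chunks = {}    # chunk name -> chunk bytes (what the vaults would hold)
    data_map = []  # ordered chunk names (what the owner keeps)
    for offset in range(0, len(data), max_chunk):
        chunk = data[offset:offset + max_chunk]
        name = hashlib.sha256(chunk).hexdigest()  # hypothetical naming scheme
        chunks[name] = chunk
        data_map.append(name)
    return data_map, chunks


def reassemble(data_map, chunks):
    """Follow the data map to put the chunks back in their original order."""
    return b"".join(chunks[name] for name in data_map)


if __name__ == "__main__":
    data = bytes(range(256)) * 10240  # 2.5 MB of sample data
    data_map, chunks = build_data_map(data)
    print(len(data_map), reassemble(data_map, chunks) == data)
```

The chunks can live anywhere in the network; the small data map is all the owner needs to find them and restore the file.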

The SAFE network has a unique system to keep vaults from abusing their control. Sia and Storj make the nodes behave by paying them as they fulfill the contract. The SAFE network instead has vaults work together to reach a consensus on the state of each other’s data. The vaults are given random identifiers and are grouped by those identifiers. These groups then work together to check one another’s data. For an attack to succeed, more than 50% of the nodes in one group must be dishonest, which is unlikely given that vaults are assigned to groups at random. Maidsafe goes one step further by implementing churn: vaults are randomly swapped between groups, leaving only a very short window for vaults to try to act maliciously.
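To put a rough number on "unlikely", assume, purely for illustration, that a fraction f of all vaults is dishonest and that each of the g places in a group is filled independently at random. Both f and g are hypothetical here and are not taken from the SAFE design. The chance that dishonest vaults hold a majority of one group is then a binomial tail:

$$
P(\text{dishonest majority}) = \sum_{k=\lfloor g/2 \rfloor + 1}^{g} \binom{g}{k} f^{k} (1-f)^{g-k}
$$

For g = 8 and f = 0.1, the sum runs from k = 5 to k = 8 and comes to roughly 4.3 × 10^{-4}, or about one group in 2,300. Churn lowers the practical risk further, because a group that happens to be compromised does not stay together for long.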

Concluding Thoughts

Many of these projects are young and still have a way to go before you can use all the features that they promise. However, all of the projects show that there are important use cases, as well as a market need for decentralized storage. As many things do, the internet grew up and was corrupted by the environment around it. Large internet-based companies have started to use our data against us, taking the power away from the user. Decentralized storage is a push to recover the decentralized spread and storage of information throughout the world.

Sources:

https://medium.com/wolverineblockchain/what-is-ipfs-b83277597da5

https://medium.com/bitfwd/what-is-decentralised-storage-ipfs-filecoin-sia-storj-swarm-5509e476995f

https://blockapps.net/blockchain-disrupt-data-storage/

https://www.investinblockchain.com/decentralized-cloud-storage-platforms/