An Introduction to “The Streaming Problem”: A Case for Decentralized Storage

Published in iridium, Sep 22, 2021

This is intended to be a (mostly) non-technical introduction to the streaming problem, covering centralization and decentralization, decentralized storage, and the Iris project. By the end of this article you should have a rudimentary understanding of what a centralized system, a decentralized system, and decentralized storage are, and how they relate to one another.

Before discussing what the streaming problem is, let’s define a few terms:

Centralized System

A centralized system is any system where data flows through a central authority. The central authority may perform validations and/or provide services, usually for some type of fee. The central authority is also responsible for storing user data, of which it is generally the sole owner and steward.

Examples of a centralized system include many of the institutions in the real world, such as banks, schools, most governing bodies, corporations, and platforms such as Facebook and YouTube.

A generalized centralized storage system

Above is a high-level diagram of how users interact in a centralized system. As a concrete example, if the central authority were a video hosting platform, then this flow could represent a content creator adding a video to the platform and the other user watching the video.

In more general terms, a user uploads data to an application and it is stored by the platform provider (in a database). Another user then interacts with the platform provider in order to retrieve a representation of the data, which is fetched from storage and delivered to the user by the central authority.

Decentralized System

A decentralized system is a system where there is no central authority responsible for providing services or storing data. It functions as a network of individual nodes each responsible for supporting the network in some way.

A node might support a network by executing some code, or storing a file, or simply being online. This can really be manifested in any number of ways.

Examples of decentralized systems include blockchains, IPFS, and, in the real world, ant colonies.

The streaming problem

When streaming services first began to gain traction, they promised an escape from cable television. For a fee, they gave you access to a curated collection of content. Free from advertisements and scheduling constraints, consumers gained the ability to choose when and where to experience content.

But now there are so many streaming services, each with its own subscription and exclusive rights to content.

https://images.indianexpress.com/2019/07/streaming_gettyimages.jpg

As the number of streaming services grows, the overlap between them decreases, requiring a consumer to pay for subscriptions to multiple services in order to watch a slice of content from each (e.g. how many people subscribe to Peacock only to watch “The Office”?). As of 2020, 82% of consumers subscribe to 4 streaming services. Many streaming services’ paid tiers still display advertisements throughout the viewing experience. If these trends continue, it is likely that the dream of an “Escape from Cable TV” will give way to more subscriptions, more costly services, or consumers limiting themselves to a small set of subscriptions, leaving us no better off than in the days before the advent of streaming.

Blockchains to the Rescue!

We’ll get back to the streaming problem in a second. Let’s quickly discuss blockchains.

A blockchain is a decentralized system where nodes agree on the current state of a network. A widely known example is cryptocurrency, such as Bitcoin, where nodes agree on how many tokens are held by each address. This state management can be extended beyond currencies to more general states, like a mapping of usernames to node addresses, or the current state of a game of chess. Platforms that allow for more general state management include Ethereum, Substrate-based chains, and many others that support ‘smart contracts’.

Technical note: This agreement between nodes is called consensus, and the process of reaching consensus is controlled by a consensus algorithm shared by each node.

There are a plethora of consensus algorithms used by many different chains, with new algorithms being developed constantly.

Side note: When you hear claims that blockchain technology is bad for the environment, they are generally speaking of proof-of-work blockchains (such as Bitcoin). Many other consensus algorithms require significantly less computing power and do not have such an ecological impact.

https://cryptoslate.com/blockchain-consensus-algorithms/

On top of all nodes agreeing on the current state, the current state also depends on the previous state, which depends on the previous, and so on. This chaining is where the term block “chain” comes from.

This forms a cryptographically verifiable trail of transactions that any node in the network can verify.
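To make the “chain” idea concrete, here is a minimal sketch in Python. It is not how any particular blockchain is implemented: the block layout, the function names (`block_hash`, `append_block`, `is_valid`), and the use of SHA-256 over JSON are all assumptions made for illustration, and consensus is left out entirely. It only shows that each block stores the hash of the block before it, so changing an old block breaks every link that follows.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents, including the previous block's hash."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Append a block that commits to the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})


def is_valid(chain: list) -> bool:
    """Check that every block still points at the hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["alice sends bob 10 coins"])
append_block(chain, ["bob sends carol 4 coins"])
append_block(chain, ["carol sends dave 1 coin"])
print(is_valid(chain))  # True

# Rewriting history in the first block breaks the link stored in the second.
chain[0]["transactions"] = ["alice sends bob 100 coins"]
print(is_valid(chain))  # False
```

In a real network, every node can run a check like `is_valid` on its own copy, which is what makes the trail verifiable by anyone.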

To put it another way: consensus algorithms are designed to make it really difficult for data already in the network to be modified. If the blockchain says I sent you 10 coins, then, depending on the consensus algorithm used, an attacker would typically have to control a majority (often described as 51%) of the network’s resources to change that value from 10 to 100. Gaining that much control is very expensive, which is why such changes are hard to make.

So how does this help us?

Blockchains allow developers to create decentralized applications, or dapps. In this paradigm, application users each have their own ‘wallet’ with an ‘address’. This address is their unique public identifier that others can use to find them within the network. Think of the wallet as just some software that lets the user store tokens and other assets. Dapps let users interact directly with each other (via the blockchain’s distributed ledger, agreed upon by consensus, and its p2p networking layer, which sends data between peers). This lets us completely remove the need for a third party to facilitate the application’s functionality! There is no longer a need for a hosted database or server (though they can still be useful in some cases).

The ‘public identifier’ is really the public key of a key pair unique to a node. Each node has a public key and a private key. Generation of this key pair varies between blockchains.

Today, the popularity of decentralized applications is growing, with increasingly widespread adoption of NFTs as stores of value, decentralized finance (crowdsourcing, funding, loans, staking, etc.), gaming, gambling, exchanges (exchanging one cryptocurrency for another), and more. It seems likely that adoption of dapps will continue to grow over the next few years.

However, blockchains by themselves have a major missing piece: storage. Storage in a blockchain is severely limited. To store data in a blockchain it must be stored within a transaction, which each node has a copy of. This could quickly lead to bloat within the blockchain. In addition, this means that any data added is public, since all data “on-chain” is public.

Decentralized Storage

Decentralized storage is a system where many individual nodes provide storage capacity to a network so that other nodes in the system can use it.

Specifically, we will focus on a single decentralized storage solution, IPFS.

Without going into details on how it works, IPFS is a decentralized, content-addressed network for storing and retrieving data. When data is added to IPFS it is associated with a unique identifier derived from the content itself, called a CID (meaning that when two identical files are added there’s still only one CID). The CID can then be used by other nodes to identify and access the data. This is incredibly useful, since now we can use a CID in a transaction instead of the actual file bytes, meaning transaction sizes stay fixed no matter how large the file is!
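Content addressing can be sketched in a few lines of Python. This is a toy model and not the IPFS implementation: the `ContentStore` class is hypothetical, and the identifier here is a plain SHA-256 hex digest, whereas real CIDs use their own encoding and large files are split into chunks. The point is only that the identifier is computed from the bytes, so identical data always maps to the same identifier and the identifier has the same length whatever the file size.

```python
import hashlib


class ContentStore:
    """A toy content-addressed store: the key is derived from the bytes."""

    def __init__(self):
        self._blocks = {}

    def add(self, data: bytes) -> str:
        # Hypothetical identifier: plain SHA-256 hex, not a real IPFS CID.
        identifier = hashlib.sha256(data).hexdigest()
        self._blocks[identifier] = data
        return identifier

    def get(self, identifier: str) -> bytes:
        return self._blocks[identifier]

    def __len__(self) -> int:
        return len(self._blocks)


store = ContentStore()
first = store.add(b"the same video bytes")
second = store.add(b"the same video bytes")
print(first == second)  # True: identical content, identical identifier
print(len(store))       # 1: the data is only stored once
```

A transaction would then carry only the short identifier, while the bytes themselves live in the storage network.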

Due to the nature of IPFS, simply adding your data to the network is not enough in most cases. There is no instant replication of data across the network. In order to ensure its availability, you must ‘pin’ the content to your node and keep your node running. As an alternative, you can use a pinning service, such as Pinata, but then you no longer have a decentralized system. IPFS does not natively provide any incentive to provide storage. Furthermore, all data added to an IPFS node is available to all nodes in the network, so keeping data private requires encrypting it yourself. In essence, there are four categories of issues that must be addressed:

1. Indexability: IPFS is not a search engine and is not intended to act as one. There is not an easy way to identify or track data in IPFS without a CID.

2. Security and Privacy: When data is hosted in a decentralized network it is available to any node in the network. There are no permissions and no encryption by default.

3. Availability and Replication: Availability of any CID in the network is not guaranteed, and existing approaches to ensuring availability are not necessarily decentralized.

4. Governance: There are no native governance options and no simple way to purge content from all nodes in the network. IPFS provides a customizable blockList, but attempting to “revoke” data from other nodes is fundamentally not possible. If the IPFS network is public and fully decentralized, then this could allow malicious content to propagate through the network with no means of purging it.

So how does blockchain technology help to solve the “streaming” problem?

The Iris Project

Disclaimer: Iris is still in a very early stage of development and details discussed below are subject to change in future iterations. There are several already identified issues present in this implementation that will be addressed in future publications.

Iris is a blockchain (or possibly an ecosystem of Substrate-based blockchains) that aims to provide tools for content creators and owners to securely store, host, share, and sell their data to consumers, who will be able to directly support creators without a third-party intermediary. This will give users (either individuals or even studios/production companies) the ability to add their data to the network, where it can be purchased and accessed by other users without the requirement of paying for a subscription to an entire streaming service. We aim to do this by leveraging IPFS as storage and Substrate to solve the four issues outlined above.

The approach taken today by many is to simply add data to IPFS (probably through a pinning service) and use the resulting CID as a value in a transaction (probably in a smart contract). This is illustrated by the diagram below (where the user adds a file to IPFS and submits a transaction containing the CID using Ethereum):

Usage when blockchain and storage are separate

Using https://github.com/rs-ipfs/substrate as a basis, Iris aims to act as an intermediary between the user and an IPFS node, allowing for additional logic and validations that do not exist in IPFS. As opposed to the depiction above, Iris first sends commands to the blockchain, where input data can be verified and logic applied, such as building models for data ownership and access controls. Commands are added to a queue, where they may be executed by any blockchain node that also runs an IPFS node, and the results are published on chain.

Usage when blockchain and storage are combined
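The combined flow described above can be sketched roughly as follows. This is an illustration under heavy assumptions, not the Iris or rs-ipfs/substrate implementation: `ToyChain`, `submit_add`, and `process_next` are hypothetical names, a Python dictionary stands in for IPFS, and a SHA-256 digest stands in for a CID. It shows the ordering only: a command is validated and queued first, then any node with storage attached processes it and records the result (here, ownership of the content) in shared state.

```python
import hashlib
from collections import deque


class ToyChain:
    """Hypothetical stand-in for on-chain state; not the Iris implementation."""

    def __init__(self):
        self.queue = deque()  # commands waiting for a storage node
        self.owners = {}      # content identifier -> owner address

    def submit_add(self, owner: str, data: bytes) -> None:
        # Validation happens before anything reaches storage.
        if not data:
            raise ValueError("refusing to queue empty content")
        self.queue.append((owner, data))

    def process_next(self, storage: dict) -> str:
        # Any node that also runs storage may pick up the next command.
        owner, data = self.queue.popleft()
        identifier = hashlib.sha256(data).hexdigest()  # placeholder for a CID
        storage[identifier] = data
        # Publish the result: the first submitter is recorded as the owner.
        self.owners.setdefault(identifier, owner)
        return identifier


chain = ToyChain()
storage = {}  # stands in for an IPFS node
chain.submit_add("alice", b"episode one")
identifier = chain.process_next(storage)
print(chain.owners[identifier])  # alice
```

Because ownership lives in state that all nodes agree on, later logic such as access checks or payments could be built on top of the `owners` mapping.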

Recall that a blockchain is really a state-management system that all nodes agree on by some consensus. In our blockchain, we can construct a state so as to achieve much of the functionality of centralized content hosting and delivery systems but in a totally decentralized way. With the IPFS integration, we can define ownership and access controls to content as well as provide a means to monetize both content added to the network and storage capacity provided to the network.

Work on Iris is underway at: https://github.com/mystery-team/.