Building the missing protocol of today’s internet stack: a decentralized pub/sub network for realtime data
With the Streamr Marketplace recently launched at Consensus, our dev team’s next strategic focus will be to build the second iteration of the Streamr Network, as announced in Transmission #7. The Network was also one of the key topics at our recent retreat. This post will explain why this layer is so important to our overall goal of decentralizing realtime data, as well as set out the roadmap (and challenges ahead) for implementing it.
What is the Streamr Network?
The Streamr Network is a scalable realtime messaging system, which enables applications and devices such as IoT sensors, connected cars, and basically all “smart” gadgets to make available the data they are producing, as well as listen to incoming data from other applications and devices.
The Network employs the publish/subscribe pattern. Messages produced to a stream (sometimes called a topic) get delivered to all subscribers listening on that stream in realtime. It’s a bit like an instant messenger for machines, which — in IM lingo — supports “group chats” (many-to-many), “channels” (one-to-many), as well as “private” (one-to-one) messaging patterns.
Such a publish/subscribe network has the following great properties for application developers:
- Data producers simply “fire and forget”. No need to set up APIs, data silos, or open ports that could compromise security. Just send new data points to the Network and you’re done. This makes integration very easy.
- Data consumers simply listen. They don’t need to know where the data sources are, which IP address to reach them on, or how to interface with them. They don’t need to open server ports either. They just connect to the network, subscribe to what they need, and react to incoming messages.
Examples of applications made possible and powered by the Network are our very own Editor and prizewinning Marketplace, but you can of course use it in your own applications via the API or client libraries. (A JS library is available, with Java, Go, and Python libraries in the works. If you’d like to help by working on a client library, please get in touch.)
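To make the pattern concrete, here is a minimal single-process sketch of publish/subscribe in Python. It is only an illustration of the idea, not the Streamr API: the class name InMemoryPubSub, its methods, and the stream names are all made up for this example.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Handler = Callable[[Any], None]


class InMemoryPubSub:
    """Toy, single-process stand-in for a pub/sub network (hypothetical, not the Streamr API)."""

    def __init__(self) -> None:
        # stream id -> handlers currently subscribed to that stream
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, stream_id: str, handler: Handler) -> None:
        """Consumers only name the stream; they never learn who the producers are."""
        self._subscribers[stream_id].append(handler)

    def publish(self, stream_id: str, message: Any) -> int:
        """Fire and forget: hand the message to every current subscriber.

        Returns the number of subscribers that received it.
        """
        handlers = list(self._subscribers.get(stream_id, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


network = InMemoryPubSub()
dashboard: list = []
logger: list = []

# Two independent consumers listen on the same stream (one-to-many).
network.subscribe("sensors/temperature", dashboard.append)
network.subscribe("sensors/temperature", logger.append)

delivered = network.publish("sensors/temperature", {"celsius": 21.5})
unheard = network.publish("sensors/humidity", {"percent": 40})  # nobody is listening

print(delivered, unheard, dashboard)
```

The producer never addresses a consumer directly, and a message on a stream with no subscribers is simply not delivered to anyone. A real network adds transport, persistence, and access control on top of this basic contract.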
Doesn’t such a protocol exist already?
Well, not really — such a messaging pattern is not provided on a global scale on the TCP/IP stack, the foundational protocol suite of the internet. Currently, developers of applications such as instant messengers, financial market data delivery, multiplayer games, or data-driven IoT applications all need to either roll their own infrastructure or resort to centralized cloud services providing the needed messaging and storage functionality.
The consequence of rolling your own infrastructure is that developers must bear incredibly high costs to put together even the simplest applications. This means less competition and less creativity. The consequence of using centralized cloud services is that most of the data in the world ends up passing through a handful of internet giants, granting them unsafe amounts of power.
So how has this structural deficiency in a widely-used pattern remained unresolved for at least two decades? The reason is that internet service providers and big commercial players have lacked the economic incentives to support existing broadcast protocols such as IP multicast. In addition, these protocols have a hard time scaling to potentially millions of recipients due to technical limitations.
And this structural deficit is only going to become more pronounced because the demand for a realtime data economy is about to explode. The number of devices connected to the internet is on track to reach hundreds of billions, each one producing data. All that information needs to be securely transported, stored, and potentially made available to others to create a data-driven value ecosystem.
From this perspective, we believe Streamr’s Network is not only useful, it is absolutely essential. We believe it is the missing data protocol of the internet.
How does Streamr solve the problems of scalability and economic incentives?
Streamr solves the scalability issue of broadcast messaging protocols by using a peer-to-peer (P2P) network architecture. Technically, you might think of it as BitTorrent for realtime data streams instead of static files. The throughput capacity of the network scales linearly as the number of nodes in the network increases.
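A rough back-of-the-envelope sketch shows why spreading the forwarding work across peers helps. It assumes an idealized relay tree in which every node that already has a message forwards it to a fixed number of new peers. That topology, the function names, and the fanout value are assumptions made for illustration, not a description of how the Streamr Network actually routes messages.

```python
from typing import Tuple


def central_server_sends(subscribers: int) -> int:
    """A single central server has to upload one copy per subscriber."""
    return subscribers


def relay_tree(subscribers: int, fanout: int) -> Tuple[int, int]:
    """Idealized P2P relay: each node holding the message forwards it to at most
    `fanout` peers that do not have it yet.

    Returns (max copies uploaded by any single node, hops until everyone has it).
    """
    if subscribers < 0 or fanout < 1:
        raise ValueError("subscribers must be >= 0 and fanout >= 1")
    reached = 0    # subscribers that have received the message so far
    frontier = 1   # nodes that received it in the previous hop (starts with the publisher)
    hops = 0
    while reached < subscribers:
        newly_reached = min(frontier * fanout, subscribers - reached)
        reached += newly_reached
        frontier = newly_reached
        hops += 1
    return min(fanout, subscribers), hops


print(central_server_sends(1_000_000))   # copies the central server must upload
print(relay_tree(1_000_000, fanout=4))   # (per-node uploads, hops) in the relay tree
```

In this simplified model, reaching a million subscribers costs the central server a million uploads per message, while no peer in the relay tree uploads more than four copies and delivery completes in ten hops. The price is extra latency per hop and the need to cope with unreliable peers, which the sketch ignores.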
Streamr solves the economic incentives problem by using cryptocurrency, cryptographic proofs, and game theory to incentivize people and organisations to run nodes (called Brokers) in the network. Streamr DATAcoin, or DATA, is the cryptographic token representing value in the Streamr ecosystem, implemented as an ERC-20 token on the Ethereum blockchain. The companion blockchain is used for value settlement, identity, and permission control, while the data itself stays in the scalable Streamr Network.
Broker nodes contribute bandwidth and storage to the network, and earn DATA paid as usage fees by users of the Network. In a sense, running a Broker node is somewhat comparable to mining, but instead of solving a CPU/GPU-bound artificial problem and wasting energy, nodes provide useful network resources which, using the Streamr protocol, collectively produce the message transport service available to applications.
Together, the P2P structure and incentive mechanism enable decentralization. This means that the network can operate without any central party, not even us, controlling your data or generating value out of it.
This conforms to Streamr’s overall goal: to ensure that the world’s realtime data is owned and controlled by those who produce it.
Some decentralized publish/subscribe protocols exist today, such as Whisper in the Ethereum stack and IPFS pubsub in the IPFS stack, but they are very simple and experimental, and not geared towards enabling the massive machine data economy.
What is the Streamr Network today, and what is the roadmap?
The first version of the Streamr Network is already up and running. You can see it powering the Marketplace, Editor, and user-created applications today. The Network currently handles some tens of millions of data points daily. The existing system already provides publish/subscribe messaging, along with an API for applications to hook onto. This functionality is close to what we want to accomplish in the long run. However, big under-the-hood changes are required to reach decentralization.
Below, I’ll detail where we are now, and what we currently envision the major future releases of the Network will be. In general we are following a principle of incremental decentralization, which allows us to maintain a fully functional platform at all times, with new releases increasingly giving up control over the Network.
We’re naming our Network milestones after our favourite jazz pianists: Monk, Corea, Brubeck, and Tatum.
Monk (current version)
The Network today leverages open source big data frameworks such as Apache Kafka and Apache Cassandra. While distributed and scalable, these frameworks are not building blocks for a decentralized system, because they rely on all nodes being trusted. A decentralized system must be designed to tolerate untrusted and potentially malicious actors.
In general, Monk is missing the following three main components, which will enable the Network to decentralize:
- P2P networking: The network topology and message flows must happen without central servers being in control.
- End-to-end encryption: The data will pass through untrusted nodes in a decentralized network, meaning that all non-public data needs to be end-to-end encrypted to guarantee data privacy.
- Incentivization scheme: The users running nodes will need to be rewarded for doing so. In contrast, unwanted behaviour must be disincentivized.
The following three major releases will each focus on implementing one of the above three key features.
Corea
The current Monk network will be replaced with one that implements a P2P network topology and extends the Streamr protocol used in Monk. In the Corea network, all nodes will still be run by us, as the necessary encryption and incentive mechanisms are not yet in place. However, this update gets the Network architecture ready for the path towards decentralization.
So far we’ve done research and built a few prototypes to decide which paths to take and technologies to build on. We have settled on using libp2p to get some potentially useful features such as DHTs and NAT traversal out of the box. We decided to write the node in JS, with Go having been the other main contender. We might switch the language or create a parallel implementation later on, if Go turns out to bring performance benefits or offer more mature libraries. For the time being, JS speeds up development, because much of the protocol and API code can be reused from the Monk network data API.
Internally, we expect to switch to Corea in production 6–9 months from now.
Brubeck
This release adds end-to-end encryption and key distribution mechanisms. With that in place, trusted parties (for example, our partners or select community members) can start running nodes in a production network. At this stage, no incentive mechanisms are yet in place to discourage bad behavior, which is why the nodes need to be trusted. End-to-end encryption provides data privacy, allowing control over the Network to decentralize for the first time, even though it is not fully trustless yet.
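The idea that relaying nodes only ever see opaque bytes can be sketched with nothing but the Python standard library. This is a toy construction for illustration only: it assumes the publisher and subscriber already share a secret key (key distribution is not shown), the function names seal and open_sealed are made up, and a production system should use a vetted authenticated cipher such as AES-GCM or ChaCha20-Poly1305 rather than anything hand-rolled like this.

```python
import hashlib
import hmac
import secrets

NONCE_LEN = 16
TAG_LEN = 32  # length of an HMAC-SHA256 tag


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate keys for encryption and authentication from the shared key.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(enc_key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 over (nonce, counter), concatenated block by block.
    blocks = bytearray()
    counter = 0
    while len(blocks) < length:
        blocks += hmac.new(enc_key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(blocks[:length])


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Publisher side: encrypt, then authenticate. Returns nonce + ciphertext + tag."""
    nonce = secrets.token_bytes(NONCE_LEN)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def open_sealed(key: bytes, envelope: bytes) -> bytes:
    """Subscriber side: verify the tag first, then decrypt."""
    if len(envelope) < NONCE_LEN + TAG_LEN:
        raise ValueError("envelope too short")
    nonce = envelope[:NONCE_LEN]
    ciphertext = envelope[NONCE_LEN:-TAG_LEN]
    tag = envelope[-TAG_LEN:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("message was modified or the key is wrong")
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


shared_key = secrets.token_bytes(32)        # assumed to be distributed out of band
envelope = seal(shared_key, b'{"celsius": 21.5}')
relayed = envelope                          # an untrusted node just forwards the bytes
print(open_sealed(shared_key, relayed))
```

A node without the key can forward the envelope but cannot read it, and if it flips even one bit, the subscriber's tag check fails and the message is rejected.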
Tatum
At this advanced stage anyone can participate in running the Network, including people with dishonest intentions. This release adds the node incentive mechanisms (earning DATA for behaving well, losing DATA for misbehaviour) to guarantee that the network will decentralize further and continue operation for as long as there is demand for it, completely independent of its creators.
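As a purely hypothetical illustration of "earn for behaving well, lose for misbehaving", the sketch below models a node that has locked up some DATA and is settled once per round. The actual incentive design is not specified in this post. The stake, the per-round fee, the 10 percent penalty, and every name in the code are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BrokerAccount:
    stake: int       # DATA locked up by the node operator (hypothetical mechanism)
    earned: int = 0  # DATA earned from usage fees so far


def settle_round(account: BrokerAccount, fee: int, behaved_well: bool,
                 slash_percent: int = 10) -> BrokerAccount:
    """Return the account after one round.

    A well-behaved node collects the round's fee. A misbehaving node earns
    nothing and loses `slash_percent` of its stake (rounded down).
    """
    if not 0 <= slash_percent <= 100:
        raise ValueError("slash_percent must be between 0 and 100")
    if behaved_well:
        return BrokerAccount(account.stake, account.earned + fee)
    penalty = account.stake * slash_percent // 100
    return BrokerAccount(account.stake - penalty, account.earned)


honest = settle_round(BrokerAccount(stake=1000), fee=5, behaved_well=True)
cheater = settle_round(BrokerAccount(stake=1000), fee=5, behaved_well=False)
print(honest, cheater)
```

The point of such a scheme is that cheating has to cost more than it can gain. How misbehaviour is detected and proven is the hard part, and this sketch leaves it out entirely.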
The update from Brubeck to Tatum will be the most difficult and time-consuming one. By the time we start working on Tatum, we expect to have more firepower, because other infrastructure projects will already have solved some of the obstacles in the space. Examples of exciting developments include Plasma for side-chain scaling, Tendermint for BFT consensus, and several inter-blockchain protocols slowly making their way towards production-readiness.
We are constantly on the lookout for technologies we can apply to better reach our goals. As a general rule of thumb, we aim to support, work together with, and build on top of existing best-of-breed technologies and ecosystems such as Ethereum.
So all this is exciting stuff which should absolutely appeal to those with a love of software development and architecture, a passion for decentralization, or just big data and IoT. The next three quarters are going to be very interesting on this front, so stay tuned for more updates. And in the meantime, if you are not already doing so, why not start using the current Network as well as the Marketplace and Editor apps today?