Extending the Tangle network to a distributed cloud

Some thoughts on the technical and social architecture of a machine-economy.

Pritam Roy
The Startup
Feb 17, 2018


What is needed for the future to happen? [Image source]

The idea of data-markets backed by digital currencies has been floating around for some time now. This article talks about the software infrastructure needed to build such data-markets and what such markets might look like using the IOTA protocol as the hypothetical “currency” for data-exchange.

Throughout this article, we treat IOTA as a handshake protocol and an immutable ledger of machine-to-machine transactions rather than as a platform for smart contracts or as a speculative currency.

The currency to run a machine economy should have the following features —
Support micro-transactions.
Individual machine transactions may be orders of magnitude smaller than typical human transactions and far higher in volume; ideally, network fees should be very close to zero so that micro-transactions remain viable at any scale.
Support transactions at scale.
Machine transactions will be far more frequent and will quickly outnumber human transactions when introduced into any market; for an example, check the order book at any exchange with a public API.
Provide security to the network.
The cryptographic hash generated through an IOTA handshake can then be used to secure the data. The transaction protocol is, in a sense, both the backbone of such a network and its security blanket.

When such a network is extended to include high-powered, cloud-based nodes running specialized storage and processing software, we can enable a new ecosystem of real-time decision-making through machine intelligence, combining the localized decision-making of the network with the collective wisdom of the cloud.

An incentive-based society rewarding good actors in real time.

The future of governance will likely be incentive-based.

The future of governance will perhaps be about creating economic incentives for actors to behave responsibly. If you drive within the limits, the system rewards you in micro-payments and your insurance rates go down in real time. Eventually, the only actors who oppose the system and refuse to share their data will be the bad actors; game theory and market equilibrium will propel society into this machine economy.

But wait.. what do we need the cloud for?

Or isn’t IOTA supposed to drive perfect networked systems?

OK, let’s imagine a perfect fog network where every machine is given equal weight (computing power) and workloads are evenly distributed among all nodes.
But.. that’s not how it works in the real world!
Exactly.. your Mac has more computing power than your fridge, and your Mac has less computing power than a c3.4xlarge machine sitting in an AWS data-center.

Phones usually don’t want to run computation-heavy workloads, nor do you want to download the software onto them to do so.

Even in a fog scenario, certain kinds of computations will be better suited to certain classes of machines than others, and hence even in the perfect fog, clouds will emerge..

The Cloud in this context is a node in the network with a reasonable amount of storage to run a data-storage process and enough processing power to run a compute-cluster process, assumed to be hosted on a given “Infrastructure as a Service”. The cloud machine also has its own wallet, usually runs a full node, and performs micro-transactions at scale while running distributed storage and analytics processes on the data.

In this future, where the concept of ownership is made obsolete, the devices we use and the sensors in the public domain will essentially become fleets run by corporations, and corporations need to converge data from various sources. This is why the swarm must interact with a trustless cloud to drive broader business decisions. Each node has its own view of the world and provides a unique insight, which could also form the basis of deeper learning models.

This discussion attempts to delve a bit deeper than mere nomenclature, explore the possibilities that such data-driven economies would create and the infrastructure suited to them, and discuss the possibility of building that infrastructure with the SKY stack, a distributed, software-only streaming-data stack.

So what use-cases will this “machine-economy” enable?

IOTA as it stands today is a quaint cryptocurrency without many use-cases beyond Binance and Bitfinex. In the future, however, combined usage by machines at scale can create networked intelligence which, together with fast-data technology, can enable massive data-markets that will dwarf the databases of today. The fact that your machines (you) will sell associated device data in real time can (will) enable cool (or creepy) use-cases like data-markets where your personal (device) data is bought and sold.

In this universe people sell their data for a living

But why would anybody be interested in the pings from my Apple Watch, you ask? Well, it’s not just you but who you are at a given point.. your smart building may want to know when you are around so that it can get a headcount and adjust the building temperature; this will help it cut costs, and it’ll pay your device for streaming pings.

The restaurant you visit may want to know if they have seen you before so that they can provide recommendations, order supplies in real time, and so on. It is your situational data, combined with that of others in the network to form complex relationships at a given time, that can help a business make a decision or reach a goal, and they’ll pay your device for it.. in a constant stream of usage-based micro-payments where nothing is lost to transaction fees.

And what about the software stack?

A software stack that handles the scale of connected cities must support massively parallelized, distributed, real-time computing and storage.

  1. Data streams must be cryptographically secure. This is achieved through a proof-of-work-type system where both the source and the sink must perform some work (in this case, approve two other transactions) to secure the network.
  2. The computation layer must be distributed and support acyclic data flow. To enable fast analytics at the massive scale of public IOTA networks, the computation layer should be memory-based and run optimized DAG-based queries against storage.
  3. The IOTA network propagates data through an implementation of the Gossip protocol and is, as such, eventually consistent. The data model needs to be strongly consistent in order to be the source of truth for transactional use-cases, support distributed ACID transactions, and handle a variety of workloads. You can read an excellent description of the CAP theorem here. There is also an interesting paper on learning eventual consistency through baseball. You can read an introduction to the Raft protocol here. You can also read YugaByte-specific implementation details here.
  4. The data and compute model needs to be trustless, massively distributed, scalable to hundreds of nodes, and able to support high-volume concurrent reads and writes.
  5. The data layer must support multiple APIs to enable the kinds of models most intuitive to a given machine-economy use-case. Business requirements based on the Tangle network may be most intuitively modeled through a graph API. Requirements for massive parallel computing and real-time streaming analytics are more easily modeled in denormalized SQL or CQL models, where we don’t want to do expensive joins in real time and duplication is acceptable. Requirements for extremely fast cache reads may be best modeled in a Redis-like API.
Architecture of a machine-economy

How do we model the connected machines?

How do you model a public IoT fleet with hundreds of thousands of nodes, each performing thousands of micro-transactions? Is this a graph problem or a time-series problem? Use-cases that model the Tangle graph are most intuitively expressed over a graph API, while more query-centric or time-series-oriented views may require modeling over the CQL API. The sensor-data emissions themselves can also be modeled as a time-series problem. While some models are represented better in SQL, others, such as fast analytics on streaming data, are better represented in columnar CQL models.

Data cannot exist in silos
The IOTA chain is partition-tolerant and thus allows for offline transactions. In a massive public network, nodes could span major public clouds or on-prem datacenters, so the data layer connecting IOTA nodes must be infrastructure-agnostic, and the network as a whole must be able to update its state of the world.

A bottom-up view of the architecture

The view from the Cloud

Apache Spark

What do Apache Spark and the Tangle have in common? They both use directed acyclic graphs (DAGs) to perform their computations. A Spark network can perform analytic queries on the edge and stream the data to sinks or write back to storage. This cleaned, real-time analytics data can then be further streamed to clients via 3rd-party data-markets, which may perform a similar handshake on their side.

Spark Streaming models with the SKY stack. [Source]

Here’s what a typical SKY stack architecture looks like. A “network-aware” Spark architecture can interface seamlessly with the YugaByte storage engine via the Cassandra API.
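As a rough sketch of what that interface could look like, here is a Spark Structured Streaming job that reads heartbeats from Kafka and writes them to YugaByte through the Cassandra API. The topic, keyspace, and host names are assumptions chosen for illustration, not details of a real deployment.

# Sketch: stream vehicle heartbeats from Kafka into YugaByte via the Cassandra API.
# Assumes the spark-sql-kafka and Spark Cassandra connector packages are available;
# all topic/keyspace/host names below are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json, col
from pyspark.sql.types import StructType, StringType, TimestampType, FloatType

spark = (SparkSession.builder
         .appName("sky-stack-sketch")
         .config("spark.cassandra.connection.host", "yugabyte-host")  # YCQL endpoint
         .getOrCreate())

schema = (StructType()
          .add("vehicle_id", StringType())
          .add("event_ts", TimestampType())
          .add("vehicle_lat", FloatType())
          .add("vehicle_long", FloatType()))

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "kafka-host:9092")
       .option("subscribe", "vehicle-emissions")
       .load())

# Parse the JSON heartbeat payload out of the Kafka message value.
emissions = (raw.select(from_json(col("value").cast("string"), schema).alias("e"))
                .select("e.*"))

# Append each micro-batch to the vehicle_emissions table over the Cassandra API.
query = (emissions.writeStream
         .foreachBatch(lambda df, epoch_id: df.write
                       .format("org.apache.spark.sql.cassandra")
                       .options(keyspace="traffic", table="vehicle_emissions")
                       .mode("append")
                       .save())
         .start())
query.awaitTermination()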

YugaByte DB

YugaByte is a fast, strongly consistent, distributed, scale-out database designed to handle ever-growing data networks like that of the Tangle.

Elastic data cluster with support for Redis and CQL APIs

It achieves consensus among nodes using a modified version of the Raft protocol.

The design choice is to give up availability in favor of consistency when a failure occurs, while limiting the loss of availability to the few seconds it takes for new tablet leaders to be elected. It supports the Cassandra and Redis APIs out of the box and can be plugged into graph APIs, which allows for models that fit various streaming transactional-data use-cases with strong consistency guarantees. The storage engine beats open-source Cassandra in YCSB benchmarks, provides sub-millisecond latency in networked environments, and works out of the box on AWS, GCP, or on-prem. You can see comparisons with other distributed SQL and NoSQL databases here.
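Because YugaByte speaks the Cassandra wire protocol, a standard Cassandra client can talk to it directly. A minimal sketch, assuming a node reachable at yugabyte-host and the traffic keyspace used in the data models later in this article:

# Sketch: create the emissions table on YugaByte's Cassandra-compatible API
# with the standard Python driver. Host and keyspace names are illustrative.
from cassandra.cluster import Cluster

cluster = Cluster(["yugabyte-host"], port=9042)  # YCQL listens on the Cassandra port
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS traffic
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS traffic.vehicle_emissions (
        vehicle_id varchar,
        event_ts timestamp,
        vehicle_lat float,
        vehicle_long float,
        location text,
        PRIMARY KEY (vehicle_id, event_ts))
""")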

JanusGraph

JanusGraph is a graph query layer for both OLTP and OLAP systems and works quite well with distributed, scale-out databases. It implements the Gremlin traversal language and is optimized for storing a large number of vertices. Graphs can provide powerful, intuitive query models for relationships that naturally represent networks. You can read more about ecosystem integration with the storage layer here.
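For intuition, a relationship query through gremlinpython might look like the sketch below. The server address and the vehicle/transacted_with labels are assumptions made for illustration.

# Sketch: a Gremlin traversal against JanusGraph via gremlinpython.
# The endpoint and the vertex/edge labels are illustrative.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection("ws://janusgraph-host:8182/gremlin", "g")
g = traversal().withRemote(conn)

# Which devices has vehicle CA1234 transacted with?
peers = (g.V().has("vehicle", "vehicle_id", "CA1234")
          .out("transacted_with")
          .values("device_id")
          .toList())
print(peers)
conn.close()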

Apache Kafka

Apache Kafka is a distributed streaming platform, used here as a data collector for the streaming devices. It is in this layer that the data is decrypted and the IOTA transactions are made.
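On the device side, the heartbeat producer could be as simple as the sketch below, using kafka-python. The topic name and payload fields are assumptions chosen to match the vehicle-emission models later in this article.

# Sketch: a device emitting heartbeat pings into Kafka via kafka-python.
# Topic name and payload fields are illustrative.
import json
import time

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="kafka-host:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"))

while True:
    heartbeat = {
        "vehicle_id": "CA1234",
        "event_ts": int(time.time() * 1000),
        "vehicle_lat": 37.7596,
        "vehicle_long": -122.4269,
    }
    producer.send("vehicle-emissions", heartbeat)
    time.sleep(5)  # fixed-period pings, as in the simulation described below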

IOTA API

IOTA transactions can be used to cryptographically sign and verify the correctness of streaming data.
You can also read more about the Masked Authenticated Messaging protocol here. IOTA fits into the ecosystem as its de-facto currency. The transaction will also generate an access key which will be used to cryptographically secure the streams.
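To make the access-key idea concrete, here is a minimal sketch that signs and verifies stream payloads with a plain HMAC. This is a stand-in for illustration only; the actual key material and scheme would come from the IOTA handshake and Masked Authenticated Messaging, not from this code.

# Sketch: securing stream payloads with an access key. A plain HMAC stands in
# for whatever scheme the IOTA handshake / MAM actually provides.
import hashlib
import hmac
import json

def sign_payload(access_key: bytes, payload: dict) -> str:
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(access_key, body, hashlib.sha256).hexdigest()

def verify_payload(access_key: bytes, payload: dict, signature: str) -> bool:
    return hmac.compare_digest(sign_payload(access_key, payload), signature)

key = b"access-key-from-the-handshake"  # illustrative key material
ping = {"vehicle_id": "CA1234", "event_ts": 1718000000}
sig = sign_payload(key, ping)
assert verify_payload(key, ping, sig)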

Simulating a Machine Economy

Traffic Management in the cities of the future

Your Mustang’s wallet was just charged 1k MIOTA and here’s your ticket..

In a world-view where compliant vehicles have a GPS emitter and a multi-sig wallet, the city traffic system monitors all running vehicles in real time. The system sends IOTA to vehicle wallets in return for heartbeats with a JSON payload over a fixed period of time. This data can be used to form complex models in real time and answer many analytical as well as general queries about the state of the world at any given point.

Premise

Each vehicle emission contains the coordinates of the vehicle and the time-stamp at which it was sent. It is collected into the Kafka sink and fed to a YugaByte node. YugaByte can run Spark on the same nodes as the database.

Handshake

The handshake can be done via a protocol similar to the open-authentication protocols we have today. Each transfer will create an access token and a refresh token; every X seconds the access token will expire and a new one will be generated with a micro-payment of n IOTA.
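A minimal sketch of that rotation follows; the expiry window, the price per refresh, and the pay_iota hook are all placeholders for the unspecified X and n above.

# Sketch: OAuth-style token rotation paid for with micro-payments.
# EXPIRY_SECONDS and PRICE_IOTA stand in for the unspecified X and n.
import secrets
import time

EXPIRY_SECONDS = 30  # placeholder for "every X seconds"
PRICE_IOTA = 1       # placeholder for "n IOTA"

class StreamSession:
    def __init__(self):
        self.refresh_token = secrets.token_hex(16)
        self._rotate()

    def _rotate(self):
        self.access_token = secrets.token_hex(16)
        self.expires_at = time.time() + EXPIRY_SECONDS

    def get_access_token(self, pay_iota):
        # When the access token lapses, a micro-payment buys a fresh one.
        if time.time() >= self.expires_at:
            pay_iota(PRICE_IOTA)
            self._rotate()
        return self.access_token

session = StreamSession()
token = session.get_access_token(pay_iota=lambda amount: print(f"paid {amount} IOTA"))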

Data Models

Let’s create two simple CQL data models and show some powerful queries we can run. (Production time-series models may need to be partition- and cluster-aware; check out this tutorial to learn more.)

// CQL can be used for transaction-hash models and business models
// based on device-metadata use-cases.
// A simple model of vehicle emissions by time-stamp
CREATE TABLE vehicle_emissions
(vehicle_id varchar,
event_ts timestamp,
vehicle_lat float,
vehicle_long float,
location text,
PRIMARY KEY (vehicle_id, event_ts));
// Another model of vehicle emissions by street
CREATE TABLE vehicle_on_street_events
(vehicle_id varchar,
event_ts timestamp,
street_id varchar,
PRIMARY KEY (street_id, event_ts, vehicle_id));

Real-Time Intelligence

Once we have a stream of lat-longs emitted by a vehicle, we can use it to approximate the speed (velocity) in near real time by feeding it to Spark Streaming, combined with historical data reads.

The SKY stack which consists of Spark, Kafka and YugaByte can be used to model such use-cases.

Get me the speed of the car with uuid CA1234 over the last 20 minutes.

SELECT vehicle_lat, vehicle_long, location
FROM vehicle_emissions
WHERE vehicle_id = 'CA1234'
AND event_ts > '2026-06-03 07:01:00'
AND event_ts < '2026-06-03 07:21:00';
// Between any two pings
// distance = √((x2-x1)^2 + (y2-y1)^2)
// velocity = distance / interval_between_pings
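Here is a small sketch of that computation over the result set, using the planar distance formula from the comment above (a flat-plane approximation that ignores degree-to-metre scaling; fine for illustrating the idea, not for production geodesy).

# Sketch: approximate speed between consecutive pings using the planar
# distance formula above. Pings are (epoch_seconds, lat, long) tuples.
import math

def speeds(pings):
    # Yield the approximate speed between each pair of consecutive pings.
    for (t1, x1, y1), (t2, x2, y2) in zip(pings, pings[1:]):
        distance = math.sqrt((x2 - x1) ** 2 + (y2 - y1) ** 2)
        interval = t2 - t1
        if interval > 0:
            yield distance / interval

pings = [(0, 37.7596, -122.4269), (5, 37.7601, -122.4264), (10, 37.7610, -122.4255)]
print(list(speeds(pings)))  # degrees/second; scale to metres for real use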

If your access patterns dictate that after this you do an operation like —

Get me the details of the vehicle with vehicle_id CA1234

— then rather than doing a join in the application layer, you update your CQL model to include the new data (based on the premise that storage is cheap and it is OK to duplicate data as long as we can reduce seeks).

Alerting

Aggregated streaming location data can lead to powerful inferences.

SELECT vehicle_lat, vehicle_long, location
FROM vehicle_emissions
WHERE vehicle_id = 'CA1234'
AND event_ts > '2026-06-03 07:01:00'
AND event_ts < now();
> Pipe the result-set to an inference engine to compare vehicle movement patterns against normal vehicle movement patterns on the road.

ALERT: Erratic driving patterns observed on Highway 280. Possible DUI.
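One way such a comparison might look is sketched below, with the “normal pattern” reduced to a simple speed-variance threshold. Both the threshold and the heuristic are stand-ins for a real inference model.

# Sketch: flag erratic driving when speed variance exceeds a threshold.
# The threshold and the variance heuristic are illustrative stand-ins.
import statistics

ERRATIC_VARIANCE = 25.0  # illustrative threshold

def check_erratic(vehicle_id, speed_samples):
    if len(speed_samples) < 2:
        return
    if statistics.variance(speed_samples) > ERRATIC_VARIANCE:
        print(f"ALERT: Erratic driving patterns from {vehicle_id}. Possible DUI.")

check_erratic("CA1234", [28.0, 61.0, 15.0, 70.0, 22.0])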

3rd party Data-Markets

  1. Insurance
    The system can pipe data to insurance providers, who can make inferences for insurance claims, such as reconstructing a collision from vehicle speeds within a time frame.
SELECT vehicle_lat, vehicle_long, location
FROM vehicle_emissions
WHERE vehicle_id = 'CA1234'
AND event_ts > '2026-06-07 09:01:00'
AND event_ts < '2026-06-07 09:04:00';
// Compute deceleration (the rate of change of speed) between consecutive heartbeats.
// Infer whether the car in front braked suddenly, causing the car behind to bump into it.
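A sketch of that inference, building on the speed idea from earlier; the hard-braking threshold is an illustrative value, not a calibrated one.

# Sketch: deceleration between consecutive heartbeats; a large negative value
# suggests hard braking. The threshold is illustrative.
def decelerations(timed_speeds):
    # timed_speeds: list of (epoch_seconds, speed) pairs.
    for (t1, v1), (t2, v2) in zip(timed_speeds, timed_speeds[1:]):
        if t2 > t1:
            yield (v2 - v1) / (t2 - t1)  # negative => braking

HARD_BRAKE = -7.0  # roughly m/s^2; illustrative
samples = [(0, 26.0), (2, 25.5), (4, 3.0)]
if any(a < HARD_BRAKE for a in decelerations(samples)):
    print("Hard braking detected in the window before impact.")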

2. Forensics
The system can pipe data to forensics teams and detectives, who can then use it to investigate cases.

Give me the UUIDs of all cars which turned left from Dolores on Saturday, 12th Nov, between 1 AM and 2 AM.

SELECT vehicle_id
FROM vehicle_on_street_events
WHERE street_id = 'CA4433'
AND event_ts > '2026-11-12 01:00:00'
AND event_ts < '2026-11-12 02:00:00';

And some more “ETL-like” queries —

Give me the transactions made by all vehicles that went over 60 mph in the last month.

Get me all the places that the vehicle with uuid CA1234 has gone over the past month.

So where do we go from here?

Finally, these musings are made possible by open-source projects that stoke our imagination and the devs working to make them possible.
Check out these cool software integrations and see if you can come up with some interesting, futuristic use-cases of your own. :)

YugaByte DB — Distributed source-of-truth database [CP database with high availability]. (Docs)

Apache Spark — Distributed compute cluster using DAGs to perform computations in acyclic order. (Docs)

IOTA JS — JS implementation of the IOTA library.

Apache Kafka — Process, store, publish, and subscribe to events in real time. (Docs)

Interesting related reads —

Distributed ACID Transactions

Sample SKY stack integration

Masked Authenticated Messaging

CAP Theorem explained by an original author of Cassandra.

I have also begun somewhat of an effort to actually model some of these simulations; the GitHub repo can be found here.
Remember, if you liked the story, there are 50 ways (claps) to show your appreciation. ;)



Pritam Roy
The Startup

Activist-developer, CMU Alum, Founder/Developer at Kashmere Labs