Storage unchained: Storing IoT data in Web 3.0
In IoT, data collected directly from sensors and devices is called Raw Data. Once the raw data has been processed to extract business-relevant information, it is treated as Processed Data, which the system stores for an extended period.
This article is part of a series about IoT in Web 3.0.
This installment focuses on storage. You can build a stateless distributed system and scale it across multi-zone or multi-region infrastructure, but things get complicated as soon as data has to be stored.
CAP Theorem
A quick detour on the CAP theorem to set expectations: there is no plan to bend the laws of physics and math here. A distributed storage solution that scales across regions will still comply with the CAP theorem.
In Blockchain, according to this article, you choose between AP and CP. Let me quote this section from the article's conclusion.
The real point is that depending on how one configures one’s clients, one can choose — in Bitcoin in particular — and blockchains in general if the entire system will be AP or CP.
Data Gravity in Cloud Providers
The CAP theorem is theory. In practice, a centralized storage solution will always compromise availability unless you go multi-cloud and implement a proper cross-cloud, multi-region replication layer.
Moving data across cloud providers and regions adds significant egress traffic cost. Most cloud providers charge nothing for ingress but relatively high rates for egress. Moving data can also bring technical challenges if the storage APIs are not compatible. That problem is insignificant as long as local storage is NFS or a Unix file system and object storage is S3-compatible.
Decentralized Storage.
Web 3.0 focuses on decentralization, and Blockchain is one of its core backbone technologies. A blockchain stores data and system state, but it is not well suited to holding large amounts of data. On Ethereum, for example, the cost becomes unreasonable.
I will distinguish two categories of data in Web 3.0.
- On-chain data: Data stored on the blockchain, usually metadata, hashes, or valuable data that the owner is willing to pay to keep forever. In that case, Arweave can be a more practical option than Ethereum or other Ethereum-inspired chains.
- Off-chain data: Here, things get a little Wild West. There are many projects and protocols: Sia, Storj, SCPrime, Filecoin, IPFS, and many others.
A Dive into off-chain data Storage options
Not all generated data is business-relevant. Off-chain storage is the more practical approach, especially for IoT raw data. Users may also need to delete their data for regulatory reasons such as GDPR. Keeping data permanently on-chain is a niche use case, suited to NFTs, artwork, or blockchain data itself.
Decentralized off-chain solutions use a distributed protocol. They ensure data resiliency either by replicating the file on multiple nodes, as IPFS Cluster does, or by using erasure coding.
Erasure coding builds on an old, trusty error-correction technique used in CD-ROMs and DVDs, and it has been adopted by most tokenized storage networks, such as Storj, Filecoin, and Sia. Some protocols combine replication and erasure coding.
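To make the idea of erasure coding concrete, here is a minimal sketch in Node.js. It uses a single XOR parity shard, the simplest possible erasure code, and not the Reed-Solomon style codes that optical discs and production storage networks rely on. The function names (encode, recover, xorBuffers) and the sample telemetry payload are assumptions made for illustration only.

```javascript
// Toy erasure code: k data shards plus one XOR parity shard.
// Any single missing shard can be rebuilt from the remaining ones.

// XOR equal-length buffers together byte by byte.
function xorBuffers(buffers) {
  const out = Buffer.alloc(buffers[0].length);
  for (const buf of buffers) {
    for (let i = 0; i < out.length; i++) out[i] ^= buf[i];
  }
  return out;
}

// Split data into k equal shards (zero-padded) and append a parity shard.
function encode(data, k) {
  const shardSize = Math.ceil(data.length / k);
  const padded = Buffer.alloc(shardSize * k);
  data.copy(padded);
  const shards = [];
  for (let i = 0; i < k; i++) {
    shards.push(padded.subarray(i * shardSize, (i + 1) * shardSize));
  }
  shards.push(xorBuffers(shards)); // parity shard
  return shards;
}

// Rebuild the shard at lostIndex by XOR-ing all surviving shards.
function recover(shards, lostIndex) {
  return xorBuffers(shards.filter((_, i) => i !== lostIndex));
}

const telemetry = Buffer.from('{"sensor":"t1","temp":21.5}');
const shards = encode(telemetry, 3);
const original = Buffer.from(shards[1]); // keep a copy for comparison
const rebuilt = recover(shards, 1);      // pretend shard 1 was lost
console.log(rebuilt.equals(original));
```

With k = 3, the four shards take about 1.33 times the original size and survive the loss of any one shard, whereas one full replica survives the loss of one copy at 2 times the size. Real erasure codes generalize this to tolerate several lost shards.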
A protocol such as Storj provides an S3 gateway, either hosted or self-deployed, that offers a drop-in replacement for cloud object storage.
Comparing those storage options is out of the scope of this article, so I will focus only on IPFS.
Interplanetary File System or IPFS for short
IPFS is not a blockchain, but it shares many aspects and technologies with blockchains.
Immutable: It uses hashes. Where Bitcoin uses a Merkle tree, IPFS adopts a Merkle DAG. Bitcoin hashes each transaction and works up the tree to hash the entire block. In IPFS, the user provides a file, and IPFS splits it into chunks and hashes each chunk into a multi-level Merkle tree.
Deterministic: The same data generates the same hash. In both cases, the hash is not taken directly over the file itself, or over the block in the case of Bitcoin. The hash is the root node of the Merkle tree.
Permissionless: Blockchain and IPFS are permissionless. Any node can participate in the network.
Peer to Peer: No central server or governance organization controls the service, and it works directly on top of the Internet's TCP/IP.
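The chunk-and-hash idea behind the Immutable and Deterministic properties above can be sketched with Node's built-in crypto module. This is only an illustration of a Merkle root over file chunks. It does not produce real IPFS CIDs or follow the actual IPFS block format, and the chunk size of 8 bytes and the function names are assumptions chosen to keep the example small.

```javascript
const { createHash } = require('node:crypto');

const sha256 = (buf) => createHash('sha256').update(buf).digest();

// Split a buffer into fixed-size chunks (the last one may be shorter).
function chunk(data, size) {
  const chunks = [];
  for (let i = 0; i < data.length; i += size) {
    chunks.push(data.subarray(i, i + size));
  }
  return chunks;
}

// Hash every chunk, then hash pairs of hashes level by level up to one root.
function merkleRoot(data, chunkSize) {
  let level = chunk(data, chunkSize).map(sha256);
  if (level.length === 0) level = [sha256(Buffer.alloc(0))];
  while (level.length > 1) {
    const next = [];
    for (let i = 0; i < level.length; i += 2) {
      // An unpaired node at the end of a level is carried up unchanged.
      next.push(i + 1 < level.length
        ? sha256(Buffer.concat([level[i], level[i + 1]]))
        : level[i]);
    }
    level = next;
  }
  return level[0].toString('hex');
}

const a = merkleRoot(Buffer.from('temp=21.5;temp=21.7;temp=21.6'), 8);
const b = merkleRoot(Buffer.from('temp=21.5;temp=21.7;temp=21.6'), 8);
const c = merkleRoot(Buffer.from('temp=21.5;temp=21.7;temp=99.9'), 8);
console.log(a === b); // true: same data, same root
console.log(a === c); // false: one changed chunk changes the root
```

Because the root depends on every chunk hash, changing any byte of the data yields a different address, which is what makes content-addressed data effectively immutable.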
Despite the similarities in technology, they have fundamentally different goals. IPFS's primary goal is to store relatively large amounts of data, and it does not intend to store them for long. Nodes have a garbage collection mechanism that cleans up unused data unless you pin the data on your node or use a pinning service or a persistence layer such as Filecoin.
IPFS to store IoT raw data
Using IPFS is relatively simple. It mainly comes down to executing this command:
ipfs add <filename>
Or programmatically, for example in Node.js:
import * as IPFS from 'ipfs-core'
// Start an in-process IPFS node.
const ipfs = await IPFS.create()
// Add the content; IPFS returns its content identifier (CID).
const { cid } = await ipfs.add('Hello world')
console.info(cid)
The developer adds sensor telemetry to IPFS, and IPFS generates a CID (content identifier) that makes the content addressable.
However, each entry gets its own CID, and for practical data processing the processor or data scientist needs a way to access all of that information. An alternative approach is to use IPNS, an additional addressing layer on top of IPFS that lets the developer use IPFS for mutable data. Instead of creating a file for each entry, append each entry to a single file and publish it to IPNS:
// The address of your files.
const addr = '/ipfs/QmbezGequPwcsWo8UL4wDF6a8hYwM1hmbzYv2mnKkEWaUp'
ipfs.name.publish(addr).then(function (res) {
  console.log(`https://gateway.ipfs.io/ipns/${res.name}`)
})
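The reason a mutable name is needed can be illustrated without a running IPFS node. In the sketch below, two in-memory maps stand in for IPFS and IPNS. They are hypothetical stand-ins, not the IPFS API, and the names add, appendEntry, and readLog, as well as the 'sensor-t1' name, are assumptions made for this example.

```javascript
const { createHash } = require('node:crypto');

// In-memory stand-ins (hypothetical, not the IPFS API):
const blocks = new Map(); // content address -> content (plays the role of IPFS)
const names = new Map();  // stable name -> latest address (plays the role of IPNS)

// Store content under the hash of its bytes and return that address.
function add(content) {
  const address = createHash('sha256').update(content).digest('hex');
  blocks.set(address, content);
  return address;
}

// Append one telemetry entry to the log behind `name` and re-point the name.
function appendEntry(name, entry) {
  const previous = names.has(name) ? blocks.get(names.get(name)) : '';
  const address = add(previous + JSON.stringify(entry) + '\n');
  names.set(name, address);
  return address;
}

// Resolve the name to the current log and parse it line by line.
function readLog(name) {
  const log = blocks.get(names.get(name)) || '';
  return log.split('\n').filter(Boolean).map((line) => JSON.parse(line));
}

const first = appendEntry('sensor-t1', { ts: 1, temp: 21.5 });
const second = appendEntry('sensor-t1', { ts: 2, temp: 21.7 });
console.log(first !== second);            // true: each append yields a new address
console.log(readLog('sensor-t1').length); // 2: the name resolves to the full log
```

Every append changes the content address, so consumers follow the stable name rather than any single address. The trade-off in this naive form is that each append rewrites the whole log.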
Those approaches should work, but they still feel a bit clumsy and hacky. For a database experience, there is OrbitDB, another project built on top of IPFS that brings databases to a decentralized peer-to-peer system.
OrbitDB: Peer-to-Peer Databases for the Decentralized Web
OrbitDB's official website describes it as follows.
OrbitDB is a serverless, distributed, peer-to-peer database. OrbitDB uses IPFS as its data storage and IPFS Pubsub to automatically sync databases with peers. It's an eventually consistent database that uses CRDTs for conflict-free database merges, making OrbitDB an excellent choice for decentralized apps (dApps), blockchain applications, and offline-first web applications.
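To give a feel for what a conflict-free merge means, here is a minimal sketch of one of the simplest CRDTs, a grow-only set. It is not OrbitDB's implementation, which is more elaborate. The entry keys such as 't1@1' (sensor id and sequence number) are a hypothetical convention, and the sketch assumes that a given key always maps to the same value.

```javascript
// Grow-only set (G-Set): replicas only ever add entries, and merging is a
// set union, so merges commute and repeated merges change nothing.
// Assumption: a key identifies one immutable entry (same key, same value).
function merge(replicaA, replicaB) {
  return new Map([...replicaA, ...replicaB]);
}

// Two edge nodes record readings while disconnected from each other.
const nodeA = new Map([['t1@1', 21.5], ['t1@2', 21.7]]);
const nodeB = new Map([['t1@2', 21.7], ['t2@1', 48.0]]);

const ab = merge(nodeA, nodeB);
const ba = merge(nodeB, nodeA);

const keys = (m) => [...m.keys()].sort();
console.log(keys(ab)); // three distinct entries
console.log(JSON.stringify(keys(ab)) === JSON.stringify(keys(ba))); // true
```

Whichever order the replicas sync in, they converge on the same set of entries, which is the property an eventually consistent database needs.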
Since we are also using IPFS Pubsub for the messaging modules, we can have a straightforward, uniform design. The edge node of the decentralized IoT system will consist of the following.
- IPFS daemon: Serves as the P2P connectivity layer.
- MQTT Broker: Enables the edge node to publish and subscribe to messages.
- MQTT-IPFS Bridge: Distributes the events over the P2P network.
- OrbitDB: Connects to the P2P network through the IPFS daemon.
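The data flow between the components listed above can be sketched in-process. Everything here is a stand-in: Node's EventEmitter plays the role of both the MQTT broker and the IPFS Pubsub topic, and a plain array plays the role of an OrbitDB log. The topic name 'sensors/t1' and the payload shape are assumptions, and a real edge node would use an actual MQTT client and the IPFS daemon instead.

```javascript
const { EventEmitter } = require('node:events');

// Hypothetical in-process stand-ins for the real components.
const mqttBroker = new EventEmitter(); // local MQTT broker on the edge node
const p2pPubsub = new EventEmitter();  // IPFS Pubsub topic shared by peers
const database = [];                   // plays the role of an OrbitDB log

// MQTT-IPFS bridge: forward every local MQTT message to the P2P network.
mqttBroker.on('message', (topic, payload) => {
  p2pPubsub.emit(topic, payload);
});

// A peer subscribed to the topic stores what it receives.
p2pPubsub.on('sensors/t1', (payload) => {
  database.push(JSON.parse(payload));
});

// A sensor publishes a reading to the local broker.
mqttBroker.emit('message', 'sensors/t1', JSON.stringify({ ts: 1, temp: 21.5 }));
console.log(database);
```

The bridge is the only component that knows about both sides, so sensors keep speaking plain MQTT while peers only see the P2P topic.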
Conclusion:
Web 3.0 pursues decentralization, but that does not mean everything must be on a blockchain. IPFS and similar technologies such as BitTorrent provide decentralization for fundamentally different use cases.
IPFS no longer views the Internet in Web 3.0 as a cloud of interconnected clusters of devices. It leans toward a forest of Merkle trees: the trees look independent, but their roots crawl over each other through the DHT P2P discovery algorithm.