Mo Siam
Zus Network
3 min read · Dec 31, 2018

ON X OFF CHAIN DATA

With the advent of smart contracts and data transactions, storage on the blockchain is becoming the next frontier in the development of blockchain technology. One part of that frontier, storage of the blockchain state, is being addressed by sharding technology. Another is the storage of structured and unstructured external user data.

Structured data such as JSON averages roughly 100 bytes per line. If a typical transaction carries about 10 lines of JSON (roughly 1KB), a network supporting 1000 transactions per second has a data throughput of approximately 1MB per second, equivalent to a bandwidth of 8Mbps. That is right around the global average bandwidth of 7Mbps measured across 200 different countries. Note, however, that this bandwidth is shared between data transactions and value transactions, so the bandwidth actually available for data is lower still.
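The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch that only multiplies the figures given in the text (100 bytes per JSON line, 10 lines per transaction, 1000 transactions per second); the function names are illustrative, and decimal units (1 MB = 10^6 bytes, 1 Mb = 10^6 bits) are assumed.

```python
# Back-of-the-envelope bandwidth estimate using the figures from the text.
BYTES_PER_JSON_LINE = 100   # average size of one line of JSON
LINES_PER_TX = 10           # typical structured payload per transaction
TX_PER_SECOND = 1_000       # network transaction throughput


def data_throughput_bytes_per_s(tx_per_s: int, lines_per_tx: int, bytes_per_line: int) -> int:
    """Raw payload bytes the network must move each second."""
    return tx_per_s * lines_per_tx * bytes_per_line


def to_megabits_per_s(bytes_per_s: float) -> float:
    """Convert bytes/s to megabits/s (8 bits per byte, 1 Mb = 10**6 bits)."""
    return bytes_per_s * 8 / 1_000_000


bps = data_throughput_bytes_per_s(TX_PER_SECOND, LINES_PER_TX, BYTES_PER_JSON_LINE)
print(f"{bps / 1_000_000:.1f} MB/s = {to_megabits_per_s(bps):.1f} Mbps")  # 1.0 MB/s = 8.0 Mbps
```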

Bandwidth aside, every node on the network has to store all the transactions. At the maximum data throughput, the network accumulates roughly 30TB of data a year (1MB per second over about 31.5 million seconds), all of which each node has to store. This is still manageable: in on-disk storage alone, those 30TB cost each node about $1500 a year if the hardware is owned and about 12 times that if hosted, and probably less as cheaper drives come to market and state sharding is eventually realized. Today this is not yet a problem. Ethereum's blockchain, for example, is about 1TB in size, costing a locally set-up node about $50 a year in storage and a hosted one about $600 a year. That said, this architecture clearly has scalability limits, both technical and commercial: nodes must not only hold all this data on disk, but also keep the full state in RAM and have the computational power to process and validate transactions, let alone mine new blocks.
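Extending the same sketch to yearly storage and cost: the per-TB rate below is derived from the text's own figures ($1500 a year per 30TB owned, roughly 12 times that hosted), and a decimal terabyte (10^12 bytes) and a 365-day year are assumed. These are illustrative helpers, not a pricing model.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 s, ignoring leap years
BYTES_PER_TB = 10**12                    # decimal terabyte
OWNED_USD_PER_TB_YEAR = 1500 / 30        # $50/TB/year, implied by "$1500 a year per 30TB"
HOSTED_MULTIPLIER = 12                   # "about x12 that if hosted"


def annual_storage_tb(bytes_per_s: float) -> float:
    """Terabytes accumulated in one year at a constant data throughput."""
    return bytes_per_s * SECONDS_PER_YEAR / BYTES_PER_TB


def annual_cost_usd(tb: float, hosted: bool = False) -> float:
    """Yearly disk cost in USD for `tb` terabytes, owned or hosted."""
    rate = OWNED_USD_PER_TB_YEAR * (HOSTED_MULTIPLIER if hosted else 1)
    return tb * rate


tb_per_year = annual_storage_tb(1_000_000)                     # 1 MB/s from the bandwidth estimate
print(f"{tb_per_year:.1f} TB/year")                            # 31.5 TB/year
print(annual_cost_usd(30), annual_cost_usd(30, hosted=True))   # 1500.0 18000.0
print(annual_cost_usd(1), annual_cost_usd(1, hosted=True))     # 50.0 600.0 (the ~1TB Ethereum example)
```

At 1MB per second the exact figure is about 31.5TB, which the text rounds down to 30TB.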

Nodes can also opt to be miners, but only one miner at a time gets compensated: it receives newly minted tokens as a reward for successfully mining a new block, along with the fees paid by the users whose transactions were included in that block. This raises several questions. Who compensates the unsuccessful miners, the nodes that did not mine the block but still store all the transactions, user data included, on disk? Does the reward for the successful nodes cover the ongoing capex? And what about the nodes that do not mine at all and are there to support the ecosystem via ancillary services? We could go into the details of networks like Ethereum, but most share a common theme: rewards are tightly coupled to the network's consensus mechanism. As a corollary, either it is very costly for users to store data, or service providers such as non-mining nodes and unsuccessful miners are short-changed, or, most commonly, both.

Given the current state of development in the space, today's prevalent network architectures are clearly not intrinsically scalable, owing to technical limitations, diverging commercial incentives between users and nodes, or both. Faced with these scaling difficulties, and driven by the human urge to innovate, the space turned to data hashing, among other often ad hoc solutions, to realize blockchain data storage. But what does this mean holistically? Is the data still on chain? Is data meant to be stored on chain to begin with? And if so, how can all parties come out on top, if at all?
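One common reading of "data hashing" is to keep the payload off chain and record only its fixed-size digest on chain, so that anyone can later check the off-chain copy against it. The sketch below assumes SHA-256 and uses a plain dict and list as hypothetical stand-ins for an external store and the ledger; it illustrates the general idea and is not the scheme of any particular network.

```python
import hashlib
import json

# Hypothetical off-chain store: a dict standing in for any external storage.
off_chain_store: dict[str, bytes] = {}
# Hypothetical ledger: a list of digests standing in for on-chain transactions.
on_chain_ledger: list[str] = []


def store(record: dict) -> str:
    """Keep the full record off chain and append only its SHA-256 digest on chain."""
    # Canonical JSON (sorted keys, no extra whitespace) so equal records hash equally.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    digest = hashlib.sha256(payload).hexdigest()
    off_chain_store[digest] = payload
    on_chain_ledger.append(digest)
    return digest


def verify(digest: str) -> bool:
    """True if the digest is on chain and the off-chain copy still hashes to it."""
    payload = off_chain_store.get(digest)
    return (
        payload is not None
        and digest in on_chain_ledger
        and hashlib.sha256(payload).hexdigest() == digest
    )


digest = store({"user": "alice", "reading": 21.5})
print(digest[:16], verify(digest))
```

Whatever the record's size, only a 32-byte digest (64 hex characters) lands on chain, which relieves the bandwidth and storage pressure estimated earlier; the trade-off is that the data itself now lives somewhere else.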

Stay tuned as we dive into the maze of exciting new storage technologies being built on top of blockchain!

To be continued …