Public vs. Private, Permissionless vs. Permissioned, and the Recognition of Data (A Matter of Contexts)

The following post contains some philosophical and architectural reflections on the current state of blockchain technology and represents nothing other than my personal point of view.

There is not enough material, from a research perspective, to help us understand what this separation between the consumer and enterprise sides of blockchain is really about, and why some people, myself included, don’t believe in the utopia of a single data store.

It has always made sense to store data in a single repository when something needs to be enforced, for example constraints.

This is one of the reasons why it makes so much sense for the transactions of a token (e.g., bitcoin) to be stored in a “shared” database (in most cases, the ledger) among all interested parties.

Each new transaction depends on the previous one, and the next one will depend on the transaction just created. So, without the whole history, ownership of those tokens can’t be guaranteed to the stakeholders.
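To make that dependency concrete, here is a minimal Python sketch of a token whose transfers are linked by hashes. It is a deliberately simplified model, not Bitcoin’s actual transaction format: there are no signatures and no UTXO set, owners are plain strings, and the names `Transfer` and `current_owner` are hypothetical. The point is only that the current owner can be established solely by walking the whole history from issuance.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Transfer:
    prev_hash: str  # digest of the transfer being spent ("" for the issuance)
    sender: str
    receiver: str

    def digest(self) -> str:
        # Hash a canonical JSON encoding of the fields.
        payload = json.dumps([self.prev_hash, self.sender, self.receiver])
        return hashlib.sha256(payload.encode()).hexdigest()


def current_owner(history: list[Transfer]) -> str:
    """Walk the whole history from issuance; raise if any link is broken."""
    if not history or history[0].prev_hash != "":
        raise ValueError("history must start at the issuance")
    for prev, tx in zip(history, history[1:]):
        if tx.prev_hash != prev.digest():
            raise ValueError("broken link: history was altered or is incomplete")
        if tx.sender != prev.receiver:
            raise ValueError(f"{tx.sender} did not own the token")
    return history[-1].receiver


t0 = Transfer("", "issuer", "alice")
t1 = Transfer(t0.digest(), "alice", "bob")
t2 = Transfer(t1.digest(), "bob", "carol")
print(current_owner([t0, t1, t2]))  # carol
```

Dropping or editing any earlier transfer breaks a link (for example, `current_owner([t0, t2])` raises), which is why everyone who needs to trust ownership needs the full history.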

But when we talk about more complex information, like supply chain data, medical records, or even customer data (KYC), it becomes really hard to follow the same pattern as with tokens, yet the world sometimes tries to fit everything into that same model.

Why should information about your medical records be stored in the same nodes and “ledgers” as information on international coffee prices? In software development, it has never made sense to create unnecessary dependencies; we usually look for parallel processing, distributed storage, and componentized solutions. But today, the “one blockchain to rule them all” idea ignores some of the lessons we’ve learned over the years.

Why should information about a local community in the Amazon be replicated to every single node in the world connected to a blockchain implementation?

If that tokenization of value exchange, for instance, makes sense to people on the other side of the world, then a single shared blockchain is certainly needed for it. But what if it doesn’t? What if the recognition of a data representation of something (a token) only makes sense within one country, like Brazil, and just a tiny piece of information needs to be passed on to a global set of stakeholders (such as the Paris Agreement countries, to account for sustainable transactions in the food production and consumption chain)?

Public vs. private blockchains can become a heated topic, with some folks believing that it makes no sense to have a blockchain that is not public. It is much like the debate between those who believe identities are necessary and the “pseudo-anonymous” believers.

I think that this differentiation makes no sense. It is not about one or the other. These are certainly properties of different blockchain implementations; I cannot argue with that. But I don’t think that is the right lens through which to view that architectural decision.

I believe that deciding whether one property or another is useful for a specific problem should start from the following question:

To whom does the veracity of this data matter, and what dependencies does this data have?

Another important point is that a permissioned blockchain doesn’t necessarily mean no crypto-economy, just as a permissionless one doesn’t necessarily mean there is one.

Having a single store of unrelated data has so many problems that I don’t think we will ever be able to “fix” its challenges without changing the concept entirely.

It is well known that problems with “public” blockchains affect every development built on top of them: from the size of the single “ledger” to the processing dependencies caused by a “shared” resource pool full of unrelated data (the inability to process in parallel).

The idea of unstoppable, uncensorable, and decentralized applications is so sexy that I think it is the main reason so many people see in it a fix for many of the problems we face in the modern world today (the distribution of power, for instance).

I think this is something worth researching and experimenting with. Separating the service provider (the nodes and miners) from the ownership of data (the smart contracts and cryptography) certainly fixes a lot of problems with the big digital players that today control and centralize the internet. It’s like having a completely neutral cloud provider (run by virtually anybody in the world).

Going back to the matter of unnecessary dependencies and shared contexts, some influential people in the field have already brought this topic to the table by distinguishing the logical centralization of data from architectural and political decentralization. That means that a public blockchain (for instance) is politically decentralized (no single party can make unilateral decisions, for example), architecturally decentralized (no single source of computing power runs everything, so there is no honeypot of data or single point of failure), but logically centralized (all the data resides in the same terabytes of data that everybody needs to copy to run a full node). I think the next step is to bring decentralization to the logically centralized part of blockchains.

The data-contexts lens brings plenty of benefits while reducing technical challenges that we don’t really know how to solve. It means that as more blockchain technologies (source code) and implementations (like Ethereum, for instance) appear, interoperability protocols and a way to blindly attest to events in shared world records will let us scale while keeping data and smart contract systems easy to deploy. This vision needs some root coordinating chains.

A big advantage of the early blockchain designs is that data integrity is easily proven through chains of hashes, which makes it possible to store only small proofs of a combination of data and later check that the data has not been altered.
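One common way to store a “small proof of a combination of data” is a Merkle tree, which Bitcoin uses to commit to all the transactions of a block with a single root hash. The sketch below is a simplified version under my own assumptions (single SHA-256, the last node duplicated on odd levels, and no domain separation between leaves and inner nodes, which a production design should add). Only the root needs to be stored; any single record can later be checked against it with a proof of roughly log2(n) hashes.

```python
from __future__ import annotations

import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def _next_level(level: list[bytes]) -> list[bytes]:
    # Hash nodes pairwise; an odd last node is paired with itself.
    if len(level) % 2:
        level = level + [level[-1]]
    return [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]


def merkle_root(leaves: list[bytes]) -> bytes:
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = _next_level(level)
    return level[0]


def merkle_proof(leaves: list[bytes], index: int) -> list[tuple[bytes, bool]]:
    """Sibling hashes from leaf to root; the flag is True when the sibling sits on the right."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    proof = []
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        padded = level + [level[-1]] if len(level) % 2 else level
        sibling = index ^ 1
        proof.append((padded[sibling], sibling > index))
        level = _next_level(level)
        index //= 2
    return proof


def verify(leaf: bytes, proof: list[tuple[bytes, bool]], root: bytes) -> bool:
    # Rebuild the path from the leaf up and compare with the stored root.
    node = h(leaf)
    for sibling, sibling_is_right in proof:
        node = h(node + sibling) if sibling_is_right else h(sibling + node)
    return node == root
```

Whoever stores only `merkle_root(records)` can later accept a record plus its proof, and any altered record fails `verify`.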

My view on this matter is that, rather than having a “single blockchain” containing completely unrelated smart contract data, all of it unnecessarily linked and waiting on one another’s verification, we could have multiple ledgers that link to each other by periodically recording each other’s latest verified hash.
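A minimal sketch of that cross-linking, assuming a toy hash-chained `Ledger` class (hypothetical, not any real platform’s API): each ledger records another ledger’s latest head hash as an ordinary entry. Anchoring is triggered by hand here; in practice it would happen on a schedule or every N blocks.

```python
from __future__ import annotations

import hashlib
import json


class Ledger:
    """A toy hash-chained ledger, for illustration only."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: list[dict] = []

    def head(self) -> str:
        # Latest block hash, or an all-zero genesis value for an empty ledger.
        return self.blocks[-1]["hash"] if self.blocks else "0" * 64

    def append(self, payload: dict) -> str:
        body = {"prev": self.head(), "payload": payload}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        self.blocks.append({**body, "hash": digest})
        return digest

    def anchor(self, other: Ledger) -> str:
        # Store only the other ledger's name, height, and head hash, never its data.
        return self.append({"anchor_of": other.name, "height": len(other.blocks), "head": other.head()})


health = Ledger("health-records")
coffee = Ledger("coffee-prices")
health.append({"patient": "p-17", "event": "vaccination"})
coffee.append({"price_usd_per_lb": 1.92})
health.anchor(coffee)  # health now commits to coffee's history...
coffee.anchor(health)  # ...and coffee commits to health's
```

Neither ledger stores the other’s data, only a 64-character hash, so medical records never travel to the nodes that track coffee prices, yet each ledger’s history becomes committed inside the other.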

This way, storage and processing of data can be distributed among nodes of different blockchain implementations (even private ones), increasing throughput while preserving the security and veracity of the data.
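As a rough illustration of why partitioning by context helps throughput, the sketch below routes each `(context, record)` pair to its own partition and builds each partition’s hash chain independently. The function names are hypothetical, and `ThreadPoolExecutor` only stands in for separate sets of nodes: in CPython it demonstrates the absence of cross-context ordering rather than delivering a real speedup.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_context(records: list[str]) -> str:
    """Chain one context's records in order and return the resulting head hash."""
    head = "0" * 64
    for record in records:
        head = hashlib.sha256((head + record).encode()).hexdigest()
    return head


def process_by_context(transactions: list[tuple[str, str]]) -> dict[str, str]:
    # Route each (context, record) pair to its own partition...
    partitions: dict[str, list[str]] = defaultdict(list)
    for context, record in transactions:
        partitions[context].append(record)
    # ...then process partitions independently: there is no global order across contexts.
    with ThreadPoolExecutor() as pool:
        heads = pool.map(process_context, partitions.values())
        return dict(zip(partitions.keys(), heads))


txs = [("health", "p-17 vaccinated"), ("coffee", "1.92 USD/lb"), ("health", "p-17 discharged")]
heads = process_by_context(txs)
```

Appending a coffee-price record never changes the head of the health partition, so the two contexts never wait on each other.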

It is a kind of mixture of what the Ethereum core developers have proposed regarding sharding (taken from traditional database sharding) and what Hyperledger Fabric has enabled through what it calls channels (the actual “ledgers”), but taken to a multi-blockchain world instead of only seeking better processing within a single blockchain implementation.

If we adopt that idea as a standard for any blockchain that wants to prove its integrity, then a blockchain that stores integrity checks from other blockchains following this standard can, in turn, have its own integrity proven by them. This moves us away from the distinction between public and private blockchains toward a cross-referenced set of logical storage, processing, and validators.

In this model, having “private” blockchains, or multiple blockchains instead of just one (e.g., Ethereum), makes a lot of sense. More blockchains cross-referencing each other mean more security, but standards need to emerge, and implementations of things like Plasma should extend to other networks as well. Mature governance models can also accelerate and benefit a model like this one. This approach also preserves the confidentiality of data while letting the general public and non-node participants (users, client applications) check whether a given blockchain’s data is really valid.
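A sketch of how a non-node participant could check a confidential ledger against anchors published elsewhere, assuming a hypothetical format in which each block hash commits only to the previous block hash and a digest of the block’s payload. The verifier sees the payload digests, never the payloads, recomputes the chain, and compares the heads at the anchored heights.

```python
from __future__ import annotations

import hashlib

GENESIS = "0" * 64


def block_hash(prev: str, payload_digest: str) -> str:
    # Each block commits to the previous block hash and a digest of its payload.
    return hashlib.sha256((prev + payload_digest).encode()).hexdigest()


def check_against_anchors(payload_digests: list[str], anchors: dict[int, str]) -> bool:
    """payload_digests: the ledger's payload digests in order (payloads stay private).
    anchors: height -> head hash that other chains recorded for this ledger."""
    head = GENESIS
    for height, payload_digest in enumerate(payload_digests, start=1):
        head = block_hash(head, payload_digest)
        if height in anchors and anchors[height] != head:
            return False  # history was rewritten after it was anchored
    # An anchor beyond the presented history means blocks are being withheld.
    return all(height <= len(payload_digests) for height in anchors)
```

If the operator rewrites any anchored block, the recomputed head at that height no longer matches; if it presents a history shorter than an anchored height, the check fails as well.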

This matters in this conversation because multiple separate but cross-referenced blockchains make sense not only for scalability reasons, but also because of the two questions I raised earlier: to whom does the veracity of this data matter, and what dependencies does this data have?

By designing with this in mind, applying a lens of contexts, and finding ways to take advantage of other blockchain implementations, the choice between public or private, permissionless or permissioned, can become a matter of system requirements rather than of philosophy or sacrificed security, leading to a more intelligent partitioning of data in blockchain, plus logical decentralization.