Introducing BONSAI. Blockchainisation of NoSQL Server Authentication of Identity

In trying to find a solution to storing & querying large-scale data off-blockchain, Totem discovered that IPFS JSON files were not the answer. So we built a solution that is.

Chris J D'Costa
totem | live accounting
Apr 24, 2020

Photo by Arnaud Berthomier

The Totem Live Accounting network is being developed on Parity’s Substrate blockchain framework. The aim is to provide a decentralised product that meets the objectives of a decentralised economy, i.e. one that doesn’t rely on a global software vendor to operate it.

In the current state of blockchain development, and in the mix of available “decentralised” tools and resources, there are some significant missing pieces, particularly if you are developing a user-facing decentralised software product. A good example is that there is no decentralised equivalent to a queryable NoSQL database, and we don’t see any scalable proposals on the horizon in the ecosystem to solve this issue.

So what is the issue?

First, we should clarify why this is important in the context of decentralised applications that are scalable.

In traditional software, web applications and mobile applications, much of the data is stored in a central database, and this database is based in a remote location that your application connects to.

An authentication method is used so that only “authorised users” can place or update data in the database. In short, this means the user has to sign up and remember passwords in order to access or update their data.

Often this means that any company developing software needs to build an economic model balancing the cost of storing this data against what the user is prepared to pay, or against how much revenue can be raised from selling that data to cover those costs. The latter is obviously a bad privacy proposition for most users.

Storage is therefore at a premium, and whilst developers can make optimisations (deciding which data should be retained and which should be overwritten, using the minimal amount of space required to support the functionality of the app), the principle remains that they must somehow cover the costs of supporting the database.

Supporting that database also means ensuring the servers are secure, and that the application itself does not open up security issues that may compromise the database.

In contrast, when decentralised application developers are building blockchain applications, authentication is already taken care of, and writing to the (blockchain) database has an economic model built in: it costs you transaction fees.

So why don’t we just store everything on chain?

The main reason is that anyone running your blockchain software should not also have to contend with storing everyone else’s data, especially if that data fails the tests we set out below, because there is no obvious incentive for them to do so.

The store on-chain or store off-chain test

Consider a very simple data record. It could contain any or all of the following elements:

  • a description or text
  • an amount or quantity
  • a unit of measure for the quantity
  • a currency for the amount
  • a reference number
  • a name
  • a status for the record
  • a key that uniquely identifies the record

As a principle a blockchain developer should ask these questions:

  1. What (if anything) will change its state over time?
  2. If it does not change, does the data imply that an event took place?
  3. If the value is an integer, will we need to perform calculations on this value or otherwise manipulate it?
  4. Is there a need to audit the changes in state that take place?

If an element fails these tests, it is likely not a candidate for storing on a blockchain and could instead be stored off-chain in a database.
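
The four questions can be sketched as a small decision function. This is only one reading of the test, and an assumption on our part: an element "passes" if any question is answered yes, and "fails" (goes off-chain) if every answer is no. The names `ElementAnswers` and `storage_location` are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class ElementAnswers:
    """Answers to the four questions for one data element (hypothetical structure)."""
    changes_state: bool      # 1. will it change its state over time?
    implies_event: bool      # 2. if it does not change, does it imply an event took place?
    needs_calculation: bool  # 3. will we calculate on it or otherwise manipulate it?
    needs_audit: bool        # 4. do its changes in state need to be audited?


def storage_location(answers: ElementAnswers) -> str:
    """Return 'on-chain' if any question is answered yes, otherwise 'off-chain'.

    Assumption: a single 'yes' is enough to justify on-chain storage.
    """
    passes = (
        answers.changes_state
        or answers.implies_event
        or answers.needs_calculation
        or answers.needs_audit
    )
    return "on-chain" if passes else "off-chain"


# A free-text description: static, implies no event, never calculated, never audited.
description = ElementAnswers(False, False, False, False)
# An amount: changes over time, is calculated on, and its changes are audited.
amount = ElementAnswers(True, False, True, True)

print(storage_location(description))
print(storage_location(amount))
```

Under this reading, descriptive text tends to land off-chain while amounts and statuses tend to land on-chain.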

(Keen observers will immediately realise that, by following this analysis, we inevitably return to the centralised model of data storage.)

A secondary but equally problematic issue is that data stored in a blockchain is not “queryable”. What do we mean by this? Think of the Google search box. You can type literally anything into it and you get results back. This is called “fuzzy search” and we all expect it from our applications these days.

The problem is that you cannot do this with blockchain storage. Unless you already know the data you are searching for, you cannot retrieve it: there is no way to conduct “fuzzy string searches” that return a set of results vaguely matching what you are looking for.

Due to this second problem, no decentralised application can function entirely on a blockchain.
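
The difference can be illustrated with a toy example. The dictionary below is a hypothetical stand-in for chain storage, where a value is reachable only through its exact key; the keys and company names are invented. The fuzzy search uses `difflib.get_close_matches` from the Python standard library and only works because it can scan every stored value, which is what an off-chain database index provides.

```python
import difflib

# Hypothetical stand-in for chain storage: values are reachable only by exact key.
chain_storage = {
    "0xa1": "Acme Trading Ltd",
    "0xb2": "Acme Holdings",
    "0xc3": "Bonsai Gardens",
}


def chain_lookup(key: str):
    """Exact-key lookup: returns the value, or None if the key is unknown."""
    return chain_storage.get(key)


def fuzzy_search(term: str, names, cutoff: float = 0.6):
    """Approximate string match over all names, best match first."""
    return difflib.get_close_matches(term, list(names), n=5, cutoff=cutoff)


# Knowing the key works; knowing only part of the value does not.
print(chain_lookup("0xc3"))
print(chain_lookup("Bonsai"))

# A misspelt search term still finds the record once all values can be scanned.
print(fuzzy_search("Bonsai Gardns", chain_storage.values()))
```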

So in a nutshell decentralised application development is hard because:

  1. You can’t store everything on-chain
  2. You need to be able to query the data (fuzzy text search)
  3. You do not want to identify your users — it would be great if you could use a common (blockchain) method of authentication
  4. You need an economic model to fund the off-chain storage, to incentivise others to do the same
  5. You don’t want anyone to sell your data

Where is this all going?

Talking to developers in the Substrate community it’s quite obvious that whilst many of them are concentrating on the blockchain development, they have not yet addressed the UI or usability issues that full application development requires.

When we asked, “how do you store off-chain data” we had several common responses:

  • In JSON files on a server
  • In JSON files on IPFS

When asked “how do you query the data”, we got a completely unscalable answer: “we read the JSON file into memory, then we query it”.

Here’s where Totem stands on the issue: we have a single database of all the companies in a single country (4.7 million companies). That database is 8.5 GiB. We estimate that expanding it to include companies in Europe and the United States alone will take it beyond 350 GiB. It would probably break IPFS if you tried to stick it in there!

In short it isn’t practical, scalable or realistic to use IPFS and/or JSON files to provide any sort of supporting database functionality.

BONSAI is a significant solution to the problem.

We quickly realised that there is a problem to solve here.

What we needed was a NoSQL database that had a front end that rejected all attempts to add or update data, unless the data had been pre-authenticated. The method we chose for pre-authentication involved a Totem user sending a signed transaction to the blockchain which included a hash of the data that they wanted to store off-chain.
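
On the user's side, the first step might look like the sketch below. It makes several assumptions that the article does not confirm: the record is serialised as canonical JSON, the hash is BLAKE2b-256, and the field names are invented. Signing and submitting the transaction are left out, since they depend on the chain's own client library.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Serialise deterministically so the app and the database hash identical bytes."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")


def data_hash(payload: bytes) -> str:
    """Hex digest of the payload. BLAKE2b-256 is an assumption for illustration."""
    return hashlib.blake2b(payload, digest_size=32).hexdigest()


# Hypothetical off-chain record.
record = {"name": "Acme Trading Ltd", "reference": "INV-001", "status": "open"}

payload = canonical_bytes(record)
digest = data_hash(payload)

# The app would now sign and send a blockchain transaction that carries `digest`,
# and separately send `payload` to the off-chain database.
print(digest)
```

Deterministic serialisation matters here: if the app and the database produced different bytes for the same record, the hashes would never match.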

This is a very simple solution to the problem which addresses all of the qualities required for this to work:

  1. The blockchain transaction pre-authenticates the data associated with a particular address, because that address signs the request to store the hash, and other checks can be performed in the blockchain runtime for correctness.
  2. This data could be encrypted before it is hashed, meaning that it will be sent as encrypted data to be stored by the database, with only the user being able to decrypt it in their app. This prevents anyone from selling the data too.
  3. Crucially the database is agnostic as to who communicates this storage request: as long as the database can recreate the hash from the data it receives (encrypted or not), and it matches a hash that exists on chain, this is sufficient to conclude that the owner of the data has requested this insertion.
  4. The database can operate in a peer-to-peer fashion, meaning that data can be sent to it from peers in a decentralised network, not just from users.
  5. The database can independently verify that the data it receives can be accepted or rejected by independently querying the blockchain in search of the hash.
  6. A peer-to-peer network of off-chain databases can be imagined and would be far more robust than a centralised equivalent.
  7. An economic model can be built around third-parties storing and sharing the data with peers. This model could be built into Totem or exist on another Substrate chain.
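
The database-side check described in points 3 and 5 can be sketched as follows. This is a minimal illustration, not Totem's implementation: `OffChainStore` is a hypothetical class, the chain query is replaced by an in-memory set of hashes, and BLAKE2b-256 is again an assumed hash function. The payload is treated as opaque bytes, so it could equally be ciphertext.

```python
import hashlib


class OffChainStore:
    """Accepts a payload only if its hash has already been recorded on chain."""

    def __init__(self, hash_exists_on_chain):
        # Callable taking a hex digest and returning True if it is found on chain.
        self._hash_exists_on_chain = hash_exists_on_chain
        self._records = {}

    def insert(self, payload: bytes) -> bool:
        """Recreate the hash from the received bytes and check it against the chain."""
        digest = hashlib.blake2b(payload, digest_size=32).hexdigest()
        if not self._hash_exists_on_chain(digest):
            return False  # not pre-authenticated: reject
        self._records[digest] = payload
        return True


# Stand-in for querying the blockchain: a set of hashes seen in transactions.
on_chain_hashes = set()
store = OffChainStore(lambda digest: digest in on_chain_hashes)

payload = b'{"name":"Acme Trading Ltd"}'

# No matching transaction yet, so the insert is rejected.
rejected = store.insert(payload)

# Simulate the user's signed transaction landing on chain.
on_chain_hashes.add(hashlib.blake2b(payload, digest_size=32).hexdigest())
accepted = store.insert(payload)

# Any modification of the payload changes the hash, so it is rejected.
tampered = store.insert(payload + b" ")

print(rejected, accepted, tampered)
```

Note that the store never asks who sent the payload. A matching on-chain hash is the only credential, which is why any peer can relay the data.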

With these properties we believe that BONSAI meets the goals of a decentralised application storage solution that isn’t a blockchain, i.e. off-chain queryable data storage that can now be fully independent of centralised services.

See BONSAI in action!

Totem has already implemented BONSAI for our app and you can see it in action. Everything that is now stored off-chain requires at least one blockchain transaction to pre-authenticate the data.

BONSAI protocol in progress at https://totem.live

If you are interested in becoming part of the team developing a next-gen decentralised application (that isn’t DeFi), then get in touch now. You can also start trying out the features we have already deployed. We would really love to have your feedback on our Discord channels.

Chris J D'Costa
totem | live accounting

Founder at Totem Accounting. P2P Accounting for the gig economy.