LiteTree: A blazing-fast, branchable database engine

Opening blockchain development up to over 8.5M SQL engineers

Hunyoung Park
Aergo blog
8 min readOct 11, 2018

--

Imagine being able to have multiple connections to a single database with each connection reading a separate branch or commit at the same time. Imagine a single database with each connection writing to entirely separate branches. This is all possible with a new tool that one of our developers from Brazil, Bernardo Ramos, has poured twenty years of software development and relational database management experience into creating.

We are proud to announce our first of many open-source project releases: LiteTree, a speedy SQLite engine with branching capabilities. LiteTree has garnered significant interest and has become very well accepted among the developer community since we first open-sourced the initial version to our GitHub page. It sat at first place on Hacker News for several hours; it became a hot topic on Hackaday, and it was well regarded on OpenNET. LiteTree’s innovation and contribution to open-source were recognized and are evident in how it has over 1,250 favorites on GitHub as of writing.

There are two main features offered in LiteTree:

  • Branching: In a Git-like flow, LiteTree allows developers to branch SQL-based data
  • Speed: LiteTree is blazing-fast; twice as fast as the original SQLite software is for both reads and writes.

LiteTree allows you to store multiple versions of one database, whereby each branch can continue to develop and update independently. This means that a user can access a database through numerous read points, each one associated with a specific development branch in the database. LiteTree is more than twice as fast as SQLite is typically on Linux and MacOSX. It is implemented by storing the SQLite database pages on LMDB.

The following will explain why and how we came to develop it for use at Blocko, and the role it will play on the Aergo protocol that we are working hard to push out next year.

Coinstack

A few months back, the team here at Blocko wanted to provide a SQL-based point in time recovery functionality in Coinstack, our full-stack permissioned blockchain product deployed across over 23 enterprise-IT environments. This point in time recovery feature would provide a database storage rollback function for the blockchain when reorganizing blocks. We concluded that it would be an excellent feature to provide in Coinstack after many discussions amongst ourselves and our partners. We began by using LiteReplica, a modified SQLite engine for database replication, to give Coinstack this database function.

But what exactly is point in time recovery? See Figure 1 as an example to make it a bit clearer.

Figure 1. Rollback with SQL-based storage for a blockchain (point in time recovery)

Let’s say a block can have zero or more contract transactions. If the block is successfully connected to the longest chain, it leaves one commit to the corresponding SQL storage for every contract transaction. This commit contains all the updates of contract transactions. A block contains commit points associated with the SQL database storage. The commit point of the contract that has not been updated is the same commit point of the previous block.

In Figure 1, the original best block is block 103. If new blocks are received from block 101 to block 110, the blockchain must be rolled back to block 100. After that, contract A’s commit point is commit 1, and contract B’s commit point is commit 8. Rolling back the contract B database to commit point 8 can make his SQL storage correspond to block 100.

Through point in time recovery, we are effectively able to execute contract transactions of new blocks seamlessly. We implemented this feature into our Coinstack platform and clients were able to benefit from it across many different use cases.

Fast forward many months, we found ourselves building Aergo. Unlike Coinstack, Aergo would be entirely open-source, and the main chain would be public. For this reason, we found ourselves wanting to appeal to developers even more so than with Coinstack. We decided to incorporate similar features to expose the platform to the broader community of database engineers familiar with languages like SQL.

Aergo

On our mission to create an open-source platform fit for mass use, we plan to provide support for SQL in the blockchain in a few ways. Our plan to do this entails two things:

  • Provision of database functionalities in Aergo: Allowing for the interrogation of data and codification of future interoperability features through SQL
  • SQL-based smart contracts: Supporting SQL-based syntax and semantics in smart contract coding

A blockchain implementation of LiteTree addresses the first of these by incorporating sophisticated database functionalities in the blockchain. It provides an incredibly more convenient way of handling data on chains and in many of our interoperability features when compared with a simple key/value store based system.

Provision of database functionalities in Aergo

There are two main features we can bring to Aergo through the use of LiteTree:

1. Database storage control for the Aergo chain

LiteTree allows for functionalities of a relational database in the ledger to enable block reorganization and rollback features through point in time recovery, as explained above. In Aergo, LiteTree will allow for speedy point in time recovery; faster than what LiteReplica was able to achieve when it was applied to Blocko’s Coinstack.

In Figure 2, the SQL-based state storage is managed using LiteTree. The SQL storage can store structured data and is accessed through SQL. There is only one SQL storage per contract.

Figure 2: Example of how LiteTree would be used in a blockchain

LiteTree branching on Aergo: With LiteTree, you not only have the robust features of a superfast SQLite engine, but it also allows for branching. This is something we initially planned to leverage in our n-tier architecture on Aergo. We originally planned to provide Git-like branches in the blockchain as part of a block parallelization solution, and implementing LiteTree would allow for relational database storage control in branches (described in our current technical white paper). However, we have since changed the plan to utilize a 1st-layer protocol level database storage instead. This means what we initially developed LiteTree’s branching functionality for will not be used. We are active in research and will inform our technical community on how branchability at the storage level will be applied in Aergo.

2. Multi-Version Concurrency Control of state in accordance with blocks

With LiteTree implemented in Aergo, multi-version concurrency control is enabled so developers can see consistent data among different block heights.

SQL-based smart contracts

Supporting SQL for smart contract development is another move by Aergo to broaden platform usage among the broader developer community. The platform will include AergoSQL, a canonical smart contract engine based on a coding language that uses SQL-like syntax and semantics.

We are happy to announce that development of AergoSQL has been well on track and a private beta version is currently being tested. More information regarding its open-source release will be announced in the future. The first upcoming release of AergoSQL will include call-level SQL support. Grammar-integrated SQL support will be released later on after that.

How to use LiteTree

Each database transaction is saved as a commit, and each commit has an incremental number. Let’s consider an empty database in which we run this first SQL command:

Now it will have the first commit in the automatically created master branch.

When we execute new transactions, it will add new commits to the current branch.

Now we have 3 commits.

To include many SQL commands in a single commit, we must enclose them in BEGIN and COMMIT commands.

We create new branches informing the source branch and commit number.

After this command is executed, the new branch is created but without any new data added to it. The database connection also moves to this new branch, having it as the current branch. We can check the current branch with the command.

In this case, it will return: test

If we execute a SQL command on this database connection, the commit will be saved in the connection’s current branch.

Now, the graph state will be:

We can also read the database at this new branch.

It will return these values:

first

from test branch

We can move to the master branch.

Executing the same SELECT command, but now in the master branch, it will return.

first

second

Different content for the same table on separate branches.

Commits in separate branches have the same numbering based on the distance from the first commit.

We can read the database in a previous point-in-time by moving to that commit, like this.

At this point, the table t1 has a single row, and if we do a SELECT it will return just first.

Writing is only possible at the head of each branch, so we cannot write to the database when we are in a defined commit. If you want to make modifications to some previous commit, you must create a new branch that starts at that commit. It is also possible to truncate a branch at a specific commit, rename a branch, delete it, and retrieve branch info.

You can use LiteTree with big databases that require many gigabytes of storage space. There is no data copying when creating a new branch; when a new transaction is committed, only the modified database pages are copied.

Merging on LiteTree

We initially decided not to include diff or merging capabilities in LiteTree as those functionalities are not needed in Aergo. However, many developers who came across LiteTree have shown interest in these features. We are actively working to incorporate these functionalities. As not only an open-source project but an entire open-source platform and supporting ecosystem; we plan to make the code and tools we ship out as attractive and useful as possible to bring as many developers as possible into our ecosystem.

Conclusion

To summarize, implementing LiteTree on Aergo will allow us to provide many database functionalities to developers and will enable them to leverage their SQL coding capabilities on the blockchain. We will also be able to leverage LiteTree’s branching features for database storage functionalities into Aergo’s n-tier blockchain design, maximizing programmability for developers or businesses deploying and managing chains on the Aergo platform.

We hope to see many of you using LiteTree in your projects and look forward to seeing the things our technical community comes up with. Please stay tuned for more updates regarding additional features to LiteTree shortly. In the meantime, feel free to contribute to LiteTree as you wish.

This post is available in Korean here.

If you would like to engage with us regarding LiteTree or any technical aspects of Aergo, we have recently opened up our official Discord server:

To join us on any of our other channels:

--

--

Hunyoung Park
Aergo blog

Board Member of the AERGO Foundation and CTO at Blocko Inc.