Turbo-Geth, What’s Different: The Database

Feb 11, 2020

Some months ago, I joined the Turbo-Geth team and started contributing actively to the project. For anyone who doesn’t know, Turbo-Geth is an alternative version of Geth (still under development) whose objective is to be faster and more efficient than the original. Turbo-Geth tries to accomplish this through the following:

More Organized Database Format
Fewer Read and Write operations against the Database when interacting with the state
Trie Optimisation (Possibly by replacing tries with an alternative data structure)

In this article, I would like to focus on how Turbo-Geth differs from Geth in terms of the database. The main differences are the following:

Different database (use of Bolt instead of LevelDB)
Subdivision of the database into buckets

These two points are covered in turn below. First of all, let’s talk about Bolt.

What is Bolt and how does it differ from LevelDB?

Bolt and LevelDB are very similar. Both are key/value stores, made to provide simple, fast, and reliable databases for projects that don’t require a full database server. LevelDB is the database used by Geth and Bolt is the one used by Turbo-Geth.

However, there is a key difference between them: how the database organizes its data. LevelDB is an LSM (log-structured merge) database, whereas Bolt uses buckets, where each bucket contains a B+-Tree structure. We can think of buckets as “little databases inside the big database”.

LSM databases (LevelDB, the Geth database) are optimised for heavy appending and range scanning, but not for random read performance. With LevelDB, atomicity is also not automatic: writes must be explicitly grouped into a batch to be applied atomically, and getting a consistent view while the database is being written to requires taking a snapshot. Bolt, by contrast, is slower for inserts but faster for random reads, wraps every operation in an atomic transaction, and lets read-only transactions run concurrently with a write transaction. Let’s explain atomicity a bit better:

Atomicity: something being atomic means that it is indivisible. For Turbo-Geth this means that if, for example, we want to put a set of hashes in the database and one of those hashes can’t be inserted, then none of the others will be inserted either. The change occurs only if all of the hashes can be inserted. When the database does not enforce atomicity on every write (as with LevelDB), the application has to use a workaround to insert data safely. In other words, on this point we think that Bolt is better because it’s safer when adding data to the database.
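To make the all-or-nothing behaviour concrete, here is a minimal sketch in Go. It is a toy in-memory store, not Bolt and not Turbo-Geth code: the names Store, Tx, Update and PutAll are hypothetical, and rejecting an empty key is just a stand-in for "a hash that can’t be inserted". The callback shape loosely follows the style of Bolt’s update transactions, where returning an error rolls everything back.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a toy in-memory key/value store, used only to illustrate atomicity.
type Store struct {
	data map[string]string
}

// NewStore returns an empty store.
func NewStore() *Store { return &Store{data: make(map[string]string)} }

// Len reports how many keys have been committed.
func (s *Store) Len() int { return len(s.data) }

// Tx stages writes; nothing reaches the store until the transaction commits.
type Tx struct {
	pending map[string]string
}

// Put stages a write. Empty keys are rejected to simulate a failing insert.
func (tx *Tx) Put(key, value string) error {
	if key == "" {
		return errors.New("empty key")
	}
	tx.pending[key] = value
	return nil
}

// Update runs fn inside a transaction: if fn returns nil every staged write
// is applied, and if fn returns an error all staged writes are discarded.
func (s *Store) Update(fn func(tx *Tx) error) error {
	tx := &Tx{pending: make(map[string]string)}
	if err := fn(tx); err != nil {
		return err // rollback: the staged writes are simply dropped
	}
	for k, v := range tx.pending {
		s.data[k] = v
	}
	return nil
}

// PutAll inserts every hash, or none of them if any single insert fails.
func PutAll(s *Store, hashes []string) error {
	return s.Update(func(tx *Tx) error {
		for _, h := range hashes {
			if err := tx.Put(h, "seen"); err != nil {
				return err
			}
		}
		return nil
	})
}

func main() {
	s := NewStore()
	// The second hash is invalid, so the first one must not be stored either.
	fmt.Println(PutAll(s, []string{"0xaa", "", "0xcc"}), s.Len())
	// All hashes are valid, so all of them are stored together.
	fmt.Println(PutAll(s, []string{"0xaa", "0xbb"}), s.Len())
}
```

After the first call the store is still empty even though "0xaa" was valid, which is exactly the guarantee described above.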

Organization of the database

As we said before, the Turbo-Geth database is divided into multiple buckets. We can think of buckets as “little databases inside the big database”, and each of them contains a B+-Tree structure.
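The “little databases inside the big database” idea can be sketched as a set of independent, ordered keyspaces. The Go snippet below is only a toy model: the DB type and its methods are hypothetical, keys are sorted on read rather than kept in a real B+-Tree, and the values are made up. The bucket names are borrowed from the list further down in this article.

```go
package main

import (
	"fmt"
	"sort"
)

// DB is a toy model in which every bucket is its own independent keyspace.
type DB struct {
	buckets map[string]map[string]string
}

// NewDB returns a database with no buckets.
func NewDB() *DB { return &DB{buckets: make(map[string]map[string]string)} }

// Put stores a key/value pair in the named bucket, creating it on first use.
func (db *DB) Put(bucket, key, value string) {
	b, ok := db.buckets[bucket]
	if !ok {
		b = make(map[string]string)
		db.buckets[bucket] = b
	}
	b[key] = value
}

// Keys returns the keys of a single bucket in sorted order, which is the
// order a cursor over that bucket's B+-Tree would visit them in.
// An unknown bucket yields an empty slice.
func (db *DB) Keys(bucket string) []string {
	keys := make([]string, 0, len(db.buckets[bucket]))
	for k := range db.buckets[bucket] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	db := NewDB()
	db.Put("Accounts", "0xb2", "balance=7")
	db.Put("Accounts", "0xa1", "balance=3")
	db.Put("Block Headers", "0000001", "header-1")

	// Iterating over one bucket never touches the keys of another.
	fmt.Println(db.Keys("Accounts"))
	fmt.Println(db.Keys("Block Headers"))
}
```

Because each bucket has its own keyspace, a scan over accounts does not need key prefixes to skip past headers or receipts.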

Below is the current subdivision of the Turbo-Geth database at block no. 9346492 (Archive):

Figure: Subdivision at block no. 9346492 (Archive node)

Total Chain Size of Geth (Block no. 9346492): 3.7 TB
Total Chain Size of Parity (Block no. 9346492): 3.6 TB
Total Chain Size of Turbo-Geth (Block no. 9346492): 652.62 GB

Each section is a bucket. Here is a brief explanation of the most important sections shown above:

Preimages: Maps hashes to addresses, and storage location hashes to storage locations
Receipts: Contains the transaction receipts
History of Storage: Contains the changes in Storage
History of Accounts: Contains the changes in Accounts
Block Headers: Contains the headers of each block
Block Bodies: Contains the bodies of each block
Contract Storage: Contains the contract storage
ChangeSet: Database change history
Accounts: Contains Accounts

The reason why we use so many buckets is to split the database into multiple, shallower B+-Trees so that iterating over the database is easier. In other words, we use many buckets to improve performance when reading from the database.
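As a rough illustration of the depth argument, the height of an idealised, completely full B+-Tree grows with the logarithm of the number of keys. The sketch below is back-of-the-envelope arithmetic only: the function treeHeight, the fanout of 100 entries per page, and the key counts are all assumptions chosen for the example, not measurements of Bolt or Turbo-Geth.

```go
package main

import "fmt"

// treeHeight returns how many levels an idealised, completely full B+-Tree
// needs to index `keys` entries when each page holds up to `fanout` entries.
// It returns 0 for an empty tree or an unusable fanout.
func treeHeight(keys, fanout int64) int {
	if keys <= 0 || fanout < 2 {
		return 0
	}
	height := 1
	capacity := fanout // entries reachable with the current number of levels
	for capacity < keys {
		capacity *= fanout
		height++
	}
	return height
}

func main() {
	const fanout = 100 // hypothetical number of entries per page

	// One big tree holding every key of every kind (hypothetical count).
	fmt.Println("single tree:", treeHeight(1000000000, fanout))

	// One bucket holding only its own share of the keys (hypothetical count).
	fmt.Println("one bucket: ", treeHeight(50000000, fanout))
}
```

Under these assumptions the single tree needs 5 levels and the bucket needs 4, so every lookup or cursor seek inside the bucket touches one page fewer. Because the growth is logarithmic, the saving per lookup is small, but it applies to every read.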

Another alternative that can be used as a database: BadgerDB

After switching to Bolt, Turbo-Geth experienced problems when dealing with random keys (such as transaction hashes). Bolt sorts the keys before committing them, and since the hashes are random and numerous, a lot of sorting is generated, causing massive write amplification. BadgerDB uses a log-structured merge (LSM) design, which may turn out to be a better option. This matter is still under experimentation. In the meantime, we are implementing a workaround to fix this in Bolt.
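One way to build intuition for why random keys are costly in a sorted, page-based store is to count how many leaf pages a batch of inserts lands on. The Go sketch below is a simplified model under stated assumptions, not a description of Bolt’s internals: keys are 32-bit integers, the top 12 bits pick one of 4096 hypothetical leaf pages, every touched page is assumed to be rewritten on commit, and multiplicative hashing stands in for real transaction hashes.

```go
package main

import "fmt"

// pageBits is a hypothetical layout: the top 12 bits of a 32-bit key select
// one of 4096 leaf pages, so keys that sort close together share a page.
const pageBits = 20

// dirtyPages counts how many distinct leaf pages a batch of keys lands on.
func dirtyPages(keys []uint32) int {
	seen := make(map[uint32]struct{})
	for _, k := range keys {
		seen[k>>pageBits] = struct{}{}
	}
	return len(seen)
}

// sequentialKeys returns n consecutive keys, similar to block numbers.
func sequentialKeys(n int) []uint32 {
	keys := make([]uint32, n)
	for i := range keys {
		keys[i] = uint32(i)
	}
	return keys
}

// scatteredKeys returns n hash-like keys. Multiplicative hashing is used as a
// deterministic stand-in for real hashes; uint32 arithmetic wraps on overflow.
func scatteredKeys(n int) []uint32 {
	keys := make([]uint32, n)
	for i := range keys {
		keys[i] = uint32(i+1) * 2654435761
	}
	return keys
}

func main() {
	const batch = 1000
	fmt.Println("sequential keys touch pages:", dirtyPages(sequentialKeys(batch)))
	fmt.Println("scattered keys touch pages: ", dirtyPages(scatteredKeys(batch)))
}
```

In this model a batch of sequential keys fits in a single page, while the same number of hash-like keys is spread over a large share of the pages, so far more data would have to be rewritten per commit.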

Here is a small graph, made by Alexey Akhunov, that shows the overall performance of BadgerDB vs BoltDB: