SDBS #27: StealthExplore Real-Time Rich List

Published in

stealthsend

7 min readJan 24, 2020

In today’s blog post, I describe the functionality and implementation of the StealthExplore real-time rich list.

Introduced in SDBS #25, the StealthExplore API (application programming interface) will be a full featured explorer-type remote procedure call (RPC) API optionally built into the reference client using a simple compiler flag. StealthExplore allows any full node to provide an explorer API without the need for a second layer database. The StealthExplore API is being engineered for performance and scalability. To underscore our engineering decisions, I will describe the rich list implementation and how this implementation will enable the rich list to update every block without a loss in performance.

— — — — — — —

The Rich List and Its Computational Requirements

Organizing accounts by their value, a rich list seems like a simple representation of wealth data. The Stealth rich list is kept on several explorers. One such example is at chain.stealth.org/richlist, which shows the top 100 Stealth addresses. With respect to computer science, the critical feature of a rich list is that it is sorted. Typically rich lists are sorted in descending order, with the most valuable accounts being listed first.

To understand how the sorting problem fits into rich list generation, consider a simplistic protocol for creating a rich list:

Go through every account and calculate each balance.
Sort these accounts by balance.

— — — — — — —

Calculating Address Balances: The Search Tree

For Stealth and other UTXO (unspent transaction output) blockchains, calculating the balance of a given address means that all UTXOs sent to that address must be tallied. For explorer databases, these UTXOs are added when a new block is connected to the chain and subtracted when a block is disconnected from the chain, as in a chain reorganization. By far, the computational cost for each addition and subtraction operation comes from the requirement to lookup the address in whatever data structure holds the balances. In contrast, simply adding or subtracting the balance, once found, is computationally negligible.

To have a rich list (or any similarly structured data, for that matter) therefore requires that the data structure be rapidly searchable. To this end, computer scientists have devised what is known as a search tree, that reduces the computational complexity of searches considerably.

This figure shows a search tree. To see how it works, consider the steps required to find the query address “JJJ”.

Start at the so-called “root” of the tree, “HHH”.
If “JJJ” comes alphabetically after “HHH”, go to the right of “HHH”, if not, go left.
Repeat for “LLL”.
“JJJ” matches. Stop.

This search required only three decision steps at “HHH”, “LLL”, and “JJJ”. At each decision point, the remaining search space was cut in half. For example at the “HHH” decision point, three of the six unsearched addresses were eliminated (“EEE”, “CCC”, and “GGG”). This successive reduction in search space means that the computational complexity of searching a tree is logarithmically related to the number of addresses to be searched.

To give an example, imagine searching 1023 unique addresses. This set of addresses would require 10 decision steps (the equation is 1023 = 2¹⁰ — 1). Now imagine searching a tree of 1,048,575 addresses. This only requires 20 decision steps (1,048,575 = 2²⁰ — 1). Even though the search space went up over 1000-fold, the computational requirements to search it only doubled.

Compare this search to a flat, unsorted, data structure, where each item was sequentially compared to the query. This type of search would require N/2 steps on average, where N is the number of addresses in the data structure. For very large search spaces, the computational savings of a search tree become incredible compared to a flat, unsorted data structure.

One caveat of a search tree is that the tree must be “level” to get the best computational savings. In other words, the right half of a tree can not have significantly more members than the left half, or the computational savings begin to degrade. For example, let’s say that the right half of a tree had 1000 members and the left half had only 10. If the first decision sent you to the right half, the computational savings would only be 1% and not 50%.

For this reason, optimized databases that are designed around search trees have sophisticated algorithms to keep them level as they grow. In fact, the embedded database used for StealthExplore is called “LevelDB” for this reason.

— — — — — — —

Using a Search Tree to Keep Addresses Sorted by Balance

A critical feature of the search tree in the above figure is that it is sorted. If one starts at the leftmost endpoint (leaf) of the tree and works from right to left, it is clear that one will visit each address in order (“CCC”, “EEE”, “GGG”, …). Search trees are kept in order so that they may be searched efficiently. To see how order is preserved, consider adding the address “FFF”, which goes between “EEE” and “GGG”.

The resulting tree would look like the following figure.

To insert “FFF”, it was made into a decision point (called a “node” in computer science parlance) and “GGG” was moved to a leaf. As a consequence, a new empty leaf (represented by the open box) was also made. This insertion necessarily made the tree unlevel, but it is as level as possible given that it has 8 elements and not a number of elements that can fit the formula

N = 2^D — 1

Where N is the number of elements and D is the number of decision steps.

As mentioned above, rich lists are not sorted by address, they are sorted by balance. We can fit a rich list into a search tree data structure by using data elements that have two parts, a balance and an address. For example imagine a rich list with three balance-address combinations.

In this very simple tree, addresses can be kept sorted by balance because (1) the balance is included in every data element, and (2) the balance is used for keeping the search tree sorted.

— — — — — — —

Solving the One-to-Many Balance:Address Relationship

The balance-address tree has two important issues that could hurt not only scalability, but also feasibility.

First, in terms of scalability, the whole data structure must be kept on disk so that it can be persistent between restarts of the client. In other words, if the client program is terminated, even unexpectedly, the rich list must be preserved on persistent storage (disk drive) so that it doesn’t need to be rebuilt when the client starts.

Second, and more critically, two addresses can have the same balance. This one-to-many relationship must be reflected in the design of the search tree.

To address scalability, StealthExplore keeps two search trees. One search tree, that includes addresses, is kept on disk. This search tree can be searched by balance to retrieve addresses associated with the balance queries.

A second search tree is kept in memory and contains only balances. This in-memory search tree is kept synchronized with the disk-bound tree. Importantly, the in-memory search tree is built from the persistent search tree upon restarting the client. The usefulness of the in-memory search tree is that it is possible, for example, to find the richest 100 balances quickly, iterating over the sorted in-memory tree in 100 steps, and using these balances to query the disk bound tree to find the addresses corresponding to the 100 balances. The complexity of this search operation can be kept to n, where n is the number of richest addresses to retrieve.

To address the one-to-many relationship, the disk-bound search tree simply uses a list of addresses as the second part of the data element. For example, let’s imagine that both Bob and Carol have a balance of 200. The corresponding search tree would look like the following figure:

Here, instead of single addresses, the search tree stores lists of addresses, represented by enclosure in square brackets. For example “[Elizabeth]” is a list of one address. To see how this is useful, imagine we want to find the second richest address, where all known balances and addresses are included in the simple tree above.

Start with the rightmost element of the in-memory tree, the balance is 800.
Obtain the number of addresses in the list corresponding to a balance of 800 in the disk-bound tree.
Proceed right-to-left in the in-memory tree until the total number of addresses equals two, because we are searching for the second richest address. This address is “Elizabeth”.

This process may seem complicated, but it is very computationally efficient. The computational requirement follows the formula r*log(N), where r is the rank of the address for which we are searching and N is the total number of balances in the search tree.

— — — — — — —

Conclusion: The Rich List Exemplifies Infrastructure Scalability

Although the rich list is arguably a nonessential feature of the StealthExplore API, it serves as an ideal example of our approach to infrastructure scalability. All engineering decisions of the StealthExplore API will incorporate computer science principles that reduce computational requirements, allowing the API to scale with Stealth usage and adoption.

— — — — — — —

Hondo

— — — — — — —

Website / Telegram / Slack / Medium / Twitter / Reddit

SDBS #27: StealthExplore Real-Time Rich List

The Rich List and Its Computational Requirements

Calculating Address Balances: The Search Tree

Using a Search Tree to Keep Addresses Sorted by Balance

Solving the One-to-Many Balance:Address Relationship

Conclusion: The Rich List Exemplifies Infrastructure Scalability

Written by Stealth