The Spin matching engine: optimizing NEAR gas prices in layman’s terms

Published in

Spin

11 min readApr 27, 2022

In web3, the problem of transaction fees (or gas fees) is acute. Unlike web2, users pay for using the infrastructure (blockchain) every time they access it. The cost depends on “how much your operation will load the blockchain.” Or in other words, one operation requires a certain commission, measured in so-called gas units. These gas units determine the relative complexity of your operation. To get an absolute value of how much the user must pay in NEAR to complete the operation, this value must be multiplied by the gas price, which dynamically changes depending on several factors.

This question comes to the fore when implementing a DEX operating through an on-chain order book. The matching engine should be able to process thousands of orders simultaneously: add them to the order book, remove them from the order book, and match orders with each other. Our task is to minimize gas consumption in all of these operations.

Let’s look at an example of our first implementation of an on-chain order book “on the forehead” — respectively slow and expensive. Later, we’ll dig deeper into optimizations that have reduced gas consumption almost 10 times less in the order book.

Matching Engine v1: Slow & Expensive

What is an order book from a developer’s point of view? It’s a list of price levels for bids and asks where each level contains some set of orders placed in the order book at that price. The structure of the order book from the code can be described in this way:

TreeMap and LookupMap are data structures built into NEAR that follow the interface of regular hash tables (element insertion, element removal, element presence check). But, all have different internal implementations and slightly differ in additional properties. You can read more about these differences in the official NEAR documentation: https://docs.near.org/docs/concepts/data-storage#rust-collection-types.

Such implementations are built on the internal data structures of NEAR, but it works rather poorly: matching a large number of orders requires a large number of cumbersome operations since you have to constantly go over some data structures and look for certain orders, which in turn leads to consuming a lot of gas. This implementation allows you to match one order with only 15 orders (each of the fifteen orders is at a different price). Otherwise, the gas limit is exceeded.

Obviously, with such order book performance indicators, it is almost impossible to compete with CEX exchanges. We have done a lot of work on various optimizations, and these optimizations helped reduce gas consumption by 10 times. Throughout this article, we’ll describe how we managed to achieve this in more detail. Some optimizations are only geared for the matching engine and others towards the internal implementation of other aspects of our exchange.

Balance storage optimization

When you come to the Spin DEX, you must make a deposit and transfer the amount from your main wallet to your “trading” wallet. Our smart contract needs to store the information somewhere about who deposited how much money to check if there are enough funds to place the order or not. The easiest way to store this data is to use a nested hashmap structure (without limiting the generality of reasoning, we consider that we have only 64 tokens):

By storing data in this way, we can quickly retrieve the necessary balances, but they take up a lot of memory, and you have to pay for memory on the blockchain. Unlike markets, it is easy to see the inefficiency of this storage method here:

- Currency names are duplicated for both Alice and Bob,

In fact, the names of currencies can not be stored if we agree that the balance for the `TKN_1` token will be in the first cell of the array, for `TKN_2` — in the second, and so on.

You can store this data in a more optimal way where each element in the array corresponds to a particular token:

This can be further optimized. Let’s say we have a lot of tokens, not 64, but 1024. We want to get the balance of the `TKN_32` token for Alice. To do this, we need to dump the entire array of 1024 elements into our memory, although we only need one element — this is not efficient! Therefore, it makes sense to break the entire array of 1024 elements into “pieces,” for example, 32 elements each.

If Alice_i is calculated as i = TKN_N // 32 (integer division). Then, to get the balance of the TKN_32 token for Alice, we need to read only 32 elements in the memory instead of all 1024.

Order execution

Our contract must match your order with the orders in the order book when you place an order. This is done by sorting the opposite orders from the best to the least suitable for matching.

To find the best price and speed, applications are grouped by price and sorted from best to worst. Within the same price, orders are placed in the order they were added to keep the queue out of execution.

At the beginning of the comparison, we load the first 32 best prices. We download a lot at once because it is cheaper than downloading them one at a time.

These loaded prices are a binary tree or a BinaryHeap. We use this structure because it allows you to add and retrieve prices according to order priority while storing data efficiently. The tree structure looks like this: the best price is at the top. In this case, 40:

We extract this price from the tree and rearrange it so that the smallest element is still at the top. You can read more about this here: https://en.wikipedia.org/wiki/Binary_heap.

After we have retrieved the price (40 in this example), we need to get a list of orders in the order they were added (to the queue). We store them as linked lists, where each of the orders points to the next one in the queue and the previous one.

In this example, the first order in the list is an order with a quantity of 100 units. We extract it from the list and take it for further calculations. After that, the list will look like this. And now our price is 40 points for an order of 13 units.

If we do not need all 100 units to close our position but need, for example, 50, we return the remainder to our linked list, and we change the pointer in our price to an order with a balance of 50. We have finished the order execution procedure since we completed it.

Otherwise, we go through the entire list and alternately extract and collect order by order. At the moment, when there are no orders left at the given price (40), we extract the next price (60). We check that the given price suits us and repeat the algorithm for collecting orders at this price. So we do this until we have completed the application and the current price is still suitable for us. But there may come a time when the top 32 prices loaded at the beginning run out and then we need to load subsequent prices.

Similarly, with the price tree: we have a tree of price trees 🤓

We will demonstrate how this tree of trees works in the 10th number system, although act binary prefixes are used for clarity and simplicity.

At the root of the tree, there is a 32-byte binary tree (let’s call it the root), which can contain 256 elements (0.255); it takes up very little space (32 bytes) and allows us to split all the prices into 256 cells (by their first byte) This can be thought of as if we were grouping prices by the most significant decimal place.

For example:

*In the first group, we have valuable 1, 2, 5, 8, …; In the second 11, 12, …; In the third 22, 28, … and so on.*

In this binary tree, we are looking for the minimum element. For example, let it be 1. This is the prefix of our future price (high order). Using the obtained prefix as a key, we load the next level of our tree (let’s call it intermediate). This tree consists of intermediate price categories.

Example:

*We have loaded the intermediate tree for the most significant bit (1).*

From this tree, we can get the key to load the last level of our structure (let’s call it the leaf). We, by analogy, get the root of the intermediate tree since it contains the minimum element. In this case, it is (3). We combine this element with the prefix obtained from the root (1) and get (13). This is the complete price tree key and using it; we can load the price suffixes. The leaf tree, by analogy with the previous level, is a BinaryHeap.

We have come a long way to get to this point, uploading a lot of intermediate data to do this as rarely as possible. We get not 1 element from this tree but 32 elements at once. Or if there aren’t 32 elements, then we take the whole tree. In this example, we will take the entire tree. We bypass it and restore prices by adding a prefix to all the elements of this tree.

At each level, we store only a part of the price digits, which allows us to reduce the size of the stored data. Since we have taken the entire leaf tree, we need to remove the pointer to it in the intermediate tree.

*We uploaded the intermediate tree earlier (1) before and after removing the element.*

Since there are elements left in the intermediate tree, we do not need to remove the reference to this intermediate tree in the root tree. Next time, we save the received prices in the cache to match them immediately and continue the order execution process. We get the best price, compare orders at this price, and so on until we complete the application or run out of orders that match the price in the order book.

Placing new orders

When you add an order to the order book, we must find the best place to store it to execute it with minimal resources later. For example, we will place a buy order with a price of 110 and a quantity of 400. If we went through the entire “Order Execution” algorithm and failed to execute it, for example, due to a lack of supply or an unsuitable price, we must save it to the order book for further execution.

Step 1

First, we load the title of the linked list with the price from the order. In addition to the link to the first order in the list, the title contains a link to the last element.

We load the last element of the list and specify a new order for it as the next element and in the new order, we indicate the last element as the previous one. We also update the link to the last item in the price header.

If the list is empty, we create a new list with one element and links to it in the header.

Step 2

We need to make sure our price is referenced in the price tree. If our price is present in the price tree, then the process of adding an order is complete. If our price is not in the price tree, then we need to add it.

We need to add a price of 100.

We add the price to the last available item. We go top-down-left-right.

And we need to restore our hips. The parent element must always be smaller than the child element. To do this, we simply move our price up the tree until this ratio is correct.

That’s all. The price has been added. You can stop there. But imagine if we add and add prices, and the tree grows. In this situation, users will be forced to pay to download a large price tree that they do not use. So when there are too many prices (64), we take the worst 32 prices and dump them out of our price trees.

In this example, for simplicity, we will consider the case when all uploaded prices have a common prefix. For example, we will upload prices (130, 132, 133, 136). We take the highest price digit (1) and add it to the rooted tree.

We load the intermediate tree by key (1) and insert the next part of the common prefix (3) into it.

*The intermediate tree before and after price bit insertion*

The last step is to store the price suffixes in the leaf tree.

Other minor optimizations

1. For some, it may not be obvious, but only the necessary information needs to be stored on the blockchain. For example, storing an order history — even if it is short — is a very bad idea, as the performance of the order book drops significantly!

2. `U128`. Binary serialization has been redone. The size of the binary representation depends on the value, and 0 is encoded as 1 byte. Due to our contract’s 4 decimal places of precision, most `U128` encodes less than 5 bytes. This allows you to download twice as much data per reading for the same price.

The binary representation of a number:

- If the number is 0, → 1 byte FF

- If the number is not equal to 0 → 1 byte mask (the number of binary zeros at the beginning + the number of binary zeros at the end) 1–16 significant bytes.

Worst case `U128` 17 bytes (unreachable case in our contract).