Fast LP net value calculation of all providers

zelos-research · Aug 30, 2023

In this article we will explain how we calculated the net asset value of all addresses, which we used in the previous article: https://medium.com/zelos-research/find-smart-money-from-eth-usdc-005-lp-b4fa83f92473

In previous articles, we discussed how we implemented Demeter and why we use one-minute sampling to backtest LP strategies. There is also another demand, replay, which looks similar to backtesting but differs in many details.

What is replay? Calculating the historical net asset value of every position over its whole lifetime. Specifically:

  1. The net asset value is not only calculated at the closing date; the profit and loss at any point in the position's life cycle should be available.
  2. It must be fast: the full data of a pool needs to be processed in an acceptable computing time.

At first glance, we could use Demeter to replicate the on-chain behavior in a strategy with a time trigger. But for every position, the time pointer would have to advance from the very beginning again, and that computation time is unacceptable.

There are differences between backtesting and replay:

  1. A replayed position settles its fees on-chain through collect events, and its starting point and endpoint are completely determined; they do not need to be decided by a strategy.
  2. A replayed position has a clear life cycle and does not need to wait for signals as in event-driven backtesting.
  3. Replay covers a large number of positions, so there are many shared calculations that can be factored out and optimized.

Solution

Fortunately, we have found a new data structure that solves the efficiency problem. We call it the one-minute-sampled heat map.

Let’s take a look at the following table:

This table records the fee income per tick space. To be accurate, we generate two tables, one for token0 and one for token1 of the pool. Each row holds the fees generated during one minute, with one column per tick space. For every day in history we build these two tables; each table has a fixed 1440 rows, and the number of columns depends on that day's price range. Each data point represents how much fee income 1 unit of liquidity earns in that minute and that tick space.
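
As a rough illustration, a single day's table for one token could be built like this; the column names of `day_swaps` and the helper name are our own assumptions, not the actual pipeline:

```python
import pandas as pd

MINUTES_PER_DAY = 1440

def build_fee_heat_map(day_swaps: pd.DataFrame, tick_low: int, tick_high: int,
                       tick_spacing: int) -> pd.DataFrame:
    """Rows are the 1440 minutes of the day, columns are tick spaces.

    Each cell is the fee earned by 1 unit of liquidity in that minute and tick
    space. `day_swaps` is assumed to have columns: minute (0-1439), tick_space,
    fee_amount, and active_liquidity at the moment of the swap.
    """
    columns = list(range(tick_low, tick_high + tick_spacing, tick_spacing))
    table = pd.DataFrame(0.0, index=range(MINUTES_PER_DAY), columns=columns)

    for swap in day_swaps.itertuples():
        # fee per unit of liquidity: the swap's fee divided by the liquidity
        # active in this tick space at that moment
        table.at[swap.minute, swap.tick_space] += swap.fee_amount / swap.active_liquidity

    return table
```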

How do we calculate the fee income of a position on this day?

Any position is a rectangle on this table.

We notice that the basic information of any position is:

  1. Life cycle
  2. The upper and lower boundaries of the position
  3. Liquidity

Correspondingly:

  1. The life cycle selects the rows on the time axis
  2. The upper and lower boundaries select the columns on the tick axis
  3. The liquidity determines how much of the fee the position receives

We only need to add up the unit-liquidity fee income within this rectangle and multiply it by the liquidity of the position. With this, we can accurately calculate the fee income of any position at a granularity of one minute. In this way we have extracted most of the common calculations shared by all positions and improved the calculation efficiency by trading space for time.
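
A minimal sketch of that rectangle sum, assuming a daily table like the one above and taking the position fields as plain arguments:

```python
import pandas as pd

def position_fee_for_day(table: pd.DataFrame, tick_lower: int, tick_upper: int,
                         minute_start: int, minute_end: int, liquidity: float) -> float:
    """Sum the unit-liquidity fees inside the position's rectangle and scale by its liquidity."""
    in_range = [c for c in table.columns if tick_lower <= c < tick_upper]
    rectangle = table.loc[minute_start:minute_end, in_range]
    return float(rectangle.to_numpy().sum()) * liquidity
```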

From the creation of the pool to June 20, 2023, the resulting data is 3.4 GB.

Acceleration

The previous optimization breaks the common calculations down to the smallest unit, so that intermediate results can be reused. The next idea is to work on the time axis. If a position has a lifespan of 1 hour and 15 minutes, the method above has to sum 75 rows of data. This time accumulation is again a calculation that many positions share, so we can optimize in this direction as well. Our approach is to aggregate the minute-level fee table just mentioned into an hourly one. For a position lasting 1 h 15 min, the possible situations are as follows:

We read the first 5 minutes and the last 10 minutes from the minute table, 15 rows in total, and then read the middle hour from the hourly version of the data. That is one row from the hourly table instead of 60 rows from the minute table, which is our optimization on the time axis (16 rows instead of 75). For positions that last longer than a day, the speed-up approaches 60x. The same idea could also be applied on the tick axis, but since the current performance is already sufficient for our statistical tasks, we did not optimize further in that direction. With this, we can calculate the profit and loss of all positions over half a year within 6 hours.
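
A sketch of how a position's lifespan could be split between the two tables; the function name and the half-open interval convention are our own assumptions:

```python
from datetime import datetime, timedelta

def split_lifespan(start: datetime, end: datetime):
    """Split [start, end) into partial-hour pieces (read from the minute table)
    and whole hours (read from the hourly table)."""
    first_full_hour = start.replace(minute=0, second=0, microsecond=0)
    if start > first_full_hour:
        first_full_hour += timedelta(hours=1)
    last_full_hour = end.replace(minute=0, second=0, microsecond=0)

    if first_full_hour >= last_full_hour:
        # the lifespan contains no whole hour: fall back to the minute table only
        return [(start, end)], []

    minute_ranges = []
    if start < first_full_hour:
        minute_ranges.append((start, first_full_hour))  # leading partial hour
    if last_full_hour < end:
        minute_ranges.append((last_full_hour, end))     # trailing partial hour
    return minute_ranges, [(first_full_hour, last_full_hour)]
```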

Problems and Fixes

There are 350,000 positions in total, and we have conducted an error analysis on them: when a position expires, we calculate its fees from the minute data and compare them with the actual settlement. The errors are as follows:

We can see that if a position lives long enough, its error is relatively small; if its lifespan is short, the error is much larger.

So how are errors generated? After our analysis, there are probably the following types of errors:

  1. Calculation errors of the program itself; after all, Python arithmetic is not identical to the EVM's.
  2. MEV attacks.
  3. Precision loss caused by the minute sampling itself; for example, if liquidity is provided three times within one minute, we record all three mints at the last second of that minute.

Since this is a replay, our calculation differs from reality, and what we need to do is allocate the difference back over time. We simply distribute it linearly across the minutes, because the relative error was small to begin with; we did not pursue a more sophisticated allocation method.
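
A minimal sketch of that linear redistribution, assuming `estimated` is the per-minute fee series we computed for a position and `actual_total` is the amount actually settled on-chain:

```python
import pandas as pd

def redistribute_error(estimated: pd.Series, actual_total: float) -> pd.Series:
    """Spread the gap between the on-chain settlement and our estimate evenly over every minute."""
    error = actual_total - estimated.sum()
    return estimated + error / len(estimated)
```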

Group by address

The next step is to extract all position operations: when liquidity was added or removed, how much, and in which range. These are then sorted by position id, transaction, and log order, forming a file of about 176 MB. For liquidity added through the proxy, users receive an NFT, and this NFT can be transferred to someone else. In that case the address that added the liquidity differs from the address that removes it. Therefore, to accurately compute the net asset value of an address over a period of time, the NFT transfers would also have to be tracked.

According to our statistics, there are only about 1,400 such transfers, which is tiny compared to the total number of positions. Due to time constraints we do not handle these cases for now: every position is assumed to be held by the address that minted it first. With this assumption it is easy to establish the correspondence between address and position; we just look up which address added the liquidity in the first mint.
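
A minimal sketch of that lookup, assuming the operations file is loaded into a DataFrame with (hypothetical) columns position_id, block_number, log_index, sender, and tx_type:

```python
import pandas as pd

def map_position_to_address(operations: pd.DataFrame) -> pd.Series:
    """Map each position id to the address that performed its first mint."""
    mints = operations[operations["tx_type"] == "MINT"]
    first_mints = mints.sort_values(["block_number", "log_index"]).groupby("position_id").first()
    return first_mints["sender"]
```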

The net asset value of an address

The principle of obtaining the net asset value of a position is simple: from the mint and burn events we know how much liquidity the position holds at any time, and we add the accumulated fee income to get the net asset value. In practice, however, many details have to be handled. The calculation steps are:

  1. Load the tick fees. Loading more than a thousand CSV files is too slow, so all fees are read into memory once, a date → fee-table dictionary is built, and it is saved to a pickle file. This reduces the loading time from 10 seconds to 0.2 seconds, which greatly improves the mood when running the data. (A sketch of this cache is shown after the list.)
  2. Then process each position individually.
  3. Generate a time series according to the life cycle of the position. If the final liquidity of the position is greater than 0, the position is considered still open, and its end time is set to the end of the statistics period.
  4. According to the tick range, look up how much fee the position earns at the corresponding time from the fee table. Based on the price at that moment, get the net worth and calculate the rate of return.
  5. If there is a collect at the current time, a compensation step is required. Because the fee here is estimated from the swaps and the tick range, it will differ from the actual value (the collect amount minus the liquidity principal). This difference is distributed back over the previous period, from the last collect (or, if there is none, from the start of the LP) until this collect.
  6. Accumulate the fee of each hour into the running fee total.
  7. Calculate the current net value and rate of return.
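
A minimal sketch of the cache from step 1; the file naming and directory layout are assumptions:

```python
import os
import pickle
import pandas as pd

def load_tick_fees(csv_dir: str, cache_path: str = "tick_fees.pkl") -> dict:
    """Load all daily fee tables into a {date: DataFrame} dict, cached in a pickle file."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)

    fees = {}
    for name in sorted(os.listdir(csv_dir)):
        if name.endswith(".csv"):
            date = name[:-4]  # assume files are named like 2023-06-20.csv
            fees[date] = pd.read_csv(os.path.join(csv_dir, name), index_col=0)

    with open(cache_path, "wb") as f:
        pickle.dump(fees, f)
    return fees
```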

To reduce the amount of computation, the precision of the data is lowered here: the previous sampling rate of 1 minute is changed to 1 hour.

Regarding the calculation of net worth and fees, there is an issue: since a 1-hour sampling rate is adopted, assumptions have to be made about the start and end of each hour in which liquidity is added or removed.

  • We assume that the price of this hour is the price at the beginning of this hour.
  • For net worth, we assume that the net worth of this hour is the net worth at the end of this hour.

Therefore, for the calculation of net worth, the following formula can be used, where f refers to the function of obtaining the LP principal amount through liquidity and price:
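
In our notation (a reconstruction based on the description above, not the article's original symbols), this is roughly:

$$V_t = f(L_t,\ p_t) + \mathrm{fee}_t$$

where $L_t$ is the position's liquidity in hour $t$, $p_t$ is the price assumed for that hour, and $\mathrm{fee}_t$ is the fee accumulated up to the end of the hour.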

The calculation of the rate of return is more complicated. Here, the rate of return for each hour needs to be calculated first.

Calculating the hourly rate of return is different from simply calculating the net worth: increases and decreases of liquidity must be excluded, so that the rate of return only reflects the impact of the asset price and fee income.

To calculate the rate of return for each hour, we need the net worth at the beginning and at the end of that hour. The liquidity used in both is the value at the beginning of the hour, so the calculated rate of return excludes the impact of liquidity increases and decreases.
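
Based on the description above, the start and end net worth of hour $t$ can be written roughly as (notation ours):

$$V_t^{\mathrm{start}} = f(L_t,\ p_t) + \mathrm{fee}_{t-1} \qquad\qquad V_t^{\mathrm{end}} = f(L_t,\ p_{t+1}) + \mathrm{fee}_t$$

where $L_t$ is the liquidity at the beginning of hour $t$, $p_t$ and $p_{t+1}$ are the prices at the beginning and end of the hour, and $\mathrm{fee}$ is the accumulated fee.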

With the beginning and end net worth, the rate of return can be obtained in the following way:
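
In the same notation, this amounts to:

$$r_t = \frac{V_t^{\mathrm{end}}}{V_t^{\mathrm{start}}} - 1$$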

With the mapping between addresses and positions, and the net worth and rate of return of each position at any time, this step is simple. First find all positions of the address; then, for a given time:

  • Net worth: Calculate the sum of the net worth of all positions at the current time
  • Rate of return: take the rate of return of each position at the current time, weight it by the position's net worth, and add them up, as follows:
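
In the same notation, with $i$ ranging over the address's positions at time $t$:

$$r_t^{\mathrm{addr}} = \frac{\sum_i V_{i,t}\, r_{i,t}}{\sum_i V_{i,t}}$$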

Finally, each address gets a file with the time series as rows, and net worth and rate of return as columns.

Conclusion

We have briefly introduced how we calculate the rate of return of all addresses as the base data for our analysis. We solved the problems of calculation efficiency and accuracy with a new data structure.
