Liquidity Forecasting in Mosaic: Part IV — Machine Learning Based Methods

Jesper Kristensen and 0xbrainjar

Dec 31, 2021

To this point in our Mosaic liquidity forecasting series, we have covered in great detail — across three posts — how we developed traditional non-Machine-Learning-based (non-ML) forecasting methods.

First, we reiterate the impact of good liquidity forecasting on the behavior of the Mosaic product. The ability to forecast liquidity values in the vaults that make up the Mosaic network helps improve the overall system’s performance and user experience: depleted (low-liquidity) vaults lead to failed transfers, defeating the very purpose of the bridge. The goal is therefore to maintain a balanced network of vaults with adequate liquidity where needed at all times. The ability to accurately forecast a single vault’s liquidity is a critical ingredient in the overall rebalancing algorithm, which we have also built but have yet to cover.

In this post, we leverage ML for the forecasting task.

Forecasting Liquidity using Gaussian Process models

ML-enhanced modeling with Gaussian Processes

We now describe the application of state-of-the-art ML and artificial intelligence (AI) algorithms to the forecasting challenges on our PoC. To start, we employ Gaussian Processes (GPs), which are known to be equivalent to infinite-width single-hidden-layer neural networks, with their deep compositions analogous to the infinite-width limits of deep networks. A GP prior, f ∼ GP(μ, K), is a prior distribution directly over functions, uniquely defined by its mean function μ and covariance kernel function K.
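To make the prior-over-functions idea concrete, here is a minimal sketch of drawing sample paths from a zero-mean GP prior. The library choice (scikit-learn) and all hyper-parameter values are ours for illustration; the post does not prescribe a specific GP stack.

```python
# Minimal sketch: sample paths from a zero-mean GP prior f ~ GP(0, K).
# Library and hyper-parameters are illustrative, not Mosaic's actual setup.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ExpSineSquared

# Hourly grid covering one week (168 hours)
X = np.linspace(0, 168, 169).reshape(-1, 1)

# Covariance kernel K: Matern (local smoothness) plus a daily periodic component
kernel = Matern(length_scale=24.0, nu=1.5) + ExpSineSquared(length_scale=12.0, periodicity=24.0)

# An unfitted GaussianProcessRegressor represents the prior; sample_y draws functions from it
gp_prior = GaussianProcessRegressor(kernel=kernel)
prior_samples = gp_prior.sample_y(X, n_samples=3, random_state=0)  # shape (169, 3)
```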

GPs and their deep or shallow compositions are the leading probabilistic and Bayesian non-parametric framework for supervised learning. In this work, we build upon appropriate state-space representations of GPs with linear O(N) dominant time complexity for such 1-D time-series forecasting settings through Kalman filtering. This allows us to develop an overall scalable online system that rips through the data in seconds, avoiding the naive cubic O(N³) complexity of exact GP inference or the typical polynomial O(NM² + M³) complexity of sparse variational GPs.
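As a rough illustration of where the O(N) claim comes from, the sketch below writes a Matérn-3/2 GP as a two-state linear Gaussian state-space model and runs a standard Kalman filter over it. This is a generic textbook construction rather than Mosaic’s actual implementation, and all parameter values are placeholders.

```python
# Sketch: a Matern-3/2 GP as a 2-state linear Gaussian state-space model,
# filtered in a single O(N) pass with a Kalman filter. Illustrative only.
import numpy as np
from scipy.linalg import expm

def matern32_state_space(lengthscale, variance):
    """State-space form of the Matern-3/2 kernel (state = [f, df/dt])."""
    lam = np.sqrt(3.0) / lengthscale
    F = np.array([[0.0, 1.0], [-lam**2, -2.0 * lam]])             # drift matrix
    Pinf = np.array([[variance, 0.0], [0.0, variance * lam**2]])  # stationary covariance
    H = np.array([[1.0, 0.0]])                                    # we observe f only
    return F, Pinf, H

def kalman_filter(t, y, lengthscale=24.0, variance=1.0, noise_var=0.1):
    """One forward pass over N observations; cost is linear in N."""
    F, Pinf, H = matern32_state_space(lengthscale, variance)
    m, P = np.zeros((2, 1)), Pinf.copy()
    filtered_means = []
    for k in range(len(t)):
        if k > 0:
            dt = t[k] - t[k - 1]
            A = expm(F * dt)                  # discrete-time transition
            Q = Pinf - A @ Pinf @ A.T         # process noise for a stationary GP
            m, P = A @ m, A @ P @ A.T + Q     # predict step
        S = H @ P @ H.T + noise_var           # innovation variance
        K = P @ H.T / S                       # Kalman gain
        m = m + K * (y[k] - (H @ m)[0, 0])    # update step
        P = P - K @ H @ P
        filtered_means.append(m[0, 0])
    return np.array(filtered_means)
```

Each observation is absorbed with a constant amount of work, which is what makes the online, rolling setting tractable.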

In this work, we employ a Gaussian likelihood p(Y|f, X, θ), i.e., the observation noise is normally distributed. This gives a conjugate prior-likelihood pair with a closed-form Gaussian posterior p(f|Y, X, θ) over the functions of interest.
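Conjugacy is what makes the posterior available in closed form. The snippet below spells out the standard GP regression equations (posterior mean and covariance at test inputs) in NumPy; `kernel` here stands for any covariance function k(·, ·) and is not a specific Mosaic component.

```python
# Standard closed-form GP posterior under a Gaussian likelihood.
# `kernel(A, B)` is assumed to return the covariance matrix k(A, B).
import numpy as np

def gp_posterior(X_train, y_train, X_test, kernel, noise_var):
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = kernel(X_train, X_test)
    K_ss = kernel(X_test, X_test)
    L = np.linalg.cholesky(K)                                 # O(N^3) in the naive case
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                                      # posterior mean
    v = np.linalg.solve(L, K_s)
    cov = K_ss - v.T @ v                                      # posterior covariance
    return mean, cov
```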

Hyper-parameter optimization over θ is performed with a modern optimizer that maximizes the marginal likelihood. We explore kernel families such as the Matérn, Periodic, and composite kernel constructions, after linear de-trending of the data or by encoding the linear trend in the mean function. More sequence-specialized kernel constructions, such as spectral mixture and signature kernels, require a larger computational budget; we leave them for future work, along with other auto-regressive kernels and state-space GP formulations.
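A hedged sketch of this setup: linear de-trending inside the training window followed by fitting a composite Matérn-plus-Periodic kernel. We use scikit-learn here purely for illustration (its GP regressor maximizes the log marginal likelihood with a gradient-based optimizer); the function name and hyper-parameter values are placeholders.

```python
# Illustrative composite kernel (Matern + Periodic) with linear de-trending.
# scikit-learn is our illustrative choice; the post does not name its GP library.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ExpSineSquared, WhiteKernel

def fit_detrended_gp(t_hours, liquidity_pct):
    # Linear de-trend within the training window; the trend is accounted for again later.
    slope, intercept = np.polyfit(t_hours, liquidity_pct, deg=1)
    residual = liquidity_pct - (slope * t_hours + intercept)

    kernel = (Matern(length_scale=24.0, nu=1.5)
              + ExpSineSquared(length_scale=12.0, periodicity=24.0)
              + WhiteKernel(noise_level=0.1))
    gp = GaussianProcessRegressor(kernel=kernel)   # marginal-likelihood maximization on fit
    gp.fit(np.asarray(t_hours).reshape(-1, 1), residual)
    return gp, (slope, intercept)
```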

Forecasting using GP on the simulated data

We use the same data generated via the LSE in previous posts and perform a rolling forecast with GPs specified by a Matérn plus Periodic kernel and a zero mean corresponding to the seed 100% liquidity level. The zero-mean GP prior thus pulls the forecast, in extrapolation regions far from the training data, back toward the seed liquidity level instead of letting it drift to excessively high or low levels. The probabilistic nature of the framework allows for good uncertainty quantification: in many cases, the true 168-hour-ahead liquidity lies well within the 95% intervals of the forecast, as seen from the results below.
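One rolling step of this procedure could look as follows (continuing the hedged scikit-learn sketch above): forecast 168 hours past the end of the training window, add the removed linear trend back, and form 95% intervals from the predictive standard deviation.

```python
# One rolling-forecast step: 168 hours ahead with a 95% interval.
# Builds on fit_detrended_gp from the earlier sketch; values are placeholders.
import numpy as np

def forecast_one_window(gp, trend, last_train_hour, horizon=168):
    slope, intercept = trend
    t_future = np.arange(last_train_hour + 1, last_train_hour + 1 + horizon)
    mean_resid, std = gp.predict(t_future.reshape(-1, 1), return_std=True)
    mean = mean_resid + (slope * t_future + intercept)     # re-apply the linear trend
    lower, upper = mean - 1.96 * std, mean + 1.96 * std    # 95% predictive interval
    return mean, lower, upper
```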

From the 80 rolling forecasts, we show a few representative examples that highlight various aspects of the model’s behavior on this dataset. In general, the GP outperforms ARIMA and avoids significant errors, owing to the prior model capacity and the built-in Occam’s razor of its Bayesian non-parametric nature. The coverage for the LSE data is 80%; that is, 64 of the 80 rolling forecasts contain the true liquidity one week ahead (168 hours into the future) within their 95% confidence intervals. All results are shown in Fig. (1). A comparison of the RMSE values obtained with the GP and ARIMA models is shown in Fig. (2).
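For reference, coverage and RMSE figures of the kind quoted above can be computed along these lines (a sketch; the arrays collect, per rolling window, the true and forecast liquidity at the 168-hour mark together with the interval bounds):

```python
# Sketch: coverage of the 95% intervals and RMSE across rolling forecasts.
import numpy as np

def coverage_and_rmse(y_true, y_mean, y_lower, y_upper):
    y, m = np.asarray(y_true), np.asarray(y_mean)
    inside = (y >= np.asarray(y_lower)) & (y <= np.asarray(y_upper))
    coverage = inside.mean()                    # e.g. 64 / 80 = 0.80 for the LSE data
    rmse = np.sqrt(np.mean((y - m) ** 2))
    return coverage, rmse
```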

Figure 1: Top left: Picking up a periodic structure that results in an excellent fit for the first 100 hours, then slowly returning towards the 0-mean prior, which corresponds to the original seed value (100%).

Top right: An example where the 168th-hour prediction comes very close to the true liquidity value.

Middle left: The combination of Matérn and Periodic kernels results in an excellent forecast and very good uncertainty quantification that progressively increases as we extrapolate further away into time.

Middle right: Similar behavior as above, slightly underestimating the true liquidity level because of the training window not providing great supporting evidence for the type of increase observed around 780 hours in the ground truth.

Bottom: An example of over-estimation where the GP forecast is for higher liquidity than the ground truth from the LSE data generating process.

Figure 2: RMSE comparison between GP and ARIMA models on LSE data.

Forecasting using GP on the PoC data

Next, we turn to forecasting on the recent Mosaic PoC data.

Polygon (POL) liquidity

We now report results on Polygon liquidity forecasting. To deal with the interventions on the vault, which reset it to seed values when a certain threshold has been crossed, we embed the model in an overall algorithmic design that handles these ceilings based on the known threshold values and the forecasted predictive mean of the model. Standard linear detrending within a training window takes place, and that trend is also subtracted from the forecast ahead for consistency.

Specifically, for the crossing events, we first detect a threshold crossing in the training set and adjust the samples after that crossing by adding the seed percentage liquidity. Second, if the forecast mean exceeds the ceiling/threshold, the remainder of the forecast mean is reset to the original seed 100% level (represented as 0 in our zero-mean GP prior formulation).
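A sketch of these two adjustments, following the description above; the threshold and seed values, and the exact detection rule, are illustrative placeholders rather than Mosaic’s production logic.

```python
# Sketch of the two ceiling adjustments described above. Thresholds, seed values
# and the detection rule are placeholders, not Mosaic's production logic.
import numpy as np

def unwrap_training_window(y_train, threshold, seed_pct=100.0):
    """If the threshold was crossed inside the training window, shift the samples
    after the crossing by the seed percentage so the series reads continuously."""
    y = np.asarray(y_train, dtype=float).copy()
    crossed = np.where(y >= threshold)[0]
    if len(crossed) > 0:
        y[crossed[0] + 1:] += seed_pct
    return y

def reset_forecast_at_ceiling(mean_forecast, threshold, seed_level=0.0):
    """If the forecast mean exceeds the ceiling, return the remainder of the forecast
    to the seed level (0 in the zero-mean formulation)."""
    m = np.asarray(mean_forecast, dtype=float).copy()
    over = np.where(m >= threshold)[0]
    if len(over) > 0:
        m[over[0]:] = seed_level
    return m
```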

A further development was to treat the expected hitting time of the ceiling more probabilistically. Rather than relying on the first forecast point that crosses the threshold, we can make better use of the predictive distribution (i.e., look at the second moment too, rather than just the predictive mean m*) and capture our uncertainty about the crossing. We use the fact that the predictive density p(f*|X*, X, f) is Gaussian and compute the probability of the forecast hitting the threshold T via the Gaussian CDF Φ evaluated at T, then re-weight the predictive mean by this crossing probability.
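The equation for the re-weighted mean appears as an image in the original post, so the exact form is not reproduced here; one plausible reading consistent with the description, which we sketch under that assumption, mixes the raw predictive mean and the post-reset seed level according to the Gaussian crossing probability.

```python
# Hedged reconstruction of the CDF-weighted predictive mean (the post shows the
# exact equation as an image; this is one plausible form, not the verbatim formula).
import numpy as np
from scipy.stats import norm

def cdf_weighted_mean(mean_star, std_star, threshold, seed_level=0.0):
    """Mix the raw predictive mean m* with the post-reset seed level, weighted by
    the probability that the forecast has not yet crossed the ceiling T."""
    p_below = norm.cdf(threshold, loc=mean_star, scale=std_star)   # P(f* < T)
    return p_below * np.asarray(mean_star) + (1.0 - p_below) * seed_level
```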

Finally, we can construct smoother versions of this weighted predictive mean by voting across a predictive window ahead. This ensures that when a forecast contains a single, low-confidence point crossing the ceiling, the adjustment also takes into account the follow-up forecast points and their associated uncertainty. We call the adjustments that lead to smoother transitions “S-like”, in contrast to the sudden “Z-like” jumps produced by the simpler rule described previously. This is demonstrated in the Arbitrum results in the next section. Again, it is worth noting the significant improvements in predictive performance over the simpler ARIMA models.
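Continuing the hedged sketch above, the window voting could take the form of a moving average of the crossing probabilities over a short look-ahead window (the window length is a placeholder):

```python
# Sketch of "S-like" smoothing: average the crossing probability over a short
# look-ahead window so a single low-confidence crossing does not force a hard reset.
import numpy as np
from scipy.stats import norm

def smoothed_reset_mean(mean_star, std_star, threshold, seed_level=0.0, window=6):
    m, s = np.asarray(mean_star, float), np.asarray(std_star, float)
    p_cross = 1.0 - norm.cdf(threshold, loc=m, scale=s)      # P(f* >= T) at each step
    p_vote = np.convolve(p_cross, np.ones(window) / window, mode="same")  # vote ahead
    return (1.0 - p_vote) * m + p_vote * seed_level          # smooth S-like transition
```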

Figure 3: Top: A forecast that slightly underestimates the liquidity level and does not forecast the threshold/ceiling of the vault correctly. This is primarily due to the change in linear trend between the training and forecast periods, as seen from the higher average level of liquidity in the forecast window. Middle left: Accurate forecast of hitting the threshold and resetting the vault’s liquidity. Middle right: A case where liquidity crosses the threshold twice; this is picked up by the model and reflected in its forecast ahead. Bottom left: Accurate estimation of the hitting time for the vault’s threshold/ceiling in the latter part of the rolling forecast period. Bottom right: An accurate double-crossing event captured by the model.

Arbitrum (ARB) liquidity

The Arbitrum dataset exhibits floor events, i.e., thresholds and vault resets due to depletion of the initial seed. Hence the signal shows a reversed sawtooth pattern compared to the Polygon data in the previous section. Again, some outstanding predictive performances are reported, outperforming the ARIMA models, with further improvements possible in the future, as discussed in the last section. Some indicative runs from the Arbitrum case are shown in Fig. (4).

Figure 4: Top left: An initial window forecast on Arbitrum. The forecasting trend is off due to the training window, and the model slightly overestimates liquidity. Top right: Hard transitions (Z-like) on Arbitrum liquidity forecasting based on a single prediction exceeding the lower threshold. Underestimating the hitting time. Single crossing event. Middle left: Hard transitions (Z-like) on Arbitrum liquidity forecasting based on a single prediction exceeding the lower threshold. A good estimation of hitting time. Single crossing event. Middle right: Smoother, S-like transitions on Arbitrum liquidity based on the CDF of the predictive density. Single crossing event. Bottom: Smoother, S-like transitions on Arbitrum liquidity based on the CDF of the predictive density. Double-crossing event.

Conclusion

We have introduced ML-based forecasting into the Mosaic liquidity system. The ML model can capture the liquidity evolution across vaults very well, and it outperforms the non-ML-based methods such as ARIMA developed in previous posts.

We will be shipping this ML capability to Mosaic. Finally, some potential future directions for improving liquidity forecasting:

  • Explore richer kernel families within the GP framework, such as spectral mixture and signature kernels.
  • Explore alternative frameworks such as ANNs, state-space GPs, deep GPs, etc.
  • Explore the nature of the process across re-seeding events, potentially treating these segments as i.i.d. trajectories to increase the effective sample size and support deep learning models.
