Maker Vault Liquidation ML Model

Predicting liquidations by applying major historical price shocks

Jan Osolnik
Block Analitica
9 min read · Sep 2, 2022


Introduction

Liquidations are one of the key features of the MakerDAO system. Without them, the protocol would be at risk of under-collateralization in the case of a sufficient drop in collateral asset prices. Even with the current liquidation mechanism in place, a risk of bad debt accrual remains in the case of major market shocks.

MakerDAO’s MCD (Multi-Collateral DAI) system was launched almost 3 years ago (November 2019). Since then, the system has experienced multiple stress tests during which large amounts of collateral were at risk of being liquidated. The available Ethereum on-chain data enables us to analyze how different market shocks either caused vaults to be liquidated or not. Behavioral patterns of these vaults can then give insight into what makes a vault well protected. We can use an ML model to automate the pattern recognition process, test its performance on data not available during model training, and finally estimate the probability of vault liquidation at a specific point in time.

The aim of this post is to share the model framework we have built, its performance and how similar modeling can be further used to make the DeFi ecosystem both more capital efficient and robust against various market shocks.

We show that we can achieve high model performance in predicting vault liquidations, estimate the predicted liquidated amount for the current portfolio state, and add conservative scenarios that assume an initial decrease in CR buffer (loan safety margin). We extend the analysis by applying the predictive model to a period of relatively high portfolio risk and comparing the results with the predictions for the current portfolio state.

Methodology

As the majority of vault liquidations happen during only a few irregular and rare market shocks, this needs to be taken into account in the modeling process. Our aim is to model vault liquidations without also implicitly trying to predict market shocks (arguably unpredictable). A potential workaround is to simulate market prices as a stochastic process with jumps (e.g. geometric Brownian motion) and extract price drops after applying large volatility multipliers. We decided not to pursue this approach: it not only introduces strong modeler bias (how large a volatility multiplier to apply, which time frame to take, etc.), it is also less interpretable than our chosen methodology.

Our approach is instead to build a training set by narrowing the data down to only the largest market price shocks since MCD’s launch. We extract both each vault’s state and its historical behavioral patterns before the price shocks. This also shapes our interpretation of the results: the predicted probability of vault liquidation comes with the assumption that vaults experience a market shock on the next day comparable to the price shocks that compose the training data. Our training data’s size is therefore the number of price shocks multiplied by the number of active vaults on the day before each shock. While not filtering down to selected days would increase the training data, it would also decrease the signal-to-noise ratio significantly, making it more difficult for the model to find meaningful patterns.

Vault liquidation is a supervised, binary classification problem, given that we have ground truth available and only two potential outcomes. It also comes with the challenge of class imbalance, as the majority of vaults are not liquidated (even with the applied filter). An additional filter selects only vaults that had at least 100 DAI of outstanding debt (our definition of an active vault).

The model’s features (signals) aim to best describe the behavioral patterns of how likely a vault is to be liquidated. Some of the features include CR buffer (loan safety margin), debt amount, historical activity, ecosystem maturity/development and whether a vault is likely to be automatically managed through a bot.

The model’s label is a binary yes/no whether the vault was liquidated within 48 hours after the extracted vault data (snapshot state and historical behavior). Liquidation is used as a proxy for loan default (aligned with our previously developed methodology of Vault protection score). The predicted probability of vault liquidation could be further used as a component in determining individual credit scores.
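The snapshot construction described above (one training row per price shock and active vault on the day before, labeled by liquidation within 48 hours) might be sketched as follows. The data shapes, names, and the liquidation-event representation here are illustrative, not the actual pipeline:

```python
from datetime import date, timedelta

def build_training_set(shock_dates, vault_states, liquidations):
    """One training row per (price shock, active vault on the day before).

    vault_states: {day: [(vault_id, debt_dai, cr_buffer), ...]} daily snapshots
    liquidations: set of (vault_id, day) liquidation events
    """
    rows = []
    for shock in shock_dates:
        snapshot_day = shock - timedelta(days=1)
        for vault_id, debt, cr_buffer in vault_states.get(snapshot_day, []):
            if debt < 100:  # active-vault filter: at least 100 DAI outstanding
                continue
            # label: liquidated within 48 hours after the snapshot
            window = {snapshot_day + timedelta(days=d) for d in (0, 1, 2)}
            label = int(any((vault_id, day) in liquidations for day in window))
            rows.append({"vault": vault_id, "debt": debt,
                         "cr_buffer": cr_buffer, "liquidated": label})
    return rows
```

In a real pipeline the per-vault feature vector would of course be much richer (historical activity, tenure, bot-management proxy, etc.), but the shock-by-shock snapshot structure is the same.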

Given the strong class imbalance, we can’t rely on accuracy as the only metric. The first metric we looked at is recall, because false negatives (failing to flag a vault that is liquidated) are more costly than false positives (flagging a vault that is not liquidated). In that regard, this use case is similar to fraud detection and cancer classification in CT scan images. We trade precision for recall by lowering the decision threshold from the default 0.5 to 0.25. The second evaluation is the ROC AUC, which shows model performance across a range of decision thresholds. The third is the Precision-Recall curve, which is better suited than the ROC curve for evaluation under class imbalance. We also show a confusion matrix on the test set to make the precision/recall trade-off simpler to convey.
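The threshold mechanics can be made concrete with a small sketch (not the pipeline code itself, which would typically use a library such as scikit-learn):

```python
def confusion_at_threshold(y_true, y_prob, threshold=0.25):
    """Confusion counts, recall and precision at a given decision boundary."""
    tp = fp = tn = fn = 0
    for truth, prob in zip(y_true, y_prob):
        pred = prob >= threshold
        if pred and truth:
            tp += 1
        elif pred and not truth:
            fp += 1
        elif not pred and truth:
            fn += 1
        else:
            tn += 1
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "recall": recall, "precision": precision}
```

Lowering the threshold from 0.5 to 0.25 turns borderline negatives into positives: recall rises because fewer liquidated vaults are missed, while precision falls because some safe vaults get flagged.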

We experimented with a range of classifier models, especially gradient boosting methods. We further improved performance through hyperparameter tuning and various class imbalance techniques. For evaluation, we used a stratified train-test split with k-fold cross-validation.
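Stratification matters here precisely because of the class imbalance: a random split could leave a fold with almost no liquidated vaults. A minimal stand-in for what a library routine like scikit-learn’s StratifiedKFold does:

```python
import random

def stratified_folds(labels, k=5, seed=7):
    """Split indices into k folds that each preserve the class ratio."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    folds = [[] for _ in range(k)]
    # deal each class round-robin so every fold gets its share of minority labels
    for indices in by_class.values():
        rng.shuffle(indices)
        for i, idx in enumerate(indices):
            folds[i % k].append(idx)
    return folds
```

With roughly 5% positives, every fold then contains liquidated vaults, so cross-validated recall estimates are stable rather than dominated by a lucky or unlucky split.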

After the evaluation step, we apply the prediction model to different portfolio states over time to extract each vault’s probability of liquidation if there were a price shock comparable to the worst seen historically.

Analysis

EDA

In the training data, we have more than 22 thousand vault instances. Around 5% of instances were liquidated, reflecting the class imbalance mentioned above. This means that if we used a trivial classifier that predicts no vault to be liquidated, we would achieve an accuracy of 95%, despite the model carrying no useful information. For this reason we need to optimize for other performance metrics. The table below shows the distribution of active vaults included in the dataset across the different stress-test days (the actual price shock happened the day after each date).
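The 95% baseline follows directly from the class ratio; a quick illustration:

```python
def majority_baseline_accuracy(labels):
    """Accuracy of a trivial classifier that always predicts the majority class."""
    n = len(labels)
    positives = sum(labels)
    return max(positives, n - positives) / n
```

Any model worth deploying has to beat this number on metrics that actually reflect the minority class, which is why we report recall, ROC AUC, and PR AUC instead.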

Training data vaults

Model performance

Our best model achieved a recall of 0.87, an improvement driven by both model tuning and the lowered decision threshold mentioned above. This means we correctly classified 87% of liquidated vaults in the test set (out of sample).

The ROC curve shows classification performance as a trade-off between the true positive rate and the false positive rate. The aim is to maximize the AUC (area under the curve), which ranges from 0 to 100 here. An AUC of 97 (out of 100) indicates excellent performance on the out-of-sample data.

ROC Curve

As ROC curves can be deceptively optimistic on imbalanced datasets, we next turn to the Precision-Recall curve, which takes the imbalance better into account. We achieve a PR AUC (area under the Precision-Recall curve) of 67, which again indicates good performance, far better than a model with no predictive power.

Precision-Recall Curve

The confusion matrix below shows actual predictions (Predicted Label) on the out-of-sample data and how they compare to ground truth values (True Label). As expected, most of the values are true negatives (top left). Because we aim to maximize recall, we want to minimize the number of instances in the lower left (false negatives), potentially at the expense of decreased precision (more instances in the upper right). In the bottom right are the true positives, the key number for us to maximize.

Confusion Matrix

Feature importance

We also explored how important (on average) each feature is for determining model output. Some of the features with the most impact were the proxy for ecosystem development, CR buffer, vault’s tenure and recent activity. We also looked at individual prediction interpretation with SHAP values to validate our assumptions of how various features either positively or negatively impact model output.

Model output

One key informative value is the total debt that is predicted to be liquidated in case of a historically comparable price shock. The current amount is $5.4 million. This is low compared to the current debt collateralized by volatile assets, which is around $1 billion: only about 0.5% of liquidatable debt is predicted to be liquidated. To explain why such a low number is predicted, we need to dive deeper into the underlying training data.

Given that a few vaults contribute the majority of debt exposure, we focus on the 10 largest vaults (almost 60% of total debt collateralized by volatile assets). The table below shows that most of them currently maintain a conservative margin of safety: a daily price drop of 50% to 70% would be required for them to be liquidated. This does not take into account other features, which most often contribute to an even lower probability of liquidation.
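The 50–70% figures follow directly from the ratio between a vault’s current collateralization and its liquidation ratio. For example, a vault at 300% collateralization with a 150% liquidation ratio can absorb a 50% price drop; at 500% it can absorb 70%. A sketch (the specific ratios are illustrative):

```python
def required_price_drop(collateralization_ratio, liquidation_ratio):
    """Instantaneous collateral price drop (as a fraction) that pushes a vault
    down to its liquidation ratio, assuming the debt stays fixed and the
    owner takes no counter-action."""
    return 1 - liquidation_ratio / collateralization_ratio
```

This is a static lower bound on safety: the ML model additionally folds in behavioral features (past deposits under stress, bot management), which for these large vaults usually push the predicted probability even lower.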

The largest 10 vaults by debt exposure

We can take the exploration further by modeling a fast initial price drop that shrinks vaults’ CR buffers, and then applying the predictive model. At a 0% initial drop, we get the same amount extracted above. As we incrementally increase the additional price drop, more and more debt is predicted to be liquidated. At a 30% additional price drop, the predicted liquidated debt is $94 million (9.4% of total). At 50%, the model predicts liquidation of more than 63% of total debt ($630 million). While an increasingly conservative scenario, this shows the extensibility of the modeling methodology.
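The scenario loop itself is simple: stress each vault’s collateralization by the extra drop, re-score with the model, and sum the debt of vaults above the decision threshold. In this sketch, `liq_probability` is a stand-in for the trained model’s probability output, and scaling CR linearly with price is a simplification:

```python
def stressed_liquidated_debt(vaults, liq_probability, extra_drop, threshold=0.25):
    """Total debt predicted liquidated after an additional initial price drop.

    vaults: list of dicts with at least 'debt' and 'cr' (collateralization)
    liq_probability: callable mapping a vault dict to P(liquidation)
    """
    total = 0.0
    for vault in vaults:
        # apply the extra price drop to the collateral before re-scoring
        stressed = dict(vault, cr=vault["cr"] * (1 - extra_drop))
        if liq_probability(stressed) >= threshold:
            total += vault["debt"]
    return total
```

Sweeping `extra_drop` from 0 upward reproduces the shape of the curve below: predicted liquidated debt grows slowly at first and then sharply once large vaults’ buffers are exhausted.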

Liquidated amount per additional CR buffer decrease

We can also choose an arbitrary point in time and use the model to predict the liquidated amount. It’s valuable to compare time periods with different portfolio risk profiles and answer “what if” questions about what could have happened had there been a market downturn at that point. To avoid leakage, however, we must not predict on days that precede the price shocks included in the training dataset.

As an example, we chose November 17th, 2021 because it satisfies the above condition and also falls into the time frame when our portfolio risk metric (Capital at Risk) was at its peak. The aim is to compare a period of low portfolio risk (currently) with one of high portfolio risk (when the ETH price and crypto sentiment were at their top).

The model returns $176 million as the predicted liquidated amount, while total debt collateralized by volatile assets at that point was $5.5 billion. That means 3.3% of liquidatable debt would likely be liquidated, 5x the ratio computed for the current portfolio state. This shows how important it is to go beyond total debt as the sole metric for estimating portfolio risk.

Predicted liquidated value: September 2022 (low risk) vs. November 2021 (high risk)

Conclusion

In this post we introduced an ML modeling approach for predicting the probability of vault liquidation in the next day. While this methodology is applied to MakerDAO’s system, it could be extended to other DeFi lending protocols such as Aave and Compound.

We showed that Maker’s current portfolio risk (quantified by predicted liquidated debt) is low but can increase substantially if there is an additional initial decrease in vaults’ CR buffers (a price shock without any vault owner counter-action). We also compared it to the historical peak of portfolio risk, which shows how important it is to dive into vaults’ historical behavior for a more holistic overview.

Furthermore, a natural extension of this methodology is determining an entity’s (vault-level, wallet-level, etc.) credit score. We could weight an entity’s debt exposure across different protocols to determine a high-level score. As risk quantification remains one of the key challenges in DeFi, we will continue developing our methodologies to bring additional robustness and resilience to the ecosystem.

Acknowledgements

Thanks to Primož Kordež, Angela Kreitenweis and Eryk Lewinson for reading earlier drafts of this article and providing valuable feedback.
