Enhancing Risk Management in Lending Protocols using AI pt.1
The world of DeFi is evolving rapidly, and its adoption is growing among users all around the world. A crucial role in this scenario is played by Lending Protocols: markets where users can lend or borrow assets. However, this process is not risk-free. In this and the following articles, we will take a data-driven approach to the problem, trying to improve current solutions and to build a new one using Machine Learning.
Context
The evolution of lending protocols can be summarised in three phases: a first phase in which the earliest protocols were created, a second phase in which their development became democratised and decentralised through governance mechanisms, and a third phase (the current one) in which efforts are being made to optimise them, maximising users’ borrowing capacity while ensuring proper risk management. Being an early-stage sector, there are still few solutions for optimising protocol parameters. One of the most popular is the one developed by Gauntlet, where multi-agent simulations calculate risk parameters that maximise users’ borrowing capacity, proposing changes to parameters such as the Collateral Factor via governance mechanisms. This solution is currently the most widely used among the most popular protocols, such as Compound and Aave. However, it has several limitations:
- Proprietary solution. The solution is developed by a company that owns the algorithms, so no one can independently verify the validity of the data and models behind the proposed parameters.
- Lack of flexibility. The parameters are updated at very long intervals, often several months apart, making it impossible to adapt to sudden changes in market conditions.
- Excessive centralisation. All algorithms used to calculate risk parameters are run on centralised servers, contrary to the principles behind DeFi.
We strongly believe that the solution to risk management should be open source, permissionless, and user-controlled: a solution that can be individually managed by each borrower, who must be aware of the risks and benefits of a loan, and one with the high reliability and performance that only AI can guarantee. In this series of articles, we would like to discuss our approach to risk management at every step of development, from data analysis to the development of the ML model. The articles will be structured as follows:
- Data Analysis (i.e. this one), in which we will introduce the problem and, through a data-driven approach, study the lending protocol market at a global level, identifying which markets present higher risks and outlining the path we will follow to approach the problem.
- Dataset creation. In this second article, we will discuss which variables will be considered for training the model, after a thorough analysis of the features.
- Model Development. In the last article, we will discuss the development of the model, from its architecture to its parameter optimization, up to a discussion of its performance.
Our aim is to propose a new solution, based on a democratic and decentralized AI system, that we will promote mainly in two ways: by developing our own DeFi protocols on top of the most innovative chains that best fit our vision of a new blockchain wave, and by offering it to already-existing protocols.
Introduction
In this first article, we will address some general questions that are necessary to correctly set up an efficient, flexible, and performant AI model. Specifically, we will address two problems:
- First, we will analyze liquidation events, identifying which assets are riskier in terms of the amount liquidated and the frequency of liquidation events.
- Second, we will perform a time-based analysis: instead of looking at the protocol’s assets globally, we will study how liquidations behave over time.
In this article we will address these problems using a data-driven approach: we will extract the data on which we perform our analysis and display the results obtained, meticulously describing each process and output.
We have created a repository where we share the source code and the outputs obtained, which can be consulted by anyone. We invite you to visit our repo and leave a star!
Liquidations Analysis
Before approaching the time analysis, we’ll introduce a quick analysis of asset liquidations: we’ll see which assets are most commonly used as collateral and what share of liquidations each one accounts for.
The first phase is data extraction. As we plan to build lending protocols based on the Compound architecture, we restrict our analysis to this platform for now. We will extract data from version V3, the most recent version of the dApp. Data is gathered through queries made via TheGraph protocol. In this first phase, we extract the data of all the markets present, obtaining for each one the liquidation count and the total amount liquidated in USD. The query used is as follows:
{
  markets(block: {number: 17493265}) {
    name
    liquidationThreshold
    maximumLTV
    liquidates {
      hash
      amountUSD
      blockNumber
      timestamp
    }
  }
}
The output data have been processed (you can check how in the function getLiquidationsRawData) and two charts have been built; a minimal sketch of this step is shown after the charts:
This one represents the distribution of the liquidation count.
While this other one shows the distribution of the total USD amount liquidated.
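For reference, below is a minimal Python sketch of how such a query could be executed and aggregated into per-market liquidation counts and USD totals. The subgraph endpoint and the function name are placeholders used for illustration; the actual getLiquidationsRawData implementation in our repository may differ.

# Minimal sketch (illustrative): fetch per-market liquidation data from a Compound V3 subgraph.
# SUBGRAPH_URL is an assumed placeholder -- replace it with the endpoint you actually query.
import requests

SUBGRAPH_URL = "https://api.thegraph.com/subgraphs/name/messari/compound-v3-ethereum"

QUERY = """
{
  markets(block: {number: 17493265}) {
    name
    liquidates {
      amountUSD
      timestamp
    }
  }
}
"""

def get_liquidations_summary():
    # Send the GraphQL query and parse the JSON response.
    response = requests.post(SUBGRAPH_URL, json={"query": QUERY}, timeout=30)
    response.raise_for_status()
    markets = response.json()["data"]["markets"]

    # Aggregate the liquidation count and total USD liquidated for each market.
    summary = {}
    for market in markets:
        liquidates = market["liquidates"]
        summary[market["name"]] = {
            "liquidation_count": len(liquidates),
            "total_usd_liquidated": sum(float(l["amountUSD"]) for l in liquidates),
        }
    return summary

if __name__ == "__main__":
    for name, stats in get_liquidations_summary().items():
        print(name, stats)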
Time Analysis
The second analysis we are going to conduct concerns the timing of liquidation events.
Also in this case, the data gathering phase will be conducted using TheGraph. This approach allows us to gather on-chain data quickly and reliably. Now that we know which lending pools to analyze, we can execute this query:
{
  markets(where: {id: "poolId"}) {
    liquidates {
      amountUSD
      blockNumber
      timestamp
    }
  }
}
We are retrieving the historical liquidation events that occurred on the given Lending Pool (in our case, USDC-WBTC and USDC-WETH). Processing this data over the days in which the subgraph has been active (from August 2022 until today), we obtain, for each pool, an array with three fields: the date, the count of liquidations on that day, and the total USD amount liquidated on that day (a minimal sketch of this aggregation is shown after the charts). The following charts display the output for both the WBTC and WETH tokens:
These two charts show the liquidation data for the WBTC token.
These other two show the same data for WETH.
As you can see, there are more liquidations on WETH, since it has a lower Market Cap and is more volatile, but when WBTC liquidations happen they are worth much more.
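As a reference, here is a minimal sketch of the daily aggregation step described above, assuming events is the liquidates list returned by the query; the function and variable names are illustrative and may differ from the code in our repository.

# Minimal sketch (illustrative): aggregate raw liquidation events into
# per-day (date, count, total USD) series.
from collections import defaultdict
from datetime import datetime, timezone

def daily_liquidation_series(events):
    # events: list of dicts with "timestamp" (unix seconds) and "amountUSD".
    daily = defaultdict(lambda: {"count": 0, "amount_usd": 0.0})
    for event in events:
        day = datetime.fromtimestamp(int(event["timestamp"]), tz=timezone.utc).date()
        daily[day]["count"] += 1
        daily[day]["amount_usd"] += float(event["amountUSD"])

    dates = sorted(daily)
    counts = [daily[d]["count"] for d in dates]
    amounts = [daily[d]["amount_usd"] for d in dates]
    return dates, counts, amounts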
Building a statistical model is helpful to better understand how liquidation events behave from a time perspective, and this is exactly what we are going to do.
In a nutshell, by means of the same query performed earlier, we iterate over all liquidation events and calculate how much time elapses between two consecutive liquidations. Each result is assigned to one of the following categories (a sketch of this categorisation logic follows the list):
- Less than one hour.
- Between one hour and one day.
- Between one day and one week.
- Between one week and two weeks.
- More than two weeks.
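Below is a minimal sketch of this categorisation logic, assuming timestamps is the list of liquidation timestamps (in seconds) returned by the query above; the names are illustrative and may differ from the code in our repository.

# Minimal sketch (illustrative): categorise the time elapsed between
# consecutive liquidation events.
HOUR, DAY, WEEK = 3600, 24 * 3600, 7 * 24 * 3600

CATEGORIES = ["<1 hour", "1 hour - 1 day", "1 day - 1 week", "1 week - 2 weeks", ">2 weeks"]

def categorise_liquidation_gaps(timestamps):
    # Count how many gaps between consecutive liquidations fall into each category.
    counts = dict.fromkeys(CATEGORIES, 0)
    ordered = sorted(int(t) for t in timestamps)
    for previous, current in zip(ordered, ordered[1:]):
        gap = current - previous
        if gap < HOUR:
            counts["<1 hour"] += 1
        elif gap < DAY:
            counts["1 hour - 1 day"] += 1
        elif gap < WEEK:
            counts["1 day - 1 week"] += 1
        elif gap < 2 * WEEK:
            counts["1 week - 2 weeks"] += 1
        else:
            counts[">2 weeks"] += 1
    return counts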
By calculating the frequency of each of these categories, we can build a histogram and associate a Gaussian distribution with it (a minimal sketch of this step is shown after the two charts below). Again, we have shared all the source code used to develop these models. The results obtained are as follows (categories are shown in the same order as the list above):
This histogram shows the distribution of WETH liquidation events.
This other chart shows the distribution of WBTC liquidation events. Its frequencies are spread more evenly across the categories, but the probability of the >14 days category is higher, because of the lower liquidation frequency described above.
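For completeness, here is a minimal sketch of how such a histogram can be built and a Gaussian fitted to the raw gaps (expressed here in hours). It is an illustrative re-creation under these assumptions, not the exact plotting code from our repository; category_counts and timestamps are assumed to come from the previous sketches.

# Minimal sketch (illustrative): plot category frequencies as a histogram and
# fit a Gaussian to the raw inter-liquidation gaps, expressed in hours.
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

def plot_gap_histogram(category_counts, timestamps, title):
    # category_counts: output of categorise_liquidation_gaps; timestamps: raw event times.
    labels = list(category_counts)
    frequencies = np.array([category_counts[l] for l in labels], dtype=float)
    frequencies /= frequencies.sum()  # normalise counts into probabilities

    # Fit a normal distribution to the raw gaps (in hours), reported for reference.
    gaps_hours = np.diff(sorted(int(t) for t in timestamps)) / 3600.0
    mu, sigma = norm.fit(gaps_hours)
    print(f"{title}: fitted Gaussian mean = {mu:.1f} h, std = {sigma:.1f} h")

    plt.bar(labels, frequencies)
    plt.ylabel("Frequency")
    plt.title(title)
    plt.xticks(rotation=30)
    plt.tight_layout()
    plt.show()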
Considering the categories mentioned, we can conclude that for WETH liquidations, with an update frequency of two weeks, we could prevent up to 43% of liquidation events. In the WBTC case, given its lower liquidation frequency, the same timeframe could prevent up to 60% of liquidation events.
Conclusions
This first analysis of liquidation events was extremely useful: we were able to determine which tokens are more sensitive to liquidation events, and how the liquidation events of these assets behave over time.
In the next article, we’ll start working on our own solution: through an accurate data gathering and data modeling phase, we will be able to build our dataset. Wait for the next article to learn more details. Hope you enjoyed this one!