On-chain lending data aggregation and analysis. Part 1. Historical health factor
This article focuses on the challenges and solutions we faced when building a robust on-chain reputation scoring mechanism within the collateralized DeFi segment. At the current stage, we are building a dataset, based on Aave that unifies health scores across lending protocols by aggregating and analyzing on-chain lenders’ data. In DeFi terminology, health factor is a real-time vision of borrowers’ solvency — a metric used by the lending protocols to establish whether a borrower has enough coverage to repay their loans.
We started with transactions on Aave, the largest lending protocols by total value locked (TVL), and proposed the following requirements:
- Unified data schema: all entries for different protocols should have identical data mapping.
- Comparable attributes: transactions should be normalized to a common base currency.
- Adequate features: we need to define and implement features sufficient to calculate unified health factors.
Based on the requirements, we decided on three components for sourcing and processing the data: The Graph, DeFi Crawler, and CCXT.
Main Source of Data
We selected The Graph, a decentralized protocol, to query user information and transaction history for the lending platforms. The protocol advocates collaboration of several roles: developer, indexer, curator, and delegator. Developers create code and script, called subgraphs, for what data they want. Then, indexers use subgraphs as a guideline while taking care of the infrastructure to store and process data off-chain and provide query service for a fee. Besides, curators discover promising subgraphs, and delegators attest to indexers’ reputation for rewards.
A subgraph is a set of instructions for indexers. Indexers “listen” to smart contracts, handle transaction events, map events to entities, and store entities. For example, the subgraph for Aave Protocol V2 requests indexers to listen to borrow events. When a borrow occurs on Aave, indexers should capture and map entities specified in the script, such as id, amount, and timestamp. Data consumers refer to subgraph schemas to compose queries.
The query to obtain the list of Aave borrows with id, amount, and timestamp is:
Subgraph schemas are not normalized, so the query for the same information among subgraphs can be different. If we want borrows on Compound with the same attributes, the request looks like this:
The inconsistency among subgraphs makes aggregating data from multiple lending platforms complex. Transaction-type-specific attributes also cause complications; borrow rate only appears in borrows, and liquidator only appears in liquidations. Thus, we resorted to the tool DeFi Crawler.
Accessing the Data
We selected a middleware, DeFi Crawler, to hide the intricacy of requesting multiple subgraph APIs with various fields. DeFi Crawler is a Python library that structures API requests and maps subgraph schemas to standard attributes. Data consumers can get a set of transactions with unified fields (time, transaction id, user, token, and amount) by specifying any lending protocol, time range, and transaction type.
The sample data frame for Aave borrows:
Even after implementing DeFi Crawler, we realized that additional customization was required due to a limitation — its schema didn’t allow us to move all the necessary fields from subgraphs to our dataset. We modified the query schema to collect attributes such as total borrow amount, the total deposited amount at the time of the transaction, whether a certain deposit was provided as collateral, and historical Loan-to-Value ratios.
The updated sample data frame for Aave borrows:
Then, we need to resolve that transactions are still with different tokens.
For health factor calculations, all amounts need to be normalized to the base currency — Ethereum. We picked CCXT, a Python library that provides market data from major cryptocurrency exchanges. We chose Binance, a trading platform with dominating volume, as the source. Since some currencies are not directly convertible to ETH on Binance, such as AKRO, we created synthetic pairs for them. For AKRO to ETH, we used USDT as the intermediary currency and calculated the rate as if exchange AKRO to USDT and USDT to ETH. We prepared the exchange rate for all direct and indirect pairs in one-minute granularity and converted amounts with the nearest-minute rate.
By putting all the above together, we created the following features: total collateralized balance, total borrowed balance, and health factor at the time of transaction.
For each transaction, the total collateralized balance is calculated by summing the amounts of individual tokens specified as collateral and converted to the base currency. It also accounts for balance changes — when a deposit or withdrawal happens, the tokens’ collateral status updates.
Similar to the calculation of collateralized balance, the total borrowed balance is the sum of debt tokens converted to the base currency. The calculation is simpler since it only pertains to the sum of a given wallet’s stable and variable interest debt tokens.
After we have the above two features in the same base currency, the health factor is the total collateralized balance divided by the total borrowed balance multiplied by the liquidation threshold. The health factor is updated when either balance changes.
The created features are available on our project website, incyd.io.
We will continue adding more networks, e.g., Compound and Cream, to the dataset and introducing holdings under combined health factors.