A Multifarm Research Paper: Bribing Optimization

Multifarm.fi
12 min read · Aug 19, 2022


Bribes and bribing strategies in Curve and Convex protocols.

Decision tree classifier with a binary target variable

TLDR:

For the past few months, Multifarm has been running a research phase covering the bribing situation in the Curve and Convex protocols, formalizing a data-centric approach to figure out a sustainable setup for voting incentives. What did we find? Is there anything to learn about efficient ways to incentivize users to vote?

The idea was to use TVL as the target variable and study its behavior over the bribing rounds. You might wonder why the focus was on TVL. The reason is that TVL is a leading metric, commonly used as a signal of changes in the attractiveness and influence of pools as well as DAOs.

Of all the models tested, the Decision Tree Classifier gave the best performance in predicting the binary outcome of TVL: whether TVL would increase or decrease from one round to another. It reached 79% accuracy on the training set and nearly 75% on the testing set. The model identified four feature variables playing an important role: CRV price, Gauge Weight, Total Voting Power, and the CRV rewards (token emissions)/bribes ratio.

A Decision Tree Classifier with a binary target variable turned out to be the best option for predicting TVL changes. According to the final model, bribers could have saved on average 29%, or $21 million, of the total amount they bribed.

Contents:

  1. Introduction
  2. First Data Filters
  3. Our Target Variable
  4. Data Needs to be Smoothed
  5. Which Models Have the Best Fit?
  6. Our Model
  7. Conclusions
  8. Next Steps

1. Introduction

For the past few months, we put our focus on "bribes" and "bribing strategies" in the Curve and Convex protocols. What did we learn from bribes? Are there strategies that could be applied to them? What were we able to predict from them? We will take you through our research and our findings.

We already had thoughts and ideas about the bribing situation around Curve and Convex, and we had reason to believe there was a lot of room for improvement. So we needed to investigate. Our initial plan was to build a prediction model, giving us indications of outcomes from one round to another.

We started our bribe-prediction journey by gathering as much data as we could from Votium bribes on Convex and from Curve bribes on bribe.crv. Our goal was to collect data on specific metrics such as CRV price, Gauge voting weight, Bribe amount, Number of votes, Unclaimed voting power, Total voting power, Total CRV rewards, CRV rewards/Bribes ratio, and Bribe USD per vote, all of which could play a role in these kinds of incentives.

Our roadmap was quite simple. First, we needed to fetch the data: as much data as we could process, coming from all the different bribers participating in voting rounds. The data was fetched through Dune Analytics queries (SQL queries). Then we could move on to the modeling part: build and implement our models with the data.
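To make the shape of the inputs concrete, here is a minimal pandas sketch of the kind of flat, per-briber, per-round table we are describing. The column names are our own illustration, not the exact schema of the Dune queries:

```python
import pandas as pd

# One row per (briber, round); column names are illustrative.
features = [
    "crv_price", "gauge_weight", "bribe_amount_usd", "num_votes",
    "unclaimed_voting_power", "total_voting_power", "total_crv_rewards",
    "crv_rewards_to_bribes", "bribe_usd_per_vote",
]

# In practice, each Dune query result would be exported (e.g., as CSV)
# and merged into this single table keyed on briber and round.
df = pd.DataFrame(columns=["briber", "round", "pool_tvl", *features])
```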

2. First Data Filters

To feed the data into the models, it had to be standardized one way or another. Basically, we couldn't use data with missing variables: at this level that was simply not acceptable, and the outcome would not have been accurate or representative.

Therefore, the data we fetched had to follow a set of rules, and only data satisfying those rules could be included in the model calibration.

The rules were set as the following:

  • If we didn’t have the TVL (total value locked) data, the specific briber data couldn’t be used.
  • If we had the TVL data, but the data history was considered too short (less than 5 bribing rounds), then the briber couldn’t be used.
  • In the last case, if we had the TVL and the data history was long enough, we could then use the data from the briber.

To use data coming from a briber, we needed at least 5 rounds of TVL. This also means we could only select bribers that had participated in at least 5 back-to-back bribing rounds.
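As a rough sketch, these filter rules could be implemented like this, reusing the illustrative table from above (the consecutive-round check is our reading of "back-to-back"):

```python
def filter_bribers(df: pd.DataFrame, min_rounds: int = 5) -> pd.DataFrame:
    """Keep only bribers with TVL data and at least min_rounds
    back-to-back bribing rounds of history."""
    # Rule 1: rows without TVL data cannot be used.
    df = df.dropna(subset=["pool_tvl"])

    def long_enough(group: pd.DataFrame) -> bool:
        # Longest run of consecutive round numbers for this briber.
        rounds = sorted(group["round"].unique())
        longest = run = 1
        for prev, cur in zip(rounds, rounds[1:]):
            run = run + 1 if cur == prev + 1 else 1
            longest = max(longest, run)
        return longest >= min_rounds

    # Rules 2 and 3: keep bribers whose TVL history is long enough.
    return df.groupby("briber").filter(long_enough)
```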

With our rules set up, we could already filter some of the data. It was time to focus on the target variable and the models we were going to use.

3. Our Target Variable

The target variable is the variable we want to model. We are looking for some kind of correlation between the data we fetched over bribing rounds and the variable we want to study. In our case, that variable is TVL. In essence, we are using TVL as a driving variable for bribing activities: figuring out whether TVL will increase or decrease from one bribing round to another. In addition, we are identifying the parameters that are most important and most correlated when predicting such an outcome.

A great question to ask would be: "So why choose TVL?"

To find the answer, we need to take a step back and look at the bribing mechanics. Bribing is essentially a way of incentivizing users to "use their voting power" to direct underlying token emissions. In other words, a briber pays you to vote for them, so that they receive more token emissions and extract more value from the underlying protocol.

At the same time, it's the type of initiative that leads to a snowball effect. The more votes a pool gets, the more it attracts liquidity and volume. The more volume and liquidity, the more fees it generates and the more votes it attracts. Typically, increasing TVL in a pool is a great indicator of a gain in attractiveness and influence for a DAO.

We intended to use TVL as our target variable. To do so, we needed to normalize it as a percentage change per round and per briber.
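Concretely, this normalization is a round-over-round percentage change computed separately for each briber, something like:

```python
# Round-over-round TVL % change, per briber.
df = df.sort_values(["briber", "round"])
df["tvl_pct_change"] = df.groupby("briber")["pool_tvl"].pct_change() * 100
```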

Once we had the TVL changes in %, we could explore the following three model options: Binary Target Variable, Multi-class Target Variable, and Continuous Target Variable.

Binary Target Variable is a classification model where the target variable can only take two values, 0 and 1. In other words, we are looking for an increase or a decrease in TVL within the next bribing round. The model tries to predict: if TVL increased from the previous round (TVL ratio > 1), event 1; otherwise, event 0.

Multi-class Target Variable is a model that categorizes data into groups based on similarities: the TVL % change is bucketed into multiple classes, and the model predicts those buckets.

Continuous Target Variable is a model where TVL % change is modeled directly as a continuous variable.
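As a sketch, the three target encodings could be built as follows; the bucket edges in the multi-class case are illustrative, not the ones used in the study:

```python
import numpy as np

# Binary target: 1 if TVL increased versus the previous round, else 0.
df["target_binary"] = (df["tvl_pct_change"] > 0).astype(int)

# Multi-class target: bucket the % change (bin edges are illustrative).
bins = [-np.inf, -10.0, 0.0, 10.0, np.inf]
labels = ["big_drop", "drop", "gain", "big_gain"]
df["target_multiclass"] = pd.cut(df["tvl_pct_change"], bins=bins, labels=labels)

# Continuous target: model the % change directly.
df["target_continuous"] = df["tvl_pct_change"]
```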

For instance, below is a Binary Target Variable representation, where the TVL % change is the target variable across different bribers in Votium voting rounds. Interestingly, a common global trend is easy to read from the pie charts: decreasing TVL (blue) dominates increasing TVL (orange) across the bribers.

Shifting market trends, such as entering a bear market, are one possible reading of this pattern.

4. Data Needs to be Smoothed

We had our first rules for filtering the data. Our target variable was defined, and we had an idea of the model options we were going to implement. Now, we had to focus on the data that we were going to feed to the models.

To increase our chances of getting significant results, we needed to determine whether our original variables had extreme values in them.

Here is a short list of the variables used for modeling:

  • CRV price: Curve token price
  • Gauge weight: pool voting share in %
  • Bribe amount: the amount a briber has spent on a round in $USD
  • Number of votes: the amount of votes directed toward a pool
  • Unclaimed voting power: voting power unused
  • Total voting power: total amount of voting power (on Curve = total amount of veCRV, on Convex = total amount of vlCVX)
  • Total CRV rewards: Curve token emissions
  • CRV rewards/bribes ratio: Curve token emissions / the amount a briber has spent on a round in $USD
  • Bribes per vote: Bribe amount/number of votes

To get a better idea, we used histograms (top) and box plots (bottom). As shown below, many of those variables are clearly skewed in both the histograms and the box plots (the ones with long tails to the right or left).

This means we have extreme values in our variables, and they are heavily skewed by them.

Another great illustration of why using the original variables with outliers is not a good idea can be seen in the figures below. They show a simple linear regression with each of the nine variables as the independent variable and pool_tvl as the dependent variable.

As shown, the fit is strikingly poor for almost all of them. Therefore, it is vitally important to transform the independent variables so that we can achieve better model performance.

Univariate linear regression for original variables
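These univariate fits are straightforward to reproduce with scikit-learn; a sketch, assuming the table built earlier:

```python
from sklearn.linear_model import LinearRegression

# One univariate regression per feature; R^2 summarizes the quality of fit.
for col in features:
    xy = df[[col, "pool_tvl"]].dropna()
    reg = LinearRegression().fit(xy[[col]], xy["pool_tvl"])
    print(col, round(reg.score(xy[[col]], xy["pool_tvl"]), 3))
```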

What transformation can we apply? Log transformation is a very common approach as it makes heavily skewed data more linear.
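In code, this is a one-liner per column. We use log1p (log(1 + x)) so zero values are handled; the article does not state the exact variant used, so treat this as an assumption:

```python
import numpy as np

# log1p compresses long right tails while remaining defined at zero.
for col in features:
    df[f"log_{col}"] = np.log1p(df[col])
```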

The effect on the data is clear: the distribution of the variables becomes more even after the log transformation. The histograms are more even, and the box plots show shorter left and right tails than with the original data.

Linear regression tests between TVL changes and each of the variables show a lot of improvement. Thus, the transformed data fits better than the original.

Univariate linear regression for log-transformed variables

We can also compare the correlation plot matrix:

A: Correlation plot matrix of original variables

Even though the original variables (A) had higher correlations, the log transformation (B) shows much more reliable numbers.

B: Correlation plot matrix of log-transformed variables

The data is smooth, so we can now move on to choosing our final model.

5. Which Models Have the Best Fit?

There are multiple ways to predict the % change of TVL: as we saw previously, we could use a Binary Target, a Multi-class Target, or a Continuous Target model.

In the case of the Binary and Multi-class target models, we are looking at a binary event on one side and a multi-class categorical event on the other. That means we can apply the following algorithms to try to predict the outcome from one round to another.

Logistic Regression, which is a supervised algorithm for linear classification. It estimates the probability of an event occurring based on a given dataset of independent variables.

Decision Tree Classifier, which is a non-parametric supervised learning algorithm that can be used for both classification and regression tasks. It has a hierarchical tree structure consisting of a root node, branches, internal nodes, and leaf nodes.

Random Forest Classifier, which is a commonly used machine learning algorithm that combines the output of multiple decision trees to reach a single result.

In the case of a Continuous Target Variable model, we can apply the following two algorithms and attempt to model the continuous TVL % change.

Ordinary Least Squares, which is a method for estimating the unknown parameters in a linear regression model.

Generalized Linear Model, which is a flexible generalization of ordinary linear regression.
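For the classification candidates, a simple way to compare them is cross-validated balanced accuracy; the study's exact evaluation setup is not published, so this sketch is illustrative (a similar loop with regression metrics would apply to the continuous candidates):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

log_features = [f"log_{c}" for c in features]
data = df.dropna(subset=log_features + ["target_binary"])
X, y = data[log_features], data["target_binary"]

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(random_state=0),
}

for name, model in candidates.items():
    scores = cross_val_score(model, X, y, scoring="balanced_accuracy", cv=5)
    print(f"{name}: {scores.mean():.3f}")
```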

After comparing these options, the Decision Tree Classifier was determined to be our final choice.

6. Our Model

We split the data into a training and a testing set for the calibration. In the initial model, a cross-validated grid search is used to find the initial parameters of the decision tree. That means we are trying to determine the optimal depth for a given model, which has a large impact on its performance.

The Grid Search CV method shows that the optimal maximum tree depth is 2 and the minimum number of samples per leaf is 5. Using these parameters, the initial decision tree model is run, and the balanced accuracies on the training and testing sets are very similar: 71% and 69%. In this case, the model is not overfitting.
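A minimal scikit-learn sketch of that search (the train/test split ratio and the parameter grid are our own assumptions):

```python
from sklearn.model_selection import GridSearchCV, train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": range(1, 11),
                "min_samples_leaf": range(1, 21)},
    scoring="balanced_accuracy",
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_)  # e.g. {'max_depth': 2, 'min_samples_leaf': 5}
```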

We then tested the maximum tree depth over the training and testing sets. As shown below, the two follow a similar trend until they split at max tree depth = 3. In other words, above a depth of 3 the model loses accuracy on the testing set, so 3 should be the maximum tree depth.
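The sweep itself can be reproduced roughly like this:

```python
from sklearn.metrics import balanced_accuracy_score

# Balanced accuracy on both sets for each candidate depth.
for depth in range(1, 9):
    tree = DecisionTreeClassifier(
        max_depth=depth, min_samples_leaf=5, random_state=0
    ).fit(X_train, y_train)
    train_acc = balanced_accuracy_score(y_train, tree.predict(X_train))
    test_acc = balanced_accuracy_score(y_test, tree.predict(X_test))
    print(f"depth={depth} train={train_acc:.2f} test={test_acc:.2f}")
```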

At max tree depth = 3, the balanced accuracy of the model is relatively high: 79% on the training set and 75% on the testing set. The other evaluation metrics also show reasonably good performance.

The confusion matrix shows the percentage of predictions the model got right: it predicted 77% of TVL-decreasing rounds and 81% of TVL-increasing rounds correctly.

From training to testing, the results are very similar, which means there are no signs of overfitting on the training set.
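In scikit-learn terms, the final fit and a row-normalized confusion matrix (the share of each true class predicted correctly, matching the 77%/81% figures above) would look like:

```python
from sklearn.metrics import confusion_matrix

final_tree = DecisionTreeClassifier(
    max_depth=3, min_samples_leaf=5, random_state=0
).fit(X_train, y_train)

# normalize="true" divides each row by its true-class count.
cm = confusion_matrix(y_test, final_tree.predict(X_test), normalize="true")
print(cm)
```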

The feature importance shows which variables have an impact on whether TVL will increase or decrease over a round: CRV price, Gauge weight, Total voting power, and the CRV rewards/bribes ratio stand out as the important features.
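The importances can be read straight off the fitted tree:

```python
importances = pd.Series(
    final_tree.feature_importances_, index=log_features
).sort_values(ascending=False)
print(importances)
```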

7. Conclusions

What can we learn from the model?

Compared with the other models, the Decision Tree Classifier gave us the best performance in predicting the binary outcome of TVL. With an accuracy of nearly 79% on the training set and 75% on the testing set, we were able to identify the important feature variables that allow us to predict whether TVL will increase or decrease in a given round.

A Decision Tree Classifier with a binary target variable is the best option for predicting TVL. The other models either do not find enough significant variables and predict with poor accuracy, or they overfit the training set and predict the testing set poorly.

What the final model tells us is that if all bribers had followed the logic of the tree from the final model, shown below, they could have saved on average 29% of the total amount they bribed (USD 21 million in total).

Decision tree classifier

8. Next Steps

Where do we go from here?

We will continue to feed the model with as much data as we can gather. This might change the thresholds and the tree structure. With more data, we will also be able to improve accuracy and increase the sample size in some of the end tree nodes.

Shifting into a bull market could also give an interesting outcome, where we would see TVL increase overall.

Meanwhile, incentivizing through "bribes" is increasingly becoming the norm in protocols where token emissions are directed through a voting process. As the screenshot shows, millions of $USD are spent on bribes in a single round, which covers a mere two-week period (and only on Votium!). There's a lot of voting power to be found.

If your protocol is interested in optimizing its bribes, please feel free to reach out to us at info@multifarm.fi or through any of our socials.

Discord: discord.gg/aE3DTfXnGC

Twitter: https://twitter.com/MultifarmFi

Website: https://www.multifarm.fi/
