Finding optimal weights for a weighted-average model ensemble using Optuna

Khawaja Abaid
5 min read · Jan 19, 2023
Image by rawpixel.com on Freepik

Ensembling by taking a weighted average of different machine learning models’ predictions can yield a better score than taking a simple average. But a problem arises when deciding which weight to assign to which model. Even if we somehow come up with weights that improve our score over the simple average, how do we know they are the optimal weights? This is where this article comes in: we’ll use Optuna, an intuitive, effective, and fast hyperparameter optimization framework, and leverage its high modularity to find the weight values that yield the best score possible.

The dataset that we’ll use for our demonstration is perhaps the most popular in intro-level machine learning courses and literature: you guessed it, the California Housing dataset. We’ll predict the median house values using this dataset, and score and compare the quality of our ensembles using the mean squared error (MSE) metric. We’ll use models like Ridge (linear regression with L2 regularization), XGBoost (the most popular gradient boosting framework for tabular data), and LightGBM.

Let’s start by importing the required libraries.
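
Here is a minimal sketch of the imports this walkthrough relies on; the exact list in the original notebook may differ slightly:

```python
import numpy as np
import optuna

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

from lightgbm import LGBMRegressor
from xgboost import XGBRegressor
```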

Now let’s fetch our dataset.
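
Something along these lines, assuming scikit-learn’s built-in loader:

```python
# Load the California Housing dataset as a pandas DataFrame;
# the target (median house value) is included as the "MedHouseVal" column.
housing = fetch_california_housing(as_frame=True)
df = housing.frame
df.head()  # inspect the first few rows (notebook-style)
```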

The dataset doesn’t contain any missing values (code available in the attached notebook). So, let’s store target values in a separate variable and then drop the target column from our dataset. Also, let’s split the dataset into a training and a validation set.
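
A sketch of that step; the 80/20 split and the random seed are my assumptions, not necessarily what the notebook uses:

```python
# Store the target in a separate variable, then drop it from the features.
y = df["MedHouseVal"]
X = df.drop(columns=["MedHouseVal"])

# Hold out a validation set for scoring the ensembles.
X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```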

Before we optimize the weights for the weighted average, let’s first set a baseline by taking a simple average over all models’ predictions.

We’ll use the default parameter values for all our models.
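
Roughly, the baseline could look like this; the model objects and the dictionary layout are illustrative:

```python
# Train all three models with their default hyperparameters.
models = {
    "ridge": Ridge(),
    "xgb": XGBRegressor(),
    "lgbm": LGBMRegressor(),
}

preds = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    preds[name] = model.predict(X_valid)

# Baseline: give every model's predictions equal weight.
simple_avg = np.mean(list(preds.values()), axis=0)
print(mean_squared_error(y_valid, simple_avg))
```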

We get an MSE score of 0.26. Let’s see if we can beat it with an optimized weighted average.

But, before we go ahead and implement weights optimization using Optuna, it’d be helpful to give a gentle introduction to Optuna and how it works, especially for those not familiar with this framework.

A Gentle Introduction to Optuna

Optuna is an automatic hyperparameter optimization framework, designed particularly for machine learning. By default, it uses a Bayesian optimization algorithm (the Tree-structured Parzen Estimator, or TPE), which is much faster than something like GridSearchCV, which evaluates every set of hyperparameter values in the search space. Bayesian optimization is faster than grid search because it uses information from previous iterations of the search to choose the next hyperparameter values, instead of treating each set of values independently.

Due to Optuna’s high modularity, we can use it for our task of finding optimal weights for taking a weighted average, and much more.

How does Optuna work?

Optuna requires us to define an objective function, in which we specify the value range for each hyperparameter that we wish to tune, a.k.a. optimize. In this function, we also write the code for training the model (using the hyperparameter values that Optuna selects algorithmically from our specified ranges), for predicting targets on the validation set, and for computing a metric on those predictions. Finally, we return the metric value that we computed.

Optuna then requires us to create a new study, pass it a direction parameter, and call its optimize function, passing it the objective function we created earlier and the number of trials. The number of trials determines how many times the study object will run the objective function. Based on each returned score and the direction we specified (maximize or minimize), it chooses the next set of hyperparameter values that it infers will achieve a better score.

Okay, enough with the theory. Let’s get our hands dirty with the actual implementation!
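
Here is a sketch of the objective function consistent with the description that follows: the first two weights are suggested as integers, and the third is derived so that all three always sum to 100. The parameter names (w_ridge, w_xgb) and the preds dictionary follow the baseline sketch above, not necessarily the original notebook:

```python
def objective(trial):
    # Optuna suggests the first two weights; the string passed as the
    # first argument is the key under which the value is stored.
    w_ridge = trial.suggest_int("w_ridge", 0, 100)
    w_xgb = trial.suggest_int("w_xgb", 0, 100 - w_ridge)
    # The third weight is fully determined by the first two,
    # so the three weights always sum to 100.
    w_lgbm = 100 - w_ridge - w_xgb

    weighted = (
        w_ridge * preds["ridge"]
        + w_xgb * preds["xgb"]
        + w_lgbm * preds["lgbm"]
    ) / 100

    # Return the score that Optuna should minimize.
    return mean_squared_error(y_valid, weighted)
```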

In case you’re wondering why we pass the same variable name as an argument to the suggest_int function: it’s a way of telling Optuna to store the parameter value against that name, basically using it as a key and storing that key-value pair in its memory. For instance, when you access your optimized values after the study has completed, Optuna will return a dictionary of key-value pairs, with the keys being the names you specified in the suggest function calls and the values being the optimized hyperparameter values.

It is important to remember that all the names you pass to the suggest functions must be unique!

Next up, we create a new study and call its optimize method. We pass minimize to the direction parameter, because the lower the mean squared error (MSE), the better our model or ensemble is.
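
A sketch of that step; the trial count is my assumption:

```python
# Lower MSE is better, so we minimize.
study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=100)
```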

After the study completes, we can access the best score achieved using study.best_value, which in this case returns 0.21, and the parameters that achieved it using study.best_params. Please note that best_params will only contain the two weights for which Optuna was suggesting values; the formula for the third one was defined by us. But it’s straightforward to get its best value: just subtract the other two weights’ best values from 100, exactly as in the objective function.
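
Recovering the results might look like this (the weight names follow the sketch above):

```python
print(study.best_value)   # best MSE found by the study
print(study.best_params)  # only the two suggested weights appear here

# Recover the third weight the same way the objective defined it.
best = study.best_params
w_lgbm = 100 - best["w_ridge"] - best["w_xgb"]
```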

Anyway, our ensembling with an optimized weighted average has brought the mean squared error (MSE) down to 0.21, compared to the 0.26 we got from taking a simple average of the same models’ predictions. Hence, the advantage of optimizing the weights for a weighted average looks pretty clear.

Takeaways:

1. Taking a weighted average can significantly improve our score compared to the simple average, as it did in our case. But we need a way to find optimized weights for taking that average.

2. Thanks to the high modularity of Optuna, we can find optimal weights for our ensemble very quickly and effectively.

I hope you found this article useful! You can find the notebook with all the code and comments on GitHub. Since this is my first article, I’d welcome any feedback; it would help me improve and produce better quality content. You can reach me on Twitter or in the comments.
