Visualizing Hyperparameter Optimization with Hyperopt and Plotly

Daniel Sammons
Doma
Nov 4, 2020

A machine learning (ML) model is rarely ready to be launched into production without tuning. As with bindings on a ski or the knobs and levers in an aircraft cockpit, catastrophe can ensue for those who venture out into the open expanses of AI without all the proper settings baked in prior to launch. That’s why hyperparameter tuning, the science of choosing all the right settings for ML, is a core competency of the data science team at States Title.

But picking the right hyperparameter settings for a particular ML problem is tricky. Unlike the device settings we’re used to in everyday life, such as turning the volume up one notch, even small changes in hyperparameters can produce large and initially counterintuitive changes in model performance.

Fortunately, there’s been a lot of research into “hyperparameter optimization” algorithms that automatically search for the best set of hyperparameters after a little configuration. These algorithms are widely accessible through robust implementations in various Python packages. For example, hyperopt is a widely used package that allows data scientists to utilize several powerful algorithms for hyperparameter optimization simply by defining an objective function and declaring a search space.

Relying on these tools, however, can turn hyperparameter selection into a “black box” that makes it hard to explain why a particular set of hyperparameters works best for a problem. One way to overcome the “black box” nature of these algorithms is to visualize the history of hyperparameter settings that were tried during hyperparameter optimization to help identify underlying trends in hyperparameter settings that work well.

In this post, we’ll demonstrate how to create effective interactive visualizations of hyperparameter settings that allow us to understand the relationship between hyperparameter settings that were tried during hyperparameter optimization. Part 1 of this post will set up a simple hyperparameter optimization example with hyperopt. In part 2, we’ll show how to use Plotly to create an interactive visualization of the data generated by the hyperparameter optimization in part 1.

Prerequisite

This post assumes that the reader is familiar with the concept of hyperparameter optimization. Additionally, although we will create an example hyperparameter optimization to generate the data that we will visualize, we will not go into great detail about this optimization since the intent of this post is not to be a tutorial on hyperopt; check out the hyperopt documentation for a complete tutorial.

For conciseness, code examples will be presented assuming that all necessary imports have already been run. For reference, here is the full set of imports that are needed for the code examples:
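The set below is a close approximation that covers the code sketches in this post (load_boston was still available in scikit-learn at the time of writing; it has since been removed):

```python
import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go

from hyperopt import STATUS_OK, Trials, fmin, hp, tpe
from sklearn.datasets import load_boston
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score
```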

Part 1: Setting up an example hyperparameter optimization with hyperopt

Before we can dive into visualization with Plotly, we need to generate some hyperparameter optimization data from hyperopt for us to visualize. There are four key steps that we need to follow to set up a hyperparameter optimization with hyperopt:

  1. Choosing and loading a dataset
  2. Declaring the hyperparameter search space
  3. Defining the objective function
  4. Running the hyperparameter optimization

We’ll provide a brief description along with example code for each of these steps but we won’t go into great detail to justify specific choices since the goal of this hyperparameter optimization is just to generate data for us to visualize. To run the example notebook, please follow the instructions in the file “README.md” in this directory. Additionally, a static (HTML) version of the notebook with output from running the notebook is available here.

Choosing and loading a dataset

We’ll use the UCI Boston dataset as our example dataset for our hyperparameter optimization.

The features for the UCI Boston dataset are various neighborhood characteristics and the target is the median value of homes in that neighborhood. Scikit-learn provides a convenient wrapper function named load_boston. We’ll use this function to load the dataset into a Pandas dataframe as follows:
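A sketch of the loading step, assuming we keep the features and the target (the median home value, MEDV) together in a single dataframe; the variable names are illustrative:

```python
boston = load_boston()

# Features are neighborhood characteristics; the target is the median home value.
boston_df = pd.DataFrame(boston["data"], columns=boston["feature_names"])
boston_df["MEDV"] = boston["target"]

# Split out the feature matrix and target for use in the objective function later.
X = boston_df.drop(columns=["MEDV"])
y = boston_df["MEDV"]
```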

Here’s what the first five rows of the dataset look like:
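In a notebook, something like the following will display them, assuming the dataframe name above:

```python
boston_df.head()
```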

Declaring the hyperparameter search space

Since the target is continuous for this dataset, we will want to compare a couple of different regression models. We’ll set up the hyperparameter optimization to compare two types of models: random forest regressor and gradient boosting regressor. The random forest regressor will allow hyperopt to tune the number of trees and the max depth of each tree. The gradient boosting regressor will allow hyperopt to tune the learning rate, in addition to the number of trees and max depth of each tree. The following dictionary declares this hyperparameter search space in the format that’s expected by hyperopt:
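A sketch of such a search space; the labels, ranges, and distributions below are illustrative rather than the exact values behind the plots later in the post:

```python
search_space = hp.choice(
    "model_type",
    [
        {
            "model": "RandomForestRegressor",
            # hp.quniform samples floats on a grid; we cast to int in the objective.
            "n_estimators": hp.quniform("rf_n_estimators", 10, 500, 10),
            "max_depth": hp.quniform("rf_max_depth", 2, 10, 1),
        },
        {
            "model": "GradientBoostingRegressor",
            "n_estimators": hp.quniform("gb_n_estimators", 10, 500, 10),
            "max_depth": hp.quniform("gb_max_depth", 2, 10, 1),
            "learning_rate": hp.loguniform(
                "gb_learning_rate", np.log(0.01), np.log(0.5)
            ),
        },
    ],
)
```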

Defining the objective function

For the objective function, we’ll compute the mean squared error for each instance of the dataset via a ten-fold cross validation. We’ll report the average mean squared error across folds as the loss. The following code defines this objective:
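One way to write that objective, assuming the X, y, and search space sketched above:

```python
def objective(params):
    params = dict(params)  # copy so we don't mutate the dict hyperopt passes in
    model_name = params.pop("model")

    # hp.quniform samples floats, so cast the integer-valued hyperparameters.
    params["n_estimators"] = int(params["n_estimators"])
    params["max_depth"] = int(params["max_depth"])

    model_cls = {
        "RandomForestRegressor": RandomForestRegressor,
        "GradientBoostingRegressor": GradientBoostingRegressor,
    }[model_name]
    model = model_cls(**params)

    # Ten-fold cross-validation; scikit-learn reports negated MSE, so flip the sign.
    scores = cross_val_score(model, X, y, cv=10, scoring="neg_mean_squared_error")
    return {"loss": -scores.mean(), "status": STATUS_OK}
```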

Running the hyperparameter optimization

We’ll run the hyperparameter optimization for one thousand trials by calling the fmin function. Importantly, we’ll provide an instance of a Trials object in which hyperopt will record the hyperparameter settings of each iteration of the hyperparameter optimization. We will pull the data for the visualization from this Trials instance. Running the following code executes the hyperparameter optimization:
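A sketch of that call, using the objective and search space from above:

```python
trials = Trials()
best = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=1000,
    trials=trials,
)
```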

Part 2: hyperparameter optimization visualization

Hyperopt records the history of hyperparameter settings that are tried during hyperparameter optimization in the instance of the Trials object that we provided as an argument to our call to fmin. After the optimization completes, we can inspect the trials variable to see what settings hyperopt selected for the first five trials as follows:
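For example:

```python
# Each entry of trials.trials describes one iteration: the loss is stored under
# "result" and the sampled hyperparameter values under "misc" -> "vals".
trials.trials[:5]
```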

As you can see, the trials instance is essentially a list of dictionaries where each dictionary contains detailed data about one iteration of the hyperparameter optimization. This is not a particularly easy format to manipulate, so we’ll convert the relevant bits of data into a Pandas dataframe where each row of the dataframe contains information for one trial:
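One way to do that conversion, assuming the search-space labels used earlier; the "rf_"/"gb_" prefixes are stripped so that both model types share columns such as n_estimators and max_depth:

```python
MODEL_NAMES = ["RandomForestRegressor", "GradientBoostingRegressor"]  # hp.choice order

def trials_to_df(trials):
    rows = []
    for trial in trials.trials:
        vals = trial["misc"]["vals"]
        row = {
            "trial_number": trial["tid"],
            "loss": trial["result"]["loss"],
            # hp.choice records the index of the branch it picked.
            "model_type": MODEL_NAMES[vals["model_type"][0]],
        }
        for name, values in vals.items():
            # Hyperparameters from the branch that wasn't picked appear as empty lists.
            if name == "model_type" or not values:
                continue
            # Strip the "rf_"/"gb_" prefix so both model types share column names.
            row[name.split("_", 1)[1]] = values[0]
        rows.append(row)
    return pd.DataFrame(rows)

trials_df = trials_to_df(trials)
```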

Let’s take another look at the first five trials in this new format:
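For example:

```python
trials_df.head()
```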

This is much more manageable than the list of dictionaries we had before.

Plotting trial number vs. loss with Plotly Express

One useful way to visualize the trial iterations is to plot the trial number vs. the loss to see whether hyperparameter optimization converged over time as we expect. Using Plotly’s high-level Express interface makes this easy; we simply call the scatter method on our dataframe and indicate which columns we want to use as x and y values:
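A sketch using plotly.express directly (equivalent to calling the dataframe’s scatter method with the Plotly plotting backend enabled), assuming the trials_df columns defined above:

```python
fig = px.scatter(trials_df, x="trial_number", y="loss")
fig.show()
```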

This generates the following plot:

One interesting feature of this plot is that there is a clear separation between the bottom row of dots with “loss” values in the range ten-to-twelve and the rest of the dots. We need more information to understand what causes this separation. One hypothesis is that the separation is caused by different model types; for example, the bottom row of dots might all be gradient boosting regressor models and the rest of the dots might all be random forest regressor models.

Let’s color the dots for each model type differently to see if this is the case by adding a color argument to the call of the scatter method as follows:
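For example:

```python
fig = px.scatter(trials_df, x="trial_number", y="loss", color="model_type")
fig.show()
```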

With this, we get the following plot:

Interestingly, we see that model type does not completely explain the gap between the bottom row of dots and the rest of the dots since gradient boosting regressor models show up in the rest of the dots as well.

We can add more depth to the visualization by making it interactive, so that hovering over a point shows the set of hyperparameters that produced the loss for that point. At first, it looks like we should be able to achieve this simply by passing a value for the hover_data argument of the scatter method. But since we only want to show the hyperparameters that are relevant to each point’s model type, we also need to call the update_traces method after our call to scatter; this lets us filter which hyperparameters appear in the hover box for each trace. Here is what that looks like:
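A sketch of one way to do this: pass hover_data so the hyperparameter columns land in each trace’s customdata, then rewrite each trace’s hover template so only the hyperparameters relevant to that model type are shown (this assumes customdata follows the hover_data order):

```python
fig = px.scatter(
    trials_df,
    x="trial_number",
    y="loss",
    color="model_type",
    hover_data=["n_estimators", "max_depth", "learning_rate"],
)

# The random forest trace should not display learning_rate, so give each trace
# its own hover template listing only the relevant hyperparameters.
fig.update_traces(
    selector={"name": "RandomForestRegressor"},
    hovertemplate=(
        "trial_number=%{x}<br>loss=%{y}"
        "<br>n_estimators=%{customdata[0]}<br>max_depth=%{customdata[1]}"
    ),
)
fig.update_traces(
    selector={"name": "GradientBoostingRegressor"},
    hovertemplate=(
        "trial_number=%{x}<br>loss=%{y}"
        "<br>n_estimators=%{customdata[0]}<br>max_depth=%{customdata[1]}"
        "<br>learning_rate=%{customdata[2]}"
    ),
)
fig.show()
```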

Finally, we get the following plot (recommended: view interactive version of this plot here):

What can we tell with this additional detail? Well, hovering over the points corresponding to the best models on the bottom row reveals that the max_depth parameter is set to three for each of those points. Additionally, hovering over points outside of that row reveals that the max_depth parameter is set to values other than three, such as two, four, and five. This indicates that there may be something special about a max_depth of three for our dataset. For example, it may indicate that model performance is primarily driven by three of the features. We will want to investigate further why a max_depth of three works so well for our dataset, and we will likely want to set max_depth to three for the final model that we build and deploy.

Creating contour plots between features

Another visualization that can improve our intuition about the hyperparameter settings is a contour plot of the “loss” values in terms of the hyperparameters. Contour plots are particularly powerful because they reveal how the interaction between different hyperparameter settings affects the loss. Generally, we will want to generate a separate contour plot for each pair of hyperparameters. To keep things simple in this case, we’ll fix the value of the max_depth hyperparameter to three and plot the learning_rate vs. n_estimators loss contour for that slice of the data. We can create this contour plot with Plotly by running the following:
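A sketch using a go.Contour trace built from column data; because the sampled points don’t lie on a regular grid, connectgaps is enabled so Plotly interpolates across the gaps (the column names are those of the trials_df defined earlier):

```python
# Slice the trials: gradient boosting regressor models with max_depth fixed at three.
gb_depth_three = trials_df[
    (trials_df["model_type"] == "GradientBoostingRegressor")
    & (trials_df["max_depth"] == 3)
]

fig = go.Figure(
    go.Contour(
        x=gb_depth_three["learning_rate"],
        y=gb_depth_three["n_estimators"],
        z=gb_depth_three["loss"],
        connectgaps=True,
        colorbar={"title": "loss"},
    )
)
fig.update_layout(xaxis_title="learning_rate", yaxis_title="n_estimators")
fig.show()
```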

This results in the following plot (recommended: view interactive version of this plot here):

One takeaway from this plot is that we may want to experiment with increasing the maximum value of the n_estimators hyperparameter, since the areas of lowest loss appear at the top of the plot.

Wrapping up

In this post, we’ve covered how to convert the data contained in the trials object into a Pandas dataframe so that we can easily analyze the history of hyperparameter settings. Once we have the data in a dataframe, we can create visualizations that help us build better intuition about why a particular set of hyperparameter settings works best. In particular, we’ve shown that adding depth to the visualization by making it interactive with Plotly can reveal a number of interesting trends about the hyperparameter settings that work well for our problem. In addition, we’ve shown that contour plots are useful for indicating adjustments we may want to make to the hyperparameter search space.

Originally published at https://www.statestitle.com.

Daniel is a machine learning engineer at Doma. Previously, he worked on computer vision and machine learning at a geospatial analytics startup and NASA.