Using Optuna to Optimize PyTorch Hyperparameters

Crissman Loomis

This post uses PyTorch v1.4 and Optuna v1.3.0.

PyTorch + Optuna!

Optuna is a hyperparameter optimization framework applicable to machine learning frameworks and black-box optimization solvers. PyTorch is an open-source machine learning framework used by many deep learning programmers and researchers. Let’s see how they can work together!

Creating the Objective Function

Optuna is a black-box optimizer, which means it needs an objective function. The objective function returns a numerical value that evaluates the performance of the hyperparameters and is used to decide where to sample in upcoming trials.

In our example, we will be doing this for a network that identifies MNIST handwritten digits. In that case, the objective function looks like this:
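A minimal sketch along the lines of Optuna’s official PyTorch example (the constants and the define_model helper, sketched in the next section, are assumptions here, not the post’s exact code):

import torch
import torch.nn.functional as F
import torch.optim as optim
import torch.utils.data
from torchvision import datasets, transforms

import optuna

DEVICE = torch.device("cpu")   # assumption: CPU for simplicity
BATCHSIZE = 128                # assumption
EPOCHS = 10                    # assumption


def objective(trial):
    # Build the model and optimizer from hyperparameters suggested via `trial`.
    # define_model is sketched in the next section.
    model = define_model(trial).to(DEVICE)
    optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])
    lr = trial.suggest_loguniform("learning_rate", 1e-5, 1e-1)
    optimizer = getattr(optim, optimizer_name)(model.parameters(), lr=lr)

    train_loader = torch.utils.data.DataLoader(
        datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE, shuffle=True)
    valid_loader = torch.utils.data.DataLoader(
        datasets.MNIST("data", train=False, transform=transforms.ToTensor()),
        batch_size=BATCHSIZE, shuffle=False)

    for epoch in range(EPOCHS):
        model.train()
        for data, target in train_loader:
            data = data.view(data.size(0), -1).to(DEVICE)  # flatten 28x28 images
            optimizer.zero_grad()
            loss = F.nll_loss(model(data), target.to(DEVICE))
            loss.backward()
            optimizer.step()

        # Validation accuracy after each epoch.
        model.eval()
        correct = 0
        with torch.no_grad():
            for data, target in valid_loader:
                data = data.view(data.size(0), -1).to(DEVICE)
                pred = model(data).argmax(dim=1)
                correct += (pred == target.to(DEVICE)).sum().item()
        accuracy = correct / len(valid_loader.dataset)

    return accuracy  # used by Optuna as feedback on the trial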

Notice that the objective function is passed an Optuna-specific trial argument. This object is used inside the objective function to specify which hyperparameters should be tuned. The function returns the accuracy of the model, which Optuna uses as feedback on the performance of the trial.

Defining the hyperparameters to be tuned

Similar to how PyTorch uses Eager execution, Optuna allows you to define the kinds and ranges of hyperparameters you want to tune directly within your code using the trial object. This saves the effort of learning specialized syntax for hyperparameters, and also means you can use normal Python code for looping through or defining your hyperparameters.

Optuna supports a variety of hyperparameter settings, which can be used to optimize floats, integers, or discrete categorical values. Numerical values can also be suggested on a logarithmic scale. In our MNIST example, we optimize the hyperparameters here:
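Collected in one place, the suggest calls for this example look roughly like this:

# Categorical: choose the optimizer.
optimizer_name = trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"])

# Log-uniform float: learning rates that span orders of magnitude.
lr = trial.suggest_loguniform("learning_rate", 1e-5, 1e-1)

# Integers: the depth of the network and the width of each layer.
n_layers = trial.suggest_int("n_layers", 1, 3)
n_units = [trial.suggest_int("n_units_l{}".format(i), 4, 128) for i in range(n_layers)]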

The optimizer itself is chosen with trial.suggest_categorical("optimizer", ["Adam", "RMSprop", "SGD"]), which selects among the Adam, RMSprop, and stochastic gradient descent (SGD) optimizers.

The learning rates for these optimizers vary by orders of magnitude, so trial.suggest_loguniform("learning_rate", 1e-5, 1e-1) is used, which varies the values logarithmically from 0.00001 to 0.1.

For the definition of the model itself, Optuna leverages eager mode to allow normal Python looping to determine the number of layers and the number of nodes in each layer, with trial.suggest_int("n_layers", 1, 3) for the layers and trial.suggest_int("n_units_l{}".format(i), 4, 128) for the number of nodes in each layer.
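A sketch of such a define_model helper (the name follows Optuna’s official example; the body here is an assumption consistent with the objective function above):

import torch.nn as nn


def define_model(trial):
    # Depth and width of the network are themselves hyperparameters.
    n_layers = trial.suggest_int("n_layers", 1, 3)
    layers = []
    in_features = 28 * 28  # flattened MNIST images
    for i in range(n_layers):
        out_features = trial.suggest_int("n_units_l{}".format(i), 4, 128)
        layers.append(nn.Linear(in_features, out_features))
        layers.append(nn.ReLU())
        in_features = out_features
    layers.append(nn.Linear(in_features, 10))  # 10 digit classes
    layers.append(nn.LogSoftmax(dim=1))        # pairs with F.nll_loss above
    return nn.Sequential(*layers)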

Running the Trials

The default sampler in Optuna is the Tree-structured Parzen Estimator (TPE), which is a form of Bayesian optimization. Optuna uses TPE to search more efficiently than a random search, by choosing points closer to previous good results.

To run the trials, create a study object, which sets the direction of optimization (maximize or minimize). Then run the study with study.optimize(objective, n_trials=100) to perform one hundred trials, as shown below.
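For example:

study = optuna.create_study(direction="maximize")  # we maximize validation accuracy
study.optimize(objective, n_trials=100)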

Each new trial is chosen after evaluating all previously completed trials, using the TPE sampler to make smart guesses about where the best hyperparameter values are likely to be found. The best values from the trials can be accessed through study.best_trial. Other methods of viewing the trials, such as formatting them as a dataframe, are also available.
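For example (trials_dataframe assumes pandas is installed):

print("Best accuracy:", study.best_trial.value)
print("Best hyperparameters:", study.best_trial.params)

df = study.trials_dataframe()  # every trial as a row in a pandas DataFrame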

Pruning — Early Stopping of Poor Trials

Pruning trials is a form of early stopping that terminates unpromising trials so that computing time can be spent on trials that show more potential. To enable pruning, the objective function needs to give Optuna intermittent feedback on how the trial is going. With this information, Optuna can compare the trial’s progress with that of other trials, decide whether to stop it early, and signal the objective function when the trial should be terminated. The objective function can then end the trial cleanly after recording any results.

trial.report is used to communicate the progress of the trial to Optuna. In this example, the objective function reports the accuracy at each epoch. trial.should_prune() is how Optuna tells the objective function that it should terminate early, as shown in the sketch below.
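Inside the epoch loop of the objective function above, the pruning hooks look like this:

for epoch in range(EPOCHS):
    # ... train for one epoch and compute validation accuracy, as above ...

    # Report intermediate accuracy so Optuna can compare this trial with others.
    trial.report(accuracy, epoch)

    # End the trial early if Optuna judges it unpromising.
    if trial.should_prune():
        raise optuna.exceptions.TrialPruned()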

To the Future, and Beyond!

Plot Contour Visualization

For those interested, Optuna has many other features, including visualizations, alternative samplers and pruning algorithms, and the ability to create user-defined versions of these. If you have more computing resources available, Optuna also provides an easy interface for running trials in parallel to increase tuning speed.
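For example, a contour plot like the one captioned above can be generated from a finished study with the built-in visualization module (this assumes plotly is installed):

from optuna.visualization import plot_contour

# Interaction between two of the tuned hyperparameters.
fig = plot_contour(study, params=["n_layers", "learning_rate"])
fig.show()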

Give Optuna a try!
