Scaling up Optuna with Ray Tune

Published in Optuna · Oct 15, 2020

By Kai Fricke, Crissman Loomis, Richard Liaw

Optuna and Ray Tune are two of the leading tools for hyperparameter tuning in Python. Optuna provides an easy-to-use interface to advanced hyperparameter search algorithms like Tree-structured Parzen Estimators (TPE). This makes it an invaluable tool for modern machine learning engineers and data scientists, and it is a key reason for its popularity.

Optuna excels at single-machine workloads, but parallelizing those workloads requires manual setup, possibly across multiple machines, and monitoring is not built in. This can make operation especially challenging if you want to leverage GPUs as efficiently as possible: you not only need sensible choices for your parameters, but also a way to organize their execution. And this is where Ray Tune shines.

Ray Tune takes your training function and automatically parallelizes it, takes care of resource management, and can even distribute it across a cluster of machines. And all you have to do is run a single script! In a perfect world, we would be able to combine the great algorithms from Optuna with the great scaling capabilities of Ray Tune.

Fortunately, this world exists! Ray Tune integrates with many hyperparameter search algorithms and frameworks, and Optuna is one of them. Scaling up your Optuna hyperparameter search is just a matter of wrapping it in a Ray Tune run and making minor changes to your training function.

Even better, if you use Ray Tune as the computation backend, you can leverage advanced scheduling algorithms like Population Based Training that are currently not available in Optuna. You really get the best of both worlds!
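As a rough, self-contained sketch of what that could look like: the placeholder trainable and mutation range below are made up purely for illustration (the real MNIST trainable appears later in this post), and in practice the trainable should also save and restore checkpoints so PBT can copy weights between trials.

import random

from ray import tune
from ray.tune.schedulers import PopulationBasedTraining


def train_placeholder(config):
    # Placeholder trainable: the real example further below trains the
    # MNIST model and reports its accuracy after every epoch.
    acc = 0.0
    for _ in range(20):
        acc = min(1.0, acc + config["lr"] * 10)
        tune.report(mean_accuracy=acc)


# PBT periodically clones well-performing trials and perturbs their
# hyperparameters while the experiment is still running.
pbt = PopulationBasedTraining(
    time_attr="training_iteration",
    metric="mean_accuracy",
    mode="max",
    perturbation_interval=5,
    hyperparam_mutations={"lr": lambda: random.uniform(1e-4, 1e-2)})

analysis = tune.run(
    train_placeholder,
    scheduler=pbt,
    num_samples=10,
    config={"lr": 1e-3})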

How it works

Let’s take a look at the integration of Ray Tune with Optuna.

As you will see, using Optuna search algorithms with Ray Tune is as simple as passing search_alg=OptunaSearch() to the tune.run() call!

All you need to do to get started is install Ray Tune and Optuna:

pip install "ray[tune]" optuna

In this blog post, we will use a PyTorch model from the Ray Tune examples to train an MNIST classifier. Here's how the code looks in Optuna:
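Something like the following minimal sketch. It assumes the ConvNet model and the get_data_loaders, train and test helpers from the Ray Tune MNIST example (ray.tune.examples.mnist_pytorch); exact helper signatures may differ between Ray versions.

import optuna
import torch.optim as optim
# Assumption: these helpers come from the Ray Tune MNIST example; their exact
# signatures may vary between Ray versions.
from ray.tune.examples.mnist_pytorch import ConvNet, get_data_loaders, train, test


def objective(trial):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(
        model.parameters(),
        lr=trial.suggest_float("lr", 1e-4, 1e-2, log=True),
        momentum=trial.suggest_float("momentum", 0.1, 0.9))

    for epoch in range(20):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)

    # Optuna minimizes by default, so return the negative accuracy
    # (this matches the negative values in the log output below).
    return -acc


study = optuna.create_study()
study.optimize(objective, n_trials=10)
print("Best config:", study.best_params)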

If we want to do the same thing in Ray Tune, the code is very similar:
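A minimal sketch, under the same assumptions about the MNIST helpers. Note that newer Ray releases moved OptunaSearch from ray.tune.suggest.optuna to ray.tune.search.optuna.

import torch.optim as optim
from ray import tune
# In newer Ray releases this import lives in ray.tune.search.optuna.
from ray.tune.suggest.optuna import OptunaSearch
from ray.tune.examples.mnist_pytorch import ConvNet, get_data_loaders, train, test


def train_mnist(config):
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])

    for epoch in range(20):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        # Report the metric back to Tune after every epoch.
        tune.report(mean_accuracy=acc)


search_space = {
    "lr": tune.loguniform(1e-4, 1e-2),
    "momentum": tune.uniform(0.1, 0.9),
}

analysis = tune.run(
    train_mnist,
    metric="mean_accuracy",
    mode="max",
    num_samples=10,
    search_alg=OptunaSearch(),  # let Optuna suggest the hyperparameters
    config=search_space)

print("Best config:", analysis.best_config)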

As you can see, the objective function is almost identical. Ray Tune automatically converts the search space to be compatible with Optuna and uses Optuna's search algorithms for hyperparameter suggestions. By default, these are the Tree-structured Parzen Estimators, but you can specify any of the search algorithms (samplers) available in Optuna.
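For example, a different Optuna sampler can be passed in through OptunaSearch's sampler argument. The snippet below uses the CMA-ES sampler purely as an illustration (it requires the optional cmaes package); any sampler from optuna.samplers should work the same way.

import optuna
from ray.tune.suggest.optuna import OptunaSearch  # ray.tune.search.optuna in newer Ray

# Use Optuna's CMA-ES sampler instead of the default TPE sampler.
cmaes_search = OptunaSearch(sampler=optuna.samplers.CmaEsSampler())

# Passed to tune.run() exactly like before, e.g.:
# tune.run(train_mnist, search_alg=cmaes_search, metric="mean_accuracy",
#          mode="max", num_samples=10, config=search_space)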

How it performs

Let’s do a simple comparison between the single-threaded Optuna implementation and the implementation with Ray Tune.

When we run the Optuna example, we get these (or similar) results:

[...]
Trial 3 finished with value: -0.14375 and parameters: {'lr': 0.00020805933350346437, 'momentum': 0.3128649844613489}. Best is trial 1 with value: -0.94375.
Time taken: 37.56 seconds.
Best config: {'lr': 0.007383934499860177, 'momentum': 0.8961504427219955}

Training 10 trials sequentially for 20 epochs each took about 37 seconds on my machine.

Let’s see how Ray Tune performs:

Current best trial: 776691b8 with mean_accuracy=0.809375 and parameters={'lr': 0.00255096035425189, 'momentum': 0.644771077808181}
[...]
Time taken: 12.29 seconds.
Best config: {'lr': 0.00255096035425189, 'momentum': 0.644771077808181}

As you can see, the training run with Ray Tune finished about three times faster than with single-threaded Optuna. This is because Ray Tune was built from the ground up to be scalable and to leverage parallelism as much as possible.

Even better, because of Ray’s distributed computing model, scaling our tuning up to a cluster of tens, hundreds, or even thousands of nodes is just a matter of using the Ray Autoscaler — and you’ll still just have to call this same single script, with no code changes at all.
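As a rough sketch of what that looks like: once a cluster has been started with the autoscaler (for example with ray up and a cluster configuration file, whose file name below is only a placeholder), the tuning script simply connects to it instead of starting Ray locally.

import ray

# Connect to an existing Ray cluster (e.g. one started with
# `ray up cluster.yaml`) instead of starting Ray on the local machine.
# The rest of the tuning script stays exactly the same.
ray.init(address="auto")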

Conclusion

The two leading tools for hyperparameter tuning in Python, Ray Tune and Optuna, can be used together to get the best of both worlds. Ray Tune offers great parallelization and scalability across clusters of hundreds of machines, as well as advanced scheduling algorithms like Population Based Training. Optuna, on the other hand, brings state-of-the-art search algorithms to the table, allowing it to find the best hyperparameter combinations more efficiently.

Combining these two libraries only has benefits — and really brings hyperparameter optimization to the next level.

Are you using Optuna and trying to scale up your hyperparameter tuning with Ray Tune? We’d love to hear from you! Join our Slack here!
