Cutting-edge hyperparameter tuning with Ray Tune

Ray Tune is a Python library that accelerates hyperparameter tuning by allowing you to leverage cutting-edge optimization algorithms at scale.

Richard Liaw
riselab
7 min read · Aug 20, 2019


Behind most of the major flashy results in machine learning is a graduate student (me) or engineer spending hours training a model and tuning algorithm parameters. This is the tedious, boring work that makes the headlines possible.

Here in the RISELab, we’re finding it more and more necessary to leverage cutting-edge hyperparameter tuning tools to keep up with the state of the art. Advancements in deep learning performance are becoming more and more dependent on newer and better hyperparameter tuning algorithms such as Population Based Training (PBT), HyperBand, and ASHA.

Population-based Training improves DeepMind’s algorithms on many domains by significant margins. Source: https://deepmind.com/blog/population-based-training-neural-networks/

These algorithms provide critical benefits: better final model performance and more efficient use of training resources. Yet the vast majority of researchers and teams do not leverage them.

Why? Well, most existing hyperparameter search frameworks do not have these newer optimization algorithms. And once you reach a certain scale, most existing solutions for parallel hyperparameter search can be a hassle to use — you’ll need to configure each machine for each run and often manage a separate database.

Practically speaking, implementing and maintaining these algorithms requires a significant amount of time and engineering.

But it doesn’t need to be this way. We believe there’s no reason why hyperparameter tuning at scale needs to be this hard. All AI researchers and engineers should be able to seamlessly run a parallel asynchronous grid search across 8 GPUs and even scale out to leverage Population Based Training or any Bayesian optimization algorithm on the cloud.

In this blog post, we’ll introduce Tune, a powerful hyperparameter tuning library built on Ray designed to remove the friction from setting up and scaling experiment execution and hyperparameter tuning.

Tune scales your training from a single machine to a large distributed cluster without changing your code.

Tune is a powerful Python library that accelerates hyperparameter tuning. Here are some core features:

  • Tune provides distributed asynchronous optimization out of the box by leveraging Ray.
  • You can scale a hyperparameter search from a single machine to a large distributed cluster without changing your code.
  • Tune offers state of the art algorithms including (but not limited to) ASHA, BOHB, and Population-Based Training.
  • Tune automatically visualizes results with TensorBoard or MLFlow.
  • Tune integrates with many optimization libraries such as Ax/Botorch, HyperOpt, and Bayesian Optimization and enables you to scale them transparently.
  • Tune supports any machine learning framework, including PyTorch, TensorFlow, XGBoost, LightGBM, and Keras.

Beyond Tune’s core features, there are two primary reasons why researchers and developers prefer Tune over other existing hyperparameter tuning frameworks: scale and flexibility.

Note for Search Algorithms: as of 8/12/2019, HpBandSter supports HyperBand, Random Search, and BOHB. KerasTuner supports Random Search, HyperBand, and Bayesian Optimization. Optuna supports Median (Percentile) Stopping, ASHA, Random Search, and Bayesian Optimization (TPE). HyperOpt supports Bayesian Optimization and Random Search. Tune supports PBT, BOHB, ASHA, HyperBand, Median Stopping, Random Search, Bayesian Optimization (TPE, etc), and numerous others due to library integrations.

Tune simplifies scaling.

Leverage all of the cores and GPUs on your machine to perform parallel asynchronous hyperparameter tuning by adding fewer than 10 lines of Python.

If you run into any issues, please post in the comments!

With another configuration file and 4 lines of code, launch a massive distributed hyperparameter tuning cluster on the cloud and automatically shut down the machines (we’ll show you how to do this below).

With Tune’s built-in fault tolerance, trial migration, and cluster autoscaling, you can safely leverage spot (preemptible) instances and reduce cloud costs by up to 90%.

Tune is flexible.

Tune integrates seamlessly with experiment management tools such as MLFlow and TensorBoard.

Tune provides a flexible interface for optimization algorithms, allowing you to easily implement and scale new optimization algorithms.

You can use Tune to leverage and scale many cutting edge optimization algorithms and libraries such as HyperOpt (below) and Ax without modifying any model training code.

Using Tune is Easy!

Let’s now dive into a concrete example that shows how to leverage a popular early stopping algorithm (ASHA). We will start by running an example hyperparameter tuning script with Tune across all of the cores on your workstation. We’ll then scale out the same hyperparameter tuning experiment on the cloud with about 10 lines of code using Ray.

You can download a full version of the hyperparameter tuning code in this blog here (distributed experiment configuration here).

We’ll be using PyTorch in this example, but we also have examples for TensorFlow and Keras available.

Tune is packaged as part of Ray. To run this example, you will need to install the following: pip install ray torch torchvision

We first run some imports:

header of `tune_script.py`
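A minimal version of that header might look like the following (a sketch assuming an MNIST example built on torchvision; the exact imports in the original script may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms  # used by the data-loading helper

import ray
from ray import tune
from ray.tune.schedulers import ASHAScheduler
```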

Let’s write a neural network with PyTorch:
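The sketch below uses a small convolutional network for MNIST, along the lines of Ray’s MNIST example (the exact architecture is illustrative):

```python
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        # A deliberately tiny model so that trials run quickly.
        self.conv1 = nn.Conv2d(1, 3, kernel_size=3)
        self.fc = nn.Linear(192, 10)

    def forward(self, x):
        x = F.relu(F.max_pool2d(self.conv1(x), 3))
        x = x.view(-1, 192)
        x = self.fc(x)
        return F.log_softmax(x, dim=1)
```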

To start using Tune, add a simple logging statement to the PyTorch training function below.
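Here is a sketch of that training function. get_data_loaders, train, and test are the helper functions mentioned in the note below, and tune.report is the logging call in recent Ray versions (older releases used tune.track.log):

```python
def train_mnist(config):
    # get_data_loaders, train, and test are helper functions whose
    # definitions are linked in the note below.
    train_loader, test_loader = get_data_loaders()
    model = ConvNet()
    optimizer = optim.SGD(
        model.parameters(), lr=config["lr"], momentum=config["momentum"])
    for i in range(10):
        train(model, optimizer, train_loader)
        acc = test(model, test_loader)
        # The one Tune-specific line: report the metric back to Tune.
        tune.report(mean_accuracy=acc)
```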

Notice that there are a couple of helper functions in the above training script; you can see their definitions here.

Running Tune

Let’s run 1 trial, randomly sampling from a uniform distribution for learning rate and momentum.
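A sketch of that call, assuming a Ray version that provides tune.uniform and tune.run:

```python
search_space = {
    "lr": tune.uniform(0.001, 0.1),
    "momentum": tune.uniform(0.1, 0.9),
}

# With no num_samples argument, tune.run evaluates a single configuration.
analysis = tune.run(train_mnist, config=search_space)
```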

You’ve now completed your first Tune run! You can easily enable GPU usage by specifying GPU resources — see the documentation for more details. We can then plot the performance of this trial (requires matplotlib).
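For example, using the analysis object returned by tune.run above (its trial_dataframes property holds one results DataFrame per trial):

```python
import matplotlib.pyplot as plt

dfs = analysis.trial_dataframes
for df in dfs.values():
    df.mean_accuracy.plot()
plt.xlabel("Training iteration")
plt.ylabel("Mean accuracy")
plt.show()
```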

Parallel execution and early stopping

Early stopping with ASHA.

Let’s integrate ASHA, a scalable algorithm for early stopping (blog post and paper). ASHA terminates trials that are less promising and allocates more time and resources to more promising trials.

Specify the number of hyperparameter configurations to evaluate with num_samples; Tune parallelizes trials across all of the available cores on your machine, and any extra trials are queued.
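A sketch that combines ASHA with a larger number of samples (the sample count here is illustrative):

```python
from ray.tune.schedulers import ASHAScheduler

analysis = tune.run(
    train_mnist,
    num_samples=30,  # number of configurations to evaluate
    scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
    config=search_space,
)
```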

You can use the same DataFrame plotting as the previous example. After running, if TensorBoard is installed, you can also use TensorBoard to visualize results: tensorboard --logdir ~/ray_results

Going distributed

Setting up a distributed hyperparameter search is often too much work. Tune and Ray make this seamless.

Launching a cloud cluster with a simple configuration file

Launch a cluster and distribute hyperparameter search without changing your code

First, we’ll create a YAML file which configures a Ray cluster. As part of Ray, Tune interoperates very cleanly with the Ray cluster launcher. The same commands shown below will work on GCP, AWS, and local private clusters. We’ll use 3 worker nodes in addition to a head node, so we should have a total of 32 vCPUs on the cluster — allowing us to evaluate 32 hyperparameter configurations in parallel.

tune-default.yaml
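A minimal sketch of that file, using the older Ray autoscaler schema; the instance type and AMI below are placeholders you would replace with values for your own account and region (an 8-vCPU instance per node matches the 32-vCPU total mentioned above):

```yaml
cluster_name: tune-default
provider: {type: aws, region: us-west-2}
auth: {ssh_user: ubuntu}
min_workers: 3
max_workers: 3
# Placeholder instance type and AMI: pick an 8-vCPU instance
# (e.g. c5.2xlarge) and a Deep Learning AMI for your region.
head_node: {InstanceType: c5.2xlarge, ImageId: <your-ami-id>}
worker_nodes: {InstanceType: c5.2xlarge, ImageId: <your-ami-id>}
setup_commands:
    - pip install ray torch torchvision
```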

Putting things together

To distribute your hyperparameter search across the Ray cluster, you’ll need to add this to the top of your script:
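A sketch of that snippet; recent Ray versions connect with ray.init(address=...), while older releases named the parameter redis_address:

```python
import argparse

import ray

# Parse the cluster address passed on the command line (e.g. localhost:6379)
# and connect to the running Ray cluster instead of starting a local one.
parser = argparse.ArgumentParser()
parser.add_argument("ray_address", nargs="?", default=None)
args = parser.parse_args()
ray.init(address=args.ray_address)
```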

Given the large increase in compute, we should be able to expand our search space and increase the number of samples we draw from it:
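For instance (the wider learning-rate range and the sample count are illustrative):

```python
search_space = {
    "lr": tune.loguniform(1e-4, 1e-1),
    "momentum": tune.uniform(0.1, 0.9),
}

analysis = tune.run(
    train_mnist,
    num_samples=100,
    scheduler=ASHAScheduler(metric="mean_accuracy", mode="max"),
    config=search_space,
)
```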

You can download a full version of the script in this blog here (as tune_script.py).

Launching your experiment

To launch your experiment, you can run (assuming your code so far is in a file tune_script.py):
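ray submit tune-default.yaml tune_script.py --start --args="localhost:6379"

(The exact ray submit flags have varied across Ray versions, so treat this invocation as a sketch and check ray submit --help for your installed version.)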

This will launch your cluster on AWS, upload tune_script.py onto the head node, and run python tune_script.py localhost:6379, where 6379 is the port Ray opens on the head node to enable distributed execution.

All of the output of your script will show up on your console. Note that the cluster will set up the head node before any of the worker nodes, so at first you may see only 4 CPUs available. After some time, you will see 24 trials being executed in parallel, with the remaining trials queued up to run as soon as a trial slot frees up.

To shut down your cluster, you can run:
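ray down tune-default.yaml

(This tears down all of the nodes described by the same configuration file used to launch the cluster.)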

And you’re done 🎉!

Learn more:

Tune has numerous other features that enable researchers and practitioners to accelerate their development, many of which are not covered in this blog post.

For users who have access to the cloud, Tune and Ray provide a number of utilities that enable a seamless transition between development on your laptop and execution on the cloud. The documentation includes:

  • running the experiment in a background session
  • submitting trials to an existing experiment
  • visualizing all results of a distributed experiment in TensorBoard.

Tune is designed to scale experiment execution and hyperparameter search with ease. If you have any comments or suggestions or are interested in contributing to Tune, you can reach out to me or the ray-dev mailing list.

Code: https://github.com/ray-project/ray/tree/master/python/ray/tune
Docs: http://ray.readthedocs.io/en/latest/tune.html

Thanks to Allan Peng, Eric Liang, Joey Gonzalez, Ion Stoica, Eugene Vinitsky, Lisa Dunlap, Philipp Moritz, Andrew Tan, Alvin Wan, Daniel Rothchild, Brijen Thananjeyan, Alok Singh, Robert Nishihara (and maybe others?) for reading through various versions of this blog post!

Originally published at https://towardsdatascience.com on August 20, 2019.
