Cutting edge hyperparameter tuning with Ray Tune
Ray Tune is a Python library that accelerates hyperparameter tuning by allowing you to leverage cutting edge optimization algorithms at scale.
Behind most of the major flashy results in machine learning is a graduate student (me) or engineer spending hours training a model and tuning algorithm parameters. This is the tedious boring work that makes the headlines possible.
Here in the RISELab, we’re finding it more and more necessary to leverage cutting-edge hyperparameter tuning tools to keep up with the state of the art. Advancements in deep learning performance are becoming more and more dependent on newer and better hyperparameter tuning algorithms such as Population Based Training (PBT), HyperBand, and ASHA.
These algorithms provide two critical benefits:
- They maximize model performance: e.g., DeepMind uses PBT to achieve superhuman performance on StarCraft; Waymo uses PBT to enable self-driving cars.
- They minimize training costs: HyperBand and ASHA converge to high-quality configurations in half the time taken by previous approaches; population-based data augmentation algorithms cut costs by orders of magnitude.
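To make the cost argument concrete, here is a minimal sketch of synchronous successive halving, the early-stopping idea that HyperBand and ASHA build on. It is not Tune's implementation: the `toy_score` function, the reduction factor `eta=3`, and every name below are assumptions chosen for illustration.

```python
import random


def successive_halving(configs, score, min_budget=1, eta=3):
    """Keep the best 1/eta of the configs at each rung and give survivors eta times more budget.

    Returns (best_config, total_budget_spent).
    """
    survivors = list(configs)
    budget = min_budget
    spent = 0
    while len(survivors) > 1:
        # Evaluate every surviving config at the current budget.
        results = [(score(cfg, budget), cfg) for cfg in survivors]
        spent += budget * len(survivors)
        # Sort best-first and keep the top 1/eta (at least one config).
        results.sort(key=lambda pair: pair[0], reverse=True)
        keep = max(1, len(survivors) // eta)
        survivors = [cfg for _, cfg in results[:keep]]
        budget *= eta
    return survivors[0], spent


def toy_score(cfg, budget):
    # Hypothetical stand-in for validation accuracy: learning rates near 0.01
    # do best, and more budget helps every config equally.
    return -abs(cfg["lr"] - 0.01) + 0.001 * budget


rng = random.Random(0)
configs = [{"lr": rng.uniform(0.001, 0.1)} for _ in range(27)]
best, spent = successive_halving(configs, toy_score)

# Baseline: train all 27 configs to the largest budget used above (9 units).
full_cost = 27 * 9
```

In this toy setting the search spends 81 budget units instead of 243, because weak configurations are dropped after a short evaluation rather than trained to completion.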
Yet we see that the vast majority of researchers and teams do not use such algorithms.
Why? Well, most existing hyperparameter search frameworks do not have these newer optimization algorithms. And once you reach a certain scale, most existing solutions for parallel hyperparameter search can be a hassle to use — you’ll need to configure each machine for each run and often manage a separate database.
Practically speaking, implementing and maintaining these algorithms requires a significant amount of time and engineering.
But it doesn’t need to be this way. We believe there’s no reason why hyperparameter tuning at scale needs to be this hard. All AI researchers and engineers should be able to seamlessly run a parallel asynchronous grid search across 8 GPUs and even scale out to leverage Population Based Training or any Bayesian optimization algorithm on the cloud.
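As a rough picture of what a parallel asynchronous grid search does, the sketch below expands a small grid and evaluates the combinations concurrently using only the Python standard library. Threads stand in for GPUs or machines here; the `train` objective and the grid values are hypothetical, and this is not how Tune schedules trials internally.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor, as_completed


def train(config):
    # Hypothetical objective standing in for a real training run.
    return -(config["lr"] - 0.01) ** 2 - (config["momentum"] - 0.9) ** 2


grid = {"lr": [0.001, 0.01, 0.1], "momentum": [0.8, 0.9]}
# Expand the grid into one config dict per combination (3 x 2 = 6 trials).
configs = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]

results = []
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = {pool.submit(train, cfg): cfg for cfg in configs}
    # as_completed yields each trial as soon as it finishes,
    # not in submission order, so no worker waits on a slow trial.
    for future in as_completed(futures):
        results.append((future.result(), futures[future]))

best_score, best_config = max(results, key=lambda pair: pair[0])
```

With 6 trials and 4 workers, the last two trials wait in a queue until a worker frees up, which is the same behavior you get at larger scale when trials outnumber available resources.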
In this blog post, we’ll introduce Tune, a powerful hyperparameter tuning library built on Ray designed to remove the friction from scaling and setting up experiment execution and hyperparameter tuning.
Tune is a powerful Python library that accelerates hyperparameter tuning. Here are some core features:
- Tune provides distributed asynchronous optimization out of the box by leveraging Ray.
- You can scale a hyperparameter search from a single machine to a large distributed cluster without changing your code.
- Tune offers state of the art algorithms including (but not limited to) ASHA, BOHB, and Population-Based Training.
- Tune automatically visualizes results with TensorBoard or MLflow.
- Tune integrates with many optimization libraries such as Ax/Botorch, HyperOpt, and Bayesian Optimization and enables you to scale them transparently.
- Tune supports any machine learning framework, including PyTorch, TensorFlow, XGBoost, LightGBM, and Keras.
Beyond Tune’s core features, there are two primary reasons why researchers and developers prefer Tune over other existing hyperparameter tuning frameworks: scale and flexibility.
Tune simplifies scaling.
Leverage all of the cores and GPUs on your machine to perform parallel asynchronous hyperparameter tuning by adding fewer than 10 lines of Python.
With another configuration file and 4 lines of code, launch a massive distributed hyperparameter tuning cluster on the cloud and automatically shut down the machines (we’ll show you how to do this below).
With Tune’s built-in fault tolerance, trial migration, and cluster autoscaling, you can safely leverage spot (preemptible) instances and reduce cloud costs by up to 90%.
Tune is flexible.
Tune provides a flexible interface for optimization algorithms, allowing you to easily implement and scale new optimization algorithms.
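One way to picture such an interface is a search algorithm that only has to propose configurations and receive their scores. The sketch below is a hypothetical, standalone illustration of that separation; the `RandomSearcher` class and the `suggest`/`observe` method names are assumptions and do not reproduce Tune's actual classes.

```python
import random


class RandomSearcher:
    """Hypothetical search algorithm: proposes configs and records their scores."""

    def __init__(self, space, seed=0):
        self.space = space  # maps parameter name -> (low, high)
        self.rng = random.Random(seed)
        self.history = []

    def suggest(self):
        # Draw each parameter uniformly from its range.
        return {
            name: self.rng.uniform(low, high)
            for name, (low, high) in self.space.items()
        }

    def observe(self, config, score):
        # A smarter algorithm would use this feedback to guide later suggestions.
        self.history.append((score, config))

    def best(self):
        return max(self.history, key=lambda pair: pair[0])


def run_search(searcher, objective, num_trials):
    # The driver depends only on suggest/observe, so algorithms are interchangeable.
    for _ in range(num_trials):
        config = searcher.suggest()
        searcher.observe(config, objective(config))
    return searcher.best()


def objective(config):
    # Hypothetical objective: best at lr = 0.01.
    return -(config["lr"] - 0.01) ** 2


searcher = RandomSearcher({"lr": (0.001, 0.1)})
best_score, best_config = run_search(searcher, objective, num_trials=20)
```

Because the driver never looks inside the searcher, swapping random search for a Bayesian method would only require a different `suggest` and `observe`.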
Using Tune is Easy!
Let’s now dive into a concrete example that shows how to leverage a popular early stopping algorithm (ASHA). We will start by running an example hyperparameter tuning script with Tune across all of the cores on your workstation. We’ll then scale out the same hyperparameter tuning experiment on the cloud with about 10 lines of code using Ray.
We’ll be using PyTorch in this example, but we also have examples for TensorFlow and Keras available.
Tune is packaged as part of Ray. To run this example, you will need to install the following:
pip install ray torch torchvision
We first run some imports:
Let’s write a neural network with PyTorch:
To start using Tune, add a simple logging statement to the PyTorch training function below.
Notice that there are a couple of helper functions in the above training script; you can see their definitions here.
Let’s run 1 trial, randomly sampling from a uniform distribution for learning rate and momentum.
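Conceptually, a single trial is one draw from the search space followed by one training run that reports a metric as it goes. The sketch below shows that shape in plain Python; the ranges, the `train_stub` function, and the `mean_accuracy` key are assumptions for illustration, not the code used in this post.

```python
import random


def sample_config(rng):
    # One independent uniform draw per hyperparameter; the ranges are assumptions.
    return {"lr": rng.uniform(0.001, 0.1), "momentum": rng.uniform(0.1, 0.9)}


def train_stub(config, epochs=5):
    # Stand-in for a real training loop: yields one metric per epoch, which is
    # the point where a logging statement would report progress to the tuner.
    accuracy = 0.0
    for epoch in range(epochs):
        accuracy += (1.0 - accuracy) * config["momentum"] * 0.5
        yield {"epoch": epoch, "mean_accuracy": accuracy}


rng = random.Random(42)
config = sample_config(rng)
history = list(train_stub(config))
```

Reporting a metric every epoch, rather than once at the end, is what later lets an early-stopping scheduler cut a trial short.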
Now, you’ve run your first Tune run! You can easily enable GPU usage by specifying GPU resources — see the documentation for more details. We can then plot the performance of this trial (requires
Parallel execution and early stopping
Parallelize your search across all available cores on your machine with num_samples (extra trials will be queued).
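The early-stopping half of this section can be sketched as a single rung of an asynchronous successive halving scheduler: each trial that reaches the rung is judged immediately against the scores seen so far, without waiting for the rest. This is a simplified illustration under assumed names (`AshaRung`, `should_continue`) and an assumed reduction factor of 4; a real ASHA scheduler has several rungs and a grace period, and this is not Tune's implementation.

```python
class AshaRung:
    """One rung of a simplified asynchronous successive halving scheduler."""

    def __init__(self, reduction_factor=4):
        self.reduction_factor = reduction_factor
        self.scores = []

    def should_continue(self, score):
        # Decide immediately from the scores seen so far at this rung,
        # without waiting for the other trials to arrive.
        self.scores.append(score)
        ranked = sorted(self.scores, reverse=True)
        # Keep roughly the top 1/reduction_factor, and always at least one.
        keep = max(1, len(ranked) // self.reduction_factor)
        cutoff = ranked[keep - 1]
        return score >= cutoff


rung = AshaRung(reduction_factor=4)
# Scores in the order trials happen to reach the rung.
arrivals = [0.50, 0.40, 0.70, 0.60, 0.65, 0.30, 0.80, 0.75]
decisions = [rung.should_continue(score) for score in arrivals]
# decisions == [True, False, True, False, False, False, True, True]
```

Note that the first trial continues even though it later turns out to be mediocre: deciding asynchronously trades a few mistaken promotions for never leaving a worker idle.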
You can use the same DataFrame plotting as the previous example. After running, if TensorBoard is installed, you can also use TensorBoard for visualizing results:
tensorboard --logdir ~/ray_results
Setting up a distributed hyperparameter search is often too much work. Tune and Ray make this seamless.
Launching the cloud with a simple configuration file
First, we’ll create a YAML file which configures a Ray cluster. As part of Ray, Tune interoperates very cleanly with the Ray cluster launcher. The same commands shown below will work on GCP, AWS, and local private clusters. We’ll use 3 worker nodes in addition to a head node, so we should have a total of 32 vCPUs on the cluster — allowing us to evaluate 32 hyperparameter configurations in parallel.
Putting things together
To distribute your hyperparameter search across the Ray cluster, you’ll need to add this to the top of your script:
Given the large increase in compute, we should be able to expand our search space and increase the number of samples:
You can download a full version of the script in this blog here (as
Launching your experiment
To launch your experiment, you can run (assuming your code so far is in a file
$ ray submit tune-default.yaml tune_script.py --start \
This will launch your cluster on AWS, upload tune_script.py onto the head node, and run python tune_script.py localhost:6379, where 6379 is a port opened by Ray to enable distributed execution.
All of the output of your script will show up on your console. Note that the cluster will set up the head node first before any of the worker nodes, so at first you may see only 4 CPUs available. After some time, you can see 24 trials being executed in parallel, and the other trials will be queued up to be executed as soon as resources free up.
To shut down your cluster, you can run:
$ ray down tune-default.yaml
And you’re done 🎉!
Tune has numerous other features that enable researchers and practitioners to accelerate their development. Other Tune features not covered in this blog post include:
- A simple API for running distributed fault-tolerant experiments
- Distributed Hyperparameter search over Distributed Data Parallel training for PyTorch
- Population-based Training
For users that have access to the cloud, Tune and Ray provide a number of utilities that enable a seamless transition between development on your laptop and execution on the cloud. The documentation includes:
- running the experiment in a background session
- submitting trials to an existing experiment
- visualizing all results of a distributed experiment in TensorBoard.
Tune is designed to scale experiment execution and hyperparameter search with ease. If you have any comments or suggestions or are interested in contributing to Tune, you can reach out to me or the ray-dev mailing list.
Thanks to Allan Peng, Eric Liang, Joey Gonzalez, Ion Stoica, Eugene Vinitsky, Lisa Dunlap, Philipp Moritz, Andrew Tan, Alvin Wan, Daniel Rothchild, Brijen Thananjeyan, Alok Singh, Robert Nishihara (and maybe others?) for reading through various versions of this blog post!
Originally published at https://towardsdatascience.com on August 20, 2019.