30x Faster Hyperparameter Search with Ray Tune and RAPIDS
Increasing the accuracy of a machine learning model can translate into significant cost reductions, revenue increases, or even lives saved. But optimizing a model manually can be a time-consuming, labor-intensive process, particularly for models with a large number of different hyperparameters to tune by hand. Hyperparameter optimization (HPO) automates this process, using intelligent algorithms to explore model configurations to find one that maximizes accuracy.
While most data scientists are aware of options for hyperparameter optimization, they often find it impractical to incorporate HPO into everyday model-building for two key reasons:
- HPO can require significant development work, and
- It can increase model training time dramatically.
Ray Tune and RAPIDS have teamed up to address both concerns. Ray Tune is a scalable HPO library that allows the optimization to be performed in a distributed manner. It provides various search algorithms along with smarter ways to schedule them in order to arrive at the optimal solution quickly and efficiently. Tune is built on Ray, a system for easily scaling applications from a laptop to a cluster. RAPIDS is a suite of GPU-accelerated libraries for data science, including both ETL and machine learning tasks.
Many thanks to Michael Demoret from NVIDIA for the original notebook and to the Anyscale team for their help with content review and feedback.
In this post, we show how to increase the accuracy of our Random Forest Classifier by 5 percentage points AND reduce tuning time by 30x. We do this by walking through an end-to-end example of hyperparameter optimization with a Random Forest Classifier. This content is intended for any data scientist or data engineer who wants to run HPO experiments faster and more easily.
Scaling with RAPIDS and Ray Tune
Complex models like XGBoost and Random Forest can provide excellent accuracy. However, they rely on several hyperparameters that need to be tuned, and because every candidate configuration requires a full training run, tuning them on large datasets can take hours or even days on CPUs.
Moving data between CPU and GPU based workflows is frequently a bottleneck. RAPIDS set out to eliminate these transfers by loading data, performing some ETL tasks, and training the model, all while staying entirely within GPU memory. By keeping the whole workflow on the GPU, processing times are greatly reduced.
Now, let’s look at how to use both Ray Tune and RAPIDS together to leverage their advantages.
Example: HPO with Ray Tune + RAPIDS
For this demo, we use the Airline dataset, which contains historical departure and arrival time data for millions of flights from the FAA. The aim of the model is to predict whether each flight's arrival will be delayed. To do this, we use the RAPIDS cuML RandomForestClassifier. cuML is the RAPIDS library of machine learning algorithms; it enables users to run ML models on GPUs without knowing the details of CUDA. You can learn more about the library and contribute to its development on GitHub.
We will walk through a Jupyter Notebook to explain the approach we have taken. You can find the full details in the notebook here.
Below is a brief summary of the steps taken in the notebook:
- Download the dataset to a local directory and load it through cuDF into the GPU.
- Prepare the dataset for the problem by selecting the columns we are interested in and discarding the rest. In this step, we also introduce a field called “ArrDelayBinary”, which is set to True if the flight's arrival delay exceeds “delayed_threshold” and False otherwise. This turns the task into a binary classification problem.
- Set up Tune training with the Trainable API.
- Define the experiment parameters and run the experiment.
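The preparation step can be sketched in plain Python. This is a minimal illustration, not the notebook's code: each flight is assumed to be a dict with an "ArrDelay" field holding the arrival delay in minutes, and the helper prepare_rows, the sample rows, the kept columns, and the threshold value are all hypothetical. In the notebook, the same comparison runs as a vectorized column operation on a cuDF DataFrame in GPU memory.

```python
def prepare_rows(rows, keep, delayed_threshold):
    """Keep only the columns in `keep` and add a binary delay label.

    Hypothetical helper: each row is a dict, and "ArrDelay" is assumed
    to be the arrival delay in minutes.
    """
    prepared = []
    for row in rows:
        new_row = {col: row[col] for col in keep}
        # True if the flight's arrival delay exceeds the threshold, else False
        new_row["ArrDelayBinary"] = row["ArrDelay"] > delayed_threshold
        prepared.append(new_row)
    return prepared


# Illustrative rows; "ArrDelay" and "TailNum" are dropped from the features.
flights = [
    {"Origin": "SFO", "Dest": "JFK", "Distance": 2586, "ArrDelay": 45, "TailNum": "N1"},
    {"Origin": "ORD", "Dest": "LAX", "Distance": 1744, "ArrDelay": 5, "TailNum": "N2"},
]
data = prepare_rows(flights, keep=["Origin", "Dest", "Distance"], delayed_threshold=10)
print([r["ArrDelayBinary"] for r in data])  # [True, False]
```

The raw delay column is left out of the kept features on purpose: keeping it would leak the label into the inputs.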
Setting up the Trainable API
One way to run experiments in Tune is to use the Trainable class and define methods within it that implement the experiment. Our class, BaseTrainTransformer, subclasses Trainable and defines the methods _setup, _build, _train, and reset_config to create and run the experiment, plus _restore for checkpointing.
Let’s take a closer look at _train to see how we create the model and evaluate its performance. The notebook also lets you choose a “CPU” mode so that you can compare performance, but we don't recommend running it on large parameter ranges or data sizes.
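The notebook's _train method isn't reproduced in this post, so the following is only a hedged sketch of the k-fold cross-validated scoring that each trial could perform (the experiment settings include a number of cross-validation folds). The MajorityClassifier toy model, its fit/predict interface, the contiguous fold split, and the sample data are hypothetical stand-ins for a cuML RandomForestClassifier and the Airline data.

```python
from statistics import mean


class MajorityClassifier:
    """Toy stand-in for a real model: always predicts the most common label."""

    def __init__(self, **hyperparams):
        self.hyperparams = hyperparams  # unused here; a real model would read these
        self.label = None

    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


def cross_val_accuracy(make_model, config, X, y, n_folds=3):
    """Mean accuracy over `n_folds` contiguous folds for one hyperparameter config."""
    fold_size = len(X) // n_folds
    scores = []
    for k in range(n_folds):
        start = k * fold_size
        stop = start + fold_size if k < n_folds - 1 else len(X)
        X_test, y_test = X[start:stop], y[start:stop]
        # Train on everything outside the held-out fold
        X_train, y_train = X[:start] + X[stop:], y[:start] + y[stop:]
        model = make_model(**config).fit(X_train, y_train)
        preds = model.predict(X_test)
        scores.append(sum(p == t for p, t in zip(preds, y_test)) / len(y_test))
    return mean(scores)


X = [[i] for i in range(9)]
y = [True, False, False, False, True, False, False, False, False]
print(cross_val_accuracy(MajorityClassifier, {"n_estimators": 10}, X, y))
```

In a Tune trial, a score like this would be reported as the trial's metric so that search algorithms and schedulers can compare configurations.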
To keep a clean separation between static configuration and varying hyperparameters, we then wrap our BaseTrainTransformer as follows:
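```python
# BaseTrainTransformer and static_config are defined earlier in the notebook.
class WrappedTrainable(BaseTrainTransformer):
    def __init__(self, *args, **kwargs):
        # Attach the fixed (non-tuned) settings before calling the parent
        # constructor: Trainable.__init__ runs _setup(), which may read them.
        self._static_config = static_config
        super().__init__(*args, **kwargs)
```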
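Why assign _static_config before calling super().__init__()? In Ray releases of this era, Trainable's constructor calls _setup() itself, so anything _setup() reads must already exist. The toy classes below (MiniTrainable, BaseTrainer, and WrappedTrainer are hypothetical, not Ray classes, and the static values are illustrative) reproduce that ordering in plain Python.

```python
class MiniTrainable:
    """Toy base class: like Tune's Trainable here, the constructor calls _setup()."""

    def __init__(self, config):
        self.config = config
        self._setup(config)

    def _setup(self, config):
        pass


class BaseTrainer(MiniTrainable):
    def _setup(self, config):
        # Merge fixed settings with the per-trial hyperparameters.
        # This only works if _static_config exists before __init__ runs.
        self.params = {**self._static_config, **config}


# Illustrative static settings shared by every trial
static_config = {"num_rows": 2_500_000, "cv_folds": 3}


class WrappedTrainer(BaseTrainer):
    def __init__(self, *args, **kwargs):
        self._static_config = static_config  # must precede super().__init__
        super().__init__(*args, **kwargs)


trial = WrappedTrainer({"max_depth": 12})
print(trial.params)  # {'num_rows': 2500000, 'cv_folds': 3, 'max_depth': 12}
```

Reversing the two lines in WrappedTrainer.__init__ would raise AttributeError, because _setup() would run before _static_config exists.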
Ray Tune provides various hyperparameter search algorithms to optimize the model efficiently. In this demo, we have the option of choosing between two search algorithms:
BayesOpt in Ray Tune is powered by Bayesian optimization, which attempts to find the best-performing parameters in as few iterations as possible. The technique is based on Bayesian inference and Gaussian processes, and it looks for regions of the hyperparameter space that are worth exploring. At each step, a Gaussian process is fitted to the known samples, and its posterior distribution, combined with an exploration strategy, determines the next point to explore. Eventually, it converges on a combination of parameters whose results are close to optimal.
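To make the BayesOpt loop described above concrete, here is a minimal Bayesian optimization sketch in pure Python over one hyperparameter scaled to [0, 1]. It is not the implementation behind Ray Tune's BayesOpt: the RBF kernel and its length scale, the noise term, the upper-confidence-bound (UCB) exploration strategy, the 51-point candidate grid, and the toy objective are all illustrative assumptions. Scores are centered on their mean before fitting, so unexplored regions default to the average observed score while their high posterior uncertainty still invites exploration.

```python
import math


def rbf(a, b, length_scale=0.2):
    """Squared-exponential (RBF) kernel: similarity of two points in [0, 1]."""
    return math.exp(-((a - b) ** 2) / (2 * length_scale**2))


def cholesky(m):
    """Return lower-triangular L with L @ L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def forward_sub(low, b):
    """Solve L x = b for lower-triangular L."""
    x = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def back_sub_t(low, b):
    """Solve L^T x = b for lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs, ys, x, noise=1e-4):
    """Posterior mean and std at x of a zero-mean GP fitted to (xs, ys)."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    low = cholesky(K)
    alpha = back_sub_t(low, forward_sub(low, ys))  # K^-1 y
    k_star = [rbf(x, a) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward_sub(low, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 0.0)
    return mean, math.sqrt(var)


def bayes_opt(objective, n_iter=10, kappa=2.0):
    """Maximize `objective` over a grid in [0, 1] with GP-UCB."""
    candidates = [i / 50 for i in range(51)]
    xs = [0.0, 0.5, 1.0]  # initial samples
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        offset = sum(ys) / len(ys)  # center scores on their mean
        centered = [y - offset for y in ys]
        ucb = {}
        for c in candidates:
            if c in xs:
                continue  # skip points already evaluated
            mean, std = gp_posterior(xs, centered, c)
            ucb[c] = mean + kappa * std  # exploit (mean) + explore (std)
        x_next = max(ucb, key=ucb.get)
        xs.append(x_next)
        ys.append(objective(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]


# Toy objective standing in for validation accuracy; it peaks at x = 0.7.
best_x, best_y = bayes_opt(lambda x: -(x - 0.7) ** 2)
print(f"best x = {best_x:.2f}, score = {best_y:.4f}")
```

Raising kappa favors exploring uncertain regions; lowering it favors refining the best region found so far.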
Scikit-optimize (SkOpt) is a library for sequential model-based optimization. It is built on NumPy, SciPy, and scikit-learn.
These options can be selected in the notebook by setting the search algorithm to “BayesOpt” or “SkOpt”. The two optimizers differ in how effectively they find the optimal parameters within a search space; Figure 3 and Figure 4 below illustrate the difference in performance between them.
In addition to search algorithms, Ray Tune provides trial schedulers, which can stop unpromising trials early or perturb the hyperparameters of running trials so that good configurations are found sooner. This makes the search more resource-efficient. We’ve included two scheduling options in the demo:
The median stopping rule stops a trial if its performance falls below the median performance of other trials at similar stages.
Asynchronous HyperBand enables early stopping using the HyperBand optimization algorithm, which divides the trials into brackets of varying sizes. Within each bracket, low-performing trials are periodically stopped early. Ray Tune also provides an implementation of standard HyperBand, but we recommend the asynchronous version because it provides more parallelism and avoids straggler issues. Either scheduler can be selected in the notebook's experiment setup cell.
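Both scheduling ideas can be sketched in a few lines of plain Python. These are simplified illustrations, not Ray Tune's MedianStoppingRule or AsyncHyperBandScheduler. The median rule compares a trial's score at a given step with other trials' scores at the same step. The asynchronous successive-halving class models a single HyperBand bracket: a trial that reaches a rung milestone continues only if its score is in the top 1/reduction_factor of the scores recorded at that rung so far. Higher scores are assumed to be better, and all names and default values are hypothetical.

```python
from statistics import median


def median_should_stop(history, peer_histories, step, min_peers=3):
    """Median stopping rule: stop if this trial's score at `step` is below
    the median score of other trials at the same step."""
    peers = [h[step] for h in peer_histories if len(h) > step]
    if len(peers) < min_peers:
        return False  # too few comparable trials to judge yet
    return history[step] < median(peers)


class AsyncSuccessiveHalving:
    """One bracket of asynchronous successive halving (the core of ASHA)."""

    def __init__(self, min_t=1, max_t=27, reduction_factor=3):
        self.max_t = max_t
        self.rf = reduction_factor
        self.rungs = {}  # milestone iteration -> scores recorded there
        t = min_t
        while t < max_t:
            self.rungs[t] = []
            t *= reduction_factor

    def _cutoff(self, scores):
        """Lowest score still in the top 1/rf of those recorded so far."""
        if not scores:
            return None
        ranked = sorted(scores, reverse=True)
        return ranked[max(1, len(ranked) // self.rf) - 1]

    def on_result(self, t, score):
        """Return "STOP" or "CONTINUE" for a trial reporting `score` at iteration t."""
        if t in self.rungs:
            cutoff = self._cutoff(self.rungs[t])
            self.rungs[t].append(score)
            if cutoff is not None and score < cutoff:
                return "STOP"
        return "STOP" if t >= self.max_t else "CONTINUE"


print(median_should_stop([0.5, 0.6], [[0.6, 0.7], [0.55, 0.72], [0.58, 0.75]], step=1))  # True
sched = AsyncSuccessiveHalving(min_t=1, max_t=9, reduction_factor=3)
print([sched.on_result(1, s) for s in (0.8, 0.5, 0.9)])  # ['CONTINUE', 'STOP', 'CONTINUE']
```

Because each decision uses only the results seen so far, early arrivals at a rung are promoted more easily; that is the trade-off that lets trials proceed without waiting for the whole rung to finish.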
Setting Up and Running the Experiment
The notebook cell under “setting up the experiment” has variables that define how many trials should be run, the number of rows to be selected for the run, the cross-validation folds, the search and scheduling algorithms, and the parameter ranges. Have a close look at this and select appropriate values before starting the experiment.
Once these values are set, we are ready to run the experiment. Notice how the config defines each parameter range as a Tune search-space object rather than as a fixed value.
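Tune resolves search-space objects like these into concrete values for every trial. The plain-Python sketch below mimics that idea with hypothetical sampler helpers (uniform, randint, and choice are written here for illustration and are not Tune's functions) and a simple random-search loop. The parameter names, ranges, and toy objective are illustrative, not the notebook's.

```python
import random


def uniform(low, high):
    return lambda rng: rng.uniform(low, high)


def randint(low, high):
    return lambda rng: rng.randint(low, high - 1)  # high is exclusive here


def choice(options):
    return lambda rng: rng.choice(options)


# Hypothetical ranges for a random forest; fixed values pass through unchanged.
config = {
    "max_depth": randint(5, 20),
    "n_estimators": randint(100, 500),
    "max_features": uniform(0.1, 1.0),
    "n_bins": choice([8, 16, 32]),
    "cv_folds": 3,
}


def resolve(config, rng):
    """Turn every sampler in `config` into a concrete value for one trial."""
    return {k: v(rng) if callable(v) else v for k, v in config.items()}


def random_search(objective, config, num_samples=20, seed=0):
    """Sample `num_samples` trial configs and return the best-scoring one."""
    rng = random.Random(seed)
    trials = [resolve(config, rng) for _ in range(num_samples)]
    return max(trials, key=objective)


def toy_objective(params):
    # Toy stand-in for cross-validated accuracy
    return -abs(params["max_depth"] - 12) + params["max_features"]


best = random_search(toy_objective, config)
print(best)
```

Smarter search algorithms such as BayesOpt replace the independent sampling in random_search with suggestions informed by earlier results, while the shape of the search space stays the same.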
Results and Next Steps
From this experiment, we can see that HPO easily boosts model performance. With minimal effort, we improved accuracy from 72% to 77% using just 2.5M of the dataset's 115M rows. The total runtime for the GPU experiment with 50 trials was just under half an hour, whereas the CPU version with only 25 trials took over 17 times longer. Ray Tune provides various powerful options for metrics, optimization, and scheduling algorithms to help arrive at the optimal solution efficiently. A natural next step is to experiment with different search and scheduling algorithms, and with other RAPIDS cuML models on other data science problems.
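One way to relate these numbers to the 30x speedup is to compare time per trial, because the CPU run completed only half as many trials. Writing $T_{\text{GPU}}$ for the GPU experiment's runtime and $T_{\text{CPU}} > 17\,T_{\text{GPU}}$ for the CPU run's, and assuming the trials in the two runs are comparable workloads:

$$
\frac{T_{\text{CPU}}/25}{T_{\text{GPU}}/50} = 2\cdot\frac{T_{\text{CPU}}}{T_{\text{GPU}}} > 2 \times 17 = 34
$$

Under that assumption, each GPU trial ran more than 34x faster than a CPU trial on average.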
GTC Digital Live Webinar
Hear Josh Patterson discuss more on the RAPIDS and Ray collaboration during the upcoming GTC Digital live webinar State of RAPIDS: Bridging the GPU Data Science Ecosystem [S22181] on May 28th at 9AM PDT.
Here are some links to learn more about the concepts discussed in the post:
- Algorithms for Hyperparameter Optimization
- 5 Classification Metrics
- 11 Important Model Evaluation Metrics
- Practical Bayesian Optimization of Machine Learning Algorithms
- Find RAPIDS and Ray Tune on GitHub
- Anyscale is hosting a series of online events this summer, called Ray Summit Connect, where you can learn more about Ray. For more information, visit the events page.