Hyperparameter Tuning: Ray[Tune] Framework
How to make your optimization process faster
Ever seen a company or startup boasting about the SOTA results they achieved? Ever wondered why, with the same algorithm or neural network, you get different results from the state-of-the-art numbers reported in the research paper? The answer you’re looking for is hyperparameter optimization. Behind every flashy result is long, tedious work of optimizing and tweaking the model’s hyperparameters to get the best possible result out of that algorithm.
There are plenty of frameworks in place that you can employ to get this task done. In this blog post, we’ll focus on Ray-Tune Framework and how to use it on a tabular dataset.
What is Ray-Tune?
Ray is a general-purpose framework for programming a cluster. Ray enables developers to easily parallelize their Python applications or build new ones, and run them at any scale, from a laptop to a large cluster. Ray provides a highly flexible, yet minimalist and easy to use API.
The motivation behind Ray is to parallelize your current Python applications and run them at any scale, as stated above. Ray provides a simple API that can be used with your existing applications as well. See Ray’s API documentation for details.
Let’s see how Ray can drastically decrease run times, using a test function in Python.
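As a rough illustration of the effect, here is a minimal sketch that needs only the standard library. Note that it uses Python’s concurrent.futures as a stand-in, not Ray itself, and slow_square is a hypothetical test function. With Ray the pattern is analogous: decorate the function with @ray.remote, launch calls with .remote(), and collect the results with ray.get().

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical expensive task: wait briefly, then return x squared."""
    time.sleep(0.2)
    return x * x


inputs = [1, 2, 3, 4]

# Sequential: each call waits for the previous one, so the delays add up.
start = time.perf_counter()
sequential_results = [slow_square(x) for x in inputs]
sequential_time = time.perf_counter() - start

# Parallel: all four calls wait at the same time in separate worker threads.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(inputs)) as pool:
    parallel_results = list(pool.map(slow_square, inputs))
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Threads only help here because the task spends its time sleeping. Ray runs tasks in separate worker processes, so the same speed-up also applies to CPU-bound work such as model training.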
Now imagine applying this parallelization technique to evaluating different hyperparameter configurations for your application. Yes, it will in turn drastically reduce run times. This is where Ray Tune kicks in. Tune is a hyperparameter optimization library built on top of the Ray framework. Think of it as seamlessly running a parallel asynchronous grid search across 8 GPUs.
Tune is a powerful Python library built on top of the Ray framework that accelerates hyperparameter tuning.
Core features include, but are not limited to:
- Launch a multi-node distributed hyperparameter sweep in less than 10 lines of code.
- Supports any machine learning framework, including PyTorch, XGBoost, MXNet, and Keras. See the examples in the Tune documentation.
- Natively integrates with optimization libraries such as HyperOpt, Bayesian Optimization, and Facebook Ax.
- Choose among scalable algorithms such as Population Based Training (PBT) (used at DeepMind), Vizier’s Median Stopping Rule, HyperBand/ASHA.
- Visualize results with TensorBoard.
How to download?
Ray is distributed as a Python package that can be installed with pip:
pip install ray filelock
I suggest creating a new virtualenv or using an existing one from your project. See the venv documentation to create one.
Implementation on Iris Dataset using TensorFlow
As mentioned above, we’re going to apply Ray Tune to a tabular dataset and see how the optimization works step by step.
Note: In this blog, we’ll be using Tune’s function-based API.
Our implementation follows these steps:
- Visualizing the data
- Creating a model training procedure (using Keras)
- Tuning the model by adapting the above model training procedure to implement Tune.
- Analyzing and Comparing the model optimized by Tune vs Vanilla Training procedure.
Let’s start by importing the essential modules.
Visualizing the Iris Dataset
Let’s first take a look at the distribution of the dataset.
The Iris dataset consists of petal and sepal measurements for 3 different types of iris flowers (Setosa, Versicolour, and Virginica), stored in a 150x4 NumPy array. The rows are the samples and the columns are Sepal Length, Sepal Width, Petal Length, and Petal Width. The goal is to have a model that can accurately predict the true label given a 4-tuple of Sepal Length, Sepal Width, Petal Length, and Petal Width.
Creating a model training procedure (using Keras)
In this section, we’ll create a function, create_model, which contains the Keras model definition (I’m using TensorFlow’s Keras API). The model will be a vanilla MLP with one input layer, one hidden layer, and one output layer. We will train it on the Iris dataset and look at the results without any hyperparameter optimization from Tune. This gives us a baseline for comparing un-optimized and optimized results.
Training MLP with 1 hidden layer on Iris without using Tune.
Note: make sure you have TensorFlow 2.0 installed.
Output: Loss is 0.4202
Accuracy is 0.7368
Integration with Tune
In this section, we’ll integrate the above code with Tune to run hyperparameter optimization on Iris. This happens in two parts: modifying the training function to support Tune, and then configuring Tune.
Let’s first define a callback function to report intermediate training progress back to Tune.
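The original callback code isn’t shown here, so the following is only a sketch of the reporting pattern, written without Keras or Ray so it runs on its own. The assumptions: in the real integration the class would subclass tensorflow.keras.callbacks.Callback (which provides the on_epoch_end(epoch, logs) hook), and report would be the reporting function that Tune hands to the training function. ReportingCallback, fake_fit, and the accuracy values are hypothetical.

```python
class ReportingCallback:
    """Duck-typed stand-in for a Keras callback that forwards metrics.

    Assumption: a real version would subclass
    tensorflow.keras.callbacks.Callback, and `report` would be the
    reporting function supplied by Tune.
    """

    def __init__(self, report):
        self.report = report

    def on_epoch_end(self, epoch, logs=None):
        # Keras passes the epoch's metrics in `logs`; forward the accuracy.
        logs = logs or {}
        self.report(epoch=epoch, mean_accuracy=logs.get("accuracy"))


def fake_fit(callback, accuracies):
    """Hypothetical training loop mimicking how Keras invokes callbacks."""
    for epoch, accuracy in enumerate(accuracies):
        callback.on_epoch_end(epoch, logs={"accuracy": accuracy})


# Collect the reports in a list instead of sending them to Tune.
reported = []
callback = ReportingCallback(lambda **metrics: reported.append(metrics))
fake_fit(callback, accuracies=[0.5, 0.6, 0.7])  # made-up values
print(reported)
```

Reporting after every epoch is what lets a scheduler see intermediate progress and stop unpromising trials early instead of waiting for training to finish.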
Integration Part 2: Configuring Tune to tune hyperparameters.
In this part, we’ll define the hyperparameter search space: the values of each tunable parameter that we want Tune to try the model with.
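To make the idea of a search space concrete, here is a standard-library sketch that draws 20 random configurations. The parameter names (lr, dense_1, dense_2) and their ranges are assumptions chosen for illustration, not the values used in the original run. In Tune itself the space is passed as the config argument of tune.run, built from helpers such as tune.grid_search or tune.choice, with num_samples controlling how many configurations are drawn.

```python
import random

# Hypothetical search space: each entry maps a hyperparameter name to a
# function that draws one candidate value from a random generator.
search_space = {
    # Log-uniform learning rate between 1e-4 and 1e-1.
    "lr": lambda rng: 10 ** rng.uniform(-4, -1),
    # Number of units in each of the two dense layers.
    "dense_1": lambda rng: rng.choice([8, 16, 32, 64]),
    "dense_2": lambda rng: rng.choice([8, 16, 32, 64]),
}


def sample_config(space, rng):
    """Draw one value per hyperparameter to form a single trial config."""
    return {name: draw(rng) for name, draw in space.items()}


rng = random.Random(0)  # seeded so the draws are reproducible
configs = [sample_config(search_space, rng) for _ in range(20)]

for config in configs[:3]:
    print(config)
```

Sampling the learning rate on a log scale spreads the trials evenly across orders of magnitude, which is usually more useful than a uniform draw that concentrates on the largest values.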
But you might have a few questions…
How does parallelism work in Tune?
Setting num_samples to 20 runs a total of 20 trials (hyperparameter configuration samples). However, not all of them will run at once. The maximum training concurrency is the number of CPU cores on the machine we’re running on. On a 2-core machine, 2 models will be trained concurrently. When one finishes, a new training process starts with a new hyperparameter configuration sample.
Each trial runs in a new Python process. The process is killed when the trial finishes.
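The scheduling behaviour described above can be simulated with the standard library. This is a simplified sketch, not Tune’s scheduler: it uses worker threads rather than one process per trial, and run_trial just sleeps in place of training. It queues 20 trials on a pretend 2-core machine and records the largest number that ever ran at once.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

NUM_SAMPLES = 20    # total number of trials, as in the text
MAX_CONCURRENT = 2  # pretend we are on a 2-core machine

lock = threading.Lock()
running = 0       # trials in progress right now
peak_running = 0  # largest value `running` ever reached


def run_trial(trial_id):
    """Hypothetical trial: track concurrency, then pretend to train."""
    global running, peak_running
    with lock:
        running += 1
        peak_running = max(peak_running, running)
    time.sleep(0.01)  # stand-in for model training
    with lock:
        running -= 1
    return trial_id


# The pool starts a queued trial as soon as one of its workers is free.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    finished = list(pool.map(run_trial, range(NUM_SAMPLES)))

print(f"{len(finished)} trials finished, at most {peak_running} ran at once")
```

All 20 trials complete, but the peak never exceeds the worker limit, which mirrors how the number of cores caps the number of trials training concurrently.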
How do I debug things in Tune?
If a trial fails, an “error file” column will show up in the output. Run the cell below with the error file’s path to diagnose the issue.
! cat /home/ubuntu/tune_iris/tune_iris_c66e1100_2019-10-09_17-13-24x_swb9xs/error_2019-10-09_17-13-29.txt
Analyzing the Best-tuned Model
In this final section, we’ll see the best-tuned model given by Tune. It will be interesting to see a comparison of metrics (accuracy) between the results from Tune after optimizing and before optimizing.
Loss is 0.1361
Tuned accuracy is 1.0000
The original un-tuned model had an accuracy of 0.9211
Here you can see plots of the ground truth, the un-tuned model’s predictions, and the tuned model’s predictions. The differences are hard to spot because the prediction accuracies are close. As the dataset grows, and its entropy with it, you’ll need to optimize the system more.
You can also inspect the training history and other training and testing visualizations by launching a TensorBoard instance for the above application with the following command (prefix it with “%” when running it in a Jupyter notebook or JupyterLab):
tensorboard --logdir ~/ray_results/tune_iris
- We learned about the Ray framework and how the Tune library can be used for hyperparameter optimization, along with the algorithms implemented in Tune.
- We also implemented Tune for a tabular dataset Iris and saw how it can be used with existing model definitions.
- We did a step-by-step implementation to analyze the differences in performance between the non-optimized and optimized sets of hyperparameters of a model. To keep the code concise we only adjusted the learning rate, dense1, and dense2 parameters, but you can tinker with any adjustable parameter of any DL model, on your own system or on a big cluster.
Hyperparameter optimization is an area of research in itself. To give you some perspective, Ray is a product of UC Berkeley research. So, just like building models around your datasets, optimizing the results becomes equally important if you want to achieve significant results. You can start with a small vanilla network and work your way up to a model with a few hundred million parameters. It’s all about how you can utilize it to make something better!
Fabiana Clemente is Chief Data Officer at YData.