Automated Hyperparameter Tuning with Keras Tuner and TensorFlow 2.0
Building deep learning solutions in the real world is a process of constant experimentation and optimization.
Unlike most other types of software applications, deep learning applications don't follow a linear lifecycle: their models need to be constantly refined, optimized and tested.
In a nutshell, the more carefully a model is optimized, the more robust it tends to be!
As a deep learning practitioner, you already know that choosing the correct hyperparameters for your model is a critical and often painful task.
So, Google's TensorFlow team created an awesome framework to relieve the pain points of hyperparameter tuning and optimization.
Keras Tuner is a library that helps you pick the optimal set of hyperparameters for your real-world deep learning applications.
In this article, we will see how to use Keras Tuner and TensorFlow 2.0 to choose the best hyperparameters for our model!
Before diving into the awesomeness of Keras Tuner, let's warm up with a few key concepts so we can move smoothly through this blog.
What are Hyperparameters?
Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant over the training process and directly impact the performance of your ML program.
Hyperparameter optimization is the process of searching for the hyperparameter values that give the best model training and performance.
Hyperparameters are of two types:
- Model hyperparameters which influence model selection such as the number and width of hidden layers
- Algorithm hyperparameters which influence the speed and quality of the learning algorithm, such as the learning rate for Stochastic Gradient Descent (SGD) and the number of nearest neighbours for a k-Nearest Neighbours (KNN) classifier
Why Keras Tuner?
To build a more “deep learning” intuition, let me walk you through a few hyperparameters that are considered important for model optimization.
Let us consider a simple Convolutional Neural Network. It is influenced by many hyperparameters, such as:
- Number and Size of Hidden Layers: These control the capacity of the network and are very important for regulating model training. A common rule of thumb is that the number of neurons in a hidden layer should lie between the sizes of the input and output layers, for example 2/3 of the input layer size plus the output layer size.
- Learning Rate: The godfather of all hyperparameters, the learning rate sets the size of the step the optimizer takes each time it updates the model's weights. Too high and training can diverge; too low and training becomes very slow.
- Batch Size: The number of training samples the network works through before its internal parameters (weights) are updated. It is a hyperparameter of gradient descent.
- Momentum: This hyperparameter uses information from the previous update steps to decide the direction of the next step, which helps dampen oscillations.
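To see how learning rate, batch size and momentum interact, here is a toy, pure-Python sketch (not how Keras implements training internally) that fits y = 3x with mini-batch SGD plus momentum. The update rule mirrors the one documented for Keras's SGD optimizer; in a real Keras model these knobs map to `keras.optimizers.SGD(learning_rate=..., momentum=...)` and `model.fit(..., batch_size=...)`. The data, the `train_linear` name and the default values are made up for illustration.

```python
import random


def train_linear(xs, ys, learning_rate=0.1, momentum=0.9, batch_size=8,
                 epochs=50, seed=0):
    """Fit y = w * x with mini-batch SGD + momentum; return (w, n_updates)."""
    rng = random.Random(seed)
    data = list(zip(xs, ys))
    w, velocity, n_updates = 0.0, 0.0, 0
    for _ in range(epochs):
        rng.shuffle(data)
        # batch_size samples are processed before each weight update
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # gradient of the mean squared error (w*x - y)^2 with respect to w
            grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
            # momentum carries a running "velocity" from previous steps,
            # which smooths out oscillations between updates
            velocity = momentum * velocity - learning_rate * grad
            w += velocity
            n_updates += 1
    return w, n_updates


xs = [i / 32 for i in range(32)]
ys = [3 * x for x in xs]
w, n_updates = train_linear(xs, ys)
print(round(w, 3), n_updates)
```

With 32 samples and `batch_size=8`, each epoch performs 4 weight updates, so 50 epochs give 200 updates; raising `batch_size` to 32 cuts that to 50 updates for the same number of epochs.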
Each of these hyperparameters plays a huge role in how well your model generalizes. At the same time, picking the right value for each of them requires a massive amount of trial and error.
Do you want to spend your entire lifetime tuning a deep-learning model?
Well, that feels like a nightmare.
You probably don’t have to!
Keras Tuner makes it easy to define a search space and leverage included algorithms to find the best hyperparameter values. Keras Tuner comes with Bayesian Optimization, Hyperband, and Random Search algorithms built-in, and is also designed to be easy for researchers to extend in order to experiment with new search algorithms. — TensorFlow Blog
Let us discuss the 3 built-in Keras Tuners first, and then we will get into a code walkthrough to show you how it works!
Overview of available Keras Tuners
As of this writing, Keras Tuner offers 3 main built-in tuners.
1. Random Search Keras Tuner
The basic and least efficient approach, Random Search doesn’t learn from previously tested hyperparameter combinations. It simply samples hyperparameter combinations from a search space randomly.
2. Hyperband Keras Tuner
A Hyperband tuner is an optimized version of the random search tuner that uses early stopping to speed up the hyperparameter tuning process. The main idea is to train many models for only a few epochs and to continue training only the models achieving the highest accuracy on the validation set.
3. Bayesian Optimization Keras Tuner
Bayesian Optimization works like Random Search in that it samples a subset of hyperparameter combinations. But there is a key difference between them which makes this Bayesian guy smarter than Random Search.
The key difference is that it doesn't sample hyperparameter combinations randomly; it follows a probabilistic approach. It builds a probabilistic model (Keras Tuner uses a Gaussian process) from the results of the combinations already tested and uses it to pick the next combination to try.
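To make the difference between the first two strategies concrete, here is a toy, pure-Python comparison. It is not Keras Tuner's implementation: the search space and the `toy_score` function (a made-up stand-in for "train for N epochs, report validation accuracy") are hypothetical, and instead of full Hyperband it shows successive halving, the early-stopping routine that Hyperband runs in several brackets. Bayesian optimization is left out because a faithful version needs a probabilistic surrogate model.

```python
import math
import random

# Hypothetical search space, for illustration only.
SEARCH_SPACE = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "dense_units": [32, 64, 128],
}


def sample_config(rng):
    """Pick one value per hyperparameter uniformly at random."""
    return {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}


def toy_score(config, epochs):
    """Made-up stand-in for 'train for `epochs` epochs, return val accuracy'."""
    quality = 1.0 - 0.1 * abs(math.log10(config["learning_rate"]) + 3)  # peaks at 1e-3
    quality += 0.05 * config["dense_units"] / 128                         # wider is better
    return quality * (1.0 - 0.5 ** epochs)                                # improves with epochs


def random_search(n_trials, epochs, seed=0):
    """Train every sampled config for the full budget and keep the best."""
    rng = random.Random(seed)
    trials = [sample_config(rng) for _ in range(n_trials)]
    best = max(trials, key=lambda c: toy_score(c, epochs))
    return best, n_trials * epochs  # (best config, total epochs spent)


def successive_halving(n_configs, min_epochs=1, factor=3, seed=0):
    """Train many configs briefly, keep the top 1/factor, give survivors more epochs."""
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_configs)]
    epochs, epochs_spent = min_epochs, 0
    while len(configs) > 1:
        # conservative accounting: each rung counts as a fresh run of `epochs` epochs
        epochs_spent += len(configs) * epochs
        ranked = sorted(configs, key=lambda c: toy_score(c, epochs), reverse=True)
        configs = ranked[: max(1, len(configs) // factor)]  # early-stop the rest
        epochs *= factor
    return configs[0], epochs_spent


rs_best, rs_cost = random_search(27, epochs=9, seed=1)
sh_best, sh_cost = successive_halving(27, min_epochs=1, factor=3, seed=1)
print("random search:     ", rs_best, rs_cost, "epochs")
print("successive halving:", sh_best, sh_cost, "epochs")
```

In this toy, a configuration's ranking after 1 epoch already matches its ranking after 9, so successive halving finds the same winner as random search while spending 81 epochs instead of 243. Real training curves are noisier, which is why Hyperband hedges by running several brackets with different starting budgets.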
Note: You can get familiar with all the syntax and methods of Keras Tuner in the documentation.
Now, it’s time to get hands-on with Keras Tuner!
To help you understand the tuner search loop more intuitively, I will walk through the code using the Random Search Keras Tuner on the famous Fashion MNIST dataset.
Note: You don't have to download this dataset from Kaggle, as it already ships with Keras. I also recommend running the code in Google Colaboratory, as it gives you free GPU usage and requires no setup.
So, grab your coffee, and let’s begin! 🥤
First, let us install the Keras Tuner.
Note: Before running this cell in your Colab environment, go to Runtime > Change runtime type > Hardware accelerator > GPU > Save.
This will successfully install the Keras Tuner in your Colab environment.
Now, let us import the dependencies of this lab.
Now we will load our Fashion MNIST data from Keras.
Now we will scale the pixel values. Since 255 is the maximum value of a pixel in these 8-bit grayscale images, dividing by 255 maps every pixel to the 0–1 range.
If you check the shape of a single image, you will get (28, 28), which means each image is 28 × 28 pixels.
Most convolutional neural networks are designed to accept inputs of a fixed shape, and Keras Conv2D layers additionally expect an explicit channel dimension. So we reshape the images from (28, 28) to (28, 28, 1), that is, a single grayscale channel, before feeding them into the network.
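Here is a minimal NumPy sketch of the preprocessing steps described above. To keep it self-contained, it runs on a random stand-in batch with the same dtype and per-image shape as Fashion MNIST; the `preprocess` helper is a hypothetical name, not part of Keras. With the real data you would first call `keras.datasets.fashion_mnist.load_data()`, which returns 60,000 training and 10,000 test images, and then pass each split through `preprocess`.

```python
import numpy as np


def preprocess(images):
    """Scale uint8 pixels to [0, 1] and add a channel axis for Conv2D."""
    images = images.astype("float32") / 255.0       # 0..255 -> 0.0..1.0
    return images.reshape(len(images), 28, 28, 1)   # (N, 28, 28) -> (N, 28, 28, 1)


# Hypothetical stand-in batch: 4 random 28x28 uint8 "images".
fake_batch = np.random.default_rng(0).integers(0, 256, size=(4, 28, 28), dtype=np.uint8)
print(fake_batch[0].shape)   # (28, 28)
processed = preprocess(fake_batch)
print(processed.shape)       # (4, 28, 28, 1)
```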
Great! We have pre-processed our data!
Now we will define a build_model function that takes hp, an instance of the Keras Tuner HyperParameters class, as its argument.
Inside the function, we define the set of hyperparameters to tune, along with a range of min-max values for Keras Tuner to pick from.
The resulting model is a Convolutional Neural Network with 2 convolution layers, 1 flatten layer, and 2 dense layers.
For each layer except the final output dense layer, we have set a min-max range (min_value and max_value) from which the Keras Tuner will pick random values.
Now we are all set to call Keras Tuner. So let us import the RandomSearch tuner from the Keras Tuner package. Next, we will run it for 5 trials, training each candidate model for 3 epochs, to see the best validation accuracy we can achieve.
When you run this cell, the RandomSearch tuner performs 5 trials, training each sampled model for 3 epochs and recording its accuracy on the validation data. It then reports the best validation accuracy among the 5 trials.
Let us examine the output —
So we can see that our tuner has reported the best val_accuracy so far as 91.4%.
Now for the best part: the tuner comes with a handy method, tuner_search.get_best_models(), which returns the best model found during the search, so you can print a summary of the best model for your image classification data!
We only want the single best model, so we set num_models to 1 and take the first element of the returned list with index [0] (models are sorted from best to worst). Calling summary() on it outputs the report for the best model.
Let us look at the report generated!
Cool! We got our best model report!
Now we will continue training our “best model” on the dataset!
Note: Remember to set initial_epoch to 3, as the best model was already trained for 3 epochs during the search. This way, training picks up at the 4th epoch!
Finally, we will train the best model up to epoch 10 (that is, epochs 4 through 10) and see what accuracy we achieve now.
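Here is a sketch of this step; the helper name is hypothetical, and `validation_split=0.1` is an assumption added so the log also reports `val_accuracy`. The key detail is how `initial_epoch` interacts with `epochs`: Keras treats `epochs` as the index of the final epoch, not as a count, so `epochs=10, initial_epoch=3` runs 7 more epochs, logged as 4/10 through 10/10.

```python
def continue_training(model, x, y, epochs=10, initial_epoch=3):
    """Resume training a model that has already completed `initial_epoch` epochs."""
    # `epochs` is the final epoch index, so this runs epochs - initial_epoch
    # additional epochs (7 with the defaults).
    history = model.fit(
        x, y,
        epochs=epochs,
        initial_epoch=initial_epoch,
        validation_split=0.1,  # hold out 10% so val_accuracy is reported too
    )
    return history
```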
Let us look at the accuracy scores now. You will be amazed. 😛
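At the 10th epoch, we achieved an accuracy score of 99.02%! Keep in mind that the accuracy column in the Keras training log is measured on the training data; the val_accuracy column is the one to compare with the tuner's 91.4%.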
Massive results in just 18 lines of code!
⚠ Make sure you save your trained model (for example with model.save()), as the scores change every time you run the cell.
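A minimal sketch of saving and restoring with Keras's own file format rather than pickle, assuming a recent TensorFlow release that supports `.keras` files (older releases used `.h5` files or a SavedModel directory). The helper name and file name are hypothetical. If you would rather make reruns more repeatable, `keras.utils.set_random_seed()` seeds Python, NumPy and TensorFlow in one call, and the tuners accept a `seed` argument; GPU kernels may still introduce small differences.

```python
from tensorflow import keras


def save_and_reload(model, path="best_model.keras"):
    """Save a trained model to disk and load it back."""
    model.save(path)                      # architecture + weights + optimizer state in one file
    return keras.models.load_model(path)  # rebuilds an equivalent model from the file
```

After the final `fit()`, `restored = save_and_reload(model)` writes the file; later sessions only need `keras.models.load_model("best_model.keras")`.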
Conclusion
Hope you had fun exploring Keras Tuner with this article. If you are also a deep learning practitioner like me, I recommend visiting the Keras Tuner documentation and trying out the other two tuners as well, with any dataset of your choice!
You can download the code provided above from my GitHub, or directly run it in Colab.
To discuss more on Deep Learning, connect with me via LinkedIn.