Hyperparameter tuning for Keras models with the Scikit-Learn library

karen iazydjan
Published in The Startup · 5 min read · Dec 13, 2019

Keras is a neural-network library for the Python programming language. It can run on top of several deep learning backends, such as TensorFlow, Theano, or CNTK, and allows fast iteration when experimenting with or prototyping neural networks.

Whether you are prototyping a neural network in Keras to get a feel for how it will perform the required task, or fine-tuning a model you have built and tested, there are many parameters to consider. These configuration choices are referred to as hyperparameters. The activation function used in your layers is an example of a hyperparameter; the number of layers in the model, the number of neurons per layer, or the size of the kernel in a CNN can all be considered hyperparameters as well.

There is no magic formula for choosing the right hyperparameters, and different problems require different approaches. Changing any one of them may affect the model's performance, and only experimentation will determine which combination works best for your model and data.

In this article we will look at the steps required to perform hyperparameter tuning of a Keras model using another machine learning library, Scikit-Learn. We will build a simple neural network and search for the best optimizer, batch size and activation function using the RandomizedSearchCV class from the Scikit-Learn library.

Before we begin

The libraries we will be using in our example are TensorFlow, which includes Keras, and Scikit-Learn. We will be using the following imports:

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense,Flatten
from tensorflow.keras.datasets import mnist
from tensorflow.keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import RandomizedSearchCV

We will also use the NumPy and Matplotlib libraries for some support functions:

import numpy as np
import matplotlib.pyplot as plt

Prepare data

To start, let's get a dataset to work with, format it, and build our model. Here, we load the dataset with a train/test split, add a channel axis, normalize the pixel values, and print the shapes to make sure we feed the correct input to the model:

(X_train, y_trn), (X_test, y_tst) = mnist.load_data()
# Add a channel axis and scale pixel values to the [0, 1] range
X_trn = X_train[..., np.newaxis].astype(np.float32) / 255.
X_tst = X_test[..., np.newaxis].astype(np.float32) / 255.
print(X_trn.shape, y_trn.shape)
print(X_tst.shape, y_tst.shape)

The MNIST dataset is a set of 28x28 pixel pictures of handwritten digits.
Our data looks like this:

def preview(data, result):
    """Shows 12 elements of a picture dataset."""
    fig = plt.figure()
    for i in range(12):
        plt.subplot(2, 6, i + 1)
        plt.imshow(data[i], interpolation='none')
        plt.title("label:{}".format(result[i]))
        plt.xticks([])
        plt.yticks([])
    plt.show()

preview(X_train[12:], y_trn[12:])

Build Model

In order to tune the parameters of our Keras model using Scikit-Learn, we need to be able to rebuild the model with different parameters. To do this, we create a function that builds the model based on our hyperparameters:

def build_model(var_activation='relu', var_optimizer='adam'):
    """Uses arguments to build a Keras model."""
    model = Sequential()
    model.add(Flatten(input_shape=[28, 28, 1]))
    model.add(Dense(64, activation=var_activation))
    model.add(Dense(32, activation=var_activation))
    model.add(Dense(16, activation=var_activation))
    model.add(Dense(10, activation='softmax'))
    model.compile(loss="sparse_categorical_crossentropy",
                  optimizer=var_optimizer,
                  metrics=["accuracy"])
    return model

This is how our model looks with default parameters:

model_default = build_model()
model_default.summary()
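With the default arguments, the summary should report parameter counts along these lines (the exact layer names depend on how many models you have already built in the session; the counts follow directly from the layer sizes, e.g. 784 x 64 + 64 = 50,240 for the first Dense layer):

flatten (Flatten)    (None, 784)    0
dense (Dense)        (None, 64)     50,240
dense_1 (Dense)      (None, 32)     2,080
dense_2 (Dense)      (None, 16)     528
dense_3 (Dense)      (None, 10)     170
Total params: 53,018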

Set variables

We want to test the model's performance with both the Adam optimizer and stochastic gradient descent (SGD), as well as try different activation functions for the layers and different batch sizes for training. Let's create lists of our parameters and store them in a dictionary. The keys in the dictionary must match the argument names used to build and fit our model:

_activations = ['tanh', 'relu', 'selu']
_optimizers = ['sgd', 'adam']
_batch_size = [16, 32, 64]
params = dict(var_activation=_activations,
              var_optimizer=_optimizers,
              batch_size=_batch_size)
print(params)

Note that ‘batch_size’ is not an argument of the build_model function, but rather a parameter that will be passed to the underlying .fit() call when each candidate model is trained, as sketched below.
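To make that split concrete, here is a minimal sketch (my illustration, not part of the search itself) of what a single sampled combination amounts to: the var_* values go to build_model, while batch_size goes to fit (epochs=4 here just mirrors the value we give the wrapper later):

m = build_model(var_activation='relu', var_optimizer='adam')
m.fit(X_trn, y_trn, batch_size=16, epochs=4)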

Create a Scikit-Learn estimator from the Keras model

Now that we have the data, the function to build our models, and the parameters we want to test, we can use Scikit-Learn to evaluate different models built from our function and hyperparameters. We can use the GridSearchCV or RandomizedSearchCV classes from the sklearn.model_selection module to iterate through combinations of our hyperparameters and report the model with the best score. GridSearchCV iterates through all possible combinations of hyperparameters, while RandomizedSearchCV randomly samples a fixed number of combinations to train. While randomized search may not always find the best possible model, it is much faster and less resource intensive, since not every combination is trained. This makes randomized search very useful for testing and prototyping.

To use RandomizedSearchCV, we first need to make our Keras model compatible with the Scikit-Learn API, using the Keras wrapper for Scikit-Learn: KerasClassifier.

model = KerasClassifier(build_fn=build_model, epochs=4, batch_size=16)
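As an optional sanity check (an addition of mine, not required for the search), the wrapper now behaves like any Scikit-Learn estimator, so we can train and score it directly; score returns the mean accuracy because the model was compiled with the accuracy metric:

model.fit(X_trn, y_trn)
print(model.score(X_tst, y_tst))  # mean accuracy on the test set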

Before fitting our RandomizedSearchCV object, we set the random seed with numpy.random.seed(). Seeding the random number generator makes the randomized sampling of hyperparameter combinations reproducible, which makes our search more meaningful across runs. (NumPy's seed alone does not fix TensorFlow's weight initialization; TensorFlow keeps its own seed, as shown below.) If our hyperparameters included the number of layers or the number of nodes per layer, however, reproducibility would be of little help, because we would be comparing entirely different models.

np.random.seed(42)
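If you also want the weight initialization itself to be repeatable, TensorFlow 2.x exposes its own seed. This extra line is an addition to the original setup:

import tensorflow as tf
tf.random.set_seed(42)  # seeds TensorFlow's random ops (weight init, shuffling, dropout)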

Use RandomizedSearchCV

Once we have created the KerasClassifier, we create the RandomizedSearchCV object and use its .fit() method to start searching for the best model. RandomizedSearchCV lets us explicitly control the number of combinations to try through the n_iter parameter.

rscv = RandomizedSearchCV(model, param_distributions=params,
                          cv=3, n_iter=10)
rscv_results = rscv.fit(X_trn, y_trn)

Here are the results of our search:

print('Best score is: {} using {}'.format(rscv_results.best_score_,
                                          rscv_results.best_params_))
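Beyond the single best result, the fitted search object records every tested combination in its cv_results_ attribute. A short loop like this (an optional addition) prints the mean cross-validation score for each sampled combination:

for mean, combo in zip(rscv_results.cv_results_['mean_test_score'],
                       rscv_results.cv_results_['params']):
    print('{:.4f} with {}'.format(mean, combo))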

Conclusion

Hyperparameter tuning can be used to fine-tune a chosen model or to search for the model best suited to the task. It can also help evaluate how quickly a model learns. The approach above can be expanded to a more exhaustive search using the GridSearchCV class from the Scikit-Learn library (sketched below), or by adding parameters for the structure of the model, such as the number of layers. Callbacks can also be added to prevent overfitting in the tested models.
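As a sketch of that exhaustive variant: GridSearchCV takes the same estimator and parameter dictionary, but trains every combination. With our lists that is 3 activations x 2 optimizers x 3 batch sizes = 18 combinations, each trained once per CV fold:

from sklearn.model_selection import GridSearchCV

gscv = GridSearchCV(model, param_grid=params, cv=3)
gscv_results = gscv.fit(X_trn, y_trn)
print(gscv_results.best_score_, gscv_results.best_params_)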

You can find more information on Keras here: https://keras.io/
You can find more information on Scikit-Learn here: https://scikit-learn.org/
