How to build your first Deep Learning model with optimized Hyperparameters

Shaheel Khan
4 min read · Sep 21, 2020


Are you confused about how many layers you should use when building your neural network? Or about which activation function to choose for your hidden layers?

Then you are at the right place. Building an optimized deep learning model is more of an art than a science. In this post I'll give you an idea of how to perform hyperparameter optimization in deep learning.

In this post we'll try to optimize the parameters of an ANN using a wonderful library called Keras Tuner. You can read more about Keras Tuner in its documentation.

Some of the parameters we are going to optimize are:

  1. How many layers to use in our model?
  2. How many neurons to use in each of the hidden layers?
  3. What Activation Function to use?

The full code can be found on GitHub.

So let’s get started:

Make sure you have Python 3.6 or later and TensorFlow 2.0

Also install Keras Tuner:

!pip install keras-tuner

For this post I’ll be using the MNIST handwritten digits data.

We can load the train and test sets using load_data() in Keras:

from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()

Each image in the dataset is 28x28 pixels, in grayscale. A Dense network cannot take a 2-D image as input, so we need to flatten each image into a vector. This can be done by:

x_train = x_train.reshape(len(x_train), 784)
x_test = x_test.reshape(len(x_test), 784)

784 is simply 28 * 28 * 1, i.e. the flattened image.

Let's see what an image looks like.

[Image: the digit at x_train[456:457]]
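If you want to display a digit yourself, here is a minimal sketch, assuming matplotlib is installed (index 456 is just the example above):

import matplotlib.pyplot as plt

# The vectors were flattened, so reshape back to 28x28 for display
plt.imshow(x_train[456].reshape(28, 28), cmap='gray')
plt.title('Label: ' + str(y_train[456]))
plt.show()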

Hyperparameter Tuning

Our dataset is now ready for training, but before building the final model, let's do hyperparameter tuning. We'll define a function that builds a model from a set of candidate hyperparameters:

import tensorflow as tf

def hyper_model(hp):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Flatten(input_shape=(784,)))  # flattened 784-pixel input
    for i in range(hp.Int('num_layers', 1, 3)):  # tune the number of hidden layers
        model.add(tf.keras.layers.Dense(
            units=hp.Int('num_units_' + str(i), min_value=30, max_value=40, step=2),
            activation=hp.Choice('activation_' + str(i), ['sigmoid', 'relu', 'tanh'])))
    model.add(tf.keras.layers.Dense(10, activation='softmax'))
    # sparse_categorical_crossentropy matches the integer labels from load_data()
    model.compile(optimizer='rmsprop', loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model

To create our ANN layer by layer we use the Sequential API. A for loop creates a tunable number of layers; for each layer i, we tune the number of neurons to use (denoted by num_units_) and the activation function to use (denoted by activation_).

In summary, these are the tuning parameters:

Number of layers: [1, 2, 3]

Number of neurons in each layer: [30, 32, 34, 36, 38, 40]

Activation function in each layer: ['sigmoid', 'relu', 'tanh']

Now that we've defined which parameters to tune, how is the tuning actually done? Keras Tuner has the ammunition for that: the RandomSearch tuner. RandomSearch performs hyperparameter tuning by randomly sampling hyperparameter combinations and testing them out. We can then use the best combination to build our main model.

from kerastuner.tuners import RandomSearch  # in recent versions: keras_tuner

tuner = RandomSearch(hyper_model,
                     objective='val_accuracy',
                     max_trials=10,
                     directory='digits3',
                     project_name='digits_hp3')

Here the objective argument names the metric to be optimized. For built-in metrics such as val_accuracy, RandomSearch infers automatically whether the metric should be minimized or maximized. The max_trials argument sets how many hyperparameter combinations will be tested.
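If you ever tune on a custom metric whose direction cannot be inferred, Keras Tuner also accepts an explicit Objective. A minimal sketch, where the metric name custom_score is hypothetical:

from kerastuner import Objective  # keras_tuner.Objective in recent versions

tuner = RandomSearch(hyper_model,
                     objective=Objective('custom_score', direction='max'),
                     max_trials=10,
                     directory='digits3',
                     project_name='digits_hp3')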

Now that our tuner is set up, we can see a summary of the search space using:

tuner.search_space_summary()

Let's start the search for the best hyperparameters:

tuner.search(x_train,y_train,epochs=3,validation_split = 0.1)

Once the search has completed, we can retrieve the best model with:

best_model = tuner.get_best_models(num_models=1)[0]
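If you want to inspect the winning hyperparameter values directly rather than the built model, Keras Tuner exposes those too; a short sketch:

best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)  # a dict like {'num_layers': 2, 'num_units_0': 38, ...}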

These are the hyperparameters that we can use in our model:

Number of Layers to use: 2

Number of neurons in 1st layer: 38

Activation function in 1st layer: sigmoid

Number of neurons in 2nd layer: 38

Activation function in 2nd layer: tanh

Let's build our model using these parameters:

#Initializing the model
model = tf.keras.Sequential()
#Adding the first hidden layer along with the input shape
model.add(tf.keras.layers.Dense(units = 38, activation = 'sigmoid', input_shape = (784,)))
#Adding the second hidden layer
model.add(tf.keras.layers.Dense(units = 38, activation = 'tanh'))
#Adding the output layer
model.add(tf.keras.layers.Dense(units = 10, activation = 'softmax'))
#Compile the model (sparse loss works with the integer labels)
model.compile(optimizer = 'rmsprop', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])
#Train the model
model.fit(x_train, y_train, batch_size = 128, epochs = 10, verbose = 1)
#Evaluate the model on the test set
score = model.evaluate(x_test, y_test, verbose = 1)
print('Test Loss:', score[0])
print('Test Accuracy:', score[1])

Test Accuracy is 96.35%

Now let's take a single input from the test set and check the model's prediction:

prediction = model.predict(x_test[104:105])
prediction = prediction[0]
#print('Predicted Probability\n',prediction)
print('\nOutput\n',(prediction > 0.5)*1)

Output

[0 0 0 0 0 0 0 0 0 1]

It was predicted as number 9.
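Note that thresholding at 0.5 can fail when no class probability crosses 0.5. A more robust way to read off the prediction is argmax; a small sketch, assuming NumPy:

import numpy as np

# Index of the highest predicted probability is the predicted digit
print('Predicted digit:', np.argmax(prediction))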

Let's compare it with the actual image:

plt.imshow(x_test[104:105].reshape(28,28), cmap='gray')
plt.show()

Now let's use a digit from outside the dataset to check how well our model generalizes. For this I drew a number in MS Paint and used it as input.
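The loading and preprocessing of the drawing isn't shown here; below is one plausible sketch, assuming Pillow is installed and digit.png is a hypothetical file name:

from PIL import Image
import numpy as np

# Load the drawing, convert to grayscale, and resize to 28x28
img = Image.open('digit.png').convert('L').resize((28, 28))
# MNIST digits are white-on-black, so invert a black-on-white drawing
image_gray_reshape = 255 - np.array(img)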

Let's check whether our model can predict it correctly:

prediction = model.predict(image_gray_reshape.reshape(1,784))
prediction = prediction[0]
#print('Predicted Probability\n',prediction)
print('\nPredicted Digit\n',(prediction > 0.5)*1)

Predicted Digit

[0 0 0 0 0 0 1 0 0 0]

Awesome, we got it right! Our model is accurate enough to predict a digit from outside the dataset.

Saving the Model for Later Use

# Save the architecture as JSON and the weights separately
model_json = model.to_json()
with open('model.json', 'w') as json_file:
    json_file.write(model_json)
model.save_weights('model.h5')
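To load the model back later, a minimal sketch using tf.keras's model_from_json:

from tensorflow.keras.models import model_from_json

# Rebuild the architecture from JSON, then restore the weights
with open('model.json') as json_file:
    loaded_model = model_from_json(json_file.read())
loaded_model.load_weights('model.h5')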

Conclusion

We have seen how to perform hyperparameter tuning for an ANN using Keras Tuner. For demonstration, I've shown how to choose the optimal number of layers, neurons, and activation functions. Using the same technique we can tune more hyperparameters, such as the optimizer, and apply it to other deep learning models like CNNs and RNNs.

Happy Learning
