Deep Learning for dummies

roby sidoti
7 min read · Apr 26, 2020


An easy example that shows how to create a DNN model for multiplication tables using Python and TensorFlow 2.0

Deep Learning is becoming one of the most discussed topics on IT blogs; people often want to know and learn its most important concepts. In this post we are going to see how it can be applied and how to use a dominant framework such as TensorFlow 2. In order to introduce those concepts at an entry level of difficulty, we will set a simple objective: define a model, using a Deep Neural Network, that is able to multiply two numbers; in other words, this model will learn multiplication tables (the ones kids learn at primary school). You can find the Jupyter notebook here.

Let’s read the dataset

I’ve created a CSV file with 37 rows; each row has three numbers representing a multiplication. For instance 4,7,28 means 4*7=28, 5,7,35 means 5*7=35, and so on.

(figure: a preview of the dataset in CSV form)
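
The loading step itself is a one-liner with pandas. Here is a minimal sketch; the file name multiplications.csv, the absence of a header row, and the product column name Y are assumptions (A and B are the feature names used later in the post):

import pandas as pd

# File name and the 'Y' column name are assumptions; A and B are
# the two feature columns (the factors of the multiplication).
# names= assumes the CSV has no header row.
df = pd.read_csv('multiplications.csv', names=['A', 'B', 'Y'])
print(df.head())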

Train and test datasets

Now that we have read the data, we have to split it into train and test sets: the training data will be used to train the DNN and let the model learn, while the test data will be used to evaluate it.
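
A sketch of the split using scikit-learn’s train_test_split (the random_state value is an assumption; test_size=0.2 on 37 rows yields the 29/8 split mentioned below):

from sklearn.model_selection import train_test_split

X = df[['A', 'B']].values   # features: the two factors
y = df['Y'].values          # label: their product ('Y' name is an assumption)

# test_size=0.2 on 37 rows gives 29 training rows and 8 test rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)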


Now it’s time to scale this data; remember: TensorFlow likes scaled data. Scaling will transform our values into the range between 0 and 1. In order to do that we will use MinMaxScaler.
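
A sketch of the scaling step, ending with the two shape checks discussed below:

from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler()
# Fit the scaler on the training data only, then apply it to both sets
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

print(X_train.shape)   # (29, 2): 29 training rows, 2 features (A, B)
print(X_test.shape)    # (8, 2):  8 test rows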

This code shows how simple it is to scale training and test data; the last two print statements give useful information: of the starting 37 rows, we will work with 29 rows for training purposes and with just 8 rows to test the model; another useful (but already known) number is that we are going to work with two features (A, B).

Let’s build the model

It’s time to build the model with a Deep Neural Network (DNN). The related code, split into four phases, is shown below.

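A sketch of those four phases; the layer sizes and settings follow the explanation below, while the explicit input_shape argument is an assumption:

# Phase 1: import the required TensorFlow pieces
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

# Phase 2: the EarlyStopping callback (explained later)
early_stop = EarlyStopping(monitor='val_loss', mode='min',
                           verbose=1, patience=100)

# Phase 3: build the network layer by layer
model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(2,)))  # input layer: 2 nodes (A, B)
model.add(Dense(30, activation='relu'))    # hidden layer 1
model.add(Dense(500, activation='relu'))   # hidden layer 2
model.add(Dense(30, activation='relu'))    # hidden layer 3
model.add(Dense(1))                        # output layer: a single number

# Phase 4: compile with the adam optimizer and MSE as loss
model.compile(optimizer='adam', loss='mse')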

The first phase is just where we import the required TensorFlow pieces; for the moment let’s skip what an EarlyStopping object is (Phase 2) and move on to Phase 3. Here a Sequential is instantiated; a Sequential is an “easy to use” structure for defining a Neural Network, where we can simply add a layer with the “add” method. As you can see, we used an input layer with two nodes and an activation function of type “relu” (Rectified Linear Unit). After that, we added three hidden layers of 30, 500 and 30 nodes (with the same activation function), and at the end an output layer of just one node. Some explanation about those choices:

  1. We’ve used only two nodes in the input layer because the input shape is 2; we know we will use the A and B columns as input.
  2. We’ve used only one node in the output layer because the model’s output for each prediction is a single number.
  3. We’ve used the relu activation function because in 99% of cases it is the best activation function when facing a regression problem.
  4. We’ve used three hidden layers with 30, 500 and 30 nodes, following a simple empirical approach.

In Phase 4 we just use the compile method to finalize the model build; we’ve used the “adam” optimizer because in many cases it is the best choice, and “mse” (Mean Squared Error) because, for regression models, it is a good loss function to use in the training step.

It’s time to train the model!

We’ve already seen that the TensorFlow 2 Keras APIs are very straightforward; the fit method is enough to train a model.
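
A sketch of that call; the batch_size value is an assumption, chosen to be consistent with the single step per epoch (“1/1”) visible in the training log below:

model.fit(x=X_train, y=y_train,
          validation_data=(X_test, y_test),
          batch_size=32,            # >= 29 training rows, so one step per epoch
          epochs=5000,
          callbacks=[early_stop])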

As you can see, the x and y input parameters define the training features and labels; validation_data is a tuple parameter that defines the test features and labels. “batch_size” is the size of each input batch; in other words, it is the number of samples fed to the model at each step. “epochs” is the maximum number of epochs the model will use for its training phase. Now it’s time to introduce the usage of “early_stop”:

early_stop = EarlyStopping(monitor='val_loss', mode='min', verbose=1, patience=100)

It is used during the training phase to end training early if the val_loss trend doesn’t improve (after 100 consecutive epochs without improvement, as set by patience). It is very useful because very often that trend drops quickly: thus it is a way to stop training early when this phase doesn’t need all the available epochs.

Epoch 1/5000 1/1 [==============================] - 0s 171ms/step - loss: 1133.5464 - val_loss: 742.7161 
Epoch 2/5000 1/1 [==============================] - 0s 36ms/step - loss: 1129.7095 - val_loss: 740.7455
Epoch 3/5000 1/1 [==============================] - 0s 35ms/step - loss: 1127.0126 - val_loss: 738.8472
Epoch 4/5000 1/1 [==============================] - 0s 36ms/step - loss: 1124.4393 - val_loss: 736.9773
Epoch 5/5000 1/1 [==============================] - 0s 39ms/step - loss: 1121.9463 - val_loss: 734.9907
. . .
. . .
Epoch 1323/5000 1/1 [==============================] - 0s 36ms/step - loss: 0.0750 - val_loss: 0.6313
Epoch 1324/5000 1/1 [==============================] - 0s 37ms/step - loss: 0.0748 - val_loss: 0.6320
Epoch 1325/5000 1/1 [==============================] - 0s 44ms/step - loss: 0.0746 - val_loss: 0.6328
Epoch 1326/5000 1/1 [==============================] - 0s 36ms/step - loss: 0.0744 - val_loss: 0.6321
Epoch 01326: early stopping

This is what you will see once the train command has been run; details about loss (the value of the loss function on training data) and val_loss (the value of the loss function on validation data) are shown for each epoch. It is possible to plot these values with this code:
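
A minimal sketch, reading the per-epoch metrics from the History object that Keras attaches to the model after fit:

import pandas as pd
import matplotlib.pyplot as plt

# model.history.history holds the per-epoch loss and val_loss values
losses = pd.DataFrame(model.history.history)
losses.plot()
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()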

Now it’s clear that the training did not run into overfitting and was stopped in time.

Persisting the model only if it is a champion!

Due to the random initialization of the starting weights, model training doesn’t always reach the same level of optimization; an easy strategy is to repeat this last step several times until the loss function reaches an acceptable value. That’s why we should be able to compare the level reached by our model and, if it is the best ever reached, save it. Ok, let’s see how accurate the current model is.
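
A sketch of that check, using scikit-learn to compute the error on the test set:

from sklearn.metrics import mean_absolute_error

# Predict on the test set and measure the mean absolute error
predictions = model.predict(X_test)
mae = mean_absolute_error(y_test, predictions)
print(mae)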

In this code we’ve produced a prediction with the current model and calculated its mean absolute error (MAE). This value is a statistic that gives an idea of how much error there is between a prediction and the real value. Once we know how good the current model is, we should compare it with the best one we’ve already saved to a file.

In order to do that, we need to define a Python function that receives the current model and the test data, executes predictions with both the current model and the “champ” model, and at the end compares the MAE values, returning the best model. In this function we have to save and load a model from a file; the function shows that this is very easy with the Keras model APIs.
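
A sketch of such a function; the champ_model.h5 file name and the function name are assumptions:

from tensorflow.keras.models import load_model
from sklearn.metrics import mean_absolute_error

def get_champ_model(current_model, X_test, y_test,
                    champ_path='champ_model.h5'):
    # MAE of the model we just trained
    current_mae = mean_absolute_error(y_test, current_model.predict(X_test))
    try:
        # MAE of the best model saved so far, if any
        champ_model = load_model(champ_path)
        champ_mae = mean_absolute_error(y_test, champ_model.predict(X_test))
    except (IOError, OSError):
        champ_mae = None   # no champion saved yet
    if champ_mae is None or current_mae < champ_mae:
        current_model.save(champ_path)   # new champion: persist it
        return current_model
    return champ_model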

Test best model

Now that we have the best model, it’s time to test it and see how accurate it is. First of all, we need to produce a prediction on the test data and evaluate with a diagram how good it is.
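
A sketch of the prediction and the diagram, reusing the get_champ_model function defined above:

import matplotlib.pyplot as plt

best_model = get_champ_model(model, X_test, y_test)
test_predictions = best_model.predict(X_test)

plt.scatter(y_test, test_predictions)   # blue dots: predicted vs. real values
plt.plot(y_test, y_test, 'r')           # red line: the ideal prediction y = x
plt.xlabel('real value')
plt.ylabel('predicted value')
plt.show()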

The result is shown below.

The red line is the ideal prediction; the blue dots show, from a graphical point of view, how good our model is. They seem to be good “enough”, because they are always very close to the ideal prediction. But remember: this is a set of predictions on test data, and the test data were already seen by the model during the “train” phase (as validation data). But what happens if the model executes predictions on “never seen” data? We can evaluate this scenario in a very easy way: we just need to analyze our train dataset and figure out a set of multiplications that is not there. Then we need an evaluation function, sketched below, and we can execute these predictions using it.
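
A sketch of that evaluation function, assuming the fitted scaler is still in scope; the pairs in the loop are the ones whose predictions follow:

import numpy as np

def evaluate(model, a, b):
    # Scale the pair with the same scaler fitted on the training data
    sample = scaler.transform(np.array([[a, b]]))
    prediction = model.predict(sample)[0][0]
    print('%d * %d = %.1f' % (a, b, round(prediction)))

# Multiplications that do not appear in the training dataset
for a, b in [(8, 2), (7, 2), (5, 3), (8, 7),
             (8, 5), (7, 4), (10, 4)]:
    evaluate(best_model, a, b)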

8 * 2 = 15.0 
7 * 2 = 13.0
5 * 3 = 15.0
8 * 7 = 56.0
8 * 5 = 40.0
7 * 4 = 28.0
10 * 4 = 40.0

The output results are not so bad! Obviously it’s not perfect and you need some iterations to reach this level of accuracy, but I’m happy with it. Remember: the model doesn’t know what a multiplication is, and it has never seen these pairs of data.

End of work

Ok, we are at the end of our first experience with TensorFlow 2 and its Keras API. We’ve seen that they are very easy to use and that they can be integrated with other powerful APIs provided by the Python world.

Ciao, see you ASAP!
