Machine Learning & Deep Learning Guide

Mohammad Hatoum · Published in Analytics Vidhya · 8 min read · Nov 26, 2019

Welcome to part 4 of the Machine Learning & Deep Learning Guide where we learn and practice machine learning and deep learning without being overwhelmed by the concepts and mathematical rules.

Part 1: Key terms, Definitions and starting off with Supervised Learning (Linear Regression).

Part 2: Supervised Learning : Regression (SGD) and Classification (SVM, Naïve Bayes, KNN and Decision Tree).

Part 3: Unsupervised Learning (KMeans, PCA), Underfitting vs Overfitting and Cross-Validation.

Part 4: Deep Learning: Definitions, Layers, Metrics and Loss, Optimizer and Regularization

Learning Objective

In part 4 of the tutorial, we will be discussing Deep Learning. First, we will define deep learning and neural networks. Then we will talk about two main neural network architectures. After that, we will list the main loss and metric functions, along with optimization functions. Finally, we will write a deep learning example.

Definitions

Deep Learning

In part 1 of this guide, we defined Deep Learning as a subset of Machine Learning that works similarly to our brain, using a mesh network technically termed a Deep Neural Network.
Just like our brain identifies patterns to classify things and learns from mistakes, Deep Learning does too: it compares unknown data with known data and classifies it accordingly.

In part 3 we saw an example of Unsupervised learning where we did some feature extraction. Deep learning, with the ability to learn multiple layers of representation, is one of the few methods that helps us with automatic feature extraction. The lower layers can be assumed to be performing automatic feature extraction, requiring little or no guidance from the programmer.

Illustration of Neural Network

Neural Networks
Neural networks are a class of models that are built with layers. Commonly used types of neural networks include convolutional and recurrent neural networks.

Neural Network
  • Input Layer: represents the data we feed into our neural network.
  • Hidden Layers: the core of our neural network. If we have one (or a few) hidden layers, the network is called a shallow network; otherwise it is called a Deep Neural Network (DNN).
  • Output Layer: generates the outputs of our neural network. It can output binary values (binary classification), probabilities of belonging to a class (multi-class classification), or continuous values (regression).

A single neuron might look as follows:

Single Neuron

In each layer, the input data is weighted and passed through a function in the neuron called the activation function. In the simplest case, the neuron sums all of the weighted values and compares the sum with a certain threshold: if the signal fires, the result is 1; if nothing fires, the result is 0. That output is then weighted and passed along to the next neuron, where the same sort of function is run. Once we reach the output layer, we generate our output, compare it with the desired output, and calculate the loss/cost.

The process of propagating the data from one neuron to the next one (in order) is called feed-forward.
Based on the loss, we go backward and start updating the weights and biases to minimize the loss/cost. This process is called back-propagation.
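
To make the feed-forward step concrete, here is a minimal sketch of a single neuron in plain NumPy. The sigmoid activation and the specific weights are arbitrary illustrative choices, not part of this tutorial's example:

import numpy as np

def sigmoid(z):
    # Squashes the weighted sum into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b):
    # Weighted sum of the inputs plus a bias, passed through the activation
    return sigmoid(np.dot(w, x) + b)

# Illustrative values: 3 inputs, arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.4, 0.7, -0.2])
b = 0.1
print(neuron_forward(x, w, b))  # a single activation in (0, 1)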

Learning Process
Deep learning can be used in supervised, unsupervised, or reinforcement learning. Source: Fridman et al. | MIT Deep Learning

Now that we have seen the structure of a neural network, let us view some examples of neural network architectures.

Convolutional Neural Network (CNN)

The idea behind convolutional neural networks is that of moving filters (convolutions) that pass over the image. We then apply down-sampling (pooling), where we select a region and take the average or maximum of the values in that region. The last (output) layer is a fully connected layer that generates the outputs.

Convolutional Neural Network (CNN)

CNNs are mostly used for machine vision projects, but they can be used in other applications as well.
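
As a rough illustration of the pooling step described above, here is a hand-rolled 2×2 max-pooling pass over a small array in plain NumPy. A real CNN would use a library layer such as Keras's MaxPooling2D, as in the full example at the end of this post:

import numpy as np

def max_pool_2x2(img):
    # Downsample by taking the maximum of each non-overlapping 2x2 region
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

img = np.array([[1, 3, 2, 4],
                [5, 6, 1, 2],
                [7, 2, 9, 0],
                [1, 8, 3, 4]])
print(max_pool_2x2(img))
# [[6 4]
#  [8 9]]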

Recurrent Neural Networks

Recurrent nets are a type of artificial neural network designed to recognize patterns in sequences of data, such as text, genomes, handwriting, the spoken word, or numerical time series data emanating from sensors, stock markets, and government agencies. These algorithms take time and sequence into account; they have a temporal dimension.

Long-term dependency problem; each node represents an RNN cell. Source: GitHub

Types of layers and functions:

Let us consider the important and most used layers:

  • Input Layer − It takes the raw data as it is.
  • Convolutional Layer − This layer is the core building block of a Convolutional Neural Network (CNN) and does most of the computation. It computes the convolutions between the neurons and the various patches in the input.
  • Pooling Layer − Pooling helps us keep only the important parts as we progress through the network. The pooling layer operates independently on every depth slice of the input and resizes it spatially, typically using the MAX function.
  • Fully Connected Layer (Dense) − This layer computes the output scores in the last layer. The resulting output is of size 1×1×L, where L is the number of classes in the training dataset.
  • LSTM − Long Short-Term Memory networks, usually just called “LSTMs”, are a special kind of RNN capable of learning long-term dependencies (a minimal usage sketch follows this list).
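
The CNN layers above are exercised in the full example at the end of this post. For LSTMs, a minimal Keras sketch might look like the following; the layer sizes and the assumed input of 100 timesteps with 8 features each are illustrative, not taken from this tutorial:

from keras.models import Sequential
from keras.layers import LSTM, Dense

model = Sequential()
# 32 memory cells reading sequences of 100 timesteps with 8 features each
model.add(LSTM(32, input_shape=(100, 8)))
# A single output, e.g. for a binary prediction on the whole sequence
model.add(Dense(1, activation='sigmoid'))
model.summary()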

As mentioned earlier, an activation function is applied between hidden layers to the output of the previous layer. It adds non-linearity to the network so that it can approximate a wide range of functions. Here are the most common ones (implemented below):

  a. Sigmoid
  b. Tanh
  c. Rectified Linear Unit (ReLU)
  d. Leaky ReLU
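
All four can be written in a few lines of NumPy; the 0.01 slope for Leaky ReLU is a common but arbitrary choice:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))       # output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # output in (-1, 1)

def relu(z):
    return np.maximum(0.0, z)              # zero for negative inputs

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for negative inputs

z = np.linspace(-2, 2, 5)
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, f(z))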

Metrics and Loss:

As in Machine Learning, in Deep Learning we use loss functions to evaluate our model’s error and metrics to evaluate its performance.

Here are the main loss functions used in Deep Learning (two of them are sketched in NumPy after the metrics list):

For Classification:

  • Binary Classification: Binary Cross-Entropy
  • Multi-Class Classification: Categorical Cross-Entropy and Sparse Categorical Cross-Entropy

For Regression:

  • Mean Squared Error (MSE)
  • Mean Absolute Error (MAE)

Here are the main Metrics in Deep learning:

  • Accuracy
  • Mean Absolute Error (MAE)
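
As a quick sanity check of what two of these functions compute, here is a minimal NumPy sketch of binary cross-entropy and mean squared error. The clipping constant is an illustrative guard against log(0):

import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    # Penalizes confident wrong predictions heavily; eps guards against log(0)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

def mean_squared_error(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([1, 0, 1, 1])
y_pred = np.array([0.9, 0.2, 0.7, 0.6])
print(binary_cross_entropy(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))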

Optimizer:

It is the function used during back-propagation to update the weights. We mainly use the following optimizers (a minimal Keras sketch follows the list).

  • Adam: Adaptive Moment Estimation
  • RMSprop: Root Mean Square Propagation
  • SGD: Stochastic gradient descent optimizer
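
In Keras, each of these is a one-liner that can be passed to model.compile; the default constructors below are a minimal sketch, and any hyperparameters (learning rate, momentum, etc.) can be tuned:

from keras import optimizers

# Any of these can be passed to model.compile(optimizer=...)
adam = optimizers.Adam()        # adaptive moment estimation
rmsprop = optimizers.RMSprop()  # root-mean-square propagation
sgd = optimizers.SGD()          # plain stochastic gradient descent

# e.g. model.compile(loss='categorical_crossentropy', optimizer=adam,
#                    metrics=['accuracy'])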

Regularization:

Regularization helps the network generalize to data it hasn’t seen. It is used to solve the overfitting problem (a Keras sketch follows the list):

  • Dropout: Randomly remove some nodes in the network (along with incoming and outgoing edges)
  • Early Stopping: Stop training (or at least save a checkpoint) when performance on the validation set decreases
  • Available penalties:
    L1 Penalty: penalizes the absolute values of the weights.
    L2 Penalty: penalizes the squared weights.
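
Here is a sketch of how these three techniques appear in Keras; the layer sizes, penalty strengths, and patience value are arbitrary illustrations, not tuned values:

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l1, l2
from keras.callbacks import EarlyStopping

model = Sequential()
# L2 penalty on this layer's weights (0.01 is an arbitrary strength)
model.add(Dense(64, activation='relu', input_shape=(20,),
                kernel_regularizer=l2(0.01)))
model.add(Dropout(0.5))  # randomly drop half the activations during training
model.add(Dense(1, activation='sigmoid', kernel_regularizer=l1(0.01)))

# Stop training once validation loss stops improving for 3 epochs
early_stop = EarlyStopping(monitor='val_loss', patience=3)
# model.fit(..., validation_data=(x_val, y_val), callbacks=[early_stop])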

Enough definitions… Start Coding

We will implement a Convolutional Neural Network (CNN) model in Keras.

You can download the complete Kaggle notebook from here

1. Data Definition: We will use the MNIST dataset and define a few parameters as follows:

from __future__ import print_function
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

# Set a few parameters to be used
batch_size = 128
num_classes = 10
epochs = 12

# Input image dimensions
img_rows, img_cols = 28, 28

# Load the MNIST dataset, already split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Reshape to the channel layout the backend expects
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)

2. Perform Preprocessing:

a. Normalize the training and testing inputs

x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

Result :

x_train shape: (60000, 28, 28, 1)
60000 train samples
10000 test samples

b. Convert class vectors to binary class matrices (one-hot encoding)

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

3. Build Model

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

4. Plot the model

from keras.utils import plot_model
plot_model(model)
CNN Model

5. Configure the model for training

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adam(),
              metrics=['accuracy'])

6. Train the model for a fixed number of epochs (iterations over the dataset)

history = model.fit(x_train, y_train,
                    batch_size=batch_size,
                    epochs=epochs,
                    verbose=1,
                    validation_data=(x_test, y_test))

7. Evaluate the model by getting the loss and metric values in test mode

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Result :

Test loss: 0.02984398379799968
Test accuracy: 0.9915000200271606

Impressive, we got an accuracy of 99.15%!

8. Plot an image and the label provided by our model

import matplotlib.pyplot as plt

image_index = 8855
plt.imshow(x_test[image_index].reshape(28, 28), cmap='Greys')
pred = model.predict(x_test[image_index].reshape(1, img_rows, img_cols, 1))
print(f"Label predicted by model: {pred.argmax()}")

Result :

Label predicted by model: 5

Prediction Result

9. Plot Accuracy and loss values for training and testing

# Plot training & validation accuracy values
plt.plot(history.history['accuracy'])
plt.plot(history.history['val_accuracy'])
plt.title('Model accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
# Plot training & validation loss values
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper left')
plt.show()
Model Accuracy
Model Loss

Recap

We have reached the end of the 4th and last part of our Machine Learning and Deep Learning guide series. In this part, we discussed Deep Learning: definitions, layers, metrics and loss, optimizers, and regularization. Then we walked through a full example of a Convolutional Neural Network (CNN).

What to do next?

The main goal of this whole guide is to help programmers and software engineers get started with ML/DL. This is just the entry point; if you want to go deeper into the domain, I suggest you check the references mentioned in each tutorial, especially Scikit-Learn and Keras. And the most important thing is to practice on your own.

Thanks for reading!
