Deep Learning — Introduction

Dejan Jovanovic · aihive · Feb 22, 2019

Part I

Introduction

In the last couple of years, AI has been a subject of great interest across the industry. Companies large and small have been developing solutions based on it.

Work on artificial neural networks, commonly known as "neural networks," has been motivated from its inception by the recognition that the human brain computes in an entirely different way from the conventional digital computer. The brain is a highly complex, nonlinear, and parallel information processing system, with the capability to organize its structural constituents, known as neurons, so as to perform certain computations many times faster than the fastest digital computer in existence today.

Deep Learning is just a subset of Machine Learning, and it represents the big comeback of Neural Networks. One common definition of Deep Learning is "neural networks with more than two layers". Neural networks themselves are nothing new; they have been studied since the mid-20th century and enjoyed an earlier wave of popularity in the 1980s. The "deep" in Deep Learning does not refer to any kind of deeper, more meaningful understanding achieved by the approach; it refers to the idea of successive layers of representations. The number of layers that contribute to a model is called the depth of the model.

There are a few well-known and widely used neural network models that I will be describing in this series. The examples I am going to present are written in Python using the Keras framework with the TensorFlow library, but before we go there, let's clarify what neural networks are. The following picture shows the most granular part of a neural network: a neuron.

Neuron

A neuron is the lowest-level information processing unit and is fundamental to the operation of a neural network. In mathematical terms, we may describe neuron $k$ by the following pair of equations:

$$u_k = \sum_{j=1}^{m} w_{kj}\, x_j, \qquad y_k = \varphi(u_k + b_k)$$

where $x_1, \ldots, x_m$ are the input signals; $w_{k1}, \ldots, w_{km}$ are the synaptic weights of neuron $k$; $b_k$ is the bias; $\varphi(\cdot)$ is the activation function; and $y_k$ is the output signal of the neuron.
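To make these equations concrete, here is a minimal sketch of a single neuron written in plain NumPy. The inputs, weights, and bias are made-up illustrative values, not taken from any real model:

import numpy as np

def neuron(x, w, b, phi):
    # u_k = sum_j w_kj * x_j, then y_k = phi(u_k + b_k)
    u = np.dot(w, x)
    return phi(u + b)

x = np.array([0.5, -1.0, 2.0])   # input signals x_1..x_m (illustrative)
w = np.array([0.4, 0.3, -0.2])   # synaptic weights w_k1..w_km (illustrative)
b = 0.1                          # bias b_k (illustrative)

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
print(neuron(x, w, b, sigmoid))  # output signal y_k

Connecting many such neurons into successive layers gives a network. Here is an example of what a network could look like: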

Example of Neural Network

And here is how this network can be implemented using the Keras library.

from keras.models import Sequential
from keras.layers import Dense

# create the model: 4 inputs -> two hidden layers -> 1 output
model = Sequential()
model.add(Dense(4, input_dim=4, activation='relu'))  # hidden layer with 4 neurons, fed by 4 inputs
model.add(Dense(2, activation='relu'))               # hidden layer with 2 neurons
model.add(Dense(1, activation='sigmoid'))            # output layer with a single sigmoid neuron
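If you want to double-check the architecture you have just defined, calling model.summary() will print each layer together with its output shape and parameter count.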

Neural Networks Architectures

There are a number of different neural network models that scientists have been researching; some of the most popular, which are also foundational for more advanced models, are:

  1. Multilayer Perceptrons (MLP). The Multilayer Perceptron network, also known as a feedforward neural network, is commonly used for simple logistic and linear regression problems.
  2. Recurrent Neural Networks (RNN). Recurrent Neural Networks are a good fit for cases where we have sequential data input. By design, the network can discover dependencies in the history of the data. This model is useful for prediction cases.
  3. Convolutional Neural Networks (CNN). Convolutional Neural Networks are good for multi-dimensional data such as images or video. CNN models excel at data classification, segmentation, generation, and other tasks.

Although these three models are presented separately above, you will find that they are often combined in order to take advantage of the strengths of each model.
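To give a rough feel for how these three architectures differ in code, here is a sketch of a minimal Keras model of each kind. The layer sizes, sequence length, and image dimensions are arbitrary placeholder values chosen only for illustration:

from keras.models import Sequential
from keras.layers import Dense, SimpleRNN, Conv2D, MaxPooling2D, Flatten

# MLP: dense (fully connected) layers over a flat feature vector
mlp = Sequential([
    Dense(8, input_dim=4, activation='relu'),
    Dense(1, activation='sigmoid')
])

# RNN: a recurrent layer over a sequence of 10 timesteps with 4 features each
rnn = Sequential([
    SimpleRNN(8, input_shape=(10, 4)),
    Dense(1, activation='sigmoid')
])

# CNN: convolution and pooling over 28x28 single-channel images
cnn = Sequential([
    Conv2D(8, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(1, activation='sigmoid')
])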

Activation Function — What is it and when to use it?

The activation function, also called the transfer function, establishes the bounds for the output of a neuron (please refer to the figure above: Neuron). A neural network can use many different activation functions. The most common ones are:

a. Linear Activation Function

The linear activation function is the most basic activation function. It is usually used on the output layer of regression neural networks.

b. Step Activation Function

The step activation function is another simple activation function, and it too is usually used on output layers.

c. Sigmoid Activation Function

The sigmoid activation function, also called the logistic activation function, is a very common choice for feedforward neural networks (Multilayer Perceptrons) that need to output only positive numbers.

d. Hyperbolic Tangent Activation Function

In cases where the output values are expected to be in the range between -1 and 1, the hyperbolic tangent activation function should be used. The hyperbolic tangent activation function is very similar to the sigmoid activation function.
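The similarity is in fact exact up to rescaling: the two functions are related by $\tanh(x) = 2\sigma(2x) - 1$, where $\sigma$ denotes the sigmoid function, so the hyperbolic tangent is simply a sigmoid stretched and shifted to cover the range between -1 and 1.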

e. Rectified Linear Unit (ReLU) Activation Function

ReLU has become the standard activation function for the hidden layers of a deep learning network. In addition to ReLU, deep learning neural networks will typically use a linear or softmax activation function on the output layer.

f. Softmax Activation Function

Along with the linear activation function, the softmax activation function is usually found in the output layer of a neural network. The softmax activation function is used in classification neural networks: the neuron with the highest value claims the input as a member of its class. Softmax forces the outputs of the neural network to represent the probabilities that the input falls into each of the classes.
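For reference, here is a minimal NumPy sketch of the activation functions listed above. This is a plain illustration rather than an excerpt from Keras, which ships its own implementations:

import numpy as np

def linear(x):
    return x                            # identity: output is unbounded

def step(x):
    return np.where(x >= 0, 1.0, 0.0)   # hard threshold at zero

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))     # squashes values into (0, 1)

def tanh(x):
    return np.tanh(x)                   # squashes values into (-1, 1)

def relu(x):
    return np.maximum(0.0, x)           # zero for negatives, identity otherwise

def softmax(x):
    e = np.exp(x - np.max(x))           # subtract the max for numerical stability
    return e / e.sum()                  # positive outputs that sum to 1

print(softmax(np.array([1.0, 2.0, 3.0])))  # e.g. [0.09 0.24 0.67]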

Six Steps for Building Neural Networks

In order to be successful when developing systems powered by artificial intelligence, a different software development process needs to be adopted. Getting the right model in place requires experimentation, research, and an understanding of the business as well as of the available data. The CRISP-DM process captures this in six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.

Testing Datasets

As you can see, in order to build your model you will need a dataset. For your own training, here are a few very useful links with some great datasets, which I will be including in my examples.

Example Project

For my example, I'm going to use the Pima Indians Onset of Diabetes dataset. The dataset has the following attributes:

  1. Number of times pregnant
  2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
  3. Diastolic blood pressure (mm Hg)
  4. Triceps skin fold thickness (mm)
  5. 2-hour serum insulin (mu U/ml)
  6. Body mass index (weight in kg/(height in m)^2)
  7. Diabetes pedigree function
  8. Age (years)
  9. Class: onset of diabetes within five years (0 or 1)

The model is defined as a sequence of layers. In this case we will create the sequential model shown in the figure below.

There is no specific rule for deciding how many layers a model needs; the best results are usually found through experimentation. So here is what our code is going to do at a high level:

  1. Load the data from the dataset.
  2. Create the neural network model.
  3. Compile the model.
  4. Train the model.
  5. Evaluate the model.

The example also has a few additional features:

  1. The model and its weights are saved at the end of training.
  2. A graph of the model's accuracy is plotted.
  3. A graph of the model's loss is plotted.

Here is what the anatomy of our network looks like:

Finally, here is the code that will perform everything described above.

import numpy
from keras.models import Sequential
from keras.layers import Dense
import matplotlib.pyplot as plt

# dataset file
datasetFileName = "pima-indians-diabetes.csv"

# initialize the random number generator for reproducibility
seed = 7
numpy.random.seed(seed)

# load the data
dataset = numpy.loadtxt(datasetFileName, delimiter=",")

# split the dataset into input (X) and output (Y) variables
X = dataset[:, 0:8]
Y = dataset[:, 8]

# define the base model
def create_model():
    # create the model
    model = Sequential()
    model.add(Dense(12, input_dim=8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(8, kernel_initializer='uniform', activation='relu'))
    model.add(Dense(1, kernel_initializer='uniform', activation='sigmoid'))
    # compile the model
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# create and train the model
model = create_model()
history = model.fit(X, Y, validation_split=0.33, epochs=150, batch_size=10, verbose=0)

# evaluate the model
scores = model.evaluate(X, Y)
print("\nAccuracy: %.2f%% \n" % (scores[1]*100))

# list all metrics recorded in the history
print(history.history.keys())

# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

# serialize the model architecture to JSON
model_json = model.to_json()
with open("model.json", "w") as json_file:
    json_file.write(model_json)

# serialize the weights to HDF5
model.save_weights("model.h5")
print("\nSaved model to disk.")

The outcome of the code is:

32/768 [>.............................] - ETA: 0s
768/768 [==============================] - 0s 12us/step
Accuracy: 76.30%
dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
Saved model to disk.

The script also plots the accuracy and loss diagrams for the training and validation runs.
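The script saves the model, but it does not show how to get it back. Continuing from the script above (and assuming model.json and model.h5 were written by it), here is a short sketch of how the saved model can be restored and re-evaluated without retraining:

from keras.models import model_from_json

# load the architecture from JSON
with open("model.json", "r") as json_file:
    loaded_model = model_from_json(json_file.read())

# load the trained weights from HDF5
loaded_model.load_weights("model.h5")

# the restored model must be compiled before evaluation
loaded_model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
scores = loaded_model.evaluate(X, Y)
print("Accuracy: %.2f%%" % (scores[1] * 100))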

GitHub

The code and the dataset can be found here:

https://github.com/CryptoBlockTeam/DeepLearning-comebackOfNeuralNetworks

Summary

I hope you enjoyed this read. This was just an introduction to Deep Learning; the next articles in this series will go deeper into the subject.

References

  1. Deep Learning with Python, by François Chollet, ISBN 978-1617294433
  2. Artificial Intelligence for Humans, Volume 1: Fundamental Algorithms, by Jeff Heaton, ISBN 978-1493682225
  3. Artificial Intelligence for Humans, Volume 3: Deep Learning and Neural Networks, by Jeff Heaton, ISBN 978-1505714340
  4. Develop Deep Learning Models on Theano and TensorFlow Using Keras, by Jason Brownlee
  5. Deep Learning, by Ian Goodfellow, Yoshua Bengio, and Aaron Courville, ISBN 978-0262035613
  6. Neural Networks and Learning Machines, by Simon Haykin, ISBN 978-0131471399
