A hands-on intro to ANNs and deep learning — part 1

Lima Vallantin
Sep 9, 2018

Artificial neural networks (ANNs) are not new, but they have now become the next big thing. And it didn’t happen by chance: ANNs are practical and they usually do a great job when it comes to solving classification and regression problems. So, today, let’s do a brief introduction to deep learning by implementing a simple neural network.

Known by their short name, ANNs are also called connectionist systems. They are inspired by the way the brain works, since they try to reproduce the structure of neurons and synapses. An ANN is capable of learning how to solve a new problem only by analysing older examples of the same problem. So, we can say that they somehow ‘learn’ from past information.

These networks simulate neurons and their connections. The machine creates a series of artificial neurons that are capable of sending information through the network. Each neuron evaluates its input, makes a decision and moves the information forward in a process we call network training. During training, the network also adjusts itself in order to improve future predictions.

In today’s example, we’ll revisit the Titanic data set. I will reuse some of the data preprocessing steps from the article below, so I won’t go over them again.

Now that our data is preprocessed, we can start to work on the deep learning part. Before moving on, let’s try to understand what deep learning is.

A light dive into deep learning

Traditional machine learning approaches have a learning limit (no matter how much data you add, they stop improving at some point), but ANNs don’t suffer the same fate. They require more computational power, but the results are often far better than those of older machine learning standards. They are also capable of doing feature extraction without human intervention.

There are other reasons why deep learning became so important nowadays. You can read more about it on the following article:

Now that you understand why we will use another approach rather than traditional ML algorithms for the Titanic problem, let’s move on.

The first thing we’ll do is to import the Keras library. By the way, in order to follow this tutorial, you need to have Keras, Theano and TensorFlow installed on your machine.
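The imports look roughly like this (a sketch assuming the standalone keras package running on a TensorFlow backend, as was common at the time):

```python
# Sequential: the model container; Dense: a fully connected layer
from keras.models import Sequential
from keras.layers import Dense
```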

In short, Keras is a high-level neural networks API that runs on top of TensorFlow and is ideal for fast prototyping. TensorFlow is a machine learning framework created by Google that makes numerical computation easier. You can use TensorFlow not only for deep learning, but also for traditional ML. Lastly, there’s Theano, which is a library that makes working with multi-dimensional arrays faster.

After importing Keras, you will create a Sequential() object. This is a model that defines a sequence of layers that will be used to build our network.
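A minimal sketch of this step (classifier is just an assumed variable name for the model):

```python
# An empty Sequential model: layers will be stacked onto it one by one
classifier = Sequential()
```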

After creating this object, it’s time to define our input layer and the first hidden layer. Every ANN is composed of an input layer, one or more hidden layers and an output layer, which gives us the results of our predictions. The structure of an ANN can be represented like this:

Image from http://www.texample.net/tikz/examples/neural-network/

We have to choose how many neurons we want to use in our first layer, defined by the units parameter, but there’s no formula for this. In general, it’s a matter of trial and error. Let’s use 7 here.

You also have to choose an activation function. The activation function receives an input and decides how the neuron will deal with it. If the information received is considered relevant to the neuron, the neuron is activated and, after this positive evaluation, it passes the signal forward.

There are several activation functions, but we’ll use the Rectified Linear Unit (ReLU), which gives an output from 0 to infinity. It is one of the most widely used functions in networks today, but you have other options. To learn more about other activation functions, read:
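As a quick aside (not part of the article’s code), ReLU simply returns zero for negative inputs and the input itself otherwise:

```python
import numpy as np

def relu(x):
    # max(0, x): negative values are clipped to zero, positive values pass through
    return np.maximum(0, x)

print(relu(np.array([-2.0, 0.0, 3.5])))  # [0.  0.  3.5]
```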

Then, we have the kernel_initializer parameter. This defines the starting weights for the synapses. In neural networks, weights exist between every two layers. The linear transformation produced by the weights is passed through a non-linear activation function to create the values that will be sent to the next layer.
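In code terms, the computation for a single layer looks roughly like this (an illustrative NumPy sketch with made-up numbers, not the article’s code):

```python
import numpy as np

x = np.array([0.5, -1.2, 3.0])    # values coming from the previous layer
W = np.random.randn(3, 2) * 0.1   # weights between the two layers
b = np.zeros(2)                   # biases

z = x @ W + b                     # linear transformation of the inputs
a = np.maximum(0, z)              # non-linear activation (ReLU), sent to the next layer
```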

There are many ways to initialise the weights of an ANN, including zero and random initialisation. The truth is that the importance of weight initialisation was ignored for a long time. Here, I will use “he_normal” as the value.

Lastly, we need to tell the first layer how many inputs it receives, using the input_dim argument. It corresponds to the number of input variables we have in the data set, so let’s put 12 here.
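Putting these parameters together, the first layer might look like this (a sketch based on the values discussed above):

```python
# Input layer + first hidden layer:
# - input_dim=12: twelve preprocessed input variables
# - units=7: seven neurons, chosen by trial and error
# - he_normal weight initialisation and ReLU activation
classifier.add(Dense(units=7,
                     kernel_initializer='he_normal',
                     activation='relu',
                     input_dim=12))
```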

For the hidden layers, we do the same thing, but we don’t have to specify the input_dim argument, since Keras can infer the number of dimensions coming from the previous layer.
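An extra hidden layer could then be added like this (the same seven units are assumed here purely for illustration):

```python
# Hidden layer: no input_dim needed, Keras infers it from the previous layer
classifier.add(Dense(units=7,
                     kernel_initializer='he_normal',
                     activation='relu'))
```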

We finish the architecture of our network by adding the output layer. In this case, we change units to 1, because we have a single binary output. For the activation, we use a sigmoid function, because we want a probability between 0 and 1 and we are working on a binary classification problem.
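A sketch of the output layer:

```python
# Output layer: a single neuron with a sigmoid activation,
# producing a survival probability between 0 and 1
classifier.add(Dense(units=1,
                     kernel_initializer='he_normal',
                     activation='sigmoid'))
```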

When you finish doing this, it’s time to compile the network. The optimizer argument defines the gradient descent variant we’ll use, the loss argument corresponds to the loss function, and the metrics argument defines how model performance is measured.
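A possible compile call; the article doesn’t name the exact settings, so adam and binary_crossentropy below are assumptions, although they are the usual choices for a binary classifier:

```python
# adam: a popular variant of stochastic gradient descent
# binary_crossentropy: the standard loss for binary classification
classifier.compile(optimizer='adam',
                   loss='binary_crossentropy',
                   metrics=['accuracy'])
```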

After doing this, we can train our ANN using the fit method (it takes some time). Fit takes a batch_size argument, which defines the number of observations the algorithm uses to train the network and adjust the weights per propagation.

The next argument is epochs. One epoch is one pass of the entire data set forward and backward through the ANN. The goal is that, after each iteration, the accuracy increases and the loss decreases.
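Training could then look like this; X_train and y_train come from the preprocessing step, and the batch_size and epochs values are placeholders, since the article doesn’t state the ones actually used:

```python
# Train on the preprocessed Titanic features and survival labels
# batch_size=32: weights are updated after every 32 observations
# epochs=100: the full data set is passed through the network 100 times
classifier.fit(X_train, y_train, batch_size=32, epochs=100)
```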

Once this step is done, let’s do some predictions!
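A sketch of the prediction step: because the sigmoid output is a probability, we threshold it at 0.5 to get a binary class (X_test and y_test are assumed to come from the same preprocessing split):

```python
from sklearn.metrics import accuracy_score

# Predicted survival probabilities for the test set
y_pred_proba = classifier.predict(X_test)

# Convert probabilities into binary predictions using a 0.5 threshold
y_pred = (y_pred_proba > 0.5).astype(int)

print(accuracy_score(y_test, y_pred))
```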

Let’s compare this result with the previous one, when we used a traditional approach. Today we got an 82% accuracy score. Before, we got 83%.

Think about it: we didn’t have to do any feature selection and we barely tweaked our model. We didn’t do any cross validation, we didn’t play with the number of hidden layers…

So, there’s a lot of room for improvement here. We can do much more to obtain better results, but doing almost nothing was already enough to nearly match the old model. So, do you see the potential of ANNs?

In part 2, I write about how we can improve the ANN’s performance. Check it here:

By the way, be careful with the accuracy score we used for our predictions. Find out why here:

Lima Vallantin

Written by

Data scientist working on solutions for sustainable fashion. Let’s talk about Machine Learning and fashion. Message me! | linkedin.com/in/limavallantin/
