Artificial Neural Network cookbook

Step by step illustration of how to implement ANN using Keras

Arun Ramji Shanmugam
Analytics Vidhya
10 min read · Feb 2, 2020


Firing Neurons

What is a Neural Network?

A neural network is a classification algorithm in machine learning, inspired by the neuron cells of the human brain, used to predict the probability of an event or a dependent variable. It is much like logistic regression, but with many layers of activation functions, which pushes the prediction accuracy higher.

If you are familiar with basic regression techniques, you may wonder why we don't just use logistic regression for classification problems. The issue is that simple logistic regression cannot efficiently compute non-linear hypotheses, which is why we use ANNs for many complex real-world classification tasks.

A neural network is, in essence, many layers of logistic functions (or other activation functions) stacked together, which can generate much more complex non-linear decision boundaries.

If you are totally new to neural networks, this explanation alone won't give you the full background. If you are interested in the mathematics behind them, I would highly encourage you to watch Prof. Andrew Ng's video lectures, and for further mathematical reference you can read the book "Make Your Own Neural Network".

Now let's get started and see how we can actually implement an ANN using the popular Python library Keras.

Use Case

For the example below, our goal is to predict: when a customer service representative calls a customer, will they subscribe to a bank term deposit or not?

I have got the data set from: https://archive.ics.uci.edu/ml/datasets/Bank+Marketing

Without wasting any more time, let's see how we can build an ANN model from scratch.

Building ANN Model

Step 1 — Import Dataset

As with any other machine learning model, the first step is importing the dataset we are going to use and doing any necessary preprocessing.

Let’s import the dataset
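A minimal sketch of the import with pandas; the file name and separator are assumptions based on the UCI download (the semicolon-separated bank-additional-full.csv), not taken from the original notebook:

```python
import pandas as pd

# Assumption: the UCI 'bank-additional-full.csv' file, which is
# semicolon-separated (adjust the path/separator to your download).
df = pd.read_csv('bank-additional-full.csv', sep=';')

print(df.shape)       # (41188, 21) for this file
print(df.describe())  # basic statistics for the numeric columns
```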

From describe() you can get a glimpse of the basic statistics of the dataset.

Step 2 — Check for Null Values

Another very important preprocessing step is identifying missing values in our dataset and removing them or replacing them with some reasonable value (otherwise those missing values may bias the model and hurt its predictions).

Let's check for missing values in our dataset.
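A small sketch of one way to check; the original article showed a plot, and the seaborn null-mask heatmap here is my assumption of a comparable visual:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Count missing values per column...
print(df.isnull().sum())

# ...and visualise them: any missing entries would show up as marks/bars.
sns.heatmap(df.isnull(), cbar=False)
plt.show()
```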

Missing Values

If there were any missing values, we would have seen bars in the graph for each variable; our data has none, so no imputation is needed.

Step 3 — Encoding categorical variable

Since most parametric ML algorithms accept only numeric input vectors, we have to convert the categorical variables into numeric ones before building the model.

a. Label Encoding

For example, if we have a categorical variable Sex {Male, Female}, we should convert it into {0, 1} before using it as a model input. This technique is known as 'label encoding'.

Converting the values of a categorical variable into integers is known as 'label encoding' or 'integer encoding'.
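As a toy illustration with sklearn (the Sex column is hypothetical, not part of our dataset):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical two-valued column, just to show the mechanics.
sex = pd.Series(['Male', 'Female', 'Female', 'Male'])
print(LabelEncoder().fit_transform(sex))  # [1 0 0 1]: Female -> 0, Male -> 1
```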

b. Encoding for ordinal variable

Label encoding by itself is enough for a categorical variable with only two possible values (e.g. gender), but a variable with more than two possible values that have some order in them should be encoded in that same order.

For example, ordinal variable has values→ good, average, bad

If we use plain label encoding, it will just assign numbers arbitrarily, as below:

good: 0 , average :2 , bad: 1

This doesn't make sense for an ordinal variable: since good > average > bad, we should encode it in that same order, as below (which is the right one):

good : 2 , average : 1 , bad : 0

A categorical variable whose values have some sort of order should be encoded in that same order, so that an algorithm consuming it interprets its significance correctly.

Although there are a few ways to achieve label encoding with sklearn, I often use find/replace for encoding ordinal variables, since that approach gives more control over how each value is encoded.

Let's see which categorical variables we have:

Categorical variable

When you glance at the dataset, you can find that job, education and poutcome have some order in them. Though it may not be obvious at first, each indirectly tells us something; for example, a person with a high-paying job may be more likely to open a term deposit than a person with a low-paying job.

The same applies to the other two variables, so let's encode them using find/replace, according to the order we infer.
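A sketch of the replacement dictionary; the exact orderings below are assumptions inferred from the text, not the article's original values:

```python
# Assumed ordinal orderings for the 'bank-additional' columns; adjust
# them to whatever ranking you infer from the data.
cleanup = {
    'education': {'unknown': 0, 'illiterate': 1, 'basic.4y': 2, 'basic.6y': 3,
                  'basic.9y': 4, 'high.school': 5, 'professional.course': 6,
                  'university.degree': 7},
    'poutcome': {'nonexistent': 0, 'failure': 1, 'success': 2},
    'job': {'unknown': 0, 'unemployed': 1, 'student': 2, 'housemaid': 3,
            'blue-collar': 4, 'services': 5, 'retired': 6, 'self-employed': 7,
            'technician': 8, 'admin.': 9, 'entrepreneur': 10, 'management': 11},
    'y': {'no': 0, 'yes': 1},
}
df = df.replace(cleanup)
```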

Now we have created a dictionary with the associated encoding values; notice that I have done the same for the dependent variable y too.

Okay, now let's encode the rest of the nominal variables (variables with no order in them) using sklearn.
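A sketch with LabelEncoder; the column list is an assumption based on the bank-marketing schema:

```python
from sklearn.preprocessing import LabelEncoder

# Remaining nominal columns (assumed list for this dataset).
nominal_cols = ['marital', 'default', 'housing', 'loan',
                'contact', 'month', 'day_of_week']
le = LabelEncoder()
for col in nominal_cols:
    df[col] = le.fit_transform(df[col])
```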

This converts all the remaining categorical variables into integers.

Let's get a glimpse of what we have done:

Sample data set after label encoding

Okay, fine! So we have encoded all the categorical variables (both nominal and ordinal). Will that be enough?

When you look at the 'marital' variable, you can see that the sklearn label encoder converted it into integers (1, 2, 3, ...). As we discussed earlier, even though there is no order within this variable, it has been converted into integers, which do have an order; if we feed this directly to our algorithm, marital status 2 may be valued more than 1. To avoid that, we have to follow another approach called 'one-hot encoding'.

The one-hot encoder creates one column for each distinct value of the variable. For each new column, a row gets a 1 if the row contained that column's value and a 0 if it did not.

Let's apply one-hot encoding using pandas:
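A sketch with pd.get_dummies; which columns to one-hot is an assumption (the article shows at least 'marital'):

```python
# One-hot encode the unordered column(s); drop_first=True drops one dummy
# per variable to avoid the dummy-variable trap (multicollinearity).
df = pd.get_dummies(df, columns=['marital'], drop_first=True)
```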

We use drop_first=True in the above code to avoid the 'dummy variable trap', which is tied to the underlying concept of multicollinearity.

After one hot encoding

You can see that for the 'marital' variable it has created three separate columns, each holding 1 if that value is present for the row and 0 otherwise.

Cool! We are now done with the required data preprocessing, and it's time for the interesting part: let's build the ANN.

Step 4 — Split data set

As with all other learning algorithms, we have to separate the dependent and independent variables and split the dataset into training and test sets.
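A sketch of the split; the 80/20 ratio is an assumption, though it matches the test-set size of 8,238 rows implied by the confusion matrix later on:

```python
from sklearn.model_selection import train_test_split

X = df.drop('y', axis=1).values  # independent variables
y = df['y'].values               # dependent variable

# Assumed 80/20 split (0.2 * 41188 ≈ 8238 test rows).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
```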

Step 5 — Feature Scaling

This is a significant step: we bring all variables onto a similar scale of magnitude so that the algorithm runs much faster and produces reasonable parameters.
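A sketch with StandardScaler (the particular scaler is an assumption):

```python
from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training set only, then apply the same
# transform to the test set to avoid information leakage.
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
```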

Step 6 — Build ANN model

Let's import the Keras packages.
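The article predates the tf.keras consolidation; with a modern install, swap keras for tensorflow.keras:

```python
from keras.models import Sequential  # or: from tensorflow.keras.models import Sequential
from keras.layers import Dense, Dropout
```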

Let’s initialise the ANN model
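With the Sequential API this is a single line:

```python
classifier = Sequential()  # a plain stack of layers
```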

Let’s add our first layer
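A hedged sketch of the first hidden layer; units=10 is an assumption, and input_dim must equal the number of features left after preprocessing:

```python
classifier.add(Dense(units=10, kernel_initializer='uniform',
                     activation='relu', input_dim=X_train.shape[1]))
```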

Explanation of the parameters above:

  1. units — the number of nodes in the first layer
  2. activation — the activation function we use in the hidden layers
  3. kernel_initializer — initialises the weights close to 0
  4. input_dim — the number of independent variables in our dataset

How many layers and nodes to add is entirely up to us; we can experiment with different architectures and settle on an effective one. A common starting heuristic for the number of nodes is

(number of dependent variables + number of independent variables) / 2

But this is not mandatory; you can choose a different number of nodes too.

Dropout is a technique that randomly ignores a fraction of the units during the training phase.

Let's add the second layer, with dropout.
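Continuing the sketch; the unit count and dropout rate are assumptions:

```python
classifier.add(Dense(units=10, kernel_initializer='uniform', activation='relu'))
classifier.add(Dropout(rate=0.1))  # randomly drop 10% of units while training
```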

Let's add the third and final layer; since this is binary classification, one output node is enough.
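For example:

```python
# One sigmoid unit outputs the probability of the positive class.
classifier.add(Dense(units=1, kernel_initializer='uniform',
                     activation='sigmoid'))
```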

As a next step, let's specify the optimiser, the cost function, and the metrics the model will use.
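A sketch; Adam is an assumed optimiser choice, while binary cross-entropy is the natural loss for a sigmoid output:

```python
classifier.compile(optimizer='adam', loss='binary_crossentropy',
                   metrics=['accuracy'])
```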

Perfect!! Now we have built the skeleton of the neural network; it is time to pass in the training set and train the model 😃
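A sketch of the training call; batch_size=10 and epochs=100 are assumptions:

```python
classifier.fit(X_train, y_train, batch_size=10, epochs=100)
```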

batch_size — the number of samples processed before each weight update

epochs — the number of full passes over the training set while optimising the model

Once you run the above step, you will see each epoch logged as below until training completes. Beautiful, isn't it!!

Fine!! Our model's accuracy on the training set is 90.21%, which means that if you feed the same training data back to the model, it will predict 90.21% of it correctly.

Step 7 — Prediction

Let’s feed the test set data and see how it predicts.
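For example:

```python
# Predicted probabilities for the test set, thresholded at 0.5.
y_pred = classifier.predict(X_test)
y_pred = (y_pred > 0.5)
```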

The y_pred array contains boolean values indicating whether the dependent variable has more than a 50% chance of being 'yes'.

If 'yes', the entry is True; if 'no', it is False.

What is the accuracy on the test set?

Accuracy of a model on the test set: among all the samples in the dataset, how many did we predict correctly?

(True Positives + True Negatives) / total samples

We can use the confusion matrix to tabulate the number of misclassifications.

Generally, the confusion matrix looks like this for our target variable 'y':

Confusion Matrix structure

Let's tabulate the confusion matrix for our actual test set classes against the predicted ones.
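With sklearn:

```python
from sklearn.metrics import confusion_matrix

cm = confusion_matrix(y_test, y_pred)
print(cm)
# Rows are actual classes (0 = 'no', 1 = 'yes'); columns are predictions.
```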

Confusion Matrix
  • True Positives (TP): people who had not opted for a term deposit and were also predicted not to have opted.
  • True Negatives (TN): people who had opted for a term deposit and were also predicted to have opted.
  • False Negatives (FN): people who had not opted, but the prediction says they opted.
  • False Positives (FP): people who opted for a term deposit, but the prediction says they did not.

Let’s calculate accuracy,

(True Positives + True Negatives) / (TP + TN + FP + FN)

(7174 + 237) / (7174 + 105 + 722 + 237) = 7411 / 8238 ≈ 0.90

Accuracy on the test set is roughly 90%

Accuracy often gives a general idea of how a model performs. If our data had roughly equal numbers of positive and negative classes, we could take accuracy as the standard measure of model performance.

However, our data has an unequal distribution of positive and negative classes of the dependent variable, so we may have to use 'sensitivity' and 'specificity' to evaluate further.

Counts of the positive and negative classes in the dataset:

Positive class '0' [clients who did not opt for a deposit]: 36,548

Negative class '1' [clients who opted for a deposit]: 4,640

Sensitivity evaluates the model's ability to predict the positive class correctly.

Sensitivity = True Positives / (True Positives + False Negatives)

7174 / (7174 + 722) = 0.9085

The sensitivity of the model on the test set is 90.85%.

This means that on a new, unseen test set, the model will very likely predict ~90% of the positive class correctly.

Specificity evaluates the model's ability to predict the negative class correctly.

Specificity = True Negatives / (True Negatives + False Positives)

237 / (237 + 105) = 0.692

The specificity of the model on the test set is 69.2%.

This means that on unseen data, the model will predict the negative class correctly only about 69% of the time.
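The same numbers can be read straight off the confusion matrix; a small sketch, assuming cm from the earlier step and the article's convention that class 0 ('no') is the positive class:

```python
# Unpack the matrix under the article's convention (class 0 = positive).
tp, fn = cm[0, 0], cm[0, 1]   # 7174, 722
fp, tn = cm[1, 0], cm[1, 1]   # 105, 237

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # ~0.90
sensitivity = tp / (tp + fn)                   # ~0.9085
specificity = tn / (tn + fp)                   # ~0.692
print(accuracy, sensitivity, specificity)
```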

Hence, since our actual aim was to predict which customers will answer the call and agree to subscribe to a term deposit, and that outcome ('yes') is the negative class, the model predicts the thing we care about with only 69.2% accuracy.

If correctly identifying positives is important to us, we should choose a model with higher sensitivity; however, if correctly identifying negatives is more important, we should use specificity as the measurement metric.

Since our model does not perform so well at predicting the negative class, there is still room for improvement. You can pursue that in multiple ways, such as adding more layers, adjusting the dropout rate, or trying a different neural network architecture.

End Note

I hope this has given you some basic idea of cooking up a neural network with Keras and of the evaluation metrics. I would highly encourage you to play with the same data and try to improve the accuracy.

If you found this useful or have any suggestions, please leave a comment below.

Keep hustling for a better future !!
