Building Your First Brain: Learn how to create a Neural Network from Scratch

Aishwarya Chougule
10 min read · Mar 7, 2024


In this article, we will be learning how to perform churn analysis by building an Artificial Neural Network (ANN) using Keras in a Jupyter notebook.

Overview

Before we dive deep into this article, do you know what churn analysis is?

If not, let me help you understand the term. Churn analysis is the process of identifying why a person has stopped using your service, why an employee has resigned from an organization, why a subscriber has unsubscribed from Netflix, and so on. The person usually has a concrete reason for doing so; for example, person ‘x’ gave up their YouTube Premium membership because they could no longer afford the fee.

So, with churn prediction, we can try to retain customers who are at high risk of leaving by taking proactive action, like, in our YouTube example, reducing the YouTube Premium fee for tier-3 cities.

Now that we know what churn is, we will write a Jupyter notebook to perform churn analysis and predict whether a customer will exit the bank or not.

Dataset: Churn_Modelling.csv

The dataset has 14 columns.

  • Dependent feature/target = the ‘Exited’ column, which has a value of 1 or 0 indicating whether the customer has exited the bank.
  • Independent features/predictors = “CreditScore”, “Geography”, “Gender”, “Age”, “Tenure”…… which will play a major role in predicting the exit.

Our motive: if we are given a new row of predictors, we should be able to predict whether the customer will exit the bank or not.

Let us begin:

Importing necessary libraries and packages

### Importing the necessary libraries and packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Read the dataset

### Read the dataset
data = pd.read_csv('Churn_Modelling.csv')
data.head()
Churn_Modelling dataset

We can see here that features like CreditScore, Balance, NumOfProducts and HasCrCard are some of the important ones that can help us predict whether the customer will exit or not.

Split the data into predictors and target variable

X = data.iloc[:, 3:13]    # all rows (:), columns 3 (CreditScore) through 12 (EstimatedSalary)
y = data.iloc[:, 13:14]   # all rows (:), the last column (Exited)

Preprocessing

If you look at the dataset, we have 2 categorical features, ‘Geography’ and ‘Gender’. We have to convert these categorical features into dummy features.

What are dummy features?

Suppose you have a categorical column, say ‘Gender’, with 2 values, ‘Male’ and ‘Female’. When you execute pd.get_dummies(X['Gender']), you get 2 columns named ‘Male’ and ‘Female’, which are simply the 2 categories of the original column. Then, for each row where the original Gender column said ‘Male’, the new ‘Male’ column gets a 1 (and the ‘Female’ column gets a 0), and vice versa.
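For illustration, here is a minimal sketch of what get_dummies produces on a toy Gender column (the values below are made up, not taken from the dataset):

import pandas as pd

# Toy 'Gender' column, purely for illustration
toy = pd.Series(['Male', 'Female', 'Female', 'Male'], name='Gender')
dummies = pd.get_dummies(toy)
print(dummies)
# One column per category ('Female', 'Male'); each row has a 1 (or True,
# depending on your pandas version) in the column matching its original value.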

geography = pd.get_dummies(X['Geography'])
gender = pd.get_dummies(X['Gender'])

Now, the next step: drop the ‘Gender’ and ‘Geography’ columns and concatenate the new ‘geography’ and ‘gender’ dummy frames in their place.

X.columns ## columns before
Columns before

Columns after dropping ‘Geography’ and ‘Gender’ and concatenating ‘geography’ and ‘gender’ in X.

X = X.drop(['Geography','Gender'],axis=1)
X = pd.concat([X,geography,gender],axis=1)
X.columns #columns after
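A quick side note on a design choice: a common variation (not used in this notebook) is to encode both columns in one call with drop_first=True, which drops one redundant dummy per feature while keeping the same information. A minimal sketch, assuming the same raw data frame as above:

# Alternative (not what we do above): encode both categorical columns in one call
# and drop the first category of each, avoiding perfectly correlated dummy columns.
X_alt = pd.get_dummies(data.iloc[:, 3:13], columns=['Geography', 'Gender'], drop_first=True)
X_alt.columns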

ALRIGHT!!! We have done quite some preprocessing here, now it is time to do some modeling!

Let's start with Train-Test Split.

What is train-test split?

It is as simple as splitting the dataset into training and test data. Now if you have a question of why we split it, allow me to explain it with a story.

Storytime alert!

Imagine you’re studying for a test. You have a textbook with all the answers, but you don’t want to just memorize those answers (like we sometimes actually did in school :)); you also want to understand the material so that you can answer new questions in the exam.

So, what do you do? You split your study time. You spend some time learning from the textbook (that’s like training your brain), and then you take a random practice test with new questions without looking at the textbook (that’s like testing what you’ve learned).

Why? Because if you only study from the textbook and then take the test using the same textbook, you might feel like you did great, but you won’t know if you really understand the material or if you just memorized the answers. So you need to test your understanding with new questions.

Similarly, in Machine Learning, we split our dataset into two parts: one part to train the model (like studying from the textbook) and another part to test the model (like taking the practice test).

#train_test split
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y,test_size = 0.2) #test split is 20% of the whole dataset
X_train.shape

NOTE: test_size = 0.2 means that if we have 1000 records

  • train data size = 80% of 1000 = 800 rows
  • test data size = 20% of 1000 = 200 rows

Next is Feature Scaling

Why feature scaling?

When we create a deep neural network, we have an input layer, hidden layers and an output layer. We know that the first step is to multiply the inputs with the weights (and add a bias).

Now, suppose some of my input features have huge magnitudes (like EstimatedSalary) while others are tiny (like Tenure). The large features will dominate the weighted sums and the gradient updates, which makes training unstable and slow to converge.

So, wouldn’t it be better if we just scale the inputs down to a common, smaller range so that training converges faster? Yes!

That’s what we will be doing :)

# Feature Scaling
# Note: StandardScaler scales the data so that each column has mean 0 and standard deviation 1.
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # fit on the training data, then transform it
X_test = sc.transform(X_test)         # reuse the training mean/std; never re-fit on the test data
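As a quick sanity check (not part of the original notebook), you can print the column means and standard deviations of the scaled training data; they should come out roughly 0 and 1:

# After fit_transform, each training column should have mean ~0 and std ~1
print(X_train.mean(axis=0).round(3))
print(X_train.std(axis=0).round(3))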

As I mentioned, this is a deep learning Jupyter notebook, so we will be experimenting with Keras libraries and packages.

import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, LeakyReLU, PReLU, ELU

Steps to build a neural network:

Now that we have our libraries and packages in place, we are ready to proceed with the modelling.

Step 1: Take a deep breath.

Step 2: Initialize an empty neural network.

It is like sketching the outline of a person. We know the person is there and we are drawing their silhouette, but we have not yet decided how many bones and organs to give them.

Well, in this case, we don’t yet know how many layers, or how many neurons per layer, we are setting/initializing.

classifier = Sequential()

Step 3: Adding input layer and first hidden layer

classifier.add(Dense(units=6, kernel_initializer='he_uniform', activation='relu', input_dim=13))

Too many terms? Let’s understand one by one (it’s not that difficult, to be honest!)

Dense = a Dense layer is used to create the first hidden layer. It is simply a layer of neurons in which each neuron receives input from all the neurons of the previous layer (the input layer, in the case of the first hidden layer), hence the name ‘dense’.

units = In my first hidden layer, I want 6 hidden neurons.

kernel_initializer = how you want the weights to be initialized. I am using ‘he_uniform’ to initialize the weights here.

input_dim = how many input features are connected to this hidden layer (we have 13 features in the training data after creating the dummy columns).

Note: For ReLU, ‘he_uniform’ and ‘he_normal’ work well!
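To make “each neuron receives input from all the neurons of the previous layer” concrete, here is a tiny NumPy sketch of what one Dense layer with 6 ReLU neurons computes for a single row of 13 scaled features. The weights below are random stand-ins, not what he_uniform would actually produce:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 13))   # one row of 13 scaled input features
W = rng.normal(size=(13, 6))   # stand-in weights: 13 inputs -> 6 neurons
b = np.zeros(6)                # one bias per neuron

z = x @ W + b                  # every neuron combines all 13 inputs
a = np.maximum(z, 0)           # ReLU activation
print(a.shape)                 # (1, 6): one activation per hidden neuron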

Step 4: Adding second hidden layer

classifier.add(Dense(units=6, kernel_initializer='he_uniform', activation='relu'))

In short:

- 6 hidden neurons in the second layer

- weight initialization technique used is he_uniform

- activation function used is ReLU

Step 5: Adding output layer

classifier.add(Dense(units=1, kernel_initializer='glorot_uniform', activation='sigmoid'))

In short:

- 1 output neuron

- weight initialization technique used is glorot_uniform

- activation function used is Sigmoid (since we are doing binary classification, we use sigmoid; if the output value is greater than 0.5, the prediction will be 1, as the small sketch below shows)
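Here is the small sketch mentioned above: the sigmoid squashes any real number into a probability between 0 and 1, which we then threshold at 0.5 (purely illustrative, not part of the model code):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for z in (-3.0, 0.0, 3.0):
    p = sigmoid(z)
    print(z, round(p, 3), int(p > 0.5))   # raw output, probability, predicted class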

Step 6: Compiling the ANN

Background: In every forward pass, the inputs get multiplied by the weights (with a bias added) and an activation function is applied at every layer. When you get the final output, you calculate the loss function, and an optimizer then updates the weights to reduce that loss.

Here we will use:

- optimizer = Adamax

- loss function = binary_crossentropy (when the output is 0 or 1, we prefer this loss function)

- metric = accuracy

classifier.compile(optimizer='Adamax', loss='binary_crossentropy', metrics=['accuracy'])
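For intuition, binary cross-entropy for a single example with true label y and predicted probability p is -(y*log(p) + (1-y)*log(1-p)). A small hand computation with made-up numbers (not model output) shows why confident wrong predictions are punished hard:

import numpy as np

def binary_crossentropy(y_true, p_pred):
    # Average loss over examples; the small epsilon avoids log(0)
    eps = 1e-7
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Confident and correct -> small loss; confident and wrong -> large loss
print(binary_crossentropy(np.array([1, 0]), np.array([0.9, 0.1])))  # ~0.105
print(binary_crossentropy(np.array([1, 0]), np.array([0.1, 0.9])))  # ~2.303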

Step 7: Fitting the model to training data

Now it is finally time to fit the model. There are a few things we need to understand first:

validation split — Why is validation needed? It allows you to monitor the performance of your model during the training process. By evaluating the model on a validation set at the end of each training epoch, you can track how well it is learning and detect potential issues like overfitting early on.

batch_size — Why is a batch size needed? So that less data has to be loaded and processed at once (real-life datasets can easily reach millions of rows). As a result, our RAM stays relatively free for other tasks.
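To put rough numbers on both of these, here is a small back-of-the-envelope sketch using the split we already made and the validation_split=0.33, batch_size=10 values from the fit call further below (the exact counts depend on your data):

# Rough arithmetic for the upcoming fit call
n_train = X_train.shape[0]
val_rows = int(n_train * 0.33)                   # held out for validation each epoch
fit_rows = n_train - val_rows                    # rows actually used to update the weights
batches_per_epoch = int(np.ceil(fit_rows / 10))  # with batch_size=10
print(n_train, val_rows, fit_rows, batches_per_epoch)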

Storytime alert (validation split importance)!

Imagine you’re learning to play a new game. You practice with some example challenges and keep track of how well you’re doing after each practice round.

Here’s the catch: Sometimes, you might try different ways to play the game. You need a separate set of challenges (validation set) to see which strategy works best. This helps you decide which strategy to use in the real game. If you keep practicing with the same challenges over and over, you might get really good at those specific challenges but struggle with new ones. The validation set helps you see if you’re truly improving overall, not just on the easy stuff.

Conclusion: So, in simple words, the validation split is like checking your progress, testing different strategies, making sure you’re learning everything, and ensuring you’re not cheating by memorizing specific examples. It’s essential for learning effectively and improving your skills.

model = classifier.fit(X_train, y_train, validation_split=0.33, batch_size=10, epochs=100)

(Training output: loss and accuracy printed for each of the 100 epochs)

So, what do we understand from this?

My training accuracy is about 85.8% and my validation accuracy is about 86.4%. Both have pretty good accuracy percentages.

Also, notice that we get quite good accuracy on the validation data, so can we say the model will work equally well on the test data that is still waiting for us? I can’t say anything firmly at the moment, but let’s try it out!

FINAL STAGE: Predicting on the test data

y_pred = classifier.predict(X_test)
y_pred = (y_pred>0.5)
y_pred

Accuracy

Now that we have y_pred (which is our predicted value) and y_test (actual value), let us calculate the accuracy percentage.

from sklearn.metrics import accuracy_score
score = accuracy_score(y_test, y_pred)
print(score)

The accuracy is approximately 86.8%, which is not bad at all. It can be improved further using hyperparameter tuning.

Additional: Let us use a confusion matrix for a better understanding of the output.

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test,y_pred)
cm
Confusion matrix

Okay, the confusion matrix can be tricky to understand at times (PS: most of the time :P).

I will try to explain it my way, and let’s see if that is understandable!

In the test data, the rows of the matrix are the actual classes (0 = stayed, 1 = exited) and the columns are the predicted classes, so the diagonal cells count the customers we classified correctly and the off-diagonal cells count the ones we got wrong.
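One way to unpack it in code (assuming the cm computed above) is to unravel the four cells; for binary labels, scikit-learn orders them as true negatives, false positives, false negatives, true positives:

# cm layout for binary labels: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print("Correctly predicted 'stayed' :", tn)
print("Wrongly flagged as 'exited'  :", fp)
print("Missed exits                 :", fn)
print("Correctly predicted 'exited' :", tp)
print("Accuracy from the matrix     :", (tn + tp) / (tn + fp + fn + tp))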

This was fun, but while I was coding I realized that I have so many more questions, a few being:

  • How do I know how many layers to use?
  • How do I decide how many neurons should go in each layer?
  • How do I decide which optimizer is best for my neural network?
  • What are the different types of weight initialization techniques?
  • What are the different loss functions?

and many more…

Aand…that’s it for now! Thank you for reading!!!

Please feel free to drop your comments/reviews. It would help me in enhancing my knowledge, which is always good :)
