Neural Network From Scratch — Tumour Diagnosis

Donal Byrne
Coinmonks
12 min read · Jul 13, 2018


Intro

This article illustrates how to build a simple 2 layer neural network from scratch! I have wanted to do this for a long time now, as I noticed that when I was learning about neural networks I had a tendency to skip straight to the cool stuff.

Using great libraries like tensorflow and keras lets you build really cool projects with neural networks quickly and easily, which is great. However, I noticed that there were big gaps in my understanding of neural networks and I realised I didn’t really have a solid foundation in what was happening under the hood.

This project was a great exercise in really understanding the low level concepts of neural networks and I hope that others will find it useful as well. Any feedback or constructive criticism would be appreciated.

The Project

In this project we will be trying to diagnose whether a breast tumour is malignant or benign. We will be using the Breast Cancer Wisconsin data set found here.

This is a relatively small dataset with 569 entries. Each datapoint contains 30 features that we will learn from and a binary classification of malignant “M” or benign “B”. Like all ML models, this project is split up into 2 parts: data preparation and the building of the model.

You can quickly follow along with this tutorial by running this project as a template on FloydHub. Just click the button below; everything is already set up and ready to go!

If you want to learn how to get started with building your own models on the cloud with FloydHub check out my previous article:

Data Prep

All ML projects are different. Even if we use the same model, the data being used is rarely the same. It’s also worth noting that good data is more important than a good model. Much like in nutrition, you can’t out-exercise a bad diet. If you’re putting bad food into your body, you’re going to see bad results. In ML it doesn’t really matter how amazing and complicated your model is if you are feeding it bad data. Because of this, it is important to pay attention to our data preparation phase. This notebook only carries out very basic data preparation, but the Kaggle contributor Kaan Can has a very comprehensive tutorial that covers all the necessary skills for data science, including a whole section on cleaning data. The tutorial can be found here

Now on to the actual code! The first thing we need to do is import any libraries such as pandas and then load in our data set.

As you can see from the snapshot of our dataset we have 569 rows with 33 columns. After a quick glance through our data we can see that we don’t need all of it. The last column is simply empty, so the first thing we will do is remove it.
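As a minimal sketch of this step: the article loads the Kaggle CSV with pandas, and the trailing empty column in that file is called “Unnamed: 32” (the column names here are assumed from the Kaggle file; a tiny toy frame stands in for the real data).

```python
import numpy as np
import pandas as pd

# In the real project: dataset = pd.read_csv("data.csv")
# Toy frame mimicking the Kaggle CSV layout, where the trailing
# "Unnamed: 32" column is entirely empty.
dataset = pd.DataFrame({
    "id": [842302, 842517],
    "diagnosis": ["M", "M"],
    "radius_mean": [17.99, 20.57],
    "Unnamed: 32": [np.nan, np.nan],
})

# Drop the empty last column
dataset = dataset.drop(columns=["Unnamed: 32"])
print(dataset.columns.tolist())  # ['id', 'diagnosis', 'radius_mean']
```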

The next step is to separate our feature data and our targets. The features are the data we want to learn from, and the targets are the diagnosis: whether or not the tumour is malignant.
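A sketch of the separation, again on a toy frame with the Kaggle layout (column names assumed); with the real data the feature matrix would have shape (569, 30):

```python
import pandas as pd

# Toy frame with the same layout as the cleaned Kaggle data
dataset = pd.DataFrame({
    "id": [1, 2, 3],
    "diagnosis": ["M", "B", "M"],
    "radius_mean": [17.99, 13.54, 20.57],
    "texture_mean": [10.38, 14.36, 17.77],
})

# Features: the columns we learn from; targets: the diagnosis column
X = dataset.drop(columns=["id", "diagnosis"]).values
y = dataset["diagnosis"].values
print(X.shape)     # (3, 2)
print(y.tolist())  # ['M', 'B', 'M']
```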

If we look at our target labels, they are represented as either ‘M’ or ‘B’. This makes the data easily readable for humans, but it’s actually harder for a machine to interpret. In order to make our labels more machine friendly we are going to change them to be either 1 or 0. Because I’m lazy, I’m going to use the sklearn library to quickly do this.
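Sklearn’s LabelEncoder handles this in two lines. Note that it assigns class numbers alphabetically, so ‘B’ becomes 0 and ‘M’ becomes 1:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = np.array(["M", "B", "B", "M"])
encoder = LabelEncoder()
# fit_transform learns the classes and converts the labels in one step
y = encoder.fit_transform(labels)
print(y.tolist())  # [1, 0, 0, 1]
```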

The next thing we will look at is splitting up our data. Whenever we train a model we need to divide our data into 2 sets: a training set and a testing set. To do this we randomly sample from our entire dataset and split the data with roughly a 70:30 ratio, but this can vary depending on the project.
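Sklearn’s train_test_split does the random sampling and the split in one call (the toy arrays and the fixed random_state here are just for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each
y = np.array([0, 1] * 5)

# Randomly sample a 70:30 train/test split;
# random_state just makes the example repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))  # 7 3
```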

You might be thinking “well, we don’t have much data to begin with, why don’t we just train our model with all of our data and get the most out of what we have?” and that’s a good question. The reason is that we don’t want our model to overfit to the training data.

Overfitting means that our model has gotten REALLY good at understanding and predicting the training data and can have a really high training accuracy. However, the model has ONLY learned about the training data and hasn’t learned how to generalise. So when we introduce some new data, the model will perform poorly. It’s like studying for a test using only the past exam papers. You get really good at answering those specific questions, but when you are asked something that didn’t come up in the past papers, you can’t answer it because you haven’t really learned the topic.

In order to identify if this is happening to our model, we always train the model on a subsection of the data and then test it on another portion of the dataset to ensure that it is learning correctly.

Also, it is important that the data is randomly sampled; otherwise you might end up with sets that contain highly correlated data, which can affect your results.

The final task we need to carry out is called feature scaling. At the minute our data looks good and makes a lot of sense if you’re a scientist or a doctor, but remember earlier we talked about how human readable values are difficult for machines? Although the data makes sense, we can convert it so that all values are scaled to be within the same range. This makes it much faster for our network to train.

We are going to use sklearn’s StandardScaler to do this. It will transform each variable of our data to have a mean of 0 and a standard deviation of 1.
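A small sketch of the scaler on toy data, showing the resulting mean and standard deviation per feature:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Two features on very different scales
X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])

scaler = StandardScaler()
# Learn each feature's mean and standard deviation, then rescale
X_scaled = scaler.fit_transform(X_train)
print(X_scaled.mean(axis=0))  # ~[0. 0.]
print(X_scaled.std(axis=0))   # ~[1. 1.]
```

In a real train/test split you would call fit_transform on the training set only and plain transform on the test set, so no information about the test data leaks into training.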

Building The Network

So now that we have finally cleaned up our data we are ready to dive into the good stuff! Neural networks are an exciting area of machine learning and have grown massively over the past several years. However, they can also be quite complicated and involve a lot of maths. This notebook will illustrate the building of the network and the theory behind it, but won’t go into too much detail on the low level maths. For more information on the maths side of things, check out 3Blue1Brown’s great video

Below is the full project code. After the code is run and the results are displayed the notebook will explain each of the sections.

What Is A Neural Network?

Like the name suggests, neural networks are designed after the architecture of our brains. Electrical signals are sent through layers of interconnected neurons in our brains and eventually form an output that our brain can understand and act on. Just like our brain’s architecture, our network has several layers made up of perceptron nodes (neurons).

Each of these nodes simply holds a number. These numbers are multiplied by the weights connecting the nodes to the next layer. The data is input into the first layer of the network, with a node for every feature in the data set. These inputs are passed through each layer of the network with the layer’s weights applied to them. After the inputs make it through the entire network we are left with our output. In this case we have a single output node that will be either 1 or 0 (malignant or benign). This process is known as the forward pass. The next step is the backward pass. This is where the learning actually occurs. We find out how well our network predicted the outcome, then move backward through the layers and update the weights in order to improve our predictions. Nodes that are considered more important will end up having a higher weight value.

Building Blocks

Above is a very brief description of what is going on in the neural network. Now we are going to look at how to build it. The way I like to think of implementing a neural network is to split it up into several logical sections or building blocks. These are as follows:

  1. Network Initialisation
  2. Forward Pass
  3. Backward Pass (Backpropagation)
  4. Update Weights
  5. Testing & Evaluation

1 — Initialisation

The first thing we need to do is initialise our network. Here we will determine the structure of the network, including the number of nodes for each layer (input, hidden and output), the learning rate, the number of epochs and the initial random weights for each of our layers.
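A sketch of this initialisation, using the node counts and learning rate described below; the epoch count is a placeholder since the article doesn’t state one, and the weight-initialisation scale is my own choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Structure and hyperparameters as described in the article
input_nodes = 30    # one input node per feature
hidden_nodes = 12
output_nodes = 1    # single node for binary classification
learning_rate = 0.1
epochs = 1000       # placeholder value

# Initial random weights for each layer
w1 = rng.normal(0.0, 0.5, size=(input_nodes, hidden_nodes))
w2 = rng.normal(0.0, 0.5, size=(hidden_nodes, output_nodes))
print(w1.shape, w2.shape)  # (30, 12) (12, 1)
```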

Input Nodes

This is the number of features for each entry in our data set, in this case it will be 30.

Hidden Nodes

This is the number of nodes in our hidden layer. For this project I use 12 hidden nodes. This is an arbitrary number that I decided upon through experimenting with different values.

Output Nodes

This is simply how many outputs our network will have. In this case we are carrying out binary classification, so our output can either be 0 or 1. Due to this we will only use 1 node. If we were building a classifier with more than 2 types (such as classifying different types of animals) we would have several output nodes.

Learning Rate

The learning rate determines how much we adjust our weights during the backward pass. A high number will let our network make progress quickly, but as the learning progresses we might adjust our weights too much and increase the error as opposed to decreasing it. A low learning rate will learn gradually and be able to generalise better, but will take longer to converge and requires more training epochs. A learning rate that isn’t too high or too low is recommended to begin with, and then you can adjust it based on the needs of the network. For this example I decided on a learning rate of 0.1, which is generally a good starting point.

Epochs

The number of epochs determines how many times we will train our network on the data set.

2 — Forward Pass

Example of simple forward propagation through a single layer

As mentioned previously, the forward pass is the prediction section of our network. We feed our input through the layers of the network, applying the different weights as we go. This eventually leaves us with a prediction. The steps in the forward pass are as follows:

1. Get the input

The input for the layer is the output of the previous layer, or in the case of the input layer it is simply the training data.

2. Apply the layer weights

Now we apply the weights by getting the dot product (multiplying) of the inputs and the layer’s weights.

3. Activation function

Once we apply the weights to the inputs we apply our activation function. In this project we are using the sigmoid activation function. This takes in a number and turns it into a value between 0 and 1. This function suits our needs as we are looking for a binary output: the tumour is either malignant or benign. For classification problems with more classes you would use a different activation function, like softmax.

Diagram of the sigmoid activation function.
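The three forward pass steps can be sketched as follows, using random toy weights (the function and variable names are my own):

```python
import numpy as np

def sigmoid(x):
    # Activation: squash any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def forward(X, w1, w2):
    # Steps 1-2: take each layer's input and apply the layer weights
    # (dot product); step 3: pass the result through the activation
    layer1 = sigmoid(np.dot(X, w1))
    output = sigmoid(np.dot(layer1, w2))
    return layer1, output

rng = np.random.default_rng(0)
X = rng.random((5, 30))          # 5 toy samples with 30 features each
w1 = rng.normal(size=(30, 12))   # input -> hidden weights
w2 = rng.normal(size=(12, 1))    # hidden -> output weights
layer1, prediction = forward(X, w1, w2)
print(prediction.shape)  # (5, 1), every value strictly between 0 and 1
```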

3 — Backward Pass

The backward pass is the section where our network actually learns; this is known as backpropagation through gradient descent. Here we take our prediction and compare it with the actual true value. We then move backwards through our network and update the weights of our layers in order to improve our predictions. There are 2 key steps in the backward pass of the network.

1. Find the error of the network

First we calculate how far off our prediction was. This is done by subtracting our predicted value from the true value.

2. Calculate the error for each of our layers

Now that we know the error of the network we need to find the error for each of our layers. We do this by multiplying the error of the previous layer by the derivative of the current layer.

The final layer (layer 2 in this case) will use the network error, as there is no previous layer. The derivative of the current layer is calculated using the sigmoid prime function shown below. The next layer uses, as its error, the dot product of the previous error and the previous layer’s weights. Once again this is multiplied by the sigmoid prime of the current layer.

Diagram displaying the sigmoid function and its derivative. Image can be found here https://i.stack.imgur.com/QdlcW.jpg
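These two backward pass steps can be sketched like this, continuing from the forward pass above (names and toy data are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(a):
    # Sigmoid derivative, written in terms of the already-activated output
    return a * (1.0 - a)

rng = np.random.default_rng(0)
X = rng.random((5, 30))
y = rng.integers(0, 2, size=(5, 1)).astype(float)
w1 = rng.normal(size=(30, 12))
w2 = rng.normal(size=(12, 1))

# Forward pass (as in the previous section)
layer1 = sigmoid(X @ w1)
output = sigmoid(layer1 @ w2)

# 1. Error of the network: true value minus predicted value
output_error = y - output

# 2. Error term per layer. The final layer uses the network error
# times the layer's sigmoid derivative...
output_delta = output_error * sigmoid_prime(output)
# ...and the hidden layer uses the dot product of that error term and
# the final layer's weights, again times the current layer's derivative
layer1_error = output_delta @ w2.T
layer1_delta = layer1_error * sigmoid_prime(layer1)
print(output_delta.shape, layer1_delta.shape)  # (5, 1) (5, 12)
```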

4 — Update Weights

Once we have completed our backward pass of the network we need to update our weights accordingly. For each layer of weights we need to carry out the following steps.

1. Calculate how much to add to the layer weights

For each layer of weights we need to calculate the adjustments to add. This is done by getting the dot product of the previous layer’s output and the error of the current layer that we found in the backward pass phase. We then multiply that dot product by our learning rate.

2. Add the adjustments to the current layer weights

This is as simple as it sounds. Just add the adjustments we just calculated to the weights.

For some of the calculations we need to use the transpose of the value (.T); this is just so the 2 matrices can be multiplied correctly. This requires some knowledge of matrix maths and isn’t covered in this article. To learn more, check out this great tutorial series by Khan Academy.
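Both update steps for one layer can be sketched as follows, using random stand-in values for the layer output and error term (all names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
learning_rate = 0.1

layer1 = rng.random((5, 12))       # previous layer's output for 5 samples
output_delta = rng.random((5, 1))  # current layer's error term from the back pass
w2 = np.zeros((12, 1))             # current layer's weights

# 1. Adjustment = learning rate * (previous layer output . error term).
#    The transpose lines the shapes up: (12, 5) @ (5, 1) -> (12, 1)
adjustment = learning_rate * layer1.T @ output_delta
# 2. Add the adjustments to the current layer's weights
w2 += adjustment
print(w2.shape)  # (12, 1)
```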

Congratulations, you have just built the core model for your neural network! The only thing left to do is test the accuracy of our network.

Finally, here is the completed training method.
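The original code isn’t visible in this export, so below is a minimal sketch of what the completed training method could look like, combining the forward pass, backward pass and weight updates described above (function names, the epoch count and the toy sanity check are my own):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(a):
    # Sigmoid derivative in terms of its own output
    return a * (1.0 - a)

def train(X, y, hidden_nodes=12, learning_rate=0.1, epochs=20000, seed=0):
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(X.shape[1], hidden_nodes))
    w2 = rng.normal(size=(hidden_nodes, 1))
    for _ in range(epochs):
        # Forward pass
        layer1 = sigmoid(X @ w1)
        output = sigmoid(layer1 @ w2)
        # Backward pass
        output_delta = (y - output) * sigmoid_prime(output)
        layer1_delta = (output_delta @ w2.T) * sigmoid_prime(layer1)
        # Update weights
        w2 += learning_rate * layer1.T @ output_delta
        w1 += learning_rate * X.T @ layer1_delta
    return w1, w2

# Sanity check on a tiny linearly separable problem:
# the label is simply the first feature
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([[0.0], [0.0], [1.0], [1.0]])
w1, w2 = train(X, y)
predictions = np.round(sigmoid(sigmoid(X @ w1) @ w2))
print(predictions.ravel())  # should print [0. 0. 1. 1.] once trained
```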

5 — Testing and Evaluation

All we are doing here is recreating the forward pass in order to test how well our model works, except here we use our testing data instead of our training data. We also keep track of how many correct predictions we make and the list of predictions made.

For each prediction made we will compare it to the actual target and see if our model is correct, updating our statistic variables.

We then print out the accuracy of our model to the terminal.

Conclusion

I hope this write-up was able to help others learn and understand neural networks a little bit better. There is still a lot about the topic that wasn’t covered here. This project simply looked at a very basic implementation of a deep neural network. After writing this notebook I feel that I have gained a much better understanding of how neural networks actually work. This will be a big help when I move on to more complicated models using libraries such as tensorflow.

The full code and notebook can be found on my GitHub.

Try playing around with this code and see what results you get. Better yet, try and recreate this project using a different data set. Try and figure out how to prepare the data, maybe add more hidden layers to improve the performance of the model. If you want some inspiration, head over to Kaggle and look for some cool data sets to work with.

Below I have listed several great resources and references to the code and tutorials that I learned from.

References

Preprocessing Data https://www.kaggle.com/thebrownviking20/intro-to-keras-with-breast-cancer-data-ann

Basic Neural Network With Numpy https://www.kaggle.com/ancientaxe/simple-neural-network-from-scratch-in-python

Other Resources

A Neural Network in 11 lines of Python https://iamtrask.github.io/2015/07/12/basic-python-network/

Siraj Raval’s Channel https://www.youtube.com/channel/UCWN3xxRkmTPmbKwht9FuE5A

But what is a Neural Network? | Chapter 1, deep learning https://www.youtube.com/watch?v=aircAruvnKk
