Detecting Malaria using CNNs

Shivang Mistry
Published in Analytics Vidhya
8 min read · Oct 25, 2019
ROBODOCTORS!!!!

In 2017, there were an estimated 219 million cases of malaria in 87 countries. Malaria is a serious and common disease caused by Plasmodium parasites, which are spread through mosquito bites, mostly in tropical regions such as India, the Philippines, and much of Africa. Africa accounts for 92% of the cases. Without a proper diagnosis and the proper treatment, patients can die from malaria.

Diagnosing Malaria

Typically, to diagnose malaria, a blood sample is taken and sent to a lab, where it is examined under a microscope. The results are then sent back to the doctor, and finally the patient gets the required treatment.

Long, right? What if we could do all of this within seconds, without actually sending the sample to a lab? Maybe we could just use our phone?

We want to show our phone or computer a picture of a blood smear and have it immediately and accurately tell us whether the patient has malaria. This is where machine learning comes in! There is a great type of neural network that takes images as inputs and gives labels as outputs: the CNN.

CNN? The news network or the neural network? 🤔

Convolutional Neural Networks (CNNs or ConvNets) are used to learn complex features in data, which is why they are well suited to object detection and other computer vision tasks. They are able to recognize objects such as apples, cats, cars, street signs, and even faces. They are also good at analyzing text and sound. CNNs are powering major advances in computer vision, which have obvious applications in autonomous vehicles, drones, robots, and treatments for the visually impaired.

A CNN has 5 types of layers:

  • Input Layer
  • Convolutional Layer
  • ReLU Activation Layer
  • Pooling Layer
  • Fully Connected Layer

These layers are the building blocks for every CNN, so to have a complete understanding of CNNs, it is important to understand its layers.

“Trust the Convolution Process” — CNN Architecture

⚠️ CAUTION!! Technical Mess starts below!! ⚠️

Input Layer ➡️

It holds the raw pixel values of the image and usually accepts a matrix as input. The input has a spatial form (the image width and height) and a depth representing the color channels. For our malaria dataset, the input dimensions are 150 x 150 x 3.
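To make those dimensions concrete, here is a small sketch in plain Python. The nested-list layout (rows, then columns, then color channels) and the all-zero placeholder image are assumptions made for illustration; this is not the project's actual image-loading code.

```python
# Illustrative only: a blank 150 x 150 RGB image stored as nested lists.
HEIGHT, WIDTH, CHANNELS = 150, 150, 3

# image[row][col] is one pixel: a list of [red, green, blue] values (0-255).
image = [[[0] * CHANNELS for _ in range(WIDTH)] for _ in range(HEIGHT)]


def shape(img):
    """Return (height, width, channels) of a nested-list image."""
    return (len(img), len(img[0]), len(img[0][0]))


input_shape = shape(image)
num_values = input_shape[0] * input_shape[1] * input_shape[2]
print(input_shape, num_values)  # (150, 150, 3) 67500
```

So every image hands the network 67,500 raw numbers to work with.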

Convolutional Layer

Convolutional Layers are considered the core of the CNN. Their main purpose is to extract features from the input image. The layer does this by learning from small squares of input data, using a feature detector (also commonly called a filter or a kernel).

Let’s take a look at an example to understand it better!

Let’s say we have a 5 x 5 image whose pixel values are only 0 and 1.

Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/

Also, consider this 3 x 3 filter:

Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/

Now the filter and the image are computed as shown below:

Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/

So what is happening? The orange filter slides over our image, moving 1 pixel at a time. At every position, each “image integer” is multiplied by the corresponding “filter integer”, and the products are all added together. That sum then becomes one element of the pink feature map (also known as an activation map).
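The sliding-and-summing step can be sketched in a few lines of plain Python. The 5 x 5 image and 3 x 3 filter values below are example values chosen for illustration, and `convolve2d` is a hypothetical helper name; in a real CNN the filter values are learned during training rather than written by hand.

```python
def convolve2d(image, kernel, stride=1):
    """Slide a square `kernel` over `image` (no padding) and return the feature map."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply each image value by the matching filter value, then sum.
            total = 0
            for di in range(k):
                for dj in range(k):
                    total += image[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Example values (assumed for illustration): a 5 x 5 image of 0s and 1s.
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
# Example 3 x 3 filter (assumed for illustration).
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

A 5 x 5 image and a 3 x 3 filter moving 1 pixel at a time give a 3 x 3 feature map, because the filter only fits in 3 positions along each side.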

Another great real life example is shown below to help visualize:

Source: https://ujjwalkarn.me/2016/08/11/intuitive-explanation-convnets/

ReLU Activation Layer

With CNNs, we will often see the ReLU function being used. ReLU stands for Rectified Linear Unit. It is applied element by element to the feature map, and it replaces all the negative values with 0. We can see this on the graph above, where the output is 0 for every negative input. ReLU is used to introduce non-linearity into our CNN, since most real-world data is non-linear.
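As a minimal sketch, ReLU is just "keep positive values, turn negatives into 0". The feature-map values below are made up for illustration.

```python
def relu(feature_map):
    """Replace every negative value in a 2D feature map with 0."""
    return [[max(0, value) for value in row] for row in feature_map]


# Made-up feature map containing some negative values.
example = [[3, -1, 0],
           [-5, 2, -0.5]]

activated = relu(example)
print(activated)  # [[3, 0, 0], [0, 2, 0]]
```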

Pooling Layer 🌊

Pooling Layers are put in between Convolutional Layers. Their main job is to reduce the spatial size (width, height) of the data representation, to make it more manageable. The Pooling Layer also helps control overfitting, because it reduces the number of parameters and computations in the network.

Pooling Layers use filters to downsample the input volume. The most common setup is a 2 x 2 filter with a stride of 2 (the filter moves 2 pixels at a time). This downsamples each spatial dimension (width and height) by a factor of 2.
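Here is a small sketch of that 2 x 2 setup, where the filter moves 2 pixels at a time. It assumes max pooling (keep the largest value in each window), which is a common choice but not the only one; the input values are made up.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Downsample a 2D feature map by keeping the max of each size x size window."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [
                feature_map[i * stride + di][j * stride + dj]
                for di in range(size)
                for dj in range(size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Made-up 4 x 4 feature map.
feature_map = [
    [1, 3, 2, 4],
    [5, 6, 7, 8],
    [3, 2, 1, 0],
    [1, 2, 3, 4],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 8], [3, 4]]
```

The 4 x 4 input becomes 2 x 2: width and height are each halved, so only a quarter of the values are passed on to the next layer.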

Fully Connected Layer

We use this layer to compute the class score (label) that we’ll use as the output of the network. The term Fully Connected means that each neuron in the previous layer is connected to each neuron in the next layer. Essentially, the Fully Connected Layer tells us what the image is. In our case it would tell us whether the blood smear has malaria or not.
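A rough sketch of a single fully connected output neuron follows. The sigmoid activation, the 0.5 threshold, the choice of "infected" as the positive class, and all the numbers are assumptions for illustration; in a trained network the weights and bias are learned from data.

```python
import math


def dense_sigmoid(features, weights, bias):
    """One fully connected neuron: weighted sum of ALL inputs, squashed to (0, 1)."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Made-up values: a flattened feature vector and one weight per feature.
features = [0.5, 1.0, 0.0, 2.0]
weights = [0.8, -0.4, 0.3, 0.6]
bias = -0.5

score = dense_sigmoid(features, weights, bias)
# Assumed convention: scores of 0.5 or more mean "infected".
label = "infected" if score >= 0.5 else "uninfected"
print(round(score, 3), label)  # 0.668 infected
```

Every input value contributes to the score through its own weight, which is exactly what "fully connected" means.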

Training the model 🏃

OK, now that we know how Convolutional Neural Networks work, let’s use one for our problem. As we know, we need to show the computer what a blood smear looks like with malaria and without it. In machine learning, we call this process “training the model”. The accuracy of the model depends on how well it is trained, which usually depends on the amount of data. Luckily for us, we won’t have to create our own dataset, because the National Library of Medicine has made one available to everyone. The dataset has 2 folders, infected and uninfected, with a total of 27,558 images.

Before starting to train the neural network, we need to split the dataset into 2 folders that contain the training data and the validation data. The ‘train’ folder has 80% of the dataset. This is what the model learns from. The second folder, ‘validation’, has the remaining 20% of the dataset. This is what the model is evaluated on, and it tells us how accurate the model is on images it has not trained on.
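One way to sketch the 80/20 split in plain Python is shown below. The file names are hypothetical stand-ins for the 27,558 images, and `train_validation_split` is an illustrative helper, not the project's actual code.

```python
import random


def train_validation_split(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and cut it into (train, validation) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the real images.
file_names = [f"cell_{i:05d}.png" for i in range(27558)]

train_files, val_files = train_validation_split(file_names)
print(len(train_files), len(val_files))  # 22046 5512
```

Shuffling before cutting matters: without it, one folder could end up with mostly one kind of image. In practice it can also help to split the infected and uninfected images separately, so both folders keep the same balance of classes.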

Results: After training for 10 epochs, we got an accuracy of 94%, which is not bad. However, it can be improved by using transfer learning and fine tuning.

Training results for base model!!

Transfer Learning and Fine Tuning

Transfer learning is a method for getting higher accuracy on image classification problems. Essentially, a CNN is trained on a first task and then reused on a second, totally different task. It is used because training a model from scratch on a small dataset makes it hard to reach high accuracy.

Now you’re wondering: how do we take what the model was trained for and change it for another task? Great question!

There are 2 main scenarios:

Using a pre-trained CNN as a feature extractor:

We take a pre-trained CNN (such as VGG, Google’s Inception model, or Microsoft’s ResNet model) and remove the fully connected part at the end, which contains the original classification layer. In its place, we add our own new classification layer. We use the rest of the trained network as the feature extractor, and train only the new layer on our dataset.

Fine Tuning with CNN:

Fine tuning means training both the fully connected layer and the last few layers of the feature extractor, so we get higher accuracy. As the name suggests, we are just “tuning” or “tweaking” the network.
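The difference between the two scenarios comes down to which layers are allowed to change during training. The sketch below is a framework-free illustration with hypothetical layer names and a hypothetical `trainable_flags` helper; in a library such as Keras, the same idea is expressed through each layer's `trainable` attribute.

```python
def trainable_flags(backbone_layers, head_layers, unfreeze_last=0):
    """Map each layer name to True (trained) or False (frozen).

    unfreeze_last=0 -> feature extraction: only the new head is trained.
    unfreeze_last=N -> fine tuning: the last N backbone layers are trained too.
    """
    n_frozen = len(backbone_layers) - unfreeze_last
    flags = {name: index >= n_frozen for index, name in enumerate(backbone_layers)}
    flags.update({name: True for name in head_layers})  # the new head always trains
    return flags


# Hypothetical layer names; a real pre-trained network has many more layers.
backbone = ["conv_block1", "conv_block2", "conv_block3", "conv_block4"]
head = ["new_dense", "new_output"]

feature_extraction = trainable_flags(backbone, head)
fine_tuning = trainable_flags(backbone, head, unfreeze_last=1)

print(feature_extraction)
print(fine_tuning)
```

In the first result every pre-trained block stays frozen and only the new layers learn; in the second, the last block is unfrozen as well, so it can adapt to blood smear images.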

Results: After training it again, we got an accuracy of 98%, which is much better than the base model!

Done? Finally!

Now that we’ve trained the model, we can deploy it to a phone as an app, or make a web app. This tool can help millions of doctors in rural areas of developing countries who don’t have many resources. They won’t need to send the blood samples to a lab. They can just use a microscope and a phone to easily diagnose whether the patient has malaria or not. This is just one application of this technology! Imagine if we could use this for detecting cancer, brain tumours, anomalies in genomes, etc. The possibilities are truly endless!

Further Applications of ML

AI is being used in almost every industry whether it be finance, agriculture, or education, and it is disrupting all of them. However, something that gets me really excited is its applications in the Healthcare industry!!

Companies such as Benevolent.ai and Atomwise are using Deep Learning to completely change how drugs are discovered! Even Google’s DeepMind is working on projects such as predicting future acute kidney injury. This is just the beginning! People are doing some super dope things with deep learning in all branches of the Healthcare industry. In the next 5 to 10 years, the whole healthcare industry is going to change exponentially!

Check out the project and code:

Summary

  • AI is disrupting almost all the industries!
  • CNNs are used for extracting features from complicated data, such as images
  • CNNs have 5 types of layers: Input, Convolution, ReLU, Pooling, and Fully Connected
  • The Input layer holds the raw pixel values of the image and usually accepts a 3D input
  • The Convolution layer extracts features from the image
  • The ReLU layer replaces all negative values with 0
  • The Pooling layer downsizes the spatial dimensions
  • Finally, the Fully Connected Layer tells us what the image is
  • Transfer Learning and Fine Tuning are used to increase the accuracy of the model

Hey, hey, hey! If you are reading this, thank you 🙏 🙏 for making it to the end!

I’d also love to connect through LinkedIn, and learn about your thoughts on this topic!
