Fighting Malaria with Machine Learning | Towards AI
Using Transfer Learning to Detect Malaria
Helping the world to fight malaria using deep learning and transfer learning with TFHub
Malaria remains one of the most common infectious diseases worldwide and a major global health challenge. It is caused by a parasite transmitted through the bite of infected female Anopheles mosquitoes. The parasite itself is a microscopic, single-celled organism called ‘Plasmodium’.
- In 2017, there were an estimated 219 million cases of malaria in 87 countries, with 435,000 deaths.¹
- In 2017, the African Region was home to 92% of malaria cases and 93% of deaths.¹
- Malaria is most commonly found in the tropical and sub-tropical areas of Africa, South America, and Asia.
- Despite its lethality, malaria can be cured if detected early. The standard way to diagnose it accurately, however, is to take a drop of blood, smear it on a slide, and examine it under a microscope for malaria parasites inside the red blood cells.
The healthcare industry has started turning to machine learning, training image classification models to reduce the burden on microscopists in resource-constrained regions and to improve diagnostic accuracy.
I will use a pre-trained model component from TensorFlow Hub to demonstrate how it can be reused for a new problem, in this case malaria detection. The article follows the general machine learning workflow:
- Examine and understand the data by visualizing those microscope pictures
- Build the data pipeline to the model
- Compose the model
- Train the model
- Evaluate the model
The data set and explanation can be found here².
The slide images of red blood cells are provided in a zip archive. We should first confirm how many images are given and which file extensions are present.
Once we have removed the non-image files from the list, let’s observe some sample images from each class.
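As a sketch, this inventory step can be done with the standard library alone (the dataset path in the commented example is an assumption about where the archive was extracted):

```python
from collections import Counter
from pathlib import Path

def count_extensions(root):
    """Count file extensions (lowercased) under root, recursively."""
    return Counter(p.suffix.lower() for p in Path(root).rglob("*") if p.is_file())

# Example with the extracted dataset folder (path is an assumption):
# print(count_extensions("cell_images"))
```

Anything that is not an image extension (e.g. `.db` thumbnails) can then be filtered out before going further.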
One observation from plotting these sample images: they are not all the same size, which needs fixing before feeding them into the model.
Humans are good at spotting patterns in images: we can notice that the infected red blood cell images contain dense purple spots, which indicate malaria infection.
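To eyeball a few samples from each class, a small helper like the following can lay the images out in a grid (a generic sketch; it assumes the images are already loaded as arrays):

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this in a notebook
import matplotlib.pyplot as plt

def plot_samples(images, titles, cols=4):
    """Plot a grid of image arrays with per-image titles; returns the figure."""
    rows = (len(images) + cols - 1) // cols
    fig, axes = plt.subplots(rows, cols, figsize=(3 * cols, 3 * rows))
    for ax, img, title in zip(axes.ravel(), images, titles):
        ax.imshow(img)
        ax.set_title(title)
        ax.axis("off")
    # Hide any unused cells in the grid.
    for ax in axes.ravel()[len(images):]:
        ax.axis("off")
    return fig
```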
Normally, when working with a dataset, we would load the data into memory (e.g., a pandas data frame or a NumPy array). However, when working with unstructured data or large amounts of data, we cannot fit all the images into memory (or perhaps we could, but at a significant cost).
In this article, I will demonstrate the use of the flow_from_directory() method from the ImageDataGenerator class to feed the training images to the model.
So now we need to restructure the folders from what we currently have; the image below demonstrates the thought process.
Ultimately, we only scan the file names and the labels associated with the images, without loading the actual images into memory.
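A minimal sketch of that restructuring, assuming the extracted dataset has one folder per class and using a hypothetical 80/10/10 split:

```python
import random
import shutil
from pathlib import Path

def split_into_folders(src_dir, dst_dir, splits=(0.8, 0.1, 0.1), seed=42):
    """Copy each class subfolder of src_dir into dst_dir/{train,val,test}/<class>/."""
    names = ("train", "val", "test")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    for class_dir in Path(src_dir).iterdir():
        if not class_dir.is_dir():
            continue
        files = sorted(class_dir.glob("*.png"))
        rng.shuffle(files)
        n_train = int(len(files) * splits[0])
        n_val = int(len(files) * splits[1])
        parts = (files[:n_train],
                 files[n_train:n_train + n_val],
                 files[n_train + n_val:])
        for name, part in zip(names, parts):
            out = Path(dst_dir) / name / class_dir.name
            out.mkdir(parents=True, exist_ok=True)
            for f in part:
                shutil.copy2(f, out / f.name)
```

This produces the `train/val/test` layout, with one subfolder per class, that `flow_from_directory()` expects.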
Transfer Learning with TFHub
In this step, I will use the MobileNet V2 feature extractor, which is available here.
We load the feature extractor from TF Hub and get the expected image size. This will be used when we build the data pipelines that feed the training, validation, and test datasets from their folders.
Next, we can use the flow_from_directory() method of the ImageDataGenerator class.
- In the training data generator, I have applied some data augmentation like zoom, shift, and flip.
- Pixel values in all data generators are rescaled to [0, 1].
- The images in all generators are resized to fit the feature extractor module.
- The training and validation generators are shuffled because we want to feed the images in random order.
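The generators described above can be sketched like this (the folder paths in the commented usage are assumptions about the restructured dataset):

```python
import tensorflow as tf

IMAGE_SIZE = (224, 224)  # size expected by the feature extractor
BATCH_SIZE = 32

# Training generator with augmentation; evaluation generators only rescale.
train_datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
)
eval_datagen = tf.keras.preprocessing.image.ImageDataGenerator(rescale=1.0 / 255)

def make_flow(datagen, directory, shuffle):
    """Stream images from directory/<class>/ folders without loading them all."""
    return datagen.flow_from_directory(
        directory,
        target_size=IMAGE_SIZE,
        batch_size=BATCH_SIZE,
        class_mode="binary",  # two classes: parasitized vs. uninfected
        shuffle=shuffle,
    )

# Usage (paths are assumptions):
# train_gen = make_flow(train_datagen, "data/train", shuffle=True)
# val_gen = make_flow(eval_datagen, "data/val", shuffle=True)
# test_gen = make_flow(eval_datagen, "data/test", shuffle=False)
```

Note that the test generator is not shuffled, so predictions stay aligned with the file order and labels.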
We can then wrap the feature extractor with the classifier layer on top, specifically for our malaria classification task.
We can start training the network with our data set. I have included some callback functions to fine-tune the learning rate and to stop training early if there is no improvement.
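The callbacks can be sketched as follows (the patience and learning-rate values are assumptions, not the article's exact settings):

```python
import tensorflow as tf

# Lower the learning rate first when validation loss stalls, then stop
# entirely if it still does not improve; values are illustrative.
callbacks = [
    tf.keras.callbacks.ReduceLROnPlateau(
        monitor="val_loss", factor=0.2, patience=2, min_lr=1e-6
    ),
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    ),
]

# Usage (generator names are assumptions):
# history = model.fit(train_gen, validation_data=val_gen,
#                     epochs=20, callbacks=callbacks)
```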
Use the history object to plot the progress and metrics recorded during the training epochs.
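A small helper for that plot, assuming the model was compiled with an accuracy metric:

```python
import matplotlib
matplotlib.use("Agg")  # headless-safe backend; drop this in a notebook
import matplotlib.pyplot as plt

def plot_history(history_dict):
    """Plot training vs. validation loss and accuracy per epoch."""
    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    ax_loss.plot(history_dict["loss"], label="train")
    ax_loss.plot(history_dict["val_loss"], label="validation")
    ax_loss.set(title="Loss", xlabel="epoch")
    ax_loss.legend()
    ax_acc.plot(history_dict["accuracy"], label="train")
    ax_acc.plot(history_dict["val_accuracy"], label="validation")
    ax_acc.set(title="Accuracy", xlabel="epoch")
    ax_acc.legend()
    return fig

# Usage after training: plot_history(history.history)
```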
Finally, we can use the model to predict the test data generator.
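The reported metrics can be computed from the predicted probabilities; a minimal sketch with scikit-learn (the model and generator names in the commented usage are assumptions):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate_predictions(y_true, y_prob, threshold=0.5):
    """Compute AUC on probabilities, and accuracy/F1 on thresholded labels."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    return {
        "auc": roc_auc_score(y_true, y_prob),
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),
    }

# Usage with a non-shuffled test generator (names are assumptions):
# y_prob = model.predict(test_gen).ravel()
# print(evaluate_predictions(test_gen.classes, y_prob))
```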
This simple transfer learning can achieve good performance across the metrics.
The AUC score: 0.9405, accuracy score: 0.9405, F1 score: 0.9413
What else can we improve?
Based on the existing MobileNet V2 model, we can unfreeze the weights and retrain several of the upper layers (not all of them). This allows those upper layers, which specialize in detecting specific patterns, to adapt to the malaria images.
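A sketch of partial unfreezing. The TF Hub layer is monolithic (you can only toggle trainable on the whole module), so this uses tf.keras.applications.MobileNetV2, where individual layers are accessible; weights=None only keeps the sketch offline, whereas in practice you would start from pretrained weights:

```python
import tensorflow as tf

# Alternative base model with per-layer access (an assumption relative to
# the article's TF Hub module). In practice use weights="imagenet".
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None
)
base.trainable = True
# Keep everything except the last 30 layers frozen, so only the upper,
# pattern-specific layers adapt to the malaria images.
for layer in base.layers[:-30]:
    layer.trainable = False
```

When fine-tuning like this, it is common to recompile with a much lower learning rate so the pre-trained weights are not destroyed.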
In this article, I used a pre-trained feature extractor from TFHub to predict malaria from slide images. This approach enables reusable machine learning modules and rapid development of data products.
For the full code, please visit the following GitHub repository for a full notebook explanation: