Modern RecSys

COVID-19 Case Study with CNN

We will cluster COVID-19 X-ray images based on severity with our CNN RecSys flow using transfer learning, Spotify’s Annoy, and PyTorch

Kai Xin Thia
Mar 21, 2020 · 5 min read

This work is a proof of concept showing how the framework we set up in the previous CNN chapter can be applied to a completely different domain.

We will swap out the training data and employ a more powerful pre-trained model (ResNet152); the rest of the code remains identical to what we used for DeepFashion images. We aim to identify clusters of X-ray images with similar infection severity using Approximate Nearest Neighbors.

This work is neither intended as medical research nor representative of how CNNs can be used to detect COVID-19.

This is part of my Modern Visual RecSys series; feel free to check out the rest of the series at the end of the article.

The COVID-19 Data

From left to right: healthy, infected, and seriously infected X-ray images of patients. Source: COVID-19 image data collection by Joseph Cohen

Intuition for why a CNN should work well on this data set:

As outlined in the previous chapter, the strength of a CNN lies in its convolutional filters. These filters are very good at detecting shapes, lines, and boundaries within an image. In the X-ray images, we see that as the infection worsens, the image blurs, white areas grow, and the rib cage becomes less visible; these are visual cues that a CNN can pick up and learn.
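To make this intuition concrete, here is a minimal sketch of a single convolutional filter written in plain Python. The vertical-edge kernel and the toy 4x6 "images" are hand-made assumptions for illustration; the filters inside ResNet152 are learned from data, not hand-coded like this.

```python
# Illustrative only: a hand-written vertical-edge filter, not a learned CNN filter.

def conv2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, the operation CNN layers compute."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Toy 4x6 "scan": dark region (0) on the left, bright region (1) on the right.
sharp = [[0, 0, 0, 1, 1, 1] for _ in range(4)]
# The same boundary washed out into a gradual ramp.
blurred = [[0.0, 0.2, 0.4, 0.6, 0.8, 1.0] for _ in range(4)]

# Responds strongly where intensity jumps from left to right.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

sharp_response = conv2d(sharp, vertical_edge)
blurred_response = conv2d(blurred, vertical_edge)

# Strongest activation of the filter anywhere in each image.
peak_sharp = max(max(row) for row in sharp_response)
peak_blurred = max(max(row) for row in blurred_response)
```

The sharp boundary produces a peak response of 3, while the blurred one peaks at about 1.2. A difference like this is the kind of signal a CNN can exploit when infected scans lose their crisp rib-cage outlines.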

Cleaning the data

  • As there are fewer than 25 CT scans in total, and only 1 CT scan of a healthy patient, I decided to remove the CT scans and keep only X-rays.
  • After cleaning, we have 102 COVID X-rays and 1,584 healthy X-rays.

Model training

  1. Convert images to embeddings
  2. Conduct Transfer Learning from ResNet152
  3. Use Fastai hooks to retrieve image embeddings from step 2
  4. Use Approximate Nearest Neighbors to obtain the most similar images based on the embeddings from step 3.
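The embedding step above can be sketched in plain PyTorch. fastai's hooks are built on PyTorch's `register_forward_hook`, which is what this example calls directly. `TinyCNN`, its layer sizes, and the random input batch are hypothetical stand-ins so that the snippet runs without downloading ResNet152 or the X-ray data; it is not the code used for this article.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for the fine-tuned ResNet152: a tiny CNN with the same
# overall shape (convolutional body -> feature vector -> classifier head).
class TinyCNN(nn.Module):
    def __init__(self, embedding_dim=16, num_classes=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(8, embedding_dim),
            nn.ReLU(),
        )
        self.head = nn.Linear(embedding_dim, num_classes)  # healthy vs. COVID

    def forward(self, x):
        return self.head(self.body(x))

model = TinyCNN().eval()

captured = []

def save_embedding(module, inputs, output):
    # PyTorch calls this after `module` runs; keep a detached copy of its output.
    captured.append(output.detach())

# Hook the block just before the classifier head.
handle = model.body.register_forward_hook(save_embedding)

# Random tensors standing in for a batch of 5 grayscale 64x64 X-rays.
batch = torch.rand(5, 1, 64, 64)
with torch.no_grad():
    logits = model(batch)
handle.remove()  # remove the hook once the embeddings are collected

embeddings = torch.cat(captured)  # shape: (5, 16), one row per image
```

Each row of `embeddings` is one image's vector; vectors like these are what we would load into the nearest-neighbor index in the final step.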

Analysis

Healthy X-ray scan with the 36 most similar scans generated by our model. Source: COVID-19 image data collection by Joseph Cohen

For healthy X-ray scans, the 36 most similar X-rays our model retrieves are all healthy. The model can identify and cluster healthy scans.

Infected X-ray scan with the 36 most similar scans generated by our model. Source: COVID-19 image data collection by Joseph Cohen

For infected X-ray scans, our model usually retrieves a mix of 80% infected scans and 20% healthy scans. Depending on the degree of infection, the model finds it challenging to differentiate between lightly infected and healthy scans.

Seriously infected X-ray scan with the 36 most similar scans generated by our model. Source: COVID-19 image data collection by Joseph Cohen

For seriously infected X-ray scans, the 36 most similar X-rays our model retrieves are all infected. The model can identify and cluster seriously infected scans.
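One way to put a number on these observations is to measure, for a query scan, what fraction of its nearest neighbors share its label. The sketch below uses exact cosine-similarity search in plain Python as a stand-in for Annoy, which approximates this kind of search much faster on large collections. The 2-D embeddings, the labels, and the function names are made up for illustration and are not results from our model.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbors(query_idx, embeddings, k):
    """Exact top-k search by cosine similarity, excluding the query itself."""
    scored = [
        (cosine_similarity(embeddings[query_idx], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query_idx
    ]
    scored.sort(reverse=True)  # most similar first
    return [idx for _, idx in scored[:k]]

def neighbor_purity(query_idx, embeddings, labels, k):
    """Fraction of the k nearest neighbors that share the query's label."""
    neighbors = nearest_neighbors(query_idx, embeddings, k)
    same = sum(labels[n] == labels[query_idx] for n in neighbors)
    return same / k

# Made-up 2-D embeddings and labels, purely for illustration.
embeddings = [
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0],  # healthy
    [0.1, 1.0], [0.2, 0.9], [0.0, 1.0],  # infected
    [0.6, 0.5],                          # lightly infected, near the boundary
]
labels = ["healthy"] * 3 + ["infected"] * 4

healthy_purity = neighbor_purity(0, embeddings, labels, k=2)
borderline_purity = neighbor_purity(6, embeddings, labels, k=4)
```

In this toy data the clearly healthy query gets a purity of 1.0, while the borderline scan gets only 0.25 because most of its neighbors are healthy, mirroring the difficulty with lightly infected scans described above.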

Potential use case of this work

The Code

What have we learned

Explore the rest of Modern Visual RecSys Series

Series labels:

  • Foundational: general knowledge and theories, minimal coding experience needed.
  • Core: more challenging materials with code.
  • Pro: difficult materials and code, with production-grade tools.

Further Readings

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Kai Xin Thia

Written by

Snr Data Scientist at Refinitiv Labs, M.S. CS Georgia Tech. 9+ years in data, found ❤️ in RecSys, NLP, Computer Vision, Applied R&D. linkedin.com/in/thiakx
