A Complete Guide to Applications of CNNs

Mrudula Arali
10 min read · Dec 31, 2022


If you’ve ever broken a bone, your first instinct was probably to see a doctor. They likely told you to get an X-ray to examine the break before putting on a cast.

In a perfect world, the doctor-to-patient ratio would be 1:1 and this process would be easy, but that’s obviously not the case. In 2018, the United States had approximately 26.1 doctors for every 10,000 people (WHO). That means that for every doctor, there were roughly 385 people who would hypothetically see them if anything occurred. And since not all physicians actively practice, the real patient load per doctor is probably even higher.

So let’s revisit the scenario where you break a bone. If your doctor had to examine and evaluate the X-ray scans of several other people, AND care for the 385+ other patients that they are responsible for, you might have to wait HOURS at the minimum before getting a firm diagnosis.

Example of a Convolutional Neural Network

Breaking a bone is a really simple example, but medical imaging (MRIs, X-rays, and CT scans) is used for much more than that, including disease detection. That’s where Convolutional Neural Networks (CNNs) come in. They can make disease detection and diagnosis more efficient by improving speed and accuracy at the same time.

CNNs are the Perfect Deep Learning Models for Medical Imaging… here’s why

The first step here is understanding why deep learning is better than machine learning in healthcare. You can read more about that in my previous article here.

CNNs are deep learning networks that can be trained for lots of image analysis tasks including scene classification, object detection and segmentation, and image processing.

Let’s compare CNNs to traditional neural networks to better understand what sets them apart.

In a simple traditional neural network, every neuron in the input layer connects to every neuron in the hidden layer(s). In a CNN, each hidden-layer neuron connects only to a specified “region” of input-layer neurons. These “regions” are referred to as Local Receptive Fields.

Just like a normal neural network, a CNN has weights and biases, which change and update as the model learns from its training data. The difference is that in a CNN, the same set of weights and biases is shared across the neurons in a hidden layer. This weight sharing ensures the CNN detects the same feature anywhere in an image. For example, finding bicycles no matter where they appear in a picture.

There’s one last thing that sets CNNs apart, and it’s the Activation and Pooling layers. During activation, transformations are applied to the outputs of neurons using activation functions. An example of an activation function is ReLU (Rectified Linear Unit). It passes positive values through unchanged and replaces negative values with 0.

After Activation comes Pooling. The main purpose of pooling is to reduce the number of outputs: it condenses the output of a small region of neurons into one single value.
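
To make activation and pooling concrete, here’s a minimal NumPy sketch. The numbers are made up for illustration, not from any real model:

```python
import numpy as np

# ReLU keeps positive values and clamps negatives to 0.
def relu(x):
    return np.maximum(0, x)

# 2x2 max pooling: condense each 2x2 region of neurons into one output.
def max_pool_2x2(feature_map):
    h, w = feature_map.shape
    return feature_map.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

outputs = np.array([[-1.0, 3.0,  2.0, -5.0],
                    [ 4.0, 0.5, -2.0,  1.0],
                    [ 1.0, 1.0,  0.0,  0.0],
                    [-3.0, 2.0,  6.0, -1.0]])

activated = relu(outputs)          # negatives become 0
pooled = max_pool_2x2(activated)   # 4x4 map condensed to 2x2
```

Notice how the 16 outputs collapse to just 4 after pooling, each one summarizing a small region.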

Ways to Use a CNN for Image Analysis

Train it from Scratch

  • Very accurate but challenging
  • You need a large amount of data and example images to make this work

Transfer Learning

  • Using knowledge gained from solving one problem to solve another, similar problem
  • Uses less data because it starts from an already-trained CNN

Feature Extraction

  • Uses a pre-trained CNN to extract features that then train a new model
  • A good example is using a CNN trained to detect edges, since edge detection applies across a wide domain of images
  • Uses the least amount of data
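
Here’s a rough sketch of the feature-extraction idea. A fixed edge-detection kernel (a Sobel filter, standing in for features a pre-trained CNN would have learned) slides over a toy image, and the resulting feature map is what you’d train a new model on. All the values here are illustrative:

```python
import numpy as np

# A fixed vertical-edge (Sobel) kernel stands in for a filter
# that a pre-trained CNN would have learned.
edge_kernel = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)

def extract_features(image, kernel):
    """Slide the fixed kernel over the image; the output is a feature map."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left, bright on the right (a vertical edge).
image = np.zeros((5, 5))
image[:, 3:] = 1.0

features = extract_features(image, edge_kernel)
# A new model would then be trained on `features`, not on raw pixels.
```

The feature map responds strongly wherever the window crosses the edge and stays at zero elsewhere, which is exactly the kind of condensed signal a downstream model can learn from with less data.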

The problem with using CNNs in healthcare is that they need a large amount of data to be reliable, as does any AI model. In this case, the data may be the images of several patients diagnosed with a given condition.

The training dataset images will need to be approved by certified health professionals in the field to ensure that the diagnosis will be accurate. For example, optometrists may evaluate the validity of a retinal disease image.

We can use CNNs to detect and diagnose diseases

AMD (Age-related Macular Degeneration) is an eye disease that blurs your central vision. It is typically caused by damage to the macula, a light-sensitive tissue at the back of your eye that’s part of your retina.

Current AMD Diagnosis

AMD tends to be detected through a routine eye checkup, or when someone becomes symptomatic (meaning they have sight-threatening late AMD).

When a doctor diagnoses AMD, they look for a substance called “drusen”.

Drusen are yellow deposits that form under the retina. They don’t directly cause AMD, but they increase your chances of having it and are usually a sign of the condition itself.

The problem with this diagnosis procedure is that it takes too long and as a result, can delay the detection of AMD. That’s where CNNs come in.

Here’s how we fix these problems with CNNs…

Though drusen look like a yellow substance to us, to a computer an image is just pixels. Each pixel has its own value, and a CNN can process these values to find what’s in the image (i.e. image recognition).

To do this, the CNN might use colours, edges, shapes, and objects to generate an output. When diagnosing AMD, the output would fall into 2 classes: Drusen or No Drusen.

Here’s a short walkthrough diagram of a CNN. Notice the layers, as we’ll dive deeper into them later on. A key thing to note here is that though the output in this diagram consists of several options, in an AMD-detecting CNN, the outputs would be narrowed down to just two.

Building a CNN to detect AMD from Scratch

Before diving into this, it’s important to recognize that the model and research described below are the product of Dr. Emma Pead. You can find the link to their original work here. I used that text to understand CNNs and their applications in eyecare.

Okay, now let’s get into it.

Step 1

The first step in the CNN would be understanding an artificial neuron.

Artificial neurons are the neurons in the neural networks we’ve been talking about this whole time. They accept inputs, each with an associated weight, and output a response.

The math inside the neuron is a weighted sum: multiply each input by its weight, add those products together, then add a bias.

The bias shifts the weighted sum up or down, which controls how easily the neuron activates and keeps the output from being stuck at zero when the weighted sum alone is zero.

output = f(w₀a₀ + w₁a₁ + w₂a₂ + … + wₙaₙ + bias)

Here f is the activation function, applied to the weighted sum. Earlier in this article we talked about ReLU (Rectified Linear Unit), a great example of an activation function: it passes positive values through unchanged and turns negative values into 0.

Another popular activation function is the Sigmoid function. This function works to adjust all the values to fit between 0 and 1.

There are a few more activation functions like these two, but for now let’s stick to those.
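
Here’s the neuron’s math as a tiny sketch. The inputs, weights, and bias are made-up values, just to show the mechanics:

```python
import numpy as np

def relu(z):
    # Positive values pass through; negatives become 0.
    return max(0.0, z)

def sigmoid(z):
    # Squashes any value into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def neuron(inputs, weights, bias, activation):
    # Weighted sum of inputs plus bias, then the activation function.
    z = sum(w * a for w, a in zip(weights, inputs)) + bias
    return activation(z)

inputs = [0.5, -1.0, 2.0]
weights = [0.4, 0.3, -0.2]
bias = 0.1

# The weighted sum here is -0.4, so ReLU outputs 0
# while sigmoid outputs a value just under 0.5.
out_relu = neuron(inputs, weights, bias, relu)
out_sigmoid = neuron(inputs, weights, bias, sigmoid)
```

The same weighted sum goes in; which activation function you choose decides what comes out.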

Step 2

Now we dive into the first layer of the CNN: the convolution layer. This layer’s job is to compare an image piece by piece, sliding small grids of weights (filters) across it.

Here’s how Dr. Pead chose to summarize it in a diagram:

Dr. Emma Pead’s Diagram Breakdown of the Convolutional Layer

This is a simplified example, because it’s only a small 2D array. Dr. Pead explains that a real image would be much bigger, and resizing the whole thing to input it into a CNN doesn’t make sense. You would lose resolution, and since drusen are already so small, you would lose clarity when trying to find features. In other words, you’d be defeating the purpose of using a CNN to detect AMD.
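
To see why resizing hurts, here’s a toy sketch with assumed values (not real scan data): a single bright pixel stands in for a small drusen deposit, and naive block-averaging nearly erases it:

```python
import numpy as np

# An 8x8 "scan" with one tiny bright spot standing in for a small drusen.
scan = np.zeros((8, 8))
scan[3, 4] = 1.0

# Naive resizing: average each 4x4 block to shrink the image to 2x2.
resized = scan.reshape(2, 4, 2, 4).mean(axis=(1, 3))

# The bright spot's value drops from 1.0 to 1/16 = 0.0625,
# all but vanishing into the background.
```

A real resize algorithm is more sophisticated than block averaging, but the underlying trade-off is the same: small features get diluted.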

So now the problem statement becomes:

How do we input high dimensional images into a CNN without losing resolution or details?

The answer: you break it down into smaller pieces.

Step 3

To break an image down into smaller pieces we use a concept called Pooling.

It reduces the width and height of each feature map by condensing small regions into single values, so the network can eventually narrow everything down to a small number of probabilities, in this case, drusen or no drusen.

We use pooling to help the network capture details at different scales. For example, early layers might detect the edges of drusen, while later layers may find more precise details, like blood vessels.
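
A quick sketch of how repeated pooling breaks a large input down. The feature map here is random filler, not a real scan:

```python
import numpy as np

def max_pool_2x2(fm):
    # Condense each 2x2 region of the feature map into its maximum value.
    h, w = fm.shape
    return fm.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# A hypothetical 64x64 feature map (random values stand in for real responses).
rng = np.random.default_rng(0)
feature_map = rng.random((64, 64))

# Each pooling stage halves the width and height, so each value in a
# deeper layer summarizes an ever-larger area of the original scan.
shapes = [feature_map.shape]
for _ in range(3):
    feature_map = max_pool_2x2(feature_map)
    shapes.append(feature_map.shape)
# shapes: 64x64 -> 32x32 -> 16x16 -> 8x8
```

This is how a high-dimensional image is broken down step by step instead of being crudely resized up front.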

Step 4

The final CNN layer is known as the Fully Connected (FC) Layer.

This layer takes the output of the previous layers and flattens it into a vector mapped to n possible outcomes (in this case n = 2: drusen or no drusen).

It combines the values from the previous layers and squashes the results into the range 0 to 1 with an activation function (e.g. sigmoid).

Then it outputs the probability of the input belonging to each of the n outcomes.
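
A minimal sketch of the FC layer’s math, with made-up weights (a real model would learn these during training):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Pretend output of the last pooling layer: a tiny 2x2 feature map.
pooled = np.array([[0.8, 0.1],
                   [0.3, 0.6]])

# Flatten into a vector, then combine with one weight per (input, class) pair.
flat = pooled.flatten()                        # shape (4,)
weights = np.array([[ 1.2, -0.4, 0.7,  1.0],   # class 0: drusen
                    [-0.9,  0.8, 0.2, -1.1]])  # class 1: no drusen
biases = np.array([0.0, 0.5])

scores = weights @ flat + biases
probabilities = sigmoid(scores)  # each score squashed into (0, 1)
# With these illustrative weights, the "drusen" class scores higher.
```

With these made-up numbers, the network would report the image as more likely to contain drusen; in practice the weights are learned from labelled scans.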

Limitations of using CNN to detect AMD

It was starting to seem almost too good to be true, right?

Though this solution seems like it would solve disease diagnosis delays in every aspect of healthcare, in reality, it still has its flaws.

One of these limitations is receiving false positives. This can happen if there are other objects in the image with a similar colour and size to drusen. Though it’s an understandable mistake, it can still be the deciding factor between starting treatment and not.

Another problem is the opposite end of that scale: receiving false negatives. This can happen when small drusen become difficult to distinguish from the normal ageing retina.

Limitations of CNNs in Medical Imaging

The biggest challenge in using any neural network is training it. Training is time-consuming and difficult, and large datasets that improve a model’s accuracy are hard to find.

The same problem exists for CNNs, because medical images to train the models are especially hard to find. Patient data privacy is a concern, and it takes experts significant time to evaluate and verify the medical images.

A current solution to this problem is collecting more data through crowdsourcing and using clinical reports.

A more AI-based solution is to maximize the performance of a CNN using a small dataset. The model then becomes less time-consuming to train and can be applied to larger datasets as its accuracy increases.

Here are 2 ways to do that right now:

DATA AUGMENTATION: Adding new data that’s artificially crafted based on the existing data being used to train the CNN.
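
A minimal sketch of data augmentation using simple flips and rotations (the tiny 2x2 “image” is a stand-in for a real scan):

```python
import numpy as np

# One labelled training image (a tiny stand-in for a retinal scan).
image = np.array([[1, 2],
                  [3, 4]])

# Simple augmentations: flips and rotations create new, artificial samples
# whose label stays valid (a mirrored retina still shows the same drusen).
augmented = [
    np.fliplr(image),       # mirror left-right
    np.flipud(image),       # mirror top-bottom
    np.rot90(image),        # rotate 90 degrees
    np.rot90(image, k=2),   # rotate 180 degrees
]
# One original image has become five training samples.
```

Real pipelines add more transformations (small shifts, brightness changes, crops), but the principle is the same: stretch a small dataset further without collecting new scans.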

TRANSFER LEARNING: This is when we use a model that’s already trained on a larger dataset and apply it to a smaller one. You’re basically taking the knowledge the model gained completing one task and using it to help complete another, similar task.

CNNs + Medical Imaging → Improved Healthcare

Using CNNs for medical imaging might be the cheat code to faster healthcare. If physicians and radiologists no longer had to spend time analyzing MRIs, X-rays, and CT scans themselves, we could speed up the process of medical imaging tenfold.

It might also be the cheat code to increase accuracy. With a 1:385+ doctor to patient ratio, there are bound to be medical errors. If we can train the CNN models to be more accurate and faster than physicians, disease detection and diagnosis would become as simple as taking a picture and getting results in seconds.

Either way, the key to expanding the use of CNNs in healthcare is figuring out exactly how to maximize accuracy with minimal datasets.

Thanks for reading this article! Stay tuned for future articles and be sure to follow me on Medium to get notified of future posts.
