Generated image: Adobe Firefly

I made an AI Model that can predict if you have a Brain Tumor

Walking you through my code for a Convolutional Neural Network step-by-step

Alex Mathew


What do famous actor Mark Ruffalo…

Mark Ruffalo, Image Credits: The Guardian

…and former President Jimmy Carter have in common?

Jimmy Carter, Image Credits: “

They both survived brain tumors!

These two cases are extremely rare, and both men had access to some of the best neurologists and oncologists in the entire world. What about people who don’t have access to these kinds of doctors? How can we help treat these kinds of cancer?

How can AI help detect brain cancer when people don’t have access to world-renowned doctors or the resources that could help them?

Table of Contents

  1. Background
  2. Making the model
  3. The Steps of Building this Model
  4. Conclusion


Have you ever wondered if a computer can predict your health? In my latest project, I replicated an AI model that can predict whether an MRI image contains a brain tumor or not!

Why does it matter? Can’t a neuroscientist just read these MRIs with ease?

While having a neuroscientist interpret MRI images is valuable and important, there are several reasons why AI can complement and enhance this process rather than rendering it pointless:

  1. Speed and Efficiency: AI can analyze a large number of MRI images quickly, which is especially crucial in cases where time is of the essence, such as emergency situations or when dealing with a high volume of patient scans.
  2. Consistency: AI models provide consistent results, reducing the potential for human errors or variations in interpretation that can occur even among experienced neuroscientists.
  3. Accessibility: In regions with a shortage of specialized healthcare professionals, AI can act as a valuable first-line screening tool, helping identify potential cases for further expert evaluation.
  4. Data Enhancement: AI can extract and analyze data from MRI images at a granular level, potentially revealing patterns or insights that may not be immediately apparent to human observers. This can aid in early diagnosis and treatment planning.
  5. Research and Learning: AI can assist neuroscientists by quickly sifting through large datasets, allowing them to focus on in-depth analysis, research, and treatment planning.

Essentially, AI is not meant to replace neuroscientists but to work alongside them, augmenting their capabilities and improving the overall quality and efficiency of brain tumor diagnosis and treatment. It’s a valuable tool that can enhance the field of neurology and healthcare as a whole.

Breaking down the jargon

  • Python notebook: a way to run Python code cells one at a time to experiment and debug
  • CPU vs GPU: central processing unit vs graphics processing unit. A CPU handles general-purpose computing tasks, whereas a GPU is built to process large blocks of data in parallel, which suits both rendering graphics and training neural networks. Many laptops (like mine) don’t have a GPU that deep-learning libraries can use, so they fall back on the CPU.
  • Kernels: a small matrix of numbers to slide over an image. As it moves across the image it applies its pattern of numbers to the image’s pixels, doing a bit of math at each spot. This changes the original image by adding effects on it through convolutions.
  • Tensor: mathematically, an algebraic object that describes a multilinear relationship between sets of algebraic objects related to a vector space. Simply, a multidimensional array that can be used for storing, representing, and changing data.
  • Convolutions: simply put, a math operation where 2 functions are combined to produce a 3rd one. In this case, it’s a math operation that combines an image with a kernel to extract certain features or apply effects to an image.
  • Neural networks: a computing system loosely inspired by the human brain. It’s made up of nodes/neurons, each of which takes in data, applies a mathematical function to it, and passes the result on to the next layer. They are great for recognizing patterns and making predictions based on big datasets, like identifying images!
  • Convolutional neural network (CNN): a type of neural network that’s particularly good at processing data with a grid-like structure (i.e. images). It uses a technique called convolution in its layers to adaptively learn spatial hierarchies of features from input images. This makes it really effective for tasks like image recognition, object detection, and even playing a role in video analysis. The convolution layers can capture patterns like edges, textures, and other visual elements, which are then used to understand and classify the images.
  • Channels: the color components of an image (usually 3 channels: RGB, red, green, blue). Each channel represents the intensity of that color in the image. In a CNN, every filter in the first convolutional layer looks at all of the channels at once: it has one slice of weights per channel and adds the per-channel results together, so information from the different colors is combined from the very first layer.
  • Layers: the building blocks that process input data. They’re like different stages of an assembly line. First is the input layer, which receives the data (takes in the pixel values of the image). Next are the hidden layers, which do the heavy lifting of learning from the data. One of the hidden layers in a CNN is the convolutional layer, which specifically processes image data. Last is the output layer, which provides the final result. Note that each layer contains the nodes that apply the mathematical operations to process data!
  • PyTorch vs. TensorFlow: open-source libraries used for building and training machine learning models. PyTorch was developed by Meta and is often considered the more beginner-friendly of the two, while Google’s TensorFlow has long been popular for production systems.
  • Data preprocessing: preparing and transforming raw data into a suitable format that makes it easier for ML algorithms to work with. This could be anything from resizing images to fixing the channels of the images.
  • Dataloader: an object that simplifies the process of loading data, allowing easy and efficient iteration over a dataset. It groups samples into batches, can load them in parallel with background worker processes, and can shuffle the data, which is important for training ML models.
  • Activation Functions: mathematical operations applied to the output of each neuron or node of a neural network. They introduce non-linearity to the model, allowing neural networks to learn and represent complex patterns in data. Activation functions determine whether a neuron should be activated or not based on its input, influencing the information flow through the network. Common activation functions include ReLU (Rectified Linear Unit), sigmoid, tanh, and softmax, each serving specific purposes in learning non-linear relationships, handling gradients during backpropagation, and enabling the network to make predictions.
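To make the "Kernels" and "Convolutions" entries above more concrete, here is a minimal pure-Python sketch of sliding a kernel over an image. The function name convolve2d, the tiny 4x4 image, and the 2x2 kernel are all made up for illustration. Like deep-learning libraries, this sketch applies the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for row in range(out_h):
        out_row = []
        for col in range(out_w):
            # Multiply the kernel with the patch under it, element by element, then sum.
            total = 0
            for i in range(k):
                for j in range(k):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output

# Hypothetical 4x4 grayscale "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]
# Hypothetical 2x2 kernel that responds to a dark-to-bright vertical edge.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The only non-zero values sit in the middle column, right where the image goes from dark to bright. That is the sense in which a kernel "extracts a feature".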

The Steps of Building this Model

Creating a machine learning model is not easy. There are a lot of complicated steps and components to it. I’m grateful to have used MLDawn’s free tutorial on building a brain tumor detector but I also added some spice to it by changing up the dataset used and some of the code.

Here was my process through building this AI model!

1. Importing the necessary libraries

import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader, ConcatDataset
import glob
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, accuracy_score
import cv2
import sys

You might be asking: Why is it necessary to import so many libraries and what do they even do?

Here’s why:

  1. They cut down development time by a long shot. These libraries contain pre-defined methods and algorithms that can be easily imported and used in code.
  2. Machine learning libraries are optimized for performance, which means they can help build and run models efficiently.
  3. Many ML libraries also have active communities so if you run into any errors or bugs, you can find people to help you out easily.

As for the libraries I used, here they are:

  • numpy: imports NumPy, a package for scientific computing in Python, known for its powerful N-dimensional array object (a fixed-size multidimensional array holding items of the same type). The “as np” part lets you type the shorter np when calling NumPy functions.
  • torch: imports PyTorch. In this case it’s mainly used for creating the actual model (CNN class). It provides a dynamic computational graph, making it easy to define, train, and deploy neural networks, and it allows us to express intricate model architectures with concise code. Its tensor-based operations and automatic differentiation streamline the training process, enabling efficient gradient-based optimization.
  • Dataset, DataLoader, and ConcatDataset: these specific classes are imported from PyTorch’s “data” utilities (torch.utils.data). Dataset is an abstract class for representing a dataset, DataLoader is for iterating over datasets, and ConcatDataset is for concatenating multiple datasets.
  • glob: a Python module for finding files and directories whose names match a specified pattern (here, it locates and lists all the MRI images).
  • pyplot: imports the pyplot interface from the matplotlib library for plotting graphs and visualizing data in a 2D format.
  • confusion_matrix and accuracy_score: performance-evaluation functions from scikit-learn, used to build the confusion matrix chart and to report the accuracy of the model.
  • cv2: imports OpenCV, a library of functions aimed at real-time computer vision (object-detection, reading images, transforming images, scaling).
  • sys: a Python module that provides access to variables and functions that interact with the Python interpreter itself (for example, sys.stdout can be used to print progress while training and testing).

ALL of these crucial libraries come together to create an efficient and effective AI model.
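As a small illustration of the glob entry above, here is a sketch of pattern matching on file names. It runs in a throwaway temporary folder, and the file names are invented for the example; they are not the files from my dataset.

```python
import glob
import os
import tempfile

# Build a throwaway folder with a few hypothetical files to search through.
with tempfile.TemporaryDirectory() as folder:
    for name in ["scan_1.jpg", "scan_2.jpg", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()

    # "*.jpg" matches every file name that ends in .jpg.
    pattern = os.path.join(folder, "*.jpg")
    matches = sorted(os.path.basename(path) for path in glob.iglob(pattern))

print(matches)  # ['scan_1.jpg', 'scan_2.jpg']
```

The text file is skipped because its name does not match the pattern. The same idea picks out only the JPEG images in a dataset folder.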

2. Reading the Images

tumor = []  # brains with a tumor
healthybrain = []  # brains with no tumor
for f in glob.iglob("/Users/alexandermathew/Downloads/Brain_Tumor_Classifier/bimages/yes/*.jpg"):  # every tumor image
    img = cv2.imread(f)  # read the image (OpenCV loads it in BGR order)
    img = cv2.resize(img, (128, 128))  # resize to 128x128 pixels
    b, g, r = cv2.split(img)  # split into three single-channel 128x128 images
    img = cv2.merge([r, g, b])  # merge them back together in RGB order
    tumor.append(img)  # add the image to the tumor list

for f in glob.iglob("/Users/alexandermathew/Downloads/Brain_Tumor_Classifier/bimages/no/*.jpg"):  # every healthy image
    img = cv2.imread(f)
    img = cv2.resize(img, (128, 128))
    b, g, r = cv2.split(img)
    img = cv2.merge([r, g, b])
    healthybrain.append(img)  # add the image to the healthy list

# turn both lists into NumPy arrays; .shape gives (number of images, rows, columns, channels)
healthybrain = np.array(healthybrain)
tumor = np.array(tumor)
All = np.concatenate((healthybrain, tumor))

This is what’s known as data preprocessing! And here is the dataset I used (a compilation of data sets from Kaggle that I handpicked).

Here you can see 2 lists being made (tumor and healthybrain).

glob iterates over all the JPEG files in each folder of the dataset, and cv2 reads each one, resizes it to 128x128, splits it into its 3 channels, and merges the channels back together in RGB order (OpenCV loads images as BGR). Each image is then appended to its respective list. In the final cell, the lists are converted into NumPy arrays (which are more efficient for handling large datasets), and a variable, All, is created to hold all the images as one unified NumPy array. This can be used for training a machine learning model that learns to tell apart MRI images of brains with and without a tumor.

This code handles data preparation: it puts every image into a uniform format, so the data is ready for further analysis or for input into a machine learning model for classification.
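The split-and-merge step above exists because OpenCV loads images in BGR order while Matplotlib expects RGB. Here is a minimal NumPy-only sketch of that channel swap, using a made-up 2x2 all-blue image instead of a real MRI scan.

```python
import numpy as np

# Hypothetical 2x2 image in OpenCV's BGR order: every pixel is pure blue.
bgr = np.zeros((2, 2, 3), dtype=np.uint8)
bgr[:, :, 0] = 255  # channel 0 is blue in BGR

# Reversing the last axis reorders the channels from BGR to RGB.
rgb = bgr[:, :, ::-1]

print(bgr[0, 0].tolist())  # [255, 0, 0] -> blue comes first
print(rgb[0, 0].tolist())  # [0, 0, 255] -> blue comes last, as RGB expects
```

OpenCV also has a built-in conversion for this, cv2.cvtColor(img, cv2.COLOR_BGR2RGB), which should give the same result as splitting and merging by hand.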

3. Visualizing MRI Images

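def plot_random(healthybrain, tumor, num=5):
    # pick num distinct random images from each class
    healthybrain_imgs = healthybrain[np.random.choice(healthybrain.shape[0], num, replace=False)]
    tumor_imgs = tumor[np.random.choice(tumor.shape[0], num, replace=False)]

    plt.figure(figsize=(16, 9))  # first figure: healthy brains
    for i in range(num):
        plt.subplot(1, num, i + 1)
        plt.title('healthy brain')
        plt.imshow(healthybrain_imgs[i])

    plt.figure(figsize=(16, 9))  # second figure: brains with tumors
    for i in range(num):
        plt.subplot(1, num, i + 1)
        plt.title('tumor')
        plt.imshow(tumor_imgs[i])

plot_random(healthybrain, tumor)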

These blocks of code are used to visualize the MRI images before we start to create the actual model.

The function plot_random has three parameters: healthybrain, tumor, and num (defaulting to 5). The function plots num random images from each of the two categories. Inside it, healthybrain_imgs and tumor_imgs each hold num random images from their respective arrays. np.random.choice randomly picks indices into the array, and because we pass replace=False, no index is picked twice, so each selected image is unique.

Once these images are selected, plt.figure creates a new figure with the specified size (16,9). Each for loop runs num times, each time creating a subplot (plt.subplot(1, num, i+1)) and displaying one of the randomly chosen healthybrain or tumor images in it. The images are shown with the titles 'healthy brain' and 'tumor'.

In the last line, calling plot_random draws the images on the figure that we made.
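Here is a small sketch of the index-picking trick that plot_random relies on. The array below is a made-up stand-in for the image arrays (10 tiny "images"), and the fixed seed is only there to make the sketch repeatable.

```python
import numpy as np

np.random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical stand-in for an image array: 10 "images" of shape 4x4x3.
images = np.arange(10 * 4 * 4 * 3).reshape(10, 4, 4, 3)

# Pick 5 distinct indices along the first axis, then use them to pull out 5 images.
indices = np.random.choice(images.shape[0], 5, replace=False)
sample = images[indices]

print(sample.shape)                 # (5, 4, 4, 3)
print(len(set(indices.tolist())))   # 5, because replace=False forbids repeats
```

One thing to keep in mind: with replace=False, asking for more images than the array holds raises a ValueError, so num has to stay at or below the size of the smaller class.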

Healthy brain images plotted
Brains with tumors plotted

These are images from our dataset after the preprocessing steps above, which put them into a uniform format that our model can read quickly and efficiently.

4. Creating PyTorch abstract dataset Class and MRI custom dataset class

class Dataset(object):  # abstract class representing a dataset

    def __getitem__(self, index):
        raise NotImplementedError

    def __len__(self):
        raise NotImplementedError

    def __add__(self, other):
        return ConcatDataset([self, other])  # needs ConcatDataset to concatenate two datasets

class MRI(Dataset):  # inheriting from the Dataset class
    def __init__(self):  # constructor

        tumor = []
        healthybrain = []
        # cv2 reads images in BGR format by default
        for f in glob.iglob("/Users/alexandermathew/Downloads/Brain_Tumor_Classifier/bimages/yes/*.jpg"):
            img = cv2.imread(f)
            img = cv2.resize(img, (128, 128))
            b, g, r = cv2.split(img)
            img = cv2.merge([r, g, b])
            # give the array the shape (channels, h, w); otherwise it would be (h, w, channels)
            # note: reshape regroups the values rather than swapping the axes
            img = img.reshape((img.shape[2], img.shape[0], img.shape[1]))
            tumor.append(img)

        for f in glob.iglob("/Users/alexandermathew/Downloads/Brain_Tumor_Classifier/bimages/no/*.jpg"):
            img = cv2.imread(f)
            img = cv2.resize(img, (128, 128))
            b, g, r = cv2.split(img)
            img = cv2.merge([r, g, b])
            img = img.reshape((img.shape[2], img.shape[0], img.shape[1]))
            healthybrain.append(img)

        # our images
        tumor = np.array(tumor, dtype=np.float32)
        healthybrain = np.array(healthybrain, dtype=np.float32)

        # our labels: 1 for tumor, 0 for healthy
        tumor_label = np.ones(tumor.shape[0], dtype=np.float32)
        healthybrain_label = np.zeros(healthybrain.shape[0], dtype=np.float32)

        # concatenate images and labels in the same order (tumor first)
        self.images = np.concatenate((tumor, healthybrain), axis=0)
        self.labels = np.concatenate((tumor_label, healthybrain_label))

    def __len__(self):
        return self.images.shape[0]  # how many images = length

    def __getitem__(self, index):
        sample = {'image': self.images[index], 'label': self.labels[index]}
        return sample

    def normalize(self):
        self.images = self.images / 255.0  # scale pixel values to the range 0 to 1

mri_dataset = MRI()
mri_dataset.normalize()

The class Dataset is an abstract class representing a dataset. It's meant to be a base class for other dataset classes. In this class we define 3 methods.

  • def __getitem__ is an abstract method that should be overridden in subclasses to retrieve a single item from the dataset at the given index.
  • def __len__ is another abstract method that should return the length of the dataset when overridden.
  • def __add__ is a regular (not abstract) method that enables concatenation of two Dataset objects using ConcatDataset, which combines datasets.

The MRI class inherits all the features of Dataset and creates a custom dataset for handling MRI images related to a brain tumor classification task. Let’s break it down.

Cell 1

  • class MRI(Dataset) indicates that the MRI class inherits from the Dataset class.
  • __init__ initializes the object when it's created (constructing it).
  • Inside the constructor, there are two loops using glob.iglob to iterate through the paths of MRI images in two directories (the set of images with a tumor and the ones without one).
  • For each image, it reads, resizes, and adjusts the color channels using OpenCV (cv2). The images are then reshaped to have channels first, a common format in PyTorch.
  • The images from both classes (tumor and healthybrain) are stored in separate lists (tumor and healthybrain).
  • The lists are then converted to NumPy arrays of type np.float32.
  • tumor_label and healthybrain_label are created as arrays of ones and zeros, respectively, to represent the labels for the tumor and healthy brain images.
  • The images and labels attributes of the class are created by concatenating the arrays of tumor and healthy brain images and labels using np.concatenate.
  • __len__(self) returns the length of the dataset, which is the total number of images. It's used by PyTorch's data loader during training.
  • __getitem__ is used to retrieve a specific sample from the dataset at the given index. It returns a dictionary containing the image and its corresponding label.
  • normalize normalizes the pixel values of the images by dividing them by 255.0, scaling them to a range of 0 to 1.

Cell 2

  • mri_dataset = MRI(): This line creates an instance of the MRI class, effectively calling the constructor and setting up the dataset.
  • mri_dataset.normalize(): Calls the normalize method to normalize the images.

In all, this class encapsulates the logic for handling MRI images, their labels, and the necessary methods for interaction with PyTorch’s data loading mechanisms. It prepares the data for training a machine learning model to classify brain images as either healthy or containing a tumor.
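One detail in the MRI class is worth a closer look: the images are moved to a channels-first shape with reshape. The sketch below compares reshape with np.transpose on a made-up 2x2 "image". It is a side-by-side illustration, not a change to the class above.

```python
import numpy as np

# Hypothetical tiny "image": 2 rows, 2 columns, 3 channels (height, width, channels).
img = np.arange(12).reshape(2, 2, 3)
red_channel = img[:, :, 0]  # [[0, 3], [6, 9]]

# transpose reorders the axes, so channel 0 is still the red channel.
transposed = np.transpose(img, (2, 0, 1))

# reshape keeps the values in their original memory order and only changes
# how they are grouped, so pixels and channels get mixed together.
reshaped = img.reshape(3, 2, 2)

print(transposed.shape, reshaped.shape)            # (3, 2, 2) (3, 2, 2)
print(np.array_equal(transposed[0], red_channel))  # True
print(np.array_equal(reshaped[0], red_channel))    # False
```

Both results have the same shape, which is why the difference is easy to miss. If you want each of the 3 slices to be a real color channel, np.transpose(img, (2, 0, 1)) is the usual way to get there.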

5. Creating a Dataloader

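names = {0: 'Healthy Brain', 1: 'Tumor'}
dataloader = DataLoader(mri_dataset, shuffle=True)
for i, sample in enumerate(dataloader):
    img = sample['image'].squeeze()  # drop the batch dimension of size 1
    # back to (h, w, channels) for plotting, undoing the reshape done in the MRI class
    img = img.reshape((img.shape[1], img.shape[2], img.shape[0]))
    plt.title(names[sample['label'].item()])
    plt.imshow(img)
    plt.show()
    if i == 5:  # stop after 6 samples
        break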

The cell uses PyTorch’s DataLoader, a tool that efficiently loads and processes batches of data. This is crucial for handling large datasets during machine learning model training.

Why we use a DataLoader:

  1. Data Shuffling: Shuffling the data (shuffle=True) ensures that the model sees a diverse set of samples in each epoch during training. This helps prevent the model from learning patterns based on the order of the data.
  2. Visualization of Samples: The loop through the DataLoader allows for the visualization of a few samples from the dataset. This is significant for a quick visual check of the loaded images and their corresponding labels.
  3. Labeling Information: The titles of the displayed images indicate whether each brain image is classified as a “Healthy Brain” (class 0) or “Tumor” (class 1). This provides context and allows for a visual confirmation of the labeling.
  4. Quality Check: By breaking the loop after visualizing a small number of samples (in this case, 6), it allows for a rapid quality check to ensure that the data is being loaded and processed correctly before moving on to model training.

Data Loader Creation:

  • names={0:'Healthy Brain', 1:'Tumor'} maps the class labels to corresponding names, indicating whether the brain is healthy or contains a tumor.
  • dataloader creates a PyTorch DataLoader for the mri_dataset. Passing shuffle=True makes it reshuffle the data every epoch, which is beneficial during training because the model sees the samples in a different order each time.

Iterating Through the DataLoader:

  • The for loop iterates through the batches provided by the data loader. Since no batch_size is given, each batch holds a single sample.
  • The image tensor is then extracted from the batch, and squeeze removes any singleton dimensions (here, the batch dimension of size 1).
  • The next line reshapes the image tensor to be compatible with plotting (height, width, channels).
  • plt.title sets the title of the plot based on the label of the current sample, using the names dictionary.
  • plt.imshow displays the image using Matplotlib.
  • The if statement breaks the loop after visualizing 6 samples (index 0 to 5), providing a quick look at a few images in the dataset.
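To get a feel for what shuffling and batching mean, here is a toy stand-in written in plain Python. It is not how PyTorch's DataLoader is implemented (the real one also stacks samples into tensors, among other things). The function iterate_batches and the 7-sample dataset are made up for this sketch.

```python
import random

def iterate_batches(samples, batch_size=1, shuffle=False, seed=None):
    """Toy stand-in for a data loader: yield lists of samples, optionally shuffled."""
    order = list(range(len(samples)))
    if shuffle:
        random.Random(seed).shuffle(order)  # a fresh random order when seed is None
    for start in range(0, len(order), batch_size):
        yield [samples[i] for i in order[start:start + batch_size]]

# Hypothetical dataset of 7 labelled samples.
dataset = [{"image": f"img_{i}", "label": i % 2} for i in range(7)]

batches = list(iterate_batches(dataset, batch_size=3, shuffle=True, seed=42))
print([len(batch) for batch in batches])  # [3, 3, 1]
```

Every sample still shows up exactly once per pass; only the order changes. The last batch is smaller because 7 does not divide evenly by 3.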

6. Creating the actual model

import torch.nn as nn
import torch.nn.functional as F

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()  # initialize the parent nn.Module
        # convolutional part: conv -> tanh -> average pooling, twice
        self.cnn_model = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=5),
            nn.Conv2d(in_channels=6, out_channels=16, kernel_size=5),
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=5))

        # fully connected part: 256 features in, 1 score out
        self.fc_model = nn.Sequential(
            nn.Linear(in_features=256, out_features=120),
            nn.Linear(in_features=120, out_features=84),
            nn.Linear(in_features=84, out_features=1))

    def forward(self, x):
        x = self.cnn_model(x)
        x = x.view(x.size(0), -1)  # flatten everything except the batch dimension
        x = self.fc_model(x)
        x = F.sigmoid(x)  # squash the score into the range 0 to 1

        return x

This code cell is the actual creation of the convolutional network class.


  • First, we import the necessary modules from the PyTorch library. torch.nn (imported as nn) is the neural network module, and torch.nn.functional (imported as F) provides access to various activation functions and other functional operations.
  • Within the class CNN, which inherits from nn.Module, two methods are defined: __init__ and forward.

Initializing Constructor:

  • __init__ is the constructor of the CNN class. It first initializes the parent nn.Module class by calling super.
  • Within __init__ we define the convolutional part/layers of the model as cnn_model, using nn.Sequential, a container module that lets you organize and run a series of neural network layers or operations in sequence. This part of the network uses convolutional layers to process images and identify patterns within them.

The Sequential container provides a more concise way to define a neural network, which makes the code easier to write and read. The layers are applied one after another, in the order they were added to the container.

  • Then we define the fully connected part/layers of our model as fc_model. In these layers every neuron is connected to every neuron in the previous layer, and they are responsible for learning global patterns and relationships from the features extracted by the convolutional layers. They sit at the end of the network and determine the output: whether the image contains a tumor or not.

The Math:

I. Convolutional Layers

  • Conv2d(in_channels=3, out_channels=6, kernel_size=5) creates a convolutional layer with 3 input channels (RGB images), 6 output channels (one per filter), and a kernel size of 5x5. This operation involves sliding each kernel over the image, multiplying element-wise, and summing to produce feature maps.
  • Tanh() applies the hyperbolic tangent activation function to the output of the convolution. This squashes the values into the range [-1, 1].
  • AvgPool2d(kernel_size=2, stride=5) adds an average pooling layer with a 2x2 window and a stride of 5. This reduces the spatial dimensions of the feature maps by taking the average of the values in each pooling window. Because the stride is larger than the window, the pixels between windows are skipped.
  • These layers are then repeated, except that the second Conv2d takes the 6 channels from the first as its input and produces 16 output channels.

II. Fully Connected Layers

  • view reshapes the output tensor x from the convolutional layers into a 2D tensor of shape (batch size, features). x.size(0) keeps the batch dimension as it is, and -1 tells PyTorch to work out the remaining dimension itself, which flattens each image’s feature maps into a single row.
  • Linear is used to create a fully connected linear layer. Each output feature is connected to each input feature by a weight, and a bias term is added. The first time we use it we have 256 input features and 120 output features. The second time, when another fully connected linear layer is added, we have 120 input features and 84 output features. The final layer has 84 input features and 1 output feature. This last layer is used in binary classification, where the output is a single value representing the probability of the positive class.
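You might wonder where the 256 input features of the first Linear layer come from. Here is a quick sketch of the arithmetic, assuming 128x128 input images, no padding, and the kernel sizes and strides listed above. The helper output_size is made up for this sketch; it is not a PyTorch function.

```python
def output_size(size, kernel_size, stride=1):
    """Side length after a convolution or pooling layer with no padding."""
    return (size - kernel_size) // stride + 1

size = 128                                                # input images are 128x128
after_conv1 = output_size(size, kernel_size=5)            # Conv2d(3, 6, kernel_size=5)
after_pool1 = output_size(after_conv1, kernel_size=2, stride=5)  # AvgPool2d(2, stride=5)
after_conv2 = output_size(after_pool1, kernel_size=5)     # Conv2d(6, 16, kernel_size=5)
after_pool2 = output_size(after_conv2, kernel_size=2, stride=5)  # AvgPool2d(2, stride=5)

# 16 channels come out of the last Conv2d, each of size after_pool2 x after_pool2.
flattened = 16 * after_pool2 * after_pool2

print(after_conv1, after_pool1, after_conv2, after_pool2, flattened)  # 124 25 21 4 256
```

So each image ends up as 16 feature maps of 4x4 values, and flattening them gives exactly the 256 features that the first fully connected layer expects. If you change the image size, this number changes too.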

III. Sigmoid Activation and Returning

  • sigmoid applies the sigmoid activation function to squash the output to the range [0, 1].
  • In the end we return x; the final output tensor after all the operations.
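To see what "squashing" looks like in numbers, here is a tiny sketch of sigmoid next to tanh, written with Python's math module instead of PyTorch. The three input scores are made up.

```python
import math

def sigmoid(x):
    """Squash any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical raw scores, like the ones coming out of the last Linear layer.
for score in [-4.0, 0.0, 4.0]:
    print(score, round(sigmoid(score), 3), round(math.tanh(score), 3))
# -4.0 0.018 -0.999
# 0.0 0.5 0.0
# 4.0 0.982 0.999
```

A strongly negative score lands near 0, a strongly positive one near 1, and a score of 0 lands exactly on 0.5. That is why 0.5 is a natural cutoff between "healthy" and "tumor".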

The “Forward” Method:

  • The forward method defines the forward pass of the neural network. It takes an input x and processes it through the layers defined in the constructor.
  • The input x is passed through the convolutional layers (cnn_model).
  • The output is flattened using view to prepare it for the fully connected layers.
  • The flattened output is passed through the fully connected layers (fc_model)
  • The final layer’s output is passed through the sigmoid activation function using F.sigmoid, converting the output to a probability between 0 and 1.

7. Evaluating a New-Born Neural Network

mri_dataset = MRI()
mri_dataset.normalize()
device = torch.device('cpu')
model = CNN().to(device)

def threshold(scores, threshold=0.50, minimum=0, maximum=1.0):
    x = np.array(list(scores))
    x[x >= threshold] = maximum  # scores at or above the cutoff become 1
    x[x < threshold] = minimum  # scores below the cutoff become 0
    return x

model.eval()  # evaluation mode
dataloader = DataLoader(mri_dataset, batch_size=32, shuffle=False)
outputs = []
y_true = []
with torch.no_grad():  # no gradients are needed for evaluation
    for D in dataloader:
        image = D['image'].to(device)
        label = D['label'].to(device)
        y_hat = model(image)
        outputs.append(y_hat.cpu().detach().numpy())  # store the model outputs
        y_true.append(label.cpu().detach().numpy())  # store the true labels

outputs = np.concatenate(outputs, axis=0).squeeze()
y_true = np.concatenate(y_true, axis=0).squeeze()
accuracy_score(y_true, threshold(outputs))

import seaborn as sns

cm = confusion_matrix(y_true, threshold(outputs))
ax = plt.subplot()
sns.heatmap(cm, annot=True, fmt='g', ax=ax)  # annot=True annotates cells, fmt='g' disables scientific notation
# labels and title
ax.set_xlabel('Predicted labels')
ax.set_ylabel('True labels')
ax.set_title('Confusion Matrix')

Confusion Matrix from our Model

plt.figure()
plt.plot(outputs)  # model score for each sample, in dataset order
plt.axvline(x=len(tumor), color='r', linestyle='--')  # boundary between tumor and healthy samples

Graphic Visualization of Model Outputs

This is the most involved part of the code, so let’s take it one step at a time.

I. Initialization

  • In the first cell we create the instance of the MRI dataset and normalize the images.
  • Then we set the device to CPU.
  • And instantiate the model, CNN, moving it to the CPU.

II. Threshold Function

  • The second cell defines a threshold function to convert model output scores into binary predictions.

III. Model Evaluation, Concatenating Outputs and True Labels

  • Sets the model to evaluation mode.
  • Creates a data loader for the MRI dataset.
  • Iterates through the data loader, making predictions and storing true labels and model outputs.
  • Concatenates the model outputs and true labels to create arrays for evaluation.

IV. Calculating Accuracy

  • Computes the accuracy score by comparing true labels with thresholded model outputs.
  • For this specific model, the accuracy comes out to around 95.6%!

V. Confusion Matrix Visualization

  • Uses Seaborn to create a heatmap of the confusion matrix, providing insights into model performance.
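To show what the threshold, accuracy, and confusion matrix steps are doing under the hood, here is a pure-Python sketch on six made-up scores. These numbers are invented for illustration and are not outputs of my model. The helper confusion_counts is hypothetical and only mimics the 2x2 layout scikit-learn uses for labels 0 and 1.

```python
def threshold(scores, cutoff=0.5):
    """Turn scores in [0, 1] into hard 0/1 predictions."""
    return [1 if score >= cutoff else 0 for score in scores]

def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]]: rows are true labels, columns are predictions."""
    counts = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        counts[truth][pred] += 1
    return counts

# Made-up scores and labels, purely for illustration (1 = tumor, 0 = healthy).
scores = [0.91, 0.40, 0.75, 0.10, 0.55, 0.30]
y_true = [1, 1, 1, 0, 0, 0]

y_pred = threshold(scores)
cm = confusion_counts(y_true, y_pred)
accuracy = (cm[0][0] + cm[1][1]) / len(y_true)  # correct predictions / all predictions

print(y_pred)  # [1, 0, 1, 0, 1, 0]
print(cm)      # [[2, 1], [1, 2]]
print(round(accuracy, 3))  # 0.667
```

The diagonal of the matrix holds the correct predictions, and the off-diagonal cells show the two kinds of mistakes: a healthy brain flagged as a tumor (top right) and a tumor that was missed (bottom left). For a medical tool, that second kind is the one to worry about most.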

VI. Graphical Visualization

  • In the final cell of this part of the code, we create a plot to provide a visual representation of the model outputs
  • By using pyplot we can create a graph with the x-axis corresponding to the index of the samples and the y-axis corresponding to the model output scores
  • The red dashed line indicates the boundary between tumor and healthy brain samples. It helps us understand how the model’s prediction aligns with the two classes and if there is a clear separation between them.

8. Visualizing Feature Maps of the Convolutional Filters and Overfitting

At the end of my code you might see two different sections that I haven’t mentioned yet: “Visualizing Feature Maps of the Convolutional Filters” and “Overfitting”.

In the first section, I used Python to visualize the feature maps of the convolutional filters in my CNN. This process helps us understand what patterns or features each layer of the network is capturing. By displaying the feature maps, we can gain insights into the hierarchical representation of information as it passes through the layers. This visualization aids in model interpretation and can be useful for debugging or refining the code.

In the second section, I focused on assessing whether the model is overfitting. Overfitting occurs when a model performs well on the training set but poorly on new, unseen data (the validation set). I prepared a validation set by splitting the data, and during training I monitored both the training and validation losses. By observing how the two losses change over time, we can identify signs of overfitting. A large gap between the training and validation losses may indicate overfitting, suggesting that the model is memorizing the training data instead of learning general patterns.
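One simple way to turn "watch the two loss curves" into code is sketched below. The function first_overfitting_epoch and its patience rule are my own illustration, and the loss values are made up; they are not results from this model.

```python
def first_overfitting_epoch(train_losses, val_losses, patience=2):
    """Return the first epoch (0-based) at which validation loss has risen for
    `patience` epochs in a row while training loss kept falling, or None."""
    streak = 0
    for epoch in range(1, len(val_losses)):
        val_rising = val_losses[epoch] > val_losses[epoch - 1]
        train_falling = train_losses[epoch] < train_losses[epoch - 1]
        streak = streak + 1 if (val_rising and train_falling) else 0
        if streak >= patience:
            return epoch
    return None

# Made-up loss curves, for illustration only (not results from this model).
train_losses = [0.70, 0.55, 0.42, 0.30, 0.21, 0.15]
val_losses = [0.72, 0.60, 0.52, 0.50, 0.56, 0.63]

print(first_overfitting_epoch(train_losses, val_losses))  # 5
```

Here the training loss keeps dropping, but the validation loss turns around after epoch 3. Two rising epochs in a row trip the check at epoch 5, which would be a reasonable point to stop training or to go back to an earlier version of the model.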


Image from NightCafe

Our brains are beautiful. If we don’t treat them properly, who knows what the world could become. AI models, specifically convolutional neural networks (CNNs), are revolutionizing the healthcare field and have already made a real dent in this world. I had so much fun making this model and I am so excited to see what I can make in the future. Thanks for reading!

I hope you enjoyed learning more about AI and reading this article!

Here is a video of me explaining this model:

If you want to reach out to me you can find me here:


