Intel Image Classification with PyTorch (Pt1)

Joshua Phuong Le
17 min read · Jan 9, 2023


Photo by Mateus Maia on Unsplash

I. INTRODUCTION

In this article, I present a simple implementation of the PyTorch framework for the image classification problem, using the Intel image dataset. I wrote a short article introducing this dataset and working through its image pre-processing steps here. This article extends that work to the model construction, training and inference steps, with the code refactored into a proper repository structure. You can also find the GitHub repository of this project below.

Note that I may update the repository, so the content of this article may not be 100% identical to the repository code, but the general approach remains the same.

You can check out part 2, where I improve on this article's performance with transfer learning.

II. PROJECT FOLDER ORGANIZATION

We will follow the project organization below:

  • data is where we put, you guessed it, our data. The raw training, testing, and prediction data are located in the subfolder 01_raw . Other outputs from our code, such as the intermediate annotation csv files, the trained models and the model performance reports, are also deposited in the data folder.
  • notebooks is where we store our experiment Jupyter notebook. The aim is to keep this notebook as clean as possible: no custom functions/classes. It is meant to import all necessary libraries and custom modules and carry out the experimental steps in as straightforward a fashion as possible.
  • src is where we define all our modules. config stores the necessary constants in a single place for ease of repeated use and future changes. model stores our CNN model class and dataset class, while preprocessing and postprocessing store utils files with helper functions to prepare data for training, obtain the performance results and export the trained model.
  • Last but not least, the virtual environment (created with Python 3.9, hence the name) is placed outside in the root folder as a common practice (this folder is not synced to GitHub; if you need help creating your own venv, refer to my other article below).
+---data
| +---01_raw
| +---02_intermediate
| +---03_model_input
| +---04_model
| +---05_model_output
| \---06_reporting
+---notebooks
+---src
| +---config
| +---model
| +---postprocessing
| \---preprocessing
\---venv39

III. DETAILS OF CUSTOM MODULES

Now let’s take a quick look at the custom modules.

1. Configuration Module

First of all, the contents of the config files are listed below. The first file, data_config.py , specifies the input shape of the images to be fed into the CNN model later, together with the training parameters: batch size and number of workers. Any revision to these constants can be made in this file and re-imported into other modules, making maintenance easier. The second file, loc_config.py , specifies the different folder locations for our experiment later.

# data_config.py
INPUT_WIDTH = 224
INPUT_HEIGHT = 224
INPUT_CHANNEL = 3
BATCH_SIZE = 32
NUM_WORKERS = 2

# loc_config.py
TRAIN_DATA_LOC = r'..\data\01_raw\seg_train'
TEST_DATA_LOC = r'..\data\01_raw\seg_test'
PRED_DATA_LOC = r'..\data\01_raw\seg_pred'
ANNOT_LOC = r'..\data\02_intermediate'
MODEL_SAVE_LOC = r'..\data\04_model'
REPORT_SAVE_LOC = r'..\data\06_reporting'
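Any module or notebook can then pull these constants in with a normal import. Here is a minimal sketch; the exact import paths depend on how you set up the src package, so treat them as an assumption:

# Hypothetical usage in another module or the experiment notebook;
# the paths assume src/config/data_config.py and src/config/loc_config.py are importable.
from config.data_config import INPUT_WIDTH, INPUT_HEIGHT, BATCH_SIZE
from config.loc_config import TRAIN_DATA_LOC, ANNOT_LOC

print(f'Training data at {TRAIN_DATA_LOC}, batch size {BATCH_SIZE}')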

2. Preprocessing Module

Secondly, the preprocessing module helps us prepare the image dataset. The build_annotation_dataframe function starts off by writing the column labels to a blank csv file, then goes through each folder in the specified directory (e.g., the training folder) and fetches the sub-folder names as class names. All images' full paths, together with their corresponding class names (and indices), are then written to the csv file row by row. It also returns the csv as an annotation dataframe for easy use in the custom Dataset class later, and saves this file in annot_location (we will use the constant ANNOT_LOC in the experiment notebook later).

The transform_bilinear function normalizes the input image by applying a standard set of image processing steps. You can customize it as needed.

import csv
import os

import pandas as pd
import PIL
from torchvision import transforms


def build_annotation_dataframe(image_location, annot_location, output_csv_name):
    # os.listdir returns a LIST of entry names (class folder names here) in the directory
    class_lst = os.listdir(image_location)
    class_lst.sort()  # IMPORTANT: fix the class order so class indices are reproducible
    with open(os.path.join(annot_location, output_csv_name), 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        # create column names
        writer.writerow(['file_name', 'file_path', 'class_name', 'class_index'])
        for class_name in class_lst:
            # join path components with exactly one directory separator
            class_path = os.path.join(image_location, class_name)
            # get the list of files in the class folder
            file_list = os.listdir(class_path)
            for file_name in file_list:
                # full path = image folder + class name + file name
                file_path = os.path.join(image_location, class_name, file_name)
                # write one row per image: path, class name and class index
                writer.writerow(
                    [file_name, file_path, class_name, class_lst.index(class_name)])
    return pd.read_csv(os.path.join(annot_location, output_csv_name))


def check_annot_dataframe(annot_df):
    # collect the unique (class_index, class_name) pairs for verification
    class_zip = zip(annot_df['class_index'], annot_df['class_name'])
    unique_list = list(set(class_zip))
    return unique_list


def transform_bilinear(output_img_width, output_img_height):
    image_transform = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
        transforms.Resize((output_img_width, output_img_height),
                          interpolation=PIL.Image.BILINEAR)
    ])
    return image_transform
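Since transform_bilinear is meant to be customized, here is one possible variant with light training-time augmentation. The random flip and its probability are my own illustration, not part of the repository:

import PIL
from torchvision import transforms

def transform_bilinear_augmented(output_img_width, output_img_height):
    # Same normalization and resize as transform_bilinear, plus a random
    # horizontal flip for simple training-time augmentation (illustrative only).
    return transforms.Compose([
        transforms.ToTensor(),
        transforms.RandomHorizontalFlip(p=0.5),
        transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
        transforms.Resize((output_img_width, output_img_height),
                          interpolation=PIL.Image.BILINEAR)
    ])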

3. Model Module

Next up, let’s look at the model module, containing the dataset.py , modelling_config.py and cnn_model.py files.

The dataset.py file is where we define the custom dataset class called IntelDataset , which inherits from the PyTorch Dataset class as explained in my previous article. I added another method, visualize , to output 10 random images of the dataset for illustration purposes. Finally, the function create_validation_dataset takes in the dataset object and splits it into a main dataset (to use as the training set) and a validation dataset, given a desired proportion. This lets us track the train-validation loss along the training epochs later and determine the stopping point to prevent over-fitting.

import random

import cv2
import matplotlib.pyplot as plt
import torch


class IntelDataset(torch.utils.data.Dataset):
    def __init__(self, annot_df, transform=None):
        self.annot_df = annot_df
        # root directory of images, leave "" if using the image path column in __getitem__
        self.root_dir = ""
        self.transform = transform

    def __len__(self):
        # return the length (number of rows) of the dataframe
        return len(self.annot_df)

    def __getitem__(self, idx):
        # use the image path column (index = 1) in the csv file
        image_path = self.annot_df.iloc[idx, 1]
        image = cv2.imread(image_path)  # read image with cv2
        # convert from BGR to RGB for matplotlib
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # use the class name column (index = 2) in the csv file
        class_name = self.annot_df.iloc[idx, 2]
        # use the class index column (index = 3) in the csv file
        class_index = self.annot_df.iloc[idx, 3]
        if self.transform:
            image = self.transform(image)
        # accessing an instance via index returns 3 outputs: image, class name, class index
        return image, class_name, class_index

    def visualize(self, number_of_img=10, output_width=12, output_height=6):
        plt.figure(figsize=(output_width, output_height))
        for i in range(number_of_img):
            # randrange excludes len(self.annot_df), avoiding an out-of-range index
            idx = random.randrange(len(self.annot_df))
            image, class_name, class_index = self.__getitem__(idx)
            ax = plt.subplot(2, 5, i + 1)  # create an axis
            # name the axis after the class name and index
            ax.title.set_text(class_name + '-' + str(class_index))
            if self.transform is None:
                plt.imshow(image)
            else:
                plt.imshow(image.permute(1, 2, 0))


def create_validation_dataset(dataset, validation_proportion):
    if (validation_proportion > 1) or (validation_proportion < 0):
        raise ValueError(
            'The proportion of the validation set must be between 0 and 1')
    dataset_size = int((1 - validation_proportion) * len(dataset))
    validation_size = len(dataset) - dataset_size
    print(dataset_size, validation_size)
    dataset, validation_set = torch.utils.data.random_split(
        dataset, [dataset_size, validation_size])
    return dataset, validation_set

The second file, modelling_config.py , defines helper functions for the modelling process. The first 3 functions are self-explanatory. The next function, model_prep_and_summary , moves the model to the default device (GPU or CPU) and prints the model summary using torchsummary.summary . This lets us check the input/output dimensions of the different layers in the model.

import torch
import torch.nn as nn
import torch.optim as optim
from torchsummary import summary

# adjust the import path to your package layout
from config.data_config import INPUT_CHANNEL, INPUT_WIDTH, INPUT_HEIGHT


def default_loss():
    return nn.CrossEntropyLoss()


def default_optimizer(model, learning_rate=0.001):
    return optim.Adam(model.parameters(), lr=learning_rate)


def get_default_device():
    """Pick the GPU if available, else the CPU"""
    if torch.cuda.is_available():
        return torch.device('cuda')
    return torch.device('cpu')


def model_prep_and_summary(model, device):
    """
    Move the model to the target device and print the model summary
    """
    model = model.to(device)
    print('Current device: ' + str(device))
    print('Is Model on CUDA: ' + str(next(model.parameters()).is_cuda))
    # Display the model summary with the expected input shape:
    summary(model, (INPUT_CHANNEL, INPUT_WIDTH, INPUT_HEIGHT))

Next, inside the cnn_model.py file, we define the CNN model by assembling layers with the nn.Module class. A few things to note:

  • The output dimension of the convolutional layers must be calculated to use as the input to the first FC layer. As the output shape of self.conv7 is 64*56*56, the final MaxPool layer simply halves W and H without affecting the number of channels (depth). Hence the input to the FC layer is 64*28*28, with self.fc14 = nn.Linear(64*28*28, 500) . I have attached a spreadsheet to help with this calculation in the repository, some calculations in the code comments, and a small sketch right after this list.
  • The output dimension of the last FC layer (fc16) must be equal to the number of classes you want to classify in the dataset. Here we have 6 classes.
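To verify the 64*28*28 figure yourself, the standard output-size formula for a convolution or pooling layer is floor((input + 2*padding - kernel)/stride) + 1. Here is a small sketch; the helper function is my own illustration, not part of the repository:

def conv_out(size, kernel, stride=1, padding=0):
    # floor((size + 2*padding - kernel) / stride) + 1
    return (size + 2 * padding - kernel) // stride + 1

size = 224  # the 3x3 convolutions with padding=1 leave H and W unchanged
size = conv_out(size, kernel=2, stride=2)  # maxpool1 -> 112
size = conv_out(size, kernel=2, stride=2)  # maxpool2 -> 56
size = conv_out(size, kernel=2, stride=2)  # maxpool3 -> 28
print(64 * size * size)  # 50176 = 64*28*28, the input size of fc14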
import torch.nn as nn
import torch.nn.functional as F


class MyCnnModel(nn.Module):
    def __init__(self):
        super(MyCnnModel, self).__init__()
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=16, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(
            in_channels=16, out_channels=16, kernel_size=3, padding=1)

        self.conv3 = nn.Conv2d(
            in_channels=16, out_channels=32, kernel_size=3, padding=1)
        self.conv4 = nn.Conv2d(
            in_channels=32, out_channels=32, kernel_size=3, padding=1)

        self.conv5 = nn.Conv2d(
            in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.conv6 = nn.Conv2d(
            in_channels=64, out_channels=64, kernel_size=3, padding=1)
        self.conv7 = nn.Conv2d(
            in_channels=64, out_channels=64, kernel_size=3, padding=1)

        # Define a max pooling layer to use repeatedly in the forward function.
        # The pooling layer reduces the spatial dimensions (H, W) of the input
        # volume for the next layers; it only affects width and height, not depth.
        self.maxpool = nn.MaxPool2d(kernel_size=2, stride=2)

        # output shape of maxpool3 is 64*28*28
        self.fc14 = nn.Linear(64*28*28, 500)
        self.fc15 = nn.Linear(500, 50)
        # output of the final FC layer = 6 = number of classes
        self.fc16 = nn.Linear(50, 6)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        # maxpool1 output shape is 16*112*112 (112 = (224-2)/2 + 1)
        x = self.maxpool(x)
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        # maxpool2 output shape is 32*56*56 (56 = (112-2)/2 + 1)
        x = self.maxpool(x)
        x = F.relu(self.conv5(x))
        x = F.relu(self.conv6(x))
        x = F.relu(self.conv7(x))
        # maxpool3 output shape is 64*28*28 (28 = (56-2)/2 + 1)
        x = self.maxpool(x)

        x = x.reshape(x.shape[0], -1)
        x = F.relu(self.fc14(x))
        x = F.relu(self.fc15(x))
        # pass training=self.training so dropout is active only in train mode
        x = F.dropout(x, 0.5, training=self.training)
        x = self.fc16(x)
        return x
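A quick sanity check of the architecture (my own sketch, not part of the repository) is to push a dummy batch through the model and confirm the output has one logit per class:

import torch

model = MyCnnModel()
dummy_batch = torch.randn(2, 3, 224, 224)  # a batch of 2 RGB images, 224x224
with torch.no_grad():
    logits = model(dummy_batch)
print(logits.shape)  # torch.Size([2, 6]): one logit per class for each image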

The next important function is train_model() . With the default number of epochs of 5, for each batch in each epoch, it sets the model to train mode for forward/backward propagation, loss calculation and parameter updating on the training dataset, then sets the model to eval mode to freeze the parameters and calculate the loss on the validation dataset. The function returns the trained model itself, together with the loss history on the training and validation datasets for us to investigate. To visualize this, I define the function visualize_training to plot the loss history.

import time

import torch
import matplotlib.pyplot as plt
import pandas as pd


def train_model(model, device, train_loader, val_loader, criterion, optimizer, num_epochs=5):
    model = model.to(device)
    train_result_dict = {'epoch': [], 'train_loss': [],
                         'val_loss': [], 'accuracy': [], 'time': []}

    for epoch in range(num_epochs):
        start_time = time.time()
        train_loss = 0.0
        correct = 0
        total = 0
        model.train()  # set the model to training mode, parameters are updated
        for i, data in enumerate(train_loader, 0):
            image, class_name, class_index = data
            image = image.to(device)
            class_index = class_index.to(device)
            optimizer.zero_grad()  # zero the parameter gradients
            outputs = model(image)  # forward propagation
            loss = criterion(outputs, class_index)  # loss calculation
            loss.backward()  # backward propagation
            optimizer.step()  # parameter update
            train_loss += loss.item()  # accumulate the minibatch loss
            _, predicted = torch.max(outputs.data, 1)
            total += class_index.size(0)
            correct += (predicted == class_index).sum().item()
        epoch_accuracy = round(float(correct)/float(total)*100, 2)

        # Evaluation on the validation set within the same epoch
        val_loss = 0.0
        model.eval()  # set the model to evaluation mode, parameters are frozen
        with torch.no_grad():  # no gradients needed for validation
            for i, data in enumerate(val_loader, 0):
                image, class_name, class_index = data
                image = image.to(device)
                class_index = class_index.to(device)
                outputs = model(image)
                loss = criterion(outputs, class_index)
                val_loss += loss.item()

        # print statistics once per epoch;
        # divide by the number of minibatches because loss.item() is the mean loss of one minibatch
        train_loss_result = round(train_loss / len(train_loader), 3)
        val_loss_result = round(val_loss / len(val_loader), 3)

        epoch_time = round(time.time() - start_time, 1)
        # add statistics to the dictionary:
        train_result_dict['epoch'].append(epoch + 1)
        train_result_dict['train_loss'].append(train_loss_result)
        train_result_dict['val_loss'].append(val_loss_result)
        train_result_dict['accuracy'].append(epoch_accuracy)
        train_result_dict['time'].append(epoch_time)

        print(f'Epoch {epoch+1} \t Training Loss: {train_loss_result} \t Validation Loss: {val_loss_result} \t Epoch Train Accuracy (%): {epoch_accuracy} \t Epoch Time (s): {epoch_time}')
    # return the trained model and the loss dictionary
    return model, train_result_dict


def visualize_training(train_result_dictionary):
    # Define the data
    df = pd.DataFrame(train_result_dictionary)
    x = df['epoch']
    data_1 = df['train_loss']
    data_2 = df['val_loss']
    data_3 = df['accuracy']

    # Create the plot
    fig, ax1 = plt.subplots(figsize=(7, 7))
    ax1.set_xlabel('Epoch')
    ax1.set_ylabel('Loss')
    ax1.plot(x, data_1, color='red', label='training loss')
    ax1.plot(x, data_2, color='blue', label='validation loss')

    # Add a twin axis for accuracy
    ax2 = ax1.twinx()
    ax2.plot(x, data_3, color='green', label='Training Accuracy')

    # Add a label and a combined legend
    ax2.set_ylabel('Accuracy')
    lines = ax1.get_lines() + ax2.get_lines()
    ax1.legend(lines, [line.get_label() for line in lines], loc='upper center')

    # Show the plot
    plt.show()

Finally for this long file, we have the inference code. The first function, infer() , works with a DataLoader and carries out inference for the whole dataset, while the second function, infer_single_image , works on a single image. Note that we have to apply the same pre-processing to the image as in the IntelDataset class's __getitem__() method.

import cv2
import matplotlib.pyplot as plt
import torch


def infer(model, device, data_loader):
    '''
    Calculate the predicted class indices of the data_loader with the trained model
    '''
    model = model.to(device)
    model.eval()  # freeze parameters and disable dropout for inference
    y_pred = []
    y_true = []
    with torch.no_grad():
        for data in data_loader:
            image, class_name, class_index = data
            image = image.to(device)
            class_index = class_index.to(device)
            outputs = model(image)
            outputs = (torch.max(torch.exp(outputs), 1)[1]).data.cpu().numpy()
            y_pred.extend(outputs)
            class_index = class_index.data.cpu().numpy()
            y_true.extend(class_index)
    return y_pred, y_true


def infer_single_image(model, device, image_path, transform):
    '''
    Calculate the predicted class index of a single image with the trained model
    '''
    # Prepare the image: same preprocessing as IntelDataset.__getitem__
    image = cv2.imread(image_path)  # read image with cv2
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image_transformed = transform(image)
    plt.imshow(image_transformed.permute(1, 2, 0))
    # add the batch dimension of 1 expected by the model
    image_transformed_sq = torch.unsqueeze(image_transformed, dim=0)

    # Inference
    model.eval()
    with torch.no_grad():
        image_transformed_sq = image_transformed_sq.to(device)
        output = model(image_transformed_sq)
    _, predicted_class_index = torch.max(output.data, 1)
    print(f'Predicted Class Index: {predicted_class_index}')
    return predicted_class_index

4. Post-processing Module

The final module contains helper functions in the utils.py file to assist us with the model outputs.

The first function takes in the outputs of the infer() function above, together with the list of class names, and performs the metrics calculation for accuracy, precision, recall and F1 for each of the 6 classes. It also outputs the overall accuracy and F1 scores.

Note that:

  • The use of class_names is for cosmetic purposes only, in the metrics presentation step.
  • We use plain torch tensor operations to compute the different metrics such as accuracy and F1, so the outputs are tensors. Hence, in order to present them in a dataframe, we need to convert them to the normal numpy format. A toy example of the confusion matrix convention follows this list.
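Before reading the full function, here is a toy 2-class example of the row/column convention it uses (true labels on rows, predictions on columns); the numbers are made up for illustration:

import torch

# Toy confusion matrix: true labels on rows, predicted labels on columns.
# Row 0: of 10 true class-0 images, 8 were predicted as 0 and 2 as 1.
cm = torch.tensor([[8., 2.],
                   [1., 9.]])
i = 0  # metrics for class 0
TP = cm[i, i]                  # 8 correct predictions
FP = torch.sum(cm[:, i]) - TP  # 1 image predicted as 0 but truly class 1
FN = torch.sum(cm[i, :]) - TP  # 2 class-0 images predicted as class 1
print(TP / (TP + FP))  # precision = 8/9
print(TP / (TP + FN))  # recall = 8/10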
import numpy as np
import pandas as pd
import torch


def calculate_model_performance(y_true, y_pred, class_names):
    num_classes = len(set(y_true + y_pred))
    # build the confusion matrix from the predictions and true class indices:
    # true label on the row, predicted label on the column
    confusion_matrix = torch.zeros(num_classes, num_classes)
    for i in range(len(y_pred)):
        confusion_matrix[y_true[i], y_pred[i]] += 1

    # PER-CLASS METRICS:
    # calculate accuracy, precision, recall, f1 for each class:
    accuracy = torch.zeros(num_classes)
    precision = torch.zeros(num_classes)
    recall = torch.zeros(num_classes)
    f1_score = torch.zeros(num_classes)
    for i in range(num_classes):
        # find TP, FP, FN, TN for each class:
        TP = confusion_matrix[i, i]
        # false positives: predicted as class i but truly another class (column sum minus TP)
        FP = torch.sum(confusion_matrix[:, i]) - TP
        # false negatives: truly class i but predicted as another class (row sum minus TP)
        FN = torch.sum(confusion_matrix[i, :]) - TP
        TN = torch.sum(confusion_matrix) - TP - FP - FN
        # calculate accuracy, precision, recall, f1 for each class:
        accuracy[i] = (TP+TN)/(TP+FP+FN+TN)
        precision[i] = TP/(TP+FP)
        recall[i] = TP/(TP+FN)
        f1_score[i] = 2*precision[i]*recall[i]/(precision[i]+recall[i])
    # calculate the support (number of true samples) for each class
    support = torch.sum(confusion_matrix, dim=1)
    # calculate the support proportion for each class
    support_prop = support/torch.sum(support)

    # OVERALL METRICS
    # calculate the overall accuracy:
    overall_acc = torch.sum(torch.diag(confusion_matrix))/torch.sum(confusion_matrix)
    # calculate the macro average F1 score:
    macro_avg_f1_score = torch.sum(f1_score)/num_classes
    # calculate the weighted average F1 score based on the support proportion:
    weighted_avg_f1_score = torch.sum(f1_score*support_prop)

    TP = torch.diag(confusion_matrix)
    FP = torch.sum(confusion_matrix, dim=0) - TP
    FN = torch.sum(confusion_matrix, dim=1) - TP
    TN = torch.sum(confusion_matrix) - (TP + FP + FN)

    # calculate the micro average F1 score based on TP, FP, FN
    micro_avg_f1_score = torch.sum(2*TP)/(torch.sum(2*TP)+torch.sum(FP)+torch.sum(FN))

    # METRICS PRESENTATION
    # performance for each class
    class_columns = ['accuracy', 'precision', 'recall', 'f1_score']
    class_data_raw = [accuracy.numpy(), precision.numpy(),
                      recall.numpy(), f1_score.numpy()]
    class_data = np.around(class_data_raw, decimals=3)
    df_class_raw = pd.DataFrame(
        class_data, index=class_columns, columns=class_names)
    class_metrics = df_class_raw.T

    # overall performance
    overall_columns = ['accuracy', 'f1_micro', 'f1_macro', 'f1_weighted']
    overall_data_raw = [overall_acc.numpy(), micro_avg_f1_score.numpy(),
                        macro_avg_f1_score.numpy(), weighted_avg_f1_score.numpy()]
    overall_data = np.around(overall_data_raw, decimals=3)
    overall_metrics = pd.DataFrame(
        overall_data, index=overall_columns, columns=['overall'])
    return confusion_matrix, class_metrics, overall_metrics

The final set of functions simply helps us save the trained model and the training report with the current timestamp (to avoid overwriting older files).

import datetime
import os

import pandas as pd
import torch

# adjust the import path to your package layout
from config.loc_config import MODEL_SAVE_LOC


def get_current_timestamp():
    now = datetime.datetime.now()
    return now.strftime("%Y%m%d_%H%M%S")


def save_model_with_timestamp(model, filepath=MODEL_SAVE_LOC):
    filename = get_current_timestamp() + '_cnn_model' + '.pt'
    filepath = os.path.join(filepath, filename)
    torch.save(model.state_dict(), filepath)
    print('Saved model to: ', filepath)


def save_csv_with_timestamp(train_result_dict, filepath=MODEL_SAVE_LOC):
    filename = get_current_timestamp() + '_training_report' + '.csv'
    filepath = os.path.join(filepath, filename)
    df = pd.DataFrame(train_result_dict)
    df.to_csv(filepath)
    print('Saved training report to: ', filepath)

IV. MODEL TRAINING

With all the necessary modules defined, our experiment notebook can be kept very simple, as follows:

1. Creating and Preprocessing Input Dataset

Firstly, we create the annotation dataframes and print the (class index, class name) pairs for verification. We can see that the pairs are consistent across the training and testing datasets.

train_df = build_annotation_dataframe(image_location = TRAIN_DATA_LOC, annot_location = ANNOT_LOC, output_csv_name = 'train.csv')
test_df = build_annotation_dataframe(image_location = TEST_DATA_LOC, annot_location = ANNOT_LOC, output_csv_name = 'test.csv')
class_names = list(train_df['class_name'].unique())
print(class_names)
print(check_annot_dataframe(train_df))
print(check_annot_dataframe(test_df))
Output:
['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']
[(5, 'street'), (1, 'forest'), (4, 'sea'), (0, 'buildings'), (2, 'glacier'), (3, 'mountain')]
[(5, 'street'), (1, 'forest'), (4, 'sea'), (0, 'buildings'), (2, 'glacier'), (3, 'mountain')]

Secondly, we call the image transformation function and apply it to the training, validation (a 4:1 split with validation_proportion = 0.2) and testing datasets, which are created with the custom class IntelDataset . We also print out the sizes of these datasets for checking.

image_transform = transform_bilinear(INPUT_WIDTH, INPUT_HEIGHT)
main_dataset = IntelDataset(annot_df = train_df, transform=image_transform)
train_dataset, validation_dataset = create_validation_dataset(main_dataset, validation_proportion = 0.2)
print('Train set size: ', len(train_dataset))
print('Validation set size: ', len(validation_dataset))

test_dataset = IntelDataset(annot_df = test_df, transform=image_transform)
print('Test set size: ', len(test_dataset))
Output:
Train set size: 11227
Validation set size: 2807
Test set size: 3000

2. Configuring the Dataloaders

The next important step is to configure the DataLoaders from the datasets above. Their main purpose is to provide an iterable over these datasets according to the batch size. It is also necessary to shuffle the images, especially in the training set, so that each training pass sees the image classes in a different order, which improves the model's ability to generalize. The use of num_workers is well explained in the article linked below, but it is not a focus of this project and you can optimize it further.

train_loader = DataLoader(train_dataset, batch_size = BATCH_SIZE, shuffle=True, num_workers = NUM_WORKERS)
val_loader = DataLoader(validation_dataset, batch_size = BATCH_SIZE, shuffle=True, num_workers = NUM_WORKERS)
test_loader = DataLoader(test_dataset, batch_size = BATCH_SIZE, shuffle=True, num_workers = NUM_WORKERS)
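As a quick check (my own sketch) that the loaders are wired up correctly, we can pull a single batch and inspect the tensor shapes:

# Fetch one batch from the training loader (illustrative sketch).
images, batch_class_names, batch_class_indices = next(iter(train_loader))
print(images.shape)               # torch.Size([32, 3, 224, 224]) with BATCH_SIZE = 32
print(batch_class_indices.shape)  # torch.Size([32])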

3. Model Training and Exporting

Finally, we come to the training part. We simply initialize the model with the architecture defined previously, use the default loss function and optimizer, and set the number of epochs (10 in this trial). We can see that we are dealing with quite a sizable model, with more than 25 million trainable parameters (mostly from the first FC layer, as expected). The Output Shape column helps with understanding the layer output dimensions.

# initiation
model = cnn_model.MyCnnModel()
device = modelling_config.get_default_device()
modelling_config.model_prep_and_summary(model, device)
criterion = modelling_config.default_loss()
optimizer = modelling_config.default_optimizer(model = model)
num_epochs = 10

# get training results
trained_model, train_result_dict = cnn_model.train_model(model, device, train_loader, val_loader, criterion, optimizer, num_epochs)
cnn_model.visualize_training(train_result_dict)
Current device: cpu
Is Model on CUDA: False
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 16, 224, 224] 448
Conv2d-2 [-1, 16, 224, 224] 2,320
MaxPool2d-3 [-1, 16, 112, 112] 0
Conv2d-4 [-1, 32, 112, 112] 4,640
Conv2d-5 [-1, 32, 112, 112] 9,248
MaxPool2d-6 [-1, 32, 56, 56] 0
Conv2d-7 [-1, 64, 56, 56] 18,496
Conv2d-8 [-1, 64, 56, 56] 36,928
Conv2d-9 [-1, 64, 56, 56] 36,928
MaxPool2d-10 [-1, 64, 28, 28] 0
Linear-11 [-1, 500] 25,088,500
Linear-12 [-1, 50] 25,050
Linear-13 [-1, 6] 306
================================================================
Total params: 25,222,864
Trainable params: 25,222,864
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 25.65
Params size (MB): 96.22
Estimated Total Size (MB): 122.44
----------------------------------------------------------------

The training history is plotted below. Here we can see the progression of the classification losses on the training and validation sets. The point of over-fitting occurs at epoch 4, after which the validation loss starts to creep up while the training loss keeps decreasing. Hence, epoch 5 could be a good point to stop the training in our next trial. Finally, we save the models trained on 10 and 5 epochs to the pre-determined location.

save_model_with_timestamp(trained_model, MODEL_SAVE_LOC)
Training progress for 10 epochs
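Looking at the validation loss curve above, one way to automate this stopping decision is a simple patience rule. The sketch below is hypothetical (the repository's train_model always runs for the full num_epochs); it replays the recorded loss history to find where training would have stopped:

# Hypothetical early-stopping check over train_model's recorded statistics;
# 'patience' and the stopping rule are my own illustration.
best_val_loss = float('inf')
epochs_without_improvement = 0
patience = 2

for epoch, val_loss in zip(train_result_dict['epoch'], train_result_dict['val_loss']):
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        epochs_without_improvement = 0
    else:
        epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            print(f'Would have stopped after epoch {epoch}')
            break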

V. MODEL PERFORMANCE

Finally, we will import the 2 models and check their performance on the test set. The reason for exporting/importing instead of using the trained models directly in the same notebook is personal preference: when restarting my work, I can just run this portion of the notebook, import the pre-trained models and proceed without retraining, which would take way too long on my poor laptop.

Here, the 10-epoch model is the first model in the saved location. We use the model class's built-in .load_state_dict method to load the model blueprint with the pre-trained parameters. Finally, we call the custom infer() function explained previously on the test DataLoader, and use the calculate_model_performance function to see the metrics.

The overall accuracy is 68.4%. We can see more details in the confusion matrix, where the diagonal values represent the correct predictions. For example, glaciers are most confused with mountains: 136 glacier images are classified as mountains.

trained_model_list = os.listdir(MODEL_SAVE_LOC)
MODEL_10_EPOCH_PATH = os.path.join(MODEL_SAVE_LOC, trained_model_list[0])
MODEL_10_EPOCH = cnn_model.MyCnnModel()
device = modelling_config.get_default_device()
print(MODEL_10_EPOCH_PATH)
MODEL_10_EPOCH.load_state_dict(torch.load(MODEL_10_EPOCH_PATH))

# check accuracy on test set
y_pred, y_true = cnn_model.infer(model = MODEL_10_EPOCH, device = device, data_loader = test_loader)
confusion_matrix, class_metrics, overall_metrics = calculate_model_performance(y_true, y_pred, class_names = class_names)

print(confusion_matrix)
print(class_metrics)
print(overall_metrics)
Output:
tensor([[325., 15., 37., 47., 46., 76.],
[ 14., 425., 4., 3., 7., 29.],
[ 5., 0., 270., 51., 51., 7.],
[ 2., 5., 136., 366., 116., 1.],
[ 14., 0., 89., 55., 283., 4.],
[ 77., 29., 17., 3., 7., 384.]])
accuracy precision recall f1_score
buildings 0.889 0.595 0.744 0.661
forest 0.965 0.882 0.897 0.889
glacier 0.868 0.703 0.488 0.576
mountain 0.860 0.585 0.697 0.636
sea 0.870 0.636 0.555 0.593
street 0.917 0.743 0.766 0.754
overall
accuracy 0.684
f1_micro 0.684
f1_macro 0.685
f1_weighted 0.681

Let’s take a look at the model trained with 5 epochs. We can see that its accuracy is slightly higher even though the training time is halved. This is expected: the model starts over-fitting after the 4th epoch, so its ability to generalize worsens with more epochs, and the 10-epoch model performs worse on the test set.

tensor([[309.,  23.,  16.,  21.,  12.,  49.],
[ 5., 383., 0., 3., 1., 9.],
[ 8., 0., 373., 67., 63., 7.],
[ 6., 3., 24., 233., 34., 0.],
[ 23., 2., 130., 200., 392., 12.],
[ 86., 63., 10., 1., 8., 424.]])
accuracy precision recall f1_score
buildings 0.917 0.719 0.707 0.713
forest 0.964 0.955 0.808 0.875
glacier 0.892 0.720 0.675 0.697
mountain 0.880 0.777 0.444 0.565
sea 0.838 0.516 0.769 0.618
street 0.918 0.716 0.846 0.776
overall
accuracy 0.705
f1_micro 0.705
f1_macro 0.707
f1_weighted 0.704

VI. TESTING ON A SINGLE IMAGE

Finally, let’s use the 5-epoch model to infer a single image. We simply call the infer_single_image() function from the cnn_model module with the appropriate arguments. In the code below, a random image is picked from the PRED_DATA_LOC folder.

image_list = os.listdir(PRED_DATA_LOC)
random_image = random.choice(image_list)
random_image_path = os.path.join(PRED_DATA_LOC, random_image)
print(random_image_path)

predicted_class_index = cnn_model.infer_single_image(
    model=MODEL_5_EPOCH,
    device=device,
    image_path=random_image_path,
    transform=image_transform)
print(class_names[predicted_class_index])
Output:
..\data\01_raw\seg_pred\11904.jpg
Predicted Class Index: tensor([0])
buildings

The chosen image (11904.jpg) is shown below, and it is predicted as the buildings class, which is correct.

11904.jpg

VII. CONCLUSION

In summary, this article is my attempt to implement PyTorch for image classification in a more organized, project-based fashion. Overall, the performance is not good, but not too terrible either. There are ways to improve it, most practically through transfer learning, which I will implement in the next article.

Thank you for reading and I welcome any feedback and tips.


Joshua Phuong Le

I’m a data scientist having fun writing about my learning journey. Connect with me at https://www.linkedin.com/in/joshua3112/