Implementing Faster R-CNN and YOLOv8 for the Helmet Detection Project: The Coding Aspect

Ayush Raj
15 min read · Feb 21, 2024

This is the 5th installment of the series. To catch up, you can always go back to the Series Introductory Blog.

Dataset used during the Whole Project

For the dataset, we used one that is publicly available on Kaggle. Here’s the link:

This dataset contains 5000 images with bounding box annotations in the PASCAL VOC format (x_min, y_min, x_max, y_max) for these 3 classes:

[helmet, head & person]. We mainly care about predicting helmet, because the data is heavily imbalanced.

This picture demonstrates the imbalance in the dataset among the classes — Image by Author

As you can clearly see from the histogram above, almost 75% of the occurrences in the dataset belong to the helmet class, which is not a good thing.
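If you want to check the balance on your own copy of the data, you can count the class occurrences straight from the annotation files. Here’s a minimal sketch, assuming each image comes with a PASCAL VOC XML annotation that has one `<object><name>` entry per box. The `count_classes` helper and the sample XML are made up for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def count_classes(xml_strings):
    """Count how many boxes of each class appear across VOC annotations."""
    counts = Counter()
    for xml_text in xml_strings:
        root = ET.fromstring(xml_text)
        for obj in root.iter("object"):
            counts[obj.findtext("name")] += 1
    return counts


# A tiny hand-written annotation standing in for a real file
sample = """<annotation>
  <object><name>helmet</name></object>
  <object><name>helmet</name></object>
  <object><name>head</name></object>
</annotation>"""

counts = count_classes([sample])
total = sum(counts.values())
for name, n in counts.most_common():
    print(f"{name}: {n} ({100 * n / total:.1f}%)")
```

On the real dataset you would read every XML file from disk and pass the contents to `count_classes`.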

This dataset was never our first choice, but it was the only decent one we could find on the Internet.

And this will still work, because our main purpose is to detect helmets, for which there are sufficient data samples. If you want to fix the imbalance, I can suggest some measures. The dataset is not properly sampled, so you can use re-sampling techniques, primarily undersampling and oversampling. But undersampling should not be used here: you don’t want fewer occurrences of helmet. Remember, the more data samples, the better the results!

Additionally, there are more advanced resampling techniques, such as SMOTE (Synthetic Minority Over-sampling Technique), ADASYN (Adaptive Synthetic Sampling), and others, which generate synthetic samples for the minority class based on their nearest neighbors. These methods aim to create more diverse and representative synthetic samples than simple duplication. Keep in mind that they were designed for tabular feature vectors, so they don’t carry over directly to images with bounding boxes.
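Plain oversampling is the easiest of these to try on a detection dataset. The sketch below is one possible way to do it, not what we used in the project: it repeats every image that contains at least one minority-class box. The `oversample_minority` helper, the `factor` value and the toy `dataset` are assumptions for illustration.

```python
import random


def oversample_minority(image_labels, minority=("head", "person"), factor=2, seed=0):
    """Repeat each image that contains a minority class `factor` times in total."""
    rng = random.Random(seed)
    resampled = []
    for image_id, labels in image_labels:
        repeats = factor if set(minority) & set(labels) else 1
        resampled.extend([image_id] * repeats)
    rng.shuffle(resampled)  # avoid having the duplicates sit next to each other
    return resampled


# (image id, classes present in that image)
dataset = [
    ("img0", ["helmet"]),
    ("img1", ["helmet", "head"]),
    ("img2", ["person"]),
]
train_ids = oversample_minority(dataset)
print(sorted(train_ids))
```

Since an image with a head or a person often contains helmets too, this shrinks the imbalance rather than removing it.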

We’ve partitioned the whole dataset into train, val & test sets in the proportion 8:1:1 for all our projects. That gives 4000, 500 and 500 images respectively.
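A split like this can be done in a few lines. This is a minimal sketch, assuming you have a list of image ids; the `split_dataset` helper, the seed and the placeholder ids are mine, not part of the project code.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=42):
    """Shuffle the items and cut them into train/val/test by the given ratios."""
    items = list(items)
    random.Random(seed).shuffle(items)
    total = sum(ratios)
    n_train = len(items) * ratios[0] // total
    n_val = len(items) * ratios[1] // total
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # whatever is left goes to the test set
    return train, val, test


image_ids = [f"img_{i}" for i in range(5000)]  # placeholder ids
train, val, test = split_dataset(image_ids)
print(len(train), len(val), len(test))
```

Fixing the seed keeps the three sets identical across runs, which matters when you compare different models on the same test set.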

So, let’s jump into the projects right away! This is the section you’ve all been waiting for since we started this Object Detection blog series.

We’ve implemented all the projects using the PyTorch framework.

Project 1

We’ve used a pre-trained Faster R-CNN model with a ResNet50 backbone, fine-tuned on this dataset’s train split.

Let’s see the code snippets for training the model:

#### Function Defn for Training Iterations

# function for running training iterations
import torch
from tqdm import tqdm

def train(train_data_loader, model):
    print('Training...')
    # optimizer, DEVICE and train_loss_hist are read from the notebook's global scope
    global train_itr
    global train_loss_list

    # initialize tqdm progress bar
    prog_bar = tqdm(train_data_loader, total=len(train_data_loader))

    for i, data in enumerate(prog_bar):
        optimizer.zero_grad()  # clear the gradients from the previous iteration
        images, targets = data

        # move the images and the target dicts (boxes, labels) to the device
        images = list(image.to(DEVICE) for image in images)
        targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]

        loss_dict = model(images, targets)  # in train mode the model returns a dict of losses
        losses = sum(loss for loss in loss_dict.values())  # total loss
        loss_value = losses.item()
        train_loss_list.append(loss_value)
        train_loss_hist.send(loss_value)
        losses.backward()  # backpropagate
        optimizer.step()  # update the weights
        train_itr += 1

        # update the loss value beside the progress bar for each iteration
        prog_bar.set_description(desc=f"Loss: {loss_value:.4f}")
    return train_loss_list

##### Function Defn for Validation Iterations

# The validation function returns a similar list containing the loss values
# for all the completed iterations

# function for running validation iterations
def validate(valid_data_loader, model):
    print('Validating...')
    global val_itr
    global val_loss_list

    # initialize tqdm progress bar
    prog_bar = tqdm(valid_data_loader, total=len(valid_data_loader))

    for i, data in enumerate(prog_bar):
        images, targets = data

        images = list(image.to(DEVICE) for image in images)
        targets = [{k: v.to(DEVICE) for k, v in t.items()} for t in targets]

        # no gradients are needed here; the model stays in train mode because
        # torchvision detection models only return the losses in that mode
        with torch.no_grad():
            loss_dict = model(images, targets)
        losses = sum(loss for loss in loss_dict.values())
        loss_value = losses.item()
        val_loss_list.append(loss_value)
        val_loss_hist.send(loss_value)
        val_itr += 1

        # update the loss value beside the progress bar for each iteration
        prog_bar.set_description(desc=f"Loss: {loss_value:.4f}")
    return val_loss_list

#### Averager class for logging training & validation losses.
#### Then we save the model for the best epoch we get as we progress.
# We save this because it is the one on which we'll evaluate
# our test set for results. Obviously, you want the model which has
# the lowest validation loss.
#### We also save the last model as a backup in case of
# unexpected interruptions or errors during training,
# ensuring that we don't lose valuable progress.

class Averager:
    def __init__(self):
        self.current_total = 0.0
        self.iterations = 0.0

    def send(self, value):
        self.current_total += value
        self.iterations += 1

    @property
    def value(self):
        if self.iterations == 0:
            return 0
        else:
            return 1.0 * self.current_total / self.iterations

    def reset(self):
        self.current_total = 0.0
        self.iterations = 0.0

class SaveBestModel:
    """
    Class to save the best model while training. If the current epoch's
    validation loss is less than the previous lowest loss, then save the
    model state.
    """
    def __init__(self, best_valid_loss=float('inf')):
        self.best_valid_loss = best_valid_loss

    def __call__(self, current_valid_loss, epoch, model, optimizer):
        if current_valid_loss < self.best_valid_loss:
            self.best_valid_loss = current_valid_loss
            print(f"\nBest validation loss: {self.best_valid_loss:.3f}")
            print(f"\nSaving best model for epoch: {epoch+1}\n")
            # note: scheduler is read from the notebook's global scope
            torch.save({
                'epoch': epoch+1,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'scheduler_state_dict': scheduler.state_dict(),
                }, '/content/drive/My Drive/helmet_dataset/savedmodel/best_model.pth')


# function to save the model after each epoch and after training ends
def save_model(epoch, model, optimizer):
    """
    Function to save the trained model till the current epoch, or whenever called
    """
    # note: scheduler is read from the notebook's global scope
    torch.save({
        'epoch': epoch+1,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'scheduler_state_dict': scheduler.state_dict(),
        }, '/content/drive/My Drive/helmet_dataset/savedmodel/last_model.pth')

# The Training Loop

# initialize the Averager class
train_loss_hist = Averager()
val_loss_hist = Averager()
train_itr = 1
val_itr = 1

# train and validation loss lists to store loss values of all
# iterations till the end and plot graphs for all iterations
train_loss_list = []
val_loss_list = []

# initialize SaveBestModel class
save_best_model = SaveBestModel()

NUM_EPOCHS = 16  # number of epochs to train for
epoch = 0

# start the training epochs
for epoch in range(epoch, NUM_EPOCHS):
    print(f"\nEPOCH {epoch+1} of {NUM_EPOCHS}")

    # reset the training and validation loss histories for the current epoch
    train_loss_hist.reset()
    val_loss_hist.reset()

    train_loss = train(train_loader, model)
    val_loss = validate(valid_loader, model)

    print(scheduler.get_last_lr())  # prints the lr used for training in this epoch
    scheduler.step()

    print(f"Epoch #{epoch+1} train loss: {train_loss_hist.value:.3f}")
    print(f"Epoch #{epoch+1} validation loss: {val_loss_hist.value:.3f}")

    # save the best model till now if we have the least loss in the current epoch
    save_best_model(val_loss_hist.value, epoch, model, optimizer)
    # save the current epoch model
    save_model(epoch, model, optimizer)

    # Saving is done after scheduler.step() because the scheduler state is
    # stored in terms of steps. If we saved before stepping, a resumed run would
    # train one extra epoch at the old learning rate (e.g. with a step size of 5,
    # 6 epochs would end up using that particular lr), which would be wrong.

These snippets will guide you through the whole training process used for this project. As a brief outline: we first defined the functions for iterating through the training & validation datasets. Then we defined classes for logging losses and for saving the best and last models. And finally we have the snippet for the training loop. Go through the snippets thoroughly once. I’ve tried to explain each and every step in the best possible way. Hope you get all of them!

Hey, one more thing: you know what these are and why they’re used here, right? Here are a few questions to check your understanding of how a deep neural network is trained:

  1. Why do we need a validation set? Aren’t training and test sets enough?
  2. A follow-up to the 1st: what are overfitting and underfitting, and how can we mitigate them during training?
  3. How does regularization, such as L1 and L2 regularization, help prevent overfitting? Which type of regularization have we used here?
  4. How does gradient descent work, and what are the different variants of gradient descent optimization algorithms?
  5. What are epochs and batch size, and how do they impact the training process?
  6. How do learning rate and learning rate scheduling affect the convergence of the training process?
  7. How do we gain insight into what’s happening during training, and why?

If you’re struggling with any of these questions, you need to revisit the concepts. Just google them; you’ll find some very good resources out there. I’ll also try to share some of them!

Challenges and Solutions during Implementation

I can’t tell you how challenging and painful it was! We ran the training process more than 25 times with different configurations to achieve the best possible loss. And still we don’t think we have the best possible model. Every time, we had to change and try different things: tuning the hyperparameters (learning rate, weight decay, momentum), the number of epochs (will it underfit or overfit?), which optimizer to use, whether to incorporate momentum, whether to train with or without learning rate scheduling, and whether to increase or decrease the learning rate (will it overshoot?). If we do so, what will be the effect on validation loss? And when should we stop training so that it doesn’t start overfitting in the next epoch?

Wait, wait! Don’t get scared by this. These things are challenging, but this is also what makes neural networks so powerful. This is Deep Learning, a field of experimentation! You have to experiment again and again to achieve the desired outcome, and that often becomes time-consuming and challenging.

Now for the solutions part. The solution to these challenges is that there is no exact solution. Yes, I’m not joking. Nothing is set in stone. It depends on the problem at hand. But surely, there are some tips to ensure better training of the model. Let’s look at them:

Tips for initial learning rate

  1. Tune the learning rate [try different values on a log scale: 0.0001, 0.001, 0.01, 0.1, 1.0].
  2. Run a few epochs with each of these and figure out which learning rate works best.
  3. Now do a finer search around this value [for example, if the best learning rate was 0.1, then try some values around it: 0.05, 0.2, 0.3].
  4. Disclaimer: these are just heuristics; there is no clear winning strategy.
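The coarse-then-fine search above can be sketched in a few lines. This is only an illustration: `coarse_to_fine_lr_search` is a hypothetical helper, and `fake_val_loss` is a made-up stand-in for "train a few epochs with this learning rate and return the validation loss".

```python
import math


def coarse_to_fine_lr_search(evaluate, coarse=(1e-4, 1e-3, 1e-2, 1e-1, 1.0),
                             fine_factors=(0.5, 2.0, 3.0)):
    """Pick the best lr on a log-scale grid, then search a few values around it."""
    best_coarse = min(coarse, key=evaluate)
    fine = [best_coarse] + [best_coarse * f for f in fine_factors]
    best = min(fine, key=evaluate)
    return best_coarse, best


def fake_val_loss(lr):
    """Made-up loss curve whose minimum sits at lr = 0.2 (for demonstration only)."""
    return abs(math.log10(lr) - math.log10(0.2))


best_coarse, best = coarse_to_fine_lr_search(fake_val_loss)
print(best_coarse, best)
```

In a real run each call to `evaluate` means training for a few epochs, so you would cache the results instead of evaluating the same learning rate twice.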

Tips for annealing learning rate

Step Decay:

  1. Halve the learning rate after every 5 epochs or
  2. Halve the learning rate after an epoch if the validation error is more than what it was at the end of the previous epoch.
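The second rule is easy to express in code. Here’s a minimal sketch, assuming you keep a list of validation losses with one entry per epoch; `halve_on_worse_val` and the loss values are made up for illustration.

```python
def halve_on_worse_val(lr, val_losses):
    """Halve lr if the latest validation loss is higher than the previous one."""
    if len(val_losses) >= 2 and val_losses[-1] > val_losses[-2]:
        return lr * 0.5
    return lr


lr = 0.01
history = []
for val_loss in [0.90, 0.70, 0.75, 0.60, 0.65]:  # pretend validation losses
    history.append(val_loss)
    lr = halve_on_worse_val(lr, history)
    print(f"val loss {val_loss:.2f} -> lr {lr}")
```

PyTorch ships a related built-in scheduler, `torch.optim.lr_scheduler.ReduceLROnPlateau`, which reduces the learning rate when a monitored metric stops improving.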

Tips for momentum

The following schedule was suggested by Ilya Sutskever, Chief Scientist, OpenAI:

Formula for Momentum — By Ilya Sutskever
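The image with the formula isn’t reproduced here. As far as I know, the schedule from Sutskever et al. (2013) is usually written as mu_t = min(mu_max, 1 - 2^(-1 - log2(floor(t / 250) + 1))), with mu_max picked from values such as 0.999, 0.995, 0.99, 0.9 and 0. Treat the sketch below as my reading of that formula, not as a copy of the original figure; `momentum_schedule` and its default `mu_max` are names chosen for illustration.

```python
import math


def momentum_schedule(t, mu_max=0.99):
    """Momentum at iteration t: starts at 0.5 and climbs towards mu_max."""
    return min(mu_max, 1.0 - 2.0 ** (-1.0 - math.log2(t // 250 + 1)))


for t in (0, 250, 750, 31750):
    print(t, momentum_schedule(t))
```

The idea is to start with a small momentum while the gradients are still noisy and raise it in steps every 250 iterations until it is capped at mu_max.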

Million dollar question: Which algorithm to use in practice?

  1. Adam seems to be more or less the default choice now (beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8).
  2. Having said that, many papers report that SGD with momentum (Nesterov or classical) with a simple annealing learning rate schedule also works well in practice (typically, starting with lr = 0.001, or 0.0001).

So, in conclusion, (SGD + Momentum) or Adam with a scheduler might just be the best choice overall!

So, finally, here’s the list of hyperparameter values which I used in this project:

SGD with Momentum is used along with Step LR scheduling.

  1. SGD Hyperparameters
  • Learning Rate (η) = 0.001
  • Momentum (β) = 0.9
  • Weight Decay (λ) (L2 Regularization) = 0.0005

  2. Step LR Scheduling Hyperparameters
  • Step Size = 4
  • Gamma (γ) = 0.5

  3. No. of Epochs = 16

  4. Batch Size = 4
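To see what these scheduler values mean in practice, here’s a small pure-Python sketch of the step rule, lr = base_lr * gamma ** (epoch // step_size), which is how PyTorch’s `StepLR` behaves. The `step_lr` function is just an illustration and not the project code.

```python
def step_lr(epoch, base_lr=0.001, step_size=4, gamma=0.5):
    """Learning rate used in a given (0-indexed) epoch under step decay."""
    return base_lr * gamma ** (epoch // step_size)


schedule = [step_lr(epoch) for epoch in range(16)]
for epoch, lr in enumerate(schedule, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")
```

So the 16 epochs are trained in four blocks of 4 epochs each, and the learning rate is halved at the start of every new block.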

Project 2

We’ve used the state-of-the-art model YOLOv8 (medium version) fine-tuned on the same dataset.

On the coding side, there’s not much to do for YOLO. Almost everything is under the hood, because YOLO follows a low-code approach so that it’s accessible to everyone.

We can run YOLO in either of two ways: through CLI commands or from a Python environment.

Generally, we use CLI commands for tasks like training & making predictions, which are the simplest things you can do with YOLO.

For complex tasks built on top of YOLOv8, such as object counting, tracking or speed estimation of vehicles, to name a few, it is generally advised to use the Python API to minimize performance & bug issues. And that’s what our next project is built with.

So, in this project, we used CLI commands for simplicity and to keep the code clean.

Here are some snippets for that:

!git clone https://github.com/ultralytics/ultralytics
!pip install ultralytics

# For example:

# "/content/drive/My Drive/helmet_dataset_YOLO/images/train/hard_hat_workers0.png"  # image
# "/content/drive/My Drive/helmet_dataset_YOLO/labels/train/hard_hat_workers0.txt"  # label

import yaml

# Create configuration
config = {
    "train": "/content/drive/My Drive/helmet_dataset_YOLO/images/train",
    "val": "/content/drive/My Drive/helmet_dataset_YOLO/images/val",
    "test": "/content/drive/My Drive/helmet_dataset_YOLO/images/test",
    "nc": 3,
    "names": ['helmet', 'head', 'person']
}
with open("data.yaml", "w") as file:
    yaml.dump(config, file, default_flow_style=False)

!yolo task=detect mode=train data=data.yaml model=yolov8m.pt epochs=20 lr0=0.01

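To train the model, you first need to convert your data into YOLO format if you haven’t already: one line per box containing (class_id, x_centre, y_centre, width, height), with the coordinates normalised by the image width and height.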
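The conversion itself is simple arithmetic. Here’s a minimal sketch, assuming you already know the image size and have the box in PASCAL VOC pixel coordinates; `voc_to_yolo` and the example numbers are made up for illustration.

```python
def voc_to_yolo(box, img_w, img_h):
    """Convert (x_min, y_min, x_max, y_max) in pixels to normalised
    (x_centre, y_centre, width, height)."""
    x_min, y_min, x_max, y_max = box
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return x_c, y_c, w, h


CLASS_IDS = {"helmet": 0, "head": 1, "person": 2}  # same order as in data.yaml

box = voc_to_yolo((100, 50, 300, 250), img_w=400, img_h=500)
line = f"{CLASS_IDS['helmet']} " + " ".join(f"{v:.6f}" for v in box)
print(line)
```

Each image gets one .txt file with one such line per box, stored under the labels folder with the same file name as the image.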

Then YOLO needs a .yaml file that stores the paths for all your sets (train, val & test) and the classes. This is the file you’ll pass to the training command as an argument.

Next, we have the training command. Let’s break it down one by one:

TASK : YOLOv8 is an AI framework that supports multiple computer vision tasks: detection, segmentation, oriented bounding boxes (obb), classification, and pose estimation. Each of these tasks has a different objective and use case. So, the task parameter can take the values detect for Object Detection, segment for Instance Segmentation, obb for Oriented Object Detection, classify for Image Classification, and pose for Pose Estimation. We’re doing Object Detection, so we’ll use detect for task.

MODE :

  • Train mode: Fine-tune your model on custom or preloaded datasets.
  • Val mode: A post-training checkpoint to validate model performance.
  • Predict mode: Unleash the predictive power of your model on real-world data.
  • Export mode: Make your model deployment-ready in various formats.
  • Track mode: Extend your object detection model into real-time tracking applications.
  • Benchmark mode: Analyze the speed and accuracy of your model in diverse deployment environments.

DATA : data=data.yaml. Here we’re passing the .yaml file we made so that YOLO can access the required paths.

MODEL : As I’ve already explained the different model versions in my previous blog on YOLO, you know that there are 5 different sizes. Here we’re using the medium one (yolov8m), which is a good trade-off between speed and accuracy for the problem statement at hand.

epochs and lr0, as you’ve probably guessed, denote the no. of epochs and the initial learning rate respectively.
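For comparison, the same training run can be started from Python instead of the CLI. This is a sketch of the Ultralytics Python API as I understand it (`YOLO(...)` and `model.train(...)`), not the code we ran for this project; the two helper functions are hypothetical wrappers, and the import is kept inside the function so the snippet loads even where the package isn’t installed.

```python
def build_train_kwargs(data_yaml="data.yaml", epochs=20, lr0=0.01):
    """Collect the same settings that the CLI command passes as key=value pairs."""
    return {"data": data_yaml, "epochs": epochs, "lr0": lr0}


def train_with_python_api(weights="yolov8m.pt", **overrides):
    """Fine-tune YOLOv8 from Python; anything not passed falls back to the defaults."""
    # Imported lazily so build_train_kwargs can be used without the package installed.
    from ultralytics import YOLO

    model = YOLO(weights)  # load the pre-trained medium checkpoint
    return model.train(**build_train_kwargs(**overrides))


print(build_train_kwargs())
# Call train_with_python_api() in a notebook where ultralytics is installed.
```

The task doesn’t need to be passed here, because a detection checkpoint like yolov8m.pt already implies detect.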

By now, you must be wondering: where are the other things, like the optimizer, scheduler or batch size? There must be something fishy.

But there’s not! What YOLO actually does is store default values for all the configurations and hyperparameters internally, in a file named “default.yaml”. So, if we don’t explicitly pass a value for a setting, it takes the default. Here we specified values for epochs and learning rate, but that doesn’t mean the default yaml file has no values for them. It’s just that we’re not using the defaults. Check out the “default.yaml” file once. You’ll get what I just explained.

Here’s the list of hyperparameter values we’ve used in this project, in one place, for easy reference.

Training Hyperparameters

  1. Epochs = 20
  2. Batch Size = 16
  3. Workers = 8 (number of worker threads for data loading)
  4. Initial Learning Rate (ηi) [lr0] = 0.01
  5. Final Learning Rate (ηf) [lr0 * lrf] = 0.01 * 0.01 = 0.0001
  6. Optimizer is set to auto, choices=[SGD, Adam, Adamax, AdamW, NAdam, RAdam, RMSProp, auto] (as far as I know, with auto, recent Ultralytics versions may choose the optimizer, lr0 and momentum themselves, so check the training log for the values actually used)
  7. Momentum (β) = 0.937 (SGD momentum / Adam (β1))
  8. Weight Decay (λ) = 0.0005
  9. Linear LR Scheduler, which decays η linearly every epoch from (ηi) to (ηf) at the last epoch of training.

All hyperparameters & configurations, including the above, are used with their default values, except for the number of epochs, whose default value is 100.
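The linear schedule in the list above can be written as a one-line formula. The sketch below follows my understanding of how Ultralytics scales lr0 by a factor that falls linearly from 1 to lrf; it ignores the warmup epochs, and `linear_lr` is a name I made up for illustration.

```python
def linear_lr(epoch, epochs=20, lr0=0.01, lrf=0.01):
    """Learning rate at a given epoch when decaying linearly from lr0 to lr0 * lrf."""
    factor = max(1 - epoch / epochs, 0) * (1.0 - lrf) + lrf
    return lr0 * factor


for epoch in (0, 5, 10, 15, 20):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch):.6f}")
```

With 20 epochs, the learning rate is about halfway down (0.00505) at epoch 10 and approaches 0.0001 at the very end.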

Tips for best Training Results

Most of the time good results can be obtained with no changes to the models or training settings. If at first you don’t get good results, there are steps you might be able to take to improve.

  1. Epochs : Start with 300 epochs. If this overfits early then you can reduce epochs. If overfitting does not occur after 300 epochs, train longer, i.e. 600, 1200 etc. epochs.
  2. Image size : COCO trains at a native resolution of --img 640, though due to the high number of small objects in the dataset it can benefit from training at higher resolutions such as --img 1280. If there are many small objects, then custom datasets will benefit from training at native or higher resolution. Best inference results are obtained at the same --img as the training was run at, i.e. if you train at --img 1280 you should also test and detect at --img 1280. (In the YOLOv8 CLI, this setting is called imgsz.)
  3. Batch size : Use the largest batch size that your hardware allows for. Small batch sizes produce poor batchnorm statistics and should be avoided.

Hyperparameter Tuning

Ultralytics YOLO uses genetic algorithms to optimize hyperparameters. Genetic algorithms are inspired by the mechanism of natural selection and genetics.

Mutation : In the context of Ultralytics YOLO, mutation helps in locally searching the hyperparameter space by applying small, random changes to existing hyperparameters, producing new candidates for evaluation.

Crossover : Although crossover is a popular genetic algorithm technique, it is not currently used in Ultralytics YOLO for hyperparameter tuning. The focus is mainly on mutation for generating new hyperparameter sets.
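To make the mutation idea concrete, here’s a toy sketch. It is not the Ultralytics tuner, just an illustration of "apply small random changes and keep the result inside sensible bounds"; the `mutate` function, `sigma` and the bounds are assumptions.

```python
import random


def mutate(hyp, bounds, sigma=0.2, seed=None):
    """Return a new hyperparameter set with small random multiplicative changes."""
    rng = random.Random(seed)
    child = {}
    for name, value in hyp.items():
        low, high = bounds[name]
        factor = 1.0 + rng.gauss(0.0, sigma)  # small change around 1.0
        child[name] = min(max(value * factor, low), high)  # clip to the bounds
    return child


parent = {"lr0": 0.01, "momentum": 0.937, "weight_decay": 0.0005}
bounds = {  # illustrative search ranges
    "lr0": (1e-5, 1e-1),
    "momentum": (0.7, 0.98),
    "weight_decay": (0.0, 0.001),
}
child = mutate(parent, bounds, seed=0)
print(child)
```

A tuner then trains with the child, compares its fitness with the parent’s, and keeps mutating from the best set found so far.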

Project 3

As you know from the overview of the series, this project is a tool that we’ve built on top of YOLO.

The tool does object counting in real-time videos. Just to reiterate, it simply plots the bounding boxes of helmets without the confidence scores (to keep the frame clean and uncluttered) and displays the current count of helmets in the video.

And you already know the applications that I’ve mentioned in the Introductory Blog of this Series.

Regarding the coding part, there’s not much. But I’ll walk through it once:

!pip install ultralytics -q

import os

import cv2  # cv2 is the Python library of OpenCV
import torch
import ultralytics
from tqdm import tqdm
from ultralytics import YOLO

ultralytics.__version__

model = YOLO("/content/drive/MyDrive/Object_Counting_YOLOv8/best.pt")  # load a trained model fine-tuned on your custom dataset

class_names = ['helmet', 'head', 'person']  # the class objects

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Define the video paths
video_paths = ['/content/drive/MyDrive/Object_Counting_YOLOv8/videos/1.mp4',
               '/content/drive/MyDrive/Object_Counting_YOLOv8/videos/2.mp4',
               '/content/drive/MyDrive/Object_Counting_YOLOv8/videos/3.mp4',
               '/content/drive/MyDrive/Object_Counting_YOLOv8/videos/4.mp4']

output_directory = '/content/drive/MyDrive/Object_Counting_YOLOv8/output_videos'

os.makedirs(output_directory, exist_ok=True)

for video_path in tqdm(video_paths):

    # captures video frames from a video file or a connected camera
    cap = cv2.VideoCapture(video_path)

    # Get the video properties
    frame_width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    frame_height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)

    # Define the codec to use (MP4V) and create the VideoWriter object
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')

    # Define the path where the output video will be saved
    output_path = os.path.join(output_directory, f'output_{os.path.basename(video_path)}')
    out = cv2.VideoWriter(output_path, fourcc, fps, (frame_width, frame_height))

    while cap.isOpened():
        ret, frame = cap.read()  # reads the next frame from the video source
        # ret : boolean that indicates whether the frame was read successfully or not
        # frame : the actual video frame (a NumPy array) returned by cap.read()

        if not ret:
            break  # exit the loop if no more frames are available

        # OpenCV reads frames in BGR order, which is also what Ultralytics expects
        # for NumPy inputs, so the frame is passed to the model without conversion.
        # (Convert with cv2.COLOR_BGR2RGB only if you display it with Matplotlib.)

        # keep only helmets (class 0) with a confidence of at least 0.6
        detection_results = model.predict(frame, classes=0, conf=0.6)

        boxes = detection_results[0].boxes.xyxy.cpu().tolist()  # bounding boxes as (x1, y1, x2, y2)
        classes = detection_results[0].boxes.cls.cpu().tolist()  # predicted class ids
        scores = detection_results[0].boxes.conf.cpu().tolist()  # confidence scores
        # all three are converted into plain Python lists

        for i in range(len(boxes)):
            box = boxes[i]
            cls = classes[i]
            score = round(scores[i], 2)
            x1, y1, x2, y2 = int(box[0]), int(box[1]), int(box[2]), int(box[3])
            cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
            # cv2.putText(frame, str(score), (x1+10, y1-5), cv2.FONT_HERSHEY_SIMPLEX,
            #             0.5, (0, 0, 255), 1, cv2.LINE_AA)

        cv2.putText(frame, f"Helmets Count : {len(boxes)}", (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (0, 0, 0), 2, cv2.LINE_AA)
        # Syntax of putText:
        # (image on which the text will be drawn, text to be drawn, position of the text (bottom-left corner), font type,
        #  font scale (size), text color (in BGR format), thickness of the text, type of line for text rendering)

        out.write(frame)  # writes the predicted frames back to the video file one by one

    # release the capture and the writer for this video
    # to free up system resources before moving on to the next one
    cap.release()
    out.release()

# close any OpenCV windows to provide a clean exit for your program
cv2.destroyAllWindows()

We’re using the best model obtained from training in Project 2 to count the no. of helmets in some sample videos I’ve provided. We read the video feed using the cv2 library and apply the model to the individual frames one by one. Then we write all the predicted frames back to a video file. And yeah, that’s it! This is the whole code for this project. It can also run on your webcam with some modifications. Just put it into ChatGPT and it’ll modify it for your use case.
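In the code above the filtering is done by the model call itself (only class 0, confidence 0.6 or higher), so the count is simply the number of boxes. If you’d rather run the model on all classes once and count them yourself, the counting step could look like this small sketch; `count_detections` and the sample values are made up for illustration.

```python
def count_detections(class_ids, scores, target_class=0, conf_threshold=0.6):
    """Count detections of one class whose confidence reaches the threshold."""
    return sum(
        1
        for class_id, score in zip(class_ids, scores)
        if int(class_id) == target_class and score >= conf_threshold
    )


# pretend per-frame outputs: class ids come back as floats, like boxes.cls.tolist()
classes = [0.0, 0.0, 1.0, 0.0]
scores = [0.91, 0.55, 0.80, 0.72]
helmets = count_detections(classes, scores)
print(f"Helmets Count : {helmets}")
```

The same function with target_class=1 or 2 would give you the count of heads or persons from the same prediction.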

What’s Next?

We’ll be discussing and analyzing the predictions and evaluation metrics in great detail in the upcoming blogs. Till then, Happy Learning!

Do Follow and Upvote for more such content!!!

List of Resources

Ayush Raj

A passionate learner who loves to break complex concepts into simpler explanations. Research Interests include Deep Learning and Computer Vision.