Published in

Analytics Vidhya

Tackling Class Imbalance for Object Classification Using YOLOv5 with a 2 Model Approach

This is the age of Artificial Intelligence, and it is dominated by deep learning, especially for image tasks where a single image can contain a lot of information of interest. Deep learning based techniques are used to extract and identify this information. Examples include segmentation using R-CNN-family models such as Mask R-CNN, and fast object detection using the YOLO series of models.

These models do remarkably well at extracting fine-grained information from images. However, training them requires:

  1. Lots of diverse data.
  2. Balance between the different classes we aim to classify in images.
  3. Powerful hardware, i.e. high-performance GPUs for training.
  4. Last but not least, time: training deep learning models can take days.

Thus, if you later want to add another class, you either have to change the output of the model’s last layer to accommodate the new class and retrain only the last few layers or, in the worse case, retrain the whole model on all the data, including the new class. You would have to repeat this for every new class you add to a previously trained model, which quickly becomes tedious.

To remedy that, in this article I propose using 2 models for object classification with YOLOv5. The first model is YOLO, whose only responsibility is to detect super classes. The second model is an image classifier that takes the detections of the super class we are interested in and classifies them further into subclasses.


The approach can be broken down into the following simple steps:

  1. Train the YOLO model with images annotated only with the super classes you expect to predict. YOLO will then tell us where the super classes are located. For example, if we want to identify people and products, the super classes can be a Person class and an Object class.

  2. Now, Object is what we aim to classify further: is it Pepsi, Coca-Cola or something else? So, from the YOLO model’s output, we take every detection in the image labeled “Object” and send it to an image classifier to get the name of that object.

  3. For this, take an image classifier of your choice (ResNeXt etc.) and train it with images of the subclasses you want to predict. For example, if we want to identify Pepsi, we train the image classifier with images of just Pepsi, and likewise for as many products as we expect to see.

  4. In the end, YOLO tells us which super classes are in the image, and the image classifier tells us what the subclasses actually are.
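
For the first step, annotations that already carry fine-grained product labels can be collapsed to super classes before training YOLO. The following is a minimal sketch, assuming the usual YOLO text label format (one `class_id x_center y_center width height` row per box, with normalized coordinates). The class ids, names and the `FINE_TO_SUPER` mapping are hypothetical and not taken from the repository.

```python
# Hypothetical fine-grained ids used by an existing annotation set.
FINE_NAMES = {0: "person", 1: "pepsi", 2: "coca-cola"}
# Hypothetical super-class ids that YOLO will be trained on.
SUPER_IDS = {"Person": 0, "Object": 1}
# Every product collapses into the single "Object" super class.
FINE_TO_SUPER = {"person": "Person", "pepsi": "Object", "coca-cola": "Object"}


def remap_label_line(line: str) -> str:
    """Rewrite one YOLO label row so its class id is a super-class id."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    fine_id = int(parts[0])
    super_name = FINE_TO_SUPER[FINE_NAMES[fine_id]]
    # Box coordinates are left untouched; only the class id changes.
    return " ".join([str(SUPER_IDS[super_name])] + parts[1:])


def remap_label_text(text: str) -> str:
    """Remap every non-empty row of a label file's contents."""
    rows = [remap_label_line(row) for row in text.splitlines() if row.strip()]
    return "\n".join(rows)


if __name__ == "__main__":
    sample = "0 0.50 0.50 0.20 0.60\n2 0.30 0.70 0.10 0.15\n1 0.80 0.70 0.10 0.15"
    print(remap_label_text(sample))
```

With this kind of remapping, YOLO only ever sees Person and Object, so adding a new product later does not change the detector's label set.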

Here is how the workflow looks in essence:
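
In code, the same flow can be sketched roughly as follows. This is an illustrative outline rather than the code from the repository: `detect` and `classify` are hypothetical stand-ins for the trained YOLOv5 detector and the image classifier, and the image is a plain nested list so that the example runs without any deep learning library.

```python
from typing import Callable, List, NamedTuple, Tuple

Image = List[List[int]]  # rows of pixel values; stand-in for a real image array


class Detection(NamedTuple):
    label: str                      # class name attached to this box
    box: Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def crop(image: Image, box: Tuple[int, int, int, int]) -> Image:
    """Cut the region inside the bounding box out of the image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]


def two_stage_predict(
    image: Image,
    detect: Callable[[Image], List[Detection]],
    classify: Callable[[Image], str],
    refine_label: str = "Object",
) -> List[Detection]:
    """Run the detector, then refine only the chosen super class."""
    results = []
    for det in detect(image):
        if det.label == refine_label:
            # Stage 2: the classifier names the cropped object.
            results.append(Detection(classify(crop(image, det.box)), det.box))
        else:
            # Other super classes (e.g. Person) are kept as they are.
            results.append(det)
    return results


if __name__ == "__main__":
    # Toy 4x4 "image" and hard-coded stand-in models, for illustration only.
    image = [[0, 0, 9, 9], [0, 0, 9, 9], [1, 1, 0, 0], [1, 1, 0, 0]]

    def fake_detect(img: Image) -> List[Detection]:
        return [Detection("Person", (0, 2, 2, 4)), Detection("Object", (2, 0, 4, 2))]

    def fake_classify(patch: Image) -> str:
        return "pepsi" if patch[0][0] == 9 else "coca-cola"

    for det in two_stage_predict(image, fake_detect, fake_classify):
        print(det.label, det.box)
```

Keeping the detector and the classifier behind two separate callables mirrors the idea of the approach: either model can be retrained or swapped without touching the other.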


  1. This approach gives us the freedom to collect subclass data even from Google Images, as we don’t have to collect samples of the desired product in a fixed environment and deal with class imbalance. We can simply get images of the subclass from Google and augment them. For example, in a retail context where we aim to classify which product is being sold, using only YOLO would require hundreds of annotated samples of each product for the model to learn it. A single-model approach therefore becomes impractical when there are many products to recognize.
  2. There is no need to keep retraining the YOLO model for days if it does well on the super classes.
  3. Adding a subclass only requires training the image classifier, which is not as expensive as retraining the YOLO model.
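
To make the first point more concrete, one simple way to even out subclass counts before training the classifier is to augment the smaller classes more heavily. The sketch below is an assumption about how this could be planned, not the method used in the repository. The function name `augmentation_plan`, the product names and the image counts are all hypothetical.

```python
import math
from typing import Dict


def augmentation_plan(counts: Dict[str, int]) -> Dict[str, int]:
    """Return how many augmented copies to make per original image.

    Each class is brought up to at least the size of the largest class.
    A value of 0 means the class needs no augmentation.
    """
    if not counts or min(counts.values()) <= 0:
        raise ValueError("every class needs at least one image")
    target = max(counts.values())
    # ceil(target / n) is the total multiplier; subtract the original itself.
    return {name: math.ceil(target / n) - 1 for name, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical numbers of downloaded images per product.
    downloaded = {"pepsi": 120, "coca-cola": 40, "sprite": 25}
    for name, copies in augmentation_plan(downloaded).items():
        print(f"{name}: {copies} augmented copies per image")
```

The actual augmentations (flips, crops, color changes and so on) would be applied by whatever image library the classifier's training code uses; this sketch only decides how many copies each class needs.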

Link To Code:

I used the default YOLOv5 code from Ultralytics, which is amazing work by Mr. Glenn Jocher, and modified the repository to support 2 stage detection with a ResNeXt model. Currently, the code handles 2 cases, i.e. same super class and subclass, and one superclass and subclass, but it can be edited to support more super classes.

Here is the link to my GitHub code, which you can use to convert a default YOLOv5 into the 2 stage detector that I explained above.


Deep learning models are amazing for object detection, but they are resource intensive and take a long time to train. With smart techniques, this training time can be reduced. If the technique and code help, please give credit by mentioning my name and profile.

Special Thanks:

I would like to thank the people who trusted me with this task, as well as Glenn Jocher and the developers behind YOLOv5.



Shayan Ali Bhatti


Avid observer of life and software & Machine Learning developer