Maneuvering Color Mask into Object Detection

Published in Globant

5 min read · Aug 28, 2020

I have been playing around with object detection algorithms for a while now, and in this article I will build a small object detection application using Python and OpenCV.

Idea

We will generate the mask for the object that we are trying to detect by identifying the color information of that object and then use that mask to detect the object in a frame. We will also draw a bounding box around the object.

Before we get hands-on, let’s clear up some basic concepts that I have used to make this happen.

HSV color model

HSV (Hue, Saturation, Value) is a cylindrical color model that remaps the RGB primary colors into three dimensions, where:

Hue specifies the angle of the color on the RGB color circle.
Saturation controls the amount of color used.
Value controls the brightness of the color.

To capture the color information of the object accurately, we will convert the image into the HSV color model.

Why HSV?

Any color can be described in several different color models. The advantage of the HSV model over the RGB or BGR model is that it separates luma (the image intensity) from chroma (the color information).

How does this help, you ask?

Imagine we have a single color plane with a shadow on it. In RGB colorspace, the shadow part will most likely have very different characteristics than the part without shadows. In HSV colorspace, the hue component of both patches is more likely to be similar: the shadow will primarily influence the value, or maybe saturation component, while the hue, indicating the primary “color” (without its brightness and diluted-ness by white/black) should not change so much.
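To make the shadow argument concrete, here is a small sketch that uses only Python's standard colorsys module. The two RGB triples are made-up values standing in for a lit patch and the same patch in shadow; they are my assumption, not measurements from a real image.

```python
import colorsys

def to_hsv(r, g, b):
    """Convert 8-bit RGB to (hue in degrees, saturation, value)."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return round(h * 360), round(s, 2), round(v, 2)

lit = (200, 40, 40)       # hypothetical red patch in direct light
shadowed = (100, 20, 20)  # the same patch at half the brightness

print(to_hsv(*lit))       # (0, 0.8, 0.78)
print(to_hsv(*shadowed))  # (0, 0.8, 0.39)
```

All three RGB numbers change between the two patches, while in HSV only the value component moves. That is why a threshold on hue tends to survive lighting changes better than a threshold on raw RGB.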

To convert our image into HSV space we can use the OpenCV cvtColor function, passing the image as the first parameter and the cv2.COLOR_BGR2HSV enumeration as the second, like below:

cv2.cvtColor(image, cv2.COLOR_BGR2HSV)

Algorithm

Load the image.
Convert the image into the HSV color model.
Identify the min and max values (threshold values) for each of the three channels of the HSV image.
Read the image or frame in which we want to detect the object.
Convert the image/frame to the HSV color model.
Extract the mask using our threshold values.
Find the contour of the object on the mask.
Pick the largest contour.
Draw the bounding box on the image using the contour information.
Display the image/frame.

Now let the code begin!

I will build the application in two parts:

First, I will build an application to figure out the min and max threshold values for the three channels of the HSV image.
Then I will use these threshold values in the second application to create a mask and find where exactly those values occur in the image/frame.

Application to Identify Threshold values

We start by importing OpenCV and creating an empty function that does nothing (we will come back to this later).

Then we create a resizable window and name it “Track Bars”.

Since we need to figure out the min and max values for each of the three channels of our object, we will create 6 trackbars that we can adjust to find the best match. Because the mask is computed on the HSV image, these channels are effectively hue, saturation, and value rather than blue, green, and red.

Since 8-bit color values range from 0 to 255, our trackbars use the same range. Note that OpenCV stores hue for 8-bit images in the range 0 to 179, so only that part of the hue sliders is meaningful.

The last parameter of cv2.createTrackbar is the onChange callback, which is called every time the trackbar moves. Since we don’t need to do anything when the position changes, we pass the empty function.

Note: The second parameter takes the window name in which OpenCV will show the trackbar.

Now we read the image using cv2.imread, and optionally resize it.

OpenCV reads images in the BGR model by default, so we then convert the resized image into the HSV model and display both images.

We then start an infinite loop and read the min and max values for each channel with the getTrackbarPos function, passing it the name of the trackbar we want to read and the name of the window that contains it.

Once we have the threshold values for our object, we create a mask using the inRange function of OpenCV, which marks the pixels whose values fall within the given thresholds.

We then display the mask image using the imshow function.

Once we have found the correct threshold values we can exit the loop by pressing the ‘q’ key, print the values, and destroy all windows.

Figure: top left, original image; top right, trackbars; bottom left, HSV image; bottom right, mask image.

Object Detection

Now we will create another application that uses the above threshold values to generate the mask and detect the object in a video.

We start by importing OpenCV.

Then we declare the threshold values that we found with the previous application.

Then we open the video using the OpenCV VideoCapture function.

Then we start a while loop to read each frame of the video and display it back with the detected object.

Inside the loop, we read a single frame (a still image) using the camera.read function.

If no frame was returned, we simply break out of the while loop.

We convert the frame into HSV format.

We generate the mask using the threshold values we identified.

We find the contours of the object on the mask.

Once we have an array of contours, we sort that array and pass reverse=True so that the largest contour comes first.

We then check whether any contours were found in that frame. If so, we pick the largest contour and get its bounding box using the boundingRect function. Once we have the bounding box dimensions, we draw a rectangle around those coordinates in the image/frame using the OpenCV rectangle function.

Finally, we display the frame/image using the imshow function. We show the frame outside the if condition because we want to display it even when no contour was found.

We then check for a keypress, and if the pressed key was ‘q’ we break out of the while loop.

In the end when everything is done we release the camera resource and destroy all open windows.

And there we have it.

Now when we run this code, we get a video with a bounding box drawn around the object.

Thank you for reading my article.


Written by Khawar Jamil
