Introduction to Motion Detection: Part 1
The easiest way to detect motion with OpenCV
Have you ever wondered how motion can be detected in a video? This series covers a few methods that can be used to detect motion and shows how to code each of them from scratch in Python. The code for this series is located on GitHub, and the links to the other parts are below (each subsequent part is more advanced and more effective at detecting motion).
- Part 1 — Frame Differencing
- Part 2 — Optical Flow
- Part 3 — Background Subtraction
Introduction
In this post we will explore the easiest way to do this, using only basic OpenCV functions. The approach relies on something called Frame Differencing: subtracting the current video frame from the previous one and noting the differences, which correspond to motion. The outline of the approach is given below. We will go over each of these steps in detail and code them in Python from scratch; the notebook is on GitHub.
- Compute the grayscale Frame Difference
- Threshold the Frame Difference to get the Motion Mask
- Find Contours in the Motion Mask and get their Bounding Boxes
- Perform Non-Maximal Suppression on the detected Bounding Boxes
The main idea is shown below: we threshold the frame difference to get a motion mask, which provides rich information about moving objects in the scene. The motion mask is the essence of all the methods in this tutorial series.
Read on to find out how to get the motion mask and use it as a class-agnostic object detector. We will do all of this from scratch, and it will lay the groundwork for more advanced motion detectors in later tutorials.
Introduction to Motion Detection
Motion Detection is the process of detecting moving objects within a video sequence. In Computer Vision, it is the process of detecting a pixel-wise change across the video frames. We can use it to discover new objects in the real world and even perform class-agnostic object detection, which is extremely useful in Geospatial Analysis, Customer Analysis, Surveillance, Autonomy, and other related fields.
Frame Differencing
1. Computing the Initial Frame Difference
Frame Differencing is simply subtracting the current image from the previous one in the sequence. A key assumption for this method is that the camera must be stationary; if the camera moves, we have to employ more advanced techniques such as camera motion compensation to accurately determine which objects are moving. A sample of sequential images and their grayscale frame difference is shown below.
Even though the camera is stationary, this is not an easy case: there are many shadows in the images that can cause false targets. Also, the size of the moving objects varies greatly; they are significantly larger and have greater pixel motion at the bottom of the image. The code snippet below shows how to compute the frame difference; using the subtract function ensures that no underflows or overflows occur.
import cv2
import numpy as np

# convert to grayscale
img1 = cv2.cvtColor(img1_rgb, cv2.COLOR_RGB2GRAY)
img2 = cv2.cvtColor(img2_rgb, cv2.COLOR_RGB2GRAY)

# compute grayscale image difference (saturates at 0 instead of underflowing)
grayscale_diff = cv2.subtract(img2, img1)
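As a quick aside, here is a minimal demo of why cv2.subtract is used: with uint8 arrays, plain NumPy subtraction wraps around on underflow, while cv2.subtract saturates (clips) at zero.

import cv2
import numpy as np

a = np.array([[10]], dtype=np.uint8)
b = np.array([[20]], dtype=np.uint8)

print(a - b)               # NumPy wraps around on underflow: [[246]]
print(cv2.subtract(a, b))  # OpenCV saturates at zero: [[0]]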
The frame-differencing snippet above basically gives us all the information we need to detect moving objects! There's a bit more we have to do, though, if we want to make an effective object detector; the image below shows why.
It turns out that the frame difference alone is not a good indicator of object motion; this becomes apparent once we scale the image to show the noise. Notice how speckles in the background show up in the frame difference even though nothing there is moving. These speckles come from image noise and possibly camera vibrations. In the next section we will learn how to remove these speckles and keep the moving objects.
2. Compute the Motion Mask
In order to remove the speckles, we will need to apply a threshold to the frame difference and obtain a binary motion mask. In practice we will use Blurring and Morphological Operations to improve the quality of the motion mask.
def get_mask(frame1, frame2, kernel=np.ones((9,9), dtype=np.uint8)):
    """ Obtains image mask
        Inputs:
            frame1 - Grayscale frame at time t
            frame2 - Grayscale frame at time t + 1
            kernel - (NxN) array for Morphological Operations
        Outputs:
            mask - Thresholded mask for moving pixels
        """
    frame_diff = cv2.subtract(frame2, frame1)

    # blur the frame difference to suppress speckle noise
    frame_diff = cv2.medianBlur(frame_diff, 3)

    mask = cv2.adaptiveThreshold(frame_diff, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 11, 3)

    mask = cv2.medianBlur(mask, 3)

    # morphological operations to fill holes in the detected regions
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)

    return mask
# compute motion mask with a 9x9 structuring element
kernel = np.ones((9,9), dtype=np.uint8)
mask = get_mask(img1, img2, kernel)
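If you want to inspect the mask at this point, a quick sketch with matplotlib (assumed to be installed) does the job:

import matplotlib.pyplot as plt

plt.imshow(mask, cmap='gray')
plt.title('Motion Mask')
plt.show()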
Notice how the shadows of some vehicles are still being detected; this is because the shadows correspond to a change in pixel intensity between the two frames. We cannot always remove them with the frame differencing approach.
Now we can see that the motion mask has most of the background removed, while the moving cars remain. Let’s see how we can use it to make an object detector.
3. Turning the Motion into Detection
To get bounding box detections, we will find the contours on the motion mask and then draw a bounding box around each one that is large enough. Most object detectors also have a score or confidence level that usually ranges from 0 to 1. In this case we don't have one, so we will use a proxy score based on the area of each bounding box, where a larger area indicates a higher score. The main assumption is that the largest contour (and largest bounding box) will usually correspond to the real object or cluster of objects, while the smaller bounding boxes may correspond to other details of the object, such as contours extracted from its internal textures or shadows.
def get_contour_detections(mask, thresh=400):
    """ Obtains initial proposed detections from contours discovered on the
        mask. Scores are taken as the bbox area, larger is higher.
        Inputs:
            mask - thresholded image mask
            thresh - threshold for contour size
        Outputs:
            detections - array of proposed detection bounding boxes and scores
                         [[x1,y1,x2,y2,s]]
        """
    # get external mask contours only
    contours, _ = cv2.findContours(mask,
                                   cv2.RETR_EXTERNAL, # cv2.RETR_TREE,
                                   cv2.CHAIN_APPROX_TC89_L1)
    detections = []
    for cnt in contours:
        # get axis-aligned bounding box of the contour
        x, y, w, h = cv2.boundingRect(cnt)
        area = w*h
        if area > thresh:
            detections.append([x, y, x+w, y+h, area])

    return np.array(detections)
The resulting detections are drawn over the motion mask below.
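If you want to reproduce this overlay yourself, a minimal sketch could look like the following; it assumes the mask and detections arrays from the previous snippets:

# convert the single-channel mask to BGR so colored boxes can be drawn on it
mask_bgr = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
for x1, y1, x2, y2, _ in detections:
    cv2.rectangle(mask_bgr, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)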
Now we have another issue! We have multiple false detections in the form of overlapping bounding boxes. We will eliminate these with Non-Maximal Suppression (NMS).
4. Non-Maximal Suppression
Object detectors are not always perfect and sometimes require post-processing to improve their performance. Non-Maximal Suppression is the process of removing unlikely detections (usually duplicates); it is a common part of powerful object detectors such as YOLO, and it also shows up in many other areas of Computer Vision, such as the Canny Edge Detector.
We will use the following algorithm to implement Non-Maximal Suppression; it has been modified from its original source for our use case.
- Sort detected boxes by score and store them in an array
- Remove boxes that are entirely contained within another box
- Start with the largest box
  - Compare it with all other boxes
  - If the IOU exceeds the threshold, remove the other box
- Repeat for the next largest remaining box until complete
If you’re not familiar with Intersection Over Union or IOU, it’s a way of determining how much overlap one bounding box has with another and has a range from 0 to 1. An example of IOU is shown below.
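For concreteness, here is a minimal sketch of how the IOU of two boxes in [x1, y1, x2, y2] format can be computed; it mirrors the arithmetic that appears inside the NMS loop further down:

def compute_iou(box1, box2):
    """ Computes Intersection Over Union of two boxes [x1,y1,x2,y2]. """
    # overlap region (zero if the boxes do not intersect)
    inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    intersection = inter_w * inter_h
    # union = sum of the two areas minus the double-counted overlap
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0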
We can implement this NMS algorithm in Python; the code is shown below (modified from its original source).
def remove_contained_bboxes(boxes):
    """ Removes all smaller boxes that are contained within larger boxes.
        Requires bboxes to be sorted by area (score)
        Inputs:
            boxes - array of bounding boxes sorted (descending) by area
                    [[x1,y1,x2,y2]]
        Outputs:
            keep - indexes of bounding boxes that are not entirely contained
                   in another box
        """
    # box j is contained in box i if: x1_j >= x1_i, y1_j >= y1_i,
    # x2_j < x2_i, and y2_j < y2_i
    check_array = np.array([True, True, False, False])
    keep = list(range(0, len(boxes)))
    for i in range(0, len(boxes)):
        # only boxes that are still kept may suppress others
        if i not in keep:
            continue
        for j in range(0, len(boxes)):
            # check if box j is completely contained in box i
            if j in keep and np.all((np.array(boxes[j]) >= np.array(boxes[i])) == check_array):
                keep.remove(j)
    return keep
def non_max_suppression(boxes, scores, threshold=1e-1):
    """ Perform Non-Maximal Suppression on a set of bounding boxes
        and corresponding scores.
        Inputs:
            boxes: a list of bounding boxes in the format [xmin, ymin, xmax, ymax]
            scores: a list of corresponding scores
            threshold: the IoU (intersection-over-union) threshold for merging
                       bounding boxes
        Outputs:
            boxes - non-max suppressed boxes
        """
    # Sort the boxes by score in descending order
    boxes = boxes[np.argsort(scores)[::-1]]

    # remove all contained bounding boxes and get ordered index
    order = remove_contained_bboxes(boxes)

    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        # iterate over a copy since we may remove indexes from order
        for j in order.copy():
            # Calculate the IoU between the two boxes
            intersection = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0])) * \
                           max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]))
            union = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1]) + \
                    (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1]) - intersection
            iou = intersection / union
            # Remove boxes with IoU greater than the threshold
            if iou > threshold:
                order.remove(j)

    return boxes[keep]
Now we can run NMS on our detections and obtain clean bounding box results!
# separate bboxes and scores
bboxes = detections[:, :4]
scores = detections[:, -1]
# Get Non-Max Suppressed Bounding Boxes
nms_bboxes = non_max_suppression(bboxes, scores, threshold=0.1)
Creating the Pipeline
Now we can put it all together and create a pipeline for motion detection:
def get_detections(frame1, frame2, bbox_thresh=400, nms_thresh=1e-3,
                   mask_kernel=np.ones((9,9), dtype=np.uint8)):
    """ Main function to get detections via Frame Differencing
        Inputs:
            frame1 - Grayscale frame at time t
            frame2 - Grayscale frame at time t + 1
            bbox_thresh - Minimum threshold area for declaring a bounding box
            nms_thresh - IOU threshold for computing Non-Maximal Suppression
            mask_kernel - kernel for morphological operations on motion mask
        Outputs:
            detections - list with bounding box locations of all detections
                bounding boxes are in the form of: (xmin, ymin, xmax, ymax)
        """
    # get image mask for moving pixels
    mask = get_mask(frame1, frame2, mask_kernel)

    # get initially proposed detections from contours
    detections = get_contour_detections(mask, bbox_thresh)

    # separate bboxes and scores
    bboxes = detections[:, :4]
    scores = detections[:, -1]

    # perform Non-Maximal Suppression on initial detections
    return non_max_suppression(bboxes, scores, nms_thresh)
Here are some helper functions that will enable us to make a GIF of our results.
import os
from glob import glob

from PIL import Image

def draw_bboxes(frame, detections):
    # draw each detection as a green rectangle on the frame (in place)
    for det in detections:
        x1, y1, x2, y2 = det
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)

def create_gif_from_images(save_path : str, image_path : str, ext : str) -> None:
    ''' creates a GIF from a folder of images
        Inputs:
            save_path - path to save GIF
            image_path - path where images are located
            ext - extension of the images
        Outputs:
            None
        '''
    ext = ext.replace('.', '')
    image_paths = sorted(glob(os.path.join(image_path, f'*.{ext}')))
    # sort numerically rather than lexicographically
    image_paths.sort(key=lambda f: int(''.join(filter(str.isdigit, f))))
    pil_images = [Image.open(im_path) for im_path in image_paths]
    # the first image starts the GIF; append the remaining frames after it
    pil_images[0].save(save_path, format='GIF', append_images=pil_images[1:],
                       save_all=True, duration=50, loop=0)
Here's the code to make the GIF; remember that this will work for any image sequence where the camera is stationary.
import matplotlib.pyplot as plt

# make sure the output folder for the GIF frames exists
os.makedirs('temp', exist_ok=True)

for idx in range(1, len(image_paths)):
    # read frames
    frame1_bgr = cv2.imread(image_paths[idx - 1])
    frame2_bgr = cv2.imread(image_paths[idx])

    # get detections
    detections = get_detections(cv2.cvtColor(frame1_bgr, cv2.COLOR_BGR2GRAY),
                                cv2.cvtColor(frame2_bgr, cv2.COLOR_BGR2GRAY),
                                bbox_thresh=400,
                                nms_thresh=1e-4)

    # draw bounding boxes on frame
    draw_bboxes(frame2_bgr, detections)

    # save image for GIF (convert BGR -> RGB so matplotlib shows true colors)
    fig = plt.figure()
    plt.imshow(cv2.cvtColor(frame2_bgr, cv2.COLOR_BGR2RGB))
    fig.savefig(f"temp/frame_{idx}.png")
    plt.close()

# create GIF
create_gif_from_images('frame_differencing.gif', 'temp', '.png')
Let's take a look at the results.
And there you have it, we have just performed motion detection from Frame Differencing! We notice that there are many small false detections as well as bounding boxes that vary in size for the same clusters of objects. We can even notice that the truck on the right side has multiple detections, generally one for the front and one for the back. This is because there is no corresponding pixel change for the middle part of the truck! Our algorithm thinks that it is two separate objects.
In this image sequence, objects are significantly larger and exhibit greater pixel motion toward the bottom of the image. We might be able to use this knowledge to improve the performance of the detector; one possible tweak is sketched below.
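As a purely illustrative sketch (not part of the pipeline above), the minimum box area could be made to grow with vertical position, so that boxes near the bottom of the frame, where objects appear larger, must be bigger to count as detections. The function name, base area, and linear scaling here are all assumptions:

def position_scaled_thresh(y, img_height, base_thresh=400, scale=3.0):
    # hypothetical: grow the minimum-area threshold linearly from the
    # top of the image (y = 0) to the bottom (y = img_height)
    return base_thresh * (1.0 + scale * (y / img_height))

Inside get_contour_detections, the fixed comparison area > thresh would then become something like area > position_scaled_thresh(y + h, mask.shape[0]), where y + h is the bottom edge of the candidate box.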
Next Steps
We have learned how to utilize Frame Differencing along with some basic Computer Vision techniques such as Blurring, Morphological Operations, Contour Finding, and Non-Maximal Suppression to detect motion in a video. This is possibly the simplest technique, but it is a stepping stone to more advanced techniques that we will cover in subsequent parts of this tutorial.
- Part 1 — Frame Differencing
- Part 2 — Optical Flow
- Part 3 — Background Subtraction
Thanks for Reading! If you found this useful please consider clapping 👏