Introduction to Motion Detection: Part 1
The easiest way to detect motion with OpenCV
Have you ever wondered how motion can be detected in a video? This series covers a few methods that can be used to detect motion and shows how to code each of them from scratch in Python. The code for this series is located on GitHub, and the links to the other parts are below (each subsequent part is more advanced and more effective at detecting motion).
- Part 1 — Frame Differencing
- Part 2 — Optical Flow
- Part 3 — Background Subtraction
Introduction
In this post we will explore the easiest way to do this, using only basic OpenCV functions. The approach relies on something called Frame Differencing: subtracting the current video frame from the previous one and noting the differences, which correspond to motion. The outline of the approach is given below. We will go over each of these steps in detail and code them in Python from scratch; the notebook is on GitHub.
- Compute the grayscale Frame Difference
- Threshold the Frame Difference to get the Motion Mask
- Find Contours in the Motion Mask and get their Bounding Boxes
- Perform Non-Maximal Suppression on the detected Bounding Boxes
The main idea is shown below: we threshold the frame difference to get a motion mask, which provides rich information about moving objects in the scene. The motion mask is the essence of all the methods in this tutorial series.
Read on to find out how to get the motion mask and use it as a class-agnostic object detector. We will do all of this from scratch, and it will lay the groundwork for more advanced motion detectors in later tutorials.
Introduction to Motion Detection
Motion Detection is the process of detecting moving objects within a video sequence. In Computer Vision, it is the process of detecting a pixel-wise change across the video frames. We can use it to discover new objects in the real world and even perform class-agnostic object detection, which is extremely useful in Geospatial Analysis, Customer Analysis, Surveillance, Autonomy, and other related fields.
Frame Differencing
1. Computing the Initial Frame Difference
Frame Differencing is simply subtracting the current image from the previous one in the sequence. A key assumption for this method is that the camera must be stationary; if the camera moves, we have to employ more advanced techniques such as camera motion compensation to accurately determine which objects are moving. A sample of sequential images and their grayscale frame difference is shown below.
Even though the camera is stationary, this is not an easy case: there are many shadows in the images that can cause false targets. Also, the size of the moving objects varies greatly; they are significantly larger and have greater pixel motion at the bottom of the image. The code snippet below shows how to compute the frame difference; using the subtract function ensures that no underflows or overflows occur.
import cv2
import numpy as np

# convert to grayscale
img1 = cv2.cvtColor(img1_rgb, cv2.COLOR_RGB2GRAY)
img2 = cv2.cvtColor(img2_rgb, cv2.COLOR_RGB2GRAY)

# compute grayscale image difference (saturates at 0 instead of underflowing)
grayscale_diff = cv2.subtract(img2, img1)
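As a quick aside, here is a minimal demo of why cv2.subtract is used: with uint8 arrays, plain NumPy subtraction wraps around on underflow, while cv2.subtract saturates (clips) at zero.

import cv2
import numpy as np

a = np.array([[10]], dtype=np.uint8)
b = np.array([[20]], dtype=np.uint8)

print(a - b)               # NumPy wraps around on underflow: [[246]]
print(cv2.subtract(a, b))  # OpenCV saturates at zero: [[0]]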
The frame-differencing snippet above basically gives us all the information we need to detect moving objects! There's a bit more we have to do, though, if we want to make an effective object detector; the image below shows why.
It turns out that the frame difference alone is not a good indicator of object motion; this becomes apparent once we scale the image to show the noise. Notice how speckles in the background show up in the frame difference even though nothing there is moving. These speckles come from image noise and possibly camera vibrations. In the next section we will learn how to remove these speckles and keep the moving objects.
2. Compute the Motion Mask
In order to remove the speckles, we will need to apply a threshold to the frame difference and obtain a binary motion mask. In practice we will use Blurring and Morphological Operations to improve the quality of the motion mask.
def get_mask(frame1, frame2, kernel=np.ones((9,9), dtype=np.uint8)):
    """ Obtains image mask
        Inputs:
            frame1 - Grayscale frame at time t
            frame2 - Grayscale frame at time t + 1
            kernel - (NxN) array for Morphological Operations
        Outputs:
            mask - Thresholded mask for moving pixels
        """
    frame_diff = cv2.subtract(frame2, frame1)

    # blur the frame difference to suppress speckle noise
    frame_diff = cv2.medianBlur(frame_diff, 3)

    mask = cv2.adaptiveThreshold(frame_diff, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                 cv2.THRESH_BINARY_INV, 11, 3)

    mask = cv2.medianBlur(mask, 3)

    # morphological operations to fill holes in the detected regions
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel, iterations=1)

    return mask
# compute motion mask with a 9x9 structuring element
kernel = np.ones((9,9), dtype=np.uint8)
mask = get_mask(img1, img2, kernel)
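If you want to inspect the mask at this point, a quick sketch with matplotlib (assumed to be installed) does the job:

import matplotlib.pyplot as plt

plt.imshow(mask, cmap='gray')
plt.title('Motion Mask')
plt.show()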
Notice how the shadows of some vehicles are still being detected; this is because the shadows correspond to a change in pixel intensity between the two frames. We cannot always remove them with the frame differencing approach.
Now we can see that the motion mask has most of the background removed, while the moving cars remain. Let’s see how we can use it to make an object detector.
3. Turning the Motion into Detection
To get bounding box detections, we will find the contours on the motion mask and then draw a bounding box around each one that is large enough. Most object detectors also have a score or confidence level that usually ranges from 0 to 1. In this case we don't have one, so we will use a proxy score based on the area of each bounding box, where a larger area indicates a higher score. The main assumption is that the largest contour (and largest bounding box) will usually correspond to the real object or cluster of objects, while the smaller bounding boxes may correspond to other details of the object, such as contours extracted from its internal textures or shadows.
def get_contour_detections(mask, thresh=400):
    """ Obtains initial proposed detections from contours discovered on the
        mask. Scores are taken as the bbox area, larger is higher.
        Inputs:
            mask - thresholded image mask
            thresh - threshold for contour size
        Outputs:
            detections - array of proposed detection bounding boxes and scores
                         [[x1,y1,x2,y2,s]]
        """
    # get external mask contours only
    contours, _ = cv2.findContours(mask,
                                   cv2.RETR_EXTERNAL, # cv2.RETR_TREE,
                                   cv2.CHAIN_APPROX_TC89_L1)
    detections = []
    for cnt in contours:
        # get axis-aligned bounding box of the contour
        x, y, w, h = cv2.boundingRect(cnt)
        area = w*h
        if area > thresh:
            detections.append([x, y, x+w, y+h, area])

    return np.array(detections)
The resulting detections are drawn over the motion mask below.
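If you want to reproduce this overlay yourself, a minimal sketch could look like the following; it assumes the mask and detections arrays from the previous snippets:

# convert the single-channel mask to BGR so colored boxes can be drawn on it
mask_bgr = cv2.cvtColor(mask, cv2.COLOR_GRAY2BGR)
for x1, y1, x2, y2, _ in detections:
    cv2.rectangle(mask_bgr, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)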
Now we have another issue! We have multiple false detections in the form of overlapping bounding boxes. We will eliminate these with Non-Maximal Suppression (NMS).
4. Non-Maximal Suppression
Object detectors are not always perfect and sometimes require post-processing to improve their performance. Non-Maximal Suppression is the process of removing unlikely detections (usually duplicates); it is a common part of powerful object detectors such as YOLO, and it also shows up in many other areas of Computer Vision, such as the Canny Edge Detector.
We will use the following algorithm to implement Non-Maximal Suppression; it has been modified from its original source for our use case.
- Sort detected boxes by score and store them in an array
- Remove boxes that are entirely contained within another box
- Start with the largest box
  - Compare it with all other boxes
  - If the IOU exceeds the threshold, remove the other box
- Repeat for the next largest remaining box until complete
If you’re not familiar with Intersection Over Union or IOU, it’s a way of determining how much overlap one bounding box has with another and has a range from 0 to 1. An example of IOU is shown below.
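For concreteness, here is a minimal sketch of how the IOU of two boxes in [x1, y1, x2, y2] format can be computed; it mirrors the arithmetic that appears inside the NMS loop further down:

def compute_iou(box1, box2):
    """ Computes Intersection Over Union of two boxes [x1,y1,x2,y2]. """
    # overlap region (zero if the boxes do not intersect)
    inter_w = max(0, min(box1[2], box2[2]) - max(box1[0], box2[0]))
    inter_h = max(0, min(box1[3], box2[3]) - max(box1[1], box2[1]))
    intersection = inter_w * inter_h
    # union = sum of the two areas minus the double-counted overlap
    area1 = (box1[2] - box1[0]) * (box1[3] - box1[1])
    area2 = (box2[2] - box2[0]) * (box2[3] - box2[1])
    union = area1 + area2 - intersection
    return intersection / union if union > 0 else 0.0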
We can implement this NMS algorithm in Python; the code is shown below (modified from its original source).
def remove_contained_bboxes(boxes):
    """ Removes all smaller boxes that are contained within larger boxes.
        Requires bboxes to be sorted by area (score)
        Inputs:
            boxes - array of bounding boxes sorted (descending) by area
                    [[x1,y1,x2,y2]]
        Outputs:
            keep - indexes of bounding boxes that are not entirely contained
                   in another box
        """
    # box j is contained in box i if: x1_j >= x1_i, y1_j >= y1_i,
    # x2_j < x2_i, and y2_j < y2_i
    check_array = np.array([True, True, False, False])
    keep = list(range(0, len(boxes)))
    for i in range(0, len(boxes)):
        # only boxes that are still kept may suppress others
        if i not in keep:
            continue
        for j in range(0, len(boxes)):
            # check if box j is completely contained in box i
            if j in keep and np.all((np.array(boxes[j]) >= np.array(boxes[i])) == check_array):
                keep.remove(j)
    return keep
def non_max_suppression(boxes, scores, threshold=1e-1):
    """ Perform Non-Maximal Suppression on a set of bounding boxes
        and corresponding scores.
        Inputs:
            boxes: a list of bounding boxes in the format [xmin, ymin, xmax, ymax]
            scores: a list of corresponding scores
            threshold: the IoU (intersection-over-union) threshold for merging
                       bounding boxes
        Outputs:
            boxes - non-max suppressed boxes
        """
    # Sort the boxes by score in descending order
    boxes = boxes[np.argsort(scores)[::-1]]

    # remove all contained bounding boxes and get ordered index
    order = remove_contained_bboxes(boxes)

    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        # iterate over a copy since we may remove indexes from order
        for j in order.copy():
            # Calculate the IoU between the two boxes
            intersection = max(0, min(boxes[i][2], boxes[j][2]) - max(boxes[i][0], boxes[j][0])) * \
                           max(0, min(boxes[i][3], boxes[j][3]) - max(boxes[i][1], boxes[j][1]))
            union = (boxes[i][2] - boxes[i][0]) * (boxes[i][3] - boxes[i][1]) + \
                    (boxes[j][2] - boxes[j][0]) * (boxes[j][3] - boxes[j][1]) - intersection
            iou = intersection / union
            # Remove boxes with IoU greater than the threshold
            if iou > threshold:
                order.remove(j)

    return boxes[keep]
Now we can run NMS on our detections and obtain clean bounding box results!
# separate bboxes and scores
bboxes = detections[:, :4]
scores = detections[:, -1]
# Get Non-Max Suppressed Bounding Boxes
nms_bboxes = non_max_suppression(bboxes, scores, threshold=0.1)
Creating the Pipeline
Now we can put it all together and create a pipeline for motion detection:
def get_detections(frame1, frame2, bbox_thresh=400, nms_thresh=1e-3,
                   mask_kernel=np.ones((9,9), dtype=np.uint8)):
    """ Main function to get detections via Frame Differencing
        Inputs:
            frame1 - Grayscale frame at time t
            frame2 - Grayscale frame at time t + 1
            bbox_thresh - Minimum threshold area for declaring a bounding box
            nms_thresh - IOU threshold for computing Non-Maximal Suppression
            mask_kernel - kernel for morphological operations on motion mask
        Outputs:
            detections - list with bounding box locations of all detections
                bounding boxes are in the form of: (xmin, ymin, xmax, ymax)
        """
    # get image mask for moving pixels
    mask = get_mask(frame1, frame2, mask_kernel)

    # get initially proposed detections from contours
    detections = get_contour_detections(mask, bbox_thresh)

    # separate bboxes and scores
    bboxes = detections[:, :4]
    scores = detections[:, -1]

    # perform Non-Maximal Suppression on initial detections
    return non_max_suppression(bboxes, scores, nms_thresh)
Here are some helper functions that will enable us to make a GIF of our results.
import os
from glob import glob

from PIL import Image

def draw_bboxes(frame, detections):
    # draw each detection as a green rectangle on the frame (in place)
    for det in detections:
        x1, y1, x2, y2 = det
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 3)

def create_gif_from_images(save_path : str, image_path : str, ext : str) -> None:
    ''' creates a GIF from a folder of images
        Inputs:
            save_path - path to save GIF
            image_path - path where images are located
            ext - extension of the images
        Outputs:
            None
        '''
    ext = ext.replace('.', '')
    image_paths = sorted(glob(os.path.join(image_path, f'*.{ext}')))
    # sort numerically rather than lexicographically
    image_paths.sort(key=lambda f: int(''.join(filter(str.isdigit, f))))
    pil_images = [Image.open(im_path) for im_path in image_paths]
    # the first image starts the GIF; append the remaining frames after it
    pil_images[0].save(save_path, format='GIF', append_images=pil_images[1:],
                       save_all=True, duration=50, loop=0)
Here's the code to make the GIF; remember that this will work for any image sequence where the camera is stationary.
import matplotlib.pyplot as plt

# make sure the output folder for the GIF frames exists
os.makedirs('temp', exist_ok=True)

for idx in range(1, len(image_paths)):
    # read frames
    frame1_bgr = cv2.imread(image_paths[idx - 1])
    frame2_bgr = cv2.imread(image_paths[idx])

    # get detections
    detections = get_detections(cv2.cvtColor(frame1_bgr, cv2.COLOR_BGR2GRAY),
                                cv2.cvtColor(frame2_bgr, cv2.COLOR_BGR2GRAY),
                                bbox_thresh=400,
                                nms_thresh=1e-4)

    # draw bounding boxes on frame
    draw_bboxes(frame2_bgr, detections)

    # save image for GIF (convert BGR -> RGB so matplotlib shows true colors)
    fig = plt.figure()
    plt.imshow(cv2.cvtColor(frame2_bgr, cv2.COLOR_BGR2RGB))
    fig.savefig(f"temp/frame_{idx}.png")
    plt.close()

# create GIF
create_gif_from_images('frame_differencing.gif', 'temp', '.png')
Let's take a look at the results.
And there you have it, we have just performed motion detection from Frame Differencing! We notice that there are many small false detections as well as bounding boxes that vary in size for the same clusters of objects. We can even notice that the truck on the right side has multiple detections, generally one for the front and one for the back. This is because there is no corresponding pixel change for the middle part of the truck! Our algorithm thinks that it is two separate objects.
In this image sequence, objects are significantly larger and exhibit greater pixel motion toward the bottom of the image. We might be able to use this knowledge to improve the performance of the detector; one possible tweak is sketched below.
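As a purely illustrative sketch (not part of the pipeline above), the minimum box area could be made to grow with vertical position, so that boxes near the bottom of the frame, where objects appear larger, must be bigger to count as detections. The function name, base area, and linear scaling here are all assumptions:

def position_scaled_thresh(y, img_height, base_thresh=400, scale=3.0):
    # hypothetical: grow the minimum-area threshold linearly from the
    # top of the image (y = 0) to the bottom (y = img_height)
    return base_thresh * (1.0 + scale * (y / img_height))

Inside get_contour_detections, the fixed comparison area > thresh would then become something like area > position_scaled_thresh(y + h, mask.shape[0]), where y + h is the bottom edge of the candidate box.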
Next Steps
We have learned how to utilize Frame Differencing along with some basic Computer Vision techniques such as Blurring, Morphological Operations, Contour Finding, and Non-Maximal Suppression to detect motion in a video. This is possibly the simplest technique, but it is a stepping stone to more advanced techniques that we will cover in subsequent parts of this tutorial.
- Part 1 — Frame Differencing
- Part 2 — Optical Flow
- Part 3 — Background Subtraction
Thanks for Reading! If you found this useful please consider clapping 👏