TL;DR: here is the demo.
Before jumping into coding, let’s review the terminology as well as the technology we are going to use in our application.
So what is ml5.js? It may sound familiar, since I wrote an article about it a while ago. You can read it here. There, I cover how to use ml5.js to train a model that classifies images. In a nutshell, ml5.js lets developers without a machine learning background perform machine learning tasks easily.
For example, to draw a canvas with a blue background and boxes in it, you can do the following:
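A minimal p5.js sketch along those lines might look like this (the canvas size and box positions are arbitrary choices for illustration):

```javascript
// p5.js sketch: a blue canvas with two white boxes drawn on it.
function setup() {
  createCanvas(400, 300);   // create a 400x300 drawing surface
}

function draw() {
  background(0, 0, 255);    // fill the canvas with blue
  fill(255);                // white fill for the boxes
  rect(50, 50, 80, 80);     // draw two rectangles
  rect(200, 100, 120, 60);
}
```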
And here is the result:
Object detection is a computer vision technique used to identify and locate objects within an image or video. It identifies multiple objects and their locations by drawing bounding boxes around the detected ones. There are two pre-trained models we can use with ml5's object detection method: YOLO and COCO-SSD. Today we will use COCO-SSD.
There are two terms in this name — COCO and SSD. First, COCO stands for Common Objects in Context. It is a large-scale object detection, segmentation, and captioning dataset. Next, SSD stands for Single-Shot Detector. It is an object detection technique proposed by Wei Liu et al. You can read more details about the method here. So, COCO-SSD is a pre-trained model built with the SSD technique and trained on the COCO dataset. It can detect 80 different classes of objects, such as person, bicycle, car, cat, dog, and bottle.
First, let’s create the index.html file:
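A sketch of what that file could look like (the CDN URLs and button ids here are assumptions; any ids will work as long as app.js uses the same ones):

```html
<!DOCTYPE html>
<html>
<head>
  <!-- p5 1.4.0 and ml5 0.7.1, per the versions mentioned below -->
  <script src="https://cdnjs.cloudflare.com/ajax/libs/p5.js/1.4.0/p5.min.js"></script>
  <script src="https://unpkg.com/ml5@0.7.1/dist/ml5.min.js"></script>
</head>
<body>
  <!-- Two buttons to toggle the video and the detecting status -->
  <button id="toggleVideoButton">Hide Video</button>
  <button id="toggleDetectingButton">Stop Detecting</button>
  <script src="app.js"></script>
</body>
</html>
```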
We include the p5 and ml5 libraries in the head tag. At the time of writing, we are using versions 1.4.0 and 0.7.1, respectively. In the body tag, we simply create two buttons to toggle the video and the detecting status.
Next, let’s write app.js.
We start by declaring global variables, grabbing the button elements, and setting the cursor to “wait” mode. We will change the cursor back to the default one later, once the model and the video element are ready.
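The globals could be set up as follows (the variable names are assumptions based on how the rest of the article refers to them):

```javascript
// Global state for the sketch (names are assumptions).
let video;            // webcam feed (p5 element), created in setup()
let detector;         // ml5 object detector, created in preload()
let detections = [];  // most recent detection results from onDetected()
let toggleVideoButton;     // button to show/hide the video
let toggleDetectingButton; // button to start/stop detecting
```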
We then create the preload and setup functions. In the p5 library, the preload() function is executed first when the program starts. Here, we create the detector object with cocossd as the model. Then, in the setup() function, we create a canvas element and a video element that takes its source from the webcam. This function is executed right after preload() finishes. On line 17, we listen to the video’s loadeddata event and set the cursor back to normal mode when it’s ready.
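A sketch of those two functions, assuming the globals above (the canvas size and the `wait`/`default` cursor handling are illustrative; line numbers in the text refer to the original gist):

```javascript
function preload() {
  // Load the COCO-SSD model; p5 waits for preload() before running setup()
  detector = ml5.objectDetector('cocossd');
}

function setup() {
  document.body.style.cursor = 'wait';  // busy cursor while things load
  createCanvas(640, 480);
  video = createCapture(VIDEO);         // webcam stream as a p5 element
  video.size(640, 480);

  // video.elt is the underlying HTML <video> element, so we can use
  // standard DOM methods like addEventListener on it
  video.elt.addEventListener('loadeddata', () => {
    document.body.style.cursor = 'default';  // model and video are ready
  });
}
```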
video is the p5 element, while video.elt is the underlying HTML element, so all normal HTML methods can be called on it.
When the program starts, there are three big blocks — the buttons, the canvas, and the video element. The canvas is where we draw the results of the detected objects, and the video is what we see from the webcam. The toggleVideo function toggles the video’s visibility and sets the button caption accordingly. Similarly, the toggleDetecting function toggles whether we are detecting objects from the webcam. On line 16, we invoke the detect() function to start detecting. The detect function of ml5’s detector object accepts two arguments: the video element as input, and a callback — the onDetected function. On line 30, we create the onDetected function. It is an error-first callback: the first parameter is the error object, if any, and the second is the array of detected objects. We store the results in the global variable detections so we can draw them later (in the draw() function), and then call detect() again to keep on detecting.
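The handlers and callback could be sketched like this (button labels, variable names, and the free references to the globals and buttons from earlier are assumptions; line numbers in the text refer to the original gist):

```javascript
let showVideo = true;   // whether the video element is visible
let detecting = true;   // whether we are currently detecting

function toggleVideo() {
  showVideo ? video.hide() : video.show();
  showVideo = !showVideo;
  toggleVideoButton.innerText = showVideo ? 'Hide Video' : 'Show Video';
}

function toggleDetecting() {
  detecting = !detecting;
  if (detecting) {
    detector.detect(video, onDetected);  // resume detection
  }
  toggleDetectingButton.innerText = detecting ? 'Stop Detecting' : 'Start Detecting';
}

// Error-first callback: store the results, then keep detecting
function onDetected(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  detections = results;  // drawn later in draw()
  if (detecting) {
    detector.detect(video, onDetected);  // call detect() again
  }
}
```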
Finally, let’s draw the result.
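The drawing loop could be sketched as follows, assuming the globals defined earlier (drawBoundingBox and drawLabel are shown further below):

```javascript
function draw() {
  // Paint the current video frame onto the canvas
  image(video, 0, 0, width, height);

  // Draw every detected object stored by onDetected()
  if (detections) {
    detections.forEach(drawResult);
  }
}

function drawResult(object) {
  drawBoundingBox(object);
  drawLabel(object);
}
```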
As in the cover picture of this blog, we would like to draw bounding boxes around all detected objects. This is done with the draw() function, which in the p5 library runs repeatedly. Taking advantage of this, we check whether any detected objects are stored in the global variable detections and, if so, invoke the drawResult() function, passing each detected object as an argument. Here is an example of a detected object result.
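One element of the detections array has this shape (the concrete values below are illustrative, not real output):

```javascript
// A single detection result from the COCO-SSD detector
const detection = {
  label: 'person',    // one of the 80 COCO classes
  confidence: 0.89,   // how sure the model is about this detection
  x: 72,              // bounding-box position and size, in pixels,
  y: 48,              // relative to the video frame
  width: 210,
  height: 320
};
```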
We have a label from one of the 80 classes, followed by a confidence value indicating how sure the model is about the detection, and the coordinates of the detected object within the picture or video frame. We then create the drawBoundingBox() and drawLabel() functions to draw the bounding box and label of each detected object accordingly.
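These two functions could be sketched as follows (the color, stroke weight, and text offsets are arbitrary choices):

```javascript
// Draw a green outline around the detected object
function drawBoundingBox(object) {
  stroke('green');
  strokeWeight(2);
  noFill();
  rect(object.x, object.y, object.width, object.height);
}

// Draw the class label just inside the top-left corner of the box
function drawLabel(object) {
  noStroke();
  fill('green');
  textSize(16);
  text(object.label, object.x + 4, object.y + 18);
}
```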
All the source code can be found here — https://github.com/yong-asial/ml5-object-detection
In this tutorial, we’ve seen how to use the ml5.js and p5.js libraries to detect multiple objects from the computer’s web camera. We can do all of this easily thanks to the teams behind ml5.js, p5.js, COCO, and SSD. Kudos to them!