Stream and recognise people from a webcam with Go and Facebox

David Hernandez
Machine Box
Aug 8, 2017 · 5 min read

From intrusion detection to opening a gate automatically when you recognise the person standing in front of it, you can build it with a few lines of Python, Go, and Facebox.

First we are going to capture the stream from a webcam. There are multiple options for doing that in Go, but many of them require cgo bindings and OpenCV, and the support is quite limited and sometimes cumbersome. For that reason, and in the spirit of using the right tool for the job, we are going to use good old Python for that task.

Conveniently, in Python we can use OpenCV to stream Motion JPEG from the webcam over standard output.

Motion JPEG (M-JPEG) sounds really cool and fancy, but in reality it is just a concatenation of all the frames as JPEGs, using a separator (a boundary) that you can choose. Yes! It is basically CSV for video: very simple, at the cost of being heavy, because there is no compression between frames.
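To make the format concrete, here is a minimal sketch in Go of what such a stream looks like on the wire. The helper names writeFrame and buildStream are hypothetical, and the payloads are placeholder strings rather than real JPEG data. The sketch puts the boundary line before each part, which is the conventional multipart layout.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
)

// writeFrame is a hypothetical helper. It writes one frame as a multipart
// part: the boundary line, the headers, a blank line, the payload and a
// trailing CRLF.
func writeFrame(w io.Writer, boundary string, jpeg []byte) error {
	_, err := fmt.Fprintf(w,
		"--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n",
		boundary, len(jpeg))
	if err != nil {
		return err
	}
	if _, err := w.Write(jpeg); err != nil {
		return err
	}
	_, err = io.WriteString(w, "\r\n")
	return err
}

// buildStream concatenates the given frames into one M-JPEG byte stream.
func buildStream(boundary string, frames ...[]byte) ([]byte, error) {
	var buf bytes.Buffer
	for _, f := range frames {
		if err := writeFrame(&buf, boundary, f); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

func main() {
	// The payloads are placeholders, not real JPEG data.
	stream, err := buildStream("informs", []byte("frame-1"), []byte("frame-2"))
	if err != nil {
		panic(err)
	}
	// %q makes the CRLF separators visible.
	fmt.Printf("%q\n", stream)
}
```

Printing the stream with %q shows that there is nothing more to the format than headers, bytes and a repeated separator.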

M-JPEG is used by many IP cameras and digital cameras, so you may have a device that already speaks this protocol, and you can use the same strategy to stream from it.
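As a rough illustration of reading from such a device, the sketch below fetches a multipart stream over HTTP and splits it into frames with the standard library. Everything here is an assumption for demonstration purposes: fakeCamera is a stand-in built with httptest instead of a real camera, readFrames and fetchFrames are hypothetical helpers, and the limit on the number of frames exists only because a real camera never stops sending.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"mime"
	"mime/multipart"
	"net/http"
	"net/http/httptest"
	"net/textproto"
)

// readFrames reads up to max parts of a multipart response into memory.
// The boundary is taken from the Content-Type header of the response.
func readFrames(resp *http.Response, max int) ([][]byte, error) {
	_, params, err := mime.ParseMediaType(resp.Header.Get("Content-Type"))
	if err != nil {
		return nil, err
	}
	boundary := params["boundary"]
	if boundary == "" {
		return nil, errors.New("no boundary in Content-Type")
	}
	mr := multipart.NewReader(resp.Body, boundary)
	var frames [][]byte
	for len(frames) < max {
		p, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			return frames, err
		}
		data, err := io.ReadAll(p)
		if err != nil {
			return frames, err
		}
		frames = append(frames, data)
	}
	return frames, nil
}

// fetchFrames requests url and returns up to max frames from the response.
func fetchFrames(url string, max int) ([][]byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	return readFrames(resp, max)
}

// fakeCamera is a stand-in for a real device. It serves the given
// payloads as one multipart response and then ends the stream.
func fakeCamera(payloads ...string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		mw := multipart.NewWriter(w)
		w.Header().Set("Content-Type", "multipart/x-mixed-replace; boundary="+mw.Boundary())
		for _, p := range payloads {
			h := textproto.MIMEHeader{}
			h.Set("Content-Type", "image/jpeg")
			part, err := mw.CreatePart(h)
			if err != nil {
				return
			}
			io.WriteString(part, p)
		}
		mw.Close()
	}))
}

func main() {
	cam := fakeCamera("frame-1", "frame-2", "frame-3")
	defer cam.Close()

	frames, err := fetchFrames(cam.URL, 2)
	if err != nil {
		panic(err)
	}
	for i, f := range frames {
		fmt.Printf("frame %d: %d bytes\n", i, len(f))
	}
}
```

With a real camera you would replace cam.URL with the stream URL of the device and process each frame as it arrives instead of collecting them in a slice.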

Here is the code from capture.py, where we read from the camera frame by frame and stream each frame to stdout.

#!/usr/bin/env python
import sys
import time

import cv2
from imutils.video import VideoStream

# write raw bytes: sys.stdout.buffer on Python 3, sys.stdout itself on Python 2
out = getattr(sys.stdout, "buffer", sys.stdout)

vs = VideoStream(resolution=(320, 240)).start()
time.sleep(1.0)  # give the camera time to warm up

while True:
    # read the webcam stream frame by frame
    frame = vs.read()
    if frame is None:
        # no frame available yet
        continue

    # encode as a JPEG
    res = bytearray(cv2.imencode(".jpeg", frame)[1])
    size = str(len(res)).encode("ascii")
    # stream to the stdout
    out.write(b"Content-Type: image/jpeg\r\n")
    out.write(b"Content-Length: " + size + b"\r\n\r\n")
    out.write(res)
    out.write(b"\r\n")
    # we use 'informs' as a boundary
    out.write(b"--informs\r\n")
    out.flush()
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cv2.destroyAllWindows()
vs.stop()

Streaming to a Go HTTP server

Once we have the stream on stdout we can use pipes to move it around, and in Go that is quite easy. Let’s make an HTTP server to carry the stream from the webcam to the browser.

package main

import (
	"log"
	"net/http"
	"os/exec"
)

const boundary = "informs"

func main() {
	http.HandleFunc("/cam", cam)
	log.Fatal(http.ListenAndServe(":8081", nil))
}

func cam(w http.ResponseWriter, r *http.Request) {
	// set the multipart header
	w.Header().Set("Content-Type", "multipart/x-mixed-replace; boundary="+boundary)
	// execute capture.py with the request context
	cmd := exec.CommandContext(r.Context(), "./capture.py")
	// connect the stdout from capture to the response writer
	cmd.Stdout = w

	err := cmd.Run()
	if err != nil {
		log.Println("[ERROR] capturing webcam", err)
	}
}

With this handler, when you hit http://localhost:8081/cam it starts a new process and streams the webcam to the browser in real time.

Notice the use of context.Context, which ties the lifetime of the process to the request: when the HTTP request is cancelled, the process is terminated.
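To see that cancellation on its own, outside of an HTTP handler, here is a minimal sketch. It assumes a Unix-like system with a sleep command on the PATH. runFor is a hypothetical helper, and a timeout stands in for the browser closing the connection.

```go
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

// runFor runs the command with a context that expires after d.
// It returns the error reported by cmd.Run and the error of the context.
func runFor(d time.Duration, name string, args ...string) (runErr, ctxErr error) {
	ctx, cancel := context.WithTimeout(context.Background(), d)
	defer cancel()

	// when ctx is done, the os/exec package kills the process
	cmd := exec.CommandContext(ctx, name, args...)
	runErr = cmd.Run()
	return runErr, ctx.Err()
}

func main() {
	start := time.Now()
	// "sleep 10" would normally take ten seconds to finish
	runErr, ctxErr := runFor(200*time.Millisecond, "sleep", "10")
	fmt.Println("run error:", runErr)
	fmt.Println("context error:", ctxErr)
	fmt.Println("stopped early:", time.Since(start) < 5*time.Second)
}
```

In the handler the same thing happens without a timeout, because r.Context() is cancelled as soon as the client goes away.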

Process the stream and recognize people with Facebox

Now that we have the stream in Go, we can read it and process it if necessary.

The idea is to start the capture.py command in an HTTP handler, like we did before, and use a pipe to read the stream frame by frame from stdout.

cmd := exec.CommandContext(r.Context(), "./capture.py")
// pipe the stdout
stdout, err := cmd.StdoutPipe()
if err != nil {
	log.Println("[ERROR] Getting the stdout pipe", err)
	return
}
if err := cmd.Start(); err != nil {
	log.Println("[ERROR] starting capture.py", err)
	return
}

Now we can use a multipart.Reader to read a frame and keep it in memory, so we can read it or write it as many times as we want.

// use a multipart reader to read frame by frame
mr := multipart.NewReader(stdout, boundary)
for {
	p, err := mr.NextPart()
	if err == io.EOF {
		break
	}
	if err != nil {
		log.Println("[ERROR] reading next part", err)
		return
	}

	jp, err := ioutil.ReadAll(p)
	if err != nil {
		log.Println("[ERROR] reading from bytes ", err)
		continue
	}
	...

With the frame in memory, we can use the Facebox SDK to call Check on your running Facebox instance and do the face detection and face recognition.

	jpReader := bytes.NewReader(jp)
	// check if in the frame are people that I know
	faces, err := faceboxClient.Check(jpReader)
	if err != nil {
		log.Println("[ERROR] calling facebox", err)
		continue
	}
	// for every person do an action, like open a gate
	for _, face := range faces {
		if face.Matched {
			fmt.Println("I know you ", face.Name)
		} else {
			fmt.Println("I DO NOT know you ")
		}
	}

Once we have processed the frame we can write it to the response again, but bear in mind that doing everything in the same goroutine will make the video lag, because face recognition is CPU intensive. If you want to keep the video streaming in real time, one possible solution is to use a separate goroutine to analyse the frames; we leave that as an exercise for the reader.
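One way to approach that exercise could look like the sketch below. It is only an illustration under stated assumptions: the analyser type and its methods are hypothetical names, the analyse callback is a placeholder for the call to Facebox, and frames that arrive while the worker is busy are simply dropped so the video never waits for the recognition.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// analyser runs a slow function on frames in its own goroutine.
type analyser struct {
	frames chan []byte
	wg     sync.WaitGroup
}

// newAnalyser starts one worker goroutine that calls analyse for every
// frame it receives. analyse is a placeholder for the Facebox check.
func newAnalyser(analyse func(frame []byte)) *analyser {
	a := &analyser{frames: make(chan []byte, 1)}
	a.wg.Add(1)
	go func() {
		defer a.wg.Done()
		for f := range a.frames {
			analyse(f)
		}
	}()
	return a
}

// offer hands a frame to the worker without blocking. It reports false
// when the worker is busy and the frame has been dropped.
func (a *analyser) offer(frame []byte) bool {
	select {
	case a.frames <- frame:
		return true
	default:
		return false
	}
}

// stop closes the queue and waits until the worker has finished.
func (a *analyser) stop() {
	close(a.frames)
	a.wg.Wait()
}

func main() {
	analysed := 0
	a := newAnalyser(func(frame []byte) {
		time.Sleep(50 * time.Millisecond) // pretend recognition is slow
		analysed++
	})

	dropped := 0
	for i := 0; i < 20; i++ {
		// in the handler this would be the jp slice of the current frame
		if !a.offer([]byte("frame")) {
			dropped++
		}
		time.Sleep(10 * time.Millisecond) // pretend frames arrive quickly
	}
	a.stop()
	fmt.Println("analysed:", analysed, "dropped:", dropped)
}
```

In the handler, the loop would call offer(jp) and then write the frame to the response straight away. Dropping frames is a design choice: recognising a person a few frames late is usually fine, while a lagging video is not.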

Here is the complete code of the camFacebox handler, for reference.

// imports used: bytes, fmt, io, io/ioutil, log, mime/multipart, net/http, os/exec, strconv
// faceboxClient is assumed to be a Facebox SDK client created elsewhere in the package
func camFacebox(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "multipart/x-mixed-replace; boundary="+boundary)
	cmd := exec.CommandContext(r.Context(), "./capture.py")
	// pipe the stdout
	stdout, err := cmd.StdoutPipe()
	if err != nil {
		log.Println("[ERROR] Getting the stdout pipe", err)
		return
	}
	if err := cmd.Start(); err != nil {
		log.Println("[ERROR] starting capture.py", err)
		return
	}
	// release the process resources on every return path
	defer cmd.Wait()
	// use a multipart reader to read frame by frame
	mr := multipart.NewReader(stdout, boundary)
	for {
		p, err := mr.NextPart()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Println("[ERROR] reading next part", err)
			return
		}

		jp, err := ioutil.ReadAll(p)
		if err != nil {
			log.Println("[ERROR] reading from bytes ", err)
			continue
		}
		jpReader := bytes.NewReader(jp)
		// check if in the frame are people that I know
		faces, err := faceboxClient.Check(jpReader)
		if err != nil {
			log.Println("[ERROR] calling facebox", err)
			continue
		}
		// for every person do an action
		for _, face := range faces {
			if face.Matched {
				fmt.Println("I know you ", face.Name)
			} else {
				fmt.Println("I DO NOT know you ")
			}
		}
		// just streaming MJPEG
		w.Write([]byte("Content-Type: image/jpeg\r\n"))
		// strconv.Itoa gives the decimal length; string(len(jp)) would give a single rune.
		// The blank line after the headers separates them from the JPEG bytes.
		w.Write([]byte("Content-Length: " + strconv.Itoa(len(jp)) + "\r\n\r\n"))
		w.Write(jp)
		w.Write([]byte("\r\n"))
		w.Write([]byte("--informs\r\n"))
	}
}

Now if you hit http://localhost:8081/camFacebox you are not only going to stream the content (with some lag) to the browser, you are also going to recognise the people appearing in the video, provided you have taught Facebox their faces beforehand.

Conclusion

You can use Go and Python combined with some pipes to build a pretty good video streaming service, and with a little more effort you can include face recognition in your processing pipeline.

David Hernandez
Machine Box

@dahernan Machine Learning and Go. Making @machineboxio