DIY Doorbell face recognition with ZoneMinder

Published in

zmNinja

12 min read · Apr 7, 2019

Note: Early 2020: I’ve switched to the EZVIZ DB1. I find it to be a better built door bell, and it doesn’t lock up under the same conditions as dbell.

Note: Oct 8, 2019: A public CVE was recently filed here about certain vulnerabilities related to an old version of dbell. As far as I know, this does not affect recent dbell systems like the HDx2, which I have. The vulnerabilities reported in the CVE disclosure don’t apply to this model: I had done similar tests when I first bought it and repeated them after the CVE was published. Note that I don’t have a smart relay for a door lock. I did the vulnerability tests on the login page. That being said, I don’t trust any small company with security unless they are very specialized. I blocked dbell’s P2P system on my firewall a long time ago and don’t allow any external traffic to hit the bell. I don’t use their push app either.

Note Sep 2019: The dbell can be affected by overheating if you are always streaming. My bell faces the sun and ZM is constantly streaming from it. During summer (May — Aug), the combination of the sun and streaming overheats the Application Processor (AP) and it shuts down every few days. The bell works (not part of the AP), but not the stream. The problem of course, is I can’t randomly change the bell position. Most bells are hardwired for specific locations at homes.

For those who have been following my work, I developed a machine learning server for ZoneMinder. A lot of users have since installed it and have posted many interesting object detection use-cases. Some time ago, I added Face Recognition to it as well (full props to Adam Geitgey’s face recognition module for making it so easy to use) but had a predicament: all (8+) cameras in my house are overhead cameras, and Adam’s library (actually Dlib’s face recognition library, which Adam uses) doesn’t quite work with overhead faces (face landmarks are not found).

The Quest for a Door Bell Camera

So I embarked on finding a doorbell camera. The requirements were:

Needs to have RTSP (un-encrypted) and always streaming (we need this for ZoneMinder to read from it)
Should NOT stream via their cloud. I don’t want my home images going through anyone’s cloud that I cannot audit.
Needs to be able to run off existing 2 wire doorbell systems

The 2nd requirement filters out many vendors (Nest, Ring, etc.) while the first one filters out even more. Long story short, I zeroed in on two:

DoorBird, a German company
dbell, a Canadian company

I reached out to both, letting them know I am a ZoneMinder developer and wanted to use their bell so I could hook my system up. DoorBird responded within 24 hours and told me about how I could sign up to get a demo unit at 50% of their cost. This was welcome, because their bell is $350! I wrote back to DoorBird about how I go about doing that, but it took them a while to respond, so in the meantime, I contacted dbell via email.

I did not receive a response from dbell, but I called them up, and we chatted for quite a while. They got very interested in what I wanted to do. So I ordered a system from dbell first (their cost is less than half of DoorBird’s). dbell also includes a chime, while DoorBird does not include any chimes. DoorBird states on their website that a chime costs around $150. That’s $500 retail (bell+chime). My guess is DoorBird’s primary market is commercial/enterprise and the consumer market is probably not a top priority for them.

All of that being said, in general I applaud companies who let their doorbell cameras work with personal NVRs.

dbell — Setting it all up

This is what dbell looks like. The shade is optional. Hardware setup was easy. They had a video explaining it. I had it done in around 15 minutes (which includes disabling my existing chimes). However, the build could be a little more solid. I broke a clip while trying to install it and hacked the back with some new screws and super glue. Software configuration wasn’t as convenient (more on this later), but I got it done.

Press here to ask for Canadian Maple Whiskey

Setting it up in ZoneMinder was easy. Their FAQ already had the RTSP URL to use

That’s it. ZM/zmNinja was streaming its video, easy peasy.

Setting up for Face Recognition

Now that I had streaming working, it was time to set up so that my app would receive notifications when people or known faces came by. People detection is very easily done. As you know, I use YoloV3 which does an amazing job figuring out people at different angles. I mean heck, it even detects a limb as a person

Challenge #1: Detection — Timing is everything, when height is your enemy

(Updated Sep 2019: I finally paid a handyman to move my bell up by a foot. Well worth it.)

The nice part with dbell is they included a “wedge” in their box that lets you install it at 30 degrees left or right. This is very useful.

However, I soon realized that if an adult walked up and pressed my bell, his/her face would be out of range of the camera, like so:

I first thought this was a problem with dbell’s camera but then I realized Ring, Nest, and DoorBird may have the same problem. All of them recommend a minimum of 4 feet from the ground. I believe mine is at around 3.5 feet, which is an average height in North American homes (I’ll have to measure and confirm; will update).

So I had to devise a way to use ZM zones to grab an image before they hit my doorbell call button.

The challenge, however, was to analyze the right frame, one with a face clearly visible. As you may know, I am not analyzing streaming video, for performance reasons. So I need to make sure I get the best possible frame to analyze.

I had to experiment with ZM zones and EVENT_STATS and came up with this:

This setting would trigger an alarm frame just about when a person was almost at my doorstep, like this:

Sidenote: Apparently, dbell has plans to work on a wider angle camera — if they do, it will make life simpler for me, for sure

Challenge #2: Actually detecting a face

The face recognition library has 2 parts:

Find a face
Match that face to existing face encodings (or mark it as unknown)

The problem I faced next was that, since the bell height wasn’t my friend, how do we ensure dlib finds a face? The default hog mode is good for well-lit frontal faces. Even when the frame I grabbed had a reasonable face, face detection would largely fail due to distance, angle or shadows (backlit sun). The solution was to switch from hog to cnn for face detection. No amount of upscaling or jitter tweaks worked with hog.
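As a rough illustration of the second part (matching), here is a pure-Python sketch, not the actual server code. It assumes matching is done by Euclidean distance between encodings with a tolerance of 0.6, which I understand to be the default in the face_recognition library. The names match_face and known are hypothetical, and the encodings are shortened to 3 numbers (real dlib encodings have 128).

```python
import math

def face_distance(a, b):
    """Euclidean distance between two equal-length encodings."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_face(known, candidate, tolerance=0.6):
    """Return the closest known name within tolerance, else 'unknown'.

    known: dict of name -> encoding (hypothetical in-memory store).
    """
    best_name, best_dist = "unknown", tolerance
    for name, encoding in known.items():
        dist = face_distance(encoding, candidate)
        if dist <= best_dist:
            best_name, best_dist = name, dist
    return best_name

# Toy 3-number encodings; real dlib encodings have 128 numbers.
known = {"alice": [0.1, 0.2, 0.3], "bob": [0.9, 0.8, 0.7]}
print(match_face(known, [0.12, 0.18, 0.33]))  # close to alice
print(match_face(known, [5.0, 5.0, 5.0]))     # far from everyone
```

The first part (finding a face) is where the hog/cnn choice applies; in the face_recognition library that is the model argument of face_locations.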

Challenge #3: Face recognition performance was terrible with cnn

I am running ZM on a Xeon 3.6GHz/4-core, 32GB RAM machine. It is reasonably old and not meant for graphics computing applications (no GPU either).

Changing face detection to cnn resulted in a 20 second detection timeframe (for a 600px wide image). hog mode was 0.4s. Holy cow.
Yolo (object detection) was around 2–3s which was acceptable to me because I was not doing realtime video. It is only after ZoneMinder detects an event (via pixel based motion detection) that my detection module kicks in. And I am only analyzing the “alarmed” frame or the “max score” frame (ZM terminology here. Alarmed frame = the frame that started the alarm. Max score frame = frame that matched your zone match criteria the closest)
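To make the two frame choices concrete, here is a small sketch under stated assumptions: each frame of an event is represented as a dict with FrameId, Type and Score keys (a simplified, assumed view of ZoneMinder's per-event frame data), and "max score" is taken to mean the alarm frame with the highest Score. The function name pick_frames is hypothetical.

```python
def pick_frames(frames):
    """Return (alarmed_frame_id, max_score_frame_id) for one event.

    frames: list of dicts with 'FrameId', 'Type' and 'Score' keys
    (an assumed, simplified view of ZoneMinder's frame data).
    """
    alarm_frames = [f for f in frames if f["Type"] == "Alarm"]
    if not alarm_frames:
        return None, None
    # Alarmed frame: the earliest frame flagged as an alarm.
    first_alarm = min(alarm_frames, key=lambda f: f["FrameId"])
    # Max score frame: the alarm frame with the highest score.
    max_score = max(alarm_frames, key=lambda f: f["Score"])
    return first_alarm["FrameId"], max_score["FrameId"]

frames = [
    {"FrameId": 1, "Type": "Normal", "Score": 0},
    {"FrameId": 2, "Type": "Alarm", "Score": 12},
    {"FrameId": 3, "Type": "Alarm", "Score": 41},
    {"FrameId": 4, "Type": "Alarm", "Score": 30},
]
print(pick_frames(frames))  # (2, 3)
```

Only one of these two frames is handed to the detector, which is why a 2–3s detection time is tolerable.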

Fortunately, I realized I never really did set up openblas — my mistake. It’s a must have.

That was an easy fix:

sudo -H pip uninstall dlib
sudo -H pip uninstall face-recognition
sudo apt-get install libopenblas-dev liblapack-dev libblas-dev # this is the important part
sudo -H pip install dlib --verbose --no-cache-dir # make sure it finds openblas
sudo -H pip install face_recognition

And boom:

Face detection went from 20 seconds to an average of 4–6 seconds.
Yolo improved too (but that was never my concern, it was reasonable enough)
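One way to double-check that a rebuilt dlib actually picked up openblas is to look at the build flags its Python module reports. This is a hedged sketch: DLIB_USE_BLAS, DLIB_USE_LAPACK and DLIB_USE_CUDA are the attribute names I believe dlib exposes, so the code reads them defensively and reports None for any that are missing. The function name dlib_build_flags is hypothetical.

```python
def dlib_build_flags():
    """Report which acceleration options the installed dlib was built with.

    Returns None if dlib is not importable. The attribute names are
    assumptions; a missing attribute is reported as None, not an error.
    """
    try:
        import dlib
    except ImportError:
        return None
    names = ("DLIB_USE_BLAS", "DLIB_USE_LAPACK", "DLIB_USE_CUDA")
    return {name: getattr(dlib, name, None) for name in names}

print(dlib_build_flags())
```

If the BLAS flag comes back False, the verbose output of the dlib install step is the place to see why openblas was not found.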

Challenge #4: For the drop and run

A lot of delivery folks don’t walk up. They drop stuff quickly and walk away. This wasn’t really a challenge because my system allows me to chain models.

Easily solved with this setting for my doorbell:

models=face,yolo

With that, if face detection fails, person detection takes over. Which is very useful for such images:

Our USPS guy is great. Always delivers with care, never throws (hint: UPS)
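The fallback idea behind models=face,yolo can be sketched in a few lines. This is an illustration only, not the server's implementation: run_models and the detectors dict are hypothetical, and the two detectors are stand-ins that return canned results instead of running dlib or YoloV3.

```python
def run_models(models, detectors, image):
    """Try each model in order and return the first non-empty result.

    models: comma-separated string such as "face,yolo".
    detectors: dict of model name -> callable(image) -> list of labels.
    """
    for name in (m.strip() for m in models.split(",")):
        labels = detectors[name](image)
        if labels:
            return name, labels
    return None, []

# Stand-in detectors; real ones would run dlib and YoloV3.
detectors = {
    "face": lambda image: [],          # pretend no face was found
    "yolo": lambda image: ["person"],  # pretend a person was found
}
print(run_models("face,yolo", detectors, image=None))  # ('yolo', ['person'])
```

Because face comes first in the list, person detection only runs when no face is found.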

Future Challenge: More real time performance

You do need to add a GPU. I don’t have real time performance needs today. But I may tomorrow, or later today ;-)

You don’t need to blow mega-bucks. I asked around and you can spend as little as $60 to get 10FPS with cnn face-recognition or some more to get 4x the performance. Two cards that were recommended:

GTX 770 4GB (around $50–$60 on eBay)
GTX 1050 Ti 4GB (around $180 on Amazon, < $120 on eBay)

Both of these cards are CUDA compatible and Dlib will support them. You will have to install CUDA and re-install dlib though, which is easy.

Tweaks to my machine learning server

Now that I actually had a good use-case for face recognition, I tweaked my machine learning event server with some optimizations. First, here is my config section for my doorbell (Monitor 13)

[monitor-13]
#doorbell
detect_pattern=(person)
# try face, if it works, don't do yolo
detection_mode=first
models=face,yolo
frame_id=bestmatch
resize=600
face_model=cnn
#if you hard code a frame, you need to make sure it is created
#before we access it. wait (sec) helps
#frame_id=32
#wait=3
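For readers who want to poke at a section like this, here is a minimal sketch of reading it with Python's standard configparser. It is an assumption that the file can be treated as plain INI; the CONFIG_TEXT string simply repeats the settings above, and commented-out keys fall back to defaults chosen for this example.

```python
import configparser

CONFIG_TEXT = """
[monitor-13]
#doorbell
detect_pattern=(person)
detection_mode=first
models=face,yolo
frame_id=bestmatch
resize=600
face_model=cnn
#frame_id=32
#wait=3
"""

# interpolation=None keeps values such as "(person)" literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)
section = parser["monitor-13"]

models = [m.strip() for m in section["models"].split(",")]
resize = section.getint("resize")
wait = section.getint("wait", fallback=0)  # commented out above, so 0

print(models, resize, wait, section["face_model"])
```

Lines starting with # are treated as comments, so the hardcoded frame_id=32 and wait=3 stay inactive until they are uncommented.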

My doorbell is configured to try face recognition first, then yolo.
I added detection_mode. When set to first, it will stop at the first successful algorithm. So, if face recognition succeeds in finding a face, it won’t waste expensive CPU cycles on person detection.
Next up, I added a wait clause. This makes the system wait a few seconds before it downloads a frame. This is useful if you hardcoded your frame ID and ZM hasn’t yet written that frame to disk (obviously, if you hardcode frames, you need to enable Frames storage in ZM, not just video).
Finally, I added an optimization to the face recognition model (credit). Loading an image and creating a face encoding takes time. So I compute the encodings once and dump them to disk instead of recomputing them each time. This saves a lot of time, especially when images add up (I shaved off 2 seconds just with this).
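The encoding cache can be sketched roughly as follows. This is an illustration under assumptions, not the server's code: it uses pickle for the on-disk format, load_encodings is a hypothetical name, and slow_compute is a stand-in for loading known-face images and encoding them.

```python
import os
import pickle
import tempfile

def load_encodings(cache_path, compute):
    """Return known-face encodings, computing them only on a cache miss.

    compute: zero-argument callable that does the slow work
    (a stand-in here for loading images and encoding faces).
    """
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as fh:
            return pickle.load(fh)
    encodings = compute()
    with open(cache_path, "wb") as fh:
        pickle.dump(encodings, fh)
    return encodings

calls = []
def slow_compute():
    calls.append(1)  # count how often the slow path runs
    return {"alice": [0.1, 0.2, 0.3]}

cache_path = os.path.join(tempfile.mkdtemp(), "known_faces.pickle")
first = load_encodings(cache_path, slow_compute)   # computes and saves
second = load_encodings(cache_path, slow_compute)  # served from disk
print(first == second, len(calls))  # True 1
```

A cache like this goes stale when known-face images are added or changed, so the file has to be deleted (or rebuilt) at that point.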

Results: The Proof of the Pudding is in … Detecting it?

I’ve been very satisfied with the results:

As you see above, my DoorBell (dbell) camera detected me when I walked up. dbell’s app also rang my phone.

So overall, very cool.

Night Time IR

My doorbell camera has another IR camera in its field of view which makes IR detection a bigger challenge (the IR bulbs add to flare). That coupled with dbell’s IR performance makes face detection impossible at night. But that is where the Yolo fallback works well. Person detection to the rescue:

Should you buy dbell or XYZ?

dbell is a startup and they are trying hard to survive in a space where big companies are flooding the market. I’d like to support them. I found them interactive, responsive and engaged. Again, this was my experience, having reached out to them as a developer doing interesting things.

I’m not recommending one over the other (nor have I tried any product besides dbell). dbell is an interesting product at the right price.

This post is about Doorbell+ZM+Face recognition integration. It’s not an endorsement for any one bell product

That being said, as of today, their product has some areas for improvement, in my opinion:

The camera angle of view isn’t wide enough for a 3.5 ft doorbell, as I wrote earlier. This is not unique to them. Even DoorBird recommends a min of 4.5ft. I moved my bell location to 4.5ft recently (Sep 2019) and now faces are detected pretty much 99% of the time.
Their iOS app is buggy (crashes every once in a while, but starts up again. Fortunately, their app is not something I need, but I can imagine many others will want their app).
The microphone/speaker output is not great. dbell later told me their HW was cell phone quality. Basically, when we tested calls they were choppy, delayed, and often hard to understand (I have excellent wifi coverage), so maybe it was an app issue and not HW. We tried this at least 20 times, so I know it’s not a ‘once in a while’ issue. Not sure what audio stack they are using, but in general, I’ve used the WebRTC native stack in the past and performance has been good.
IR quality is average (My Dahua and HikVision camera IRs are better)
Their camera setup using the new app was not convenient, at least for me: it needs you to be on 2.4GHz wifi. I mean the phone which you use to discover dbell. With modern systems like Google wifi mesh (which I have), you can’t force connect to 2.4GHz if your phone supports 2.4 AND 5 GHz. Neither my iPhone nor my Motorola Android device had a way to latch to 2.4GHz either (some phones do). So I had to walk away from my house to disconnect from 5GHz and latch on to 2.4GHz (5GHz range is less) and then setup was easy. Several other cameras also operate on 2.4GHz but dbell’s setup needed my phone to also be on 2.4GHz to discover the dbell. That was the inconvenient part. I’d have preferred an alternate discovery method/fallback. If you have a router with selectable bands you won’t have this issue.

On the positive side:

Their system was at $149 when I bought it, and dbell threw in an extra chime/extender. So a bell, 2 chimes+1 extender. Nice. That is less than half the price of DoorBird, and DoorBird doesn’t include a chime at its price!
ZoneMinder integration was a breeze
The doorbell seems to work reliably so far and several settings (chroma/contrast/frame rate etc.) are customizable without needing IE ActiveX (all these cameras use a common Chinese OEM camera SDK)
The quality of the HDx2 Live camera is decent
dbell seems to be invested/interested in what people are doing with it. I contacted them to provide an HTTP hook when the bell is pressed — so you can start recording/integrate it with other automation suites etc. and they did. Very cool.

Conclusion

It’s been a lot of fun hacking a real life solution together. I can confidently say Dlib’s face recognition + YoloV3 does a far better job than most in-camera face recognition/object detection systems I’ve seen.

Where to next?

I hope dbell releases a version with a wider camera view
dbell has promised me a hook for when the bell is pressed — that will be useful to me. I can use that time to process pre/post frames.
I’d like to try DoorBird, to be honest, but given their bell + chime goes way above $300 (including the deep partner discount), it is not encouraging for any consumer, let alone a developer. That being said DoorBird specs are nice (but even their camera might have the same issue of minimum height)

Written by oZoneDev