Build a Face Recognition System for $60 with the New Nvidia Jetson Nano 2GB and Python

Using Python 3.6, OpenCV, Dlib and the face_recognition module

Oct 5, 2020

The new Nvidia Jetson Nano 2GB dev board (announced today!) is a single-board computer that goes for $59 and runs AI software with GPU-acceleration.

With face recognition and Python, you can easily track everyone who creeps up to your door.

The kind of performance you can get out of a $59 single-board computer in 2020 is kind of amazing. Let’s use it to create a simple version of a doorbell camera that tracks everyone who walks up to the front door of your house. With face recognition, it will instantly know whether the person at your door has ever visited you before, even if they were dressed differently.

What is the Nvidia Jetson Nano 2GB?

The Jetson Nano 2GB is a single-board computer with a quad-core 1.4 GHz ARM CPU and a built-in Nvidia Maxwell GPU. It’s the cheapest Nvidia Jetson model and is aimed at the same kind of hobbyists who would be buying a Raspberry Pi.

The Nvidia Jetson Nano 2GB is similar to a Raspberry Pi — it is a Linux computer on a single board. But it has a 128-core Nvidia GPU for accelerating deep learning models and it supports CUDA.

If you are already familiar with the Raspberry Pi product line, this is the exact same idea except the Jetson Nano has an Nvidia GPU on board. It can run GPU-accelerated applications (like deep learning models) far more quickly than a board like Raspberry Pi which doesn’t have a GPU that is supported by most deep learning frameworks.

There are a lot of AI dev boards and accelerator modules out there, but Nvidia has one big advantage over them: it is directly compatible with desktop AI libraries, and it doesn’t require you to convert your deep learning models into any special formats to run them. It uses the same CUDA libraries for GPU acceleration that almost every Python-based deep learning framework already uses. This means that you can take an existing Python-based deep learning program and run it on the Jetson Nano 2GB with little-to-no modification and still get good performance (as long as your application can run with 2GB of RAM). The ability to take the exact same Python code you wrote for a beefy server and deploy it on a $59 stand-alone device is pretty great.

This new Jetson Nano 2GB board also feels a little more polished than the previous hardware releases from Nvidia. The first Jetson Nano model inexplicably lacked WiFi, but this one comes with a plug-in WiFi module so you don’t have to mess with ethernet cables. They’ve also upgraded the power input to a more modern USB-C port and on the software side, some of the rough edges have been sanded off. You don’t need to hack around to do basic things like enable a swap file, for example.

Shipping a simple-to-use hardware device with a real GPU for under $60 is an aggressive move from Nvidia. It seems like they are aiming right at Raspberry Pi with this and trying to take over the educational / hobbyist market. It will be interesting to see how the market responds.

Let’s Assemble Our System

With any hardware project, the first step is to collect all the parts that we’ll need:

1. Nvidia Jetson Nano 2GB board ($59 USD)

The boards are currently available for pre-order (as of Oct. 5th, 2020) and are expected to be released at the end of October. I don’t know what the initial availability will be like after release, but the previous Jetson Nano model was in short supply for months after release.

Full disclosure: I got a Jetson Nano 2GB dev board for free as a review unit from Nvidia, but I have no financial or editorial relationship with Nvidia. That’s how I was able to write this guide ahead of time.

2. USB-C power adapter (You probably already have one?)

The new Jetson Nano 2GB uses USB-C for power. A power adapter isn’t included, but you probably already have one sitting around.

3. A Camera — Either a USB webcam (You probably have one?) or Raspberry Pi Camera Module v2.x (~$30 USD)

If you want a small camera to mount in a case, the Raspberry Pi Camera module v2.x is a great choice (Note: the v1.x camera module won’t work). You can get them on Amazon or from various resellers.

Some USB webcams like Logitech’s C270 or C920 also work fine with the Jetson Nano 2GB, so that’s a great option if you already have one. Here’s an incomplete list of USB cameras that should work.

Don’t be afraid to try out whatever USB devices you have lying around before buying something new. Not everything will have Linux driver support, but some will. I plugged in a generic $20 HDMI-to-USB adapter I got on Amazon and it worked perfectly. So I was able to use my high-end digital camera as a video source over HDMI without any extra configuration. Neat!

There are also a few other things that you will need but you probably already have them sitting around:

A microSD card with at least 32GB of space. We’ll install Linux on this. You can reuse whatever microSD card you have sitting around.
A microSD card reader for your computer so that you can install the Jetson software.
A wired USB keyboard and a wired USB mouse to control the Jetson Nano.
Any monitor or TV that accepts HDMI directly (not via an HDMI-to-DVI converter) so you can see what you are doing. You need a monitor for the initial Jetson Nano setup even if you run it without a monitor later.

Loading the Jetson Nano 2GB Software

Before you start plugging things into the Jetson Nano, you need to download the software image for the Jetson Nano. Nvidia’s default software image includes Ubuntu Linux 18.04 with Python 3.6 and OpenCV pre-installed.

Here’s how to get the Jetson Nano software onto your SD card:

Download the Jetson Nano Developer Kit SD Card Image from Nvidia.
Download Etcher, the program that writes the Jetson software image to your SD card.
Run Etcher and use it to write the Jetson Nano Developer Kit SD Card Image that you downloaded to your SD card. This takes about 20 minutes or so.

Time to unbox the rest of the hardware!

Plugging Everything In

First, take your Jetson Nano 2GB out of the box:

The first step is inserting the microSD card. The microSD card slot is well hidden, but you can find it on the rear side under the bottom of the heatsink:

You should also plug the included USB WiFi adapter into one of the USB ports:

Next, you need to plug in your camera.

If you are using a Raspberry Pi v2.x camera module, it connects with a ribbon cable. Find the ribbon cable slot on the Jetson, pop up the connector, insert the cable, and pop it back closed. Make sure the metal contacts on the ribbon cable are facing inwards toward the heatsink:

If you are using a USB webcam instead, just plug it into one of the USB ports and ignore the ribbon cable port.

Now, plug in everything else:

Plug a mouse and keyboard into the USB ports.
Plug in a monitor using an HDMI cable.
Finally, plug in the USB-C power cord to boot it up.

If you are using a Raspberry Pi camera module, you’ll end up with something that looks like this:

Or if you are using a USB video input device, it will look something like this:

The Jetson Nano will automatically boot up when you plug in the power cable. You should see a Linux setup screen appear on your monitor after a few seconds. Follow the steps to create your account and connect to WiFi. It’s very straightforward.

Installing Required Linux and Python Libraries for Face Recognition

Once you finish the initial Linux set-up, you need to install several libraries that we’ll be using in our face recognition system.

From the Jetson Nano desktop, open up an LXTerminal window and run the following commands. Any time it asks for your password, type in the same password that you entered when you created your user account:

sudo apt-get update
sudo apt-get install python3-pip cmake libopenblas-dev liblapack-dev libjpeg-dev

First, we are refreshing the package list for apt, which is the standard Linux software installation tool that we’ll use to install the other system libraries that we need. Then, we are installing a few Linux libraries that our software needs that aren’t pre-installed.

Last, we need to install the face_recognition Python library and its dependencies, including the machine learning library dlib. You can do that automatically with this single command:

sudo pip3 -v install Cython face_recognition

Because there aren’t pre-built copies of dlib and numpy for the Jetson platform available, this command will compile those libraries from source code. So take this opportunity to grab lunch because this might take the better part of an hour.

When that finally completes, your Jetson Nano 2GB is ready to do face recognition with full CUDA GPU acceleration. On to the fun part!
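If you want to double-check that dlib really was built with CUDA support, a quick test like this may help. It is only a minimal sketch: it assumes your dlib build exposes the DLIB_USE_CUDA flag, and the helper name dlib_cuda_status is made up for this example.

```python
def dlib_cuda_status():
    """Return True/False for dlib's CUDA build flag, or None if dlib is missing."""
    try:
        import dlib
    except ImportError:
        return None
    # The flag is fixed when dlib is compiled; getattr guards against builds without it
    return bool(getattr(dlib, "DLIB_USE_CUDA", False))


status = dlib_cuda_status()
if status is None:
    print("dlib is not installed yet")
elif status:
    print("dlib was compiled with CUDA support")
else:
    print("dlib was compiled without CUDA support")
```

Save it to a file and run it with python3 after the pip3 install finishes.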

Running the Face Recognition Doorbell Camera Demo App

The face_recognition library is a Python library I wrote that makes it super simple to do face recognition using dlib. It lets you detect faces, turn each detected face into a unique face encoding, and then compare those face encodings to see if they are likely the same person — all with just a couple of lines of code.
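To give a feel for what "a couple of lines" means, here is a small sketch of the detect, encode and compare steps. The two helper names (encode_faces and same_person) are invented for this example, and 0.6 is the library's default tolerance rather than anything tuned for a doorbell camera.

```python
def encode_faces(image_path):
    """Detect every face in an image file and return one encoding per face."""
    import face_recognition  # imported lazily so this file loads before the install finishes

    image = face_recognition.load_image_file(image_path)
    locations = face_recognition.face_locations(image)
    return face_recognition.face_encodings(image, locations)


def same_person(encoding_a, encoding_b, tolerance=0.6):
    """Return True when two encodings are close enough to count as one person."""
    import face_recognition

    matches = face_recognition.compare_faces([encoding_a], encoding_b, tolerance=tolerance)
    return bool(matches[0])


# Example use (the file names are placeholders):
#     first = encode_faces("visitor_monday.jpg")[0]
#     second = encode_faces("visitor_tuesday.jpg")[0]
#     print(same_person(first, second))
```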

Using that library, I put together a doorbell camera application that can recognize people who walk up to your front door and track each time the person comes back. Here’s what it looks like when you run it:

To get started, download the code. I’ve posted the full code here with comments, but here’s an easier way to download it onto your Jetson Nano from the command line:

wget -O doorcam.py tiny.cc/doorcam2gb

At the top of the program, you need to edit a line of code to tell it if you are using a USB camera or a Raspberry Pi camera module. You can edit the file like this:

gedit doorcam.py

Follow the instructions, then save it, exit GEdit and run the code:

python3 doorcam.py

You’ll see a video window pop up on your desktop. Whenever a new person steps in front of the camera, it will register their face and start tracking how long they have been near your door. If the same person leaves and comes back more than 5 minutes later, it will register a new visit and track them again. You can hit ‘q’ on your keyboard at any time to exit.
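The visit-counting rule described above can be sketched in a few lines. This is an illustration, not the demo's exact bookkeeping: it assumes the five-minute gap is measured from the last time the person was seen, and record_sighting is a hypothetical helper name. The demo's own logic lives in its lookup function and may differ in detail.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=5)  # coming back after more than 5 minutes counts as a new visit


def record_sighting(visitor, now):
    """Update one visitor's record in place and return True if this starts a new visit."""
    is_new_visit = now - visitor["last_seen"] > VISIT_GAP
    if is_new_visit:
        visitor["seen_count"] += 1
        visitor["first_seen_this_interaction"] = now
    visitor["last_seen"] = now
    return is_new_visit


start = datetime(2020, 10, 5, 12, 0, 0)
visitor = {"first_seen_this_interaction": start, "last_seen": start, "seen_count": 1}

record_sighting(visitor, start + timedelta(minutes=2))   # still the same visit
record_sighting(visitor, start + timedelta(minutes=20))  # gone for 18 minutes: new visit
print(visitor["seen_count"])  # prints 2
```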

The app will automatically save information about everyone it sees to a file called known_faces.dat. When you run the program again, it will use that data to remember previous visitors. If you want to clear out the list of known faces, just quit the program and delete that file.

Turning It Into a Stand-Alone Hardware Device

At this point, we have a dev board running a face recognition model, but it’s still tethered to our desktop for power and display. Let’s look at how we could run it without needing to be plugged in.

One of the cool things about modern single-board computers is that they all pretty much support the same hardware standards, like USB. That means there are lots of cheap add-ons you can get on Amazon like touch-screen displays and batteries. You have lots of options for input, output and power. Here’s what I ordered (but anything similar is fine):

A 7-inch touch-screen HDMI display that can run off of USB power:

And a generic USB-C battery pack to supply power:

Let’s hook it all up and see what it looks like running as a stand-alone device. Just plug in the USB battery instead of a wall charger and plug in the HDMI display to both the HDMI port and a USB port to act as a screen and a mouse input.

It works great. The touch screen operates as a normal USB mouse without any extra configuration. The only downside is that the Jetson Nano 2GB throttles down the GPU speed if it draws more power than the USB battery pack can supply. But it still ran fine.

With a little bit of creativity, you could package this all into a project case to use as a prototype hardware device to test out your own ideas with almost no up-front costs. And if you ever came up with something that you wanted to produce in volume, there are production versions of the Jetson boards that you can buy and use to build a real hardware product.

Doorbell Camera Python Code Walkthrough

Want to know how the code works? Let’s step through it.

The code starts off by importing the libraries we are going to be using. The most important ones are OpenCV (called cv2 in Python), which we’ll use to read images from the camera, and face_recognition, which we’ll use to detect and compare faces.

import face_recognition
import cv2
from datetime import datetime, timedelta
import numpy as np
import platform
import pickle

Then, we need to know how to access the camera, because getting an image from a Raspberry Pi camera module works differently than using a USB camera. So just change this variable to True or False depending on your hardware:

# Set this depending on your camera type:
# - True = Raspberry Pi 2.x camera module
# - False = USB webcam or other USB video input (like an HDMI capture device)
USING_RPI_CAMERA_MODULE = False

Next, we are going to create some variables to store data about the people who walk in front of our camera. These variables will act as a simple database of known visitors.

known_face_encodings = []
known_face_metadata = []

This application is just a demo, so we are storing our known faces in a Python list. In a real-world application that deals with more faces, you might want to use a real database instead, but I wanted to keep this demo simple.
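If you did want to try a real database, one possible shape is a small SQLite table with one row per face. This is a rough sketch using only Python's standard library. The table layout and the helper names (open_face_db, add_face, load_faces) are assumptions made for this example and are not part of the demo program.

```python
import sqlite3
from array import array


def open_face_db(path=":memory:"):
    """Open (or create) a small SQLite database with one row per known face."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS faces ("
        "id INTEGER PRIMARY KEY, encoding BLOB NOT NULL, seen_count INTEGER NOT NULL)"
    )
    return db


def add_face(db, encoding, seen_count=1):
    """Store an encoding (a sequence of floats) as packed 64-bit doubles."""
    blob = array("d", encoding).tobytes()
    cursor = db.execute(
        "INSERT INTO faces (encoding, seen_count) VALUES (?, ?)", (blob, seen_count)
    )
    db.commit()
    return cursor.lastrowid


def load_faces(db):
    """Return (id, encoding_as_list, seen_count) tuples for every stored face."""
    rows = db.execute("SELECT id, encoding, seen_count FROM faces ORDER BY id").fetchall()
    faces = []
    for face_id, blob, seen_count in rows:
        values = array("d")
        values.frombytes(blob)
        faces.append((face_id, values.tolist(), seen_count))
    return faces


db = open_face_db()
add_face(db, [0.1, -0.2, 0.3])  # real encodings are much longer; 3 numbers keep the demo short
print(load_faces(db))
```

Passing a file name instead of ":memory:" would keep the data between runs.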

Next, we have functions to save and load the known face data. Here’s the save function:

def save_known_faces():
    with open("known_faces.dat", "wb") as face_data_file:
        face_data = [known_face_encodings, known_face_metadata]
        pickle.dump(face_data, face_data_file)
        print("Known faces backed up to disk.")

This writes the known faces to disk using Python’s built-in pickle functionality. The data is loaded back the same way, but I didn’t show that here.
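For reference, a matching load function could look something like the sketch below. It mirrors the save function but is my guess at a reasonable version rather than a copy of the real one. In particular, the path parameter is an addition for this example.

```python
import pickle

known_face_encodings = []
known_face_metadata = []


def load_known_faces(path="known_faces.dat"):
    """Reload the two lists written by save_known_faces(), if the file exists."""
    try:
        with open(path, "rb") as face_data_file:
            encodings, metadata = pickle.load(face_data_file)
    except FileNotFoundError:
        # First run: nothing to load, so keep the empty lists
        print("No previous face data found; starting with a blank list.")
        return
    # Replace the contents in place so code already holding these lists sees the update
    known_face_encodings[:] = encodings
    known_face_metadata[:] = metadata
    print("Known faces loaded from disk.")
```

Only unpickle files that your own program wrote, since pickle can execute code hidden in a malicious file.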

Whenever our program detects a new face, we’ll call a function to add it to our known face database:

def register_new_face(face_encoding, face_image):
    known_face_encodings.append(face_encoding)
    known_face_metadata.append({
        "first_seen": datetime.now(),
        "first_seen_this_interaction": datetime.now(),
        "last_seen": datetime.now(),
        "seen_count": 1,
        "seen_frames": 1,
        "face_image": face_image,
    })

First, we are storing the face encoding that represents the face in a list. Then, we are storing a matching dictionary of data about the face in a second list. We’ll use this to track the time we first saw the person, how long they’ve been hanging around the camera recently, how many times they have visited our house, and a small image of their face.

We also need a helper function to check if an unknown face is already in our face database or not:

def lookup_known_face(face_encoding):
    metadata = None

    if len(known_face_encodings) == 0:
        return metadata

    face_distances = face_recognition.face_distance(
        known_face_encodings,
        face_encoding
    )
    best_match_index = np.argmin(face_distances)

    if face_distances[best_match_index] < 0.65:
        metadata = known_face_metadata[best_match_index]
        metadata["last_seen"] = datetime.now()
        metadata["seen_frames"] += 1

        if datetime.now() - metadata["first_seen_this_interaction"] > timedelta(minutes=5):
            metadata["first_seen_this_interaction"] = datetime.now()
            metadata["seen_count"] += 1

    return metadata

We are doing a few important things here:

Using the face_recognition library, we check how similar the unknown face is to all previous visitors. The face_distance() function gives us a numerical measurement of similarity between the unknown face and all known faces. The smaller the number, the more similar the faces.
If the face is very similar to one of our known visitors, we assume they are a repeat visitor. In that case, we update their “last seen” time and increment the number of times we have seen them in a frame of video.
Finally, if the person’s current visit started within the last five minutes, we assume they are still here as part of the same visit. Otherwise, we assume that this is a new visit to our house, so we’ll reset the time stamp tracking the start of their most recent visit and increment their visit count.
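To make the distance check more concrete, here is a toy version of the matching step written with plain NumPy. It assumes face_distance() is an ordinary Euclidean distance between encodings, which is how the library documents it. The helper name best_match and the three-number "encodings" are made up for this example.

```python
import numpy as np

MATCH_THRESHOLD = 0.65  # same cut-off the lookup function uses


def best_match(known_encodings, unknown_encoding, threshold=MATCH_THRESHOLD):
    """Return the index of the closest known encoding, or None if nothing is close enough."""
    if len(known_encodings) == 0:
        return None
    # Euclidean distance from the unknown face to every known face
    distances = np.linalg.norm(np.asarray(known_encodings) - unknown_encoding, axis=1)
    index = int(np.argmin(distances))
    return index if distances[index] < threshold else None


# Toy 3-number encodings; real face encodings are much longer
known = [np.array([0.0, 0.0, 0.0]), np.array([1.0, 1.0, 1.0])]
print(best_match(known, np.array([0.9, 1.0, 1.1])))  # prints 1 (close to the second face)
print(best_match(known, np.array([5.0, 5.0, 5.0])))  # prints None (far from everyone)
```

A lower threshold would make matching stricter (fewer false matches, more "new" visitors), and a higher one would do the opposite.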

The rest of the program is the main loop — an endless loop where we fetch a frame of video, look for faces in the image, and process each face we see. It is the main heart of the program. Let’s check it out:

def main_loop():
    if USING_RPI_CAMERA_MODULE:
        video_capture = cv2.VideoCapture(
            get_jetson_gstreamer_source(),
            cv2.CAP_GSTREAMER
        )
    else:
        video_capture = cv2.VideoCapture(0)

The first step is to get access to the camera using whichever method is appropriate for our computer hardware.
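The get_jetson_gstreamer_source() helper isn’t shown in this walkthrough. A plausible version just builds a GStreamer pipeline string for the Jetson’s camera stack, roughly like the sketch below. The default resolution, frame rate and flip setting are assumptions that you may need to adjust for your camera.

```python
def get_jetson_gstreamer_source(capture_width=1280, capture_height=720,
                                display_width=1280, display_height=720,
                                framerate=60, flip_method=0):
    """Build a GStreamer pipeline string that OpenCV can open as a video source."""
    return (
        f"nvarguscamerasrc ! video/x-raw(memory:NVMM), "
        f"width=(int){capture_width}, height=(int){capture_height}, "
        f"format=(string)NV12, framerate=(fraction){framerate}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        f"video/x-raw, width=(int){display_width}, height=(int){display_height}, "
        f"format=(string)BGRx ! "
        f"videoconvert ! video/x-raw, format=(string)BGR ! appsink"
    )


print(get_jetson_gstreamer_source())
```

The pipeline reads frames from the camera module, converts them to the BGR layout OpenCV expects, and hands them to OpenCV through the appsink element at the end.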

Now let’s start grabbing frames of video:

while True:
    # Grab a single frame of video
    ret, frame = video_capture.read()

    # Resize frame of video to 1/4 size
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)

    # Convert the image from BGR color (which OpenCV uses) to RGB color
    rgb_small_frame = small_frame[:, :, ::-1]

Each time we grab a frame of video, we’ll also shrink it to 1/4 size. This will make the face recognition process run faster at the expense of only detecting larger faces in the image. But since we are building a doorbell camera that only recognizes people near the camera, that shouldn’t be a problem.

We also have to deal with the fact that OpenCV pulls images from the camera with each pixel stored as a Blue-Green-Red value instead of the standard order of Red-Green-Blue. Before we can run face recognition on the image, we need to convert the image format.
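If the channel-reversing trick looks cryptic, this tiny NumPy example shows what it does to a two-pixel image. It is just an illustration and uses no camera. The ascontiguousarray call at the end is an optional extra in case a library complains about the reversed array’s memory layout.

```python
import numpy as np

# A tiny 1x2 "frame" in OpenCV's BGR order: one pure-blue pixel, one pure-red pixel
bgr_frame = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the channel order from BGR to RGB
rgb_frame = bgr_frame[:, :, ::-1]

print(rgb_frame[0, 0].tolist())  # prints [0, 0, 255]: the blue pixel as R, G, B
print(rgb_frame[0, 1].tolist())  # prints [255, 0, 0]: the red pixel as R, G, B

# The reversed array is a view onto the original data; this makes a compact copy
rgb_copy = np.ascontiguousarray(rgb_frame)
```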

Now we can detect all the faces in the image and convert each face into a face encoding. That only takes two lines of code:

face_locations = face_recognition.face_locations(rgb_small_frame)
face_encodings = face_recognition.face_encodings(
    rgb_small_frame,
    face_locations
)

Next, we’ll loop through each detected face and decide if it is someone we have seen in the past or a brand new visitor:

face_labels = []
for face_location, face_encoding in zip(face_locations, face_encodings):
    metadata = lookup_known_face(face_encoding)

    if metadata is not None:
        time_at_door = datetime.now() - metadata['first_seen_this_interaction']
        face_label = f"At door {int(time_at_door.total_seconds())}s"
    else:
        face_label = "New visitor!"

        # Grab the image of the face
        top, right, bottom, left = face_location
        face_image = small_frame[top:bottom, left:right]
        face_image = cv2.resize(face_image, (150, 150))

        # Add the new face to our known face data
        register_new_face(face_encoding, face_image)

    # Remember the label so it can be drawn on the video later
    face_labels.append(face_label)

If we have seen the person before, we’ll retrieve the metadata we’ve stored about their previous visits. If not, we’ll add them to our face database and grab the picture of their face from the video image to add to our database.

Now that we have found all the people and figured out their identities, we can loop over the detected faces again just to draw boxes around each face and add a label to each face:

for (top, right, bottom, left), face_label in zip(face_locations, face_labels):
    # Scale back up face location
    # since the frame we detected in was 1/4 size
    top *= 4
    right *= 4
    bottom *= 4
    left *= 4

    # Draw a box around the face
    cv2.rectangle(
        frame, (left, top), (right, bottom), (0, 0, 255), 2
    )

    # Draw a label with a description below the face
    cv2.rectangle(
        frame, (left, bottom - 35), (right, bottom),
        (0, 0, 255), cv2.FILLED
    )
    cv2.putText(
        frame, face_label,
        (left + 6, bottom - 6),
        cv2.FONT_HERSHEY_DUPLEX, 0.8,
        (255, 255, 255), 1
    )

I also wanted a running list of recent visitors drawn across the top of the screen with the number of times they have visited your house:

A graphical list of icons representing each person currently at your door.

To draw that, we need to loop over all known faces and see which ones have been in front of the camera recently. For each recent visitor, we’ll draw their face image on the screen and draw a visit count:

number_of_recent_visitors = 0
for metadata in known_face_metadata:
    # If we have seen this person in the last 10 seconds
    if datetime.now() - metadata["last_seen"] < timedelta(seconds=10):
        # Draw the known face image
        x_position = number_of_recent_visitors * 150
        frame[30:180, x_position:x_position + 150] = metadata["face_image"]
        number_of_recent_visitors += 1

        # Label the image with how many times they have visited
        visits = metadata['seen_count']
        visit_label = f"{visits} visits"
        if visits == 1:
            visit_label = "First visit"
        cv2.putText(
            frame, visit_label,
            (x_position + 10, 170),
            cv2.FONT_HERSHEY_DUPLEX, 0.6,
            (255, 255, 255), 1
        )
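One thing to watch with this kind of thumbnail strip: each face takes 150 pixels of width, so a crowd of recent visitors could run past the right edge of the frame, and the array assignment would then fail. Here is a hedged sketch of one way to cap the strip, using plain NumPy arrays as stand-ins for video frames. The draw_thumbnails helper is hypothetical and is not part of the demo.

```python
import numpy as np

THUMB = 150  # thumbnail size in pixels, matching the 150x150 face images


def draw_thumbnails(frame, face_images, top=30):
    """Copy thumbnails left to right along the top of the frame; return how many fit."""
    max_slots = frame.shape[1] // THUMB  # how many whole thumbnails fit across the frame
    drawn = 0
    for face_image in face_images[:max_slots]:
        x = drawn * THUMB
        frame[top:top + THUMB, x:x + THUMB] = face_image
        drawn += 1
    return drawn


frame = np.zeros((720, 400, 3), dtype=np.uint8)  # deliberately narrow frame
faces = [np.full((THUMB, THUMB, 3), 200, dtype=np.uint8)] * 5
print(draw_thumbnails(frame, faces))  # prints 2: only 2 of the 5 thumbnails fit in 400 pixels
```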

Finally, we can display the current frame of video on the screen with all of our annotations drawn on top of it:

cv2.imshow('Video', frame)

And to make sure we don’t lose data if the program crashes, we’ll save our list of known faces to disk roughly every 100 frames, as long as at least one face is in view:

if len(face_locations) > 0 and number_of_faces_since_save > 100:
    save_known_faces()
    number_of_faces_since_save = 0
else:
    number_of_faces_since_save += 1

And that’s it, aside from a line or two of cleanup code to turn off the camera when the program exits.
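The quit-key check and the cleanup step might look roughly like this. Treat it as a sketch of the usual OpenCV pattern rather than the program’s exact code. The two helper names are invented for this example.

```python
def is_quit_key(key_code):
    """True when the code returned by cv2.waitKey() is the 'q' key."""
    # waitKey can return extra high bits on some platforms, so keep only the low byte
    return (key_code & 0xFF) == ord("q")


def release_camera(video_capture):
    """Turn off the camera and close any OpenCV windows."""
    import cv2  # imported here so is_quit_key() can be used without OpenCV installed

    video_capture.release()
    cv2.destroyAllWindows()


# Inside the main loop, the check would look like:
#     if is_quit_key(cv2.waitKey(1)):
#         break
# and after the loop:
#     release_camera(video_capture)
print(is_quit_key(ord("q")), is_quit_key(-1))  # prints True False (-1 means no key was pressed)
```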

The start-up code for the program is at the very bottom of the program:

if __name__ == "__main__":
    load_known_faces()
    main_loop()

All we are doing is loading the known faces (if any) and then starting the main loop that reads from the camera forever and displays the results on the screen.

The whole program is only about 200 lines, but it detects visitors, identifies them and tracks every single time they have come back to your door.

Fun fact: This kind of face tracking code is running inside many street and bus station advertisements to track who is looking at ads and for how long. That might have sounded far-fetched to you before, but you just built the same thing for $60!

Extending the Program

This program is an example of how you can use a small amount of Python 3 code running on a cheap Jetson Nano 2GB board to build a powerful system.

If you wanted to turn this into a real doorbell camera system, you could add the ability for the system to send you a text message using Twilio whenever it detects a new person at the door instead of just showing it on your monitor. Or you might try replacing the simple in-memory face database with a real database.
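As a starting point for the text message idea, here is a rough sketch using Twilio’s Python helper library. It assumes you have a Twilio account and have installed the library with pip3 install twilio. The function names and the environment variable names are choices made for this example.

```python
import os
from datetime import datetime


def build_alert_text(metadata):
    """Turn a visitor's metadata dictionary into a short text-message body."""
    first_seen = metadata["first_seen"].strftime("%H:%M")
    return f"New visitor at the door (first seen {first_seen})."


def send_sms_alert(body, to_number, from_number):
    """Send one SMS through Twilio using credentials stored in environment variables."""
    from twilio.rest import Client  # imported lazily so the rest works without Twilio

    # TWILIO_ACCOUNT_SID and TWILIO_AUTH_TOKEN are variable names chosen for this sketch
    client = Client(os.environ["TWILIO_ACCOUNT_SID"], os.environ["TWILIO_AUTH_TOKEN"])
    return client.messages.create(body=body, from_=from_number, to=to_number)


print(build_alert_text({"first_seen": datetime(2020, 10, 5, 9, 30)}))
```

You could call send_sms_alert() right where the program registers a new face, passing your own phone number and the number Twilio assigns to your account.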

You can also try to warp this program into something entirely different. The pattern of reading a frame of video, looking for something in the image, and then taking an action is the basis of all kinds of computer vision systems. Try changing the code and see what you can come up with! How about making it play your own custom theme music whenever you get home and walk up to your own door? You can check out some of the other face_recognition Python examples to see how you might do something like this.

Learn More about the Nvidia Jetson Platform

If you want to learn more about building stuff with the Nvidia Jetson hardware platform, Nvidia has a new, free Jetson training course. Check out their website for more info. There are also great community resources, like the JetsonHacks website.

If you want to learn more about building ML and AI systems with Python in general, check out my other articles and my book on my website.

If you liked this article, sign up for my Machine Learning is Fun! Newsletter to find out when I post something new.

You can also follow me on Twitter at @ageitgey, email me directly or find me on LinkedIn.