Yolo & Raspberry Pi : How to create a smart camera

Florian BERGAMASCO
11 min read · Jun 30, 2024


A project to keep an eye on your house while you’re on holiday and receive email alerts if an intruder appears.

Introduction

Are you going on holiday and want to keep an eye on your home? Would you like to learn about artificial intelligence and computer vision? Do you have a Raspberry Pi, a webcam and a bit of spare time? Then this is the project for you!
In this article, we’re going to look at how to create an intelligent surveillance camera in Python using a Raspberry Pi, at low cost and without losing control of your data.
This camera will be able to detect the presence of anyone in the webcam’s field of vision and send you an email alert with a photo.
To carry out this project, we’re going to use computer vision, a branch of artificial intelligence that can process information on images by analysing the pixels that make them up.
Let’s start with the basics: what Artificial Intelligence (AI) is, and why protecting personal data (GDPR) matters.

Dall-E generation via Copilot

You said AI, what is it?

Artificial intelligence (AI) is a computer program that imitates human intelligence, making it possible to carry out tasks that normally require it, such as voice recognition, machine translation, playing chess or autonomous driving.
Computer vision is a sub-discipline of AI that focuses on the processing and analysis of images and videos. Its aim is to enable machines to see and understand the visual world, just as humans do with their eyes and brains.
Computer Vision is based on mathematical and statistical techniques and today has many practical applications, including in security, medicine, robotics, the automotive industry, entertainment and commerce. For example, Computer Vision can be used to anticipate and help detect pathologies on X-rays or MRIs, to guide robots in complex environments, to recognise road signs or pedestrians on the road, to animate virtual characters or to identify products on a shelf.

And what about the protection of personal data such as your face and identity?

Why should we respect the GDPR and data protection?

Computer vision, like any technology, has its advantages but also its risks. One of the main risks concerns the privacy and personal data of the people filmed or photographed by the cameras. The images and videos may contain sensitive information, such as identity, facial expression, emotion, location or activity.
The General Data Protection Regulation (GDPR) is a European law designed to protect the rights and freedoms of European citizens with regard to the processing of personal data.
The GDPR requires data controllers to comply with the principles of lawfulness, fairness and transparency, purpose limitation, data minimisation, accuracy and storage limitation. It also gives the people affected by data processing certain rights, such as the right of access, the right to rectification, the right to erasure, the right to portability and the right to object.
It is therefore essential to respect the GDPR and data ownership when using computer vision, particularly for surveillance camera projects. That is why developing your own solution matters: you retain control over the entire data lifecycle, from acquisition and processing to use and storage.

Dall-E generation via Copilot

Hardware:

  • Raspberry Pi: to run the Computer Vision and execute all the tasks
  • Webcam: for the acquisition of the images
  • [Optional] Arduino KY-035 & ADS1115: an analogue magnetic (Hall-effect) sensor that can be used to detect the opening of a door. Because its output is analogue, it must be combined with a 16-bit analogue-to-digital converter (ADC) such as the ADS1115 so the Raspberry Pi can read it (a minimal reading sketch follows this list).
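If you use this optional sensor, the ADS1115 is read over I2C. Below is a minimal sketch, assuming the Adafruit CircuitPython ADS1x15 library (pip install adafruit-circuitpython-ads1x15) and the KY-035 output wired to channel A0; the threshold value is purely illustrative and has to be calibrated against your own door magnet.

import time
import board
import busio
import adafruit_ads1x15.ads1115 as ADS
from adafruit_ads1x15.analog_in import AnalogIn

# open the I2C bus and connect to the ADS1115 converter
i2c = busio.I2C(board.SCL, board.SDA)
ads = ADS.ADS1115(i2c)

# the KY-035 analogue output is assumed to be wired to channel A0
channel = AnalogIn(ads, ADS.P0)

VOLTAGE_THRESHOLD = 2.5  # illustrative value: calibrate it with your own magnet

while True:
    # the voltage changes when the magnet on the door moves away from the sensor
    if channel.voltage > VOLTAGE_THRESHOLD:
        print("Door movement detected:", channel.voltage)
    time.sleep(0.5)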

A few prerequisites for your Raspberry Pi

The Raspberry Pi will be the brain of the mechanism. It will receive the images of your home captured by the webcam, process them with the computer vision model, and send an email alert when someone is detected.
The Raspberry Pi uses the Python programming language, and to run the Yolo model you need to install a few libraries:
  • opencv-python, an open-source library that includes several hundred computer vision algorithms. It lets you read the webcam, process the images and pass them to the model.
  • ultralytics, the library that provides the pre-trained Yolo models and runs the inference.
  • numpy and imutils, used here for array handling and for the VideoStream helper that wraps the webcam.

To install these libraries, run the following commands:

pip install numpy imutils opencv-python
pip install ultralytics
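Once the installation is finished, a quick sanity check (nothing more than a minimal import test) confirms that the libraries are available before going any further:

# quick sanity check that the libraries are installed
import cv2
import ultralytics

print("OpenCV version:", cv2.__version__)
print("Ultralytics version:", ultralytics.__version__)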

You Only Look Once: Yolo

The aim of this tutorial is not to train your own model, but to test an approach that is simple to implement using an already well-trained and proven model such as Yolo.

Yolo is an algorithm for detecting objects in an image. It is an artificial intelligence method for detecting and classifying objects in an image, such as people, cars or objects. To achieve this, it uses a convolutional neural network, which is a type of model capable of analysing the pixels in an image and extracting useful information from them.
Instead of looking at the image several times to find the objects, Yolo divides it into small cells. Each cell tries to predict where the objects within its perimeter are, and to which category they belong. It does this by calculating the coordinates and dimensions of the bounding boxes, which are rectangles that surround the objects, as well as the probabilities of belonging to each class. For example, a cell can say that there is an 80% chance that an object is a person, and a 20% chance that it is a bicycle.

Example: Let’s imagine a picture of a street with cars and pedestrians. YOLO will divide this image into a grid, predict the bounding boxes around each car and pedestrian, assign a class (car, pedestrian) to each box and eliminate redundant or unlikely predictions. The end result is an image with precise bounding boxes around every object detected, all in one go, hence the name ‘You Only Look Once’.

Dall-E generation via Copilot

After seeing how the Yolo algorithm works, we’re now going to look at how to use it to create our intelligent surveillance camera using Python and a Raspberry Pi.
The aim of this use case is to detect the presence of a person entering the area filmed by the camera, and to send an alert email with the photo. We will therefore divide this approach into three main stages:

  • Loading the Yolo model, which will enable us to detect objects in the images captured by the camera.
  • Inferring from the model, which will return the bounding boxes and classes of the objects detected, and check whether any of them correspond to a person.
  • Sending the email, which will use the smtplib library to send a message with the photo attached to a pre-defined address.

We will go into more detail about each of these steps in the rest of this article.

Let’s get coding!

1. Loading the Yolo model

The first part of the code is dedicated to loading the pre-trained Yolo model. To do this, we first choose the model variant we want to use (here the lightweight yolov8n, well suited to a Raspberry Pi) and then download it locally onto the Raspberry Pi.
The Yolo model is initialised at the start of the program: if the .pt file is not already present locally, ultralytics downloads it automatically.

from ultralytics import YOLO

#load and save YOLO model
path_dir = "/home/florianberga/Desktop/camera/"
model = YOLO(path_dir+"yolov8n.pt")  # yolov5nu.pt worked better in my case

#show all classes of the Yolo model
print(model.names)

Now that the model is available, it’s time to test it.
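Before plugging the model into the camera loop, you can run it on a single image. This is a minimal sketch: test.jpg is any photo of your own placed in the project folder (a hypothetical file name, not part of the original project).

# run the model on a single test image and inspect the detections
results = model(path_dir + "test.jpg")

for box in results[0].boxes:
    label = model.names.get(box.cls.item())    # class name, e.g. "person"
    confidence = float(box.conf.item())        # confidence score between 0 and 1
    coords = box.xyxy.tolist()                 # bounding box [xmin, ymin, xmax, ymax]
    print(label, round(confidence, 2), coords)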

2. Yolo model inference

Computer Vision model inference involves using the trained model to analyse images or videos and detect objects, faces, actions or any other relevant element. Inference transforms visual data into actionable information, which can then be used to make decisions, automate processes or improve services.

import os
import datetime

import cv2
from imutils.video import VideoStream

# initialize the video capture object
vs = VideoStream(src=0, resolution=(700, 700)).start()

# define the minimum confidence required to validate the class identified
accuracy_min = 0.7

# define the colour of the text and the bounding box on the picture
Green_RGB_color = (0, 255, 0)

i = 0
j = 0  # counter used in the saved file name (see the full code in part 4)

while True:
    try:  # keep looping even if the camera stops working

        # get the picture from the webcam
        frame = vs.read()

        # press key Q to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

        i = i + 1
        if i >= 15:  # 1 image per second is enough

            # run the YOLO model on the picture
            detection = model(frame)[0]
            Label_total = ""
            nb_person = 0

            # loop on all the detections
            for box in detection.boxes:

                # extract the confidence of the detection
                data = box.data.tolist()[0]
                accuracy = data[4]
                # extract the class label name
                label = model.names.get(box.cls.item())

                # filter out bad detections
                if float(accuracy) < accuracy_min:
                    continue
                else:
                    Label_total = Label_total + label + '_'
                    if label == "person":
                        # a person has been detected by Yolo
                        nb_person = nb_person + 1

                    # draw the bounding box on the picture
                    xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
                    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), Green_RGB_color, 2)
                    # draw confidence and label
                    y = ymin - 15 if ymin - 15 > 15 else ymin + 15
                    cv2.putText(frame, "{} {:.1f}%".format(label, float(accuracy) * 100),
                                (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, Green_RGB_color, 2)

            if nb_person > 0:

                today = datetime.date.today()
                datetoday = today.strftime("%Y-%m-%d")
                now = datetime.datetime.now()
                current_time = now.strftime("%H:%M:%S")

                # write the date and time on the picture
                cv2.putText(frame, datetoday + " " + current_time, (50, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1)

                # we would like to save the picture: create the directories if they don't exist
                if not os.path.exists(path_dir + 'folder_imwrite/'):
                    os.mkdir(path_dir + 'folder_imwrite/')

                if not os.path.exists(path_dir + 'folder_imwrite/' + str(datetoday)):
                    os.mkdir(path_dir + 'folder_imwrite/' + str(datetoday))
                # save
                cv2.imwrite(path_dir + 'folder_imwrite/' + str(datetoday) + '/' + current_time
                            + '__' + str(nb_person) + "-pers__" + str(j) + '.jpg', frame)

            # we could also show the frame on screen
            # cv2.imshow("Frame", frame)

            i = 0  # reset the counter so the next image is processed about a second later

    except Exception:
        print("Error camera")

vs.stop()
cv2.destroyAllWindows()

Once the model has identified a person in the image, it moves on to the next stage: sending an email to let you know that someone is there.

3. Send the mail

Before looking at how to send mail in Python, one prerequisite: if you use a Gmail address as the sender, you must generate an App Password for that account (see the link in the code below) and use it instead of your normal account password.


import smtplib
from email.message import EmailMessage


def envoie_mail(nb_people_max, table_photo):

    email_subject = "Home XXX - Detection someone at home"

    sender_email_address = "Camera.xxxx@gmail.com"
    receiver_email_address = "xxx.xxxx@gmail.com"
    email_smtp = "smtp.gmail.com"
    email_password = "xxxx xxxx xxxx xxxx"  # App Password: https://support.google.com/mail/answer/185833?hl=en

    # create an email message object
    message = EmailMessage()

    # configure email headers
    message['Subject'] = email_subject
    message['From'] = sender_email_address
    message['To'] = receiver_email_address

    # set email body text
    message.set_content(str(nb_people_max) + " people detected at home")

    # attach every saved picture to the email
    i_image = 0
    while i_image < len(table_photo):
        # open image as a binary file and read the contents
        with open(table_photo[i_image], 'rb') as file:
            image_data = file.read()
        # attach image to email
        message.add_attachment(image_data, maintype='image', subtype="jpeg")
        i_image = i_image + 1

    # set smtp server and port
    server = smtplib.SMTP(email_smtp, 587)
    # identify this client to the SMTP server
    server.ehlo()
    # secure the SMTP connection
    server.starttls()

    # login to email account
    server.login(sender_email_address, email_password)
    # send email
    server.send_message(message)
    # close connection to server
    server.quit()
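For reference, here is what a call to this function could look like once a few pictures have been saved (the file paths are hypothetical):

# hypothetical example: 2 people detected, 2 saved pictures to attach
envoie_mail(2, ["/home/florianberga/Desktop/camera/folder_imwrite/2024-06-30/photo1.jpg",
                "/home/florianberga/Desktop/camera/folder_imwrite/2024-06-30/photo2.jpg"])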

4. Full code: main.py


# -*- coding: utf-8 -*-

import os
import datetime
import smtplib
from email.message import EmailMessage

import cv2
from imutils.video import VideoStream
from ultralytics import YOLO


# load and save YOLO model
path_dir = "/home/florianberga/Desktop/camera/"
model = YOLO(path_dir + "yolov8n.pt")  # yolov5nu.pt worked better in my case

# minimum confidence to validate a detection and colour of the overlays
accuracy_min = 0.7
Green_RGB_color = (0, 255, 0)


def envoie_mail(nb_people_max, table_photo):

    email_subject = "Home XXX - Detecting someone at home"

    sender_email_address = "Camera.xxxx@gmail.com"
    receiver_email_address = "xxx.xxxx@gmail.com"
    email_smtp = "smtp.gmail.com"
    email_password = "xxxx xxxx xxxx xxxx"  # App Password

    # create an email message object
    message = EmailMessage()

    # configure email headers
    message['Subject'] = email_subject
    message['From'] = sender_email_address
    message['To'] = receiver_email_address

    # set email body text
    message.set_content(str(nb_people_max) + " people detected at home")

    # attach every saved picture to the email
    i_image = 0
    while i_image < len(table_photo):
        # open image as a binary file and read the contents
        with open(table_photo[i_image], 'rb') as file:
            image_data = file.read()
        # attach image to email
        message.add_attachment(image_data, maintype='image', subtype="jpeg")
        i_image = i_image + 1

    # set smtp server and port
    server = smtplib.SMTP(email_smtp, 587)
    # identify this client to the SMTP server
    server.ehlo()
    # secure the SMTP connection
    server.starttls()

    # login to email account
    server.login(sender_email_address, email_password)
    # send email
    server.send_message(message)
    # close connection to server
    server.quit()


# initialize the video capture object
vs = VideoStream(src=0, resolution=(700, 700)).start()

i = 0
j = 0
nb_image_mail = 0
nb_personne_max = 0
table_path_image = []

# history of the detections (number of people, time and date)
bilan_day_nbpersonnes = []
bilan_heure_presence = []
bilan_day_presence = []

while True:
    try:  # keep looping even if the camera stops working

        # get the picture from the webcam
        frame = vs.read()

        # press key Q to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

        i = i + 1
        if i >= 15:  # 1 image per second is enough

            # run the YOLO model on the picture
            detection = model(frame)[0]
            Label_total = ""
            nb_person = 0

            # loop on all the detections
            for box in detection.boxes:

                # extract the confidence of the detection
                data = box.data.tolist()[0]
                accuracy = data[4]
                # extract the class label name
                label = model.names.get(box.cls.item())

                # filter out bad detections
                if float(accuracy) < accuracy_min:
                    continue
                else:
                    Label_total = Label_total + label + '_'
                    if label == "person":
                        # a person has been detected by Yolo
                        nb_person = nb_person + 1

                    # draw the bounding box on the picture
                    xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
                    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), Green_RGB_color, 2)
                    # draw confidence and label
                    y = ymin - 15 if ymin - 15 > 15 else ymin + 15
                    cv2.putText(frame, "{} {:.1f}%".format(label, float(accuracy) * 100),
                                (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, Green_RGB_color, 2)

            if nb_person > 0:

                today = datetime.date.today()
                datetoday = today.strftime("%Y-%m-%d")
                now = datetime.datetime.now()
                current_time = now.strftime("%H:%M:%S")

                # write the date and time on the picture
                cv2.putText(frame, datetoday + " " + current_time, (50, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1)

                # we would like to save the picture: create the directories if they don't exist
                if not os.path.exists(path_dir + 'folder_imwrite/'):
                    os.mkdir(path_dir + 'folder_imwrite/')

                if not os.path.exists(path_dir + 'folder_imwrite/' + str(datetoday)):
                    os.mkdir(path_dir + 'folder_imwrite/' + str(datetoday))
                # save
                path_image = (path_dir + 'folder_imwrite/' + str(datetoday) + '/' + current_time
                              + '__' + str(nb_person) + "-pers__" + str(j) + '.jpg')
                cv2.imwrite(path_image, frame)
                # we could also show the frame on screen
                # cv2.imshow("Frame", frame)

                # keep a history of the detections
                bilan_day_nbpersonnes.append(str(nb_person))
                bilan_heure_presence.append(str(current_time))
                bilan_day_presence.append(str(datetoday))

                if nb_personne_max < nb_person:
                    nb_personne_max = nb_person

                if j > 1800:  # nobody had been seen for a while (1800 processed images)
                    table_path_image.append(path_image)

                    # we can send the email to inform that there is someone,
                    # but we collect a few more pictures (3) before sending it
                    if nb_image_mail >= 3:
                        # we send the mail
                        envoie_mail(nb_personne_max, table_path_image)
                        j = 0
                        nb_image_mail = 0
                        nb_personne_max = 0
                        table_path_image.clear()
                    else:
                        nb_image_mail = nb_image_mail + 1
            else:
                # nobody detected on this image
                j = j + 1

            i = 0

    except Exception:
        print("Error camera")


vs.stop()
cv2.destroyAllWindows()

Launch the mechanism when the Raspberry Pi is switched on

If you want to run the code automatically when the Raspberry Pi is switched on, you just have to edit your crontab with “crontab -e”:

You can use a console mode editor like nano. Simply add the following line with the @reboot option:

@reboot python3 /home/florianberga/Desktop/camera/main.py &

It is important to note 2 points:
1. Use absolute paths
2. Add the & at the end of the line to launch your command in a separate process. This avoids blocking the Raspberry Pi’s start-up if your program doesn’t exit quickly.
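As a variant (not part of the original setup), you can also redirect the program’s output to a log file, which makes it much easier to diagnose a camera or email failure after a reboot:

@reboot python3 /home/florianberga/Desktop/camera/main.py >> /home/florianberga/Desktop/camera/camera.log 2>&1 &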

Conclusion

I hope this has been helpful and inspires you to use Computer Vision in your own projects. Of course, this is just a demonstrator: it can easily be improved with facial recognition, other models, further training and other features...

Please don’t hesitate to contact me with any comments, questions or suggestions for improvement. Enjoy!
