Yolo & Raspberry Pi : How to create a smart camera

Florian BERGAMASCO
11 min read · Jun 30, 2024


A project to keep an eye on your house while you’re on holiday and receive email alerts if an intruder appears.

Introduction

Are you going on holiday and want to keep an eye on your home? Would you like to learn about artificial intelligence and computer vision? Do you have a Raspberry Pi, a webcam and a bit of spare time? Then this is the project for you!
In this article, we’re going to look at how to create an intelligent surveillance camera in Python using a Raspberry Pi, at low cost and without losing control of your data.
This camera will be able to detect the presence of anyone in the webcam’s field of vision and send you an email alert with a photo.
To carry out this project, we’re going to use computer vision, a branch of artificial intelligence that can process information on images by analysing the pixels that make them up.
Let’s start with the basics: what Artificial Intelligence (AI) is, and why protecting personal data (GDPR) matters.

Dall-E generation via Copilot

You said AI, what is it?

Artificial intelligence (AI) is a computer program that imitates human intelligence, making it possible to carry out tasks that normally require it, such as voice recognition, machine translation, playing chess or autonomous driving.
Computer vision is a sub-discipline of AI that focuses on the processing and analysis of images and videos. Its aim is to enable machines to see and understand the visual world, just as humans do with their eyes and brains.
Computer Vision is based on mathematical and statistical techniques and today has many practical applications, including in security, medicine, robotics, the automotive industry, entertainment and commerce. For example, Computer Vision can be used to anticipate and help detect pathologies on X-rays or MRIs, to guide robots in complex environments, to recognise road signs or pedestrians on the road, to animate virtual characters or to identify products on a shelf.

And what about the protection of personal data such as your face and identity?

Why should we respect the GDPR and data protection?

Computer vision, like any technology, has its advantages but also its risks. One of the main risks concerns the privacy and personal data of the people filmed or photographed by the cameras. The images and videos may contain sensitive information, such as identity, facial expression, emotion, location or activity.
The General Data Protection Regulation (GDPR) is a European law designed to protect the rights and freedoms of European citizens with regard to the processing of personal data.
The GDPR requires data controllers to comply with the principles of lawfulness, fairness and transparency, purpose limitation, data minimisation, accuracy and storage limitation. It also gives the people affected by data processing certain rights, such as the right of access, the right to rectification, the right to erasure, the right to portability and the right to object.
It is therefore essential to respect the GDPR and data ownership when using computer vision, particularly for surveillance camera projects. That is why developing your own solution matters: you retain control over the entire data lifecycle, from acquisition and processing to use and storage.

Dall-E generation via Copilot

Hardware:

  • Raspberry Pi: to run the Computer Vision and execute all the tasks
  • Webcam: for the acquisition of the images
  • [Optional] Arduino KY-035 & ADS1115: an analogue magnetic (Hall-effect) sensor that can be used to detect the opening of a door. Because its output is analogue, it must be combined with a 16-bit analogue-to-digital converter (ADC) such as the ADS1115 so the Raspberry Pi can read it (a minimal reading sketch follows this list).
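If you use this optional sensor, the ADS1115 is read over I2C. Below is a minimal sketch, assuming the Adafruit CircuitPython ADS1x15 library (pip install adafruit-circuitpython-ads1x15) and the KY-035 output wired to channel A0; the threshold value is purely illustrative and has to be calibrated against your own door magnet.

import time
import board
import busio
import adafruit_ads1x15.ads1115 as ADS
from adafruit_ads1x15.analog_in import AnalogIn

# open the I2C bus and connect to the ADS1115 converter
i2c = busio.I2C(board.SCL, board.SDA)
ads = ADS.ADS1115(i2c)

# the KY-035 analogue output is assumed to be wired to channel A0
channel = AnalogIn(ads, ADS.P0)

VOLTAGE_THRESHOLD = 2.5  # illustrative value: calibrate it with your own magnet

while True:
    # the voltage changes when the magnet on the door moves away from the sensor
    if channel.voltage > VOLTAGE_THRESHOLD:
        print("Door movement detected:", channel.voltage)
    time.sleep(0.5)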

A few prerequisites for your Raspberry Pi

The Raspberry Pi will be the brain of the mechanism. It will receive the images of your home captured by the webcam, process them with the computer vision model, and send an email alert when someone is detected.
The Raspberry Pi uses the Python programming language, and to run the Yolo model you need to install a few libraries:
  • opencv-python, an open-source library that includes several hundred computer vision algorithms. It lets you read the webcam, process the images and pass them to the model.
  • ultralytics, the library that provides the pre-trained Yolo models and runs the inference.
  • numpy and imutils, used here for array handling and for the VideoStream helper that wraps the webcam.

To install these libraries, run the following commands:

pip install numpy imutils opencv-python
pip install ultralytics
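Once the installation is finished, a quick sanity check (nothing more than a minimal import test) confirms that the libraries are available before going any further:

# quick sanity check that the libraries are installed
import cv2
import ultralytics

print("OpenCV version:", cv2.__version__)
print("Ultralytics version:", ultralytics.__version__)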

You Only Look Once: Yolo

The aim of this tutorial is not to train your own model, but to test an approach that is simple to implement using an already well-trained and proven model such as Yolo.

Yolo is an algorithm for detecting objects in an image. It is an artificial intelligence method for detecting and classifying objects in an image, such as people, cars or objects. To achieve this, it uses a convolutional neural network, which is a type of model capable of analysing the pixels in an image and extracting useful information from them.
Instead of looking at the image several times to find the objects, Yolo divides it into small cells. Each cell tries to predict where the objects within its perimeter are, and to which category they belong. It does this by calculating the coordinates and dimensions of the bounding boxes, which are rectangles that surround the objects, as well as the probabilities of belonging to each class. For example, a cell can say that there is an 80% chance that an object is a person, and a 20% chance that it is a bicycle.

Example: Let’s imagine a picture of a street with cars and pedestrians. YOLO will divide this image into a grid, predict the bounding boxes around each car and pedestrian, assign a class (car, pedestrian) to each box and eliminate redundant or unlikely predictions. The end result is an image with precise bounding boxes around every object detected, all in one go, hence the name ‘You Only Look Once’.

Dall-E generation via Copilot

After seeing how the Yolo algorithm works, we’re now going to look at how to use it to create our intelligent surveillance camera using Python and a Raspberry Pi.
The aim of this use case is to detect the presence of a person entering the area filmed by the camera, and to send an alert email with the photo. We will therefore divide this approach into three main stages:

  • Loading the Yolo model, which will enable us to detect objects in the images captured by the camera.
  • Inferring from the model, which will return the bounding boxes and classes of the objects detected, and check whether any of them correspond to a person.
  • Sending the email, which will use the smtplib library to send a message with the photo attached to a pre-defined address.

We will go into more detail about each of these steps in the rest of this article.

Let’s get coding!

1. Loading the Yolo model

The first part of the code is dedicated to loading the pre-trained Yolo model. To do this, we first choose the model variant we want to use (here the lightweight yolov8n, well suited to a Raspberry Pi) and then download it locally onto the Raspberry Pi.
The Yolo model is initialised at the start of the program: if the .pt file is not already present locally, ultralytics downloads it automatically.

from ultralytics import YOLO

#load and save YOLO model
path_dir = "/home/florianberga/Desktop/camera/"
model = YOLO(path_dir+"yolov8n.pt")  # yolov5nu.pt worked better in my case

#show all classes of the Yolo model
print(model.names)

Now that the model is available, it’s time to test it.
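Before plugging the model into the camera loop, you can run it on a single image. This is a minimal sketch: test.jpg is any photo of your own placed in the project folder (a hypothetical file name, not part of the original project).

# run the model on a single test image and inspect the detections
results = model(path_dir + "test.jpg")

for box in results[0].boxes:
    label = model.names.get(box.cls.item())    # class name, e.g. "person"
    confidence = float(box.conf.item())        # confidence score between 0 and 1
    coords = box.xyxy.tolist()                 # bounding box [xmin, ymin, xmax, ymax]
    print(label, round(confidence, 2), coords)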

2. Yolo model inference

Computer Vision model inference involves using the trained model to analyse images or videos and detect objects, faces, actions or any other relevant element. Inference transforms visual data into actionable information, which can then be used to make decisions, automate processes or improve services.

import os
import datetime

import cv2
from imutils.video import VideoStream

# initialize the video capture object
vs = VideoStream(src=0, resolution=(700, 700)).start()

# define the minimum confidence required to validate the class identified
accuracy_min = 0.7

# define the colour of the text and the bounding box on the picture
Green_RGB_color = (0, 255, 0)

i = 0
j = 0  # counter used in the saved file name (see the full code in part 4)

while True:
    try:  # keep looping even if the camera stops working

        # get the picture from the webcam
        frame = vs.read()

        # press key Q to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

        i = i + 1
        if i >= 15:  # 1 image per second is enough

            # run the YOLO model on the picture
            detection = model(frame)[0]
            Label_total = ""
            nb_person = 0

            # loop on all the detections
            for box in detection.boxes:

                # extract the confidence of the detection
                data = box.data.tolist()[0]
                accuracy = data[4]
                # extract the class label name
                label = model.names.get(box.cls.item())

                # filter out bad detections
                if float(accuracy) < accuracy_min:
                    continue
                else:
                    Label_total = Label_total + label + '_'
                    if label == "person":
                        # a person has been detected by Yolo
                        nb_person = nb_person + 1

                    # draw the bounding box on the picture
                    xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
                    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), Green_RGB_color, 2)
                    # draw confidence and label
                    y = ymin - 15 if ymin - 15 > 15 else ymin + 15
                    cv2.putText(frame, "{} {:.1f}%".format(label, float(accuracy) * 100),
                                (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, Green_RGB_color, 2)

            if nb_person > 0:

                today = datetime.date.today()
                datetoday = today.strftime("%Y-%m-%d")
                now = datetime.datetime.now()
                current_time = now.strftime("%H:%M:%S")

                # write the date and time on the picture
                cv2.putText(frame, datetoday + " " + current_time, (50, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1)

                # we would like to save the picture: create the directories if they don't exist
                if not os.path.exists(path_dir + 'folder_imwrite/'):
                    os.mkdir(path_dir + 'folder_imwrite/')

                if not os.path.exists(path_dir + 'folder_imwrite/' + str(datetoday)):
                    os.mkdir(path_dir + 'folder_imwrite/' + str(datetoday))
                # save
                cv2.imwrite(path_dir + 'folder_imwrite/' + str(datetoday) + '/' + current_time
                            + '__' + str(nb_person) + "-pers__" + str(j) + '.jpg', frame)

            # we could also show the frame on screen
            # cv2.imshow("Frame", frame)

            i = 0  # reset the counter so the next image is processed about a second later

    except Exception:
        print("Error camera")

vs.stop()
cv2.destroyAllWindows()

Once the model has identified a person in the image, it moves on to the next stage: sending an email to let you know that someone is there.

3. Send the mail

Before looking at how to send mail in Python, one prerequisite: if you use a Gmail address as the sender, you must generate an App Password for that account (see the link in the code below) and use it instead of your normal account password.


import smtplib
from email.message import EmailMessage


def envoie_mail(nb_people_max, table_photo):

    email_subject = "Home XXX - Detection someone at home"

    sender_email_address = "Camera.xxxx@gmail.com"
    receiver_email_address = "xxx.xxxx@gmail.com"
    email_smtp = "smtp.gmail.com"
    email_password = "xxxx xxxx xxxx xxxx"  # App Password: https://support.google.com/mail/answer/185833?hl=en

    # create an email message object
    message = EmailMessage()

    # configure email headers
    message['Subject'] = email_subject
    message['From'] = sender_email_address
    message['To'] = receiver_email_address

    # set email body text
    message.set_content(str(nb_people_max) + " people detected at home")

    # attach every saved picture to the email
    i_image = 0
    while i_image < len(table_photo):
        # open image as a binary file and read the contents
        with open(table_photo[i_image], 'rb') as file:
            image_data = file.read()
        # attach image to email
        message.add_attachment(image_data, maintype='image', subtype="jpeg")
        i_image = i_image + 1

    # set smtp server and port
    server = smtplib.SMTP(email_smtp, 587)
    # identify this client to the SMTP server
    server.ehlo()
    # secure the SMTP connection
    server.starttls()

    # login to email account
    server.login(sender_email_address, email_password)
    # send email
    server.send_message(message)
    # close connection to server
    server.quit()
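For reference, here is what a call to this function could look like once a few pictures have been saved (the file paths are hypothetical):

# hypothetical example: 2 people detected, 2 saved pictures to attach
envoie_mail(2, ["/home/florianberga/Desktop/camera/folder_imwrite/2024-06-30/photo1.jpg",
                "/home/florianberga/Desktop/camera/folder_imwrite/2024-06-30/photo2.jpg"])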

4. Full code: main.py


# -*- coding: utf-8 -*-

import os
import datetime
import smtplib
from email.message import EmailMessage

import cv2
from imutils.video import VideoStream
from ultralytics import YOLO


# load and save YOLO model
path_dir = "/home/florianberga/Desktop/camera/"
model = YOLO(path_dir + "yolov8n.pt")  # yolov5nu.pt worked better in my case

# minimum confidence to validate a detection and colour of the overlays
accuracy_min = 0.7
Green_RGB_color = (0, 255, 0)


def envoie_mail(nb_people_max, table_photo):

    email_subject = "Home XXX - Detecting someone at home"

    sender_email_address = "Camera.xxxx@gmail.com"
    receiver_email_address = "xxx.xxxx@gmail.com"
    email_smtp = "smtp.gmail.com"
    email_password = "xxxx xxxx xxxx xxxx"  # App Password

    # create an email message object
    message = EmailMessage()

    # configure email headers
    message['Subject'] = email_subject
    message['From'] = sender_email_address
    message['To'] = receiver_email_address

    # set email body text
    message.set_content(str(nb_people_max) + " people detected at home")

    # attach every saved picture to the email
    i_image = 0
    while i_image < len(table_photo):
        # open image as a binary file and read the contents
        with open(table_photo[i_image], 'rb') as file:
            image_data = file.read()
        # attach image to email
        message.add_attachment(image_data, maintype='image', subtype="jpeg")
        i_image = i_image + 1

    # set smtp server and port
    server = smtplib.SMTP(email_smtp, 587)
    # identify this client to the SMTP server
    server.ehlo()
    # secure the SMTP connection
    server.starttls()

    # login to email account
    server.login(sender_email_address, email_password)
    # send email
    server.send_message(message)
    # close connection to server
    server.quit()


# initialize the video capture object
vs = VideoStream(src=0, resolution=(700, 700)).start()

i = 0
j = 0
nb_image_mail = 0
nb_personne_max = 0
table_path_image = []

# history of the detections (number of people, time and date)
bilan_day_nbpersonnes = []
bilan_heure_presence = []
bilan_day_presence = []

while True:
    try:  # keep looping even if the camera stops working

        # get the picture from the webcam
        frame = vs.read()

        # press key Q to exit
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

        i = i + 1
        if i >= 15:  # 1 image per second is enough

            # run the YOLO model on the picture
            detection = model(frame)[0]
            Label_total = ""
            nb_person = 0

            # loop on all the detections
            for box in detection.boxes:

                # extract the confidence of the detection
                data = box.data.tolist()[0]
                accuracy = data[4]
                # extract the class label name
                label = model.names.get(box.cls.item())

                # filter out bad detections
                if float(accuracy) < accuracy_min:
                    continue
                else:
                    Label_total = Label_total + label + '_'
                    if label == "person":
                        # a person has been detected by Yolo
                        nb_person = nb_person + 1

                    # draw the bounding box on the picture
                    xmin, ymin, xmax, ymax = int(data[0]), int(data[1]), int(data[2]), int(data[3])
                    cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), Green_RGB_color, 2)
                    # draw confidence and label
                    y = ymin - 15 if ymin - 15 > 15 else ymin + 15
                    cv2.putText(frame, "{} {:.1f}%".format(label, float(accuracy) * 100),
                                (xmin, y), cv2.FONT_HERSHEY_SIMPLEX, 0.5, Green_RGB_color, 2)

            if nb_person > 0:

                today = datetime.date.today()
                datetoday = today.strftime("%Y-%m-%d")
                now = datetime.datetime.now()
                current_time = now.strftime("%H:%M:%S")

                # write the date and time on the picture
                cv2.putText(frame, datetoday + " " + current_time, (50, 50),
                            cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1)

                # we would like to save the picture: create the directories if they don't exist
                if not os.path.exists(path_dir + 'folder_imwrite/'):
                    os.mkdir(path_dir + 'folder_imwrite/')

                if not os.path.exists(path_dir + 'folder_imwrite/' + str(datetoday)):
                    os.mkdir(path_dir + 'folder_imwrite/' + str(datetoday))
                # save
                path_image = (path_dir + 'folder_imwrite/' + str(datetoday) + '/' + current_time
                              + '__' + str(nb_person) + "-pers__" + str(j) + '.jpg')
                cv2.imwrite(path_image, frame)
                # we could also show the frame on screen
                # cv2.imshow("Frame", frame)

                # keep a history of the detections
                bilan_day_nbpersonnes.append(str(nb_person))
                bilan_heure_presence.append(str(current_time))
                bilan_day_presence.append(str(datetoday))

                if nb_personne_max < nb_person:
                    nb_personne_max = nb_person

                if j > 1800:  # nobody had been seen for a while (1800 processed images)
                    table_path_image.append(path_image)

                    # we can send the email to inform that there is someone,
                    # but we collect a few more pictures (3) before sending it
                    if nb_image_mail >= 3:
                        # we send the mail
                        envoie_mail(nb_personne_max, table_path_image)
                        j = 0
                        nb_image_mail = 0
                        nb_personne_max = 0
                        table_path_image.clear()
                    else:
                        nb_image_mail = nb_image_mail + 1
            else:
                # nobody detected on this image
                j = j + 1

            i = 0

    except Exception:
        print("Error camera")


vs.stop()
cv2.destroyAllWindows()

Launch the mechanism when the Raspberry Pi is switched on

If you want to run the code automatically when the Raspberry Pi is switched on, you just have to edit your crontab with “crontab -e”:

You can use a console mode editor like nano. Simply add the following line with the @reboot option:

@reboot python3 /home/florianberga/Desktop/camera/main.py &

It is important to note 2 points:
1. Use absolute paths
2. Add the & at the end of the line to launch your command in a separate process. This avoids blocking the Raspberry Pi’s start-up if your program doesn’t exit quickly.
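As a variant (not part of the original setup), you can also redirect the program’s output to a log file, which makes it much easier to diagnose a camera or email failure after a reboot:

@reboot python3 /home/florianberga/Desktop/camera/main.py >> /home/florianberga/Desktop/camera/camera.log 2>&1 &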

Conclusion

I hope this has been helpful and inspires you to use Computer Vision in your own projects. Of course, this is just a demonstrator: it can easily be improved with facial recognition, other models, further training and other features...

Please don’t hesitate to contact me with any comments, questions or suggestions for improvement. Enjoy!
