Person Detection from Raspberry Pi Images on the Cloud

Shreyas Kera
9 min read · Nov 9, 2021
Photo by Jonathan Lampel on Unsplash

The aim is to implement a drone system for detecting people in aerial images. The captured images and the GPS location of the drone are sent to the cloud, where each image is processed and the number of detections is identified. This information is then displayed on a real-time dashboard that tracks all the detections, the processed images, and their corresponding GPS coordinates, so that subsequent action can be taken.

To do this, multiple components are needed. To process images in real time, a GPU is hugely helpful, and Google Colab provides one free of charge. To get the images to Colab, we need a middleman that syncs with the RPi and stores and forwards the images; Google Drive fits this role perfectly. To present the results, we can run a Flask webpage, invoked directly from Colab. To get a better understanding of the pipeline, we can visualize the system architecture. You can also follow along with the source code for this article, available below.

System Architecture

Briefly, the system comprises the RPi, which captures data and sends it to Google Drive for storage. A Colab instance picks up this data and, with the help of a GPU, runs a deep learning model to detect people in the images. Finally, the processed data is displayed on a webpage with the help of Flask.

Pipeline

Raspberry Pi

In this article, the Raspberry Pi simulates the on-drone device, which can send images and GPS signals. For brevity, we will simply assume these pictures have already been uploaded to the RPi. For actually taking pictures, the Raspberry Pi Camera Modules, official products from the Raspberry Pi Foundation, can be used: the High Quality Camera is a 12-megapixel module, and the raspistill command can be used to capture images.
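If you prefer to stay in Python on the Pi, the picamera package offers the same capability as raspistill. A minimal sketch, with an illustrative resolution and output path:

```python
# Capture a still with the picamera package (a Python alternative
# to the raspistill CLI). Resolution and path are examples only.
from time import sleep
from picamera import PiCamera

camera = PiCamera()
camera.resolution = (1920, 1080)
camera.start_preview()
sleep(2)  # give the sensor a moment to adjust exposure
camera.capture('/home/pi/Documents/download/image.jpg')
camera.stop_preview()
```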

RPi with camera

The GPS coordinates can be captured simultaneously with the images to give the real-time position of the drone. For this article, random GPS values are simulated, but actual values can be obtained with the help of a GPS sensor. To pair each image with its corresponding GPS coordinates, both are saved based on the time they were captured, i.e. YYYY-MM-DD_hhmmss.jpg and YYYY-MM-DD_hhmmss.txt respectively. Once a batch of images and coordinates has been captured, it is zipped and uploaded to Google Drive. How is this upload done?
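As a minimal sketch of this convention, assuming the same local folder that gets synced later and simulated GPS values as in this article:

```python
# Save an image and its (simulated) GPS reading under a shared
# YYYY-MM-DD_hhmmss timestamp, then zip a batch for upload.
import random
import zipfile
from datetime import datetime
from pathlib import Path

FOLDER = Path('/home/pi/Documents/download')

def save_capture(image_bytes: bytes) -> None:
    stamp = datetime.now().strftime('%Y-%m-%d_%H%M%S')
    (FOLDER / f'{stamp}.jpg').write_bytes(image_bytes)
    lat, lon = random.uniform(-90, 90), random.uniform(-180, 180)  # simulated GPS
    (FOLDER / f'{stamp}.txt').write_text(f'{lat},{lon}')

def zip_batch() -> Path:
    archive = FOLDER / 'batch.zip'
    with zipfile.ZipFile(archive, 'w') as zf:
        for f in sorted(FOLDER.glob('*.jpg')) + sorted(FOLDER.glob('*.txt')):
            zf.write(f, arcname=f.name)
    return archive
```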

To connect your RPi with Google Drive, you can use the rclone facility. Following the Google Drive setup steps in the rclone documentation will let you sync your Drive with your RPi.

Once this is done, we can set up a shell script to perform the actual transfer of data. The most important bit is to sync the local folder containing the images and GPS locations with the Google Drive folder using the command rclone sync -v /home/pi/Documents/download drive:images. Any changes in the local folder ‘/home/pi/Documents/download’ will be reflected in the Google Drive folder ‘images’. To see the actual script, check out RPI-Drone-Image-Detection-IOT-Project/RaspberryPiCode/clickpic.sh in the repository linked at the start.
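The real clickpic.sh does this in shell; as a hedged Python equivalent of its key step, assuming rclone has already been configured with a remote named drive:

```python
# Mirror the local capture folder to the 'images' folder on the
# Drive remote, exactly as the inline rclone command above does.
import subprocess

def sync_to_drive() -> None:
    subprocess.run(
        ['rclone', 'sync', '-v', '/home/pi/Documents/download', 'drive:images'],
        check=True,  # raise CalledProcessError if the sync fails
    )
```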

To automate this process and make it continuous, we can use a cronjob that calls the shell script every minute, e.g. a crontab entry along the lines of * * * * * /home/pi/clickpic.sh (the exact path depends on where the script lives).

The local folder with the captured images and their corresponding GPS coordinates.
The local folder with the zip file is synced with the Google Drive folder.
The cronjob that calls the shell script once a minute.

Google Drive

There are several motives behind choosing Drive. A cloud storage system is required to persistently store the data received from the Raspberry Pi, along with a method to transfer that data and communicate with the server to receive processed images. The images and processed results need to live on the cloud, since the storage capabilities of the Raspberry Pi are limited. Google Drive is an effective fit for this use case.

There are two purposes the Drive serves:

Drive as a Communication Channel

Google Drive serves as an intermediary communication channel between the Raspberry Pi and the Google Colab processing service. All data produced by the Raspberry Pi (images and GPS data) is synced once a minute to a private folder on Google Drive using the rclone facility. Once the Drive is mounted, Google Colab stays in constant synchronization with the data on it. Thus, data flows continuously and seamlessly between the Raspberry Pi and Colab.
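Mounting the Drive from a Colab notebook is a one-liner, with /content/drive as the conventional mount point:

```python
# Mount Google Drive inside the Colab VM; Colab will prompt for
# authorization the first time this runs.
from google.colab import drive

drive.mount('/content/drive')
# The synced folder then appears as an ordinary path, e.g.
# /content/drive/MyDrive/images (illustrative).
```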

Drive as a Storage Mechanism

Google Drive also stores all the data received from the Raspberry Pi, removing the burden of having the Raspberry Pi store all the image and GPS data locally. The processed images and results can likewise be stored in a private Drive folder, which persists even after the Colab notebook shuts down. The code for the image processing and the Flask application can also be stored in Drive and imported once the Colab notebook starts. Finally, the saved weights for the model are stored in Drive so that they can simply be plugged into the model when required, instead of having to retrain it.

Storing all the data received from the R-Pi on Drive

Google Colab

We briefly covered the motive behind Colab, which I’ll explain in a bit more detail now. Once images and GPS data are received from the Raspberry Pi, a mechanism by which the person detection can take place is required. To this end a deep learning model can be used; however, such models are generally computationally intensive, necessitating high computational power for inference, which a GPU instance provides. A server is also required to host a Flask app through which the results can be visualized.

Model

The model used for inference is a pre-trained RetinaNet, trained on a subset of images from the Stanford Drone Dataset. The model is built using Keras and TensorFlow, both of which are Python packages that can be installed on Colab. Model weights are saved in Drive and imported when required. An overview of the RetinaNet architecture can be seen below.
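As a rough sketch of what inference could look like, assuming the fizyr keras-retinanet package (one common Keras RetinaNet implementation) and an illustrative weights path and score threshold:

```python
# Load saved RetinaNet weights from Drive and count person
# detections in one image. Paths and threshold are illustrative.
import numpy as np
from keras_retinanet.models import load_model
from keras_retinanet.utils.image import (preprocess_image, read_image_bgr,
                                         resize_image)

model = load_model('/content/drive/MyDrive/weights/retinanet.h5',
                   backbone_name='resnet50')

def count_people(path: str, threshold: float = 0.5) -> int:
    image = preprocess_image(read_image_bgr(path))
    image, scale = resize_image(image)
    boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
    boxes /= scale  # map boxes back to the original image size
    return int((scores[0] > threshold).sum())  # single 'person' class assumed
```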

GPU Instance

The main requirement for our use case is a GPU. When first testing the model on a normal CPU, the average inference time was ~6 seconds. At that speed, running detection on the RPi itself would be infeasible. Testing the model on the GPU instance Colab provides (a 16 GB Tesla T4), we get an average inference time of between 0.3 and 1 second per image (depending on the number of detections), which is far more suitable for real-time applications.
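A quick way to confirm that Colab actually attached a GPU, plus a rough wall-clock timer for checking per-image numbers like these:

```python
import time
import tensorflow as tf

# Expect a Tesla T4 entry on Colab's free GPU runtime.
print(tf.config.list_physical_devices('GPU'))

def timed(fn, *args):
    """Print rough wall-clock time for a single inference call."""
    start = time.time()
    result = fn(*args)
    print(f'inference took {time.time() - start:.2f}s')
    return result
```

Wrapping the count_people sketch from the model section in timed lets you compare against the figures quoted above.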

Running a Webpage

To run a webpage, Colab lets us use the Flask module. However, since the service Colab provides is essentially a GPU-enabled VM, Flask cannot be reached in the usual way: it binds to localhost inside the VM. We therefore use the Flask-Ngrok module to obtain a public URL that can be accessed through the browser.
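The setup is minimal: run_with_ngrok wraps app.run() so that a public ngrok.io URL is printed when the server starts. A bare-bones sketch:

```python
# Expose a Flask app running inside the Colab VM through an ngrok
# tunnel; the public URL appears in the cell output.
from flask import Flask
from flask_ngrok import run_with_ngrok

app = Flask(__name__)
run_with_ngrok(app)

@app.route('/')
def index():
    return 'Person-detection dashboard'

app.run()
```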

Overview of our Google Colab setup

Flask

Once Colab runs the image processing, we display the latest processed image on a real-time dashboard along with the number of detections made. The dashboard also links to all the previously processed images and their corresponding GPS locations. The code uses JS, CSS, HTML, and a Google Maps API to get the corresponding map link. Since Flask-Ngrok provides a public URL, anyone who has it can access the webpage. There are utilities for refreshing the webpage, extracting the images from a zip file, and continuously streaming the data. To get to the final product, several steps were taken (feel free to skip ahead if you just want the result).

Method

  • The first approach to getting new data from the RPi was to constantly refresh the page. This would repeatedly query a folder on the Drive to find out whether any new data had been received and, if so, display it. The code for this is at RPI-Drone-Image-Detection-IOT-Project/flask-app/app-refresher.py.
  • The second approach was to stream the image data to the Flask server, using Flask’s Response function. There was now no need to constantly refresh the page, as the stream automatically updates the current image whenever a new one is uploaded; a minimal sketch of this pattern appears after this list. The code for this is at RPI-Drone-Image-Detection-IOT-Project/flask-app/app-stream.py.
  • An extension to the second approach was to use zip files. It was found that Colab takes longer to sync several individual images than to sync a single zip file containing them. So, when a zip file with a new batch of images is received, the Flask app first extracts the images, stores them in Drive, runs inference on each image, and finally displays the results on the webpage. The code for this is at RPI-Drone-Image-Detection-IOT-Project/flask-app/app-stream-zips.py.
  • The final methodology makes use of all these previous implementations. The stream-zip approach displays a live feed of the processed images and the number of detections, while the refresher approach is used on a separate URL to display all the previously processed images, along with the date they were captured and a link to their GPS location. The code is at RPI-Drone-Image-Detection-IOT-Project/flask-app/app.py.
  • The various HTML pages are linked via href redirects, and the map location URLs are handled through dynamic routing (see the map route in the sketch below).
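As a condensed, hypothetical sketch of the stream-zip idea (the paths and helper logic here stand in for the repo’s actual code): the newest processed frame is streamed through a multipart Response, and map links are served via a dynamic route.

```python
# Stream the latest annotated image roughly once a second; the
# multipart/x-mixed-replace mimetype makes the browser swap the
# image in place. LATEST is an illustrative Drive path.
import time
from pathlib import Path
from flask import Flask, Response

app = Flask(__name__)
LATEST = Path('/content/drive/MyDrive/processed/latest.jpg')

def frames():
    while True:
        if LATEST.exists():
            yield (b'--frame\r\n'
                   b'Content-Type: image/jpeg\r\n\r\n'
                   + LATEST.read_bytes() + b'\r\n')
        time.sleep(1)  # poll roughly once a second

@app.route('/stream')
def stream():
    return Response(frames(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

@app.route('/map/<lat>/<lng>')  # dynamic routing for the GPS links
def map_link(lat, lng):
    return f'<a href="https://www.google.com/maps?q={lat},{lng}">Open map</a>'
```

Paired with run_with_ngrok as shown earlier, visiting /stream gives the live feed.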

Once the server is up, the Flask app can be started.

Running on a server with a random ngrok.io publicly available URL.
The basic frontend serves as our dashboard. Contains all the relevant details in a jiffy.

Result of the Image Processing

A live stream shows the latest processed image and the number of detections identified. The image refreshes to the latest version every time an updated image is uploaded to Google Drive, and a link is provided to access all the previously processed images.

Using the Google Maps API, we can view a small interactive window of the exact location obtained through the GPS signal.

The Flask webpage can be accessed by multiple users from any device, such as a smartphone, using the ngrok URL, as long as the Colab server is running.

Accessed from a mobile phone

Conclusion

The overall pipeline allows us to send images and GPS coordinates from a Raspberry Pi to Google Drive cloud storage, followed by person detection using a deep learning model run on Google Colab. The final results achieved include:

  • A live stream of the latest received image with bounding boxes drawn over the detected people.
  • The count of the number of detections.
  • A display of all the previously processed images.
  • The date when each image was captured.
  • The GPS location for each image, displayed on Google Maps.
