A Step-By-Step Guide to Predictive Analytics on AWS — Generating Sensor Data with a Raspberry Pi and Storing it on S3

Published in Comsysto Reply · Mar 3, 2022

Authors: Alexey Konovalenko, Azada Henze

Introduction

For the past decade, AWS has provided numerous services that meet the challenges of both the technological and the business world. This inspired us to contribute to building a bridge between Industrial IoT and business analytics. We have decided to write a series of tutorials that provide the essential guidance for building an end-to-end solution for predictive analytics with IoT data. We will not dwell on the details of Raspberry Pi installation or AWS user guides; the official documentation covers those. Instead, we will help you follow only the steps necessary to gain business value from your data with the least effort. Isn’t that what we all strive for, after all?

The plan is as follows: this first blog post will help you set up a Raspberry Pi with Python to generate sensor data. It will also give a brief overview of how to store the incoming sensor data on S3. The second post will illustrate the steps you need to build an ETL pipeline with AWS Glue and Amazon Athena. Finally, we will focus on visualizing and analyzing the IoT sensor data with Amazon QuickSight and Metabase.

Why Raspberry Pi?

Photo by Harrison Broadbent on Unsplash

The Raspberry Pi is a tiny but powerful computer that provides an impressive playground for building IoT prototypes. You can easily order a Raspberry Pi kit with the necessary equipment on Amazon and integrate it with one or multiple sensors. There are many sensors to choose from depending on your IoT use case: motion, sound, temperature and humidity sensors, navigation modules, and many others. You can also connect it to your own screen and keyboard, or access it remotely via SSH. To get a new Raspberry Pi set up and running, you can follow the official documentation here.
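If you go the headless route, you can reach the Pi over SSH as soon as it is enabled. Assuming a user named pi and the hostname raspberrypi (just illustrative out-of-the-box values; adjust both to your setup), the connection looks like this:

ssh pi@raspberrypi.local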

For our IoT use case, we used the DHT11, an ultra-low-cost temperature and humidity sensor. It is available as a bare sensor and as a module with a pull-up resistor and a power-on LED. The sensor’s temperature range is 0–50 °C with ±2 °C accuracy; its humidity range is 20–80% with ±5% accuracy. There is another sensor, the DHT22, that works in a wider humidity range of 0–100% and a temperature range of -40 to 80 °C. The DHT22 offers more accurate measurements (2–5% for humidity and ±0.5 °C for temperature), but it is also more expensive and has a lower sampling rate.
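A side note in case you opt for the DHT22 instead: the adafruit_dht library we use below exposes a matching class with the same API, so only the constructor changes (pin D2 is simply our wiring choice here):

import adafruit_dht
import board

# Same usage as the DHT11; only the class name differs
sensor = adafruit_dht.DHT22(board.D2)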

Once your Raspberry Pi is set up and the sensor is connected, we can start writing Python code. For starters, make sure you have installed the necessary Python library by running the command below:

pip3 install adafruit-circuitpython-dht
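Depending on your OS image, the library may also need the libgpiod system package; on Raspberry Pi OS, it can typically be installed with:

sudo apt install libgpiod2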

Below is the code with the necessary comments to guide you.

"""
humidity_sensor.py
"""
import adafruit_dht
import board
import time
"""Initialize the sensor and set correct board pin
"""
sensor = adafruit_dht.DHT11(board.D2)while True:
try:
temperature = sensor.temperature
temperature_f = temperature * (9 / 5) + 32
humidity = sensor.humidity
print(f"Temperature: {temperature:.1f} C ({temperature_f:.1f} F). Humidity: {humidity}% ")
except RuntimeError as error:
print(error.args[0])
except Exception as error:
sensor.exit()
raise error
time.sleep(1.0)

When you run this example, you will see the current temperature and humidity printed in the console.
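With the DHT11 wired up as above, the output should look along these lines (the numbers here are purely illustrative):

Temperature: 23.0 C (73.4 F). Humidity: 45%
Temperature: 23.0 C (73.4 F). Humidity: 46%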

Good for now!

Storing sensor data on S3

In the previous example, we were able to retrieve the sensor data, but we would also want to save it for further processing. As promised, we are going to use AWS S3 for that.

As one of the core services of a major cloud provider, S3 offers secure, highly available, and scalable storage for data. For our small IoT experiment, S3 provides easy-to-manage storage for IoT devices and straightforward solutions for big data use cases. That means we can effortlessly store the incoming sensor data from the Raspberry Pi on S3, perform ETL with Glue and Athena, and run data analytics on QuickSight and Metabase. We will describe each of these steps in the following tutorials.

For now, we need to prepare everything in AWS (if you don’t have an AWS account, you can create one and use the free tier here). To proceed, we will need:

  • An S3 bucket to store the raw data
  • A user with write permission to that S3 bucket

We will not try to reinvent the wheel by writing our own documentation for these requirements; instead, we refer you to the official AWS documentation.
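Just to illustrate the write permission: a minimal IAM policy attached to the user could look like the sketch below, where your-bucket-name is a placeholder for your own bucket:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:PutObject",
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        }
    ]
}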

Now that you have your S3 bucket and the permissions set up, we can start uploading the data to the bucket. Oh yes, the code below is long, but it again includes all the necessary comments to help you through it.

Before we do that, make sure to install boto3, the AWS SDK for Python:

pip3 install boto3

"""
humidity_sensor.py
"""
import datetime
import time

import adafruit_dht
import board
import boto3

# Initialize the sensor on the correct board pin
sensor = adafruit_dht.DHT11(board.D2)

AWS_S3_BUCKET = "bucket name"
# The key prefix should not start with a slash and should end with one,
# so that the uploaded files land under the intended "folder"
AWS_S3_KEY_PREFIX = "path/to/raw/sensor/data/"

# Initialize the AWS S3 service resource. You need to set up authentication
# credentials for your AWS account before using Boto3. You can use the
# `aws configure` command to configure the credentials file, or manually
# create it (the default location is ~/.aws/credentials)
s3 = boto3.resource("s3")


class Reading:
    def __init__(self, temperature=None, temperature_f=None, humidity=None):
        self.timestamp = datetime.datetime.now().isoformat(timespec="milliseconds")
        self.temperature = temperature
        self.temperature_f = temperature_f
        self.humidity = humidity

    def to_csv(self):
        """CSV representation of the object.
        Column order:
        - timestamp
        - temperature (Celsius)
        - temperature (Fahrenheit)
        - humidity
        """
        return f"{self.timestamp},{self.temperature},{self.temperature_f},{self.humidity}"


def get_reading() -> Reading:
    """Returns the current readings from the sensor. The DHT11 has a sampling
    rate of 1 Hz, so calling this method more than once per second will return
    the same readings.
    """
    try:
        temperature = sensor.temperature
        temperature_f = temperature * (9 / 5) + 32
        humidity = sensor.humidity
        print(f"Temperature: {temperature:.1f} C ({temperature_f:.1f} F). Humidity: {humidity}%")
        return Reading(temperature, temperature_f, humidity)
    except RuntimeError as error:
        # DHT sensors fail to read from time to time; log the error and
        # return an empty reading instead of crashing
        print(error.args[0])
        return Reading()
    except Exception as error:
        sensor.exit()
        raise error


def upload_data(readings: list[Reading]):
    """Forms a CSV file and uploads it to AWS S3"""
    body = "\n".join(map(Reading.to_csv, readings))
    filename = f"readings-{datetime.datetime.now().isoformat(timespec='seconds')}.csv"
    s3object = s3.Object(AWS_S3_BUCKET, AWS_S3_KEY_PREFIX + filename)
    s3object.put(Body=body)
    print(f"File uploaded: {filename}")


if __name__ == "__main__":
    readings: list[Reading] = []
    while True:
        reading = get_reading()
        readings.append(reading)
        # Approximately once per minute, upload the collected readings to S3
        if len(readings) >= 60:
            upload_data(readings)
            readings.clear()
        time.sleep(1.0)

As defined in the code, we upload the sensor data every minute and store it in CSV format. Keep the application running for a while, and you will see the uploaded files appear in the S3 bucket.
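For reference, each uploaded file is a plain CSV with one row per reading, following the column order defined in Reading.to_csv (the values here are illustrative; a failed reading would show up as None fields):

2022-03-03T10:15:01.123,23.0,73.4,45
2022-03-03T10:15:02.127,23.0,73.4,46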

Let’s conclude…

That’s about it for now. We hope you’ve followed all the steps with us and have successfully generated sensor data and stored it on AWS. As we know, in most real-world use cases, this is the hardest part 🤷🏻. In our next tutorials, we will focus on building an ETL pipeline on AWS with Athena and Glue, and later on doing predictive analytics with QuickSight and Metabase. So, stay tuned…

This blog post is published by Comsysto Reply GmbH (https://legal.comsysto.com/comsystoreply.de/en/impressum/).
