Getting Started on AWS IoT with Simulated Data
AWS IoT Overview and Example Projects
Jumping into your first IoT proof-of-concept can be tough. Navigating Raspberry Pi firmware updates, debugging connectivity issues, and other hardware headaches are just a few of the initial roadblocks.
Fortunately, there’s an easier starting point that doesn’t involve picking up a soldering iron: simulated data.
I recently started using AWS’s IoT Device Simulator to quickly build out the architecture and analytics for IoT projects before buying hardware (microcontrollers are currently in short supply).
The device simulator lets you create simulated devices with specific telemetry and send that telemetry directly to AWS IoT Core.
Launch the IoT Simulator
Fortunately, the IoT Device Simulator comes with a CloudFormation template. All you have to do is create a CloudFormation stack, upload the template, provide an admin email, and log in once all resources have been provisioned.
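If you’d rather script the launch than click through the console, here’s a minimal boto3 sketch. The stack name, template URL, and email parameter key are all assumptions; copy the real template link from the solution’s landing page before running it:

```python
import boto3

cfn = boto3.client("cloudformation")

# Template URL and the email parameter key are placeholders -- grab the real
# values from the IoT Device Simulator solution page.
cfn.create_stack(
    StackName="iot-device-simulator",  # hypothetical stack name
    TemplateURL="https://<solution-bucket>.s3.amazonaws.com/iot-device-simulator.template",
    Parameters=[
        {"ParameterKey": "UserEmail", "ParameterValue": "admin@example.com"},
    ],
    Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
)

# Block until provisioning finishes, then check the admin email for login credentials.
cfn.get_waiter("stack_create_complete").wait(StackName="iot-device-simulator")
```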
The IoT Device Simulator app is pretty simple to use. Create a device with specific telemetry (or use the provided automotive example), then create a simulation to configure how many devices run and how frequently they transmit to AWS IoT Core.
Once we’ve created a device and simulation, we can confirm receipt with the MQTT test client in the IoT Core console.
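If you’d rather verify from code than from the console, here’s a minimal sketch using the AWS IoT Device SDK v2 (`pip install awsiotsdk`). The endpoint, certificate paths, client ID, and topic are all assumptions; substitute the topic your simulation publishes to:

```python
import time

from awscrt import mqtt
from awsiot import mqtt_connection_builder

# Endpoint, cert paths, client ID, and topic are placeholders. Fetch your
# endpoint with: aws iot describe-endpoint --endpoint-type iot:Data-ATS
connection = mqtt_connection_builder.mtls_from_path(
    endpoint="xxxxxxxx-ats.iot.us-east-1.amazonaws.com",
    cert_filepath="device.pem.crt",
    pri_key_filepath="private.pem.key",
    client_id="telemetry-checker",
)
connection.connect().result()

def on_message(topic, payload, **kwargs):
    print(f"{topic}: {payload}")

# Subscribe to the simulation's topic and print whatever arrives.
subscribe_future, _ = connection.subscribe(
    topic="simulator/automotive",
    qos=mqtt.QoS.AT_LEAST_ONCE,
    callback=on_message,
)
subscribe_future.result()

time.sleep(60)  # listen for a minute, then disconnect
connection.disconnect().result()
```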
Using other AWS Services for IoT Storage, Analytics, and Visualizations
Now that we have data coming into AWS IoT Core, the real fun begins. We can create rules to route messages from IoT Core to other services such as IoT Analytics, DynamoDB, Kinesis Firehose, and more.
The heart of this is IoT Rules, which route data from IoT Core to the respective services. Each rule uses a SQL query to select the subset of the IoT message to transmit; the example below selects the entire message payload.
Creating an action within a rule means linking the rule to the target service and supplying an IAM role that has the appropriate access.
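As a rough sketch of what such a rule looks like when scripted with boto3, here’s a version that forwards the full payload to an IoT Analytics channel. The rule, topic, channel, and role names are all placeholders:

```python
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="simulator_to_analytics",
    topicRulePayload={
        # SELECT * forwards the whole message; name specific fields to subset it.
        "sql": "SELECT * FROM 'simulator/automotive'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "iotAnalytics": {
                    "channelName": "simulator_channel",
                    "roleArn": "arn:aws:iam::123456789012:role/iot-analytics-role",
                }
            }
        ],
    },
)
```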
Simplest Example — IoT Analytics
Before getting into Kinesis streaming and writing to time-series databases such as Timestream, we’ll cover IoT Analytics. This service is essentially an abstraction over several others, including S3, Kinesis Firehose, Athena, and SageMaker. You can go from IoT Core to IoT Analytics and then directly to other analyses (e.g., Jupyter notebooks) or visualizations (e.g., Amazon QuickSight).
IoT Analytics can be a bit confusing at first; the AWS-provided guide is super helpful:
Let’s create an IoT Analytics channel, pipeline, datastore, and dataset to explore. Brief definitions below, with a scripted sketch after the list:
- Channel: The ingestion point. The IoT Core rule points to a specific IoT Analytics channel.
- Pipeline: The pipeline pulls data from a channel into a datastore. Pipelines can also include Lambda functions and other data enrichments before sending data on to a datastore.
- Datastore: Datastores hold the pipeline outputs. They are not exactly a database; they are more of an abstraction over S3 that stores everything the pipeline emits. Datastores are queried via datasets using SQL.
- Dataset: Datasets are the cleaned-up, queried, filtered, enriched subset of the datastore for a particular analysis. Dataset contents can be regenerated on demand, on a schedule, or on a trigger, and multiple datasets can pull from the same datastore.
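Here’s a hedged boto3 sketch that wires up all four resources. Every name and the dataset query are assumptions:

```python
import boto3

iota = boto3.client("iotanalytics")

iota.create_channel(channelName="simulator_channel")
iota.create_datastore(datastoreName="simulator_datastore")

# The pipeline's first activity reads from the channel; the last writes to the datastore.
iota.create_pipeline(
    pipelineName="simulator_pipeline",
    pipelineActivities=[
        {"channel": {"name": "ingest", "channelName": "simulator_channel", "next": "store"}},
        {"datastore": {"name": "store", "datastoreName": "simulator_datastore"}},
    ],
)

# A dataset is a SQL query over the datastore; here it regenerates daily.
iota.create_dataset(
    datasetName="speed_readings",
    actions=[
        {
            "actionName": "query",
            "queryAction": {"sqlQuery": "SELECT * FROM simulator_datastore"},
        }
    ],
    triggers=[{"schedule": {"expression": "rate(1 day)"}}],
)
```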
To make things visual, here’s a quick summary:
Now what? What to do with IoT Analytics Data
We can use two key services next to turn our datasets into insights:
- QuickSight
- SageMaker notebooks
QuickSight: QuickSight is AWS’s native dashboarding tool, and it’s relatively easy to connect to IoT Analytics using the SPICE connection. Unfortunately, you aren’t able to query the data directly at this point; it has to be imported into SPICE first.
SageMaker notebooks: You can quickly and easily query data from a dataset within a SageMaker notebook. Better yet, you can launch a notebook instance from a template inside IoT Analytics to jumpstart your data analysis.
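For example, here’s a minimal sketch of pulling the latest dataset contents into pandas from a notebook (the dataset name is an assumption):

```python
import boto3
import pandas as pd

iota = boto3.client("iotanalytics")

# get_dataset_content returns presigned S3 URLs for the dataset's CSV output.
content = iota.get_dataset_content(datasetName="speed_readings", versionId="$LATEST")
df = pd.read_csv(content["entries"][0]["dataURI"])
print(df.head())
```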
Streaming with Kinesis
IoT Analytics abstracts away a lot of services, Kinesis among them. Using Kinesis directly provides more flexibility for storing IoT data as well as for analyzing data in transit (IoT Analytics is primarily for data at rest, i.e., data that has already landed in S3).
The high-level architecture below shows how these services interact:
Here is how each of those services is used:
- IoT Core — Just as before, IoT Core ingests IoT messages via the MQTT protocol and uses rules to route them to the respective services. A rule to send data to Kinesis Firehose can be created just as the IoT Analytics rule was (a sketch follows this list).
- Kinesis Firehose — This service mimics a bit of the datastore and pipeline elements of IoT Analytics. It streams IoT messages to storage (typically S3 or Redshift), and additional pre-processing can be performed along the way (e.g., Lambda functions for transformation, or Parquet output instead of JSON files). It’s also fully managed, so you pay by the amount of data streamed.
- S3 — The example output destination for Kinesis Firehose in this architecture.
- Glue — To streamline querying the data in S3, you can use a Glue crawler to infer the structure of the IoT data and create a table in the Glue Data Catalog. That table can then be queried with SQL in Athena.
- Athena — This service is a managed SQL engine; you simply pay by the amount of data scanned per query. Use it to query the Glue tables created by the crawler, and feed the results into QuickSight dashboards, SageMaker notebooks, or ad-hoc queries in the Athena console.
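As with IoT Analytics, the Firehose rule can be scripted. Here’s a sketch mirroring the earlier rule, with the stream and role names as placeholders:

```python
import boto3

iot = boto3.client("iot")

iot.create_topic_rule(
    ruleName="simulator_to_firehose",
    topicRulePayload={
        "sql": "SELECT * FROM 'simulator/automotive'",
        "awsIotSqlVersion": "2016-03-23",
        "ruleDisabled": False,
        "actions": [
            {
                "firehose": {
                    "deliveryStreamName": "simulator-stream",
                    "roleArn": "arn:aws:iam::123456789012:role/iot-firehose-role",
                    "separator": "\n",  # newline-delimit records so Athena can parse them
                }
            }
        ],
    },
)
```

And once the crawler has catalogued the S3 data, kicking off an Athena query from code is a single call. The database, table, column names, and results bucket below are all assumptions:

```python
athena = boto3.client("athena")

# Results land in the output bucket as CSV; poll get_query_execution for status.
athena.start_query_execution(
    QueryString="SELECT vin, AVG(speed) AS avg_speed FROM iot_telemetry GROUP BY vin",
    QueryExecutionContext={"Database": "iot_db"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},
)
```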
Summary and Next Steps
The cloud has opened a lot of doors for IoT use cases and apps. Using an IoT simulator to build and test an analytics pipeline streamlines the overall project. AWS IoT Analytics abstracts away a lot of AWS resources and provides a fast path from IoT Core to a visualization or analysis. If IoT Analytics is not sufficient, you can use the wealth of Kinesis services, including Kinesis Data Streams, Kinesis Firehose, and Kinesis Analytics. QuickSight and SageMaker notebooks enable you to perform end-to-end IoT analytics and visualizations without leaving AWS.