A panoramic overview of the IoT architecture building blocks on AWS (Part two)

Fabrizio Gattuso
Nordcloud Engineering
9 min read · Jun 6, 2022

In the first part (you can find the previous article here), we discussed the elements involved in data generation and in the communication between the IoT network and the cloud at the edge layer. In this article, we discuss in detail the architecture components involved in the cloud section.

Ingest, store and analyze

This section of the system architecture is probably the most crucial after the IoT network itself. Every use case or scenario has different requirements: in some cases it is not critical if telemetries are duplicated, lost, or received out of order; in other scenarios you cannot afford to miss a single telemetry because the system depends on it. To enforce a reliable ingestion process, it is important to design a pipeline that can scale fast and be resilient against potential problems. The telemetries received from the field are usually small data chunks, but the traffic frequency can grow quickly, pushing a lot of information to the cloud. To absorb a possible peak, it is highly recommended to use serverless/managed services, since accurate capacity estimations are hard to perform.

AWS IoT Core and AWS IoT SiteWise can apply rules to every new telemetry received and forward the traffic to the Kinesis suite to save, redirect and analyze the incoming flow. The Kinesis suite has different services that can be used in this architecture; the first candidate is Kinesis Data Streams. Until last year, it was necessary to provision Kinesis Data Streams capacity using shards. A shard is the unit of capacity and throughput: data within a shard is ordered, but ordering is not guaranteed across shards. To avoid problems during the ingestion phase, the user had to estimate the data flow and deploy the needed number of shards. AWS now offers an on-demand mode of the service, giving clients a data channel that scales automatically with unpredictable workloads. Data passing through the service is stored in the shards for up to one year, but typically a retention between one and seven days is configured.
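
As a rough sketch of this setup (the stream name, topic structure and IAM role ARN below are hypothetical), an on-demand stream and an IoT Core rule that forwards telemetries to it could be provisioned with boto3 like this:

```python
import boto3

kinesis = boto3.client("kinesis")
iot = boto3.client("iot")

# On-demand stream: no shard estimation needed, it scales with the workload.
kinesis.create_stream(
    StreamName="telemetry-stream",
    StreamModeDetails={"StreamMode": "ON_DEMAND"},
)
kinesis.get_waiter("stream_exists").wait(StreamName="telemetry-stream")

# Extend retention to seven days so the ingestion can be replayed.
kinesis.increase_stream_retention_period(
    StreamName="telemetry-stream", RetentionPeriodHours=168
)

# IoT Core rule: forward every message on devices/<id>/telemetry to the stream.
iot.create_topic_rule(
    ruleName="telemetry_to_kinesis",
    topicRulePayload={
        "sql": "SELECT * FROM 'devices/+/telemetry'",
        "actions": [{
            "kinesis": {
                "roleArn": "arn:aws:iam::123456789012:role/iot-to-kinesis",
                "streamName": "telemetry-stream",
                "partitionKey": "${topic(2)}",  # device id keeps per-device ordering
            }
        }],
        "ruleDisabled": False,
    },
)
```

Using the device id as partition key keeps each device's telemetries ordered, since ordering is only guaranteed within a shard.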

Why save the data received in this service? First of all, it is essential to replicate telemetries across different architectural sections, but the main reason behind this choice is the ability to rerun the ingestion process: if something goes wrong during the workflow, relaunching the process with the same problematic data is useful for testing and debugging purposes. Data Streams can redirect data to different consumers such as an ingestion Lambda, EC2 and container services.
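
A minimal sketch of such an ingestion consumer, assuming a Lambda function triggered by the stream (`process` is a hypothetical downstream function):

```python
import base64
import json

def handler(event, context):
    """Kinesis-triggered Lambda: decode and process each telemetry record."""
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded in the Lambda event.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Processing should be idempotent: on a replay, the same records
        # will flow through this function again.
        process(payload)  # hypothetical downstream function
```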

The second service of the Kinesis family is Kinesis Data Firehose. This architecture element can receive incoming data from IoT Core and store it in S3 for backup purposes or data lake formation.
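
A possible delivery stream definition with boto3, assuming hypothetical bucket and role names; Firehose buffers the telemetries and writes them to S3 in compressed batches:

```python
import boto3

firehose = boto3.client("firehose")

firehose.create_delivery_stream(
    DeliveryStreamName="telemetry-to-s3",  # hypothetical name
    DeliveryStreamType="DirectPut",
    ExtendedS3DestinationConfiguration={
        "RoleArn": "arn:aws:iam::123456789012:role/firehose-to-s3",
        "BucketARN": "arn:aws:s3:::iot-datalake",
        "Prefix": "raw/",
        # Flush whichever comes first: 5 MB of data or 5 minutes.
        "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
        "CompressionFormat": "GZIP",
    },
)
```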

A data lake is a collection of data in different formats, available to machine learning and AI services such as SageMaker. Data can also be processed with an extract, transform and load (ETL) service such as Glue, and for fast analysis it is suggested to run SQL queries with Athena. To build exploratory dashboards, QuickSight provides a drag-and-drop interface that simplifies the development phase.
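
For example, a quick aggregation over the data lake could be launched with Athena like this (database, table, partition column and output bucket are hypothetical):

```python
import boto3

athena = boto3.client("athena")

# Ad-hoc SQL directly over the S3 data cataloged by Glue.
athena.start_query_execution(
    QueryString="""
        SELECT device_id, avg(temperature) AS avg_temp
        FROM telemetry
        WHERE day = '2022-06-06'
        GROUP BY device_id
    """,
    QueryExecutionContext={"Database": "iot_datalake"},
    ResultConfiguration={"OutputLocation": "s3://iot-athena-results/"},
)
```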

If the S3 data is used as a backup only, reviewing the different S3 storage tiers will optimize the storage cost.
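
A sketch of a lifecycle rule that moves older telemetry backups to cheaper tiers (bucket name, prefix and thresholds are illustrative):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="iot-datalake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-old-telemetry",
            "Status": "Enabled",
            "Filter": {"Prefix": "raw/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},  # infrequent access
                {"Days": 90, "StorageClass": "GLACIER"},      # cold archive
            ],
        }]
    },
)
```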

Ingestion and offline analysis workflow

Another essential element of this section is how to store telemetries so they can be presented to the end-user in an efficient way. We have already seen how raw data is stored in two different places: temporarily in Kinesis Data Streams and long-term in S3. It is time to think about which database is the right solution for the IoT use case, and in particular for your business scenario.

Given the nature of the data (limited or no relations between telemetries, schemaless records and a high number of elements), a relational SQL database is not a suitable choice. The most relevant technologies are time-series databases and NoSQL databases.

Probably the most used time-series database is InfluxDB, an open-source solution adopted in numerous applications. This DB engine also comes with a full suite of services (the TICK stack), including Telegraf, Chronograf and Kapacitor, to collect, visualize and apply rules to the stored data. Time-series DB technology allows fast ingestion and retrieval of data based on the creation timestamp and optimizes storage capacity usage. The main problem with this solution is the cost of management: running an EC2 machine or a container by yourself is time-consuming and requires the right skills. An alternative is to use InfluxDB Cloud on AWS to lighten your duties, at the price of a higher cloud bill.
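
To give an idea of the developer experience, here is a minimal write sketch with the official influxdb-client Python library; the endpoint, token, org and bucket are placeholders:

```python
from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

# One point per telemetry: tags for the device, fields for the measurements.
point = Point("telemetry").tag("device_id", "sensor-42").field("temperature", 21.5)
write_api.write(bucket="iot", record=point)
```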

Another feasible solution is Amazon Timestream, the time-series database created by AWS, which offers a fully managed experience but is not yet a fully consolidated solution.
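
A minimal Timestream write sketch with boto3 (database and table names are hypothetical):

```python
import time
import boto3

ts = boto3.client("timestream-write")

ts.write_records(
    DatabaseName="iot",      # hypothetical database
    TableName="telemetry",   # hypothetical table
    Records=[{
        "Dimensions": [{"Name": "device_id", "Value": "sensor-42"}],
        "MeasureName": "temperature",
        "MeasureValue": "21.5",
        "MeasureValueType": "DOUBLE",
        "Time": str(int(time.time() * 1000)),  # epoch milliseconds (the default unit)
    }],
)
```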

The last possible product is a NoSQL database such as DynamoDB. The most important element of this choice is how you design your data model around the users' access patterns. A well-designed DynamoDB table is a valid solution: the full integration with the AWS environment, the ability to scale on demand, globally replicated data, and the enormous number of additional features can change your mind.
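
A sketch of a telemetry-oriented data model: the device id as partition key and the timestamp as sort key, so the typical access pattern (the latest readings of one device) becomes a single query. Table and attribute names are hypothetical:

```python
import time
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("telemetry")  # hypothetical table

# One item per telemetry: device_id partition key, epoch-millis sort key.
table.put_item(Item={"device_id": "sensor-42", "ts": 1718000000000, "temperature": 21.5})

# Typical access pattern: the latest hour of readings for one device.
one_hour_ago = int(time.time() * 1000) - 3_600_000
resp = table.query(
    KeyConditionExpression=Key("device_id").eq("sensor-42") & Key("ts").gte(one_hour_ago)
)
readings = resp["Items"]
```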

Various solutions for the database layer

An important factor to consider in an IoT application is the nature of these workloads: these systems are read-heavy. If the number of users and queries on the platform scales fast, it is recommended to place a cache in front of the database. For InfluxDB and Timestream, ElastiCache for Memcached or Redis are valid options; for DynamoDB, it is mandatory to consider Amazon DynamoDB Accelerator (DAX).
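
A generic read-through cache sketch with redis-py, assuming a hypothetical ElastiCache endpoint and a hypothetical query_database helper:

```python
import json
import redis

cache = redis.Redis(host="my-cache.example.amazonaws.com", port=6379)  # hypothetical endpoint

def get_latest_readings(device_id: str, ttl_seconds: int = 30):
    """Read-through cache: serve from Redis, fall back to the database on a miss."""
    key = f"latest:{device_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    readings = query_database(device_id)  # hypothetical DB query
    # A short TTL keeps dashboards fresh while absorbing repeated reads.
    cache.setex(key, ttl_seconds, json.dumps(readings))
    return readings
```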

Nowadays other scenarios are emerging where traceability and immutability of your data are essential. In these cases, a recent technology such as blockchain is revolutionizing sectors like supply chain and asset tracking. A valid alternative to the standard database solutions is a private blockchain or a Hyperledger Fabric network in the cloud; for these particular cases, Amazon Managed Blockchain should be taken into consideration. However, these systems are slow, and using them in conjunction with standard databases is necessary to keep the system response time acceptable.

End-user interaction

IoT applications are usually web interfaces focused on historical and real-time data shown in a dashboard view. In some specific scenarios and use cases, the user also needs to interact with the underlying network by sending configurations or commands to change its current state. A prototype or an internal application can be based on prebuilt dashboard services: this is useful in the initial phases of the project, where it is essential to focus on the foundation of your application while still keeping control over the incoming data.

The first possible service, and probably the simplest one, is Amazon QuickSight. It uses S3, Redshift, RDS and IoT Analytics as data sources, and it is customizable and straightforward to use.

The second solution is Grafana, probably the de facto standard for companies or products that don't want to invest money in an extensive web application. This open-source application offers charts, graphs and alerting; it works stand-alone or can be embedded in a web page. It usually runs on top of a container or an EC2 instance, but AWS offers a fully managed solution (Amazon Managed Grafana) that takes care of all the infrastructure aspects.

Amazon Quicksight IoT dashboard example
Grafana IoT dashboard example

To get the most flexible customization and to offer a high-class application, it is preferable to build the web application from scratch. Nowadays the frontend and backend ecosystems are saturated with frameworks, chart libraries and publicly shared experience on how to build IoT dashboards. Amazon, for example, supports the development phase of a web or mobile application with AWS Amplify, a tool that includes built-in libraries to connect your application to the AWS world in an effortless way. However, it is crucial to take some variables into consideration before jumping into the implementation phase.

As we have seen, IoT applications are read-heavy systems, and the different views you have to present to the end-user often evolve fast, especially in the early phases. For these reasons, it is a good idea to take a GraphQL-based API into consideration. This query language for APIs gives the frontend exactly the data it requests and removes the need to maintain tedious JSON schemas for every endpoint, as we were used to with REST APIs.

AWS AppSync is the managed GraphQL API gateway on AWS. It offers a single-endpoint architecture able to connect the frontend application with multiple databases and services. Subscriptions enable real-time communication between the backend and the frontend, a must for IoT applications with high-frequency data production or with the need to send commands to the devices.
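
To give an idea of the single-endpoint model, here is a sketch of a frontend-style query against an AppSync endpoint; the endpoint URL, API key and schema fields are hypothetical, and subscriptions would use a WebSocket connection instead of plain HTTP:

```python
import requests

APPSYNC_URL = "https://example123.appsync-api.eu-west-1.amazonaws.com/graphql"  # hypothetical
API_KEY = "da2-xxxxxxxx"  # hypothetical API key

# Hypothetical schema: a query returning the latest telemetry of one device.
query = """
query LatestTelemetry($deviceId: ID!) {
  latestTelemetry(deviceId: $deviceId) { timestamp temperature }
}
"""

resp = requests.post(
    APPSYNC_URL,
    json={"query": query, "variables": {"deviceId": "sensor-42"}},
    headers={"x-api-key": API_KEY},
)
print(resp.json())
```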

It is even possible to build a solution with the classic REST APIs using the well-known Amazon API Gateway. To enable real-time communication, it is mandatory to establish a WebSocket channel.
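
A sketch of the backend side of such a channel, pushing a telemetry update to a connected client through the API Gateway Management API (the endpoint URL is hypothetical):

```python
import json
import boto3

# Management endpoint of a hypothetical API Gateway WebSocket stage.
apigw = boto3.client(
    "apigatewaymanagementapi",
    endpoint_url="https://example123.execute-api.eu-west-1.amazonaws.com/prod",
)

def push_telemetry(connection_id: str, telemetry: dict) -> None:
    """Send a telemetry update to a connected WebSocket client."""
    apigw.post_to_connection(
        ConnectionId=connection_id,
        Data=json.dumps(telemetry).encode("utf-8"),
    )
```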

Another important point is to minimize the access time to your application. Customers are all over the world, and caching the static and most frequently requested pages of your application improves the end-user experience. A Content Delivery Network (CDN) optimizes the communication with the clients and protects the public entry point to the internet.

Amazon CloudFront is the fully managed CDN from AWS: it improves the application response time, blocks application-level attacks through the web application firewall (AWS WAF) and runs simple operations at the edge using Lambda@Edge or CloudFront Functions.

The following picture proposes a possible architecture design:

IoT web application proposal design

Security

Security is a topic that should be addressed at each level of the architecture, from the IoT network to the end-user. Each area has its peculiarities and can be exploited by an attacker, but the most vulnerable components are the IoT network and the ingestion layer. In recent years we have seen famous attacks on devices whose end goal was to turn the IoT network into a botnet. It is essential to limit access to the devices from the outside; for this reason, where possible, establishing a single entry point to the system is highly recommended.

AWS IoT Greengrass and Amazon FreeRTOS already have libraries and frameworks to enforce security on the devices and detect anomalous behavior in the network.

AWS IoT Device Defender continuously monitors and analyzes device configurations and behavior to make sure they do not deviate from security best practices. When an alarm or a problem is detected, a notification is sent to CloudWatch, SNS or the IoT console.

It is also critical to secure the link between the IoT network and the cloud. Every communication should be established over SSL/TLS using valid certificates. A second security layer can be added with a VPN connection and/or a Direct Connect link to establish a private connection to the cloud. If budget is a concern, a site-to-site VPN is enough.
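
A minimal device-side sketch with the AWS IoT Device SDK v2 for Python (awsiotsdk), where the endpoint, certificate paths and client id are placeholders; the mutual TLS authentication happens during connect:

```python
from awscrt import mqtt
from awsiot import mqtt_connection_builder

# Hypothetical IoT Core endpoint, certificate paths and client id.
connection = mqtt_connection_builder.mtls_from_path(
    endpoint="example-ats.iot.eu-west-1.amazonaws.com",
    cert_filepath="device.pem.crt",
    pri_key_filepath="private.pem.key",
    ca_filepath="AmazonRootCA1.pem",
    client_id="sensor-42",
)
connection.connect().result()  # TLS handshake and mutual authentication

connection.publish(
    topic="devices/sensor-42/telemetry",
    payload='{"temperature": 21.5}',
    qos=mqtt.QoS.AT_LEAST_ONCE,
)
```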

The public internet-facing services are another critical section of the architecture. It is extremely important to enforce AWS WAF on all the application-level services such as API Gateway, AppSync and CloudFront. The firewall is able to block the major web attacks and to filter undesired IPs. Only the necessary services should be exposed to the public network; fronting them with CloudFront or application load balancers (ALB) and enforcing HTTPS connections can save your nights.

If you are looking for a more in-depth security posture, services such as GuardDuty and Security Hub provide checks and suggestions on where and how you should improve your security.

Attacks or bad user behavior should be prevented not only from the outside but also from the inside. Supposing the data lives in a data lake and needs to be shared with other stakeholders, it is important to regulate access via an API, grant read-only access to the data, or replicate the original data lake. Strict IAM roles and policies can limit access to important data, while CloudTrail gives you an overview/audit of user access to the different services.
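
A sketch of such a read-only IAM policy, scoped to a hypothetical data lake bucket:

```python
import json
import boto3

iam = boto3.client("iam")

# Read-only access to the data lake bucket and its objects; no write or delete.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::iot-datalake",     # hypothetical bucket
            "arn:aws:s3:::iot-datalake/*",
        ],
    }],
}

iam.create_policy(
    PolicyName="DataLakeReadOnly",
    PolicyDocument=json.dumps(policy),
)
```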

Conclusions

IoT systems are not easy to design and implement: numerous considerations, compromises and constraints make the journey anything but simple. AWS offers a stunning number of prebuilt solutions to make your life easier and to help you during the implementation. The fundamental suggestion is to start small, learn from your mistakes, and understand what brings business value and satisfaction to the end-users. Leveraging ready-made services and serverless solutions makes it possible to build a minimum viable product (MVP) in a limited amount of time. The complexity and the customizations will come later, and don't forget to enjoy what you are doing.
