A simple IoT architecture for Ingestion and Reporting

Published in

Opensanca

6 min read, Jul 10, 2020

Part II | Part III | Part IV | Part V | Part VI

Context

With the advent of 5G technology, many more devices will be connected to the Internet and can be monitored. To monitor them, software developers and architects have to keep up with a high volume of events and information. In this article I try to define a robust IoT (Internet of Things) architecture that meets this goal.

A robust IoT architecture must explore some concepts and contain specific components that make it extensible, highly available, and scalable, because we must expect a massive inflow of data, the generation of customized reports, flexibility in the shape of the data, and many other aspects.

Therefore, I will try to define and explore some aspects that I think are important in this kind of architecture.

Big Picture

The git repository of this project is here.

The architecture basically has three main areas:

1 — Ingestion

Ingestion is the heart of the architecture: its main responsibility is to consume the data sent from assets (sensors, cars, IoT devices in general), validate it, and persist it in a database.

Generally, the ingestion system uses the MQTT protocol to receive data.

MQTT stands for MQ Telemetry Transport. It is a publish/subscribe, extremely simple and lightweight messaging protocol, designed for constrained devices and low-bandwidth, high-latency or unreliable networks. The design principles are to minimise network bandwidth and device resource requirements whilst also attempting to ensure reliability and some degree of assurance of delivery. These principles also turn out to make the protocol ideal of the emerging “machine-to-machine” (M2M) or “Internet of Things” world of connected devices, and for mobile applications where bandwidth and battery power are at a premium. From: http://mqtt.org/faq

We can split this area into three services:

Edge / Ingestion service

The main responsibility of this service is to consume the data sent to the MQTT broker and verify that it is valid before forwarding it to the Time series persistent service. Incoming messages are sent as JSON, as in the following example payload:

{
    "id": "2", 
    "tenantId": "1", 
    "token": "321", 
    "timestamp": 123213123, 
    ... custom fields
    "temperature": 27.3, 
    "memory": 36.3
}

In this example payload, the mandatory fields are:

id: the identifier of the asset
tenantId: the identifier of the owner of the asset
token: the secret key used to validate or reject the message
timestamp: the moment when the data was collected
custom fields: these fields carry the data about an asset, for example speed, acceleration, battery voltage, temperature, air pressure, etc.
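As a minimal sketch of the validation step, assuming the four named fields are required and everything else is treated as a custom field, the check could look like the Python below. The function names `validate_payload` and `custom_fields` are hypothetical and are not taken from the project repository.

```python
# Fields the article names explicitly; everything else is a custom field.
REQUIRED_FIELDS = ("id", "tenantId", "token", "timestamp")


def validate_payload(payload: dict) -> list:
    """Return a list of problems; an empty list means the payload looks valid."""
    errors = []
    for name in REQUIRED_FIELDS:
        if name not in payload:
            errors.append(f"missing field: {name}")
    if "timestamp" in payload:
        ts = payload["timestamp"]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(ts, bool) or not isinstance(ts, int) or ts < 0:
            errors.append("timestamp must be a non-negative integer")
    return errors


def custom_fields(payload: dict) -> dict:
    """Return only the asset data (temperature, memory, ...)."""
    return {k: v for k, v in payload.items() if k not in REQUIRED_FIELDS}


message = {
    "id": "2",
    "tenantId": "1",
    "token": "321",
    "timestamp": 123213123,
    "temperature": 27.3,
    "memory": 36.3,
}
print(validate_payload(message))  # no problems found
print(custom_fields(message))     # only temperature and memory remain
```

Returning a list of problems instead of raising on the first one makes it easier to log everything that is wrong with a rejected message.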

Time series persistent service

This service is responsible for persisting the data that arrives from the Edge / Ingestion service. In a scenario where time is a fundamental piece of information, the most suitable type of database is a time series database (TSDB); for this architecture the one chosen was InfluxDB.

InfluxDB is an open-source time-series database (TSDB). It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.

The main InfluxDB concepts are:

Measurement: A measurement is loosely equivalent to a table in a relational database. It is the container in which data is stored, and a database can have multiple measurements. A measurement consists of three kinds of columns: time, tags, and fields.
Time: The time column holds the timestamp of each point and is what time-series operations work on. By default InfluxDB assigns its own server time, with nanosecond precision, but the event time can be supplied instead.
Tags: A tag is similar to an indexed column in a relational database. An important point to remember is that GROUP BY can only be applied to tags (and to time); WHERE can also filter on fields, but fields are not indexed, so those queries are slower.
Fields: Fields are the columns on which mathematical operations such as sum, mean, non-negative derivative, etc. can be performed. Field values can be floats, integers, booleans, or strings.
Series: The series is the most important concept in InfluxDB. A series is the combination of a measurement, a tag set, and a retention policy (InfluxDB provides a default one). InfluxDB performance depends heavily on the number of unique series the database contains, which is roughly the number of distinct tag-value combinations x the number of measurements x the number of retention policies.
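To make these concepts concrete, here is a small sketch that renders one payload as a line of InfluxDB line protocol (measurement, tag set, field set, timestamp). The measurement name `telemetry`, the choice of `tenantId` and `assetId` as tags, and the assumption that the payload timestamp is in seconds are all illustrative and are not confirmed by the article.

```python
def _escape_tag(text: str) -> str:
    # Commas, equals signs and spaces must be escaped in tag keys and values.
    for ch in (",", "=", " "):
        text = text.replace(ch, "\\" + ch)
    return text


def _format_field(value) -> str:
    if isinstance(value, bool):          # check bool before int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"               # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'                # strings are double-quoted


def to_line_protocol(measurement, tags, fields, timestamp_ns) -> str:
    """Build one line: measurement,tag=value field=value timestamp."""
    tag_part = "".join(
        f",{_escape_tag(k)}={_escape_tag(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(
        f"{_escape_tag(k)}={_format_field(v)}" for k, v in fields.items()
    )
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line_protocol(
    "telemetry",                                  # hypothetical measurement name
    {"tenantId": "1", "assetId": "2"},            # tags: indexed, usable in GROUP BY
    {"temperature": 27.3, "memory": 36.3},        # fields: the measured values
    123213123 * 10**9,                            # assumes the payload is in seconds
)
print(line)
# telemetry,assetId=2,tenantId=1 temperature=27.3,memory=36.3 123213123000000000
```

Every distinct combination of `tenantId` and `assetId` creates a new series, so values with unbounded cardinality (for example a request id) are better stored as fields than as tags.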

Asset Service

The asset service is responsible for maintaining the assets of the system (sensors, cars, IoT devices). Each asset has a token, an id, and a tenant, which it uses to send data to the Edge / Ingestion service. All assets are persisted in a document database; in this architecture the chosen database was MongoDB.

The token acts as the signature of an IoT device. Each device has a unique signature, so another device can’t send data to the Edge / Ingestion service while impersonating a specific device.

The tenantId identifies a company. In a scenario where many companies will probably be running on our system, this information is extremely important.
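The token check described above could be sketched as follows. This is only an illustration: the in-memory `ASSETS` dictionary stands in for the MongoDB collection, and the function name `is_authentic` is hypothetical.

```python
import hmac

# Stand-in for the asset collection: (tenantId, id) -> token.
ASSETS = {("1", "2"): "321"}


def is_authentic(payload: dict, assets: dict = ASSETS) -> bool:
    """Accept a message only if its token matches the registered asset."""
    expected = assets.get((payload.get("tenantId"), payload.get("id")))
    presented = payload.get("token")
    if expected is None or not isinstance(presented, str):
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), presented.encode())


print(is_authentic({"id": "2", "tenantId": "1", "token": "321"}))    # True
print(is_authentic({"id": "2", "tenantId": "1", "token": "wrong"}))  # False
```

Keying the lookup on both `tenantId` and `id` means that a valid token from one company cannot be reused for an asset id that belongs to another.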

2 — Reporting

Reporting is the component responsible for showing and generating information and for sending notifications about the assets.

The recommendation is to split the area into three services:

Time series aggregation service — offline information

This service is responsible for querying the data of the assets. It can return data in the following formats:

raw data: the representation of raw data sent from the IoT devices
aggregated data: aggregated summaries of numeric time-series data, with interfaces to read them. This allows applications to retrieve smaller data sets that cover a long time range with much better performance than processing all the raw time-series data.
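The difference between the two formats can be illustrated with a tiny in-memory aggregation. In the real service this work would presumably be delegated to the database; the function `aggregate_mean` below is a hypothetical sketch that only shows the idea of grouping raw points into fixed time buckets.

```python
from collections import defaultdict
from statistics import mean


def aggregate_mean(points, interval_s):
    """Group (timestamp_s, value) pairs into buckets and average each one."""
    buckets = defaultdict(list)
    for ts, value in points:
        bucket_start = ts - ts % interval_s   # start of the bucket this point falls in
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


raw = [(0, 20.0), (30, 22.0), (60, 30.0)]      # three raw readings
print(aggregate_mean(raw, 60))                 # [(0, 21.0), (60, 30.0)]
```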

The following filters are provided by the API:

from: start date of the time-series query; the format MUST be Zulu time (UTC), or the keyword now()
to: end date of the time-series query; the format MUST be Zulu time (UTC), or the keyword now()
selectCriteria: the selection criteria of the query; aggregation functions can be used (mean, max, min, etc.)
intervalValue: the value of the group-by interval, for aggregated results
intervalUnit: the unit of the group-by interval, for aggregated results
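One way these filters might be turned into an InfluxQL query is sketched below. The article does not show how the service does it, so this is an assumption: `selectCriteria` is split here into a function name and a field name, the measurement name `telemetry` is hypothetical, and every input is checked against a whitelist before being placed in the query string.

```python
import re

ALLOWED_FUNCTIONS = {"mean", "max", "min"}
ALLOWED_UNITS = {"s", "m", "h", "d"}
_ZULU = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _time_literal(value: str) -> str:
    """Accept only now() or a Zulu timestamp such as 2020-07-01T00:00:00Z."""
    if value == "now()":
        return value
    if _ZULU.fullmatch(value):
        return f"'{value}'"
    raise ValueError(f"invalid time filter: {value!r}")


def build_query(measurement, field, function, start, end,
                interval_value, interval_unit) -> str:
    if function not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported function: {function!r}")
    if interval_unit not in ALLOWED_UNITS:
        raise ValueError(f"unsupported interval unit: {interval_unit!r}")
    if not isinstance(interval_value, int) or interval_value <= 0:
        raise ValueError("interval value must be a positive integer")
    for name in (measurement, field):
        if not _IDENTIFIER.fullmatch(name):
            raise ValueError(f"invalid identifier: {name!r}")
    return (
        f'SELECT {function}("{field}") FROM "{measurement}" '
        f"WHERE time >= {_time_literal(start)} AND time <= {_time_literal(end)} "
        f"GROUP BY time({interval_value}{interval_unit})"
    )


query = build_query("telemetry", "temperature", "mean",
                    "2020-07-01T00:00:00Z", "now()", 10, "m")
print(query)
```

Rejecting anything outside the whitelists keeps user-supplied filter values from changing the structure of the query.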

Stream time-series service — online information

This service is responsible for streaming (online) the latest results from the assets. With this service it is possible to monitor, in near real time, what is happening to a specific asset.

Rule Service — system triggers

The main responsibility of the Rule service is to notify the owner of the asset when a rule is triggered. The notification can be sent by e-mail, SMS, or mobile application notification (external systems). Below are some examples of rules:

a car has reached 10,000 kilometers and is due for a maintenance check
the temperature of a sensor reached a specific value
the speed of a car is getting too high

When any of these conditions is met, the rule automatically triggers an event. The service acts by detecting when a value overshoots or undershoots a defined threshold.

All rules can be defined with an exact threshold value in the rule configuration, and it’s important that each asset has its own rules.
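A threshold rule of this kind could be modelled roughly as follows. The `Rule` structure and the `triggered_rules` function are hypothetical; the article does not describe the actual rule format, and sending the notification itself is left out.

```python
import operator
from dataclasses import dataclass

# ">" detects overshooting a threshold, "<" detects undershooting it.
_COMPARATORS = {">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Rule:
    asset_id: str       # each asset has its own rules
    field: str          # custom field to watch, e.g. "temperature"
    comparator: str     # ">" or "<"
    threshold: float
    message: str


def triggered_rules(payload: dict, rules: list) -> list:
    """Return the rules of this asset whose threshold the payload crosses."""
    hits = []
    for rule in rules:
        if rule.asset_id != payload.get("id"):
            continue
        value = payload.get(rule.field)
        if isinstance(value, (int, float)) and _COMPARATORS[rule.comparator](
            value, rule.threshold
        ):
            hits.append(rule)
    return hits


rules = [
    Rule("2", "temperature", ">", 25.0, "temperature is too high"),
    Rule("2", "memory", "<", 10.0, "memory is running low"),
    Rule("7", "temperature", ">", 0.0, "belongs to another asset"),
]
reading = {"id": "2", "temperature": 27.3, "memory": 36.3}
for rule in triggered_rules(reading, rules):
    print(rule.message)  # temperature is too high
```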

3 — Embedded system

The Embedded system is responsible for sending information from the IoT devices. Using the asset information (id, token, tenantId), the device sends a JSON message to the Edge / Ingestion service. In this example architecture, we chose Arduino hardware for the embedded system because of the low cost of each board.

Embedded service — Arduino

The embedded system is responsible for connecting to an MQTT broker and sending a JSON message with all the information captured by its own sensors. In this example implementation, we collect the following information:

temperature
pressure
humidity

The initial implementation only connects over WiFi, but in the future it will be changed to use EDGE (2G) connections.

After sending information to the MQTT broker, the embedded system puts the device to sleep for a period of time; after that, the device wakes up and starts sending again.
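The read, publish, sleep cycle can be simulated on a desktop machine with the sketch below. It is not the Arduino firmware: the `publish` and `read_sensors` callables are placeholders for the MQTT client and the sensor drivers, and the topic name and identity values are hypothetical.

```python
import json
import time

# Hypothetical identity of the device, as registered in the asset service.
IDENTITY = {"id": "2", "tenantId": "1", "token": "321"}
TOPIC = "ingestion/telemetry"  # hypothetical topic name


def run_cycles(read_sensors, publish, cycles, sleep_s,
               sleep=time.sleep, clock=time.time):
    """Read the sensors, publish one JSON message, then sleep; repeat."""
    for _ in range(cycles):
        reading = read_sensors()
        message = {**IDENTITY, "timestamp": int(clock()), **reading}
        publish(TOPIC, json.dumps(message))
        sleep(sleep_s)  # on real hardware this would be a low-power sleep


def fake_sensors():
    # Fixed values standing in for real temperature, pressure and humidity reads.
    return {"temperature": 27.3, "pressure": 1013.2, "humidity": 40.0}


sent = []
run_cycles(fake_sensors, lambda topic, body: sent.append((topic, body)),
           cycles=2, sleep_s=0)
print(len(sent))  # 2
```

Passing `publish`, `sleep` and `clock` as parameters keeps the loop testable without a broker or real delays.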

Final considerations

This architecture is only one way to approach the problem. Many aspects could be changed or replaced, but the concepts behind an IoT system architecture will probably be based on this type of approach.

In the next articles, I will cover more technical aspects of the architecture. Stay tuned.