Best Practice for an IoT analytics platform

Giacomo Veneri
Published in digitalindustry
6 min read · Feb 20, 2019

Originally published at jugsi.blogspot.com.


As shown in our latest book, Hands-On Industrial Internet of Things, to build an analytics platform we have to consider 7 principles:

  1. Data Availability
  2. Consuming Data
  3. Execution Partitioning
  4. Time Ordering Principle
  5. Stateful vs Stateless
  6. Additional information (such as Asset information)
  7. Ubiquity of Analytics

Data Availability

In IoT, when we speak about “data” we usually mean time-series (or similar). Time-series feed the platform through different approaches:

  • streaming : continuous ingestion of data
  • micro-batch processing : a small portion of data every few minutes
  • macro-batch processing : a big portion of data every hour
  • on demand : when a particular event occurs
  • on data changed : only when the value of the data changes

Fig. 1: data availability

Implementation

Data streaming is normally implemented using queues or message brokers: RabbitMQ, Kafka or MQTT.

Micro batch is normally implemented using a scheduler (e.g. a cron expression).

Macro batch is very similar to micro batch, but leverages Big Data technologies.

On demand and on data changed require a mechanism to trigger the execution of the analytics when data is available.
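The “on data changed” trigger can be sketched as follows. This is a minimal illustration; `Point` and the dispatch function are hypothetical names, not the API of any specific broker:

```python
from dataclasses import dataclass

@dataclass
class Point:
    tag: str
    value: float

last_seen = {}  # tag -> last observed value

def on_point(point, run_analytic):
    """Invoke the analytic only when the tag's value actually changes."""
    if last_seen.get(point.tag) != point.value:
        last_seen[point.tag] = point.value
        run_analytic(point)

fired = []
for p in [Point("t1", 10.0), Point("t1", 10.0), Point("t1", 12.5)]:
    on_point(p, fired.append)
# the analytic fired twice: on the first point and on the changed value
```

The same dispatcher degenerates into the “on demand” modality if the condition checks for a particular event instead of a changed value.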

These modalities imply different approaches to consume data.

Consuming Data

In the book Hands-On Industrial Internet of Things we differentiated between:

  1. Hot Path : data is immediately processed
  2. Cold Path : data is stored and organised in a low-latency database (e.g. a time-series database)
  3. Big Data Path : data is stored in a data lake, avoiding any preprocessing (e.g. S3, raw data, HDFS, Parquet, Hive, …)

Fig. 2: consuming data

Implementation

Hot Path is normally used for data streaming analytics, such as simple threshold rules and/or anomaly detection. Indeed, these analytics do not require a big amount of historical data, but only the last data points. Azure has implemented a very smart mechanism called “windowed data processing”; see also Hands-On Industrial Internet of Things.
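A windowed hot-path rule can be sketched in plain Python. The function name and threshold below are illustrative; Azure’s actual windowing API differs:

```python
from collections import deque

def sliding_alerts(stream, window=3, threshold=30.0):
    """Raise an alert when the average of the last `window` points
    exceeds `threshold` (a simple hot-path rule)."""
    buf = deque(maxlen=window)  # keeps only the last `window` points
    alerts = []
    for i, value in enumerate(stream):
        buf.append(value)
        if len(buf) == window and sum(buf) / window > threshold:
            alerts.append(i)
    return alerts

# temperatures streamed in; alerts fire at indices 4 and 5
alerts = sliding_alerts([20, 25, 28, 35, 40, 41])
```

Note that only the small window buffer is kept in memory, which is exactly why the hot path does not need historical storage.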

Cold Path is normally used for analytics requiring a small amount of data (10 minutes to 5 hours) from a specific piece of equipment (Asset Performance Management), but which have to process data with low latency.

Big Data Path is normally applied for analytics working “at fleet level” (e.g. comparing the performance of different pieces of equipment) which do not require pseudo-real-time results. In other words, we need to wait until all data from all monitored equipment is available to trigger the execution of the analytic.

These paths imply different ways to execute the analytics in parallel.

Execution Partitioning

Let’s introduce the concept of “asset”. An asset is

“something valuable”

… OK 😏, I know that is not very useful. Let’s introduce the “IoT asset”:

“Asset is the equipment or system we need to monitor to evaluate health, efficiency and performance.”

Assets are normally organised hierarchically:

  • Company/Municipality : ACME ltd, Florence
  • System : airplane, car, plant, industry, train, truck, house
  • Sub-System (optional) : line, train, …
  • Equipment: jet engine, wind turbine engine, pump, lube oil
  • Signal (or Measure or Tag): furnace temperature, car speed

A more concrete example:

  • ACME Refinery :: company
    • Production Train #1 :: system
      • Valve #1 :: valve extends equipment
      • Valve #2 :: valve extends equipment
      • Furnace #1 :: furnace
        • Inlet-Temperature :: measure
      • Power Generation :: subsystem
        • Turbine LT200 :: turbine extends equipment
          • Rotation :: measure
        • Power Generator #2 :: power extends equipment
    • Production Train #2 :: system
    • Storage :: system

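Such a hierarchy can be sketched as a nested structure; the names are taken from the example above, and the traversal helper is purely illustrative:

```python
# The example hierarchy as a nested dict:
# company -> systems/subsystems -> equipment -> list of measures.
ACME_REFINERY = {
    "Production Train #1": {
        "Valve #1": [],
        "Valve #2": [],
        "Furnace #1": ["Inlet-Temperature"],
        "Power Generation": {
            "Turbine LT200": ["Rotation"],
            "Power Generator #2": [],
        },
    },
    "Production Train #2": {},
    "Storage": {},
}

def equipment_names(node):
    """Collect equipment names: the leaves mapping to a list of measures."""
    names = []
    for name, child in node.items():
        if isinstance(child, list):
            names.append(name)
        else:
            names.extend(equipment_names(child))
    return names
```

This structure is what makes the next principle possible: each piece of equipment is an independent branch of the tree.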
Normally, analytics work at equipment or system level, so we can analyse the measures of Valve #1 and Valve #2 in parallel because they work independently. In other words, we can attach the same analytic to Valve #1 and Valve #2, and it can run on both in parallel.

Fig. 3: parallel map

Implementation

This simple assumption simplifies our architecture, because the same analytic can run in parallel over thousands of assets. We can identify 3 scenarios:

Independence of assets : in this case we can deploy multiple instances of the same analytic over multiple assets without side effects.

Independence of measures (tags) : similar to assets, but applied to tags. We can deploy multiple instances of the same rule over multiple tags (of the same asset) without side effects.

Analytics for the full fleet : we cannot leverage the independence of assets/tags. In this case we need another parallel approach, such as Big Data MapReduce.
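The independence-of-assets scenario can be sketched with a parallel map; the fleet data and the analytic below are hypothetical placeholders:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-asset data; in a real platform this comes from the cold path.
fleet = {
    "valve-1": [1.0, 2.0, 9.0],
    "valve-2": [2.0, 2.5, 3.0],
}

def analytic(item):
    """The same analytic, deployed per asset: report the peak measure."""
    asset, series = item
    return asset, max(series)

# Assets are independent, so the map can safely run in parallel.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(analytic, fleet.items()))
```

Because no state is shared between assets, the same pattern scales from a thread pool to a cluster-level map without changing the analytic.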

Time Ordering Principle

The second assumption is the time ordering: we cannot process a time-series in the wrong order. An analytic processing the data of the 2nd of January before the 1st of January may raise a wrong alert or produce a wrong result.

Fig. 4: ordering

Implementation

The orchestrator of an IoT analytics platform should respect this principle. Big Data platforms such as Hadoop MapReduce, for instance, do not take this constraint into consideration, so we need to apply countermeasures. On the contrary, queues can respect the time ordering.

Notice (my personal opinion) : using a standard Big Data platform for IoT is usually not the right choice. We should leverage these platforms only for fleet-level analytics that tolerate high latency, or for exploratory data analysis.

There are a few cases where this principle can be ignored, for instance simple rules or stateless analytics.
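A minimal countermeasure is a re-ordering step over a buffer of points before they reach the analytic. This is only a sketch; a real platform would bound the buffer with a watermark:

```python
def process_in_order(points, analytic):
    """Sort buffered (timestamp, value) points before feeding the analytic."""
    for ts, value in sorted(points, key=lambda p: p[0]):
        analytic(ts, value)

seen = []
# points arrived out of order: (timestamp, value)
process_in_order([(2, 20.0), (1, 10.0), (3, 30.0)],
                 lambda ts, v: seen.append(ts))
# the analytic observes timestamps 1, 2, 3 in order
```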

Stateless vs Stateful

Consider an analytic counting the number of shutdowns day by day. This analytic needs to know the status of the previous day to continue its work. In other words, an analytic may need to save the status of its previous run.

Fig. 5: stateful

Implementation

To implement a stateful mechanism, we can save the output of the previous run and pass it as an additional input to the next run.

In other words, we pass the status as an additional information.
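The shutdown-counting example above can be sketched this way; the state shape and event names are illustrative:

```python
def count_shutdowns(events, previous_state=None):
    """Count shutdown events for one day, continuing from the
    previous run's output (the saved state)."""
    total = (previous_state or {}).get("shutdowns", 0)
    total += sum(1 for e in events if e == "shutdown")
    return {"shutdowns": total}

day1 = count_shutdowns(["start", "shutdown", "shutdown"])
day2 = count_shutdowns(["shutdown"], previous_state=day1)  # resumes from day 1
```

The analytic itself stays a pure function; the platform is responsible for persisting each run’s output and feeding it back in.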

Additional Information

Analytics may need to know additional information related to the asset, e.g. installation date, type of asset, configuration of the asset, etc. This information is normally known as “asset metadata”.

Implementation

To implement this, we can pass the asset’s metadata, acquired from a standard database, as an additional input. An example of an asset database is AnchorDB.
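Metadata as an additional input can be sketched as follows; the metadata store, asset name and threshold field are hypothetical, standing in for whatever the asset database returns:

```python
# Hypothetical asset metadata, as it might be fetched from an asset database.
METADATA = {
    "turbine-LT200": {"type": "turbine", "max_rotation_rpm": 3600},
}

def rotation_analytic(asset_id, rotation_rpm, metadata):
    """Use the asset's metadata to pick the right threshold for this asset."""
    limit = metadata[asset_id]["max_rotation_rpm"]
    return rotation_rpm > limit

over = rotation_analytic("turbine-LT200", 3700, METADATA)  # over the limit
```

The analytic stays generic: the same code monitors every turbine because the per-asset limit travels with the metadata, not with the code.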

Ubiquity of Analytics

Let’s speak about cloud, edge and on-premise. Analytics can run on the cloud (leveraging maximum computational power), on premise (leveraging the data centre’s computational power) or on the edge (very close to the data, with low latency). In some circumstances we need to deploy analytics to the edge or on-premise from the cloud, or to call analytics running on the cloud from the edge.

Implementation

The fastest way to achieve this goal is to leverage container-based technologies (Docker) and micro-services.

Final platform

Given our 7 principles, we can implement an IoT platform taking into consideration the following components:

Fig. 6: the proposed platform

For instance, AWS and Azure allow orchestration using AWS Lambda or Azure Functions. In Hands-On Industrial Internet of Things we proposed an example with Airflow. GCP, Azure and AWS leverage streaming analytics to implement simple or complex rules. Azure supports a very interesting feature for windowing (see Chapter 12 of Hands-On Industrial Internet of Things). GCP, AWS and Azure propose a microservices architecture for analytics based on Docker.
