Robust IoT platform based on distributed MQTT, Kubernetes and InfluxDB

Florent Martin (ELCA)
Published in ELCA IT · 7 min read · Oct 19, 2021

Authors: Alexandre Gilles-Lordet, Florent Martin

Data privacy in eHealth is a major concern: most of our clients aim at keeping ownership of their data. That is why, as an IT service company, ELCA has created an IoT platform fulfilling the specific needs of the healthcare industry. The platform is based on open-source software (K3s, Mosquitto, InfluxDB) and can thus be easily deployed on most infrastructures: private or public clouds, or on-premises. It maps data produced by sensors (we focus on simple medical devices) to downstream consumers which implement health communication standards and are able to analyze, display and provide insights to the medical staff and the patients.

High level architecture for ECG use case

Think for example of a hospital where patients wear electrocardiographs and oximeters. We want to gather the sensor data, compute each patient’s heart rate from the electrocardiograph signal and present the results to the medical staff. The sensors connect through various protocols (Wi-Fi, Bluetooth, LoRa) that are not designed for direct cloud communication, which is why they sit behind edge gateways. The gateways are connected to the core of the platform, where the data is processed and stored. These edge gateways are small ARM boards with Bluetooth, Wi-Fi and network connectivity. We have successfully tested the Raspberry Pi 4, but this choice might change for certification reasons. From a high-level perspective, the architecture of the platform looks like this:

In this article, we will focus on data ingestion and security features.


Kubernetes everywhere

At ELCA, we want our platform to be versatile, to support most current and future devices, and to adapt itself to the specific processing or protocols the downstream consumers may need. This is why we chose to build a highly modular platform based on Kubernetes. More specifically, each of our gateways and our server operates its own Kubernetes cluster. On the gateways, we run k3s, a lightweight Kubernetes distribution specifically designed for IoT and edge computing, on Raspberry Pi 4 boards. For our testing, the server also runs a k3s cluster, although it could be any kind of Kubernetes distribution. This gives us a lot of freedom and stability when updating to a new release, and the separation of the clusters ensures that gateways can run autonomously in case of a connection outage.

But this separation comes at a cost: managing dozens, perhaps hundreds of Kubernetes clusters at the same time. Fortunately, Rancher (the company behind k3s) provides Fleet, a management system designed to manage up to a million clusters using GitOps. So we just need to put our Kubernetes definition files on a Git repository, register our clusters on Fleet, and stop worrying about manual deployments.
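With Fleet, registering our manifests boils down to a GitRepo resource on the management cluster. A minimal sketch could look like the following (the repository URL, paths and labels are placeholders, not our actual setup):

```yaml
# Hypothetical Fleet GitRepo: deploy the manifests under "gateways/"
# to every registered cluster labelled role=gateway.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: iot-platform
  namespace: fleet-default
spec:
  repo: https://git.example.com/elca/iot-platform   # placeholder URL
  branch: main
  paths:
    - gateways
  targets:
    - clusterSelector:
        matchLabels:
          role: gateway
```

Fleet then continuously reconciles every matching cluster against the repository, so a `git push` is all it takes to roll out a change to the whole fleet.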

Distributed MQTT

MQTT is a well-known Pub/Sub protocol in IoT, and we use it for asynchronous messaging between our modules, and between the gateways and the server. By subscribing and publishing to the right topics, a module can communicate with the other modules, and we can easily reconfigure the data path when adding new modules. We chose Mosquitto as our broker.

Let’s start with a basic connection between a device and a gateway:

Here, the interfacing module manages the Bluetooth connection with the device (1) and translates its data to MQTT messages sent to the gateway MQTT broker (2) on the topic “toServer/deviceData”. Finally, the gateway broker forwards the messages on the topics under “toServer/” to the server (3).

Let’s now add some edge processing capabilities to our gateway: we just need to create a new module, deploy it to the gateway and re-route the messages:

Now, the interfacing module publishes on “toDataProcessing/deviceData” (2) and the data processing module subscribes to “toDataProcessing/deviceData” (3) and publishes to “toServer/deviceData” (4).
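As an illustration of what such a data processing module might do, here is a stdlib-only Python sketch that estimates heart rate from ECG samples with naive threshold crossing detection. This is purely illustrative: a production module would use a proper QRS-detection algorithm (e.g. Pan–Tompkins), and the sampling rate and threshold below are made-up values.

```python
from typing import List, Sequence

def heart_rate_bpm(samples: Sequence[float], fs_hz: float,
                   threshold: float = 0.5) -> float:
    """Estimate heart rate by counting upward threshold crossings (R peaks)."""
    beats: List[int] = [
        i for i in range(1, len(samples))
        if samples[i - 1] < threshold <= samples[i]
    ]
    if len(beats) < 2:
        return 0.0  # not enough peaks to compute an interval
    # Average interval between consecutive detected beats, in samples.
    mean_interval = (beats[-1] - beats[0]) / (len(beats) - 1)
    return 60.0 * fs_hz / mean_interval

# Synthetic ECG-like signal: one spike every 250 samples at 250 Hz -> 60 bpm.
signal = [1.0 if i % 250 == 0 else 0.0 for i in range(1000)]
print(heart_rate_bpm(signal, 250.0))  # 60.0
```

The module would run this on a sliding window of incoming samples and publish the resulting value on “toServer/deviceData”.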

The server works similarly: it has its own Mosquitto MQTT broker, which forwards the messages between the different modules, which we will present later.

To link the gateways and the server, we use a very useful feature of Mosquitto, MQTT bridges. Basically, the server broker subscribes to a specific topic on the gateway broker, and so does the gateway broker on the server broker. It means that we can have bidirectional communication between the server and the gateway modules as if they were connected to the same MQTT broker.

Here is the part of our mosquitto.conf where we set up the MQTT bridge:

connection gateway-to-server
address mqtt.server.lan:1883
topic # out 2 toServer/ fromGateway/

The third line means that the gateway broker will forward (with QoS 2) every message published under the local prefix “toServer/” to the server, remapped under the prefix “fromGateway/”. For now, there is no need to send data from the server to the gateway, so we only set up a one-way bridge with “out”.
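The prefix remapping performed by the bridge can be sketched in a few lines of Python (illustrative only — Mosquitto does this remapping natively, this is just to make the rule concrete):

```python
from typing import Optional

LOCAL_PREFIX = "toServer/"      # prefix on the gateway broker
REMOTE_PREFIX = "fromGateway/"  # prefix seen on the server broker

def remap_topic(topic: str) -> Optional[str]:
    """Return the remote topic for a bridged local topic, or None if not bridged."""
    if topic.startswith(LOCAL_PREFIX):
        return REMOTE_PREFIX + topic[len(LOCAL_PREFIX):]
    return None  # topic does not match the bridge pattern, stays local

print(remap_topic("toServer/deviceData"))  # fromGateway/deviceData
print(remap_topic("local/debug"))          # None
```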

Another key advantage of this MQTT bridge is that even though the server subscribes to the gateway’s topics, it is the gateway that initiates the connection. This is very useful to us, as the server does not need to maintain an up-to-date list of all the gateways it should connect to.

Server Architecture

We chose to store our data in an InfluxDB instance. InfluxDB is well adapted to our needs: it stores time series (which is what our sensors produce), it sustains a high ingestion rate, and it provides dashboards and alerting out of the box. InfluxData also provides Telegraf, an agent that collects metrics and inserts them into InfluxDB. Luckily for us, it has an MQTT input plugin, so we just have to tell it to subscribe to the right MQTT topics and it will inject the messages into InfluxDB, provided the payload of the MQTT messages is formatted with the InfluxDB Line Protocol (which is what our interfacing modules on the gateway generate).
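To show what an interfacing module produces, here is a small Python sketch that serializes one data point in the InfluxDB Line Protocol (`measurement,tags fields timestamp`, with a nanosecond timestamp). The measurement, tag and field names are hypothetical, not the platform’s actual schema:

```python
import time
from typing import Dict, Optional

def to_line_protocol(measurement: str, tags: Dict[str, str],
                     fields: Dict[str, float],
                     ts_ns: Optional[int] = None) -> str:
    """Serialize one point in InfluxDB Line Protocol."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    ts = ts_ns if ts_ns is not None else time.time_ns()
    return f"{measurement},{tag_part} {field_part} {ts}"

# Hypothetical ECG sample for patient "p42", fixed timestamp for readability.
payload = to_line_protocol("ecg", {"patient": "p42"}, {"mv": 0.73},
                           ts_ns=1634630400000000000)
print(payload)  # ecg,patient=p42 mv=0.73 1634630400000000000
```

A string like this, published on the right MQTT topic, is all Telegraf needs to ingest the point.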

This is the configuration file for Telegraf, which is all we need to bridge the gap between the server MQTT broker and InfluxDB.

[agent]
metric_batch_size=5000
metric_buffer_limit=500000
[[outputs.influxdb_v2]]
urls = ["http://influxdb2"]
organization = "influxdata"
bucket = "default"
token = "${INFLUX_TOKEN}"
[[inputs.mqtt_consumer]]
servers = ["tcp://server-broker:1883"]
topics = ["fromGateway/toInfluxdb"]
qos = 2
connection_timeout = "30s"
persistent_session = true
client_id = "telegraf"
data_format = "influx"

Finally, here is the full data path:

Once the data is stored in InfluxDB, a last module provides an interface for the standard healthcare protocols HL7 and FHIR; it could be extended with other protocols.

Security considerations

IoT security is a crucial topic, and it is especially critical in the health sector. Our platform must therefore ensure data confidentiality, integrity and availability (the famous CIA triad). In this context, we faced the challenge of identity verification and propagation across multiple distributed gateways.

The first thing we wanted to ensure is that all communications use mTLS (mutual TLS). To do that, we built a public key infrastructure with a root key and two intermediate keys, one to sign the server certificates and one for the gateways. The signing is done with cert-manager and a HashiCorp Vault instance located on the server.

On each cluster (server and gateways), we installed Linkerd with its auto-injection feature to provide mTLS between the cluster’s modules. We also added the Ambassador API gateway on the server to force mTLS on every connection from the outside to the server. Finally, we configured the Mosquitto MQTT bridge to also verify the server certificate.
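For the bridge, this amounts to a few extra lines in mosquitto.conf, pointing the gateway broker at the CA certificate and at its own certificate and key so that the server’s identity is verified and the connection is mutually authenticated (the port and file paths below are illustrative):

```
connection gateway-to-server
address mqtt.server.lan:8883
topic # out 2 toServer/ fromGateway/
bridge_cafile /etc/mosquitto/certs/ca.crt
bridge_certfile /etc/mosquitto/certs/gateway.crt
bridge_keyfile /etc/mosquitto/certs/gateway.key
```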

Below is the schema of the Key Infrastructure:

Another important point is gateway provisioning and authentication. As explained before, we use a TLS certificate to check the identity of the gateway when opening the MQTT connection. When setting up the gateway, we push a Vault AppRole token onto it, which allows it to sign its short-lived TLS certificate through cert-manager (renewed every hour).
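A cert-manager Certificate resource along these lines captures the one-hour renewal policy (the names, namespace and issuer below are illustrative, not our actual manifests):

```yaml
# Hypothetical gateway certificate: one-hour lifetime, renewed before expiry.
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: gateway-mqtt-cert
  namespace: mqtt
spec:
  secretName: gateway-mqtt-tls
  duration: 1h
  renewBefore: 15m
  commonName: gateway-01.iot.local
  issuerRef:
    name: vault-gateway-issuer   # Vault-backed issuer, authenticated via AppRole
    kind: Issuer
```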

The schema below details the architecture and components involved in this process.

The operation of putting the token in a Kubernetes secret on the gateway is done once, at configuration time, on the Configuration Bench, a trusted system used to configure the gateways before deployment at the customer’s site. The advantage of this method is that we can easily revoke a gateway’s access by removing its AppRole token in Vault.

Conclusion

Using open-source components, we at ELCA have created a powerful and extensible platform architecture targeted at healthcare IoT (eHealth). The platform can be tailored to the needs of our clients, with a strong focus on security.

We are considering many improvements for the future, such as friendlier device and gateway management, using smartphones as gateways, and demonstrating an end-to-end real-life use case with medical-grade devices.

In follow-up articles, we will talk about how we monitor the edge gateways to detect failures, and about deployment with Rancher Fleet.

Using Rancher Fleet to deploy an IoT Platform and edge modules | by Florent Martin (ELCA) | ELCA IT | Feb, 2023 | Medium

Acknowledgments

This article is based on the work of ELCA IoT team with the contributions of Alexandre Gilles-Lordet, Paul Bernet, Sébastien Morand, Daniel Raimundo, Sonia Duc and Marco Maccio.


Technology enthusiast with more than 20 years in digital technologies, I enjoy designing and implementing innovative solutions to real life challenges