Filtering Azure Event Hubs events with low latency

Konstantin Shilovskiy
Published in DataReply · 6 min read · Dec 6, 2021

Azure Event Hubs lets you ingest millions of events per second and process them in real time. It implements many of the core concepts found in Apache Kafka (e.g. a distributed log, means for stream processing, connectors): an Event Hubs 'namespace' corresponds to a Kafka cluster, and an 'Event Hub' is the equivalent of a Kafka topic. Moreover, it provides several connectors and is well integrated with other Azure services such as Blob Storage or Azure Data Explorer.

Thus, storing events with Event Hubs Capture or building analytical apps with Azure Stream Analytics is relatively simple. However, filtering events with sub-second latencies is still a challenging task.

In this article, I would like to present different architecture approaches you can use to filter events with low latency. First, let’s start with the problem definition.

Problem

For one of the projects, my team had to visualize events coming from an external service. The service was sending telemetry data from ~2000 sensors. However, the data was not received from sensors directly. Instead, it was collected, enriched, and streamed to Azure Event Hubs using Avro serialization format.

We were tasked to make the data available for visualization with the lowest latency possible. Visualizations were eventually rendered by a 3rd party desktop app. Analysts were using their instance of the app to examine the data coming from 10–20 sensors simultaneously. Moreover, they could switch between different sets of sensors.

The main challenge for us turned out to be the filtering of raw events. Azure does not have a ready-to-use solution that can filter Event Hubs events and propagate the results to clients. So, we had to build a service that could process events with the following characteristics:
partition keys (sensors): 2000
partitions: 4
messages: 3500/s
throughput: 7 Mbit/s (0.875 MB/s)
keys to filter (sensors to visualize): 20

The sections below summarize some of the ideas we considered during the design phase of the project.

Idea #1: Consume the raw stream

Using this approach, client applications consume messages from the Event Hub directly.

Consuming from the Event Hub directly

Event Hubs supports Shared Access Signature (SAS) authentication. Thus, each client can receive its own 'connection string' which allows reading directly from the stream. However, this approach has some issues.

Firstly, there is no filtering functionality in Event Hubs. While it is possible to set up a filter with Azure Stream Analytics, Stream Analytics doesn’t support Avro. This means clients need to get all the data and filter it on their side.
Secondly, the Event Hub’s outgoing traffic is multiplied by the number of clients. And so do the costs.
Thirdly, clients need to make sure that their network is capable of handling the throughput.
Finally, each client needs a dedicated Event Hub consumer group to be independent of other clients. The current limit is 1000 consumer groups per Event Hub in the most expensive Dedicated tier.
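
For illustration, a minimal sketch of such a direct consumer with the azure-eventhub Python SDK might look like the snippet below. The connection string, Event Hub name, the `sensor_id` field, and the `deserialize` helper are placeholders for this article, not details of our project. Note that the client still has to receive every event and discard the ones it does not need.

```python
from azure.eventhub import EventHubConsumerClient

# Hypothetical values: each client would get its own SAS connection string.
CONNECTION_STR = "<per-client SAS connection string>"
WANTED_SENSORS = {"sensor-01", "sensor-02"}  # the 10-20 sensors to visualize


def deserialize(event):
    # Placeholder for the real Avro decoding; here we pretend the payload
    # is JSON so the sketch stays short.
    return event.body_as_json()


def on_event(partition_context, event):
    record = deserialize(event)
    if record.get("sensor_id") in WANTED_SENSORS:
        print(record)  # placeholder: hand the record to the visualization


client = EventHubConsumerClient.from_connection_string(
    CONNECTION_STR,
    consumer_group="$Default",   # each client needs its own consumer group
    eventhub_name="telemetry",   # hypothetical Event Hub name
)

with client:
    # Reads from all partitions and filters locally -- every client still
    # downloads the full raw stream (~0.875 MB/s in our case).
    client.receive(on_event=on_event, starting_position="@latest")
```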

To overcome these issues, we need a single consumer that filters the raw stream and serves the results to clients. This led us to the next idea.

Idea #2: WebSockets

This idea is based on a custom application that utilizes an Event Hub consumer and supports WebSocket connections.
The client can call the WebSocket URL with the list of sensors it needs. The consumer can filter the sensor data according to the list and pass it to the WebSocket connection.

Using consumer and WebSockets in one app
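
As a rough sketch (not our production code), such a combined app could be built with FastAPI and the async Event Hub client: a background consumer task fans events out to whichever WebSocket clients asked for that sensor. The endpoint path, the `sensors` query parameter, and the `sensor_id` field are assumptions for illustration.

```python
import asyncio

from azure.eventhub.aio import EventHubConsumerClient
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()
# Each entry: (websocket, set of sensor ids the client asked for).
subscribers: set = set()

CONNECTION_STR = "<event hub connection string>"  # hypothetical


async def on_event(partition_context, event):
    record = event.body_as_json()  # placeholder for the real Avro decoding
    for ws, sensors in list(subscribers):
        if record.get("sensor_id") in sensors:
            await ws.send_json(record)


@app.on_event("startup")
async def start_consumer():
    consumer = EventHubConsumerClient.from_connection_string(
        CONNECTION_STR, consumer_group="$Default", eventhub_name="telemetry"
    )
    # Run the Event Hub consumer for the lifetime of the web app.
    asyncio.create_task(
        consumer.receive(on_event=on_event, starting_position="@latest")
    )


@app.websocket("/telemetry")
async def telemetry(ws: WebSocket, sensors: str):
    # Client connects to ws://host/telemetry?sensors=sensor-01,sensor-02
    await ws.accept()
    entry = (ws, frozenset(sensors.split(",")))
    subscribers.add(entry)
    try:
        while True:
            await ws.receive_text()  # keep the connection open
    except WebSocketDisconnect:
        pass
    finally:
        subscribers.discard(entry)
```

Note that `subscribers` lives inside a single process, which is exactly why scaling this design out gets tricky, as discussed next.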

This approach solves most of the problems from Idea #1. It also keeps latency low, as the data travels through the fewest possible components.
However, this architecture has serious scalability issues. If we increase the number of consumers, we need to implement complex routing logic to make sure clients can receive sensor data from different partitions.

The same problem occurs when we need high availability. For example, if we have two consumers running on different machines, each of them should have its own consumer group; otherwise, clients may receive duplicates or only part of the filtered results. Thus, if a WebSocket connection is interrupted, we need to ensure the client reconnects to the same app instance it was using before.

A potential solution would be to decouple consumer and WebSocket logic. That is why the in-memory database Redis was suggested as a middle layer between the two apps.

Idea #3: Redis

Redis has proven to be one of the best tools for the application caching layer. On top of caching, it provides Publish/Subscribe messaging functionality, which can be used to overcome issues found in approach #2.

Instead of connecting to the consumer directly, clients can use a dedicated WebSocket API. Redis can serve as a middleware, gluing the consumer and WebSocket applications.

The simplified workflow may look like this:
1. The consumer reads from the raw stream and 'demultiplexes' the telemetry stream into Redis channels based on the sensor id.
2. The client connects to the WebSocket service and passes the list of sensors it wants to listen to.
3. The service subscribes to Redis channels and propagates messages to the WebSocket.

Using Redis to glue consumer and WebSocket app
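
A minimal sketch of steps 1 and 3 with redis-py's async Pub/Sub could look like this; the `sensor:<id>` channel naming and the `sensor_id` field are assumptions for illustration, not our final design.

```python
import json

import redis.asyncio as redis

r = redis.Redis(host="localhost", port=6379)


# Step 1 -- consumer side: publish each decoded record to a per-sensor channel.
async def demultiplex(record: dict) -> None:
    await r.publish(f"sensor:{record['sensor_id']}", json.dumps(record))


# Step 3 -- WebSocket side: subscribe to the channels a client asked for and
# forward every message through the given send() coroutine (e.g. ws.send_text).
async def forward_to_client(sensor_ids: list, send) -> None:
    pubsub = r.pubsub()
    await pubsub.subscribe(*[f"sensor:{sid}" for sid in sensor_ids])
    async for message in pubsub.listen():
        if message["type"] == "message":
            await send(message["data"].decode())
```

Because any WebSocket instance can subscribe to any channel, the WebSocket tier and the consumer tier can scale independently.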

The major advantage of this approach is that the consumer and WebSocket APIs are decoupled. This allows us to scale the consumers without having to worry about routing logic for client requests (the problem we had with Idea #2).

The downside is that we need to manage two applications: the Event Hub consumer and the WebSocket API. Also, with two extra components, latency increases. Thus, we started looking for a service that could help us reduce the number of steps in the data flow.

Idea #4: MQTT Broker

MQTT is a lightweight network protocol designed specifically for device communication. With MQTT you can scale to millions of devices and provide a reliable and secure way of transferring data.
MQTT brokers connect IoT devices with the consumers of telemetry data. Their main features include authentication, device health monitoring, and last-message retention. MQTT Essentials, written by HiveMQ, provides a very good overview of the protocol and broker capabilities.

Azure offers a managed service with MQTT support — Azure IoT Hub. However, this service doesn’t really fit our use case.
Firstly, the sensors can’t send telemetry directly. Instead, the data is streamed to a single Event Hub and there is no need for authenticating each sensor separately.
Secondly, we don’t need to communicate back to the sensors. So, such IoT Hub features as bi-directional communication or device updates wouldn’t be used at all.
Lastly, we want to achieve the lowest latency possible. If a lag is observed in IoT Hub, it is difficult to trace it without the help of Azure developers.

For our use case, we determined that it would be best to run one of the open-source MQTT brokers ourselves. We consider Eclipse Mosquitto to be a good candidate to try first. Mosquitto is a lightweight implementation of MQTT. It has all the features we need and is actively maintained on GitHub. Even though hosting the broker comes with a maintenance cost, it is offset by removing the need to support the Redis and WebSocket apps.

Distributing events with MQTT broker
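
As a sketch of how the consumer side could hand events to the broker (using the paho-mqtt 1.x API and a per-sensor topic layout such as `telemetry/<sensor_id>`, both assumptions rather than final design decisions):

```python
import json

import paho.mqtt.client as mqtt  # paho-mqtt 1.x style API

mqttc = mqtt.Client()
mqttc.connect("mosquitto.internal", 1883)  # hypothetical broker host
mqttc.loop_start()  # run the network loop in a background thread


def publish_record(record: dict) -> None:
    # Called by the Event Hub consumer for every decoded event; each sensor
    # gets its own topic, so clients subscribe only to what they visualize.
    mqttc.publish(f"telemetry/{record['sensor_id']}", json.dumps(record), qos=0)
```

A desktop client would then subscribe directly to the broker (e.g. to `telemetry/sensor-42`), and the broker takes over the filtering, fan-out, and connection handling that Redis and the WebSocket app covered in Idea #3.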

Results

For the PoC, we implemented the WebSocket-based approach (Idea #2). It was straightforward to set up, and we managed to achieve sub-second latencies. On average, sensor data was available for visualization 150 ms after it had been sent to the Event Hub.

The production version is still under development. However, it needs to support multiple Event Hubs and many desktop clients, so we require a solution that scales well. That is why the MQTT version is coming soon. Stay tuned!
