How Do Serverless Azure IoT Solutions Scale?

Christian Eder
The Startup
Published in
6 min read · Aug 28, 2019

In my last article, I wrote about how to create a simple load test for your IoT solution using Pulumi and Azure Functions. Now, I will dig deeper into the scaling behavior of a simple serverless IoT solution based on the Azure IoT Hub and Azure Functions.

The tested IoT solution is composed of 4 major components:

  • The Azure IoT Hub is the first component to receive telemetry messages from IoT devices
  • An Azure Function reads the incoming message stream from the IoT Hub, transforms the data and stores it
  • An Azure Storage Table as the long-term persistence layer for telemetry messages
  • A device simulation that creates load in order to verify how the rest of the system behaves under high load

Setting up the device simulation

Since we only want to experiment with the scaling behavior of the solution’s components and aren’t trying to implement a realistic use case from a data model & business logic perspective, we are using the Device Simulation IoT solution accelerator provided by Microsoft. While this allows us to simulate relatively high load without writing any code, the approach might not be flexible enough to implement load tests in real-world scenarios.
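
For a custom load test, the core of such a simulator is just a payload generator plus a send loop. A minimal sketch of the payload-generation part (the message shape and all names are illustrative, not taken from the accelerator):

```typescript
// Hypothetical telemetry message shape for a custom device simulator.
interface TelemetryMessage {
  deviceId: string;
  temperature: number;
  timestamp: string;
}

// Generate one message per simulated device; in a real simulator this
// would run on a timer (e.g. every 10 seconds per device).
function generateTelemetryBatch(deviceCount: number): TelemetryMessage[] {
  const now = new Date().toISOString();
  return Array.from({ length: deviceCount }, (_, i) => ({
    deviceId: `simulated-device-${i}`,
    temperature: 20 + Math.random() * 10, // arbitrary sensor reading
    timestamp: now,
  }));
}
```

The actual sending would then go through the Azure IoT device SDK, which is omitted here.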

Setting up the IoT solution

We will be using the following Pulumi program to set up the basic IoT solution in Azure:
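
A minimal sketch of such a program, assuming the @pulumi/azure (classic) provider; resource names, the location, and some arguments are illustrative, and the Function wiring is omitted:

```typescript
import * as azure from "@pulumi/azure";

const resourceGroup = new azure.core.ResourceGroup("iot-loadtest", {
    location: "WestEurope",
});

// IoT Hub that receives the device telemetry; its built-in events
// endpoint uses 4 partitions unless configured otherwise.
const iotHub = new azure.iot.IoTHub("telemetry-hub", {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location,
    sku: { name: "S1", capacity: 1 },
});

// Storage account + table as the long-term persistence layer.
const storageAccount = new azure.storage.Account("telemetry", {
    resourceGroupName: resourceGroup.name,
    location: resourceGroup.location,
    accountTier: "Standard",
    accountReplicationType: "LRS",
});

const telemetryTable = new azure.storage.Table("telemetrytable", {
    storageAccountName: storageAccount.name,
});

// An Azure Function triggered by the IoT Hub's built-in Event Hub
// compatible endpoint would be declared here as well.
```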

Performing load tests to evaluate scaling behavior

Our load test will consist of 4 different scenarios:

  • Single message processing — the Azure Function will process only 1 message at a time
  • Batch message processing — we’ll see how much more performance we can get by allowing the Azure Function to process batches of messages
  • Adding a downstream service — we’ll measure the impact of actually storing the received messages into a database from our Azure Function
  • Adding more partitions to improve scale out — we’ll measure how much performance gain can be expected by adding more partitions to the Azure IoT Hub

Single message processing

In order to establish a baseline for the following scenarios, we’ll run a load test against the IoT solution as shown in the code snippet above. The Azure Function contains no actual business logic, but gets triggered for every message arriving at the IoT Hub, which is enough for Application Insights to capture performance metrics.

For this scenario, we’ll set up a load test that simulates 10.000 devices, each sending a message every 10 seconds, over a period of 10 minutes:
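
As a quick sanity check, the expected load for this configuration works out as follows (a sketch of the arithmetic, using the numbers from the text):

```typescript
// Expected load for the single-message scenario.
const devices = 10_000;
const messageIntervalSeconds = 10;
const testDurationMinutes = 10;

// Each device sends 6 messages per minute.
const messagesPerMinute = devices * (60 / messageIntervalSeconds); // 60.000 per minute
const totalMessages = messagesPerMinute * testDurationMinutes;     // 600.000 in total
```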

This chart, which is rendered using the metrics provided by Azure IoT Hub, shows that the load test needs about 2 minutes to reach full load, and then continues to send the expected 60.000 messages per minute:

The metrics gathered by Application Insights for the Azure Function processing those messages show that this approach cannot handle the load appropriately:

Even after a few minutes during which the Azure Function scales up to handle the load, it only handles ~20.000 messages per minute, and takes about 40 minutes to process all the messages sent during the 10-minute load test.

The picture also shows that the Azure Function starts processing on a single host, then scales out to 2 hosts and later to 4 hosts, which was to be expected, since the Azure IoT Hub uses 4 partitions by default.

Batch message processing

We’ll now change the code presented above so that the Azure Function processes more than a single message per call. We’ll also add some code to track a custom metric in Application Insights, allowing us to monitor how many messages have been processed in total.
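
A sketch of what the batched Function body might look like, assuming the Node.js Functions runtime with "cardinality": "many" in function.json so the trigger delivers an array of messages. The telemetry client is injected here so it can be stubbed; the real Function would pass in the Application Insights client:

```typescript
// Minimal interface matching the Application Insights trackMetric call
// we rely on; injected so tests can substitute a stub.
interface TelemetryClient {
  trackMetric(metric: { name: string; value: number }): void;
}

// Core of a batch-processing Function: receives the whole batch at once
// and reports the batch size as a custom metric.
function processBatch(messages: unknown[], telemetry: TelemetryClient): number {
  // ...transform/store each message here (added in the next scenario)...
  telemetry.trackMetric({ name: "MessagesProcessed", value: messages.length });
  return messages.length;
}
```

The batch size itself is governed by the Event Hubs trigger settings in host.json (e.g. maxBatchSize), which is where a figure like 10 messages per call would come from.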

Since using batch processing (as we will see) enables us to process more than the 60.000 messages per minute sent in the first scenario, we up our game and simulate twice as many devices: 20.000 devices sending 120.000 messages per minute in total. We’ll work with this load for the rest of the article:

Looking at how many times our Function gets called, we can see that each Azure Function host is now handling fewer Function calls per minute than before (only ~2.000 instead of the ~5.000 we had before):

But on the other hand, each Function call now processes 10 messages, after a short ramp-up time of about 40 seconds:

In total, the batch processing enables us to process ~80.000 messages per minute:

Adding a downstream service

Until now, our Azure Function isn’t really doing anything except logging performance metrics. Let’s add an Azure Storage Table as a database to store the telemetry data in:
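
A sketch of the storage side, assuming we partition by device id (spreading writes across partitions) and key rows by timestamp; the entity shape and names are illustrative:

```typescript
// Shape of an Azure Table Storage entity for one telemetry message.
interface TableEntity {
  partitionKey: string;
  rowKey: string;
  temperature: number;
}

// Map a telemetry message to a table entity: partitioned per device,
// unique per device and point in time.
function toTableEntity(msg: { deviceId: string; temperature: number; timestamp: string }): TableEntity {
  return {
    partitionKey: msg.deviceId,
    rowKey: msg.timestamp,
    temperature: msg.temperature,
  };
}

// With the @azure/data-tables client, the entity would then be written
// roughly like this inside the Function:
//   await tableClient.createEntity(toTableEntity(msg));
```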

This brings our performance down from the previous 2.000 Function calls per minute per host to ~600. Since every call still processes 10 messages, we end up processing ~25.000 messages per minute in total.

Adding more partitions to improve scale out

While being able to process 25.000 messages per minute with such a simple setup seems rather impressive, our performance is obviously restricted by the number of hosts the Azure Function scales out to. And this number is tied to the number of partitions our IoT Hub is using. Let’s bump that number up from the default of 4 to 16 partitions and see what happens. (Unfortunately, the number of partitions cannot be changed after the IoT Hub has been created, so I had to recreate the IoT Hub with more partitions.)

First of all, we see that the Azure Function indeed scales out to 16 hosts. But each host only handles ~350 requests per minute instead of the 600 we had before. Still, we end up with an improved throughput of ~60.000 messages per minute being processed:
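
These figures line up with a rough cross-check of hosts times calls per minute times messages per call:

```typescript
// Rough throughput cross-check using the measured figures from the text.
const messagesPerCall = 10;

const with4Partitions = 4 * 600 * messagesPerCall;   // 24.000, close to the observed ~25.000/min
const with16Partitions = 16 * 350 * messagesPerCall; // 56.000, close to the observed ~60.000/min
```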

Conclusion

With only 2 simple changes we can improve the performance drastically:

  • Using batch processing of messages
  • Optimizing the number of partitions used in the IoT Hub

On the other hand, we also learned that

  • Adding downstream services impairs performance massively — but without them, our solution isn’t really doing much 😜
  • While the scale-out of Azure Functions is linear in the number of Azure IoT Hub partitions, the performance is not: we increased the partitions from 4 to 16, while the number of messages processed per minute “only” went from 25.000 to 60.000
  • The batch size does not improve performance linearly either: going from a size of 1 to 10 improves performance by a factor of 4. Nevertheless, additional improvements could be possible by experimenting with the batch size

And for those who seek to further optimize throughput, here is another great article on optimizing throughput with Azure Functions and Event Hubs (which are similar to the IoT Hub).
