Consideration: Scaling Out Azure Functions With Event Hubs Effectively
Expected reader and outcome from this article
- Expected reader: Software Engineer/Cloud Solution Architect
- Outcome: Understand what to consider when scaling out Azure Functions and Event Hubs effectively
Motivation
Let’s imagine you need to design and implement a smart factory solution. You currently have a small number of customers, but the solution must scale out to handle a huge number of messages (e.g. 1M messages/min or more) in the future, because the sales team is aggressively signing new customers. Your skip manager asks you to release this solution at the lowest possible cost.
In this case, serverless sounds sweet. You play around with Event Hubs and Azure Functions. It’s easy to start! But… your ver.0.1 solution doesn’t handle enough messages, and you don’t know why. You want to know 1) the scale-out logic working in the background and 2) how to set it up with Azure resources.
These 2 points are popular questions that I’ve been asked :) Let’s go into detail with a step-by-step story based on my experience.
*If you are familiar with Azure, you may want to choose IoT Hub. But in the real world, I sometimes need to use an existing messaging hub and Event Hubs, so I wrote this blog based on Event Hubs.
Prerequisites
You don’t need deep experience with Event Hubs and Azure Functions, but I recommend having done the following.
- Create Event Hub
- Send and receive messages with Event Hub
- Create Azure Functions project and run it in Azure
- Use Event Hub trigger in Azure Functions
Ver.0.1 Tutorial-based solution that doesn’t scale out well
You may have created the following resources based on the tutorial documents.
- 1 Event Hubs namespace with 1 TU. It has an event hub with 2 partitions
- 1 Function App which has 1 function
It doesn’t scale out well and doesn’t handle enough messages. WHY? Let’s dive deep into the scale-out logic working in the background.
A Function App scales out based on the number of partitions. In this case, your Function App can grow to more than 2 instances, but only 2 instances will pull messages from the event hub partitions, because each partition is read by only one instance at a time (per consumer group). You realize that you need to increase the number of event hub partitions.
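The partition limit can be written down as a tiny model. This is only a sketch under a simplifying assumption (one function reading one event hub through one consumer group), and the function names are made up for illustration. It is not the scale controller’s actual logic.

```python
def active_consumers(instance_count: int, partition_count: int) -> int:
    """Instances that actually pull messages: at most one per partition."""
    return min(instance_count, partition_count)


def idle_instances(instance_count: int, partition_count: int) -> int:
    """Instances that are running but own no partition."""
    return max(instance_count - partition_count, 0)


# Ver.0.1: 2 partitions. Even with 5 instances, only 2 of them read messages.
print(active_consumers(5, 2), idle_instances(5, 2))

# With 8 partitions, all 5 instances can read in parallel.
print(active_consumers(5, 8), idle_instances(5, 8))
```

In this model, adding instances beyond the partition count adds cost but no throughput, which is why the partition count is the first thing to check.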
Memo:
- This article describes Azure Functions and Event Hub scaling well.
- In the background, Azure Functions uses the scale controller to monitor the rate of messages and determine whether to scale out. You can read the scaling logic on GitHub. For example, the EventHubsScaleMonitor class handles Event Hubs.
Ver.0.3 Observe scaling
Based on the document, you decide to increase the number of event hub partitions. But how do you observe whether scale-out actually happens?
If you want the easiest way, I recommend Application Insights Live Metrics Stream. When you create a Function App, Azure creates an Application Insights resource by default. You don’t need any additional setup ;)
Your current environment is below.
- 1 Event Hubs namespace with 1 TU. It has an event hub with 8 partitions
- 1 Function App which has 1 function
You can observe the number of instances (8 servers online) and their CPU/RAM usage! Great progress. You may want to watch this 9-minute demo video for details.
Ver.0.5 Separate Function Apps to use instance resources well
Your team is working hard to implement business logic. The Azure Functions project has 10 functions now!
- 1 Event Hubs namespace with 1 TU. It has an event hub with 8 partitions
- 1 Function App which has 10 functions
Via Application Insights, you notice that each Azure Functions instance is running short of resources (e.g. close to 100% CPU usage)! You thought each function was automatically assigned its own resources, but as you add more functions to the project, the resource shortage gets worse and worse. Why does this happen, and how do you resolve it?
Let’s focus on the Why as the 1st step. Let’s assume you use a Function App on the Consumption plan. According to this document, you find the following.
- Azure Functions infrastructure scales CPU and memory resources by adding additional instances of the Functions host.
- Each instance of the Functions host in the Consumption plan is limited to 1.5 GB of memory and one CPU
- An instance of the host is the entire function app, meaning all functions within a function app share resource within an instance and scale at the same time
- Function apps that share the same Consumption plan are scaled independently
Got it. A Function App scales out by adding instances, but all functions in the project share the limited resources of each instance.
As the next step, you start to consider How to resolve this challenge. At the very least, you need to create an additional Function App. You have 2 options.
- Keep the project structure and deploy the same project to each Function App. Then disable some functions via the Function App’s application settings. With this setup, for example, Function App A runs functions 1-5 and Function App B runs functions 6-10.
- Split the project along meaningful boundaries and deploy each project to its own Function App.
It totally depends on your team’s decision, but I recommend the 1st one because 1) you can set a different batch size and other parameters in host.json for each Function App for further optimization and 2) you don’t need to create a separate CI pipeline.
- 1 Event Hubs namespace with 1 TU. It has an event hub with 8 partitions
- 2 Function Apps. Each Function App runs 5 functions
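For the 1st option (same project deployed to two Function Apps), a function can be disabled with an application setting named AzureWebJobs.<FunctionName>.Disabled set to "true". The sketch below only builds those settings as dictionaries. The function names Function1 to Function10 and the helper name are hypothetical, and applying the settings to each Function App (portal, CLI, or IaC) is left out.

```python
def disabled_settings(all_functions, enabled):
    """Build app settings that disable every function not listed in `enabled`."""
    unknown = set(enabled) - set(all_functions)
    if unknown:
        raise ValueError(f"unknown function names: {sorted(unknown)}")
    return {
        f"AzureWebJobs.{name}.Disabled": "true"
        for name in all_functions
        if name not in enabled
    }


# Hypothetical function names, for illustration only.
all_functions = [f"Function{i}" for i in range(1, 11)]

# Function App A runs functions 1-5, Function App B runs functions 6-10.
app_a_settings = disabled_settings(all_functions, all_functions[:5])
app_b_settings = disabled_settings(all_functions, all_functions[5:])

for key in sorted(app_a_settings):
    print("App A:", key, "=", app_a_settings[key])
```

Because both apps deploy the same package, the split lives entirely in configuration, so moving a function from one app to the other is a settings change rather than a code change.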
Ver.0.7 Scale up the Event Hubs namespace
Your team starts load testing the system. Event producers (IoT sensors, etc.) send over 2,000 messages/sec. A team member notices that the event producers sometimes have to wait before sending new messages to the event hub. WHY?
This is because you haven’t assigned enough resources to the Event Hubs namespace. An Event Hubs namespace corresponds to a cluster in the Kafka world; it is the physical boundary. You can scale it up by assigning more throughput units (TU). A single TU lets you handle:
- Ingress: Up to 1 MB per second or 1000 events per second (whichever comes first).
- Egress: Up to 2 MB per second or 4096 events per second.
In this case, you need at least 2 TU. In a production environment, if your solution’s workload is steady and the load keeps growing (e.g. month by month), you may want to use Auto-inflate to scale up automatically. This is a scale-up-only feature, so you need to scale down yourself after your solution receives a spike.
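The ingress limits above translate into a quick back-of-the-envelope estimate. This is a sketch, not an official sizing formula: the average event size is an assumed input, 1 MB is treated as 1,000,000 bytes, and only ingress is considered.

```python
INGRESS_BYTES_PER_TU = 1_000_000  # "1 MB per second", assuming 1 MB = 10^6 bytes
INGRESS_EVENTS_PER_TU = 1000      # "1000 events per second"


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def required_throughput_units(events_per_sec: int, avg_event_bytes: int) -> int:
    """Minimum TUs for ingress: whichever limit is hit first decides."""
    by_count = ceil_div(events_per_sec, INGRESS_EVENTS_PER_TU)
    by_bytes = ceil_div(events_per_sec * avg_event_bytes, INGRESS_BYTES_PER_TU)
    return max(1, by_count, by_bytes)


# 2,000 small events/sec (200 bytes each, an assumed size): the event count
# is the bottleneck, so 2 TU are needed.
print(required_throughput_units(2000, 200))

# Anything beyond 2,000 events/sec already needs a 3rd TU.
print(required_throughput_units(2500, 200))

# Fewer but larger events: the byte limit becomes the bottleneck.
print(required_throughput_units(500, 4000))
```

Running the same estimate against your expected peak load gives a starting point for the Auto-inflate maximum.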
In addition, you want to detect this kind of issue immediately in production. For this, I recommend watching the Throttled Requests metric in the Azure portal. It shows the number of requests that were throttled because the throughput unit usage was exceeded, so you can easily tell whether you have enough TUs. You may want to set an alert on this metric to detect the issue immediately.
Ver.1.0 Production and next actions
Based on the load test results, your team confirms the system is working fine. The current system is
- 1 Event Hubs namespace with 2 TU and Auto-inflate enabled to scale up automatically. It has an event hub with 8 partitions
- 2 Function Apps. Each Function App runs 5 functions
This system works fine, but as your business grows, you may receive more messages than Event Hubs can handle with 20 TU. In this case, you can consider
- Asking the support team to scale up beyond 20 TU
- Adding more Event Hubs namespaces and distributing your event hubs across them
- Using Event Hubs Dedicated
I like the 2nd approach because the Event Hubs team recommends to “balance 1:1 throughput units and partitions to achieve optimal scale. A single partition has a guaranteed ingress and egress of up to one throughput unit.” If you separate Event Hubs namespaces, you can easily achieve this balance for the best performance.
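As a rough illustration of the 2nd approach, the sketch below splits a required TU count evenly across namespaces and keeps partitions at 1:1 with TUs. It rests on several assumptions: the 20 TU per-namespace limit quoted in this article (check your own quota), exactly one event hub per namespace, and a hypothetical helper name.

```python
MAX_TU_PER_NAMESPACE = 20  # limit quoted in this article; verify for your subscription


def plan_namespaces(required_tu: int, max_tu: int = MAX_TU_PER_NAMESPACE):
    """Spread TUs evenly over the fewest namespaces, with partitions 1:1 to TUs."""
    if required_tu < 1:
        raise ValueError("required_tu must be at least 1")
    count = -(-required_tu // max_tu)  # ceiling division
    base, extra = divmod(required_tu, count)
    sizes = [base + 1] * extra + [base] * (count - extra)
    # Assumes one event hub per namespace, so partitions can mirror the TUs.
    return [{"throughput_units": tu, "partitions": tu} for tu in sizes]


# 45 TU no longer fits into one namespace, so it is split into 3 x 15.
for i, ns in enumerate(plan_namespaces(45), start=1):
    print(f"namespace {i}: {ns}")
```

Producers then need a rule for choosing a namespace (for example, by customer or by factory), which this sketch does not cover.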
Thank you for reading this long story. If you want to go deeper, I recommend reading the following documents. Have a good hack and business ;)
- Azure Functions and Event Hubs: Optimising for Throughput
- Eventhub triggered Azure function: Replays and Retries
- Event processor host (this runs under the hood in Azure Functions)
Next blog post
I published a new blog post describing how we can separate Azure services (e.g. Function App, Storage Account, etc.) for better performance with IaC. Please read it if you want to prepare great infrastructure.