Scheduling can be a tricky business. I was working on a project that, among other things, required me to rearchitect a simple cron job that used to run every minute on a Linux box. The job was written in node.js, was small, stateless, without external dependencies and therefore was a good fit for Azure Functions. Or so I thought…
To begin with, let’s take this simple cron job that runs every minute, appends current time to /tmp/cron.out and sleeps for 2 minutes:
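A minimal Node.js sketch of a job like this. The function names, the `/path/to/job.js` location and the crontab line in the comment are illustrative assumptions, not the original script:

```javascript
// Hypothetical job.js. One possible crontab entry (path is a placeholder):
//   * * * * * node -e "require('/path/to/job.js').runJob()"
const fs = require('fs');

// Append the given time to the output file, one ISO timestamp per line.
function appendTimestamp(file, now = new Date()) {
  fs.appendFileSync(file, now.toISOString() + '\n');
}

// Resolve after `ms` milliseconds; the pending timer keeps the process alive.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// One iteration of the job: write the current time, then sleep for 2 minutes.
async function runJob(file = '/tmp/cron.out', sleepMs = 2 * 60 * 1000) {
  appendTimestamp(file);
  await sleep(sleepMs);
}

module.exports = { appendTimestamp, runJob };
```

Because cron starts a fresh process for every tick, nothing in this script stops two iterations from overlapping.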
By default, cron jobs aren’t singletons — a new instance will be kicked off even if the previous instance is still running. In this scenario, although each iteration takes at least two minutes to run, a new line will be written to /tmp/cron.out roughly every minute:
In fact, on Linux, if you need to ensure that only one instance of your job is running, that’s your responsibility, and there are a number of methods to choose from. But what about Azure Functions?
Timer Trigger in Azure Functions
You can think of the timer trigger in Azure Functions as “serverless cron”. The catch is that the trigger is built on the WebJobs SDK TimerTrigger, which ensures that only a single instance of your triggered function is running at any given time. Let’s test this out:
Create a TimerTrigger1 node.js Consumption Plan Azure Function that runs every second (I use the in-portal experience for simplicity):
… and observe it fire every second:
Now, replace the default code with the following snippet which simulates a Very Important Task that takes about 5 seconds to complete (the COMPUTERNAME environment variable will contain the unique name of the virtual host your function is executed on — I’m on a Consumption Plan):
As expected, the Singleton behaviour will prevent a new instance from kicking off every second due to a lock created in the background. Our code now fires every 5 seconds or so on the same virtual host RD0003FF02FEFE:
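To make the difference from plain cron concrete, here is a toy model of the two behaviours. It is only an illustration of the observed effect, not the WebJobs SDK's actual locking logic, and `startTimes` is a made-up helper:

```javascript
// Toy model: returns the seconds at which runs start, given a schedule tick
// every `intervalSec`, a task lasting `durationSec`, over `totalSec` seconds.
function startTimes(intervalSec, durationSec, totalSec, singleton) {
  const starts = [];
  let busyUntil = 0;
  for (let t = 0; t < totalSec; t += intervalSec) {
    // In singleton mode, a tick that lands while a run is active is skipped.
    if (singleton && t < busyUntil) continue;
    starts.push(t);
    busyUntil = t + durationSec;
  }
  return starts;
}

// Singleton (timer trigger): a run starts at seconds 0, 5 and 10.
console.log(startTimes(1, 5, 12, true));
// Non-singleton (plain cron): a run starts on every tick, seconds 0 to 11.
console.log(startTimes(1, 5, 12, false));
```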
Often, this behaviour is desired: it helps prevent race conditions and hard-to-debug issues caused by overlapping executions. But what if this is not what we want? What if we want it firing every second, regardless? This requirement is quite common, especially if you’re frequently polling an API that takes more than a second to return and you dedupe its results in a separate process. The most straightforward solution is to ditch the Azure Functions Timer Trigger and use a single-step Logic App. Let me demonstrate this approach.
Recurrence Trigger in Logic Apps
Some of you may have used the Azure Scheduler service; it is being deprecated and replaced with Azure Logic Apps.
First, disable the TimerTrigger1 function:
Then, create an HTTP-triggered function imaginatively called HTTPTrigger1:
Replace its default code with a slightly modified version of the timer-triggered function we’ve just disabled:
Then, create a blank Logic App, add a recurring trigger to fire every second, and an action — you guessed correctly — our HTTPTrigger1 function:
Save and observe how your Very Important Task is now being fired every second. Also observe that we’re seeing new virtual hostnames in the execution log, thanks to our Function scaling itself out to 5 instances, give or take (you can always use Application Insights for a more accurate analysis):
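A back-of-the-envelope check on that number: with one new request per second and each request taking about 5 seconds, roughly 5 requests are in flight at any moment. How many hosts the platform actually allocates is its own decision and need not match this figure exactly, so treat the helper below (a made-up name) as an estimate of overlapping runs only:

```javascript
// Average number of overlapping runs = run duration / interval between starts.
function expectedConcurrency(intervalSec, durationSec) {
  return durationSec / intervalSec;
}

console.log(expectedConcurrency(1, 5)); // 5
```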
Now, it’s all fun if you know exactly how long your task will run for (5 seconds in this case), but what if it’s unpredictable and you would like to limit the number of Very Important Tasks running in parallel? It makes sense to always do so if you’re dealing with an external endpoint and you don’t want to DDoS it or be rate-limited.
This is another interesting challenge. I initially tried enforcing it on the Azure Function side via a couple of methods:
- The WEBSITE_MAX_DYNAMIC_APPLICATION_SCALE_OUT app setting, which caps how many instances the app scales out to and so limits the maximum degree of parallelism. It’s fairly unreliable and is not recommended for values greater than 5.
- The Azure Functions HTTP Trigger maxConcurrentRequests setting in host.json: same situation, and I would like to have something more bulletproof.
The most bulletproof solution I found was hidden in the Logic App’s Recurrence trigger settings. All that’s required is to flick the concurrency control toggle and choose the desired degree of parallelism, and voilà!
Another way to schedule activities is to use a Durable Azure Function with an Eternal Orchestrator. I will cover it in a separate post.
Thanks for reading and until next time!