Zone’s principal architect, Andy Butland, examines how the durable functions framework has matured over the past couple of years…
A couple of years ago I had the opportunity to speak about, write about and work with what was then a new serverless Azure technology: Durable Functions, built on top of the Azure Functions framework to handle long-running or multi-stage tasks. As part of investigating the technology I built a couple of sample applications using both durable functions and the standard functions framework by itself, and compared the results.
What’s obvious is that with the durable functions approach, much of the “plumbing code” is abstracted away from you as a developer, leaving you free to concentrate on the business logic of the workflow you are looking to implement.
You can see from a “before and after” component diagram that a lot of the necessary intermediate queue and storage components are no longer an upfront concern, and we can rely more on the framework itself for the tasks of marshalling data into the functions as parameters and out again as results.
You also get the pleasant benefit of being able to apply the single-responsibility principle to your functions, stringing together what can be a complex workflow from a series of individual components, each focused on its own specific task.
If you’re interested in reading more on this comparison between Azure functions and the newer durable functions approach, please see my previous article.
A new problem to solve
On a recent project, we had a problem with a file upload workflow. Via a back-office application, we allowed the user to export a set of data in Excel format, modify it, and re-upload the file to batch update their changes. While this worked, because our initial implementation was to do this within a single, synchronous request/response web application cycle, we had some issues with large file uploads. Behind the scenes, the web application was making a call to a REST API, whose processing updated the records one at a time into a database — and this was our bottleneck.
While we could potentially treat the symptoms here with increased timeouts, and perhaps internal batching of the updates, it was clear the main issue was the attempt to do this within one synchronous request/response cycle. A better approach, at least once we’ve detected that the uploaded file is larger than a particular size, would be to switch to asynchronous processing.
Instead we’d implement the web application functionality like this:
- User uploads the file.
- The file contents are validated and the records provided are counted.
- If the number is less than a cut-off value, proceed with the existing synchronous upload.
- If it’s greater though, we save the contents of the file to blob storage, send a trigger message to a queue and redirect the user to a “status pending” screen.
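The branching in the steps above can be sketched in a few lines. This is only an illustration in plain Python: the cut-off value, the `handle_upload` function and the in-memory stand-ins for blob storage and the queue are all assumptions made so the example runs without Azure, and treating a count exactly equal to the cut-off as "large" is my own choice.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical cut-off; the real value is not stated in this article.
RECORD_CUTOFF = 500


@dataclass
class InMemoryBackend:
    """Stand-in for blob storage and a queue so the sketch runs locally."""
    blobs: Dict[str, List[dict]] = field(default_factory=dict)
    queue: List[str] = field(default_factory=list)


@dataclass
class UploadOutcome:
    mode: str                  # "synchronous" or "asynchronous"
    process_id: Optional[str]  # only set on the asynchronous path


def handle_upload(records: List[dict], backend: InMemoryBackend,
                  cutoff: int = RECORD_CUTOFF) -> UploadOutcome:
    if len(records) < cutoff:
        # Small upload: keep the existing synchronous update path.
        return UploadOutcome(mode="synchronous", process_id=None)
    # Large upload: park the data, queue a trigger message and return at once,
    # so the caller can redirect to the "status pending" screen.
    process_id = str(uuid.uuid4())
    backend.blobs[f"uploads/{process_id}"] = records
    backend.queue.append(process_id)
    return UploadOutcome(mode="asynchronous", process_id=process_id)
```

The identifier placed on the queue is the same one used in the blob name, which is what lets the workflow find its data later.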
The workflow itself needs to:
- Be triggered by the queue message, from which it can read an identifier for the process.
- Load the data from blob storage based on that identifier.
- Divide the data into appropriately sized chunks and initiate a “fan-out” operation where several, parallel operations run to update each chunk of records.
- When all are complete, aggregate (“fan-in”) the results and write a summary of them to a file in blob storage.
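To make the fan-out/fan-in shape concrete, here is a minimal sketch using Python's `asyncio` rather than the durable functions framework itself. The `update_chunk` activity, the `invalid` flag on a record and the chunk size are hypothetical; the point is only the pattern of splitting, running in parallel and aggregating.

```python
import asyncio
from typing import Dict, List


def chunk(records: List[dict], size: int) -> List[List[dict]]:
    """Divide the records into consecutive chunks of at most `size` items."""
    return [records[i:i + size] for i in range(0, len(records), size)]


async def update_chunk(records: List[dict]) -> Dict[str, int]:
    """Hypothetical activity: pretend to update each record and count outcomes."""
    await asyncio.sleep(0)  # stands in for the real call to the update API
    failed = sum(1 for record in records if record.get("invalid"))
    return {"updated": len(records) - failed, "failed": failed}


async def run_workflow(records: List[dict], chunk_size: int = 100) -> Dict[str, int]:
    # Fan out: one task per chunk, all awaited concurrently.
    tasks = [update_chunk(part) for part in chunk(records, chunk_size)]
    results = await asyncio.gather(*tasks)
    # Fan in: aggregate the per-chunk results into a single summary.
    return {
        "updated": sum(result["updated"] for result in results),
        "failed": sum(result["failed"] for result in results),
        "chunks": len(results),
    }
```

Calling `asyncio.run(run_workflow(records))` returns the summary dictionary that, in the real workflow, would be written to blob storage.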
Finally back in the web application we have:
- The “status pending” screen doing a client-side poll to a web application API endpoint that checks for the presence of the results summary blob file.
- If it’s not found, we stay on the page, and poll again in a few seconds.
- If it is found, the summary results are read and the user is redirected to a confirmation screen.
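The polling logic itself is simple enough to sketch. In practice it would live in client-side script calling the web application endpoint; the version below is a plain Python illustration in which `fetch_summary`, the interval and the attempt limit are all assumptions. `fetch_summary` is expected to return `None` while the results blob does not yet exist.

```python
import time
from typing import Callable, Optional


def poll_for_summary(fetch_summary: Callable[[], Optional[dict]],
                     interval_seconds: float = 3.0,
                     max_attempts: int = 100,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[dict]:
    """Check repeatedly for the results summary, waiting between checks."""
    for attempt in range(max_attempts):
        summary = fetch_summary()
        if summary is not None:
            return summary  # found: the caller can redirect to confirmation
        if attempt < max_attempts - 1:
            sleep(interval_seconds)  # not found: stay put and try again
    return None  # gave up; the caller decides how to report this
```

Passing `sleep` in as a parameter keeps the function easy to test without real delays.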
These data flows are illustrated in the following diagram:
Given the solution was already hosted in Azure, we had access to the necessary queue and blob storage facilities, and for the asynchronous processing itself, we had a reason to turn again to durable functions. In the rest of this article I’ll share a few details and code samples of the aspects I found “new and improved” since my last chance to work with the technology.
The “leaky abstraction” is plugged
When working with an earlier release of durable functions, one issue I ran into was that although intermediate storage components are abstracted away, they of course still exist. At the time, only queues were used to pass data between functions, and queue messages have a 64 KB size limit. Hence if the data being passed around exceeded that size, you would run into problems. You could serialise and compress the data, but you would then lose the strongly typed nature of the messaging. And even then, you could go over the limit.
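For a sense of what that workaround involved, here is a rough sketch in plain Python. The `pack`, `unpack` and `fits_in_queue_message` helpers are hypothetical names, and the JSON plus zlib plus Base64 combination is just one plausible way to do it, not the approach used in my earlier samples.

```python
import base64
import json
import zlib
from typing import Any

QUEUE_MESSAGE_LIMIT_BYTES = 64 * 1024  # the 64 KB queue message limit


def pack(payload: Any) -> str:
    """Serialise to JSON, compress, and Base64-encode for use as a text message."""
    raw = json.dumps(payload).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def unpack(message: str) -> Any:
    """Reverse of pack; note the result is untyped dictionaries and lists."""
    return json.loads(zlib.decompress(base64.b64decode(message)))


def fits_in_queue_message(message: str) -> bool:
    return len(message.encode("utf-8")) <= QUEUE_MESSAGE_LIMIT_BYTES
```

The receiving function gets back generic dictionaries and lists rather than typed objects, and a large or poorly compressible payload can still fail the `fits_in_queue_message` check.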
What I found really nice this time round was that this abstraction of storage component really does hold, with the framework seamlessly using compression and switching from queues to other forms of storage as the data size requires. This means there’s no need for the serialisation and compression, and you can truly forget about the plumbing parts of the workflow.
The general flow of a durable orchestration workflow is:
- A trigger function (in our case initiated by a queue message).
- This calls an orchestration function, which is responsible for organising the workflow.
- Which it does by calling one or more activity functions that carry out the individual tasks.
In the code samples below, you can see an extract of the implementation, where an orchestration is triggered, passing a strongly typed parameter.
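As a language-neutral illustration of that three-part shape (not the project's actual code), here is a minimal sketch in plain Python. Every name in it is hypothetical, including the `UploadRequest` input type, the activity names and the `call_activity` helper, which simply stands in for the framework dispatching work to an activity function.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class UploadRequest:
    """Hypothetical strongly typed input passed from trigger to orchestration."""
    process_id: str
    chunk_size: int


# Activity functions: each carries out one individual task.
def load_records(process_id: str) -> List[int]:
    return list(range(10))  # stand-in for reading the uploaded blob


def update_records(records: List[int]) -> int:
    return len(records)  # stand-in for calling the update API


ACTIVITIES: Dict[str, Callable[[Any], Any]] = {
    "load_records": load_records,
    "update_records": update_records,
}


def call_activity(name: str, payload: Any) -> Any:
    """Stand-in for the framework invoking an activity by name."""
    return ACTIVITIES[name](payload)


# Orchestration function: organises the workflow, does no external work itself.
def orchestrate(request: UploadRequest) -> int:
    records = call_activity("load_records", request.process_id)
    size = request.chunk_size
    chunks = [records[i:i + size] for i in range(0, len(records), size)]
    return sum(call_activity("update_records", part) for part in chunks)


# Trigger function: reads the queue message and starts the orchestration.
def on_queue_message(message: str) -> int:
    request = UploadRequest(process_id=message, chunk_size=4)
    return orchestrate(request)
```

The typed `UploadRequest` reaches the orchestration intact, with no manual serialisation in between.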
Another feature now fully implemented with Azure functions is the ability to use the dependency injection pattern. This is something we use extensively with .NET when working in other contexts such as MVC web applications or APIs, but previously this wasn’t easily achieved with Azure functions. They were essentially static classes, and hence typical means of constructor injection of dependencies weren’t available.
To get around this earlier, I’d use a kind of “poor man’s DI”, where you instantiate components in the function and then pass them into a separate class via its constructor, where the majority of the work happened. This was useful to support testing to an extent, as you could validate the class behaviour using mocked dependencies, but it wasn’t particularly elegant.
As of Azure Functions 2.0 though, the dependency injection pattern is now fully supported.
The registration of dependencies is carried out in a special class decorated such that it will run on startup. Here you can see we’re registering a named HttpClient as well as a service for accessing blob storage, using environment variables for configuration.
Then in the activity functions themselves, we can use constructor injection to get a concrete instance at run-time. First for the HttpClient:
And then for the blob service:
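The pattern of registering services once at startup and then receiving them through a constructor can be sketched in plain Python. This is not the Azure Functions startup API: the `ServiceRegistry` container, the `BlobService` wrapper, the `WriteSummaryActivity` class and the `BLOB_CONNECTION_STRING` variable name are all hypothetical.

```python
import os
from typing import Any, Callable, Dict, Mapping


class BlobService:
    """Hypothetical blob storage wrapper, backed by a dictionary for this sketch."""

    def __init__(self, connection_string: str) -> None:
        self.connection_string = connection_string
        self._blobs: Dict[str, str] = {}

    def write(self, name: str, content: str) -> None:
        self._blobs[name] = content

    def read(self, name: str) -> str:
        return self._blobs[name]


class ServiceRegistry:
    """Tiny container: factories are registered once, instances created lazily."""

    def __init__(self) -> None:
        self._factories: Dict[type, Callable[[], Any]] = {}
        self._instances: Dict[type, Any] = {}

    def register(self, service_type: type, factory: Callable[[], Any]) -> None:
        self._factories[service_type] = factory

    def resolve(self, service_type: type) -> Any:
        if service_type not in self._instances:
            self._instances[service_type] = self._factories[service_type]()
        return self._instances[service_type]


def configure_services(env: Mapping[str, str] = os.environ) -> ServiceRegistry:
    """Runs once at startup; configuration comes from environment variables."""
    registry = ServiceRegistry()
    registry.register(BlobService,
                      lambda: BlobService(env["BLOB_CONNECTION_STRING"]))
    return registry


class WriteSummaryActivity:
    """Activity that receives its dependency through the constructor."""

    def __init__(self, blob_service: BlobService) -> None:
        self._blob_service = blob_service

    def run(self, process_id: str, summary: str) -> None:
        self._blob_service.write(f"results/{process_id}", summary)
```

Because the activity only knows about the object it was handed, a test can pass in a fake `BlobService` without touching the registration code.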
Another pleasant discovery when looking at durable functions in 2020 is that they are now fully unit testable. Previously, as noted, I’d look to extract the code under test into a separate class, instantiated from within the function at runtime, and test that. That covered most things, but now we can actually test each of the three types of function in a durable functions workflow directly.
Firstly the trigger function. This one usually isn’t doing much, but we can at least verify that we’re calling the appropriate orchestration and providing the correct input data:
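The idea translates to any mocking library. As a sketch using Python's `unittest.mock`, with a hypothetical `start_upload_workflow` trigger, a hypothetical `start_new` method on the client and an assumed orchestration name of `ProcessUpload`:

```python
from unittest.mock import Mock


def start_upload_workflow(message: str, client) -> str:
    """Hypothetical trigger: starts the orchestration and returns its instance ID."""
    return client.start_new("ProcessUpload", {"process_id": message})


def test_trigger_starts_expected_orchestration() -> None:
    # The orchestration client is mocked, so nothing is really started.
    client = Mock()
    client.start_new.return_value = "instance-1"

    instance_id = start_upload_workflow("abc-123", client)

    # Verify both the orchestration name and the input data passed to it.
    client.start_new.assert_called_once_with(
        "ProcessUpload", {"process_id": "abc-123"})
    assert instance_id == "instance-1"
```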
Then the orchestration. The responsibility of the orchestration function in a workflow is not to do any work with external services itself (in fact there are code constraints you must adhere to in order to ensure it doesn’t); rather, it’s concerned with calling the appropriate activity functions, passing the necessary input data and collating the results. We can write a test ensuring that it’s doing this, such as in this simplified example:
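A sketch of such a test in Python with `unittest.mock` might look like the following. The orchestration here is a simplified, synchronous stand-in, and the context methods (`get_input`, `call_activity`) and activity names are assumptions made for the example.

```python
from unittest.mock import Mock, call


def orchestrate_upload(context) -> dict:
    """Hypothetical orchestration: only calls activities and collates results."""
    request = context.get_input()
    records = context.call_activity("LoadRecords", request["process_id"])
    size = request["chunk_size"]
    results = [context.call_activity("UpdateRecords", records[i:i + size])
               for i in range(0, len(records), size)]
    summary = {"updated": sum(results)}
    context.call_activity("WriteSummary", summary)
    return summary


def test_orchestration_calls_activities_in_order() -> None:
    def fake_activity(name, payload):
        # Canned activity results, so no real work is done.
        if name == "LoadRecords":
            return [1, 2, 3]
        if name == "UpdateRecords":
            return len(payload)
        return None

    context = Mock()
    context.get_input.return_value = {"process_id": "abc", "chunk_size": 2}
    context.call_activity.side_effect = fake_activity

    summary = orchestrate_upload(context)

    assert summary == {"updated": 3}
    assert context.call_activity.call_args_list == [
        call("LoadRecords", "abc"),
        call("UpdateRecords", [1, 2]),
        call("UpdateRecords", [3]),
        call("WriteSummary", {"updated": 3}),
    ]
```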
Finally the activity functions themselves. In a similar way we can mock the durable functions context class, and verify that it operates as it should:
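Again as a Python sketch rather than the framework's own types, with a hypothetical `UpdateRecordsActivity`, a mocked context exposing `get_input`, and a mocked API client injected through the constructor:

```python
from unittest.mock import Mock


class UpdateRecordsActivity:
    """Hypothetical activity that updates each record via an injected client."""

    def __init__(self, api_client) -> None:
        self._api_client = api_client

    def run(self, context) -> int:
        records = context.get_input()
        for record in records:
            self._api_client.update(record)
        return len(records)


def test_activity_updates_every_record() -> None:
    api_client = Mock()
    context = Mock()
    context.get_input.return_value = [{"id": 1}, {"id": 2}]

    updated = UpdateRecordsActivity(api_client).run(context)

    assert updated == 2
    assert api_client.update.call_count == 2
    api_client.update.assert_any_call({"id": 2})
```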
A feature of using queue-triggered single Azure functions is the retry behaviour available, whereby if the function throws an exception, the message will be returned to the queue and picked up again for re-processing, after a configurable delay and up to a maximum number of attempts. We lose that facility with durable functions, as the trigger function that begins the orchestration completes once it successfully exits, and the message is removed. An exception within the orchestration or activity functions will be too late to stop that happening.
We do however have some control over retry logic within the durable functions workflow itself.
Firstly, we can call activity functions with retry behaviour, using an object indicating how many times we’d like to retry, and with what delay:
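The behaviour this gives you is roughly the following, sketched here in plain Python. `RetryPolicy` and `call_with_retry` are hypothetical stand-ins for the framework's retry options and its retrying activity call, with a fixed delay between attempts assumed for simplicity.

```python
import time
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical stand-in for the framework's retry options object."""
    max_attempts: int
    delay_seconds: float


def call_with_retry(activity: Callable[[], T], policy: RetryPolicy,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Run the activity, retrying after a delay until attempts are exhausted."""
    for attempt in range(1, policy.max_attempts + 1):
        try:
            return activity()
        except Exception:
            if attempt == policy.max_attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(policy.delay_seconds)
    raise ValueError("max_attempts must be at least 1")
```

Once the attempts are used up, the final exception propagates to the caller, which is where the orchestration gets its chance to react.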
We can also use try/catch logic, with exceptions bubbling up to the orchestration function where, if appropriate, the FunctionFailedException can be caught and compensatory actions taken.
To restore the behaviour of a single Azure function, where the message is put back on the queue for processing again, it would be necessary to take specific action to put a new copy of the message on the queue. If the message keeps failing though, you wouldn’t want to keep processing it indefinitely; normally it would be moved after a few attempts to a “poison queue” for further, often manual, handling. As far as I can see though, the dequeue count of a message can’t be set, only read, hence some custom means of tracking this via message content or a header would be needed.
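One way that custom tracking could work is to carry an attempt counter in the message body. The sketch below is plain Python with lists standing in for the main and poison queues; the `attempt` field, the limit of three and the `handle_message` function are all assumptions, and it is something I have only outlined here rather than put into production.

```python
import json
from typing import Callable, List

MAX_ATTEMPTS = 3  # hypothetical limit before a message is treated as poison


def handle_message(raw: str, process: Callable[[str], None],
                   queue: List[str], poison_queue: List[str],
                   max_attempts: int = MAX_ATTEMPTS) -> None:
    """Process a message; on failure, re-queue a copy or move it to poison."""
    message = json.loads(raw)
    attempt = message.get("attempt", 1)  # a first delivery has no counter yet
    try:
        process(message["process_id"])
    except Exception:
        if attempt >= max_attempts:
            # Too many failures: park it for further, often manual, handling.
            poison_queue.append(raw)
        else:
            # Put a new copy on the queue with the counter incremented.
            queue.append(json.dumps({**message, "attempt": attempt + 1}))
```

Because the counter travels with the message, it survives the message being removed and re-added, which the built-in dequeue count would not.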
Using durable functions has proved a valuable tool in solving this type of problem, where we want to break off chunks of functionality from a monolithic web application, and handle them asynchronously via background processes. It’s pleasing to see how well the framework has matured over the past couple of years and how we’re able to use the best practices that we’re used to in other areas when programming in this still relatively new “serverless” paradigm.