Azure Durable Functions: before and after
Zone’s head of .NET development, Andy Butland, looks at the use of Azure Functions and Durable Functions to tackle common concerns…
I went through some of the introductory topics, such as what the platform is and why we might consider using it, and then ran through a number of examples — illustrating the solving of real-world problems with Azure Functions via a combination of diagrams, demos and code. The latter half of the presentation focused on a comparison between two implementations, for two patterns, using “classic” (my name — it’s too early for a “classic” designation, I think!) Azure Functions and a newly released extension to the platform named Durable Functions.
I’m going to spin that discussion off into the following blog post, comparing the implementation of two long-running tasks using chaining and fan-in/out techniques. Some important aspects of the code will be shown within the article, but to view the full implementation of each pattern, please see the GitHub repository.
As a brief introduction though, Durable Functions allow us to abstract away the intermediate queues and tables that we might otherwise need to support workflows using the standard platform. Rather than going via these storage components, functions can call other functions directly, passing input and retrieving output — which sounds obvious given the name “functions”, of course, but is much more clearly the case when using the Durable Functions approach. As we’ll see, we can’t completely ignore the underlying storage components that support this, but nonetheless, many scenarios can be implemented with less “plumbing” required.
We might look to chain together multiple Azure Functions for a couple of reasons. The first is that function execution time is limited — 5 minutes by default, extendable via configuration to 10 minutes. That’s still quite a long time, and plenty for most processes, but we could certainly have examples of a workflow that takes longer than that and therefore couldn’t be completed within a single function invocation.
Even if that’s not the case, we would likely want to adhere to the single-responsibility principle (SRP) within our function applications — for the same reasons of clean code and ease of understanding and maintenance as we would in any software development. So we would likely want to break up more complex workflows into multiple functions, each responsible for a single, smaller piece.
To do that though, we need a way to pass the output of one function to the next one in the chain, which we can do via an intermediate storage component: most typically, a queue.
The example I looked at was a (fictitious) e-commerce workflow, where we want to carry out a number of steps following the placement of an order by a customer. We need to write to a database, make a call to an external web-service to manage inventory updates and to send a confirmation email to the customer. In order to provide a responsive experience to the user, we don’t want to wait to complete all the necessary steps in the workflow before sending a response back to them, so instead we take the order details, save them to a queue message and continue the workflow as a background process.
Chaining using Azure Functions
The first step of our implementation using the standard Azure Functions platform is to create a queue triggered function to respond to this queue message. This function has one primary job — to update the database. Its secondary job is to create a new message, on a new queue, which in turn triggers the second function to carry out its job of calling the external web service. And so on, to the final function that sends the email to the customer.
We can exercise some control and customisation by implementing logic that simply doesn’t pass the message on to the next queue if part of the process fails and we want to cut the full workflow short.
The following diagram illustrates the components involved:
The implementation of the first function is as follows:
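The full code is in the repository; a minimal sketch along these lines illustrates the shape (the queue names, the `Order` type and the `SaveOrderToDatabase` helper are my illustrative assumptions, not the exact repository code):

```csharp
[FunctionName("ProcessOrder")]
public static void Run(
    [QueueTrigger("received-orders")] Order item,   // bound to the initial message content
    [Queue("update-inventory")] ICollector<Order> outputQueue,
    TraceWriter log)
{
    // Primary job: persist the order to the database.
    SaveOrderToDatabase(item);

    // Secondary job: add a typed item to the collection, which the Queue
    // attribute reflects as a new message on the next queue in the chain.
    outputQueue.Add(item);
}
```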
Key points to note include the incoming item parameter, which is bound via the QueueTrigger attribute and thus contains the content of our initial message. We also have a second parameter named outputQueue, which is of a collection type. Once the task of updating the database is complete, we manipulate that collection in code by adding a typed item to it — and because of the Queue attribute applied to the parameter, this change is reflected in the storage component itself as a new message in the queue.
This is one of the nice features of Azure Functions, I find: these attributes provide a lot of power when it comes to integrations, meaning we can avoid dropping down into the lower-level APIs of the storage components or service libraries themselves and focus more on implementing the logic of our business process.
Chaining using Durable Functions
Taking the same scenario with Durable Functions, we can immediately see via a compare and contrast of the two component diagrams that we have on the face of it a much simpler solution:
On the left, we have our input as before, from queue storage. And on the right, our external systems and services. In the middle though, just functions calling functions, and no (apparent) intermediate storage components.
The first function is now implemented as follows:
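Again as a sketch, with names assumed for illustration, the client function looks something like this:

```csharp
[FunctionName("OrderReceived")]
public static async Task Run(
    [QueueTrigger("received-orders")] string message,
    [OrchestrationClient] DurableOrchestrationClient starter,
    TraceWriter log)
{
    // Hand the queue message off to the orchestration function, passing
    // the name of the function to call and the initial input.
    await starter.StartNewAsync("ProcessOrderOrchestration", message);
}
```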
We have our queue trigger as before, but this time a new parameter called starter, of type DurableOrchestrationClient. Via it, we can call our main workflow function — known as an orchestration function within the framework — passing in the name of the function to call and the initial input (the contents of our queue message).
The orchestration function looks like this:
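A sketch of its shape (the activity function names and the `Order` type are illustrative assumptions):

```csharp
[FunctionName("ProcessOrderOrchestration")]
public static async Task Run(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    // Retrieve the input passed in by the client function.
    var order = context.GetInput<Order>();

    // Chain the activity functions, passing input and receiving output.
    await context.CallActivityAsync("SaveOrder", order);
    var inventoryUpdated = await context.CallActivityAsync<bool>("UpdateInventory", order);

    // Standard C# control of flow: if the inventory update failed,
    // we opt not to send the confirmation email.
    if (inventoryUpdated)
    {
        await context.CallActivityAsync("SendEmail", order);
    }
}
```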
As input we receive a parameter of type DurableOrchestrationContext which allows us to do various things like retrieve the input to the function and then make calls to other functions — called activity functions — to carry out their dedicated tasks. We can pass information to those functions, as well as receive their returned output back — and if need be, respond appropriately with standard C# control of flow code to amend the workflow. For example here, if the call to update the inventory fails, we opt not to send the confirmation email to the customer.
Finally an example of an activity function, which has a parameter of type DurableActivityContext, again allowing us to retrieve the function input (which, as is shown here, can be typed rather than simply a string):
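For instance, sketched with the same assumed names as above:

```csharp
[FunctionName("SaveOrder")]
public static void Run(
    [ActivityTrigger] DurableActivityContext context,
    TraceWriter log)
{
    // The input can be retrieved as a typed object rather than a string.
    var order = context.GetInput<Order>();

    // Carry out this activity's single dedicated task
    // (SaveOrderToDatabase is an illustrative helper).
    SaveOrderToDatabase(order);
}
```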
Fan-in/out using functions
Another pattern used for handling larger workloads is known as fan-in/out, or sharding, where we take one large input file and break it up into smaller ones that can be executed in parallel. When all are finished, we aggregate these intermediate results in order to produce an overall output.
The example I chose to illustrate this was to obtain from Kaggle a large data-set of summer Olympic data from 1896 to the present day, containing every single event and medal won. I then wanted to analyse that to determine the overall ranking of countries in terms of medal success in Olympic history.
Fan-in/out using Azure Functions
The following diagram illustrates the components and functions that were involved in the implementation using the standard Azure Functions platform:
We start with the large dataset (a CSV file) uploaded to a blob storage container. A function, configured with a trigger for that container, is then fired. It’s responsible for the “fan-out” operation, and will break the file up into smaller files, one for each year, and write them to a second blob storage container.
First though, we need somewhere to store our intermediate results for each of the fanned-out operations. We use table storage for that, populating a table with one record for each year, with an empty field reserved for population when the operation on a given year completes.
A second function set up to be triggered on the second blob storage container will be fired. At this point the scaling feature of the Azure Functions platform will kick-in, and, as we have multiple files in this container, we’ll get a number of function instances triggered and run in parallel. As each completes it does two things. Firstly, it updates the record in table storage for the given year. And secondly, it writes a message to a queue — just a simple flag indicating “I’m done”, that will be used to trigger the last step in the process.
The final function, responsible for the “fan-in” operation, triggers from these queue messages. We have an issue here though, in that we can only calculate our final result once all years have been processed. So the first thing this function does is query table storage to see whether data for all years has been completed. If not, the function simply exits. When the final message on the queue is processed, we’ll have data for all years, so we can continue, aggregate the results and produce an output, which is written as a CSV file to a final blob storage container.
Code-wise, the first function looks like this:
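A sketch of the fan-out function and its run-time binding (the container names and the `SplitByYear` helper are illustrative assumptions):

```csharp
[FunctionName("FanOut")]
public static async Task Run(
    [BlobTrigger("medal-data/{name}")] Stream inputBlob,
    Binder binder,
    TraceWriter log)
{
    // Break the full CSV up into one set of lines per year
    // (SplitByYear is an illustrative helper).
    IDictionary<int, IEnumerable<string>> linesByYear = SplitByYear(inputBlob);

    foreach (var year in linesByYear)
    {
        await WriteFilePerYear(binder, year.Key, year.Value);
    }
}

private static async Task WriteFilePerYear(Binder binder, int year, IEnumerable<string> lines)
{
    // We can't use a compile-time Blob attribute, as the file names aren't
    // known until run-time; instead we create the attribute dynamically
    // and bind it to a CloudBlockBlob instance.
    var attribute = new BlobAttribute($"medal-data-by-year/{year}.csv", FileAccess.Write);
    var blob = await binder.BindAsync<CloudBlockBlob>(attribute);

    // Changes made to the bound object are reflected in the container.
    await blob.UploadTextAsync(string.Join(Environment.NewLine, lines));
}
```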
Perhaps the most interesting thing to note here is the use of the parameter of type Binder. We need this because we are going to create multiple files in blob storage, and we don’t know at compile time how many files there will be or what they will be named (there’ll be one for each year). So we can’t use an attribute as before. Well, in fact we can, but it’s one we have to create dynamically at run-time, which can be seen in the WriteFilePerYear method. Here we dynamically bind an attribute to an instance of a CloudBlockBlob object, and, having done that, the changes we make to it in terms of setting properties and uploading bytes are reflected in the blob storage container. Again, we avoid digging down into the lower-level storage API directly (though of course, if we ever need to, we can).
The second function processes the data for each year and writes the results to table storage:
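Sketched along these lines, assuming a hypothetical `YearResultEntity` table entity and a `CalculateMedalTotals` helper:

```csharp
// Hypothetical entity: one record per year, with a Results field
// populated when that year's processing completes.
public class YearResultEntity : TableEntity
{
    public YearResultEntity() { }
    public YearResultEntity(string partitionKey, string rowKey) : base(partitionKey, rowKey) { }
    public string Results { get; set; }
}

[FunctionName("ProcessYear")]
public static async Task Run(
    [BlobTrigger("medal-data-by-year/{name}")] Stream inputBlob,
    string name,
    [Table("YearResults")] CloudTable resultsTable,
    [Queue("year-completed")] ICollector<string> outputQueue)
{
    var year = Path.GetFileNameWithoutExtension(name);

    // Calculate the medal totals by country for this year's file.
    var results = CalculateMedalTotals(inputBlob);

    // Firstly, update the year's record in table storage.
    var entity = new YearResultEntity("YearResults", year) { Results = results };
    await resultsTable.ExecuteAsync(TableOperation.InsertOrReplace(entity));

    // Secondly, write the "I'm done" flag that triggers the fan-in step.
    outputQueue.Add(year);
}
```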
And the final function carries out the fan-in operation as described above:
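A sketch, again assuming the hypothetical `YearResultEntity` (a TableEntity with a Results field) and an illustrative `AggregateToRankedCsv` helper:

```csharp
[FunctionName("FanIn")]
public static async Task Run(
    [QueueTrigger("year-completed")] string completedYear,
    [Table("YearResults")] CloudTable resultsTable,
    [Blob("medal-results/ranking.csv", FileAccess.Write)] CloudBlockBlob outputBlob)
{
    // Query table storage: have all years been processed yet?
    var allYears = resultsTable.ExecuteQuery(new TableQuery<YearResultEntity>()).ToList();
    if (allYears.Any(y => string.IsNullOrEmpty(y.Results)))
    {
        // Not yet complete; simply exit and wait for a later message.
        return;
    }

    // All years complete: aggregate the intermediate results into the
    // overall country ranking and write the final CSV output.
    await outputBlob.UploadTextAsync(AggregateToRankedCsv(allYears));
}
```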
Fan-in/out using Durable Functions
We can turn to Durable Functions to implement the same pattern and, as we’ll see, we get a couple of key benefits. Here’s the updated component diagram:
Once again, note the absence of intermediate storage components; we’ve been able to do away with explicit use of message queues and table storage for tracking the intermediate results. We have our input and output blob storage containers, but the workflow itself is just functions calling other functions, passing input and receiving and acting on their return values.
The other valuable benefit is that our fan-in operation — which was tricky to implement previously — is much more straightforward, and more efficient. We no longer have to make unnecessary function invocations that check whether all intermediate operations are complete and, if not, exit. Each of those invocations would have a cost associated with it, albeit a very small one, which we can now avoid.
Our initial trigger function is now responsible for taking the initial input and transforming it into a typed structure to pass to the orchestration function:
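A sketch of that trigger function, with the type, container and helper names (`ParseMedalData`, `Compress`) assumed for illustration:

```csharp
[FunctionName("MedalDataReceived")]
public static async Task Run(
    [BlobTrigger("medal-data/{name}")] Stream inputBlob,
    [OrchestrationClient] DurableOrchestrationClient starter,
    TraceWriter log)
{
    // Parse the raw CSV into a typed structure grouped by year.
    Dictionary<int, List<MedalResult>> resultsByYear = ParseMedalData(inputBlob);

    // Serialise and compress the input before handing it to the
    // orchestration, to stay within the queue message size limit.
    await starter.StartNewAsync("MedalDataOrchestration", Compress(resultsByYear));
}
```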
There’s one little aside to note here: although the intermediate storage components are abstracted away, they still exist. Queues are used to pass data between functions, and queue messages have a 64KB limit, equating to 32KB of UTF-16 string data. If the serialised function input or output goes over this limit, the process will currently fail, often in quite odd and tricky-to-diagnose ways. As such, I’m currently taking care to serialise and GZip-compress the inputs and outputs, pass them as a string, and decompress and deserialise on the other side.
This is something the functions team are aware of — nicely, much of the development is out in the open on GitHub and can be followed via issues and commits — and hopefully in future this compression, or the use of an alternative storage mechanism, will become something the platform takes care of and is also abstracted away from the developer. But for now, it’s something to keep in mind.
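One way those compress/decompress helpers might look, sketched using Json.NET and System.IO.Compression (the repository’s exact approach may differ):

```csharp
// Serialise to JSON, GZip-compress, then Base64-encode so the payload
// can travel as a (much smaller) string.
public static string Compress<T>(T input)
{
    var bytes = Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(input));
    using (var output = new MemoryStream())
    {
        using (var gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(bytes, 0, bytes.Length);
        }
        return Convert.ToBase64String(output.ToArray());
    }
}

// Reverse the process on the other side of the function call.
public static T Decompress<T>(string input)
{
    using (var inputStream = new MemoryStream(Convert.FromBase64String(input)))
    using (var gzip = new GZipStream(inputStream, CompressionMode.Decompress))
    using (var reader = new StreamReader(gzip, Encoding.UTF8))
    {
        return JsonConvert.DeserializeObject<T>(reader.ReadToEnd());
    }
}
```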
Here’s the orchestration function:
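As a sketch, with the activity names and the Compress/Decompress helpers described above assumed for illustration:

```csharp
[FunctionName("MedalDataOrchestration")]
public static async Task Run(
    [OrchestrationTrigger] DurableOrchestrationContext context)
{
    // Retrieve and decompress the typed input from the trigger function.
    var resultsByYear = Decompress<Dictionary<int, List<MedalResult>>>(
        context.GetInput<string>());

    // Fan out: one activity task per year, each given its own input.
    var parallelTasks = new List<Task<string>>();
    foreach (var year in resultsByYear)
    {
        parallelTasks.Add(
            context.CallActivityAsync<string>("ProcessYear", Compress(year)));
    }

    // Fan in: run the tasks in parallel and wait for all to complete.
    // This single line replaces all the table storage checking.
    await Task.WhenAll(parallelTasks);

    // Pass the intermediate results to the final activity for aggregation.
    var intermediateResults = parallelTasks.Select(t => t.Result).ToList();
    await context.CallActivityAsync("CreateOutput", intermediateResults);
}
```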
The orchestration function, having received its input, creates multiple Tasks, each hooked up to an activity function instance that is passed the appropriate input for a given year. Once all are primed, they are triggered in parallel and the code waits for all to complete. When they have, the final activity function is called to process the output. Note in particular here how we’ve replaced all the calls to table storage for handling intermediate output with the single await Task.WhenAll(parallelTasks) line.
The two activity functions look like this:
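Sketched with the same assumptions as the orchestration above (`MedalResult`, `CalculateMedalTotals` and `AggregateToRankedCsv` are illustrative):

```csharp
[FunctionName("ProcessYear")]
public static string Run([ActivityTrigger] DurableActivityContext context)
{
    // Decompress this activity's input: one year's worth of results.
    var yearData = Decompress<KeyValuePair<int, List<MedalResult>>>(
        context.GetInput<string>());

    // Calculate the medal totals per country for this year, and return
    // them (compressed) to the orchestrator as this task's result.
    return Compress(CalculateMedalTotals(yearData.Value));
}

[FunctionName("CreateOutput")]
public static async Task Run(
    [ActivityTrigger] DurableActivityContext context,
    [Blob("medal-results/ranking.csv", FileAccess.Write)] CloudBlockBlob outputBlob)
{
    var intermediateResults = context.GetInput<List<string>>();

    // Aggregate the per-year results into the overall country ranking
    // and write the final CSV to the output container.
    await outputBlob.UploadTextAsync(AggregateToRankedCsv(intermediateResults));
}
```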
Over the last few months we’ve been making use of Azure Functions and other serverless offerings to implement a number of supporting, auxiliary application components for the web applications we are building. With a few provisos — not unexpected in a new and evolving platform — we’ve been able to make good use of them, partly replacing the more traditional console apps or Windows services we might previously have employed for similar tasks. The introduction of the Durable Functions extension makes that all a bit more straightforward, particularly when it comes to more complex workflows.