Boosting Efficiency with Azure Durable Functions for Long-Running API

huzefa qubbawala
7 min read · Mar 8, 2024

In the realm of modern applications, efficiency and responsiveness are paramount. However, there are scenarios where certain operations, such as processing large documents for insights extraction using applied AI services, can consume a significant amount of time, hindering the responsiveness of your application. Azure Durable Functions come to the rescue, offering a powerful solution for handling such long-running operations asynchronously while maintaining scalability and responsiveness.

In this article, we’ll explore how Azure Durable Functions can effectively tackle this challenge, along with best practices for managing long-running tasks.

Understanding the Challenge

Consider a scenario where you have a REST API that needs to process large documents and extract valuable insights from them by:

  • Detecting important entities such as people and locations.
  • Extracting key phrases from the document.
  • Creating a summary of the document using OpenAI.
  • Returning the text of the document.

Performing these operations synchronously within your API blocks requests and leads to a poor user experience due to slow response times. To address this challenge, you need an asynchronous approach that can handle long-running operations efficiently without compromising responsiveness.

Additionally, you need a good orchestrator function that can divide the problem into smaller tasks, run those tasks independently, combine the results when required, and return them to the user.

Enter Azure Durable Functions

Azure Durable Functions provide a robust framework for building serverless, stateful workflows. They allow you to define workflows as code using orchestrator functions, which can coordinate the execution of multiple activities in parallel or sequentially. This makes them an ideal choice for handling long-running operations asynchronously.

Solution Overview

To solve the problem described above, we’ll leverage Azure Durable Functions to perform document processing asynchronously. Here’s an overview of the solution:

  1. Orchestrator Function: This function coordinates the execution of the document processing workflow. It starts by triggering the processing activities and then waits for their completion.
  2. Activity Functions: These functions perform specific tasks within the document processing workflow, such as extracting text from the document, creating a summary, and extracting key phrases. They execute asynchronously and can run in parallel for optimal performance.
  3. Client Interaction: The client initiates the document processing workflow by calling an HTTP-triggered function (a REST API endpoint) that starts the orchestrator. Once the workflow is initiated, the client can monitor its progress and retrieve the results when processing is complete.

[Figure: Durable Functions Solutions]

Implementation with Code Snippets

Let’s dive into the implementation details with code snippets to illustrate how Azure Durable Functions can be used to solve the problem at hand.

The full code is available in the GitHub repo: https://github.com/huzefaqubbawala/azuredurablefunctions.git

Client Interaction API Endpoint Function

[Function(nameof(StartDocumentProcessingOrchestrator))]
public async Task<HttpResponseData> StartDocumentProcessingOrchestrator(
    [HttpTrigger(AuthorizationLevel.Anonymous, "post")] HttpRequestData req,
    [DurableClient] DurableTaskClient durableTaskClient)
{
    var documentDto = await req.ReadFromJsonAsync<DocumentRequestData>();
    if (documentDto == null || string.IsNullOrEmpty(documentDto.DocumentUrl))
    {
        // Reject requests that do not name a document to process.
        return req.CreateResponse(System.Net.HttpStatusCode.BadRequest);
    }

    // Reuse the caller-supplied instance ID if there is one; otherwise generate a new one.
    documentDto.InstanceId ??= Guid.NewGuid().ToString();

    // Start an orchestrator function.
    string instanceId = await durableTaskClient.ScheduleNewOrchestrationInstanceAsync(
        nameof(RunDocumentProcessingOrchestrator),
        input: documentDto,
        new StartOrchestrationOptions(documentDto.InstanceId));

    _logger.LogInformation("Started orchestration with ID = '{instanceId}'.", instanceId);

    // Returns an HTTP 202 response with an instance management payload.
    // See https://learn.microsoft.com/azure/azure-functions/durable/durable-functions-http-api#start-orchestration
    return durableTaskClient.CreateCheckStatusResponse(req, instanceId);
}

The above function is the starting point. Users post a request to this endpoint, and the API returns immediately with an HTTP 202 response that contains a status query URI for tracking the long-running operation. The response payload also includes a terminate URI in case the user wants to stop the operation.

This endpoint is also responsible for starting the orchestrator function, which takes care of the flow of the long-running operation.
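
To make the client side concrete, here is a minimal sketch of how a caller might poll the status query URI until the orchestration finishes. The `OrchestrationStatusPoller` class and its method names are hypothetical (they are not part of the article's repo); the `runtimeStatus` field and its `Completed`, `Failed`, and `Terminated` values come from the Durable Functions HTTP API. The status fetcher is passed in as a delegate so that the loop can be tried without a running function app.

```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, not part of the article's repo.
public static class OrchestrationStatusPoller
{
    // Statuses after which the orchestration will not change state again.
    public static bool IsTerminal(string? runtimeStatus) =>
        runtimeStatus is "Completed" or "Failed" or "Terminated";

    // Calls fetchStatusJson repeatedly until the returned JSON reports a
    // terminal status, then returns that final status document.
    public static async Task<JsonElement> WaitForCompletionAsync(
        Func<CancellationToken, Task<string>> fetchStatusJson,
        TimeSpan pollInterval,
        CancellationToken cancellationToken = default)
    {
        while (true)
        {
            string json = await fetchStatusJson(cancellationToken);
            using JsonDocument document = JsonDocument.Parse(json);

            // Clone so the element stays valid after the JsonDocument is disposed.
            JsonElement root = document.RootElement.Clone();

            string? status = root.TryGetProperty("runtimeStatus", out JsonElement value)
                ? value.GetString()
                : null;

            if (IsTerminal(status))
            {
                return root;
            }

            // Still pending or running: wait before asking again.
            await Task.Delay(pollInterval, cancellationToken);
        }
    }
}
```

In a real client, the delegate would typically wrap an HTTP GET, for example `ct => httpClient.GetStringAsync(statusQueryGetUri, ct)`, where `statusQueryGetUri` is read from the body of the 202 response.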

Orchestrator Function

The orchestrator is the heart of Durable Functions. This is where we design our workflow and use the different patterns that Durable Functions offers. For instance, we use the fan-out/fan-in pattern here in conjunction with function chaining to get the desired result.

Orchestrator functions have the following characteristics:

  • Orchestrator functions define function workflows using procedural code. No declarative schemas or designers are needed.
  • Orchestrator functions can call other durable functions synchronously and asynchronously. Output from called functions can be reliably saved to local variables.
  • Orchestrator functions are durable and reliable. Execution progress is automatically checkpointed when the function “awaits” or “yields”. Local state is never lost when the process recycles or the VM reboots.
  • Orchestrator functions can be long-running. The total lifespan of an orchestration instance can be seconds, days, months, or never-ending.

For more detailed information on orchestrator functions and their features, see the Durable orchestrations article.

[Function(nameof(RunDocumentProcessingOrchestrator))]
public async Task<DocumentResponseData> RunDocumentProcessingOrchestrator(
    [OrchestrationTrigger] TaskOrchestrationContext context)
{
    // Orchestrators must not use other bindings (including the durable client),
    // so the only parameter is the orchestration context.
    ILogger logger = context.CreateReplaySafeLogger(nameof(Function1));
    logger.LogInformation($"Starting {nameof(RunDocumentProcessingOrchestrator)}.");
    var documentDto = context.GetInput<DocumentRequestData>();

    // Function chaining: first get the document text...
    var documentText = await context.CallActivityAsync<string>(
        nameof(ExtractDocumentText),
        documentDto);
    // ...then pass it on to the next steps through the request object.
    documentDto.DocumentText = documentText;

    // Use the fan-out/fan-in pattern to run multiple tasks in parallel.
    var parallelTasks = new List<Task<string>>();

    // Fan out: schedule the activity functions without awaiting them,
    // so that they all run in parallel.
    var detectEntitiesTaskResult = context.CallActivityAsync<string>(
        nameof(DetectEntitiesFromDocumentText),
        documentDto);
    parallelTasks.Add(detectEntitiesTaskResult);

    var keyPhrasesTaskResult = context.CallActivityAsync<string>(
        nameof(ExtractKeyPhrasesFromDocumentText),
        documentDto);
    parallelTasks.Add(keyPhrasesTaskResult);

    var summaryTaskResult = context.CallActivityAsync<string>(
        nameof(CreateSummaryOfDocumentText),
        documentDto);
    parallelTasks.Add(summaryTaskResult);

    // Fan in: wait for all tasks to complete.
    await Task.WhenAll(parallelTasks);

    // All tasks have completed, so reading Result does not block.
    return new DocumentResponseData
    {
        Entities = detectEntitiesTaskResult.Result,
        KeyPhrases = keyPhrasesTaskResult.Result,
        Summary = summaryTaskResult.Result
    };
}
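
Activity calls like the ones above can also carry a retry policy, so that a transient failure in one step (for example, a throttled AI service) does not fail the whole orchestration. The following is a minimal sketch, assuming the isolated worker model used in this article. The `RetryExample` class, the `RunSummaryWithRetry` orchestrator, and the retry values are hypothetical; `DocumentRequestData` and the `CreateSummaryOfDocumentText` activity are the article's own and are assumed to live in the same project.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.DurableTask;

public static class RetryExample
{
    // Hypothetical orchestrator; the retry values are illustrative, not tuned.
    [Function(nameof(RunSummaryWithRetry))]
    public static async Task<string> RunSummaryWithRetry(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        var documentDto = context.GetInput<DocumentRequestData>();

        // Up to 3 attempts; wait 5 seconds before the first retry
        // and double the wait before each later retry.
        TaskOptions retryOptions = TaskOptions.FromRetryPolicy(new RetryPolicy(
            maxNumberOfAttempts: 3,
            firstRetryInterval: TimeSpan.FromSeconds(5),
            backoffCoefficient: 2.0));

        try
        {
            return await context.CallActivityAsync<string>(
                "CreateSummaryOfDocumentText", documentDto, retryOptions);
        }
        catch (TaskFailedException ex)
        {
            // Reached only after every attempt has failed.
            return $"Summary unavailable: {ex.Message}";
        }
    }
}
```

Because an activity can run more than once under a retry policy, it helps to keep activities idempotent where possible.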

Activity Functions

Activity functions are the basic unit of work in a durable function orchestration: they are the functions and tasks that the orchestrator coordinates.

Unlike orchestrator functions, activity functions aren’t restricted in the type of work you can do in them. They are frequently used to make network calls, run CPU-intensive operations, or call third-party APIs to process data.

An activity function can also return data to the orchestrator function. The Durable Task Framework guarantees that each called activity function will be executed at least once during an orchestration’s execution.

[Function(nameof(ExtractDocumentText))]
public async Task<string> ExtractDocumentText(
    [ActivityTrigger] DocumentRequestData documentRequestData,
    [DurableClient] DurableTaskClient durableTaskClient)
{
    _logger.LogInformation($"Running {nameof(ExtractDocumentText)} for document url --------------- {documentRequestData?.DocumentUrl}");
    if (documentRequestData?.InstanceId != null)
    {
        // Skip the work if the orchestration has already been terminated or has failed.
        var instance = await durableTaskClient.GetInstanceAsync(documentRequestData.InstanceId);
        if (instance != null
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Terminated
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Failed)
        {
            // Wait 30 seconds (without blocking the thread) to mimic the time needed to extract the text from the document.
            await Task.Delay(TimeSpan.FromSeconds(30));
            _logger.LogInformation($"Extracted Text for document url --------------- {documentRequestData.DocumentUrl}");
            return "Extracted Document Text";
        }
    }

    return $"Terminated for document url --------------- {documentRequestData?.DocumentUrl}";
}

[Function(nameof(CreateSummaryOfDocumentText))]
public async Task<string> CreateSummaryOfDocumentText(
    [ActivityTrigger] DocumentRequestData documentRequestData,
    [DurableClient] DurableTaskClient durableTaskClient)
{
    _logger.LogInformation($"Running {nameof(CreateSummaryOfDocumentText)} for document url --------------- {documentRequestData?.DocumentUrl}");
    if (documentRequestData?.InstanceId != null)
    {
        // Skip the work if the orchestration has already been terminated or has failed.
        var instance = await durableTaskClient.GetInstanceAsync(documentRequestData.InstanceId);
        if (instance != null
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Terminated
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Failed)
        {
            // Wait 30 seconds (without blocking the thread) to mimic the time needed to summarize the document text.
            // TODO: Write business logic here to create document summary.
            await Task.Delay(TimeSpan.FromSeconds(30));
            _logger.LogInformation($"Completed Summary of Text for document url --------------- {documentRequestData.DocumentUrl}");
            return "Summary of Document";
        }
    }

    return $"Terminated for document url --------------- {documentRequestData?.DocumentUrl}";
}

[Function(nameof(DetectEntitiesFromDocumentText))]
public async Task<string> DetectEntitiesFromDocumentText(
    [ActivityTrigger] DocumentRequestData documentRequestData,
    [DurableClient] DurableTaskClient durableTaskClient)
{
    _logger.LogInformation($"Running {nameof(DetectEntitiesFromDocumentText)} for document url --------------- {documentRequestData?.DocumentUrl}");
    if (documentRequestData?.InstanceId != null)
    {
        // Skip the work if the orchestration has already been terminated or has failed.
        var instance = await durableTaskClient.GetInstanceAsync(documentRequestData.InstanceId);
        if (instance != null
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Terminated
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Failed)
        {
            // Wait 30 seconds (without blocking the thread) to mimic the time needed to detect entities in the document.
            // TODO: Write business logic here to detect entities.
            await Task.Delay(TimeSpan.FromSeconds(30));
            _logger.LogInformation($"Completed detect entities for document url --------------- {documentRequestData.DocumentUrl}");
            return "PERSON: Huzefa ; Location: New York";
        }
    }

    return $"Terminated for document url --------------- {documentRequestData?.DocumentUrl}";
}

[Function(nameof(ExtractKeyPhrasesFromDocumentText))]
public async Task<string> ExtractKeyPhrasesFromDocumentText(
    [ActivityTrigger] DocumentRequestData documentRequestData,
    [DurableClient] DurableTaskClient durableTaskClient)
{
    _logger.LogInformation($"Running {nameof(ExtractKeyPhrasesFromDocumentText)} for document url --------------- {documentRequestData?.DocumentUrl}");
    if (documentRequestData?.InstanceId != null)
    {
        // Skip the work if the orchestration has already been terminated or has failed.
        var instance = await durableTaskClient.GetInstanceAsync(documentRequestData.InstanceId);
        if (instance != null
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Terminated
            && instance.RuntimeStatus != OrchestrationRuntimeStatus.Failed)
        {
            // Wait 30 seconds (without blocking the thread) to mimic the time needed to extract key phrases from the document text.
            // TODO: Write business logic here to extract key phrases.
            await Task.Delay(TimeSpan.FromSeconds(30));
            _logger.LogInformation($"Completed Extract Key Phrases for document url --------------- {documentRequestData.DocumentUrl}");
            return "This is an agreement document between Huzefa and Other Party.";
        }
    }

    return $"Terminated for document url --------------- {documentRequestData?.DocumentUrl}";
}

Best Practices for Long-Running Operations

When dealing with long-running operations using Azure Durable Functions, it’s essential to adhere to best practices for optimal performance and reliability. Here are some key best practices:

  1. Terminate Function: Use the built-in terminate capability of Azure Durable Functions (for example, the terminate URI returned with the HTTP 202 response) to stop an operation that is no longer needed. This ensures that resources are not consumed unnecessarily and helps maintain cost efficiency.
  2. Suspend and Resume Functionality: Leverage the built-in capabilities of Azure Durable Functions to suspend and resume workflows based on specific triggers or conditions. This allows you to pause execution and resume later, preserving the state of the workflow.
  3. Error Handling: Implement robust error handling mechanisms to handle exceptions gracefully and ensure the reliability of your workflows. Azure Durable Functions provide built-in support for retry policies and exception handling, which can be customized to meet your application requirements.

By following these best practices, you can effectively manage long-running operations with Azure Durable Functions while ensuring scalability, reliability, and cost efficiency.
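
The terminate, suspend, and resume operations listed above are exposed on `DurableTaskClient`. As a hedged sketch, assuming the isolated worker model used in this article, thin wrappers around those calls might look like the following. The `InstanceManagement` class, its method names, and the reason strings are hypothetical.

```csharp
using System.Threading.Tasks;
using Microsoft.DurableTask.Client;

// Hypothetical wrappers around the instance management calls on DurableTaskClient.
public static class InstanceManagement
{
    // Pauses the orchestration; it stays suspended until it is resumed.
    public static Task PauseAsync(DurableTaskClient client, string instanceId) =>
        client.SuspendInstanceAsync(instanceId, "Paused pending review");

    // Lets a suspended orchestration continue from where it stopped.
    public static Task ContinueAsync(DurableTaskClient client, string instanceId) =>
        client.ResumeInstanceAsync(instanceId, "Review finished");

    // Stops the orchestration for good; the string is recorded as its output.
    public static Task CancelAsync(DurableTaskClient client, string instanceId) =>
        client.TerminateInstanceAsync(instanceId, "Cancelled by user");
}
```

Terminating an orchestration does not interrupt an activity that is already running, which is why the activity functions in this article check the instance's runtime status before starting their work.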

Conclusion

In this article, we’ve explored how Azure Durable Functions can be leveraged to handle long-running operations asynchronously, specifically focusing on document processing scenarios. By utilizing orchestrator and activity functions, along with best practices for managing long-running tasks, you can build scalable and responsive applications that deliver an optimal user experience. With Azure Durable Functions, you can efficiently tackle complex workflows while maintaining the flexibility and agility of serverless computing.

huzefa qubbawala

Senior Architect @ CTO office Icertis | Problem Solver | Cloud Solution | Azure Kubernetes | Serverless | API Management | Cognitive Service | Applied AI