Adopting Asynchronous Messaging With Azure Service Bus

Jamahl Carter
Published in The Startup
17 min read · Nov 15, 2020
Photo by Scott Blake on Unsplash

With more and more companies moving towards distributed or microservice architectures, the importance of scalable and resilient asynchronous message brokers is only growing. A challenge faced by such companies is striking a balance between maintaining functionality and improving application dataflow.

This article is aimed at those who are currently adopting or considering adopting asynchronous communication protocols into their system. We will be breaking down some of the main things to consider whilst migrating, with a targeted focus on:

  • Minimising the impact of external integrations: Internal communication protocols should not be defined by our external integrations. Internal workflows should be easily adaptable to match the integration requirements, whilst ensuring effective, considered use of async messaging.
  • Establishing a path forward: An approach to safely and iteratively introduce internal async channels, whilst ensuring we safely manage cases where sync/async messaging is necessary.

Synchronous vs Asynchronous messaging

Let's take a minute to get some context on the different types of messaging protocols. Whilst adopting asynchronous (async) messaging into your system, it’s easy to fall into the trap of seeing synchronous (sync) as bad and async as good, but that's not the case. Each has distinct characteristics which make it suitable for different scenarios. The most common sync protocol used in inter-service communication is HTTP, which is often characterised by its request/response cycle.

It’s important that we simply see these protocols as tools with distinct strengths and weaknesses. Often one of the most important aspects of a messaging design doesn’t come down to which protocol we use, but how well we can align the characteristics of the protocol to the desired behaviour.

Synchronous communication characteristics

Synchronous communication showing blocking behaviour after the request, until a response is received.

Synchronous messages can be described as a request/response cycle where every request gets a single response from the recipient. A widely used synchronous protocol for messaging is HTTP. Some characteristics of synchronous protocols are:

  • Request/response cycle: Each request to the server gets a single response. Unless explicitly handled by the client service (e.g. receiving the response on a worker thread), the application will block until it receives a response (see the sketch below).
  • Runtime coupling: Service_A sending an HTTP request to Service_B results in runtime coupling, as we depend on the availability of both to achieve the desired outcome. As we add more services to this (i.e. Service_B then calls Service_C etc.) we create a synchronous chain whereby all participating services are now coupled. With an HTTP-based sync protocol, these are referred to as HTTP chains.
  • Simple and efficient: Sync HTTP-based protocols are often simpler and more efficient than their semantically equivalent async alternatives. This is in part because we don’t have to integrate with, or manage, a dedicated message broker.
Simplified HTTP-based architecture with two HTTP chains: API_Gateway, Service_A, Service_C and API_Gateway, Service_B, Service_C.
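
To make that blocking behaviour concrete, below is a minimal C# sketch of a sync HTTP call; the service URL is hypothetical:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class SyncCallExample
{
    private static readonly HttpClient Client = new HttpClient();

    public static async Task<string> GetOrderAsync(string orderId)
    {
        // The calling flow cannot continue until Service_B responds (or the
        // request times out): this is the request/response cycle in action.
        var response = await Client.GetAsync($"https://service-b.example/orders/{orderId}");
        response.EnsureSuccessStatusCode();
        return await response.Content.ReadAsStringAsync();
    }
}
```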

Asynchronous communication characteristics

Async interaction between two processes with no blocking.

Asynchronous message protocols decouple the client and server by exchanging messages through message brokers instead of directly between each other. They don’t natively have a request/response cycle and, at their most basic, operate by sending messages with no expectation of a reply (see the sketch below). Some characteristics of asynchronous protocols are:

  • Adaptable: There are many different message patterns you can utilise with this protocol, offering flexibility depending on your requirements. When needed, it can also trigger sync requests (e.g. HTTP or database calls) or operate within a sync request/response cycle.
  • Loose coupling: By exchanging messages through dedicated brokers, instead of directly between services, we decouple the sender and receiver applications. This encourages high availability, with the message brokers acting like buffers until the recipient is ready to receive (key in isolated deployments).
  • Requires a message broker: Desired results depend on the message broker, which adds complexity in integrating with it, managing its infrastructure, and ensuring it is highly available.
Simplified asynchronous request/reply interaction with a message broker acting as a buffer.
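
As a minimal illustration of the fire-and-forget style, the sketch below sends a message to a queue using Microsoft.Azure.ServiceBus (the library used later in this article); the queue name and connection string are placeholders:

```csharp
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class NotifyExample
{
    public static async Task NotifyAsync(string connectionString)
    {
        var queueClient = new QueueClient(connectionString, "notifications-queue");
        var message = new Message(Encoding.UTF8.GetBytes("{\"orderId\": 42}"));

        // Control returns as soon as the broker accepts the message; the
        // sender neither knows nor cares when (or by whom) it is consumed.
        await queueClient.SendAsync(message);
        await queueClient.CloseAsync();
    }
}
```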

Adoption is easy…except when it’s not

When looking to incorporate async communication into your system, a good first step is to identify the dataflow requirements of your scenario and attempt to align them with the appropriate protocol. This will give you a good starting point and will highlight the “low hanging fruit” ready for migration. Some common examples of scenarios which tend to align well with async behaviours are:

  • Extract, Transform, Load (ETL) processes: These are long-running jobs which involve copying data into a destination datastore. In part due to the size and length of the operation, they are often executed as an offline job or triggered to run in the background (i.e. not within a request/response cycle).
  • Notify: This occurs when System_A wants to notify System_B about a change but does not care about System_B’s response. This can be identified by cases where the response of the notified doesn’t affect the behaviour of the notifier; in essence, the notifier doesn’t care when, or whether, the notified party was successfully informed. Common examples of this are logging/auditing scenarios.
  • Offline batch processing: Commonly executed in a background/offline transaction and is often triggered as part of a scheduled job.

A common characteristic shared by these “easy to migrate” cases is that none of them traditionally require a response to achieve their desired outcomes. This is not a coincidence. Looking past them towards the remaining, harder-to-migrate scenarios, you will see that the need for a concluding response is a common barrier to async adoption. Now, it is possible to replicate this same workflow using async protocols…but this is often not a good idea. When an API relies on a response from another API, we rely on the fact that those services are coupled. This means that any async implementation, without revising those workflows, must honour that by explicitly implementing the same coupling behaviour.

Without reconsidering the dataflow requirements, an async implementation is not feasible in cases where we depend on a final response. This is, in large part, because those requirements minimise the application-level benefits of an async implementation. Some of the main reasons against an async implementation in these cases include:

  • Added complexity: The dataflow requirements align more naturally with sync protocols, and require complex code when opting for an async implementation.
  • Loss in efficiency: Sync protocols are more efficient and simpler to implement than a semantically equivalent implementation using async protocols. This means getting similar request/response semantics with async protocols is significantly less efficient than the sync alternative.
  • Dependency: By using a message broker we introduce coupling between ourselves and the broker, along with the need to maintain a highly available message broker. This is usually warranted, as it often results in decoupling between other services. However, in cases where we depend on coupling behaviour, using a message broker is often less appropriate.

So…what now? Well if we can’t effectively utilise async protocols without changing the internal dataflow requirements, let's see about changing them!

To do this, we first need to find what initially triggers our behaviour (the “first link” in the chain) and identify how it expects to interact with our system. This is so we can determine whether these expectations involve a concluding response and whether our internal application needs to provide one. To identify these integration requirements, we must trace back to the earliest point in our HTTP chain and determine the expectations of our initial request/trigger.

Some examples of cases where these expectations align with async workflows are:

  • Scheduled jobs: These are commonly executed in the background and don’t require a timely response.
  • HTTP triggers: These occur when the initial HTTP request triggers the resulting behaviour and immediately returns a response (potentially with a callback to later get the result). This can be in the form of a customer sending an HTTP request to your API to trigger a long-running action.

If the integration requirements for your system align well with an internal async application, then great! You should be in a position to iteratively work your way down the HTTP chain to incorporate async protocols. However, in the more likely event that the integration requirements still align with sync workflows (e.g. a traditional API integration), we will have to explore alternative strategies…

Mitigating fixed integration constraints

To minimise the influence external integrations have on our internal messaging, we need to be able to trigger and complete an async request/reply scenario within the initial sync request/response cycle. This async request/reply serves as an adapter to handle async events in a sync cycle. Because these async events must occur within the request/response cycle, these behaviours are coupled. This means, to avoid unnecessary chaining, we should keep this adapter logic close to the API Gateway/public API.

Conceptually, you can simplify the basic “happy path” steps performed in the external API to be:

  1. Receive the initial request: Begins the sync request/response cycle. The request can be from an external customer.
  2. Send async request: This triggers the start of the async processing.
  3. Receive async reply: This will receive the async reply indicating success/failure of the operation.
  4. Return final response: Use async reply to return a response to the client. This completes the sync request/response cycle.
Async messaging within sync HTTP cycle using an adapter.

A key benefit of this approach comes from the difference between the behaviours of sync request/response and async request/reply. When we send a sync request, we force the recipient of that request to form and return the response. This results in runtime coupling between the requestor service instance and the responder service instance.

Contrastingly, in an async request/reply scenario, when we send a request we just need to ensure that a reply is sent from somewhere; this can be a separate instance of the same service or even a completely different service. This has the following main effects:

  • Coupled to behaviour, not instances: We are coupled to the behaviours triggered by an async request which result in an async reply. This is in line with the sync response expectations whilst not imposing restrictions on the async internals (i.e. no coupling behaviours internally).
  • Coupling logic isolated to integration: Async internals simply need to be able to receive a request and send a reply without having to rely on runtime coupling to achieve desired outcomes. This is because the coupling logic (where we listen for the async reply) is isolated close to the top-level integration and not utilised within the internal application. This also means the internal async application is flexible and can work with both external sync and async integrations.
Async request/reply scenario where Service_B receives the initial request and Service_D sends the final reply.

Key components

No two scenarios are the same, so this adapter workflow may look different depending on your case. However, there are key components which should be present in most (if not all) cases; these are:

  1. Request queues: This is where we will send our initial async request.
  2. Response queues: This is where we will receive our async reply.
  3. Scoped listeners: This is a handler configured to listen to the response queue for a reply within the request/response cycle. If we do not get a timely reply, we return a response indicating to the consumer that a timeout occurred, completing the HTTP cycle. This listener is configured with an address which is used to ensure that the relevant reply is received. Return address semantics are implemented differently depending on the message broker; for example, Azure Service Bus can leverage sessions, and for Apache Kafka we can use reply partitions.
  4. Fallback listeners: This is a handler configured to listen to the response queue for the duration of the service’s lifetime. If there are no scoped listeners listening to the reply address (tied to the incoming reply), then the reply will be handled here. Often, if a message is received here, a timeout occurred in the sync response to the consumer. Common implementations of this handler involve logging/alerting and triggering rollbacks (e.g. if used as part of an orchestrated saga, this can be seen as a failed step to trigger compensating actions).

Azure Service Bus sessions

Azure Service Bus is Microsoft’s asynchronous message broker offering and enables async communication through its queues and topics. Sessions, in conjunction with queue/topic subscriptions, can be used to ensure a message is delivered to whoever holds the current session lock. Each session has a unique session ID which can be used to request a lock for a particular queue or topic subscription. This has many applications, with the most common ones being:

  • Request/Reply: When a service holds a session lock, we are controlling the dataflow to ensure that messages with that session ID are processed by the holder of that lock. This can be leveraged to achieve request/reply semantics by first sending a request with a reply address (the reply session ID) and then holding a lock for that session (ready to receive a reply).
  • Enforcing queue FIFO: When scaling out consuming applications, there is no guarantee messages in a queue are processed in the same order they are added. By using sessions we can ensure that all messages associated with a session will be processed by the same consumer instance in the order they were added to the queue.
  • Saga orchestrator with session state persistence: Session state is a binary object that is accessible to the instance holding the session lock. This has a range of applications, including being utilised as the datastore for orchestrated saga persistence.
Code snippet utilising sessions to enforce FIFO.
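
A minimal sketch of that FIFO enforcement might look like the following, assuming a session-enabled queue; the queue name, connection string, and concurrency settings are placeholders (senders must also stamp each message with a SessionId for this to apply):

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class FifoConsumer
{
    public static void Run(string connectionString)
    {
        var queueClient = new QueueClient(connectionString, "orders-queue");

        var options = new SessionHandlerOptions(args =>
        {
            Console.WriteLine($"Handler error: {args.Exception.Message}");
            return Task.CompletedTask;
        })
        {
            // Different sessions are processed in parallel, but each session
            // is pumped by one consumer at a time, preserving its order.
            MaxConcurrentSessions = 8,
            AutoComplete = false
        };

        queueClient.RegisterSessionHandler(async (session, message, cancellationToken) =>
        {
            Console.WriteLine(
                $"Session {session.SessionId}: {Encoding.UTF8.GetString(message.Body)}");

            // Complete the message so it is not redelivered.
            await session.CompleteAsync(message.SystemProperties.LockToken);
        }, options);
    }
}
```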

Adapter implementation example

Here, we will go through an adapter implementation which operates an async request/reply within a sync request/response cycle. This is designed to illustrate how to minimise the impact sync integrations can have on your internal application. The example is built in .NET Core and uses Azure Service Bus as the message broker. We will be using Microsoft.Azure.ServiceBus to integrate with Azure Service Bus and Microsoft.Extensions.DependencyInjection to manage our service dependencies.

The full implementation for the snippets shown can be found on my GitHub page, along with a quick-start guide to get this running yourself. Please note this is not an Azure Service Bus guide, we are just using it to illustrate the dataflow for this approach.

This example has two services:

  • Producer API: This exposes two HTTP endpoints, each handling async request/reply with the consumer service. This API sends async requests to an Azure Service Bus queue and listens for a timely correlated reply on a separate queue. If no timely reply arrives, any subsequent reply will be handled by a fallback listener.
  • Consumer service: This service listens to the Azure Service Bus request queue and sends a reply using the specified reply address. This simulates an internal async application which can be made up of any number of services (as long as they interact with the request/reply queues).
Async inter-service communication executed within sync HTTP cycle using an adapter.

Producer API service

This service represents an API gateway/public API which is consumed through sync HTTP endpoints and triggers async behaviours internally.

The management of the producer API’s internal dependencies is handled using Microsoft.Extensions.DependencyInjection and is configured using the AddServiceDependencies() function. This function is executed on start-up and is responsible for:

  • Configuring the fallback handler (line 5–13): Requires a QueueClient instance to receive reply messages in the timeout scenario.
  • Configuring the service logic (line 15–24): Requires a QueueClient to send request messages and a SessionClient to receive the correlated replies from the reply queue (a condensed sketch of this registration follows).
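
A condensed sketch of what this registration might look like is below; the queue names and wrapper types are assumptions for illustration, not the exact GitHub implementation:

```csharp
using Microsoft.Azure.ServiceBus;
using Microsoft.Extensions.DependencyInjection;

// Thin wrapper types so DI can distinguish the two queue clients
// (hypothetical; the full implementation may wire this differently).
public sealed class RequestQueue
{
    public RequestQueue(IQueueClient client) => Client = client;
    public IQueueClient Client { get; }
}

public sealed class ReplyQueue
{
    public ReplyQueue(IQueueClient client) => Client = client;
    public IQueueClient Client { get; }
}

public static class ServiceCollectionExtensions
{
    public static IServiceCollection AddServiceDependencies(
        this IServiceCollection services, string connectionString)
    {
        // Fallback handler dependency: a client on the reply queue to
        // receive replies whose sync cycle has already timed out.
        services.AddSingleton(new ReplyQueue(
            new QueueClient(connectionString, "reply-queue")));

        // Service logic dependencies: a sender for the request queue and a
        // session client used to lock reply sessions on the reply queue.
        services.AddSingleton(new RequestQueue(
            new QueueClient(connectionString, "request-queue")));
        services.AddSingleton<ISessionClient>(
            new SessionClient(connectionString, "reply-queue"));

        return services;
    }
}
```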

The definition of the producer API’s endpoints is handled in the TestController class. This API has two HTTP GET endpoints which can be used to trigger one or more async requests in varying circumstances (a sketch of the controller follows the list).

  • Single request “/API/Test/{bool: true/false}”: This can be used to send a single request to the consumer service, with a “waitForReply” flag to control whether we listen for a reply or leave it for the fallback handler.
  • Batch request “/API/Test/batch/{int}”: This can be used to send a batch of async requests and wait for a response for each. The int parameter can be used to control the number of messages sent in a batch.
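
A condensed sketch of the controller is below; the IRequestReplyService interface is an assumption based on this walkthrough, standing in for the service described next:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

// Hypothetical abstraction over the request/reply adapter logic.
public interface IRequestReplyService
{
    Task<string> SendAsync(string content, bool waitForReply);
}

[ApiController]
[Route("API/[controller]")]
public class TestController : ControllerBase
{
    private readonly IRequestReplyService _requestReply;

    public TestController(IRequestReplyService requestReply) =>
        _requestReply = requestReply;

    // GET /API/Test/{true|false}
    [HttpGet("{waitForReply:bool}")]
    public async Task<IActionResult> Single(bool waitForReply)
    {
        var reply = await _requestReply.SendAsync("ping", waitForReply);
        if (waitForReply && reply == null)
            return StatusCode(504); // no timely reply: report a timeout
        return Ok(reply ?? "request sent");
    }

    // GET /API/Test/batch/{count}
    [HttpGet("batch/{count:int}")]
    public async Task<IActionResult> Batch(int count)
    {
        var replies = new List<string>();
        for (var i = 0; i < count; i++)
            replies.Add(await _requestReply.SendAsync($"ping-{i}", waitForReply: true));
        return Ok(replies);
    }
}
```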

The adapter logic for sending the async request and consuming the reply is implemented in the RequestReplyService. This service is responsible for adding async requests to the request queue and optionally waiting for a correlated reply by listening to the reply queue with a timeout of 10 seconds.

This request/reply functionality follows these steps (a condensed sketch follows the list):

  1. Locking the reply session (line 14–16): Acquire a lock using a unique session ID. While we have that lock, any messages in the reply queue with that session ID will be handled here.
  2. Building message (line 20–26): Construct initial request message with the content object as the payload and specify the session ID (the one we have a lock on) for the correlated reply.
  3. Sending request (line 28–30): Add the message to the request queue, ensuring it doesn’t participate in any unintended transient transaction (not strictly needed in this case, but often a good idea).
  4. Waiting for a reply (line 32–40): In cases where we want to wait for the reply message, we wait 10 seconds for a reply. If no reply was received within that timeout, we exit early indicating that no reply was received.
  5. Handling reply (line 42): We take the body of the reply and parse that into a string. This is what we use to construct our closing HTTP response.
  6. Completing message (line 43): We mark the reply as completed which ensures it won’t be reprocessed. This action essentially deletes the message from the reply queue.
  7. Releasing session lock (line 50): We release the session lock, as we expect no more replies. In the case where we do get another reply with the same session ID after this lock is released, this will be received by the fallback handler.
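
A condensed sketch of the RequestReplyService is below. The line numbers above refer to the full implementation on GitHub; the shapes here are simplified assumptions following the same steps:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public class RequestReplyService : IRequestReplyService
{
    private readonly IQueueClient _requestQueue;
    private readonly ISessionClient _replySessionClient;

    public RequestReplyService(IQueueClient requestQueue, ISessionClient replySessionClient)
    {
        _requestQueue = requestQueue;
        _replySessionClient = replySessionClient;
    }

    public async Task<string> SendAsync(string content, bool waitForReply)
    {
        // 1. Lock the reply session under a unique ID before sending, so a
        //    fast reply cannot slip past us to the fallback handler.
        var replySessionId = Guid.NewGuid().ToString();
        var session = await _replySessionClient.AcceptMessageSessionAsync(replySessionId);

        try
        {
            // 2. Build the request, stamping the reply address.
            var request = new Message(Encoding.UTF8.GetBytes(content))
            {
                ReplyToSessionId = replySessionId
            };

            // 3. Send the request to the request queue.
            await _requestQueue.SendAsync(request);

            if (!waitForReply) return null;

            // 4. Wait up to 10 seconds for the correlated reply;
            //    ReceiveAsync returns null if the timeout elapses.
            var reply = await session.ReceiveAsync(TimeSpan.FromSeconds(10));
            if (reply == null) return null;

            // 5. Parse the reply body for the closing HTTP response.
            var body = Encoding.UTF8.GetString(reply.Body);

            // 6. Complete the reply so it is removed from the reply queue.
            await session.CompleteAsync(reply.SystemProperties.LockToken);
            return body;
        }
        finally
        {
            // 7. Release the session lock; any later reply with this session
            //    ID will now be picked up by the fallback handler.
            await session.CloseAsync();
        }
    }
}
```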

The configuration of the fallback reply handler is done within the FallbackHandler. The ExecuteAsync() function is executed a single time on startup and serves to configure the fallback message handler logic. This handler is responsible for receiving all messages from the response queue that do not have an active session open (i.e. nothing currently holds a lock for that session/address ID). This is commonly used in timeout situations where the async reply arrived after the sync cycle timed out (resulting in the closure of the session).

On start-up, we first register a session handler (line 14–37) to be executed when we receive a fallback reply. The logic of this session handler follows these steps (sketched after the list):

  1. Parsing message (line 19): Parse the reply payload into a string. If the message was a complex type, this would be where we would de-serialise into an object with that type.
  2. Processing message (line 21–24): In this example, our “process logic” is simply writing some details to the console, however, some common things to add here are alerting/logging or triggering rollbacks.
  3. Completing message (line 27): We mark the message as completed which ensures it won’t be reprocessed.
  4. Completing the session (line 31): We release the session lock.
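
A condensed sketch of the FallbackHandler is below; hosting it as a BackgroundService and the exact wiring are assumptions based on this walkthrough:

```csharp
using System;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;
using Microsoft.Extensions.Hosting;

public class FallbackHandler : BackgroundService
{
    private readonly IQueueClient _replyQueue;

    public FallbackHandler(IQueueClient replyQueue) => _replyQueue = replyQueue;

    protected override Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var options = new SessionHandlerOptions(args =>
        {
            Console.WriteLine($"Fallback handler error: {args.Exception.Message}");
            return Task.CompletedTask;
        })
        { AutoComplete = false };

        // Receives replies whose session has no active scoped listener,
        // i.e. the sync cycle has already timed out.
        _replyQueue.RegisterSessionHandler(async (session, message, cancellationToken) =>
        {
            // 1. Parse the reply payload.
            var body = Encoding.UTF8.GetString(message.Body);

            // 2. "Process" it: here we just log, but alerting or
            //    compensating actions are common additions.
            Console.WriteLine($"Unclaimed reply on session {session.SessionId}: {body}");

            // 3. Complete the message so it is not reprocessed.
            await session.CompleteAsync(message.SystemProperties.LockToken);

            // 4. Release the session lock.
            await session.CloseAsync();
        }, options);

        return Task.CompletedTask;
    }
}
```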

Consumer service

This service represents an internal application made up of multiple services which interact with a single request and reply queue. In this example, the internal application is approximated as a single service (with N instances) which interacts with both queues.

Similarly to the producer API, the management of this service’s internal dependencies is handled using Microsoft.Extensions.DependencyInjection and is configured using the AddServiceDependencies() function. This function is executed on start-up and is responsible for:

  • Configuring a request handler (line 5–14): Requires two dedicated QueueClient instances to facilitate receiving requests and sending replies.

The configuration of the request handler logic is done within the RequestReplyHandler. The configured handler is responsible for receiving requests from the request queue and sending a reply to the reply queue.

The logic for the configured request handler follows these steps (sketched after the list):

  1. Parsing message (line 18–20): We take the body of the request and parse that into a string and determine the session ID to use for the reply.
  2. Processing message (line 22–25): We create the reply content and log a message to console. Some common things to add here would be a database query or sending a message to another service.
  3. Building reply (line 27–33): Using the session ID and the message content we create the reply message.
  4. Sending reply (line 35): Add the reply message to the reply queue with the desired session ID.
  5. Completing request (line 36): Mark the initial request as complete, ensuring it won’t be reprocessed.
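
A condensed sketch of the RequestReplyHandler is below; names and wiring are illustrative assumptions rather than the exact GitHub implementation:

```csharp
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public class RequestReplyHandler
{
    private readonly IQueueClient _requestQueue;
    private readonly IQueueClient _replyQueue;

    public RequestReplyHandler(IQueueClient requestQueue, IQueueClient replyQueue)
    {
        _requestQueue = requestQueue;
        _replyQueue = replyQueue;
    }

    public void Register()
    {
        var options = new MessageHandlerOptions(args =>
        {
            Console.WriteLine($"Handler error: {args.Exception.Message}");
            return Task.CompletedTask;
        })
        { AutoComplete = false };

        _requestQueue.RegisterMessageHandler(async (request, cancellationToken) =>
        {
            // 1. Parse the request and pick up the reply address.
            var content = Encoding.UTF8.GetString(request.Body);
            var replySessionId = request.ReplyToSessionId;

            // 2. "Process" the request; a real service might query a
            //    database or message another service here.
            Console.WriteLine($"Handling request: {content}");

            // 3. Build the reply, addressed to the caller's session.
            var reply = new Message(Encoding.UTF8.GetBytes($"Processed: {content}"))
            {
                SessionId = replySessionId
            };

            // 4. Send the reply to the reply queue.
            await _replyQueue.SendAsync(reply);

            // 5. Complete the request so it is not reprocessed.
            await _requestQueue.CompleteAsync(request.SystemProperties.LockToken);
        }, options);
    }
}
```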

The path forward

With the gateway adapter in place, we have established a means to synchronously integrate with our internal async application. Now our concerns around customer integrations are isolated to the gateway/adapter and not the internal application, giving us more freedom when considering architectural changes/designs. When considering an architecture design for your internal application, there are two main approaches:

  • Organisation First: This emphasises the effect organisational implications have when considering architectural changes. This can often lead to extending the current pattern to minimise the organisational impact.
  • Solution First: This describes an approach which prioritises the best solution when considering architectural changes, independent of the organisational implications.

Often these approaches are used in conjunction and represent some of the different forces at play when designing an application architecture. The key is finding a balance where we take a Solution First approach as far as we feasibly can and compromise to minimise the negative organisational implications.

With our changes to make the internal application integrations async, we can iteratively incorporate async behaviours without revising external integrations. The internal application can be simplified as a black box which interacts with a single request/reply queue. This gives us more flexibility in revising the application architecture and also supports iteratively implementing those changes. These are important considerations when looking from either an Organisation First or Solution First perspective.

Message broker dependency

A consideration when utilising a message broker is the overhead of maintaining its infrastructure. There are often various configuration options available for message brokers, and they can sometimes be complex to navigate effectively. When using a message broker for inter-service messaging, this means you depend on:

  • Availability: Inter-service messaging depends on the availability of the message broker, meaning we must ensure that it is highly available. Many message brokers have SLAs describing their availability (e.g. the Azure Service Bus SLA).
  • Latency (potentially): Operating async behaviours within a sync request/response cycle means we rely on a timely async reply to avoid a timeout for the HTTP request. It’s common for message brokers not to specify latency in their SLAs, so using a message broker like this does incur risk. However, there are steps we can take to minimise (not remove) this risk.

Emphasis on performance

Async applications are typically designed to be eventually consistent, meaning that, given time, the application will reach a consistent state. Because of this, the concept of timeouts or timely responses isn’t natively addressed. With the adapter example, we implemented the timeout logic in the adapter (close to the gateway), which corresponds to the expectation that the internal application completes within that time. The internal application itself has no concept of the gateway’s timeout restriction. This emphasises two important considerations when building your internal application:

  1. Performance: We must closely consider the performance of our internal application and ensure that it operates within the expectations of our integrations.
  2. Visibility and consistency: We must be able to closely monitor our performance over time and have processes to ensure changes are optimised. In cases where our internal application completes after the timeout, we must appropriately handle this situation (e.g. trigger compensating actions to rollback).

I won’t dive into all of the options we have when designing for performance/consistency in this article, but some of the common approaches involve (a batching sketch follows the list):

  • Client-side batching: Batching messages client-side to reduce the number of calls to the message broker.
  • Consumer concurrency and pre-fetching: Configuring consumer services to receive N messages in parallel and configuring pre-fetching to load messages into memory for faster access.
  • Batching broker actions: Batching interactions with the message broker to minimise calls. Example interactions are sending/completing (deleting) messages.
  • Scaling out messaging entities: Multiple instances of messaging entities (e.g. queues/topics) for higher throughput.
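
To make the first of these concrete, a small sketch of client-side batching is below; it assumes an already-configured queue client:

```csharp
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class BatchSendExample
{
    public static async Task SendBatchAsync(
        IQueueClient queueClient, IEnumerable<string> payloads)
    {
        var batch = new List<Message>();
        foreach (var payload in payloads)
            batch.Add(new Message(Encoding.UTF8.GetBytes(payload)));

        // One broker round-trip for the whole batch instead of one per
        // message (subject to the broker's maximum message/batch size).
        await queueClient.SendAsync(batch);
    }
}
```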

Conclusion

Effectively adopting async messaging into your system can be a challenge, especially when navigating existing internal/external integrations. Ideally, we’d simply re-implement or remove these integrations; however, if that is not an option, utilising a sync/async adapter could give you a path forward to incrementally change your internal application.

To those of you looking to adopt async messaging into your application, I wish you good luck and hope this helped you in some way. Thanks for reading!
