In our last article, Alex Anikiev, Arman Rahman and I introduced a text exploration pipeline based on microservices. The pipeline contains a variety of processing steps such as extracting plain text from a document, cleaning up and canonicalizing the text, extracting insights, and so forth.
In essence, the system described in the previous article implements a pipes-and-filters architecture orchestrated by HTTP calls: each microservice receives an input document via a request, performs a stateless document transformation and then sends its output to one or more independent downstream processing steps. This architecture has several advantages, including speed of development and ease of maintenance: frameworks such as Express, Sanic or Spark enable us to quickly spin up web services, and single-responsibility processing layers require less code to be understood or re-tested when any particular processing step changes.
However, the system has one big flaw: an orchestrator process has to manage the HTTP requests to the various services and pipe data between the in-flight calls. This not only introduces a single point of failure but also makes it difficult to cleanly implement reliability features such as retries or recovery from intermittent failures. A natural way to improve system reliability is to introduce task queues between the processing steps, but this usually requires extensive code changes.
In this article we’ll show how we improved the reliability of the pipes-and-filters architecture via task queues without any service-side code changes by leveraging the ambassador container pattern, thus unlocking the best of both worlds: an easy-to-develop text processing system that is also fault tolerant.
Before diving deeper into how we made our text processing pipeline more reliable, let us first introduce the ambassador container on a conceptual level. Brendan Burns’ book Designing Distributed Systems defines the ambassador container pattern as follows:
An ambassador container brokers interactions between the application container and the rest of the world. […] The two containers are tightly linked in a symbiotic pairing that is scheduled to a single machine.
Designing Distributed Systems also provides concrete usage examples for the ambassador container pattern such as service sharding: when the application container makes a request to an external service, the ambassador handles the logic required to dispatch the request to the correct shard of the sharded service. This allows the application container to remain simple since it’s not burdened with the sharding logic.
In the example above, the ambassador container hides the sharded nature of the dependent service from the application: the application does not need to be aware that it’s talking to a sharded service, such as a Redis cluster whose data grew too large to fit into the memory of a single instance.
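To make the sharding example concrete, here is a minimal sketch of the routing logic such an ambassador might run. The shard endpoints, the FNV-1a hash choice and all names are illustrative assumptions, not taken from Designing Distributed Systems:

```csharp
class ShardingAmbassador
{
    // Hypothetical shard endpoints; in practice these would come from configuration.
    static readonly string[] Shards =
    {
        "redis-shard-0:6379",
        "redis-shard-1:6379",
        "redis-shard-2:6379",
    };

    // Route a key to a shard by hashing it modulo the number of shards,
    // so the application can keep talking to a single logical endpoint.
    static string ShardFor(string key)
    {
        uint hash = 2166136261;  // FNV-1a: a simple hash that is stable across processes
        foreach (char c in key)
        {
            hash ^= c;
            hash *= 16777619;
        }
        return Shards[hash % (uint)Shards.Length];
    }
}
```

The important property is that the hash is deterministic and shared by all ambassador instances, so every replica routes the same key to the same shard.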
Following this example, we can use a variation of the ambassador container to improve reliability for our pipes-and-filters text processing architecture by transparently adding task queues to the system. Our text processing application container remains unchanged as an HTTP service and the ambassador container takes care of the integration with a queuing system:
- The ambassador container continuously listens for messages on an input task queue.
- When a message is received from the queue, the ambassador converts it into a web request to the text processing container and executes the request.
- The ambassador container waits for the response from the text processing service and on receipt sends the appropriate messages to notify one or more downstream task queues of the result.
Note that adding the queues improves the reliability of the system by construction. For one, task queue implementations can be set up for high availability, thus removing the orchestrator as a single point of failure. Additionally, the task queues also protect against intermittent outages of the text processing containers: a message is only deleted from the queue once the processing step is complete and all downstream messages have been enqueued, so if the processing of a message crashes partway through, it will simply be retried by the next available queue listener.
We decided to implement the ambassador container pattern described in the previous section on top of Azure ServiceBus, a highly scalable, fully managed distributed task queue. We chose ServiceBus over other managed queue offerings such as Azure Storage Queues since it supports more advanced features such as dead-lettering (which automatically moves unprocessable messages to a separate dead-letter queue to avoid wasting resources on futile processing attempts) as well as geo-replication and failover.
We additionally decided to implement the ambassador container in C# since (unlike the Node SDK or Python SDK) the .NET Core SDK for ServiceBus exposes a push-style API via AMQP. This simplifies the queue integration code by removing the need to implement a message polling loop.
Setting up a simple tool that listens for ServiceBus messages, issues a POST request and forwards the response to one or more downstream queues is fairly straightforward, as demonstrated by the snippet below:
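A minimal sketch of such a tool using the Microsoft.Azure.ServiceBus package might look as follows. The queue names, the localhost service URL and the plain-text interface are illustrative assumptions:

```csharp
using System;
using System.Net.Http;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

class Ambassador
{
    static readonly HttpClient Http = new HttpClient();
    static IQueueClient inputQueue;
    static IQueueClient downstreamQueue;

    static void Main()
    {
        var connectionString = Environment.GetEnvironmentVariable("SERVICEBUS_CONNECTION_STRING");
        inputQueue = new QueueClient(connectionString, "input-queue");
        downstreamQueue = new QueueClient(connectionString, "downstream-queue");

        // Push-style API: the SDK invokes the handler for each message over AMQP,
        // so no polling loop is needed.
        inputQueue.RegisterMessageHandler(HandleMessageAsync, new MessageHandlerOptions(OnErrorAsync)
        {
            MaxConcurrentCalls = 1,
            // Complete the message manually, only after the downstream
            // messages have been enqueued.
            AutoComplete = false,
        });

        Thread.Sleep(Timeout.Infinite);
    }

    static async Task HandleMessageAsync(Message message, CancellationToken token)
    {
        // Convert the queue message into a POST request to the co-located
        // text processing container.
        var request = new StringContent(Encoding.UTF8.GetString(message.Body), Encoding.UTF8, "text/plain");
        var response = await Http.PostAsync("http://localhost:8080/process", request);
        response.EnsureSuccessStatusCode();

        // Forward the service's response to the downstream queue...
        var body = await response.Content.ReadAsByteArrayAsync();
        await downstreamQueue.SendAsync(new Message(body));

        // ...and only then remove the input message from the input queue. If the
        // process crashes before this point, the message lock expires and the
        // message becomes visible again for retry.
        await inputQueue.CompleteAsync(message.SystemProperties.LockToken);
    }

    static Task OnErrorAsync(ExceptionReceivedEventArgs args)
    {
        Console.Error.WriteLine(args.Exception);
        return Task.CompletedTask;
    }
}
```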
The snippet above will necessarily contain some domain-specific logic to convert between a ServiceBus message and an HTTP request (omitted here for brevity). However, if the text processing steps all expose a unified interface, such as “expect a posted plain text file and return a text response,” the same ambassador container can be reused for all the processing microservices. Beyond the conversion logic, the code skeleton above is fully general and enables any type of application to leverage transparent ServiceBus queues for improved reliability.
After implementing the ambassador pattern as described in the previous section, we’re now ready to deploy the container. First, we can use a simple multi-stage Dockerfile to build and run the ambassador container:
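A Dockerfile along these lines would do the job; the project and assembly names are illustrative assumptions:

```dockerfile
# Build stage: compile and publish the ambassador using the full .NET Core SDK image.
FROM mcr.microsoft.com/dotnet/core/sdk:3.1 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

# Runtime stage: only the published binaries are carried over into the
# much smaller runtime image.
FROM mcr.microsoft.com/dotnet/core/runtime:3.1
WORKDIR /app
COPY --from=build /app .
ENTRYPOINT ["dotnet", "Ambassador.dll"]
```

The multi-stage build keeps the final image small: the SDK image is only used to compile the project and is discarded afterwards.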
In this article we discussed how to leverage the ambassador container pattern and Azure ServiceBus to improve the reliability of a pipes-and-filters text processing system by transparently introducing task queues. Additionally, the pattern can also be used for leveraging the ServiceBus push-style AMQP API from languages where the ServiceBus SDK does not implement AMQP, such as Node or Python.
The approach described in this article can be reused in any system where fault tolerance and automatic retries are important but where the application code should not be burdened with these concerns.