Episode 8: The one with Asynchronous Distributed Systems (Part 1).

Concurrency and Distributed Systems: A guide with Kubernetes

--

So far we have been studying synchronous distributed systems: the client sends a request and waits for a response. On the server side, the servers respond to requests immediately, over a communication link established between the client and the server. That is one way for clients and servers to communicate. Another popular way is asynchronous communication.

In asynchronous communication, the client sends a request to the server and the server, instead of producing the actual response right away, sends back an acknowledgment confirming that it has received the request. The server queues the request for later processing. The client then polls the server for the actual response. The server might keep answering with something like "come back later", but eventually it will return the final response. There is no global clock by which the components coordinate. In short, synchronous communication is blocking, while asynchronous communication is non-blocking.

Synchronous vs Asynchronous Communication

As you can imagine, asynchronous communication is a bit more complicated than synchronous communication. It requires the client and server to follow a protocol, and it is not as straightforward as the synchronous model. That is not to say that synchronous systems have no protocol. They do, but it is usually hidden behind operating system calls. In distributed systems, nothing is truly synchronous; we just create an illusion of synchrony by waiting.

Asynchronous distributed systems are prevalent in business and have their own advantages. Asynchrony removes the pressure of immediate execution from the server. The server can take its time (while still respecting the SLA) and respond when it can. Facebook’s famous query engine, Presto, is asynchronous.

Note that many services may look synchronous when you deal with them, but they are in fact asynchronous in their implementation. Client libraries can hide the asynchronous nature of the server and make it look like the server is synchronous. They do that by busy-waiting or blocking IO. Hence, using libraries, we can encapsulate the implementation details and make it simpler (effectively synchronous) for clients to communicate with asynchronous systems.

Blocking Client Library
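
To make this concrete, here is a minimal sketch of such a blocking client library in Python. It assumes the /sumPrime API spec presented later in this episode; the class name, the parameters, and the use of the requests package are illustrative assumptions, not the actual implementation.

import time
import requests  # assumed HTTP client; any other would do

class SumOfPrimesClient:
    # A blocking facade over the asynchronous /sumPrime API.
    def __init__(self, base_url, poll_interval_secs=1.0, timeout_secs=180):
        self.base_url = base_url
        self.poll_interval_secs = poll_interval_secs
        self.timeout_secs = timeout_secs

    def sum_primes(self, a, b):
        # Submit the request; the server acknowledges with a requestId.
        ack = requests.post(f"{self.base_url}/sumPrime", json={"a": a, "b": b}).json()
        request_id = ack["requestId"]
        # Block the caller by polling until the server reports FINISHED.
        deadline = time.time() + self.timeout_secs
        while time.time() < deadline:
            body = requests.get(f"{self.base_url}/sumPrime/{request_id}").json()
            if body["status"] == "FINISHED":
                return body["payload"]
            time.sleep(self.poll_interval_secs)
        raise TimeoutError(f"request {request_id} did not finish in time")

The caller simply writes SumOfPrimesClient("http://host:port").sum_primes(1, 100) and never sees the polling underneath.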

Asynchronous distributed systems are more appropriate for non-interactive use-cases. They are not necessarily bad for interactive or dashboarding scenarios, but they could pose some challenges.

In this episode we want to see how we can design and implement an asynchronous version of the Sum of Primes service.

Architecture

Every asynchronous distributed system follows a more or less generic architecture. Obviously, due to business needs, more components can be added to the system, but with respect to asynchrony, they all look like this:

Asynchronous System Architecture

The major new components in this architecture are:

  • Request queue: to queue the requests for processing later.
  • Response store: to store the responses to be served to clients when they poll.
  • Progress service: to inform clients about the progress or status of their requests.

The rest of the components are always there in a distributed system. However, these three components are optional. They do exist in any asynchronous distributed system in one form or another, but their full-fledged presence is not essential to the functioning of the system.

Queue

The implementation of these components can vary significantly depending on what kind of reliability we want from the system or what our SLA is. For example, we could leverage an in-memory queue or a fully-fledged queue service (SQS, Kafka, etc.). Obviously, using something like SQS is a must if we want a very reliable and fault-tolerant system. But if we relax the reliability requirements, we can get away with an in-memory (ephemeral) queue. With that design choice, clients will have to retry in case of any server failure.
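
As a sketch of the relaxed-reliability option, here is what an in-memory (ephemeral) queue drained by a background worker might look like in Python; the process_request function and the wiring are illustrative assumptions.

import queue
import threading

request_queue = queue.Queue()  # lives in process memory; lost on a crash

def process_request(request):
    pass  # compute the result and write it to the response store

def worker():
    while True:
        request = request_queue.get()  # blocks until a request arrives
        try:
            process_request(request)
        finally:
            request_queue.task_done()

threading.Thread(target=worker, daemon=True).start()

If the process dies, everything in request_queue is gone, which is exactly why clients must retry under this design.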

Response Store

Moreover, we can store responses in an in-memory cache versus a persistent store such as a relational database, a NoSQL store, or simply a blob store such as S3. Either of these decisions will have its own implications for the system’s reliability. If we store responses in memory, we have two limitations:

  • The number of responses that can be stored is limited by memory. Hence we have to set a time-to-live (TTL) on responses.
  • A server failure will result in re-computation of the associated requests (through client retries or server auto-retries).

With a persistent response store, we have far fewer limitations on the number of responses that we can keep. Also, we can afford a much longer TTL.
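
For illustration, a minimal in-memory response store with a TTL might look like the following Python sketch (the class and method names are assumptions). Entries expire lazily on read, which is the simplest way to honor the TTL.

import threading
import time

class InMemoryResponseStore:
    def __init__(self, ttl_secs=180):
        self.ttl_secs = ttl_secs
        self._entries = {}  # requestId -> (expiry timestamp, response)
        self._lock = threading.Lock()

    def put(self, request_id, response):
        with self._lock:
            self._entries[request_id] = (time.time() + self.ttl_secs, response)

    def get(self, request_id):
        with self._lock:
            entry = self._entries.get(request_id)
            if entry is None:
                return None
            expiry, response = entry
            if time.time() > expiry:
                del self._entries[request_id]  # expired: evict and miss
                return None
            return response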

Progress Service

The progress service can be a first-class component in the design, or it can be hidden inside the server’s implementation. It really depends on the requirements and on whether we want to lift the pressure of polling from our servers.

Load Balancer

On the load-balancer side of the story, asynchronous systems create new challenges. Since the communication channel is non-blocking and has to be re-established again and again, the load-balancer might have to take client-server affinity into account, depending on the decisions we made earlier in the architecture. For example, if responses are stored in the memory of each server, the load-balancer has to know which server to route a client’s poll to. This is usually done by the load-balancer using a cookie stored in the request’s headers. But if the responses are stored in a persistent database, any server instance can return the response and there is no need for an affinity mechanism.
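
As a concrete example in the Kubernetes setting this series uses, the NGINX ingress controller supports cookie-based session affinity through annotations. The sketch below makes assumptions about deployment details (names, paths, and ports are illustrative), and is not part of our requirements:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: sum-of-primes
  annotations:
    # Pin each client to one backend instance using a cookie.
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "server-affinity"
spec:
  rules:
    - http:
        paths:
          - path: /sumPrime
            pathType: Prefix
            backend:
              service:
                name: sum-of-primes
                port:
                  number: 80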

To summarize, the requirements and design decisions around these components will impact various details of the implementation and of the client/server interaction.

Throttling

Similar to the synchronous model, clients can still abuse or misuse the system, here by polling excessively for responses. To remedy that, we have a few options:

  • Develop a client library which will take care of sending requests and receiving responses. This library would decide how often it should poll and follow a safe protocol when communicating with the servers.
  • Implement throttling on the server side.

Implementing at least one of these options is a must if we want a secure and reliable system.
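
For the server-side option, a classic technique is a token bucket per client. Here is a minimal Python sketch; the rate, the capacity, and how clients are identified are all assumptions for illustration.

import threading
import time

class TokenBucket:
    def __init__(self, rate_per_sec=1.0, capacity=5):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.last_refill = time.monotonic()
        self._lock = threading.Lock()

    def allow(self):
        with self._lock:
            now = time.monotonic()
            # Refill proportionally to the time elapsed since the last call.
            elapsed = now - self.last_refill
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.last_refill = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

The server would keep one bucket per client (e.g., per API key) and reject polls with HTTP 429 Too Many Requests whenever allow() returns False.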

Sum of Primes Asynchronous Distributed System

Requirements

We need the same functional requirements that we presented in episode 1. But with respect to the asynchronous model, we have the following (non-functional) requirements:

  • Simple replicated servers (no workers)
  • Simple load-balancer (no affinity)
  • TTL of 3 minutes for responses
  • Reliability requirements are loose: clients should retry after 3 minutes in case of any server failure.
  • No progress service. Any server should be able to provide responses or their statuses.

With those requirements in mind, let’s look at the architecture of the system:

Sum of Primes Asynchronous System Architecture

As we can see in the above figure, we have omitted the progress service and the queue, because this simpler architecture already satisfies our requirements.

For the response store we can use Redis, MongoDB, or CouchDB. Once a server receives a request, it creates an entry in the response store keyed by the request id. Initially, it sets the status of the entry to IN_PROGRESS, like so:

a6f39925-4234-450e-a25b-cc3991c68d77 -> {
  "requestId": "a6f39925-4234-450e-a25b-cc3991c68d77",
  "status": "IN_PROGRESS",
  "payload": null
}

And it returns this response to the client immediately. Using the provided requestId, the client can poll for the response later.
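
Here is a sketch of that submit path in Python, assuming Redis as the response store (via the redis-py client) and the in-memory queue sketched earlier; the function and variable names are illustrative.

import json
import queue
import uuid
import redis  # assumes the redis-py client

store = redis.Redis(host="localhost", port=6379)
request_queue = queue.Queue()  # drained by the worker threads

def accept_request(a, b):
    request_id = str(uuid.uuid4())
    entry = {"requestId": request_id, "status": "IN_PROGRESS", "payload": None}
    # TTL of 3 minutes, per our requirements.
    store.set(request_id, json.dumps(entry), ex=180)
    request_queue.put((request_id, a, b))  # hand the work off to a worker
    return entry  # acknowledged to the client immediately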

API Spec

Here is the API spec for this service in an asynchronous model:

POST /sumPrime
BODY:
{
  a: Positive integer
  b: Positive integer
}
RESPONSE:
{
  requestId: String
  status: String
  payload: String
}

In order to poll for the status of the response, one should use the following GET endpoint:

GET /sumPrime/{requestId}
RESPONSE:
{
  requestId: String
  status: String
  payload: String
}

The client must poll until the status is FINISHED. Once that happens, the payload field will contain the actual response.

The poll request can go to any server instance, since the response and its status are stored in the response store under the requestId.
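
If a client polls by hand rather than through a library, exponential backoff keeps the poll pressure on the servers low. A minimal sketch, with illustrative parameters and the assumed requests package:

import time
import requests  # assumed HTTP client

def poll_until_finished(base_url, request_id, max_wait_secs=180):
    delay = 0.5
    deadline = time.time() + max_wait_secs
    while time.time() < deadline:
        body = requests.get(f"{base_url}/sumPrime/{request_id}").json()
        if body["status"] == "FINISHED":
            return body["payload"]
        time.sleep(delay)
        delay = min(delay * 2, 10.0)  # back off, up to 10s between polls
    raise TimeoutError(f"request {request_id} did not finish in time")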

Here we can see the value of client libraries in asynchronous systems: they can hide these implementation details from the client and make it look like we are dealing with a synchronous one.

In the next part of this episode we will go through the implementation details and how to use Kubernetes to deploy our system.

--