Using Kafka as an async, load balanced, server-controlled API framework

Anoop Dixith
3 min read · Aug 23, 2020


In the traditional client-server API architecture (REST, say), the clients call the shots. When I say clients call the shots, I mean:

  1. They can send a request to the server (in other words, call the API) anytime they want,
  2. However many times they want (even DDoS-ing the server),
  3. With whatever params they want.

So the server is somewhat helpless, falling back on rule-based mechanisms like debouncing, throttling, etc. Additionally:

  1. The server does the authentication/verification too, which is added load.
  2. There’s no way for the server (except maybe via different auth tokens) to prioritize requests (say I want my emailing-app server to handle API calls for sending emergency mails before promotional mails when those requests arrive concurrently).
  3. Scaling the server is hard because it has no control over incoming requests. From the clients’ perspective, any 503 is wasted effort.
  4. A load balancer is needed as the first line of defense for the server.

Now, this whole flow could be reversed and the server made the boss by using the Actor model (say, with Akka-like frameworks). But interestingly, I was thinking (and am currently implementing) that more or less the same could be achieved by using Kafka as the API broker.

Here’s how it works: clients call an API on a server only by producing a message to a particular topic designated by the server. Basically, broker + topic is the equivalent of an API endpoint. This also means that API authentication is basically Kafka auth, so the auth problem is delegated to an already battle-tested module. The server then pulls messages according to a pre-configured priority, if any (per topic), and responds to each by sending back a message.

This way, the server is never overloaded, and there won’t be 503s caused by server load. (Sure, there will be latency, but that is solvable by scaling based on consumer offset lag in the topic.) Additionally, the load balancer is basically the server itself, since it can process the calls with whatever algorithm it wants (sampling, for example). Incidentally, this also puts the server in a better position to intercept requests, trace, monitor, etc. Furthermore, clients now have a better way to retry, since the broker holds an additional record of the event that can be utilized if needed. A minimal sketch of both sides follows.
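To make this concrete, here is a minimal sketch of both sides in Python, using the confluent-kafka client. The broker address, the topic names (api.requests.high, api.requests.low), and the JSON payload are all assumptions for illustration, not part of the design above; the point is that the server drains the high-priority topic before touching the low-priority one.

```python
import json
from confluent_kafka import Consumer, Producer

BROKER = "localhost:9092"  # assumed broker address

# --- Client side: "calling the API" is just producing to the server's topic ---
producer = Producer({"bootstrap.servers": BROKER})
producer.produce(
    "api.requests.high",  # hypothetical high-priority request topic
    value=json.dumps({"action": "send_email", "kind": "emergency"}),
)
producer.flush()

# --- Server side: pull requests at its own pace, high priority first ---
def make_consumer(topic: str) -> Consumer:
    c = Consumer({
        "bootstrap.servers": BROKER,
        "group.id": "api-server",
        "auto.offset.reset": "earliest",
    })
    c.subscribe([topic])
    return c

high = make_consumer("api.requests.high")
low = make_consumer("api.requests.low")

def next_request():
    # Drain the high-priority topic first; fall back to the
    # low-priority topic only when the high one is empty.
    msg = high.poll(timeout=0.1)
    if msg is None:
        msg = low.poll(timeout=0.1)
    if msg is None or msg.error():
        return None
    return msg

req = next_request()
if req is not None:
    # The server's actual API logic would go here (not shown)
    print(f"handling {req.topic()}@{req.offset()}: {req.value().decode()}")
```

Scaling then becomes a matter of watching consumer lag (the gap between the latest offset in the topic and the server’s committed offset) and adding server instances to the consumer group when the lag grows.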

Obviously, the biggest issue here is the server’s API responses. There are a few ways to get around this:

  1. Create ephemeral topics that the producer (the client, i.e., the API caller) subscribes to. Such a topic is so ephemeral that it serves one client, for one message; the data and the topic itself are pruned once the message is consumed. But ephemeral topics might be an overhead, and I’m not quite sure about the metrics on their overall efficiency. This is the approach I’m currently implementing, but I’m yet to assess its efficacy.
  2. The server pushes responses to a single shared response topic, which the clients subscribe to (within one call, making it synchronous if that’s a requirement). Clients use the token they sent with the request to both identify and decode the responses meant for them. Sketches of both options follow the list.
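A minimal sketch of option 1, assuming the confluent-kafka AdminClient and a hypothetical per-request reply-topic naming scheme (api.reply.&lt;token&gt;); neither is prescribed above:

```python
import uuid
from confluent_kafka.admin import AdminClient, NewTopic

admin = AdminClient({"bootstrap.servers": "localhost:9092"})  # assumed broker

token = str(uuid.uuid4())
reply_topic = f"api.reply.{token}"  # hypothetical one-client, one-message topic

# The client creates the ephemeral reply topic before sending its request...
futures = admin.create_topics(
    [NewTopic(reply_topic, num_partitions=1, replication_factor=1)]
)
futures[reply_topic].result()  # block until the topic actually exists

# ...embeds reply_topic in the request, consumes exactly one response from it
# (produce/consume as in the earlier sketch)...

# ...and finally prunes the topic along with its single message.
admin.delete_topics([reply_topic])
```

And a sketch of option 2, where every response lands on one shared topic (assumed here to be called api.responses) and each client filters by the correlation token it sent with its request:

```python
import json
import uuid
from confluent_kafka import Consumer, Producer

BROKER = "localhost:9092"  # assumed broker address
token = str(uuid.uuid4())  # correlation token, sent along with the request

producer = Producer({"bootstrap.servers": BROKER})
producer.produce(
    "api.requests.high",  # hypothetical request topic from the earlier sketch
    value=json.dumps({"token": token, "action": "send_email"}),
)
producer.flush()

consumer = Consumer({
    "bootstrap.servers": BROKER,
    "group.id": f"client-{token}",  # unique group so each client sees every response
    "auto.offset.reset": "latest",
})
consumer.subscribe(["api.responses"])  # hypothetical shared response topic

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue
    response = json.loads(msg.value())
    if response.get("token") == token:  # pick up only the response meant for us
        print("got response:", response)
        break
```

If responses must also be unreadable to other clients, the same token could double as (or derive) an encryption key, which is what “decode” would mean in option 2.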

Two common concerns raised about this design, and my responses:

  1. “The broker doesn’t contribute much because there is no broadcast”: I disagree, because with multiple clients calling multiple APIs exposed by multiple servers, broadcast forms an essential part of the network.
  2. “There is no requirement for storage”: I’d point out that this holds only as long as we treat the request-response system as the plain old request-response system. An enhanced approach that treats request-response more like an event stream has the same requirements as any other stream: replayability, stats, etc. (see the replay sketch below).
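To illustrate the replayability point: because every request is persisted in the log, the stream can be rewound and re-read at any time. A minimal sketch, reusing the assumed request topic from the earlier examples (topic name and partition are illustrative):

```python
from confluent_kafka import Consumer, TopicPartition, OFFSET_BEGINNING

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",  # assumed broker
    "group.id": "replay-audit",
    "auto.offset.reset": "earliest",
})

# Rewind to the beginning of the log and re-read every request,
# e.g., for auditing, stats, or rebuilding state after a bug fix.
consumer.assign([TopicPartition("api.requests.high", 0, OFFSET_BEGINNING)])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        break  # nothing new within the timeout: we have likely drained the log
    if msg.error():
        continue
    print(f"replayed offset {msg.offset()}: {msg.value().decode()}")
```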

Some discussions on the same:

Confluent Community Slack and Reddit
