Distributed API Platform

In Socialbakers we have many teams that do similar tasks or cooperate on similar ideas. We have a team that grabs and stores all the raw data from the social networks. There is a team that aggregates and calculates metrics on top of all those gathered data. We have product teams developing specific products: Analytics, Builder etc. We have a data mining team that does research based on those data. There are teams that develop per-customer solutions with attention to the unique needs of each customer. We have single-purpose websites that display some data in a really specific manner (Engage conference, US elections, Cheermeter for Olympic Games, etc…). And all these teams need to cooperate, exchange data, identify customers — simply said, they need to communicate.

By communication we mean something like this — about 5 million requests per hour.

To achieve this goal, we have developed a scalable system that is used across all the products, websites and internal tools. It is a REST API platform (in corporate speech we can call it ESB — Enterprise Service Bus) which consists of two essential parts:

Clients (workers): individual microservices that provide some functionality through REST endpoints

Brokers: Services which are controlling the whole system. Because we mostly use node.js, this platform is also written in JavaScript (CoffeeScript).

HERA

For simplicity and avoiding confusion when talking about this platform we had to find a suitable name for this project. Hera came to our mind as it is a brand of a butter used for baking, with a nice slogan “Baking is fun, Hera is baking” — and we are Socialbakers, right…? Another rumor says, that it’s an abbreviation of Hate Everything Right Away. But since we are a friendly, non-sarcastic collective of happy developers, this definitely can’t be true.

Brokers

Brokers are the central part of Hera. Because we need the whole system to be HA (Highly Available), there are many instances of a broker process (currently there are 10 of them, one per each application server) and they are synchronized via ZooKeeper. Upon the start, they register themselves as ephemeral nodes where they store their IP address, port and supported protocol. When any client (API) wants to connect to the platform, it queries the ZooKeeper for all currently running brokers and connects to all of them.

The main purpose of the broker is to keep track of all connected workers and to build a routing table to their endpoints. Broker itself then provides a public interface for accessing these endpoints. When a request comes to a broker, it finds a client that is able to serve it and forwards the request. The broker also tracks the status of each worker (RSS, CPU, ping) and takes into consideration all these factors when forwarding the request.

Clients

As mentioned above, client upon the start fetches all the brokers from the ZooKeeper and connects to them. After that, during the handshake procedure with the broker, it sends informations about itself — i.e. endpoints that the client is able to serve. Then, every 2 seconds the client is queried by the broker to report its status. The client responds with its CPU and memory utilization so the broker can use those stats when forwards an incoming request.

HAProxy

HAProxy serves as a load balancer for incoming requests. It uses simple round-robin to route the traffic among all the brokers.

Flow of the API call

When a HTTP request comes to Hera, (it can look like: POST http://api.ccl/0/metrics-api)
First it arrives on the HAProxy. HAProxy picks one broker and proxies the request to him. Broker (which acts like standalone web server — through express.js) receives the request, validates if the call is allowed (we have a token-based authorization and monitoring of all requests), finds appropriate worker and dispatches the request to the worker. The communication between broker and worker is accomplished via JSON-RPC2 protocol. The worker receives the request, finds out what to do (fetches data from our databases, does some computation, merges data from multiple sources…) and sends the result back to the broker — as a response to the JSON-RPC request. Then, the broker adds some metadata to the worker’s response, measures the time that took worker to respond, does some logging and finally passes the response back to the caller (as a response on the initial HTTP request).

As I already mentioned, the brokers are written in CoffeeScript and we have Clients written for JavaScript and Java. Clients are written as a library (npm module for JS and Maven JAR for Java). The node.js module is written in a way that very much mimics the express.js approach. Have a look:

hera = require "@sbks/hera"
database = require "lib/database"
heraClient = hera {secret: process.env.SECRET, zk: process.env.HERA}
# Hera endpoint this worker is capable of serving
heraClient.get "/0/metrics-api", (req, res, next) ->
database.find {id: req.data.id}, (e, result) ->
return next e if e
res.json result
# register this worker on Hera, and start serving requests
heraClient.connect () ->
console.log "connected to hera"

As you can see, it’s really close to writing an express.js “frontend” application. We needed to simplify usage of this library as much as possible, so our developers could easily use it.

Evolution of Hera

We came across many not-so-good solutions and dead ends while developing this platform. For example: in the beginning, we planned to develop also a PHP Client for Hera — as we used to have all of our services and products written in PHP. Because having a long running script that would be connected to Hera and serving many tasks in parallel would be really tricky in PHP, we had an idea of each worker listening on a specific port. Afterwards, when Hera would be processing an API call, it would send the request to the PHP application (with upfront Nginx listening on that port), PHP would handle the task and respond back — like most of the PHP websites work.

We actually never got to writing the PHP Client — but the port-listening idea remained. We haven’t had Zookeeper back in that days, so all APIs were listening on some port and they just broadcasted to all brokers: “hey, I’m XY API and I’m listening on 10.11.0.10:5678”. Then, each broker initiated the connection to this worker.

This turned out to be quite pain in the ass. As we had a limited number of application servers with multiple workers running on each of them, the port number had to be unique. Each API worker (and we have hundreds of different workers) had associated one port. Those all ports were synced on our internal wiki page — so adding a new API meant that you needed to edit this wiki page (and you were keeping your fingers crossed that nobody else was adding his new API at the same time). Plus you couldn’t scale up more processes of the same worker on one application server — as you would get EARRDINUSE error.

Reversing this process — and setting up Zookeeper to keep the list of all brokers definitely simplified the process of creating new workers and overall management of Hera.

So that’s it. Hera is one of the most crucial services (besides databases) that allows our products to communicate, exchange their data and integrate together. In the next articles, we will try to cover one really important feature of Hera and that are “Remote Calls”. That’s a process, when one API worker needs to query another worker. So instead of doing a classic HTTP request that would go through the flow described above (HAProxy, brokers…), we have a way how to bypass part of this procedure. We will also write about how do we monitor Hera — response times, request counts and other metrics of both — brokers itself and individual workers.

Like what you read? Give Martin Bydzovsky a round of applause.

From a quick cheer to a standing ovation, clap to show how much you enjoyed this story.