Tired of rogue REST servers? Learn how we scrapped them by upgrading to gRPC

Stein-Erik Bjørnnes · Published in Strise · 7 min read · May 4, 2020


gRPC dealing the knockout punch to our old servers. Yes, I’m very bad at drawing

At Strise, we strive to give our customers the best possible tools to tackle their everyday challenges. To do that, we need support from the best and newest AI tools and frameworks out there. As some may have experienced, Python is often the first language to get a (semi-maintained) implementation of new AI techniques.

Unfortunately, in terms of staying ahead of change, our main backend (including the text-analysis pipeline) is written in Scala. Scala has the advantage of being able to use any Java library, but even so, we are still unable to take advantage of all new libraries as seamlessly as we would like.

The old way

To keep the main backend in Scala while still using new tools developed in other languages, we need a way for our Scala code to communicate with applications written in those languages. Until now we have done this over HTTP: the Scala application POSTs a request to an endpoint, which processes it and returns a response.
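Sketched in Python (the endpoint and payload names are illustrative; our real client is the Scala backend), the old flow was essentially this:

import requests

# The old pattern: POST a JSON payload and block until the worker replies.
response = requests.post(
    "http://predictor/predict",           # hypothetical endpoint
    json={"embedding": [0.1, 0.2, 0.3]},
    timeout=30,                           # client timeout, unrelated to the server's
)
response.raise_for_status()
prediction = response.json()["prediction"]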

This solution is rather simple on the surface, but beyond that it has several drawbacks. Many of our frustrations can be attributed to the lack of coordination between server and client. On the server side, we have often chosen a message-queue system like RabbitMQ for incoming requests. It is the choice that makes the most sense to me: if a client sends a request to the server, it should be put on a queue until a worker is available to handle it.

The old message queue system

One problem with this approach is what happens when the client has made one or several requests to the server but suddenly stops waiting for the responses. Maybe the client has a shorter timeout than the server, or maybe the client process has been shut down. The message would still be on the server's queue, since the request has not been completed yet, i.e. the server would process the request even though there is no one left to receive the response. And what happens if the server crashes before the client gets its response? The client will wait forever for a response that will never arrive.

The new way — gRPC

Fortunately, there is a better way. gRPC is a tool developed by Google that enables remote procedure calls (RPCs) across some form of network layer. You still have a client and a server as before. However, much of the useful server–client communication that goes beyond plain request/response has already been handled by the developers of the framework. The framework aims to make a remote call look the same as calling a method locally on your own machine. As mentioned, a client still calls a server, and the request is still placed in a queue to be processed when possible.

The difference is that the client and server can now keep track of each other's status far more easily than with a plain REST endpoint. For example, if a client crashes, or the deadline set by the client is exceeded, the request is automatically aborted. When a request is aborted, the server notices that the client is no longer waiting for the response, so it removes the request from the processing queue. Likewise, if the processing of a request crashes on the server side, the exception is propagated to the client without bringing down the server. This way the client knows exactly what went wrong during processing and is notified right away, instead of waiting for its timeout to expire.
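As a minimal sketch from the client side in Python, using the Predictor service defined in the next section (the module names are those grpcio-tools would generate from the .proto files; the address is illustrative):

import grpc

import commontypes_pb2
import predictor_pb2_grpc

channel = grpc.insecure_channel("predictor:50051")  # hypothetical address
stub = predictor_pb2_grpc.PredictorStub(channel)

try:
    # The deadline travels with the request; if it expires, gRPC cancels
    # the call on both the client and the server side.
    request = commontypes_pb2.Embedding(embedding=[0.1, 0.2, 0.3])
    prediction = stub.Predict(request, timeout=5.0)
except grpc.RpcError as err:
    if err.code() == grpc.StatusCode.DEADLINE_EXCEEDED:
        print("deadline exceeded; the server drops the queued request")
    else:
        # Server-side exceptions surface here immediately, with details.
        print("server error:", err.details())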

One of the biggest advantages of gRPC is that you do not have to rely on encoding and decoding objects as JSON between client and server. Each gRPC server pre-defines its methods in the proto3 language. Every method is defined with a name, an input type, and an output type; in other words, it is an interface.

This interface is then used as a base for generating code in any of the supported languages, which can be imported and used just like code you have written yourself. Using this generated code, you can construct the accepted objects and call the methods defined in the interface.

A brief example

If I were to create a server to serve a model predicting an article’s category (economy, politics, sport, …) given a BERT embedding, the interface could look like this:

syntax = "proto3";import "commontypes.proto";service Predictor {
rpc Predict(Embedding) returns (Prediction) {}
}

Here the method “Predict” accepts an object of type “Embedding” and returns a “Prediction”. The types “Embedding” and “Prediction” are defined in another interface so that the types can be shared among services.

syntax = "proto3";

message Embedding {
  repeated double embedding = 1;
}

message Prediction {
  int32 prediction = 1;
}

This interface defines the types “Embedding” and “Prediction” and says that they are a list of doubles and an integer, respectively.

Using the code generated from these interfaces, you no longer have to deal with encoding and decoding JSON, as the gRPC implementation in your chosen language handles all serialization for you! A server implementing the interface could look like the sketch below.
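Here is a minimal Python server for the Predictor interface above. The module names are what grpcio-tools would generate from predictor.proto and commontypes.proto, the port is illustrative, and the model itself is a placeholder:

from concurrent import futures

import grpc

import commontypes_pb2
import predictor_pb2_grpc


def predict_category(embedding):
    # Placeholder for the real model; returns a category index.
    return 0


class PredictorServicer(predictor_pb2_grpc.PredictorServicer):
    def Predict(self, request, context):
        # request.embedding is already a list of doubles; no JSON in sight.
        category = predict_category(request.embedding)
        return commontypes_pb2.Prediction(prediction=category)


server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
predictor_pb2_grpc.add_PredictorServicer_to_server(PredictorServicer(), server)
server.add_insecure_port("[::]:50051")
server.start()
server.wait_for_termination()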

What are the implications?

Another massive advantage of gRPC is that it uses HTTP/2, with its long-lived connections. With the previous approach, a client would initiate a new connection to the server for every request it made. Now the client can reuse the same connection for every request until the server is shut down. This is also part of how clients can know that the server has gone down while they are waiting for a response (the connection is broken).
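As a sketch, such a channel is created once at startup and reused for every call; standard gRPC channel arguments add HTTP/2 keepalive pings so a broken connection is detected quickly (the target address is illustrative):

import grpc

# Created once and shared by all stubs for the lifetime of the process.
channel = grpc.insecure_channel(
    "predictor:50051",
    options=[
        ("grpc.keepalive_time_ms", 30_000),    # ping the server every 30 s when idle
        ("grpc.keepalive_timeout_ms", 5_000),  # consider it down after 5 s without a reply
    ],
)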

Our applications run in Kubernetes, where we define each application as a microservice and deploy it to the cloud. Kubernetes can detect that a microservice is under heavy load and create another replica to share that load. With long-lived connections in mind, this leads to a new issue when load balancing between multiple replicas of the same service: the built-in load balancer in Kubernetes can only balance load by routing new connections to the replica with the lowest load.

So when a new replica of a microservice is spun up in response to high load on the existing ones, clients already connected to those replicas will never use the new one, since Kubernetes only load-balances new connections. One workaround could be to re-establish all current connections whenever an extra replica is created, or to give connections a max age of a few minutes, as sketched below.
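The max-age workaround is supported by standard gRPC server arguments; a sketch in Python:

from concurrent import futures

import grpc

# The server gracefully closes connections after ~5 minutes, forcing
# clients to reconnect and pass through Kubernetes' connection-level
# load balancing again.
server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=4),
    options=[
        ("grpc.max_connection_age_ms", 5 * 60 * 1000),
        ("grpc.max_connection_age_grace_ms", 30_000),  # let in-flight RPCs finish
    ],
)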

However, we want to keep connections alive for as long as possible, since establishing connections is expensive. The solution to this problem is to create a so-called service mesh with, for example, Istio or Linkerd. In essence, the service mesh lets us connect to a service-mesh proxy instead. This proxy maintains lasting connections to clients and servers and distributes requests from clients across all server replicas!

The somewhat simplified new setup, with load-balancing managed by Linkerd

This is very neat, as the client never has to worry about anything other than connecting to a single address and sending its requests there. To learn more about how to set this up, check out these posts: gRPC Load Balancing on Kubernetes without Tears (Linkerd) and Using Istio to load-balance internal gRPC services (Istio).

Conclusion

I have to admit upgrading to gRPC has been challenging. It took some hours of tweaking to get it running stably in our high-performance pipeline. But it was definitely worth it! We are left with more transparent and predictable services, with much better error handling and logging. And to top it off, the same services run faster and use resources more efficiently.

In the startup world you either move quickly or you die. Using gRPC to quickly integrate tools from any language straight into our backend is therefore an enormous win for us. And having offloaded such a large part of our intra-system communication to a framework maintained by competent people means we can spend more time doing what we want to do: creating the best product for our customers!

Some of the amazing people I work with everyday!

Want to work with gRPC or just AI in general? We are hiring! Send us an email at jobs@strise.ai
