Building scalable micro-services with Kubernetes, GRPC & Linkerd

A practical guide on how to hack together auto-scaling Docker micro-services on K8S that communicate via GRPC and use Linkerd for scalable load balancing & service discovery…until Google supports it out of the box.

Warning: this is going to be a super technical post. It’s not for everyone but hopefully helpful to someone! So if you didn’t understand the title…watch this instead or donate some money to Wikipedia!

Update: Google got back to me about this problem and it seems we can solve it using an Ingress Controller instead of LB. I’m researching now and will write another post if we can solve it that way!

1. Introduction

In this post I will give a quick glance at how we recently revamped part of our Data Processing Pipeline to include a new NLP (Natural Language Processing) Micro-Service. Kubernetes, Docker and GRPC “should” have supported our use case out of the box but we ran into some annoying scaling issues. It took quite some head-scratching to figure out a temporary solution that’s “production ready”. To save others from the frustration…we’ll show you how we did it. And if you have any tips or suggestions, please comment (or send us your CV!).

I’ll start by explaining our design decisions, then I’ll give you our current implementation and I’ll end with some reflections and ideas for future improvements.

2. Background

Why Docker Micro-Services

If you’re completely unfamiliar with Docker or Micro-Services you should quickly read this article. There are 3 great things we gained from changing our monolithic “MapReduce” job into a Micro-Service pipeline:

  1. We can implement the NLP model in any language we want. By abstracting the implementation into a Docker container and giving it a nice API, outsiders no longer need to care about the implementation details, as long as it complies with the API specification. This is where GRPC comes in…but more on that later.
  2. We can easily replace, upgrade and A/B test our model. We simply swap out the Docker container with a new version. We can even do this at runtime which is especially handy when processing streaming data or with online machine learning models.
  3. Better scaling. This isn’t always the case but in this instance it is. It’s much more efficient to have 100 different NLP-micro-services trained for different languages that scale on demand than having to load all 100 models into memory inside our monolith. Though this could have been solved there, it would have resulted in a lot of ugly loading/unloading code that’s not really part of its core purpose.

The downside is that, in theory, it’s a bad idea to “clog down” efficient MapReduce jobs with RPC calls. In practice we found that it works perfectly with Apache Beam, and it allows us to use Beam more as a data orchestrator that can do ETL rather than as a pure MapReduce implementation.

Why K8S

Why do we use Kubernetes? Because it’s f#”king awesome! I remember the dark days when we tried with Docker Swarm and even (god forbid) Amazon’s attempt at a container orchestrator. It wasn’t fun. But Kubernetes has given us all the features we need (and more) to deploy and SCALE these micro-services.

We don’t run our own Kubernetes cluster but use Google Container Engine (GKE). Of course if you think that managing your own Kubernetes cluster is a good use of your time then you can use AWS or something else…but we have more important things to do so GKE it is!

Sadly, despite how integrated K8S is into GCE, we still found we had to hack together a good way for load-balancing our micro-services…but we’ll get back to that.


Why GRPC

There are lots of benefits to using something like GRPC over a plain REST/JSON API for your micro-services. For starters, GRPC is a binary protocol so it’s much more efficient. Also, you only specify your “service API” and then generate language bindings in Java, Python, Go, Rust…whatever you need. It’s much nicer and much less error prone than sending around some JSON and hoping everyone is following the latest specifications. Definition and Implementation are separated.
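To make the “Definition and Implementation are separated” point concrete, here is a minimal plain-Python analogy. It is not GRPC itself (with GRPC the definition lives in a .proto file and the bindings are generated for you), and the names `TaggerService`, `CapitalTagger` and `count_names` are made up for illustration.

```python
from abc import ABC, abstractmethod
from typing import List


class TaggerService(ABC):
    """The 'service API': the only thing callers may rely on (hypothetical)."""

    @abstractmethod
    def tag(self, text: str) -> List[str]:
        """Return one tag per whitespace-separated token."""


class CapitalTagger(TaggerService):
    """One possible implementation; it can be swapped without touching callers."""

    def tag(self, text: str) -> List[str]:
        # Toy rule: capitalised tokens are names, everything else is a word.
        return ["NAME" if tok[:1].isupper() else "WORD" for tok in text.split()]


def count_names(service: TaggerService, text: str) -> int:
    """Caller code that only knows the interface, not the implementation."""
    return service.tag(text).count("NAME")
```

Any other class that implements `tag` could be passed to `count_names` unchanged, which is the same property that lets us swap the container behind a GRPC service.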

Naturally GRPC isn’t the only one that promises these things…Apache Thrift is another common contender. You don’t have to agree…but I think GRPC is the future.

[Figure: Growth vs. Legacy]

Why Linkerd

As I hinted earlier, we had some problems scaling our Micro-Services. The issue we ran into is that GRPC uses the new HTTP2 protocol to communicate. That’s great, because it’s much faster and more efficient than HTTP1.1. But pretty much every “standard” load-balancer that we’ve seen used with Kubernetes (including Google’s and AWS’s ‘hardware’ load balancers) is an L4 load-balancer, and to properly load-balance HTTP2 traffic we need an L7 load-balancer.

Without going into too much detail: a client opens a connection to the L4 load-balancer, which connects it to a backend server. The client then keeps re-using this connection for all of its requests, because HTTP2 multiplexes many requests over one long-lived connection. This essentially means the traffic isn’t load-balanced at all. An L7 load-balancer understands individual requests, so the routing is much smarter and requests get sent to different servers.
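The difference is easy to see in a toy simulation. This is only a sketch under simple assumptions (round-robin picking, three hypothetical pods named `pod-a` to `pod-c`); it is not how any real load-balancer is implemented.

```python
from itertools import cycle
from typing import Dict, Sequence

BACKENDS = ("pod-a", "pod-b", "pod-c")  # hypothetical pod names


def l4_balance(clients: int, requests_per_client: int,
               backends: Sequence[str] = BACKENDS) -> Dict[str, int]:
    """Connection-level balancing: a backend is picked once per connection."""
    picker = cycle(backends)
    hits = {b: 0 for b in backends}
    for _ in range(clients):
        backend = next(picker)  # chosen when the connection is opened
        # Every request on this long-lived HTTP2 connection hits the same pod.
        hits[backend] += requests_per_client
    return hits


def l7_balance(clients: int, requests_per_client: int,
               backends: Sequence[str] = BACKENDS) -> Dict[str, int]:
    """Request-level balancing: a backend is picked for every single request."""
    picker = cycle(backends)
    hits = {b: 0 for b in backends}
    for _ in range(clients * requests_per_client):
        hits[next(picker)] += 1
    return hits


print(l4_balance(clients=1, requests_per_client=300))
print(l7_balance(clients=1, requests_per_client=300))
```

With a single client sending 300 requests, the L4 version sends all of them to `pod-a` while the L7 version gives each pod 100. Adding more pods does nothing for the L4 case, which is exactly the scaling problem we hit.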

So we had a look at several different load-balancers that support L7. And although Lyft’s Envoy was a strong contender we settled on Linkerd because it seems more battle tested and there was at least some documentation to indicate we could make it work with Kubernetes and GRPC traffic.

3. Implementation

So now you understand the different components and why we chose them. Here’s an example of how to tie them together and deploy an Auto-Scaling, Load-Balanced, GRPC Micro-Service on your Kubernetes cluster. If there’s enough demand I can turn it into a GitHub repo but for now…just fasten your seatbelt and read on!

4. Conclusion

First of all… I’m not 100% sure yet that this works entirely as it should. Some initial testing indeed shows that traffic is routed between all pods but it’s still all a bit magic to me. Granted I only looked at Linkerd’s documentation for 10 minutes and they have a shitload of features that I haven’t even touched on. So I’m sure there’s room for improvement.

One thing I would like to do is, instead of deploying Linkerd as sidecars to the app, deploy it as a DaemonSet, which means it only runs once per machine. To accomplish this we need to extract the service name requested in the HTTP2 header and route traffic accordingly. However, with our current naming/namespace setup this was difficult to do so we tested it like this first.
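To give an idea of what “extract the service name from the HTTP2 header” means: a GRPC call carries a `:path` pseudo-header of the form `/<service>/<method>`. The sketch below is plain Python, not Linkerd configuration, and the service names and addresses in `ROUTES` are hypothetical.

```python
from typing import Dict

# Hypothetical routing table: GRPC service name -> in-cluster address.
ROUTES: Dict[str, str] = {
    "nlp.Tagger": "nlp-tagger.default.svc.cluster.local:8080",
    "nlp.Parser": "nlp-parser.default.svc.cluster.local:8080",
}


def service_from_path(path: str) -> str:
    """Extract '<service>' from a GRPC ':path' such as '/nlp.Tagger/Tag'."""
    parts = path.split("/")
    # A valid path splits into exactly ['', service, method].
    if len(parts) != 3 or parts[0] or not parts[1] or not parts[2]:
        raise ValueError(f"not a GRPC path: {path!r}")
    return parts[1]


def route(headers: Dict[str, str], routes: Dict[str, str] = ROUTES) -> str:
    """Pick the backend address for a request based on its ':path' header."""
    service = service_from_path(headers[":path"])
    if service not in routes:
        raise LookupError(f"no route for service {service!r}")
    return routes[service]
```

For example, `route({":path": "/nlp.Tagger/Tag"})` returns the address registered for `nlp.Tagger`. A per-machine proxy needs a table like this for every service on the cluster, which is where our naming/namespace setup got in the way.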

The other thing I would like to add is having Linkerd send scaling commands to K8S based on demand, instead of using the Horizontal Pod Autoscaler, which at the moment only supports measuring CPU.
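As a rough idea of what demand-based scaling could look like, here is a sketch that applies the same proportional rule the Horizontal Pod Autoscaler documents (desired = ceil(current * currentMetric / targetMetric)) to requests per second instead of CPU. Using request rate as the metric, the function name and the default min/max bounds are all assumptions for illustration; we have not built this.

```python
import math


def desired_replicas(current_replicas: int,
                     current_rps_per_pod: float,
                     target_rps_per_pod: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10) -> int:
    """Proportional scaling rule, clamped to [min_replicas, max_replicas]."""
    if target_rps_per_pod <= 0:
        raise ValueError("target_rps_per_pod must be positive")
    # Scale the replica count by how far the observed load is from the target.
    raw = math.ceil(current_replicas * current_rps_per_pod / target_rps_per_pod)
    return max(min_replicas, min(max_replicas, raw))
```

For example, 3 pods each seeing 150 requests/s against a target of 100 requests/s gives `desired_replicas(3, 150, 100) == 5`, and the result never goes past `max_replicas` no matter how big the spike is.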

And lastly…this isn’t a final solution. Kubernetes and GRPC are developing extremely fast and I’m sure that deploying, scaling and load-balancing GRPC micro-services will be supported out of the box very soon. I’ll be the first to switch ;-)

Thanks to Gustav Maskowitz for his everlasting dedication to help us build awesome stuff on K8S!