Exploring gRPC on Google Cloud Platform

Diving into Google’s client-server communication talisman.

While gRPC is getting a lot of attention lately, RPC (the concept behind it) has been around for ages (~40 years).

As is often the case, yesterday’s solutions resurface to solve today’s issues.

gRPC and protocol buffers (more on that below) have been on my list of communication frameworks to explore for a while now, alongside Apache Thrift and Finagle from Twitter.

After a summary of what gRPC is and how it can help your engineering team, I will share the results of a comparison with a classic REST approach.

What’s gRPC?

Originally created (and later open-sourced) by Google in order to address performance and scalability issues in a high throughput and QPS (Query Per Second) heavy environment, gRPC shines where performance really matters.

It isn’t exactly new, but recently it has gained enough maturity to be seriously considered as an alternative to REST.

gRPC is a protocol agnostic client-server communication framework that uses binary serialisation by default.

This means that where REST typically passes JSON-formatted payloads, gRPC sends binary data over the wire. We will come back to this later.

Interface Definition Language

gRPC relies entirely on interfaces to define the services as well as the messages transmitted between client and server. The default IDL is protobuf (protocol buffers). However, because the framework is well designed, it also supports JSON, Wire from Square, Apache Avro, or even Microsoft Bond.

Layer 4 Agnostic

As mentioned above, gRPC is also protocol agnostic. Depending on the use case, there is no need to stick with TCP: the framework can also be used with QUIC (based on UDP) when peak performance is the goal.

Benefits

gRPC is an attractive alternative to REST as it offers multiple advantages: better performance and easier implementation.

Canonical

With a classic usage of REST+JSON, the client-side implementation requires matching the (hopefully up-to-date) API documentation. More often than not, you won't get it right on the first try: you may miss a parameter or a field, or get the version wrong. It happens to all of us.

Because gRPC forces the implementation to be defined in a client interface (called “stub”), you are less likely to get things wrong. That’s a great way to ensure consistency between the definition of the service and its implementation, client-side.

The main issue with REST is that it leaves a lot of room for interpretation. Hyphens in the paths? snake_case vs. camelCase?

These aren’t a problem with gRPC as the client libraries are generated for the consumer, based on the definition.

Performance

Performance is the argument of choice when it comes to gRPC, and this is due to two things.

Because the data format is binary (as opposed to a JSON payload), it is much lighter. Indeed, it's not rare to see half of a payload's size taken up by the JSON syntax alone.

The other reason is that gRPC uses HTTP/2. The main advantage of this new version is request multiplexing, which allows multiple requests to be made over the same connection.
This great post by Ilya Grigorik describes how that works in more detail.

Backward Compatibility

As is the case with RESTful APIs, gRPC supports API versioning.

Let’s use a simple example to illustrate this. Assume the snippet below is “version 1” of our service.

service Ledger {
  rpc GetTransaction (TransactionRequest) returns (TransactionResponse) {}
}

message TransactionRequest {
  string id = 1;
}

message TransactionResponse {
  string id = 1;
  string type = 2;
}

Now, let’s imagine that we want to add a field in the response to the client and ship “version 2”:

...
message TransactionResponse {
  string id = 1;
  string type = 2;
  string description = 3;
}

It’s that simple. If your server implements “version 2” but your client isn’t up-to-date yet, this field will simply be ignored. Obviously, it also works the other way around.

The number on the right side of each field definition is mandatory: it is the field number, which identifies the field in the binary encoding.

A couple of rules regarding field numbers:

  • They must be unique per message
  • They can't be changed afterwards, as that would break backward compatibility

Polyglot

gRPC is polyglot — the client stub can be provided in different languages (Python, Go, Java, Dart, …), with no limit whatsoever.

Multi-platform support is very likely to be your number one requirement when crafting consumable APIs.

How It Works

Define your service description (requests, responses and endpoints) in a *.proto file, then compile it for the target language. For instance, with Go:

protoc -I <SOURCE> <SOURCE>/<FILE>.proto --go_out=plugins=grpc:pb

The command above generates a file named *.pb.go, which contains all the generated types and stubs. This is the file we need to import on both the client and the server side.

Google Cloud Endpoints

Cloud Endpoints provides a highly scalable API gateway for the backend. It can be used with everything: App Engine, Compute Engine, Kubernetes Engine, and even on-premises hosts.

Scalability

Cloud Endpoints will create an Extensible Service Proxy (ESP). This proxy, based on NGINX, runs in a container within the Kubernetes pod, alongside your application container.

This is what makes Cloud Endpoints highly scalable: it automatically scales with your application.

Configuration

Configuring the ESP is pretty straightforward:

  1. Deploy the API definition (using the OpenAPI standard for REST, or a generated service descriptor for protobuf) to Google Service Management.
  2. Add the ESP in the GKE template as below:

containers:
- name: my_esp
  image: gcr.io/endpoints-release/endpoints-runtime:1
  args: [
    "--http_port", "9000",
    "--backend", "grpc://127.0.0.1:50051",
    "--service", "<NAME>.endpoints.<PROJECT_ID>.cloud.goog",
    "--rollout_strategy", "managed",
  ]
  ports:
  - containerPort: 9000

Authentication

Cloud Endpoints offers built-in authentication via different mechanisms (Auth0, Firebase, JWT and API Keys).

For the sake of this project I chose to keep it simple and only use API Keys.

Monitoring

I must admit I was astonished by how rich Cloud Endpoints is, particularly when it comes to logging and monitoring. A few years ago, Google acquired Stackdriver and it’s now part of the Google Cloud ecosystem.

In a nutshell, Stackdriver is a multi-cloud logging and monitoring solution.

Most of the metrics that you’d want from an API gateway are available in Stackdriver using Cloud Endpoints. It even monitors specific details about gRPC streams.

Note: all the numbers below come from Stackdriver.


Goal

My goal is to compare gRPC's performance against a classic RESTful implementation with JSON, in a concurrent environment and with a relatively complex data structure (multiple nesting levels).

My focus is not just on latency: CPU load, memory footprint and bandwidth have been measured and compared as well.

Performance is not the only factor that matters — client-side integration, backward and forward compatibility are also crucial.

Implementation

I chose to write a simple ledger application that does only one task: return a list of 50 transactions (therefore involving lots of serialisation).

Data Structure

When it comes to data structure, I used the REST API specifications of the TESOBE|Open Bank Project.

Architecture

For the purpose of this comparison, I decided to set up two different Kubernetes clusters — one for gRPC, another for JSON/REST.

They’re the same in terms of architecture, the only differences being the ports and HTTP version.

Both servers are written in Golang and the code is doing the exact same thing. Apart from the serialisation/deserialisation process, there is no difference.

gRPC backend architecture.

Testing Conditions

I triggered 1000 requests with a concurrency of 10 (n=1000, c=10). While being miles away from a high QPS environment per se, it already gives an idea of the differences.

I could have gone much higher than this, but I don't think it would have made a big difference, since the cluster automatically scales up the service (as well as the ESP).

Obviously, the tests were performed over the same connection and within the same timeframe.

Cluster Configuration

Again, both clusters are identical (1 node, 1 pod). The only differences are the code and the Cloud Endpoints specifications.

Results

Latency

Response time, toe-to-toe.

The results above were to be expected: gRPC is faster, and this is no surprise. The numbers may obviously vary from one test to another, but they remained consistent across the different runs.

CPU & Memory

Let’s have a look at different metrics. JSON parsing is known to be memory expensive, so I was really looking forward to seeing the difference there.

CPU & memory usage

As opposed to the latency results, I must admit I was surprised by this outcome — the results are very similar. In fact, the gRPC service seems to consume more memory, which is interesting. I thought it could be a one-time thing but it was consistent across different tests.

Note: there are plenty of ways to improve JSON parsing. However, for these tests, I used encoding/json from Go without any specific optimisation.

Bandwidth Usage

As you can see above, for the exact same data, sending a binary payload to the client instead of JSON makes a huge difference: a 46% decrease in response size. If you think about it, for 10k requests, that is 48 MB instead of 89 MB of egress traffic.

Imagine the benefit for IoT and mobile use cases.

Wrap-up

Best Of Both Worlds

As of today, it would be impossible to pick gRPC exclusively — we are still living in a REST world.

The great news is that there’s no need to choose one or the other — it’s totally possible to make a project operating in today’s world as well as tomorrow’s, thanks to initiatives such as grpc-gateway.

It's also worth mentioning that when compiling the proto file, JSON tags are added as well, which means the generated classes can be used for both binary and JSON payloads.

Documentation

If you're used to OpenAPI or API Blueprint for documentation, you will need to think a bit differently. Your proto file acts as the documentation. It's not as pretty and user-friendly, but what you see is what you get.

Since you're providing libraries to your consumers, the approach is different: think of it as SDK documentation rather than classic API documentation.

There are tools out there to generate HTML pages from a proto file but there is clearly some work to be done in this area.

Google Cloud Platform

I played around with Cloud Functions (the equivalent of AWS Lambda), Firebase and Datastore in the past, but this was the opportunity to explore the well-known GKE. I was impressed by how easy it is, how well the services work with each other, and how seamlessly everything works with gRPC.

Tooling & Community

Although relatively young, gRPC is backed by a strong community, and interceptors are a great showcase of its tooling possibilities.

Interceptors allow you to extend gRPC's potential and fit it to your own needs.

There are a few very interesting ready-to-use interceptors that let you add things like monitoring with OpenTracing or Prometheus.

If you’re interested, I recommend looking at this repo. If you can’t find what you need, the good news is that you can create your own interceptors!


I remember playing around with protobuf about 2 years ago. One thing is for sure: it wasn't mature enough then to be a solid alternative to JSON.

Google has been investing tremendously in gRPC. All the Google Cloud Platform services provide gRPC interfaces in addition to REST APIs. In fact, some services, such as Bigtable, only provide gRPC support.

Regardless of the results above, there's one thing that is difficult to measure yet really perceptible: the smoothness of updating a service while ensuring backward and forward compatibility. It's easy and quick, and that's where the benefit for an engineering team lies.

If you're interested in learning more about load balancing options, optimisation, or other IDLs: there's so much more to learn on the subject than I can possibly cover in one post.


Thanks for reading :)