RPC Thunder Dome

Part I

Everyone is moving to microservices today. It makes sense. Teams no longer have to fight over a giant, bloated monolith. You can deploy code as quickly or as slowly as you need, without blocking others.

In the past, when you needed to scale one part of your application, you had to scale the entire monolith. Now, you can just scale the individual services as necessary. And when a team wants to use another team’s service, they aren’t locked into a single language. They just need the service contract to call it over the network.

Most people build their microservices using REST over HTTP with JSON. REST and JSON are convenient. They're already used for your web site, so why not use them for services too? The problem is that HTTP was designed for web browsers. And last time I checked, servers don't browse the web.

When you use REST and JSON, you are taking binary data and turning it into text, so it can be turned into binary again. That’s not the most efficient way for machines to talk. I know what you’re thinking — but REST is easy, what are you talking about? I assure you, it’s not.
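To make the binary-to-text-to-binary round trip concrete, here is a small illustration (my own, not from the original post) of how a single numeric value grows when it travels as JSON text instead of fixed-width binary. The value and sizes are just one example, using only the JDK:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class TextVsBinary {
    public static void main(String[] args) {
        long timestamp = 1_517_000_000_000L; // e.g. a millisecond timestamp

        // Binary: a fixed 8-byte long, as a binary protocol would send it.
        byte[] binary = ByteBuffer.allocate(Long.BYTES).putLong(timestamp).array();

        // Text: the same value as it appears inside a JSON document, which the
        // receiver then has to parse back into a long.
        byte[] text = Long.toString(timestamp).getBytes(StandardCharsets.UTF_8);

        System.out.println("binary bytes: " + binary.length); // 8
        System.out.println("text bytes:   " + text.length);   // 13
    }
}
```

The extra bytes are only part of the cost: the text form also has to be formatted on one side and parsed on the other, on every single call.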

Familiarity with something doesn't make it easy or effective. REST is not a formal protocol, but an architectural style slapped on top of HTTP. PUT to insert or update? Who knows? Need a callback? Set up Kafka or webhooks. How do you figure out what URL to call or what JSON to send? Check a Swagger doc if you're lucky, but most likely you're talking to someone else. The semantics of browsing the web are not a good fit for application development. There is a better way.

Early microservice proponents like Facebook, Google and Twitter found a formula that works — a language-independent interface description language (IDL), RPC framework, and a code generator. Over the years, Google has released pieces of its framework.

First it released Protocol Buffers, its IDL and serialization format. More recently, it released gRPC. Google describes gRPC as "A high performance, open-source universal RPC framework." gRPC follows the formula above: IDL, RPC framework, and code generator. All good, right?

Not quite. Decisions were made that are strange for a “high performance…universal framework”. As a result, gRPC fails to move the needle on microservices communication in important ways.

Microservice architectures are asynchronous by nature. gRPC makes this difficult with a bad API. Rather than leveraging a standard for asynchronous stream processing such as Reactive Streams, gRPC opts for a callback-based API. Non-trivial applications have to adapt gRPC to a framework, or their developers quickly find themselves in callback hell.
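The shape of the problem can be sketched with plain JDK types (the service methods here are hypothetical stand-ins, not gRPC's actual stubs, whose async API takes `StreamObserver` callbacks). Each dependent call in callback style nests one level deeper, while a composable API keeps the same pipeline flat:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

public class CallbackVsComposition {
    // Callback-style client methods, the shape gRPC's async stubs take.
    // These are hypothetical services that just answer synchronously.
    static void lookupUser(String id, Consumer<String> onNext)    { onNext.accept("user:" + id); }
    static void lookupOrders(String user, Consumer<String> onNext) { onNext.accept("orders-for-" + user); }
    static void lookupTotal(String orders, Consumer<Long> onNext)  { onNext.accept(42L); }

    public static void main(String[] args) {
        // Callback style: each dependent call nests one level deeper,
        // and error handling has to be threaded through every level.
        lookupUser("7", user ->
            lookupOrders(user, orders ->
                lookupTotal(orders, total ->
                    System.out.println("nested total: " + total))));

        // Composed style: the same pipeline as a flat chain, which is what a
        // Reactive Streams (or CompletableFuture) API gives you for free.
        CompletableFuture.completedFuture("7")
            .thenApply(id -> "user:" + id)
            .thenApply(user -> "orders-for-" + user)
            .thenApply(orders -> 42L)
            .thenAccept(total -> System.out.println("composed total: " + total));
    }
}
```

With three calls the difference is cosmetic; with retries, timeouts, and fan-out across real services, the nested form becomes the callback hell described above.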

Another odd choice is to wed gRPC to HTTP/2. HTTP/2 has great new features to address shortcomings in HTTP/1.1. That’s what it’s for — to be better at schlepping webpages, not the foundation of an RPC framework. From the HTTP/2 documentation:

“This effort was chartered to work on a revision of the wire protocol — i.e., how HTTP headers, methods, etc. are put “onto the wire”, not change HTTP’s semantics”

The result is that gRPC is limited by HTTP. Ironically, gRPC doesn't even work in browsers. It's ridiculous when you think about it: they literally created an RPC framework for servers using a protocol for browsers, and it can't even be used by browsers!

This brings us to the final point: the "wire protocol". gRPC's wire protocol isn't really a protocol. It's a set of HTTP headers. HTTP/2 doesn't support binary values in headers, so you have to Base64-encode any octet data or just send strings. Instead of sending values that could be represented in binary (the version number, for instance), gRPC sends strings that require parsing.
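The overhead is easy to quantify. gRPC marks binary-valued headers with a `-bin` suffix and Base64-encodes them; Base64 inflates every 3 bytes into 4 characters. This JDK-only sketch (the 16-byte trace id is my own example value) shows the inflation, plus the string-parse a numeric header costs:

```java
import java.util.Base64;

public class HeaderOverhead {
    public static void main(String[] args) {
        // A 16-byte binary trace id, the kind of value gRPC puts in a "-bin" header.
        byte[] traceId = new byte[16];

        String encoded = Base64.getEncoder().withoutPadding().encodeToString(traceId);
        System.out.println("raw bytes:    " + traceId.length);    // 16
        System.out.println("base64 chars: " + encoded.length());  // 22, ~4/3 overhead

        // A version that could be a single byte on the wire instead travels
        // as a string that every hop has to parse.
        int version = Integer.parseInt("2");
        System.out.println("parsed version: " + version);
    }
}
```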

Enter Proteus RPC. Proteus RPC is also designed to be a fast and universal RPC framework. It is built on top of RSocket. Unlike HTTP, RSocket was built to fully model application interactions over a network — it's built for microservice architectures. More information about RSocket is available at rsocket.io. Proteus is designed to take advantage of RSocket's powerful features while presenting the developer with an easy-to-use API.

I wanted to see how Proteus RPC performed against gRPC on the JVM. In order to get a valid comparison, I needed a reference point using REST/JSON. I decided to use Ratpack. Ratpack is a high-performance HTTP framework built on Netty. This is good, because gRPC and Proteus can both use Netty for their network transport. With all three using Netty, we can see how protocol, framework design, and serialization choices affect performance independently of networking choices.

The test described in this post is a simple ping/pong style test, run on my MacBook Pro i7 laptop. I plan to write about more comprehensive tests in later blog posts. First, 1,000,000 messages are sent to warm the JVM. Then, 1,000,000 messages are sent whose throughput and latency are measured. HdrHistogram captures the latency. gRPC has a blocking client, but to make the tests fair, I used the non-blocking client. Both Proteus and gRPC can support Protocol Buffers so I used the same IDL for both. For the REST/JSON test, I created a simple Java object that mimics the Protocol Buffers object.
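For readers unfamiliar with Protocol Buffers IDLs, a minimal ping/pong service definition looks something like this (the names here are my own illustration, not necessarily those used in the rpc-thunderdome repo):

```protobuf
syntax = "proto3";

package thunderdome;

message Ping {
  string message = 1;
}

message Pong {
  string message = 1;
}

service PingPongService {
  rpc PingPong (Ping) returns (Pong) {}
}
```

Both the gRPC and Proteus code generators consume a definition like this and emit client stubs and server skeletons, which is what makes a same-IDL comparison possible.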

I tested REST/JSON first with Ratpack. Ratpack does not have an external HTTP client, so I used reactor-netty’s HTTP client. Ratpack’s promise API works well, so I used that on the server side. It got decent performance results — 28k RPS and a p995 of 11.9ms.

Next, I tested gRPC. gRPC's Java non-blocking API leaves much to be desired. The first test crashed my JVM: it ran out of memory from creating too many threads. After some quick research, I found that you need to call directExecutor on your channel builder to prevent gRPC from using an internal cached thread pool. Even with this change, the tests didn't run well and caused exceptions. This is because gRPC doesn't support application backpressure, so there is no signal to keep a server from being overloaded.
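The mechanism behind that crash can be shown with the JDK alone (`directExecutor()` is real gRPC builder API; everything in this sketch is plain `java.util.concurrent`). A cached thread pool spawns a new thread whenever all existing threads are busy, so a flood of callbacks can grow it until the JVM can no longer allocate thread stacks; a direct executor just runs the task on the calling thread:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ExecutorChoice {
    public static void main(String[] args) {
        // gRPC's default: a cached pool that creates a new thread whenever
        // every existing thread is busy. Under a flood of callbacks this can
        // grow without bound.
        ExecutorService cached = Executors.newCachedThreadPool();
        cached.submit(() ->
            System.out.println("pool thread:   " + Thread.currentThread().getName()));
        cached.shutdown();

        // What directExecutor() amounts to: run the callback on the calling
        // (event-loop) thread, so no extra threads are ever created.
        Executor direct = Runnable::run;
        direct.execute(() ->
            System.out.println("direct thread: " + Thread.currentThread().getName()));
    }
}
```

Running callbacks on the event-loop thread keeps the thread count flat, but it also means a slow callback now blocks the transport, which is its own trade-off.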

I needed some lipstick for this pig, so I used reactor-core to emulate backpressure. That worked better. gRPC got better throughput than Ratpack (67k RPS) and better p995 latency at 1.9ms. Beyond p995, its latency hockey-sticks into the 10ms range.
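The post doesn't show the emulation code. One plain-JDK way to get the same effect (a sketch of the idea, not the author's reactor-core implementation) is a semaphore that caps the number of in-flight requests, so the client stops issuing work when the server hasn't caught up:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightLimit {
    public static void main(String[] args) throws InterruptedException {
        final int maxInFlight = 16;        // cap on outstanding requests
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger completed = new AtomicInteger();

        for (int i = 0; i < 1_000; i++) {
            permits.acquire();             // block if 16 requests are already in flight
            CompletableFuture
                .runAsync(() -> { /* fire the request here */ })
                .whenComplete((v, err) -> {
                    completed.incrementAndGet();
                    permits.release();     // response arrived: free a slot
                });
        }

        permits.acquire(maxInFlight);      // drain: wait for the last responses
        System.out.println("completed: " + completed.get()); // 1000
    }
}
```

This is only client-side throttling, though. Real backpressure, as in Reactive Streams, is a demand signal from the receiver, which is exactly what the gRPC API lacks.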

Finally, I tested Proteus RPC. Proteus RPC uses a Reactive Streams-based API, making async programming very simple. Non-blocking operation requires zero special configuration, and it supports application backpressure out of the box. Proteus' throughput was 91k RPS, with a p995 of 0.64ms. It was 225% faster than Ratpack, and almost 40% faster than gRPC while achieving roughly 70% lower latency. gRPC's p50 of 0.856ms was worse than Proteus' p995 latency.

Using the same networking library, Proteus RPC achieves higher throughput and lower latency, without special configuration. If you are moving from REST/JSON, try Proteus RPC. I have included some graphs below, and a link to the source for my test: https://github.com/netifi/rpc-thunderdome. In my next blog post, I will repeat the same tests, but in a cloud infrastructure to see how this plays out with real hardware.

Robert B Roeser