REST and gRPC — Room For Both

Published in 99P Labs · Mar 15, 2021

Written By: Ben Davis

Photo by Goran Ivos

Our plans started out like many others': put an API in front of our Data Lake. We ended up with a GraphQL API that returns rows of data in JSON format. This setup works well up to a point. Eventually, you run into practical limits on how much data can be sent back to a client as text. In our case, we have Python notebook clients submitting requests and generating result sets in the million-row range and up. In ordinary times, connected to gigabit Ethernet at the office, large result sets are not a big issue. But these are not ordinary times. With most of us working from home on consumer-grade Wi-Fi and cable Internet, how do we get all that data back to the client quickly?

This question led me to look into gRPC and Protocol Buffers. The rest of this post is neither a tutorial on gRPC and Protobufs nor a head-to-head challenge to see which one is better. The real question is why and how to apply the technology to solve a problem we have. There is no shortage of articles on the Internet about the merits of gRPC, and it's easy to predict that gRPC will be faster. But by how much? Does it warrant the overhead of introducing yet another component to our technology stack? While it can't give a definitive answer, an experiment that models our typical scenario can help with the decision.

For our experiment, the REST and gRPC servers are actually a single server that can start up in either mode and is hosted on-premises. The data source is a Presto cluster hosted in an AWS VPC, accessible to the API server via AWS Direct Connect.
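As a rough sketch of what that dual-mode startup might look like (the flag name and the serve_rest/serve_grpc helpers are hypothetical illustrations, not our actual code):

```python
# Hypothetical sketch: one server binary, started in either mode.
import argparse

def serve_rest():
    ...  # start the HTTP app that returns JSON over REST

def serve_grpc():
    ...  # start the grpc.server() that streams Protobuf messages

def main():
    parser = argparse.ArgumentParser(description="Data Lake API server")
    parser.add_argument("--mode", choices=["rest", "grpc"], default="rest")
    args = parser.parse_args()
    serve_rest() if args.mode == "rest" else serve_grpc()

if __name__ == "__main__":
    main()
```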

Two Python clients are running on the same machine as the API server, and two are running remotely from my laptop at home. The idea is to run the same query from each client while varying the result set size and measuring how long it takes.
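To make the measurement concrete, here is a minimal sketch of the kind of timing loop each client ran; time_query and run_fn are illustrative names, not our exact harness:

```python
import statistics
import time

def time_query(run_fn, sizes, repeats=5):
    # run_fn(n) should execute the query limited to n rows against
    # either server and consume the full result set.
    averages = {}
    for n in sizes:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            run_fn(n)
            samples.append(time.perf_counter() - start)
        averages[n] = statistics.mean(samples)  # mean elapsed seconds
    return averages
```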

Our Platform and Tests

The queries were run several times, and the numbers reported here are the average elapsed time in seconds. As you can see, down at the low end the response times are all comparable. It's at the high end where the numbers really start to diverge, and the remote REST wait time grows far faster than the others.

gRPC is built on HTTP/2 underneath. There are a bunch of performance improvements and features that come with HTTP/2, but the one to note here is streaming support. In the REST implementation, the API server waits until it finishes reading all the rows from the database before it returns a single large response. The gRPC server batches up rows as they arrive from the database and streams them to the client.
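To make the difference concrete, here is a minimal sketch of the two handler styles, assuming a hypothetical proto with a server-streaming RunQuery RPC; fetch_rows and the query_pb2/query_pb2_grpc modules are placeholders for our database layer and protoc-generated code, not our actual implementation:

```python
# Contrasting the two server behaviors. fetch_rows and the query_pb2
# modules are hypothetical stand-ins.
from flask import Flask, jsonify, request

import query_pb2
import query_pb2_grpc

app = Flask(__name__)

@app.route("/query")
def run_query_rest():
    # REST: block until every row has been read, then send one large
    # JSON response back to the client as text.
    rows = list(fetch_rows(request.args["sql"]))
    return jsonify(rows)

BATCH_SIZE = 10_000  # rows per streamed message

class QueryService(query_pb2_grpc.QueryServicer):
    def RunQuery(self, request, context):
        # gRPC: yield batches as rows arrive from the database, so the
        # client starts receiving data before the query even finishes.
        batch = []
        for row in fetch_rows(request.sql):
            batch.append(query_pb2.Row(values=row))
            if len(batch) >= BATCH_SIZE:
                yield query_pb2.RowBatch(rows=batch)
                batch = []
        if batch:
            yield query_pb2.RowBatch(rows=batch)
```

The streaming handler is just a Python generator: each yield pushes one Protobuf-encoded batch over the HTTP/2 connection while the database cursor keeps producing rows.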

I think there's room for both approaches in our 99P Labs environment. People starting a new data study project will want to work with a smaller amount of data to explore the contents and work out exactly how to get answers to their questions. A JSON payload over HTTP is the most common and easily understood way to receive data from an API. It's human-readable and compatible with just about any software you can think of. It's totally appropriate for this kind of activity. For some, this will be enough. For others, the next step is to scale up to millions of records. This really simple experiment shows how gRPC can help by cutting our wait time in half. With a thoughtfully designed SDK, I can imagine a Python user experience where the user does not know or care about gRPC. It just works. If you want to see our progress with gRPC or just use one of our JSON APIs, please head over to our 99P Labs Developer Portal.
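A thin wrapper along these lines could hide the transport entirely; the stub names and endpoint are assumptions for illustration, not our published SDK:

```python
# Hypothetical SDK wrapper: the notebook user calls query() and iterates
# rows, never touching gRPC directly. Stub and endpoint names are assumed.
import grpc

import query_pb2
import query_pb2_grpc

def query(sql, host="data-api.example.com:443"):
    channel = grpc.secure_channel(host, grpc.ssl_channel_credentials())
    stub = query_pb2_grpc.QueryStub(channel)
    for batch in stub.RunQuery(query_pb2.QueryRequest(sql=sql)):
        for row in batch.rows:
            yield row.values  # flatten streamed batches into plain rows
```

From the notebook's point of view, `rows = list(query("SELECT ..."))` looks the same whether JSON or Protobuf moved the bytes.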
