Scaling up REST versus gRPC Benchmark Tests

Ian Gorton
7 min read · Jun 19, 2023


TL;DR

We build upon an excellent earlier study by Recep İnanç that performed a small-scale performance comparison of REST (HTTP/JSON) versus gRPC for API implementations. Running test clients and endpoint implementations on the same local machine, that study found that REST provided lower latencies for small payloads; as payload size increased, however, gRPC latencies were much lower.

Our benchmarks utilize the same API implementations and JMeter test specification as this earlier study, but execute the tests:

  • on distributed Google Compute Engine (GCE) instances
  • with increased client loads and request volumes
  • with an additional API implementation that utilizes Protocol Buffers over REST for payload transport (HTTP/protobufs)

The results described below confirm and strengthen the earlier finding that REST throughput degrades significantly for large payloads. In the worst case, the large-payload REST tests achieve approximately 1% of the throughput of the small-payload tests under the same client load.

In contrast, both gRPC and HTTP/protobufs are more resilient as payload sizes grow, and thus exhibit much greater scalability. gRPC performance is comparable to that of HTTP/protobufs at lighter loads. As client load and payload sizes grow, gRPC provides roughly 15% to 40% greater throughput than HTTP/protobufs. For large payloads, gRPC throughput is 10x that of the REST server.

Background

REST and gRPC are the two dominant protocols for implementing communications between microservices. REST runs over HTTP, typically using JSON to encode request and response payloads.

In contrast, gRPC runs over HTTP/2 and formats payloads using Protocol Buffers. Protocol Buffers (protobuf) are a cross-platform data format used to serialize structured data. Payloads defined as Protocol Buffer messages are serialized to a compact binary representation, which makes data transport more efficient. For example, in the benchmark tests we performed for this article, the protobuf payload was approximately one third the size of the same data formatted as JSON.
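
To give a rough sense of where the size difference comes from, the sketch below serializes the same (hypothetical) payload with protobuf and with Jackson's JSON mapper and compares the byte counts. LargeObjectResponse and LargeObjectPOJO are the benchmark's classes mentioned later in this article, but the fields and constructor used here are illustrative assumptions, not the study's actual definitions.

    // Rough sketch: compare the serialized size of the same payload as protobuf
    // and as JSON. Field names and the LargeObjectPOJO constructor are
    // illustrative; the benchmark's actual message definition is much larger.
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class PayloadSizeComparison {
      public static void main(String[] args) throws Exception {
        LargeObjectResponse proto = LargeObjectResponse.newBuilder()
            .setId(42)
            .setName("example")
            .build();

        byte[] protoBytes = proto.toByteArray();   // compact binary encoding
        byte[] jsonBytes = new ObjectMapper()      // text-based JSON encoding
            .writeValueAsBytes(new LargeObjectPOJO(42, "example"));

        System.out.printf("protobuf: %d bytes, JSON: %d bytes%n",
            protoBytes.length, jsonBytes.length);
      }
    }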

We won’t delve into a detailed comparison of REST and gRPC because Recep’s original post does an excellent job on that front. In the following, we’ll focus on the test setup and show the results of executing the benchmark suite from Recep’s article at a larger scale. Note that, strictly speaking, the endpoints we refer to as REST are effectively CRUD HTTP endpoints that use JSON for request payloads, and not full implementations of Fielding’s RESTful architectural style.

Test Setup

The basic benchmark strategy is to call gRPC and REST endpoints that return increasing numbers of a large object. The large object is specified both as a protobuf message and as a Java POJO. Endpoints take a count parameter that specifies the number of large objects to return. Our tests use count values of 1, 100, 1000, and 100,000, the last as a stress test.
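
To make the endpoint shape concrete, here is a minimal sketch of what the REST variant might look like as a Spring controller. The path, class names, and the LargeObjectCache helper are illustrative assumptions rather than the study's actual code; the gRPC server exposes the equivalent operation through a service method generated from the .proto definition.

    // Minimal sketch of a REST endpoint that returns 'count' large objects.
    // Path and helper names are placeholders; the original study's code may differ.
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.IntStream;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.RequestParam;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class LargeObjectController {

      @GetMapping("/large-objects")
      public List<LargeObjectPOJO> getLargeObjects(@RequestParam int count) {
        // Return 'count' references to a pre-generated large object (see the
        // cache discussion below); Spring serializes the list to JSON.
        return IntStream.range(0, count)
            .mapToObj(i -> LargeObjectCache.get())
            .collect(Collectors.toList());
      }
    }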

JMeter is used as the load tester. We utilize the same implementation strategy as the original study, which uses a separate JMeter Thread Group for test setup. This setup step initializes a cache in the servers that stores pre-generated large objects, which are returned in subsequent requests. This sensibly eliminates the cost of generating the LargeObjectResponse (proto object) and LargeObjectPOJO (Java object) on each call, enabling the benchmark to focus purely on gRPC and REST performance for data transport.
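
A minimal sketch of that cache idea is shown below, assuming a simple in-memory holder that a setup request fills once before the measured Thread Groups run. The class and method names are placeholders, not the study's actual implementation.

    // Minimal sketch of a server-side cache populated by the JMeter setup
    // Thread Group, so object construction cost is excluded from the benchmark.
    // Names are placeholders; the original study's implementation may differ.
    import java.util.concurrent.atomic.AtomicReference;

    public final class LargeObjectCache {

      private static final AtomicReference<LargeObjectPOJO> CACHED = new AtomicReference<>();

      // Called once by a setup request before the measured load starts.
      public static void initialize() {
        CACHED.set(buildLargeObject());
      }

      // Request handlers return the pre-generated object on every call.
      public static LargeObjectPOJO get() {
        return CACHED.get();
      }

      private static LargeObjectPOJO buildLargeObject() {
        // Populate a large, deeply nested POJO here (details omitted).
        return new LargeObjectPOJO(42, "example");
      }
    }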

We deployed the JMeter client on a Google Compute Engine e2-standard-4 instance with 4 vCPUs and 16 GB of memory. The endpoints were deployed on an e2-standard-16 instance with 16 vCPUs and 64 GB of memory, running in the same subnet as the JMeter client. This test architecture is depicted in Figure 1.

Figure 1 Benchmark Test Configuration

We also ran each test multiple times to ensure stable results. Some day-to-day (cloud-related) variance was observed, so the results we report were all executed on the same day. The general trends we report did not vary from day to day.

JMeter Test Plans

We leveraged the JMeter test plans from the original benchmark. These ramp up to the maximum thread (simulated user) load in 10 seconds and run a sequence of Thread Groups that test each endpoint with different payload sizes (count = 1, 100, and 1000).

We made the following modifications:

  • Created JMeter Thread Groups that ramped up to a maximum of 100 and 500 threads
  • Increased the number of requests for a thread (loop count in JMeter) to 100 (from 1 in the original tests)
  • Added new Thread Groups to test very large object sizes (100,000 count). These ran 10 threads maximum, each executing 100 requests.

These modifications achieve our aim of scaling up the tests and exerting more load on the servers. Using GCE monitoring, we also verified that the JMeter client's CPU and memory utilization remained low (<50%), so the client itself did not influence the test results.
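
For readers who prefer code to the JMeter GUI, the largest of these Thread Groups corresponds roughly to the following sketch, written with the open-source jmeter-java-dsl library rather than the .jmx test plans the benchmark actually uses. The host, port, and path are placeholders.

    // Rough sketch of the scaled-up load profile (500 threads, 10 s ramp-up,
    // 100 requests per thread, count = 1000) using jmeter-java-dsl. The actual
    // benchmark defines equivalent Thread Groups in .jmx test plans, and the
    // endpoint URL below is a placeholder.
    import static us.abstracta.jmeter.javadsl.JmeterDsl.*;

    import java.time.Duration;

    public class ScaledRestLoadTest {
      public static void main(String[] args) throws Exception {
        testPlan(
            threadGroup()
                .rampTo(500, Duration.ofSeconds(10)) // ramp to 500 simulated users over 10 s
                .holdIterating(100)                  // each thread then issues 100 requests
                .children(
                    httpSampler("http://REST_SERVER:8080/large-objects?count=1000"))
        ).run();
      }
    }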

Results

In Figures 2 and 3 we show the throughput obtained with 100 and 500 client threads for each of the three test configurations. The most dramatic observation is that for the small payload size, REST provides the highest throughput in both tests, albeit marginally in the case of 100 client threads. However, as the payload size grows to 1000, the drop-off for REST is severe. In fact, with 500 client threads, the large payload size (count = 1000) for REST provides approximately 1% of the throughput of the small input size (count = 1).

Figure 2 Throughput Comparison for 100 Client Threads
Figure 3 Throughput Comparison for 500 Client Threads

In contrast, the performance degradation with gRPC and HTTP/protobufs is much less severe. In both sets of tests, as payload size and the number of client threads increase, gRPC’s performance is the most resilient, outperforming HTTP/protobufs by around 25–30%, and REST by more than 9x for large payload sizes.

Figure 4 shows the same results organized by protocol, with the left column representing the small payload and the right column the large payload (count = 1000). These clearly show the ability of gRPC and HTTP/protobufs to provide significantly greater throughput under increasing loads.

Figure 4 Throughput Comparison by Server Type for 100 Client Threads (top) and 500 Client Threads (bottom)

Figure 5 plots the response times for the three payload sizes with 100 client threads. It shows the mean response time for each test at a one-second resolution on the x-axis. This starkly illustrates how REST performance deteriorates as load increases, while gRPC and HTTP/protobufs remain relatively stable, with single-digit response times for the smaller payloads.

Figure 5 Response Times for 100 Client Threads

In Figure 6, the 99th percentile (p99) response times are shown. These follow the same trend of rapid increase for REST, with correspondingly much slower increases for gRPC and HTTP/protobufs. In fact, the p99 for REST is roughly 11x the p99 for gRPC with the largest payload.

Figure 6 99th Percentile Response Times Comparison

To give a different perspective, in Table 1, we show more comprehensive response time statistics for 500 client threads and a payload with count = 100.

Table 1 Response Time Summary for 500 Client Threads and count = 100

Finally, it is always fun to run a stress test. These often reveal limits in implementations when memory or CPU resources are heavily loaded. In our case, we increased the payload to a count of 100K, which we refer to as a Very Large Object! We created a JMeter Thread Group of 10 threads, with a 10-second ramp-up time and each thread sending 100 requests.

Figure 7 Mean Response Times for Stress Test with payload count of 100K Objects

The stress test didn’t break anything. In fact, it revealed little difference between gRPC and HTTP/protobufs. REST response times followed the trend of the earlier tests and were approximately 5x those of the other two servers, with a p99 of 30 seconds (compared to ~7 seconds for gRPC and HTTP/protobufs).

Authors

Ian Gorton is a Professor of the Practice at Northeastern University’s Seattle Campus, and Director of Graduate Programs. He is the author of the ‘Foundations of Scalable Systems’ book published by O’Reilly in 2022. You can find supplemental videos and slides for the book here.

Yingyi Tong is a graduate student in the master’s program in Computer Science at Northeastern University’s Seattle Campus. She has a Bachelor’s degree in Marketing and has worked as a machine learning intern at Ping An Technology.
