EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Improving First Input Delay by Leveraging gRPC

Improving core web vitals at Vrbo

Gustavo Astudillo Romero
Expedia Group Technology

--

Side by side running lanes for a 100m sprint, labeled as “1” and “2”.
Race track — Photo by Tim Gouw on Unsplash

For Vrbo landing pages, performance, and response times in particular, are always a priority. Whether it’s a real user finding a destination through a web search or a bot crawling web content, we need good response times to improve engagement and SEO.

Of the many calls needed to get the whole content of a landing page, the first one links a path to its destination id in the system. That response blocks the rest of the calls that depend on the destination identifier, so how this first call responds is crucial for metrics like First Input Delay (part of Core Web Vitals), which measures the time from a user’s first interaction with a page to the browser being able to process event handlers in response to that interaction. In the context of landing pages, improving the performance of this first call could directly improve First Input Delay.

With that in mind, we started thinking about adopting gRPC in our platform, beginning with that service. We built a gRPC server to replicate our http service, and we wanted to compare their performance to understand whether pushing for gRPC in our platform and sending more traffic to the gRPC version could really improve our performance.

Our metrics and Datadog dashboards indicated that gRPC performance was quite promising, so we wanted to use a load test to stress both of our options, http and gRPC, with the same configuration and datasets to check their limits and differences.

Hypothesis

gRPC is designed for low-latency, high-throughput communication, which makes our service a strong candidate to benefit from it.

We’ve implemented a gRPC server in our service, added metrics, and started serving production traffic controlled by an A/B test; now we want to compare http and gRPC performance under the same configuration and input data.

We expect the test reports to show an improvement in both latency and throughput.

Application under test

The application under test has two operations: a lookup that returns destination data (identifiers and attributes) for a given path, and a reverse lookup that returns a path for a given identifier.

It gets its data from two sources:

  • The primary one is RocksDB, populated from our paths Kafka topic.
  • The fallback, if the requested path is not found in RocksDB, is our Cassandra database (see the sketch after the diagram below).
Service connections diagram with three datasources (Cassandra, Kafka and RocksDB) and a single client that serves the content to any device.
Service connections diagram
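To make that flow concrete, here is a rough sketch of the lookup order (try RocksDB first, fall back to Cassandra only on a miss). The Java types and method names below are hypothetical, not our actual code:

```java
import java.util.Optional;

// Hypothetical lookup service illustrating the RocksDB-first, Cassandra-fallback flow.
public class DestinationLookupService {

    private final RocksDbStore rocksDb;      // local store, populated from the paths Kafka topic
    private final CassandraStore cassandra;  // remote fallback store

    public DestinationLookupService(RocksDbStore rocksDb, CassandraStore cassandra) {
        this.rocksDb = rocksDb;
        this.cassandra = cassandra;
    }

    // Resolve a landing page path to its destination data.
    public Optional<Destination> lookup(String path) {
        return rocksDb.findByPath(path)
                .or(() -> cassandra.findByPath(path)); // only hit Cassandra on a RocksDB miss
    }

    // Minimal illustrative types.
    public interface RocksDbStore { Optional<Destination> findByPath(String path); }
    public interface CassandraStore { Optional<Destination> findByPath(String path); }
    public record Destination(String id, String path) {}
}
```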

Our http service response is a json payload. gRPC returns its response as a binary object, which is usually smaller than a json payload containing the same data. Considering the size of our json payload, we don’t think we’ll gain much from this in the gRPC version, but operations with bigger payloads than ours could benefit more from it.
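A quick way to get a feel for the difference is to serialise the same fields both ways and compare byte counts. The sketch below is purely illustrative: it uses protobuf’s generic Struct type to stay self-contained (a real generated message, where field names compile down to field numbers, would show a bigger gap), and the field names are made up rather than taken from our real schema:

```java
import com.google.protobuf.Struct;
import com.google.protobuf.Value;
import com.google.protobuf.util.JsonFormat;   // from the protobuf-java-util artifact
import java.nio.charset.StandardCharsets;

public class PayloadSizeCheck {
    public static void main(String[] args) throws Exception {
        // Illustrative destination lookup response; not the real service schema.
        Struct response = Struct.newBuilder()
                .putFields("destinationId", Value.newBuilder().setStringValue("12345").build())
                .putFields("path", Value.newBuilder().setStringValue("/vacation-rentals/spain").build())
                .putFields("type", Value.newBuilder().setStringValue("REGION").build())
                .build();

        byte[] binary = response.toByteArray();                 // protobuf wire format
        byte[] json = JsonFormat.printer().print(response)      // same data rendered as json
                .getBytes(StandardCharsets.UTF_8);

        System.out.printf("protobuf: %d bytes, json: %d bytes%n", binary.length, json.length);
    }
}
```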

Application performance

For measuring latencies we are using p99 and p95 metrics. These are percentile values that indicate the upper threshold for the defined percentage of calls; e.g., a p99 of 35 ms indicates that 99% of the calls take 35 ms or less. Both lookups were originally built as http endpoints with quite good performance:

Latency graph over time with two http service metrics, p99 around 8ms and p95 around 3ms, in a one hour period.
http endpoints p99 and p95 latencies

The gRPC server added to our application replicates the same endpoints as the http version and is receiving 10% of the traffic originating from our frontend client, also with good latencies:

Latency graph over time with two gRPC service metrics, p99 around 5–6ms and p95 around 2–3ms, in a one hour period.
gRPC services p99 and p95 latencies

Although these latencies improve on the http version, before simply increasing the gRPC traffic percentage we want to run a load test for both options using the same configuration and dataset.

Performance testing

We have a load test for our application integrated into our CI pipeline. It runs two scenarios (one for each lookup) defined using a Taurus yaml template, and these templates are used in our pipeline to run the test in BlazeMeter.

Adding gRPC to our performance test

We want to duplicate our current scenarios in the existing load test so we can run the same load against the gRPC and http versions at the same time. Our http scenarios are defined using Taurus syntax, which allows the use of different engines (executors), including JMeter, the default one and the one we’re currently using for http.

Using just Taurus syntax, you can define a scenario and its behavior to call http endpoints.
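For reference, a minimal Taurus scenario for one of our http lookups could look roughly like this (the host, path and load figures are placeholders, not our real template):

```yaml
execution:
  - scenario: path-lookup-http
    concurrency: 50        # virtual users
    ramp-up: 1m
    hold-for: 10m

scenarios:
  path-lookup-http:
    default-address: https://lookup-service.example.com   # placeholder host
    requests:
      - url: /lookup?path=/vacation-rentals/spain         # placeholder path
        method: GET
```

The gRPC scenarios can’t be expressed this way, which is why we need the custom sampler route described next.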

Our first approach was to do something similar for gRPC. However, gRPC is not natively supported by JMeter or Taurus, so we can’t just define a gRPC scenario using only a yaml template.

While JMeter doesn’t support gRPC natively, it does allow you to create custom plugins or samplers to add new functionality. We’ve implemented one of the JMeter abstract sampler definitions (AbstractJavaSamplerClient in this case) to define how a call should work; for our gRPC services, it uses a client based on our service definition. The resulting sampler class is referenced in a JMeter script (a JMX file using XML syntax) where the test behavior is defined, and these JMX files can be referenced from a Taurus template so they can be executed in BlazeMeter.
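As a rough sketch of such a sampler (using grpc-java and a hypothetical generated stub, DestinationLookupGrpc, since our real service definition isn’t shown here), this first version builds a new channel and stub on every call, which matters for the results below:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

// First-iteration style sampler: builds a new channel and stub for every sample.
public class PathLookupGrpcSampler extends AbstractJavaSamplerClient {

    @Override
    public Arguments getDefaultParameters() {
        Arguments args = new Arguments();
        args.addArgument("host", "localhost");               // placeholder defaults
        args.addArgument("port", "6565");
        args.addArgument("path", "/vacation-rentals/spain");
        return args;
    }

    @Override
    public SampleResult runTest(JavaSamplerContext context) {
        SampleResult result = new SampleResult();
        result.sampleStart();
        // A new channel per call: no HTTP/2 connection reuse, which is what
        // made the first-iteration numbers so poor.
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress(context.getParameter("host"), context.getIntParameter("port"))
                .usePlaintext()
                .build();
        try {
            // DestinationLookupGrpc, LookupRequest and LookupReply are hypothetical generated types.
            DestinationLookupGrpc.DestinationLookupBlockingStub stub =
                    DestinationLookupGrpc.newBlockingStub(channel);
            LookupReply reply = stub.lookup(
                    LookupRequest.newBuilder().setPath(context.getParameter("path")).build());
            result.setResponseData(reply.toString(), "UTF-8");
            result.setSuccessful(true);
        } catch (Exception e) {
            result.setSuccessful(false);
            result.setResponseMessage(e.getMessage());
        } finally {
            channel.shutdownNow();
            result.sampleEnd();
        }
        return result;
    }
}
```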

First iteration

Once we had created these new samplers (one for each lookup operation), we included them in our existing Taurus template to run the test with the four scenarios and compare results.

This first test showed no difference between the gRPC and http endpoints, with almost the same p95 and p99 response time latencies (quite high in both cases considering our server latencies) and a similar hits per second distribution. These were obviously not the results we were expecting:

Latency graph over time with test first iteration p99 metrics, all of them around 7000ms.
First iteration p99 response time
Latency graph over time with test first iteration p95 metrics, all of them around 3000ms.
First iteration p95 response time

Both the p99 and p95 graphs show similar latencies for the http and gRPC scenarios, moving around 7500 ms in the p99 graph (with peaks of 15000 ms) and around 1000 ms in the p95 graph (with peaks of 3000 ms).

Hits graph over time with test first iteration throughput, the four scenarios moving around 300 and 500.
First iteration hits per second

The hits per second graph shows the four scenarios moving between 300 and 500 hps, without much difference between them.

The reason for these results is that we were creating a new gRPC client every time we called the service, so we weren’t getting the benefits of reusing the channel and multiplexing multiple HTTP/2 calls over a single TCP connection. So we decided to change the sampler in the next iteration so that the client is created once and reused by each thread.
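The change, sketched here against the same hypothetical stub, moves channel and stub creation into setupTest, which JMeter calls once per thread, so every sample executed by that thread reuses the same channel and its HTTP/2 connection:

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

// Second-iteration style sampler: one channel and stub per JMeter thread.
public class PathLookupGrpcSampler extends AbstractJavaSamplerClient {

    private ManagedChannel channel;
    private DestinationLookupGrpc.DestinationLookupBlockingStub stub;  // hypothetical generated stub

    @Override
    public void setupTest(JavaSamplerContext context) {
        // Called once per thread, so the channel (and its HTTP/2 connection) is reused
        // across all samples executed by that thread.
        channel = ManagedChannelBuilder
                .forAddress(context.getParameter("host"), context.getIntParameter("port"))
                .usePlaintext()
                .build();
        stub = DestinationLookupGrpc.newBlockingStub(channel);
    }

    @Override
    public SampleResult runTest(JavaSamplerContext context) {
        SampleResult result = new SampleResult();
        result.sampleStart();
        try {
            LookupReply reply = stub.lookup(
                    LookupRequest.newBuilder().setPath(context.getParameter("path")).build());
            result.setResponseData(reply.toString(), "UTF-8");
            result.setSuccessful(true);
        } catch (Exception e) {
            result.setSuccessful(false);
            result.setResponseMessage(e.getMessage());
        } finally {
            result.sampleEnd();
        }
        return result;
    }

    @Override
    public void teardownTest(JavaSamplerContext context) {
        channel.shutdownNow();   // close the shared channel when the thread finishes
    }
}
```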

Second iteration

After modifying our gRPC samplers to reuse the gRPC client per thread, we obtained a new BlazeMeter report. Comparing response times and hits, we observe a huge improvement in the gRPC scenarios:

Latency graph over time with test second iteration gRPC metrics, p95 moving around 10–12 ms, and p99 around 20–30 ms.
Second iteration gRPC p99 and p95 latencies

gRPC response time latencies show the p95 line moving between 10 and 12 ms and the p99 between 20 and 30 ms, closer to our expectations and a logical result considering our server latencies.

But http scenarios are still showing the same numbers as the first iteration:

Latency graph over time with test second iteration http metrics, 1000 ms for p95 and 7500 ms for p99.
Second iteration http p99 and p95 latencies

http response times are moving around 1000 ms for p95 and 7500 ms for p99.

If we compare by hits, the gRPC scenarios reach the top threshold set for the test (1500 hps) while the http scenarios move around 400.

Hits graph over time with test second iteration throughput, gRPC scenarios around 1500, and http scenarios around 400.
Second iteration http and gRPC hits

This huge difference led us to think that the http scenarios were creating a new client on each call, so comparing these two cases wasn’t fair and didn’t give us a real idea of their performance difference. For the next iteration we decided to add a new http sampler that creates the client the same way we prepare it for gRPC, reused per thread. In theory, this allows us to compare the two options using their clients the way we expect them to be used, and should show the performance improvement gRPC can add through how it manages connections and channels.

Third iteration

In this third iteration we are using our gRPC samplers along with new http samplers that use http4s as the http client; both types initialise their client so it can be reused by each thread. We’ve set a high top threshold that we don’t expect to be reached (3000 hps) so we can check their limits, and as in the other iterations we’ve run the four scenarios together.
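Our real http samplers use http4s (a Scala client), which we won’t reproduce here; purely to illustrate the same per-thread pattern, a Java sketch using the JDK’s built-in HttpClient would look like this:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import org.apache.jmeter.protocol.java.sampler.AbstractJavaSamplerClient;
import org.apache.jmeter.protocol.java.sampler.JavaSamplerContext;
import org.apache.jmeter.samplers.SampleResult;

// Third-iteration style http sampler: one client per JMeter thread, mirroring the gRPC setup.
public class PathLookupHttpSampler extends AbstractJavaSamplerClient {

    private HttpClient client;

    @Override
    public void setupTest(JavaSamplerContext context) {
        // Created once per thread and reused for every sample, like the gRPC channel.
        client = HttpClient.newBuilder()
                .version(HttpClient.Version.HTTP_1_1)
                .build();
    }

    @Override
    public SampleResult runTest(JavaSamplerContext context) {
        SampleResult result = new SampleResult();
        result.sampleStart();
        try {
            HttpRequest request = HttpRequest.newBuilder(
                            URI.create(context.getParameter("url")))   // e.g. the lookup endpoint
                    .GET()
                    .build();
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            result.setResponseData(response.body(), "UTF-8");
            result.setSuccessful(response.statusCode() == 200);
        } catch (Exception e) {
            result.setSuccessful(false);
            result.setResponseMessage(e.getMessage());
        } finally {
            result.sampleEnd();
        }
        return result;
    }
}
```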

While gRPC still shows better performance, these results make more sense than the previous ones, and they give us a good idea of how much our performance could improve by using gRPC.

Latency graph over time with test third iteration p99 metrics, gRPC around 50ms, and http around 150ms.
Third iteration http and gRPC p99 response time

We observe that p99 response time is still notably better in the gRPC scenarios, moving around 50 ms for the whole test, while the http scenarios reached 150 ms with peaks of 350–400 ms.

Latency graph over time with test third iteration p95 metrics, gRPC around 35ms, and http around 50ms.
Third iteration http and gRPC p95 response time

In the p95 graph we can see the gRPC scenarios moving around 35 ms and the http scenarios around 50 ms.

Hits graph over time with test third iteration throughput, gRPC scenarios around 2400, and http scenarios around 1800.

Finally, comparing hits, we can see how gRPC has been able to handle more traffic using the same configuration as http.

Results

Test results table for three test iterations, showing p99, p95 and hits per second results separated by http or gRPC. These results show that gRPC equals http in the worst scenario (not reusing the client per thread), but it’s approximately 2.5 times faster for p99, approximately 1.5 times faster for p95, and gets 1.5 times higher throughput than http in the best scenario (reusing the client per thread).
Load test results

Conclusions and next steps

  • gRPC equals http in the worst scenario (not reusing client by thread).
  • gRPC was found to be approximately 2.5 times faster than http for p99 and approximately 1.5 times faster for p95.
  • gRPC scenarios are getting approximately 1.5 times higher throughput than http scenarios.
  • This performance improvement could be greater than what we gain from caching, so it offers a different way to improve our latency.
  • One of Google’s Core Web Vitals metrics (adopted into their search algorithm) is First Input Delay, which measures the time from when a user first interacts with a page to the time when the browser is actually able to begin processing event handlers in response to that interaction. This application’s response is one of the first steps in loading our frontend client content, because it returns whether a path is valid and the id it maps to, so improving its response time could help our score.
  • gRPC’s performance improvements come from its use of HTTP/2 and channel reuse, so they depend not only on how the server is built but also on the client. Situations where a client needs to call the server frequently benefit the most from gRPC, so a push for gRPC should be part of a shared decision to build applications in a way that lets them benefit from it.
