EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Introducing gRPC to our Hotels.com Platform — Part 1

Learnings from our experiments with gRPC

Nikos Katirtzis
Expedia Group Technology

--

Hotels.com loves gRPC.
Introducing gRPC to our Hotels.com Platform. Source of right-hand side image: https://grpc.io.

At Hotels.com™ and Expedia Group™ we have been investing in tools that cover the three pillars of observability (logs, metrics, and traces), and those tools have proven invaluable when debugging latency issues.

As an example, we have open-sourced our Haystack solution for distributed tracing and our Pitchfork forwarder, which lifts Zipkin traces into Haystack.

Breakdown of latencies using horizontal colored bars for each call.
Distributed tracing visualisation using Haystack. Source: https://github.com/ExpediaDotCom/haystack-ui.

One of the great benefits of distributed tracing is that it reveals interactions between services and essentially the complexity of a platform.

What’s clear when looking at traces is that our platform has evolved a lot: we now have hundreds of back-end micro-services communicating with each other. These calls are made through REST/JSON endpoints over HTTP/1.1, which means that we’re bound by the limitations of the HTTP/1.1 protocol and of JSON payloads.

In particular, JSON:

  • Is human readable but not efficient since it’s not a binary protocol.
  • Is not secure (clear text) by default.
  • Is not strongly typed and hence prone to errors.
  • Requires manual (de)serialisation.

While HTTP/1.1:

  • Is a textual (non-binary) protocol.
  • Is ordered and blocking; only one request can be outstanding on a connection at a time, which can lead to head-of-line (HOL) blocking.
  • Requires multiple connections for parallelism.
  • Doesn’t perform compression of request headers by default.

Furthermore, when using REST, having a clear contract is optional; it can be achieved with third-party tools like Swagger, but it’s not native as it is in gRPC.

There are ways to overcome each of those limitations separately (e.g. by encrypting the content of a request or by compressing request headers), but wouldn’t it be awesome if these concerns were addressed at the protocol/framework level?

These were only a few of the reasons that made us look into gRPC and run a Proof of Concept.

In this first blogpost we’ll provide an intro to gRPC, focusing on services written in Java and deployed on Kubernetes.

We won’t dive into the implementation of the service and the client themselves, or into performance benefits, but we hope to cover these in the next blogpost(s).

Basic concepts

HTTP/2

Downloading of many small tiles of an image takes only 1.63s over HTTP/2, in contrast to 17.51s over HTTP/1.1.
HTTP/1.1 vs HTTP/2 performance. Source: https://http2.akamai.com/demo.

gRPC leverages the HTTP/2 protocol which is the second major revision of the HTTP internet protocol. It’s a binary protocol that delivers significant enhancements primarily focused on improving the utilisation of underlying TCP connections.

The main benefits of HTTP/2 are listed below but if you’d like to get a better overview you can read this pretty extensive blogpost:

  • Single connection: One TCP connection per client-server is used, and that connection remains open as long as the channel is open.
  • Multiplexing: Multiple requests are allowed at the same time, on the same connection. Previously, with HTTP/1.1, each transfer would have to wait for other transfers to complete.
  • Header compression: In HTTP/1.1 many headers were sent with the same values on every request. That’s no longer the case with HTTP/2.
  • Bidirectional streaming: Streaming in both directions is supported natively by the protocol.
  • Server push: With this feature the server can send additional cacheable information to the client if it thinks that this might be requested in the future.

What does this mean in practice? Here is a nice demo showing HTTP/2’s impact on your download of many small tiles making up the Akamai Spinning Globe!

Protocol Buffers

Protobuf vs JSON. Source: https://dev.to/plutov/benchmarking-grpc-and-rest-in-go-565.

By default gRPC uses Protocol Buffers (or Protobuf/protos), a binary encoding format developed by Google. Briefly, it’s a way of encoding structured data in an efficient yet extensible format. Protocol Buffers come with their own pros (compact payloads, fast (de)serialisation, strongly typed messages) and cons (payloads that aren’t human readable, and an extra schema compilation step).

Service Definition

As of now, gRPC supports many languages including C++, C#, Go, Java, Objective-C, Python, Ruby, and Node.js. For languages which are not officially supported, like Scala, you’ll most likely find plugins that you can use.

In a gRPC architecture the first step is to define your contract which includes defining the gRPC service and the method request and response types using protocol buffers.

As an example, your .proto file for a Java service could look like the one below:

syntax = "proto3";

option java_package = "com.hotels.service.reviews";
option java_outer_classname = "Reviews";

// The service definition.
service ReviewsService {
  // Retrieves a review.
  rpc GetReview (Request) returns (Response) {}
}

// The request message.
message Request {
  int32 id = 1;
}

// The response message.
message Response {
  string text = 1;
}

Although both the proto2 and proto3 protocols are supported in gRPC, we’d suggest using proto3, since it supports more languages, has a simpler syntax, and helps you prevent the compatibility issues that arise with proto2 clients talking to proto3 servers and vice versa.

In the example above we use a simple/unary RPC where the client sends a request to the server using the stub and waits for a response to come back. Other options here include:

  • Server-side streaming RPC: Client sends a request to the server and the server returns a stream of messages which the client consumes. The previous example would then look like below:
rpc GetReview (Request) returns (stream Response) {}
  • Client-side streaming RPC: Client writes a sequence of messages and sends them to the server, using a stream. Once the client has finished writing the messages, it waits for the server to read them all and return its response. The previous example would then look like below:
rpc GetReview (stream Request) returns (Response) {}
  • Bidirectional streaming RPC: Both the server and the client send a sequence of messages using a read-write stream and they can read and write in whatever order they like. The previous example would then look like below:
rpc GetReview (stream Request) returns (stream Response) {}
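To make the unary contract above concrete, a server-side implementation in Java could look like the sketch below. The class names are assumptions derived from the .proto options above: the gRPC Java plugin would generate a ReviewsServiceGrpc base class and Reviews.Request/Reviews.Response message classes.

```java
import io.grpc.Server;
import io.grpc.ServerBuilder;
import io.grpc.stub.StreamObserver;

import com.hotels.service.reviews.Reviews.Request;
import com.hotels.service.reviews.Reviews.Response;
import com.hotels.service.reviews.ReviewsServiceGrpc;

// Hypothetical implementation of the ReviewsService contract defined above.
public class ReviewsServer {

  static class ReviewsServiceImpl extends ReviewsServiceGrpc.ReviewsServiceImplBase {
    @Override
    public void getReview(Request request, StreamObserver<Response> responseObserver) {
      // Look up the review for the requested id (stubbed out here).
      Response response = Response.newBuilder()
          .setText("Review for id " + request.getId())
          .build();
      responseObserver.onNext(response);
      responseObserver.onCompleted();
    }
  }

  public static void main(String[] args) throws Exception {
    // Start a gRPC server on the port we expose via Kubernetes later on.
    Server server = ServerBuilder.forPort(6565)
        .addService(new ReviewsServiceImpl())
        .build()
        .start();
    server.awaitTermination();
  }
}
```

For a streaming variant, the generated method signature changes accordingly (e.g. server-side streaming lets you call onNext multiple times before onCompleted).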

Client methods/modes

On the client side you have the following three options for calling your RPC service’s methods:

  1. Blocking / synchronous stub: Client waits for a response from the server. This is a blocking call.
  2. Non-blocking / asynchronous stub: Client makes non-blocking calls to the server, where the response is returned asynchronously.
  3. Future stub / listenable future: With this stub type, as long as a thread is not blocked, new RPCs can be started. The returned value of RPCs made via a future stub is a GrpcFuture<ResponseType>, which implements the com.google.common.util.concurrent.ListenableFuture interface. Note that future stubs do not support streaming calls.
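Under the same assumptions about the generated classes, creating the three stub flavours with grpc-java is a one-liner each. This is a sketch, not production code; in particular, plaintext is used instead of TLS, and the host/port are placeholders.

```java
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

import com.hotels.service.reviews.ReviewsServiceGrpc;

public class ReviewsClient {
  public static void main(String[] args) {
    // One HTTP/2 connection per channel, reused for all calls on that channel.
    ManagedChannel channel = ManagedChannelBuilder
        .forAddress("app-grpc-host", 50051)
        .usePlaintext() // plaintext for the sketch; use TLS in production
        .build();

    // 1. Blocking stub: the call returns only when the response arrives.
    ReviewsServiceGrpc.ReviewsServiceBlockingStub blockingStub =
        ReviewsServiceGrpc.newBlockingStub(channel);

    // 2. Async stub: you pass a StreamObserver and are called back.
    ReviewsServiceGrpc.ReviewsServiceStub asyncStub =
        ReviewsServiceGrpc.newStub(channel);

    // 3. Future stub: unary calls return a ListenableFuture.
    ReviewsServiceGrpc.ReviewsServiceFutureStub futureStub =
        ReviewsServiceGrpc.newFutureStub(channel);

    channel.shutdown();
  }
}
```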

In case you’re interested, the official documentation for gRPC covers a lot more.

Deploying gRPC services on Kubernetes

At Hotels.com we deploy our services on Kubernetes (k8s) and use Helm to manage our apps. Intra-cluster communication is achieved by calling k8s Services, while for inter-cluster communication we use Ingresses (with Nginx controllers).

In order for a client B to be able to call a service A (which runs a gRPC server on port 6565) over gRPC, service A needs to expose a gRPC port as part of its definition and also provide a gRPC Ingress.

A sample Service definition is shown below:

{
  "kind": "Service",
  "apiVersion": "v1",
  ...
  "spec": {
    "ports": [
      {
        "name": "http",
        "protocol": "TCP",
        "port": 80,
        "targetPort": 8080
      },
      {
        "name": "grpc",
        "protocol": "TCP",
        "port": 50051,
        "targetPort": 6565
      }
    ],
    ...
  },
  ...
}

For the Ingress, as mentioned before, we need a separate definition with some special annotations that apply to the Nginx Controller we’re using. If you’re using other controllers such as Traefik or Istio as Ingresses, you might need to provide different configs.

{
  "kind": "Ingress",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "app-grpc",
    ...
    "annotations": {
      "kubernetes.io/ingress.class": "nginx",
      "nginx.ingress.kubernetes.io/backend-protocol": "GRPC", // for nginx-ingress >= 0.21.0
      "nginx.ingress.kubernetes.io/grpc-backend": "true" // for nginx-ingress < 0.21.0
    }
  },
  "spec": {
    "tls": [
      {
        "hosts": [
          "<app-grpc-host>"
        ]
      }
    ],
    "rules": [
      {
        "host": "<app-grpc-host>",
        "http": {
          "paths": [
            {
              "path": "/",
              "backend": {
                "serviceName": "app",
                "servicePort": 50051
              }
            }
          ]
        }
      }
    ]
  },
  ...
}

We mentioned intra-cluster communication using k8s Services earlier. Note that for gRPC, even when both your service and the client are deployed on the same cluster, you should use an Ingress instead of a Service name. This is because gRPC uses long-lived HTTP/2 connections: a Service balances at Layer 4 (per connection), which would pin all of a client’s requests to a single pod, so you need to load balance the requests at Layer 7. Read this blogpost for more info.

Command-line tools for interacting with gRPC servers

Although you cannot use the usual command-line tools like cURL to interact with a gRPC server, there are plenty of options built for that purpose.

The first option is the official tool that comes with the gRPC repository, the gRPC command line tool (grpc_cli).

A popular alternative is grpcurl, its main benefits being that it’s easier to create a Docker image for (unless you have brew installed, grpc_cli requires building from the repo) and easier to make work over TLS.

Since in Kubernetes you can have multiple pods running your app, we found it useful to create a Docker image that has grpcurl installed and add it as a sidecar to our apps when debugging Ingress issues. This way we managed to get an interactive shell with grpcurl in the same pod our app was running in.
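As an illustration, and assuming the ReviewsService contract from earlier is served on port 6565 with server reflection enabled (otherwise you’d pass the .proto files via the -proto flag), a grpcurl session from such a sidecar might look like this:

```shell
# List the services the server exposes (requires server reflection)
grpcurl -plaintext localhost:6565 list

# Call the unary GetReview method with a JSON-encoded request body
grpcurl -plaintext -d '{"id": 1}' \
    localhost:6565 ReviewsService/GetReview
```

The -plaintext flag skips TLS, which is handy inside the pod; calls going through the Ingress would instead use the TLS host configured there.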

FAQ

When presenting our PoC outcome to our engineers we received great questions. Here are a few of those questions with answers (to the best of our knowledge).

JSON is handled natively by browsers and mobile devices. Is it the same for gRPC responses?

Although the state of gRPC in the browser has changed a lot over the last few years and there are now libraries for JavaScript, gRPC is very focused on backend services, and that’s where we focused our investigation. For testing purposes you could have proxies that translate between JSON and proto. In any case, remember that gRPC primarily targets backend and edge services.

Do you need a Circuit Breaker with gRPC?

gRPC allows you to set deadlines, and you should always do so. However, you might still need a Circuit Breaker for connection back-off if that part of the gRPC interface hasn’t been implemented for your language.

Do you need to set a deadline on the gRPC client when using a Circuit Breaker?

If you don’t set a deadline, your Circuit Breaker (e.g. Hystrix/Resilience4j) will kill the thread but the underlying request will still be running. This means that you need both a deadline and a Circuit Breaker.
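With grpc-java, a deadline is a per-call decoration on the stub. The sketch below assumes the generated blocking stub and message classes from the earlier examples:

```java
import java.util.concurrent.TimeUnit;

import io.grpc.StatusRuntimeException;

import com.hotels.service.reviews.Reviews.Request;
import com.hotels.service.reviews.Reviews.Response;

public class DeadlineExample {
  // blockingStub would be created as shown in the client example earlier.
  static void callWithDeadline(
      com.hotels.service.reviews.ReviewsServiceGrpc.ReviewsServiceBlockingStub blockingStub) {
    try {
      Response response = blockingStub
          .withDeadlineAfter(500, TimeUnit.MILLISECONDS) // fail the call after 500ms
          .getReview(Request.newBuilder().setId(1).build());
    } catch (StatusRuntimeException e) {
      // Status code is DEADLINE_EXCEEDED if the server didn't answer in time
      System.err.println("RPC failed: " + e.getStatus());
    }
  }
}
```

Note that withDeadlineAfter returns a new stub, so the deadline applies to that call chain only; the original stub is unaffected.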

gRPC sounds perfect. What are the disadvantages of it?

Like every technology, gRPC has disadvantages as well. A non-exhaustive list follows:

  • gRPC is not as highly adopted as REST.
  • Request / response is binary and so not human readable. At application level though you can still log or debug in the same way you do with REST.
  • Tooling around gRPC is still limited; except for Go, where it’s pretty well established, you might face challenges. In the meantime, you might find this list useful!
  • There’s no 1:1 mapping between gRPC and HTTP status codes, and the default error messages are not particularly useful. You can work around that by implementing interceptors and/or by using Google’s gRPC error model.