Exposing our applications with GCLB and Istio

Published in BlaBlaCar, Jun 24, 2021

At BlaBlaCar, one of the most important goals that we, the Foundations team, are trying to achieve is to let our service teams handle public traffic with full autonomy.

To do so, we use the power of the Istio mesh and Google Cloud Load Balancers (GCLB). Having gone through this journey, we think it is worth sharing our implementation and how our end users, the service teams, can benefit from it.

*This represents all the things we will put in place throughout this article.*

Note that everything that follows takes place in a Google Kubernetes Engine (GKE) cluster and assumes the Istio service mesh is already deployed. We also use a regional GKE cluster to get availability across three separate zones.

What is a service mesh and what is Istio?

A common use case of a service mesh is to handle the network traffic in a Service Oriented Architecture. In our case, the service mesh is built on top of Kubernetes. The service mesh offers several features such as advanced traffic management, authentication, authorization, observability, and so on.

BlaBlaCar's engineering team chose Istio as the service mesh for our migration to the cloud. Among other qualities, we were won over by the built-in observability of the mesh and its traffic management features.

Istio is made of multiple components, but it can be seen as organized into three parts: the control plane, the data plane and the gateways.

The Istio data plane is made of Envoy proxies that the Istio control plane auto-injects into the pods of most deployments in our cluster. These proxies receive their configuration from the control plane. The gateways are in charge of handling the ingress and egress traffic between the mesh and the rest of the world.

*Istio service mesh with the ingress and egress gateways.*

I will focus on the ingress gateway, since we want to explain how traffic is routed from the internet to the application.

Istio ingress gateway

An Istio ingress gateway is a standard Envoy container receiving its configuration from the Istio control plane. (That configuration is set up through Gateway resources and VirtualServices, which we will explain later on.)

To achieve high availability, we decided to deploy each gateway across at least 3 distinct zones of our cloud provider, GCP. We implement this by configuring pod anti-affinity on our ingress gateway Deployment, in the gateway section of the istio-operator.yaml resource. Of course, if you do not have a regional cluster or do not need high availability, you can skip this setting.
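As a rough sketch of that setting (not our exact istio-operator.yaml), the IstioOperator fragment below asks the scheduler to spread the ingress gateway pods across zones. The resource name, the replica bounds and the `istio: ingressgateway` label (Istio's default for the ingress gateway) are assumptions. The sketch uses a *preferred* rule on purpose: a *required* anti-affinity on the zone key would cap the gateway at one pod per zone, i.e. three pods, and block autoscaling beyond that.

```yaml
# Sketch only: spread the Istio ingress gateway pods across the zones of a regional cluster.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-operator            # hypothetical name
  namespace: istio-system
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          # Keep at least three replicas, one per zone when the spread succeeds
          hpaSpec:
            minReplicas: 3
            maxReplicas: 10       # hypothetical upper bound
            scaleTargetRef:
              apiVersion: apps/v1
              kind: Deployment
              name: istio-ingressgateway
          affinity:
            podAntiAffinity:
              # Prefer zones that do not run a gateway pod yet
              preferredDuringSchedulingIgnoredDuringExecution:
                - weight: 100
                  podAffinityTerm:
                    labelSelector:
                      matchLabels:
                        istio: ingressgateway
                    topologyKey: topology.kubernetes.io/zone
```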

Once the ingress gateway is deployed, we need to configure it to accept our domains (e.g. www.blablacar.fr). To do so, we will use the Gateway custom resource.
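A minimal Gateway for this could look like the sketch below. The resource name `public-gateway` is hypothetical, and the selector assumes the default `istio: ingressgateway` label on the gateway pods.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway            # hypothetical name
  namespace: istio-system
spec:
  # Apply this configuration to the ingress gateway pods
  selector:
    istio: ingressgateway
  servers:
    # Plain HTTP only: TLS is terminated in front of the gateway by the GCLB
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "www.blablacar.fr"
```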

With such a Gateway resource, the ingress gateway accepts incoming internet traffic for www.blablacar.fr. We only need HTTP port 80, since the GCLB handles the TLS session.

*The ingress-gateway deployment with the 3 replicas and the gateway object + representation of the control plane actions.*

At this stage, the ingress gateway pods are deployed on the GKE cluster. Thanks to the anti-affinity setup, the pods are spread across different zones of the cluster, providing high availability by design. Thanks to the Gateway configuration added in the previous step, the gateways accept traffic for the www.blablacar.fr domain.

GCLB setup

Even though the ingress gateway is set up, it is still unreachable from the internet. I will now explain how to connect it using a Google Cloud Load Balancer.

The load balancer offers a couple of great features that can be useful to serve traffic to the BlaBlaCar Community:

Anycast IP: the layer 7 load balancer uses an anycast IP, so the IP announced to BlaBlaCar users is the same anywhere in the world. An additional benefit is that, instead of being routed to a single point of presence (which would introduce high latency for some of our regions), users enter the Google network at Google's closest point of presence.
Container-native LB / Network Endpoint Group (NEG): the LB is aware of, and targets only, the pods of the ingress-gateway deployment. This reduces the number of network hops (the number of network devices through which data passes from source to destination) between the external IP of the GCLB and the ingress gateway, which improves both latency and throughput.

*Container-native Load Balancing vs standard setup.*

How do we set up a GCLB in front of our ingress gateway?

First, we need a standard Kubernetes Ingress resource and two other resources: the FrontendConfig and the BackendConfig. (Keep in mind that the features we use in this article require at least GKE 1.18.10-gke.600; see the GKE documentation.)

We set the GCLB's health check configuration through the BackendConfig. The Istio ingress gateway exposes a status endpoint on a dedicated port and path, which is perfect for us.
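A sketch of such a BackendConfig, assuming Istio 1.6 or later, where the ingress gateway serves its readiness endpoint on the status port 15021 at `/healthz/ready`. The name `ingress` matches the `cloud.google.com/backend-config` annotation shown later in this article; the interval and threshold values are illustrative, not tuned recommendations.

```yaml
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: ingress                   # referenced by the gateway Service annotation
  namespace: istio-system         # same namespace as the ingress gateway Service
spec:
  healthCheck:
    type: HTTP
    port: 15021                   # Istio status port on the gateway pods
    requestPath: /healthz/ready   # readiness endpoint served by the gateway
    checkIntervalSec: 10          # illustrative values
    timeoutSec: 5
    healthyThreshold: 1
    unhealthyThreshold: 3
```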

The FrontendConfig is then used to do the HTTPS redirection directly at the GCLB level. Before that, we had to do the redirection on the ingress gateway, which was not ideal for our users or for the infrastructure, since it added back-and-forth traffic with no added value. Indeed, we usually promote handling redirections as high as possible in our architecture, so that an HTTP request does not travel all the way down to the ingress gateway just to be redirected to HTTPS.

The FrontendConfig lets you configure several other GCLB settings; for example, you can also enforce the minimum TLS version you want through an SSL policy (see the GKE documentation).
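Here is a sketch of a FrontendConfig covering both points. The name `http-redirect` matches the Ingress annotation used in this article. The SSL policy name `min-tls-1-2` is hypothetical: `sslPolicy` references a Compute Engine SSL policy that must already exist in the project, so drop that line if you do not use one.

```yaml
apiVersion: networking.gke.io/v1beta1
kind: FrontendConfig
metadata:
  name: http-redirect             # referenced by the Ingress annotation
  namespace: istio-system         # must be in the same namespace as the Ingress
spec:
  # Redirect HTTP to HTTPS at the GCLB instead of at the ingress gateway
  redirectToHttps:
    enabled: true
    responseCodeName: MOVED_PERMANENTLY_DEFAULT   # HTTP 301
  # Optional: enforce a minimum TLS version (hypothetical, pre-existing SSL policy)
  sslPolicy: min-tls-1-2
```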

Both resources are referenced through annotations: in the Ingress resource for the FrontendConfig, and in the ingress gateway Service (the Ingress backend) for the BackendConfig.

# in the ingress gateway Service resource
cloud.google.com/backend-config: '{"default": "ingress"}'

# in the Ingress resource
networking.gke.io/v1beta1.FrontendConfig: http-redirect

After setting up these two config resources, it's time to create our Ingress. It is a standard Kubernetes Ingress object with annotations to work with GCP. We use the annotation kubernetes.io/ingress.class: gce; in a GKE cluster, a controller watches that annotation and creates the GCLB based on the configuration we choose.
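A sketch of such an Ingress, assuming it lives in `istio-system` next to the gateway Service. The Ingress name, the static IP name `blablacar-public-ip`, the ClusterIssuer `letsencrypt` and the Secret name are hypothetical. It uses `networking.k8s.io/v1beta1` to match the `servicePort` field discussed below.

```yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: public-ingress                      # hypothetical name
  namespace: istio-system                   # same namespace as the gateway Service
  annotations:
    # Picked up by the GKE ingress controller, which creates the GCLB
    kubernetes.io/ingress.class: gce
    # Reserved global static IP (hypothetical name)
    kubernetes.io/ingress.global-static-ip-name: blablacar-public-ip
    # HTTP-to-HTTPS redirect defined in the FrontendConfig
    networking.gke.io/v1beta1.FrontendConfig: http-redirect
    # cert-manager issues a certificate for the TLS hosts (hypothetical issuer)
    cert-manager.io/cluster-issuer: letsencrypt
spec:
  tls:
    - hosts:
        - www.blablacar.fr
      secretName: www-blablacar-fr-tls      # hypothetical Secret name
  # All traffic goes to the Istio ingress gateway on plain HTTP port 80
  backend:
    serviceName: istio-ingressgateway
    servicePort: 80
```

On Kubernetes 1.22 and later, `networking.k8s.io/v1beta1` is no longer served; the `networking.k8s.io/v1` equivalent uses `spec.defaultBackend.service.name` and `spec.defaultBackend.service.port.number`.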

The static IP annotation (kubernetes.io/ingress.global-static-ip-name) references our reserved global public IP address. Without it, an ephemeral IP address would be assigned to the GCLB, and that address changes whenever the GCLB is recreated. We reserved the IP address as code using Terraform, but it can also be done with the gcloud CLI or the console.

The Ingress resource also references some TLS settings. We use cert-manager on our production cluster to get certificates from Let's Encrypt. cert-manager watches the cert-manager.io/cluster-issuer annotation and creates a valid certificate for all the hosts listed. The certificate is stored in the Kubernetes Secret named in the secretName field.
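For reference, the Certificate that cert-manager derives from such an Ingress looks roughly like the sketch below; when relying on the annotation, you do not write it yourself. The names are hypothetical and simply mirror the Ingress `secretName` and the ClusterIssuer named in the annotation.

```yaml
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: www-blablacar-fr-tls      # named after the Ingress secretName
  namespace: istio-system
spec:
  # Secret where the signed certificate and private key are stored
  secretName: www-blablacar-fr-tls
  # One entry per host listed in the Ingress tls section
  dnsNames:
    - www.blablacar.fr
  # Issuer taken from the cert-manager.io/cluster-issuer annotation (hypothetical name)
  issuerRef:
    name: letsencrypt
    kind: ClusterIssuer
```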

Since servicePort is set to HTTP port 80, TLS is terminated on the GCLB and the traffic reaches the ingress gateway over plain HTTP.

To enable NEG (as mentioned before), we need to give GCP some information on the ingress gateway Service through an annotation:

annotations:
  cloud.google.com/neg: '{"ingress": true}'
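Because the ingress gateway Service is rendered by the Istio operator, one way to set both annotations is through the operator itself, as in the sketch below; editing the Service by hand risks being overwritten on the next reconciliation. The resource name is hypothetical, and the `ingress` BackendConfig name matches the annotation shown earlier.

```yaml
# Sketch only: annotate the ingress gateway Service through the IstioOperator.
apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
metadata:
  name: istio-operator            # hypothetical name
  namespace: istio-system
spec:
  components:
    ingressGateways:
      - name: istio-ingressgateway
        enabled: true
        k8s:
          serviceAnnotations:
            # Container-native load balancing: the GCLB targets pod IPs through a NEG
            cloud.google.com/neg: '{"ingress": true}'
            # Health check settings come from the BackendConfig named "ingress"
            cloud.google.com/backend-config: '{"default": "ingress"}'
```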

This diagram summarizes all the resources we set up. At this point, we have a Google Cloud Load Balancer using a certificate generated by cert-manager and Let's Encrypt, and the HTTP traffic reaching our GCLB is redirected to HTTPS. The traffic is then routed directly to the ingress gateway pods in our cluster thanks to the NEG integration. Our gateway now accepts traffic for the www.blablacar.fr domain.

Virtual Service / Destination Rule

For application services to receive internet traffic, the ingress gateway still needs to know where to route the traffic inside the cluster. To do so, we need to add a few more configuration files to our Kubernetes cluster. Keep in mind that our goal here is to give software developers the ability to direct traffic to their internet-facing applications themselves.

We are going to use the traffic management power of Istio, in particular the VirtualService and DestinationRule resources.

Service teams can now set up a VirtualService in front of their Kubernetes Service (the Service that exposes their application inside the cluster). This VirtualService tells the Istio mesh how the proxies are supposed to route the network traffic.

By default, a VirtualService will apply configurations only to the mesh, not to the gateways. As we want the application to be available from the internet, we will need to attach the VirtualService to our ingress gateway.
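A minimal sketch of such a VirtualService, attached to the ingress gateway and adding an HSTS response header. The namespace, the destination Service `frontend` and the gateway name `istio-system/public-gateway` are hypothetical; the gateway reference uses Istio's `<namespace>/<name>` form.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend                  # hypothetical name
  namespace: frontend             # hypothetical service team namespace
spec:
  hosts:
    - www.blablacar.fr
  # Attach to the ingress gateway instead of the default "mesh"
  gateways:
    - istio-system/public-gateway # hypothetical Gateway name
  http:
    - route:
        - destination:
            host: frontend.frontend.svc.cluster.local   # hypothetical Service
            port:
              number: 80
      # Response header added by the gateway (HSTS)
      headers:
        response:
          set:
            Strict-Transport-Security: max-age=31536000; includeSubDomains
```

If the same routing should also apply to in-cluster callers, `mesh` can be listed in `gateways` next to the ingress gateway.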

With such a resource, service teams can easily declare, through their VirtualServices, which hosts the ingress gateway serves and route the traffic to the needed destination. Note that the gateways array in the spec lists the Istio ingress gateway, so the configuration is attached to that gateway rather than to the mesh.

We can also define headers to add to the response (for example, a Strict-Transport-Security header to set up HSTS).

*Traffic coming to the app with Istio routing.*

Let's pause here: we have everything we need to route traffic from the internet to the GCLB, then to the ingress gateway, and finally to the pods of the service team's application. If, for instance, the application is BlaBlaCar's frontend, a user can now see the website on their computer or mobile device.

The Istio VirtualService is not magic: every configuration we write in YAML ends up as Envoy configuration (in the proxies or the gateways). At this point, we can look at the Envoy configuration for our domain inside the ingress gateway pods. Using the name of one of our ingress gateway pods, we can query its proxy configuration with istioctl:

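# pick one ingress gateway pod (assumes the default istio=ingressgateway label)
POD=$(kubectl get pods -n istio-system -l istio=ingressgateway -o jsonpath='{.items[0].metadata.name}')
istioctl pc route "$POD" -n istio-system --name http.80 -o json | jq '.[] | .virtualHosts[] | select(.name|test("www.blablacar.fr"))'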

A few additional features let service teams get even more out of the stack we just deployed. Here are some of the sexy features that might be worth considering.

So we just deployed this whole new stack, but what if the service teams want to deploy a new version of their application? What if they need to set up retries or timeouts? These advanced configurations are done with the VirtualService and/or a DestinationRule resource.

And thanks to our setup it is quite easy to deploy.
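A sketch of what this can look like: a VirtualService sending `/v2/profile` to a `v2` subset with retries and a timeout, and a DestinationRule defining the subsets plus a traffic policy for `v2`. The names, namespace, `version: v1` label and all numeric values are hypothetical.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: frontend                  # hypothetical name
  namespace: frontend
spec:
  hosts:
    - www.blablacar.fr
  gateways:
    - istio-system/public-gateway # hypothetical Gateway name
  http:
    # Requests for /v2/profile go to the v2 pods, with retries and a timeout
    - match:
        - uri:
            prefix: /v2/profile
      route:
        - destination:
            host: frontend.frontend.svc.cluster.local
            subset: v2
      retries:
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
      timeout: 10s
    # Everything else stays on v1
    - route:
        - destination:
            host: frontend.frontend.svc.cluster.local
            subset: v1
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: frontend
  namespace: frontend
spec:
  host: frontend.frontend.svc.cluster.local
  subsets:
    - name: v1
      labels:
        version: v1
    - name: v2
      labels:
        version: v2
      # Policy applied only to traffic sent to the v2 subset
      trafficPolicy:
        loadBalancer:
          simple: LEAST_CONN
        connectionPool:
          http:
            h2UpgradePolicy: UPGRADE
```

Retries and timeouts belong to the route in the VirtualService, while the load balancing strategy and the HTTP/2 upgrade policy belong to the destination's trafficPolicy in the DestinationRule.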

In such a setup, we route the traffic for the URI /v2/profile to pods with the version: v2 label. We set retries on that route, and a trafficPolicy on the destination for the load balancing strategy (loadBalancer) and HTTP/2 (h2UpgradePolicy). The service team can update all these settings in a YAML file on their own, which is exactly what we are looking for: giving developers as much flexibility as possible.

*Traffic coming to the app with Istio advanced routing.*

As we said before there are a lot of possibilities to handle the traffic with Istio (circuit breaking, mirroring traffic …).
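As one illustration, circuit breaking is configured in a DestinationRule through connection pool limits and outlier detection. The sketch below uses a hypothetical host and illustrative limits; in practice these fields would go into the team's existing DestinationRule for that host, since Istio expects a single DestinationRule per host.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: frontend-circuit-breaker  # hypothetical name
  namespace: frontend
spec:
  host: frontend.frontend.svc.cluster.local   # hypothetical Service
  trafficPolicy:
    # Cap queued requests so a slow backend does not pile up work
    connectionPool:
      http:
        http1MaxPendingRequests: 100
        maxRequestsPerConnection: 10
    # Temporarily eject pods that keep returning 5xx errors
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 50
```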

Small patch on the ingress gateway

Not everything works out of the box, and we had to make some adjustments to the ingress gateway to make it work with the GCLB.

The GCLB sits in front of the ingress gateways, and the gateways do not trust it, so we lose some precious information: the X-Forwarded headers.

The GCLB introduces complexity because there are two hops between the client and the ingress gateway (even with the NEG load balancing setup), and both use a public IP address. The first one is our anycast IP address; the second one is managed by Google and we have no control over it.

Istio relies on Envoy proxies, so to configure some specific features you may need to use EnvoyFilters to customize Envoy itself. That is what we eventually did, with a dedicated EnvoyFilter.

NB: You should be able to configure this setting with annotations or global mesh settings in the Istio Operator (gatewayTopology.numTrustedProxies). This feature is still marked experimental and was not working for us at the time.

So we need to configure Envoy to trust the 2 hops in front of it:
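A sketch of such an EnvoyFilter, assuming Istio 1.8-era Envoy filter names and the default `istio: ingressgateway` label. It merges `xff_num_trusted_hops: 2` into the HTTP connection manager of the gateway listeners, so Envoy skips the two proxy hops when working out the original client address from X-Forwarded-For. The resource name is hypothetical, and since EnvoyFilter patches depend on Envoy internals, they should be re-checked on every Istio upgrade.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: EnvoyFilter
metadata:
  name: ingressgateway-xff-trusted-hops   # hypothetical name
  namespace: istio-system
spec:
  # Only patch the ingress gateway pods
  workloadSelector:
    labels:
      istio: ingressgateway
  configPatches:
    - applyTo: NETWORK_FILTER
      match:
        context: GATEWAY
        listener:
          filterChain:
            filter:
              name: envoy.filters.network.http_connection_manager
      patch:
        operation: MERGE
        value:
          typed_config:
            "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
            # Use the real connection address and trust the 2 GCLB hops in X-Forwarded-For
            use_remote_address: true
            xff_num_trusted_hops: 2
```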

Thanks to this setup the ingress gateway trusts the GCLB and does not sanitize the X-Forwarded headers.

Conclusion

Having been through all those steps, we now have a Load Balancer at the Edge that handles HTTPS and routes the traffic up to the application container through the Istio mesh. All the configuration is made following GitOps principles. And we are leveraging the full capacity of Istio traffic management.

With this setup, we, the Core-Infrastructure team, can configure the networking, the TLS settings or the IP address of the LB … and it is all transparent for the service teams.

On their side, service teams can manage how traffic reaches their application and choose anything from a simple configuration to a very fine-grained, advanced one. They do not need to know about TLS certificates, NEG, or anycast IP addresses. They can handle everything as code, autonomously, with a limited level of complexity.

I hope this article helps you understand how to expose your workloads in GCP with GKE using GCLB. There is much more to do with GCP and Istio; more articles will come.

Keep calm and istioctl