The Power of Cloud Load Balancing
Run and scale your services behind a single external IP address
Apolitical’s mission is to help build 21st-century governments that work for people and the planet. Our web platform upskills public servants across the globe to tackle the pressing challenges they face in their jobs.
To scale our web platform to serve that global audience, our engineering teams are constantly improving the quality of its features, security, and availability — amongst other things. In particular, to achieve our mission, we need to be able to provide public servants with fast and reliable access to our system.
Apolitical’s infrastructure runs on the Google Cloud Platform (GCP), which offers multiple hosting alternatives and networking products. In this blog post, you’ll learn how we improved our infrastructure using GCP’s managed services to simplify our deployment and seamlessly deliver the scale and high availability that we need.
TL;DR: To scale our web platform, we revamped the way we handle traffic by using Cloud Load Balancing. We enabled Cloud CDN to cache web assets, and we defined Cloud Armor rules to protect against DDoS attacks.
How we used to handle traffic
In the past, all incoming traffic to the Apolitical web platform was handled by our Reverse Proxy, built with Traefik Proxy (a modern and open-source implementation of a reverse proxy). By definition, a reverse proxy is a server that sits in front of web servers and forwards client (e.g. web browser) requests to those web servers. It typically sits behind the firewall in a private network and provides an additional level of abstraction and control to ensure the smooth flow of network traffic between clients and servers.
Our Reverse Proxy was exposed through a LoadBalancer Service, which is the standard way to expose a Service to the Internet from a Google Kubernetes Engine (GKE) cluster. A LoadBalancer Service spins up a network Load Balancer with a single IP address, and all incoming traffic is forwarded directly to the Reverse Proxy Pods.
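As a sketch, a LoadBalancer Service in front of a Traefik deployment looks roughly like this (the `traefik` name, namespace, and `app: traefik` selector are illustrative placeholders, not our actual manifest):

```yaml
# A Service of type LoadBalancer: GKE provisions a network Load Balancer
# with a single external IP and forwards traffic straight to the matching Pods.
apiVersion: v1
kind: Service
metadata:
  name: traefik          # hypothetical name
  namespace: default
spec:
  type: LoadBalancer
  selector:
    app: traefik         # matches the Reverse Proxy Pods
  ports:
    - name: web
      port: 80
      targetPort: 80
    - name: websecure
      port: 443
      targetPort: 443
```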
Take a look at the diagram below to see what our previous architecture looked like:
The big downside of a LoadBalancer Service is its lack of integration with GCP networking products (technologies that make the infrastructure easier to scale, secure, and modernise) such as global anycast external IP addresses, Cloud CDN, Cloud Armor, and more. This meant we were limited to a reduced set of features and faced growing maintenance costs. For that reason, the LoadBalancer Service became an obstacle to scaling our web platform.
How we revamped our setup
In GKE, an Ingress defines rules for routing HTTP(S) traffic to Workloads (applications) running in a cluster. See the Service networking overview to learn more about how an Ingress exposes applications using Services.
Unlike the Load Balancer Service, an Ingress is not a type of Service. Instead, it sits in front of multiple Services and acts as a “smart router” within a GKE cluster. The default Ingress controller will spin up a GCP HTTP(S) Load Balancer, and because it’s a native GCP component, it can use the networking products mentioned above out-of-the-box.
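A minimal GKE Ingress might look like the following; the `gce` ingress class selects the default controller that provisions a GCP HTTP(S) Load Balancer (the `web-ingress` and `web` names are placeholders):

```yaml
# An Ingress routing HTTP(S) traffic to a backing Service inside the cluster.
# The default GKE controller ("gce") spins up a GCP HTTP(S) Load Balancer.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress                        # hypothetical name
  annotations:
    kubernetes.io/ingress.class: "gce"     # default external GKE Ingress controller
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web                  # hypothetical Service name
                port:
                  number: 80
```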
It’s important to note that VPC-native traffic routing is essential for you to be able to use an Ingress. Older versions of GKE didn’t enable this feature by default. Furthermore, VPC-native traffic routing is immutable: if a GKE cluster was created without it, the only way to use an Ingress is to recreate the GKE cluster from scratch.
For us, that meant a huge opportunity to revamp our previous architecture to a more robust and scalable architecture. But, as simple as it sounds, replacing the Load Balancer Service with an Ingress involved a few technical challenges and trade-offs. In the following sections, we’ll unravel each of these and explain the solutions that we came up with.
Routing
Our routing is configured using Traefik Proxy Kubernetes Custom Resource Definitions (CRDs). Traefik Proxy was providing a good level of service for our needs, so we decided to keep the routing untouched and take a “catch-all” approach, where the GKE Ingress delegates all routing decisions to Traefik Proxy.
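A catch-all Ingress can be expressed with a `defaultBackend`, so every request received by the external HTTP(S) Load Balancer is handed straight to Traefik Proxy, which then applies its own routing rules. This is a sketch with hypothetical names, not our production manifest:

```yaml
# "Catch-all" Ingress: no path rules, everything goes to the Traefik Service,
# which performs the actual routing via its CRD-based configuration.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catch-all                          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  defaultBackend:
    service:
      name: traefik                        # hypothetical Traefik Service name
      port:
        number: 80
```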
Transport Layer Security (TLS)
TLS is a security protocol that provides privacy and data integrity for Internet communications. Previously, our TLS configurations were built into the Traefik Proxy. By replacing the Load Balancer Service with a GKE Ingress, the TLS configurations would have to sit at the Ingress level. To do so, we simplified our Traefik Proxy setup to only define and use HTTP connections and to delegate all the TLS configurations to the GKE Ingress.
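One way to terminate TLS at the Ingress level on GKE is with a Google-managed certificate, declared as a `ManagedCertificate` resource and referenced from the Ingress via an annotation. A minimal sketch, assuming placeholder names and an example domain:

```yaml
# Google-managed TLS certificate, provisioned and renewed by GCP.
apiVersion: networking.gke.io/v1
kind: ManagedCertificate
metadata:
  name: web-cert                           # hypothetical name
spec:
  domains:
    - example.org                          # placeholder domain
---
# The Ingress references the certificate; backend connections stay plain HTTP.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: catch-all                          # hypothetical name
  annotations:
    kubernetes.io/ingress.class: "gce"
    networking.gke.io/managed-certificates: "web-cert"
spec:
  defaultBackend:
    service:
      name: traefik                        # hypothetical Traefik Service name
      port:
        number: 80
```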
Cloud CDN
Cloud CDN offers a massive improvement to a web platform’s scalability and performance, and GKE Ingress makes it easy to plug this feature in. Consequently, we decided to use Cloud CDN to cache all of our static content, e.g. web assets (.js, .css, etc), fonts, images, and more.
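Cloud CDN is enabled per Service through a `BackendConfig` resource, which is then attached to the Service with the `cloud.google.com/backend-config` annotation. A minimal sketch with assumed names:

```yaml
# Enable Cloud CDN on the load balancer's backend for this Service.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: cdn-config                         # hypothetical name
spec:
  cdn:
    enabled: true
---
# Attach the BackendConfig to the Service serving the cacheable content.
apiVersion: v1
kind: Service
metadata:
  name: web                                # hypothetical Service name
  annotations:
    cloud.google.com/backend-config: '{"default": "cdn-config"}'
spec:
  selector:
    app: web
  ports:
    - port: 80
      targetPort: 80
```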
Cloud Armor
GKE Ingress also makes it easy to enhance security by simply plugging in Cloud Armor. Consequently, we decided to use Cloud Armor to protect the web platform against DDoS and other attacks.
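Cloud Armor is attached the same way: a `BackendConfig` references an existing Cloud Armor security policy by name, and the load balancer applies that policy to all traffic reaching the backend (the policy name here is hypothetical):

```yaml
# Apply an existing Cloud Armor security policy to this Service's backend.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: armor-config                       # hypothetical name
spec:
  securityPolicy:
    name: my-security-policy               # hypothetical, created separately in GCP
```

The security policy itself (its DDoS and WAF rules) is defined outside the cluster in GCP, so rules can be updated without touching Kubernetes manifests.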
See how this differs from our previous architecture in the diagram below:
What we learnt
I’m pleased to say that although implementing our load-balancing improvements was challenging, it was very much worth it! The benefits of handling our web platform’s traffic through Cloud Load Balancing are evident.
To show the results in numbers:
- 85% of total traffic is now served from Cloud CDN
- Page loads are at least 3 times faster in EU regions (and estimated to be even faster in non-EU regions)
- API responses are at least 2 times faster for non-EU regions
This has greatly contributed to getting better performance scores for Core Web Vitals metrics on Google Search Console. And more importantly, this allows Apolitical to keep expanding its reach to public servants, providing them with a great experience.
To see what our web platform is like, visit us here!