Removing the need for caching servers, with GCP’s load balancers

Colt McAnlis
Aug 10, 2017 · 5 min read

Tax Lemming is a startup out of Vancouver, BC, which focuses on helping you make sense of your purchases, taxes, and basic bookkeeping for your small business. While on a backpacking trip to the Tongass National Forest, I had a chance to share a campsite with an engineer who let me in on a small problem the team was working on: Tax Lemming had too many instances spinning up, and needed to cut that number down before going for another round of VC funding.

Like most web-based applications, Tax Lemming’s architecture looked something like this:

At first glance, it’s easy to see how this can become expensive quickly: all the static content is being sent through the server instance, so they end up paying for compute hours, and each request requires the Apache server to re-hit the relational DB.

Once we got back on the grid, I had a chance to sit down with Tax Lemming and see what we could do to fix this.

The common answer

For most developers of web-based applications, a solution to this problem looks something like this:

Generally: add nginx as a reverse caching proxy in front of the Apache server, and modify the source assets so the client fetches the big files from a CDN, so they don’t end up eating server time (and are sent out faster).

Although this is a tried-and-true solution, there are a few issues I have with it:

  • The reverse proxy (nginx, Varnish, Squid, etc.) needs a whole new instance set up per region. Technically, it’s still cheaper than the load on the server instance itself, but that’s still a lot of overhead in terms of cost to cache & send content.
  • Serving static assets from a third-party CDN typically requires a whole new URL scheme (e.g. instead of url=”./abc/tacos.jpg” we get url=””). This isn’t so much a performance issue as an aesthetic one ;)

Given these two nuances, I think there are a few things Tax Lemming can do to improve.

GCP LB = less headache & less cost

GCP’s load balancer is already awesome in that it allows you to split traffic between instance groups, regions, etc. But it also has two nice features that aren’t discussed as much, which could help Tax Lemming reduce instance count further, and cut some upkeep costs as well.

Step #1: CDN the cacheable requests

It turns out that GCP’s load balancer can easily cache common requests in Google Cloud CDN, which reduces latency as well as the number of requests the instance needs to serve. As long as a response is cached, it will be served directly from Google’s network edge.

Setting up this feature with Google Cloud’s load balancer is really straightforward. All you need to do is check the “Enable Cloud CDN” button while creating the backend definition for your HTTPS LB, and Google takes care of the rest.
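If you’d rather script this than click through the console, the same flag can be flipped with gcloud. This is a minimal sketch; the backend service name `web-backend-service` is a placeholder for whatever your HTTPS LB’s backend is actually called:

```shell
# Enable Cloud CDN on an existing global backend service.
# "web-backend-service" is a hypothetical name -- substitute your own.
gcloud compute backend-services update web-backend-service \
    --global \
    --enable-cdn
```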

Here’s what the performance graph looks like when fetching this request through the load balancer straight to the instance vs. with the “Enable Cloud CDN” box checked:

You can see in the graph above that once the asset gets cached, the response time drops significantly. All you had to do was check a box!

What’s even better is that no extra instances are needed for this process. While nginx, Varnish, and Squid require dedicated hosting on a VM, Google’s LB + CDN is serverless.

Step #2: Combine the CDN + GCS for static assets

If content is static, you can also reduce the load on the web servers by serving it directly from Google Cloud Storage. Typically, your compute URL is separate from your CDN URL (e.g. vs ). However, with Google Cloud Load Balancer, we’re able to create a host routing rule, so that requests to a given path will route over to fetch assets from a public Google Cloud Storage bucket, which is also cached on Google’s CDN.

The setup for this was straightforward. I was able to copy the exact steps from here. The trick was setting up a new backend bucket in the load balancer (meaning you can have a backend service, which points to an instance group, or a backend bucket, which points to a GCS bucket).
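The same setup can be sketched with gcloud. All the resource names and the `/static/*` path below are hypothetical placeholders, and this assumes you already have a URL map and a backend service for your LB:

```shell
# Create a backend bucket pointing at an existing, publicly readable
# GCS bucket, with Cloud CDN enabled. Names here are placeholders.
gcloud compute backend-buckets create static-assets-backend \
    --gcs-bucket-name=my-static-assets-bucket \
    --enable-cdn

# Add a path matcher to the LB's URL map so that /static/* requests
# are served from the backend bucket instead of the instance group.
gcloud compute url-maps add-path-matcher web-map \
    --path-matcher-name=static-matcher \
    --default-service=web-backend-service \
    --backend-bucket-path-rules="/static/*=static-assets-backend"
```

The nice part of this design is that the client keeps talking to one hostname; the split between compute and storage happens inside the load balancer’s routing rules.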

Furthermore, you can improve the performance of your backend bucket by checking the “Enable Cloud CDN” button when you create it in the load balancer UI:

Here’s a graph showing the results: we can see that, once again, the asset eventually gets cached on the CDN and becomes more readily available.


Armed with this new information about the power of Google’s load balancer, Tax Lemming made a few changes and updated their architecture:

This change resulted in lots of new caching, for static assets as well as for dynamic requests (with proper cache headers), which helped drop the number of requests hitting the backend, so fewer instances were spun up to service the same load.
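One way to sanity-check which responses are actually being cached is to inspect the response headers: Cloud CDN only caches responses it considers cacheable (e.g. carrying a public Cache-Control header), and cache hits typically show up with an Age header. The URL below is a placeholder for your own load balancer’s address:

```shell
# Fetch only the headers of an asset served through the LB.
# The URL is hypothetical -- point it at your own deployment.
curl -sI https://example.com/static/tacos.jpg \
    | grep -iE 'cache-control|age|etag'
```

If the second request for the same asset comes back with an Age header, the edge cache answered it and your instance never saw the request.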

HEY! Listen!

Which is faster, TCP or HTTP load balancers?

Did you know there was a connection between core count and egress?

Want to know how to profile your Kubernetes boot times?

