Varnish — A Caching Love Story

Mar 20, 2017


Author: Razvan Grigore

Writer’s note
You can also find this article and our job postings on our own blog!

Google says “speed matters”. They said it back in 2009, and nothing has changed since. On the contrary, speed has become more and more important with the wider adoption of really fast broadband connections, faster PCs and users' higher expectations.

Google's PageSpeed Insights tool provides a complete list of things you can do to improve overall website speed, but in this article we are going to discuss mostly server-side response time. 200 ms: that is how quickly your backend has to respond in order to be considered fast. To put this into human terms, one blink of an eye takes approximately 400 ms, so the backend gets about half a blink.

As at any startup, features usually get priority when it comes to development time, so our shop reached a point where we had a beautiful design and rich features our users enjoyed, but response times kept going up as we added more and more features.

We had to find solutions.

Since development time is “expensive”, we first tried to scale horizontally and vertically, which means adding more servers or increasing the resources of the existing ones. That sounds easy, since hardware is pretty cheap, but some things, like databases, are very hard to scale, and some requests will just be too slow even when enough resources are available. This bought us more time, but it was clearly not the solution.

Next, we moved personalisation to the frontend and, of course, started caching things on the server side. This caching falls into two big categories:

1. Application cache, used by developers directly in the code (memcached and Redis), for data that does not change often and can be rebuilt from the database easily but slowly, like the shop category tree, product attributes, URL routes etc.

2. HTTP cache, which prevents identical requests from hitting the backend twice by returning the same un-personalized response (HTML or JSON) to multiple clients within a predefined time frame.

The first method is used by almost every medium to large application, with Redis being very popular nowadays, so in this blog post I will cover the second method: Varnish HTTP Cache.

Spoiler alert: this turned out to be very effective. It keeps our most visited pages very fast and lowered our average response time from 120 milliseconds to under 60 milliseconds. To understand the impact, our PHP backend responds in around 200 milliseconds without any HTTP cache layer, and it can get slower during traffic spikes such as TV spots or when too many customers are browsing our website at once.

The principle is best explained in The Varnish Book, a free, comprehensive, nitty-gritty technical “bible” for everything Varnish can do. Several things can happen when a user requests one of our pages:

1. The page is not marked as cacheable, so it always hits the backend. Examples include the basket, wishlist, personal recommendations, the customer data endpoint and other things that are unique to each customer.

2. The page is marked as cacheable because it is un-personalized, as we call it, so the response can be identical for all customers as long as it has not expired. This includes category pages, outfit pages, JS & CSS assets, static images, similar-product recommendations and other API endpoints.

a. The first user who requests this page while the cache is empty (or expired) triggers a backend request to fetch it (just like case 1). All other users requesting the same page or API endpoint, even at the same moment, wait for this backend request to finish. The response is then saved (cached) in Varnish memory and delivered to all waiting clients. This is called a cache miss, and it would be represented like this:


b. The second, ideal case is when the data is already in the cache, so the response is delivered very fast, directly from Varnish's RAM, without our backend even knowing about it. This is called a cache hit, and is represented like this:
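To see which of the two cases served a given response, a common trick is to expose it in a response header. This is an illustrative sketch assuming Varnish 4.x, not part of the setup described here; the header name X-Cache is a widespread convention, not something Varnish requires. It relies on obj.hits, which counts how often the cached object has been delivered and is 0 for a freshly fetched object.

```vcl
sub vcl_deliver {
    # obj.hits is 0 for an object just fetched from the backend (miss or
    # pass) and greater than 0 when the object was served from the cache.
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }
}
```

Requesting the same cacheable URL twice (for example with curl -I) should then show MISS on the first response and HIT on the second.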


Cache hit ratio measures how effective this method is: it is the number of requests delivered from cache memory (hits) divided by the total number of cacheable requests, i.e. the hits plus the ones that reach the backend (misses), expressed as a percentage. The higher the number, the better. For us, it is around 48% across all cacheable pages. You can learn how to increase the hit ratio from many tutorials online, but the most important advice is to keep GET parameters under control by normalizing requests. To do this, we first sort them alphabetically, which avoids duplicate copies of the same page in the cache. This is very easy with the “std” module, included in the standard Varnish package.
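The same definition as a formula, with hypothetical counts chosen only for illustration (they are not our traffic numbers). In varnishstat, the two inputs correspond to the MAIN.cache_hit and MAIN.cache_miss counters.

```latex
\text{hit ratio} = \frac{\text{hits}}{\text{hits} + \text{misses}}
\qquad \text{e.g.} \qquad
\frac{480}{480 + 520} = 0.48 = 48\%
```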

Next, we had many GET parameters that are used only in the frontend, such as UTM parameters, which were breaking the cache. This happens because Varnish, by default, uses the full URI plus the hostname as the cache key to decide whether a page is in the cache or not. To fix this, we excluded those parameters from the hash, so a page requested with, for example, a different “utm_source” parameter returns the same response from the cache. The VCL code to do this, production tested:
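A minimal sketch of this normalization (not the production code mentioned above), assuming Varnish 4.x and that every tracking parameter starts with utm_. It rewrites req.url in vcl_recv and also applies the alphabetical sorting with std.querysort(); because the default hash is built from the URL and host, the stripped parameters no longer affect the cache key. Unlike a change confined to vcl_hash, this also removes them from the URL sent to the backend, which is only acceptable because these parameters are read exclusively by frontend JavaScript.

```vcl
import std;  # needed once per VCL file, provides std.querysort()

sub vcl_recv {
    # 1. Remove every utm_*=value pair but keep the "?" or "&" before it:
    #    /c?utm_source=a&utm_medium=b&page=2  ->  /c?&&page=2
    set req.url = regsuball(req.url, "([?&])utm_[a-z]+=[^&]*", "\1");
    # 2. Collapse the separators left behind: "?&&" -> "?", "&&" -> "&".
    set req.url = regsuball(req.url, "([?&])&+", "\1");
    # 3. Drop a trailing "?" or "&" (e.g. when utm_* was the last parameter).
    set req.url = regsub(req.url, "[?&]+$", "");
    # 4. Sort the remaining parameters, so ?b=2&a=1 and ?a=1&b=2 share
    #    one cache object.
    set req.url = std.querysort(req.url);
}
```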

As you might have guessed, Varnish sits in our stack in front of our backend but behind our NGINX SSL termination proxy, because Varnish does not support HTTPS, for reasons described back in 2011 and reiterated in 2015 after the infamous Heartbleed bug was disclosed.
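Since Varnish only receives plain HTTP from NGINX, it cannot tell by itself whether the client connected over HTTPS. A small sketch for keeping the two variants apart in the cache, assuming Varnish 4.x and that NGINX forwards the original scheme in an X-Forwarded-Proto header (for example via proxy_set_header X-Forwarded-Proto $scheme;):

```vcl
sub vcl_hash {
    # Assumption: NGINX sets X-Forwarded-Proto to "http" or "https".
    # Adding it to the hash stores http and https variants of the same URL
    # separately, which matters if the backend renders scheme-dependent
    # absolute links or redirects.
    if (req.http.X-Forwarded-Proto) {
        hash_data(req.http.X-Forwarded-Proto);
    }
    # No return here: the built-in vcl_hash still adds the URL and the
    # Host header (or server IP) and then returns lookup.
}
```

If NGINX already redirects all plain-HTTP traffic to HTTPS before it reaches Varnish, this step is unnecessary.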

Since our website has thousands of customers online at any time of day, we run the PHP backend on multiple identical servers. Varnish lets you configure multiple backends (also called “origins”) and group them into a director, for example a round-robin director, for increased performance and resilience. There is also a random director, which distributes requests in a, you guessed it, random fashion.
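A minimal sketch of a round-robin director, assuming Varnish 4.x and its bundled directors module. The backend names, addresses (taken from the 192.0.2.0/24 documentation range) and port are hypothetical:

```vcl
import directors;  # bundled with Varnish 4.0 and later

# Hypothetical PHP servers.
backend php1 {
    .host = "192.0.2.11";
    .port = "8080";
}

backend php2 {
    .host = "192.0.2.12";
    .port = "8080";
}

sub vcl_init {
    # Round-robin: each backend request goes to the next server in turn.
    new php = directors.round_robin();
    php.add_backend(php1);
    php.add_backend(php2);
}

sub vcl_recv {
    # Route requests through the director instead of a single backend.
    set req.backend_hint = php.backend();
}
```

Switching to the random director would mean new php = directors.random(); and php.add_backend(php1, 1.0);, where the second argument is the backend's weight.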

But what if one of your servers goes down? Can Varnish send all requests to the healthy servers? Sure it can. This is where health checks come into play. They are defined in the backend configuration and are called at predefined intervals to check whether the servers are up. You can also manually mark a backend as sick using the varnishadm CLI tool, which is very useful for rolling updates of the servers without failing any client requests. Our backend config looks like this:
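A sketch of such health checks, assuming Varnish 4.x and a hypothetical /health endpoint on each PHP server. The backend names, addresses and timings are illustrative values, not our production settings:

```vcl
# Hypothetical probe shared by all PHP backends.
probe php_health {
    .url = "/health";  # requested with GET at every interval
    .timeout = 1s;     # a slower answer counts as a failed check
    .interval = 5s;    # how often each backend is polled
    .window = 5;       # look at the last 5 checks...
    .threshold = 3;    # ...and require 3 successes to be healthy
}

backend php1 {
    .host = "192.0.2.11";
    .port = "8080";
    .probe = php_health;
}

backend php2 {
    .host = "192.0.2.12";
    .port = "8080";
    .probe = php_health;
}
```

Directors skip backends that are sick. For a rolling update, a server can be taken out of rotation with varnishadm backend.set_health php1 sick and handed back to the probe with varnishadm backend.set_health php1 auto. Depending on the Varnish version, backend names may carry a VCL-name prefix, so varnishadm backend.list is a good first check.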

Next comes the most important part: configuring the actual caching by choosing which URLs to cache. This is done in two steps. The first is at the end of the vcl_recv subroutine, where we remove the Cookie header from the client request to avoid any personalization:
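A minimal sketch of that first step, assuming Varnish 4.x. The URL patterns are hypothetical stand-ins for the personalized pages listed earlier (basket, wishlist, customer data):

```vcl
sub vcl_recv {
    # Hypothetical personalized paths: never cache them, keep their cookies.
    if (req.url ~ "^/(basket|wishlist|customer)") {
        return (pass);
    }

    # Everything else is treated as un-personalized. Without the Cookie
    # header, the built-in vcl_recv looks GET/HEAD requests up in the cache
    # instead of passing them to the backend.
    unset req.http.Cookie;
}
```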

The second step happens in the vcl_backend_response subroutine, where we can clean up the response before it is saved in the cache. Here we define the TTL (expiry time) and make sure the backend does not set any cookies:
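A matching sketch for the second step, assuming Varnish 4.x, the same hypothetical URL patterns as above and a hypothetical 10-minute TTL:

```vcl
sub vcl_backend_response {
    # Only touch successful responses for un-personalized URLs; the
    # hypothetical personalized paths keep their Set-Cookie headers.
    if (beresp.status == 200 &&
        bereq.url !~ "^/(basket|wishlist|customer)") {
        # Never store a cookie in a shared cache object.
        unset beresp.http.Set-Cookie;
        # Hypothetical TTL: how long the object is served from cache.
        set beresp.ttl = 10m;
    }
    # No return here: the built-in vcl_backend_response still marks the
    # object uncacheable if, e.g., the backend sends Cache-Control with
    # no-cache, no-store or private, so cacheable pages must not send those.
}
```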

The config examples shown above should be enough for a basic Varnish caching setup, but we went a few steps further and used the power and speed of VCL to do more interesting things, like:

· GeoIP redirects based on the customer's IP address

· mobile detection based on the User-Agent header

· a lot of temporary or permanent redirects

· URL manipulation

· Basic-Auth

· Request tagging with UUID for logging

· A/B Testing

· Setting cookies

· CORS pre-flight request headers

· Access logs in JSON with varnishncsa & Graylog

· Varnish server provisioning with Ansible
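As a taste of what the mobile detection item can look like, here is a deliberately simplified sketch assuming Varnish 4.x. The X-Device header name and the User-Agent pattern are hypothetical, and production-grade detection needs a far more complete pattern list:

```vcl
sub vcl_recv {
    # Hypothetical, minimal User-Agent check ("(?i)" = case-insensitive).
    if (req.http.User-Agent ~ "(?i)(mobile|android|iphone|ipod)") {
        set req.http.X-Device = "mobile";
    } else {
        set req.http.X-Device = "desktop";
    }
}

sub vcl_hash {
    # Cache mobile and desktop variants of a page separately.
    hash_data(req.http.X-Device);
    # Falls through to the built-in vcl_hash (URL + host, then lookup).
}
```

The backend can read the same X-Device header to render the matching variant. VCL concatenates multiple definitions of the same subroutine, so these can live alongside an existing vcl_recv and vcl_hash.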

Please let us know in the comments what you think our next Varnish blog post should be about. Eager to work with this cool technology? We’re hiring, so have a look at our open positions!




ABOUT YOU Development and Engineering articles. Check out more of our content and tech job openings!