Fake and free Bypass-on-Cookie, with CloudFlare edge cache workers for Wordpress
--
Let me tell you a story… the story of TTFB < 100ms. I’ve used CloudFlare services for quite a while now, and I am a big, big fan. I’ve used their products across the whole range, from free to enterprise packages. And they’re great: they push the envelope and are mostly the best at what they do.
Yet, since they offer so many free services, somebody has got to pay for them, and that somebody is the customers on higher plans, especially the Business plan, who actually cannot live without features such as “bypass-on-cookie”.
Wait, what is Bypass on Cookie, and why do I need it?
Bypass on cookie is a common HTTP caching technique. For example, say you spin up a Varnish server and set it up to cache all GET requests. ALL of them. That means the admin panel, country pages, profiles etc.
Let’s take an example from practice: Wordpress, with Varnish spun up in front of it. Remember that little top bar WP shows to logged-in users while they browse the site? Well, since a Varnish set up like this cares not whether I am logged in, it will cache the bar, and all other users and guests will see it as well.
The exact mechanism behind the scenes is that the cache removes / ignores all the cookies on the request, thus treating it as stateless. This is why, from the perspective of the cache itself, all requests for a URL are the same and should produce the same output. Anyways, the common way of fighting this is setting up a “bypass on cookie” rule, read more about it here. In short, the rule skips the cache if any of the listed cookies is found on the request.
For example, in the case of Wordpress, we expect these common cookies to mean that the user is an admin, logged in, a customer or whatever else: wp-*, wordpress-*, comment_*, woocommerce_*. If you have any of those, it means that you are not an ordinary guest, and the cache is not a valid way to handle your request.
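To make the rule concrete, here is a minimal sketch of such a cookie check in plain JavaScript. The function name hasBypassCookie and the constant BYPASS_COOKIE_PREFIXES are made up for illustration. The prefixes mirror the patterns listed above, with 'wordpress' written as a bare prefix on the assumption that both wordpress- and wordpress_ style names should match. It is not the code of any particular cache.

```javascript
// Cookie-name prefixes that mark a visitor as "not an ordinary guest".
// Assumption: mirrors the patterns from the text; adjust for your own plugins.
const BYPASS_COOKIE_PREFIXES = ['wp-', 'wordpress', 'comment_', 'woocommerce_'];

// Returns true when the raw Cookie header contains at least one cookie
// whose NAME starts with one of the prefixes.
function hasBypassCookie(cookieHeader, prefixes = BYPASS_COOKIE_PREFIXES) {
  if (!cookieHeader) {
    return false; // no cookies at all: an ordinary guest
  }
  return cookieHeader
    .split(';')                                // "a=1; b=2" -> ["a=1", " b=2"]
    .map((pair) => pair.split('=')[0].trim())  // keep only the cookie name
    .some((name) => prefixes.some((prefix) => name.startsWith(prefix)));
}
```

A cache in front of the site would serve the cached copy only when this returns false, and go to the origin otherwise.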
So, if we imagine a news site, the cache will cover 99% of requests, with the performance of a good horse and TTFB in milliseconds, but any editor/admin browser will always be treated as special and skip the cache.
“Bypass-on-cookie” — Cache exception
Isn’t CloudFlare a CDN? Like, caching and stuff?
Oh yes, and they do offer this option. It is super-giga-mega easy to use and solves everything in a flash. But, it costs quite a lot of money. At the time of writing this, it is only available for users on the Business plan ($200/mo).
Don’t get me wrong, the plan pays off magnificently if you have a need for it. The level of support and dedication you get is tremendous! But for small blogs, WP websites and local businesses it is overkill. At least until they face their first DDoS attacks…
For example, there is a similar deal with the really important custom-cache-key option. It’s in a forever-beta, and you can get it only if you are on Business or Enterprise. In return, it gives you full control over the caching mechanism.
“Cache-key” — cache variation.
Enter, serverless workers
Back when CF announced their support for serverless workers, I was very happy, but a bit confused, as it seemed to contradict their business strategy of gatekeeping the heavyweight features.
In theory, I could make a worker that does all of that. They save on performance, you say? Nah, they support Rust workers.
In the end, I found out that they just decided to lock us (freebies) out of the really fancy features. For example, one is not allowed to change/provide the cache key while asking the cache to store the response. Furthermore, one is not allowed to skip the cache or page rules at all. So, workers are there, but they are immensely less powerful, and CF gets to keep their big-game paying customers. My enthusiasm dropped.
Wordpress Edge HTML Cache Worker
At one point, CF released a repository with example workers for common tasks. One of the examples was very important to me. It has the misleading title of “Edge-caching HTML for not-logged-in users”, which kind of sounds like what we saw earlier with the cache bypass-on-cookie.
Bear in mind that there are two official Wordpress plugins, and keep them apart:
- “Cloudflare”: their general purpose plugin, which I usually use and which just generates some simple page rules and firewall rules. Not that much really.
- “Cloudflare Page Cache”: a specific plugin intended to append a purge-cache header to the very next pageload coming through the worker. In simpler terms, if you edit any post, it will automatically purge the cache for it.
The installation is very simple: you copy the worker code, make a KV storage and “it just works”. I decided not to look at the code too much and just try it out on a small live production site. But like many other WP plugins, it has no clear directions on how exactly I would know that it works. What are the edge cases, the possibilities etc.? So I started digging. CF-Cache-Status.net was showing that I was still not caching any html, which was weird to me. I figured I must have misconfigured something, so I started looking around.
I found user-submitted info about properly setting up page rules, which did not make that much sense since I don’t want to cache everything (that’s what the worker is there for), but I decided to play along. Wow! It worked, CF-status says everything is a hit, TTFB is 120–340ms…
Not really. What I was seeing was the completely expected behaviour of the page rule I had just set up. Everything was being cached with no exceptions or variations. I blamed myself for not looking at the code or not reading patiently, so I even opened up an issue.
Bored and waiting for a response, I finally started looking at the code. Browsing the API docs, I discovered that overriding the cache (forcing a fetch from origin) was not documented, and you can find comments here and there about it being “in forever-development”, which is obviously just a front for saying “yeah, we still got people paying for those, sorry maybe later”.
What does it do then?
I have no idea whatsoever. I am still waiting for anyone to respond to my issue on GitHub. Ok, it obviously handles the purging mechanism, combined with the free CF Wordpress plugin, which is awesome. It even lets us control cookies by sending headers from the origin (which is also awesome, although I fail to see it being used in reality). For example, you can send an X-HTML-Edge-Cache header which will override the default one. But in reality, are you going to do it that often, why, and how?
One thing it does pretty well (and that we are going to keep) is using KV storage to differentiate between cache versions. In simple terms, the worker only looks at text-based responses, which means it does not touch images and assets. Good. Now add versioning to it, and we never need to purge the whole cache (and thereby throw away the biggest part of it), because it appends versions to the URL. The WP plugin handles incrementing the version independently. My only desire here is to manage to implement single-page purge, so that not all of the html changes version whenever I update any post on the site.
Anyways, there are still quite a lot of unanswered questions for me, and I will update the article as I get more info.
Fake it till you make it.
First of all, there is definitely no way that we can force a cache-layer bypass. Simply put, the layer is always there and every request goes through it. But we do have dirty mechanisms for making it believe that we are serving a different kind of request.
Let’s list a couple:
- We can bypass it via a page rule. CF offers an option to bypass cache when a URL matches a pattern (as far as I can tell, page rules match with * wildcards rather than full regex).
- Force it to forget the response by specifying a 0 TTL: fetch(request, {cf: {cacheTtl: 0}}), although I haven’t been successful in making this one work. And even if I had, it would just kill the cache for the entire URL, instead of acting on the cookie.
- Send a header from the origin that sets Cache-Control to no-cache. This, again, works for the whole URL, since we cannot split cache keys.
So, what I did was define an additional page rule which looks for an exact query string (since only GET requests are cached anyways). When this random/custom string is there, the rule is hit and we bypass the cache. Of course, we still need to make it appear only when the cookie is present, and we need it not to be seen by the user. But for testing purposes, this worked.
TLDR: At this point, if a request containing the abovementioned query string hits the cache, it will always be proxied to the origin.
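To get a feel for which URLs such a rule would catch, here is a rough simulation of wildcard matching in plain JavaScript. It assumes that * in a page rule pattern stands for any run of characters and ignores every other detail of how CF really evaluates rules. The function name pageRuleMatches and the pattern *example.com/*wdtcack=1* are made up for illustration (wdtcack is the marker parameter used in the worker code further down).

```javascript
// Rough approximation of page-rule matching: "*" matches any run of
// characters, everything else is compared literally.
function pageRuleMatches(pattern, url) {
  const escaped = pattern
    .split('*')
    // Escape regex metacharacters in the literal parts of the pattern.
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('.*');
  return new RegExp('^' + escaped + '$').test(url);
}

// Hypothetical rule: bypass cache whenever the marker parameter is present.
const BYPASS_RULE = '*example.com/*wdtcack=1*';

console.log(pageRuleMatches(BYPASS_RULE, 'https://example.com/news/?wdtcack=1')); // true
console.log(pageRuleMatches(BYPASS_RULE, 'https://example.com/news/'));           // false
```

Note that a pattern this loose also matches the marker when it sits anywhere else in the URL, so pick a parameter name that will not collide with anything real.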
Teach the worker about the new query parameter
The only way that we are able to force this without the user knowing is by doing a url rewrite within the worker. We will take the original request, and add the little query string to it, no harm done. Btw thanks for this js.
function createBypassCacheRequest(request) {
  // Accept either a Request object or a plain URL string.
  const req = request instanceof Request ? request : new Request(request);
  const url = new URL(req.url);
  // Add the marker parameter that the bypass page rule looks for,
  // keeping whatever query string was already there.
  url.searchParams.set('wdtcack', '1');
  // Copy method, headers and body from the original request.
  return new Request(url.toString(), req);
}
Now, where exactly do we need to make this little switcheroo? We will amend the processRequest method to verify if it should do the switch at all:
if (bypassCache) {
  request = createBypassCacheRequest(request);
}
response = await fetch(request);
At last, there will be some edge cases. Because they made the cookie override via header (mentioned above) possible, they decided to require that a cached response is provided to shouldBypassEdgeCache() method. But when a page is uncached, the response is not available, so we cannot let it just return false always:
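if (request) {
  // Start from the built-in cookie list.
  let bypassCookies = DEFAULT_BYPASS_COOKIES;
  // Only override it when a cached response (and its options) exists.
  if (response) {
    const options = getResponseOptions(response);
    if (options) {
      bypassCookies = options.bypassCookies;
    }
  }
  // (the cookie checks continue from here)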
The above segment of code does the same thing as before, but it is ready for the situation where the cached response is not yet available, so it still verifies cookies and everything else even if there are no previous headers.
I have opened a pull request asking the team to remove the constraint of always having to wait for the cached response, even if the request itself should not be cached.
Additionally, I made a forked pull request, which will never be merged, obviously, just to demonstrate the full solution in practice. Be careful though, and remember that you have to set up the bypass page rule as advised above.
If you need to learn more about WP and CF, please read this awesome text by Gijo:
A couple of gotchas at the end.
One thing to keep in mind is that CloudFlare Edge cache is sharded, based on datacenter location. So unless you are on an enterprise plan with Argo turned on (which is like, woow), each CF datacenter spawns its own worker and keeps a separate cache store for that datacenter. On enterprise plans, datacenters talk to each other, so there is eventual cache consistency.
Why is this important? Measuring TTFB.
My median TTFB is now down to 120–320ms on all 4 production websites I tested this on. But each “first” load (from another datacenter) takes longer, as CF has to pull from the origin again in order to populate that datacenter’s cache. This means that the strategy is especially good for single-market websites, as most if not all traffic will come through a single datacenter. Of course it is still viable for other uses, you just won’t get to that 95% cached responses target mark as easily, unless you have quite a lot of traffic, in which case what are you doing here, go and buy a serious CF plan.
Also a small gotcha here: no one I have ever spoken to on the free/business plans is getting to sub-100ms numbers, ever. It does not matter if they use workers or not. I believe that CF tunnels only tier 1 traffic through the super fast route, and leaves us freebies to use the normal route. 170ms is still good, I really don’t care.
Security implications
So far, I haven’t gotten word of any. But I can say that allowing this particular query string to bypass the cache can in fact be abused from the outside, and someone can write a bot to spam your origin with a bunch of requests. On the other hand, they can do that anyways, so you should probably set up a firewall rule preventing the same IP from asking for too many resources.
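One possible mitigation, sketched here as an untested idea rather than something from the worker repository: let the worker own the marker parameter completely. It adds the parameter for visitors who are allowed to bypass and strips it from everybody else, so an outsider typing it by hand gets the cached page anyway. This assumes the bypass page rule is evaluated against the URL the worker fetches, which is the same assumption the whole trick rests on. The names normalizeBypassParam and BYPASS_PARAM are hypothetical.

```javascript
// Marker parameter the bypass page rule looks for (same one as in the
// worker code above).
const BYPASS_PARAM = 'wdtcack';

// Returns the URL the worker should actually fetch.
// visitorMayBypass is the result of the cookie check for this request.
function normalizeBypassParam(originalUrl, visitorMayBypass) {
  const url = new URL(originalUrl);
  if (visitorMayBypass) {
    // Logged-in visitor: force the marker so the page rule skips the cache.
    url.searchParams.set(BYPASS_PARAM, '1');
  } else {
    // Guest: drop the marker even if it was supplied from the outside.
    url.searchParams.delete(BYPASS_PARAM);
  }
  return url.toString();
}

console.log(normalizeBypassParam('https://example.com/news/?wdtcack=1&page=2', false));
// https://example.com/news/?page=2
```

Inside the worker this would replace the plain "add the parameter" step, for example by fetching new Request(normalizeBypassParam(request.url, bypassCache), request).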
Future prospects, poor-man’s cache keys?
In my particular case I do not need custom cache keys, but here is a proposal, and I hope someone makes it before I do.
First, let’s revisit what it means, as I’ve stated above: “cache keys = versioning”. Let’s imagine we have a homepage which shows different top list content for visitors from Germany and Russia. Maybe some of the items are forbidden, maybe it’s just a different order, no matter, it’s different. Since it is the exact same URL “/”, CF sees it as only a single cache key. So only one version can be stored.
If we take a look at what CF already uses to store “global” html versions in the worker example, we will see that they use the exact same system I used above for bypassing the cache. Oh wow, so I’m not the only one abusing workers, CF does it as well.
function GenerateCacheRequest(request, cacheVer) {
  let cacheUrl = request.url;
  if (cacheUrl.indexOf('?') >= 0) {
    cacheUrl += '&';
  } else {
    cacheUrl += '?';
  }
  cacheUrl += 'cf_edge_cache_ver=' + cacheVer;
  return new Request(cacheUrl);
}
Ok, so seeing this, it means we could easily create/add new query strings for our cache key permutations. By default CF generates cache keys in this manner: ${header:origin}::${scheme}://${host_header}${uri}. So, the query string obviously participates in the party, and if we add another query parameter, for example country_restriction=US_CA, it will be treated as a key used only by the visitors from US_CA. Of course, in order to construct it, we will need to use Cloudflare’s built-in GeoIP within the worker.
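A minimal sketch of how that could look, assuming the country arrives as a two-letter code (inside a worker it is exposed as request.cf.country). The helper name buildVariantUrl, the COUNTRY_GROUPS table and the 'default' fallback group are all invented for this example, and I have not run it on a live zone.

```javascript
// Hypothetical mapping from visitor country to a content "market".
// Countries that are not listed share one default variant.
const COUNTRY_GROUPS = new Map([
  ['US', 'US_CA'],
  ['CA', 'US_CA'],
  ['DE', 'DE'],
]);

// Appends the market to the query string so that each market ends up
// under its own cache key.
function buildVariantUrl(originalUrl, countryCode) {
  const group = COUNTRY_GROUPS.get(countryCode) || 'default';
  const url = new URL(originalUrl);
  url.searchParams.set('country_restriction', group);
  return url.toString();
}

console.log(buildVariantUrl('https://example.com/', 'CA'));
// https://example.com/?country_restriction=US_CA
```

Grouping countries instead of using the raw code keeps the number of cached copies per page small. Keep in mind that the origin will see the extra parameter too, so it either has to ignore it or use it to pick the content.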
What I dislike about the current solution, but could easily be fixed?
Yeah, well here’s a shortlist:
- I would like to have per-page KV keys. Each GenerateCacheRequest() would create a key based on the URL of the request (or other parameters), while Patrick Meenan’s CloudFlare WP plugin could be modified to accommodate this.
- I think that I might have broken global html-only versioning by removing the accept: text/html segment, though I have not yet investigated.
- It’s 2020. Remove the use of the API for cache purge, and concentrate on KV only.
- Reformat the JS code of the worker as it seems overly complex and non SRP for no apparent reason. It will be much easier to handle and modify.
- Perhaps even rewrite it in Rust just for fun and performance.
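For the first item on the list, a per-page key could be derived from the request URL along these lines. This is only a sketch of the idea: the function name versionKeyForUrl and the html_cache_version: prefix are hypothetical, and the WP plugin would have to bump the very same key when a post changes.

```javascript
// Builds a KV key that identifies one page, so that its cache version
// can be bumped without touching the rest of the site.
function versionKeyForUrl(requestUrl) {
  const url = new URL(requestUrl);
  // Ignore the query string and trailing slashes, so that
  // /about, /about/ and /about/?utm=x all share one version.
  const path = url.pathname.replace(/\/+$/, '') || '/';
  return 'html_cache_version:' + url.hostname + path;
}

console.log(versionKeyForUrl('https://example.com/about/?utm=x'));
// html_cache_version:example.com/about
```

The worker would then read the version stored under this key from its KV namespace and pass it to GenerateCacheRequest() instead of the single global version.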
There it is, have fun! If you figure out where I went wrong, or have a better solution, or an idea, or whatever, just comment! 🍺