Caching POST Responses with Nginx

…because you can’t do it with Varnish

Caching the response of an HTTP POST request is very niche, and most of what you’ll find online will tell you that you shouldn’t be doing it. The RFC itself says you really shouldn’t, so only do this if you either have no alternative or know exactly what you’re doing.

Ultimately, a POST request should result in a cache invalidation, not a new cache entry or a cached response. I was in a situation where I was providing an HTTP service whose response to a given request wouldn’t change for at least the next X minutes, no matter who made it. I couldn’t change how requests were sent to this service (they were SOAP requests sent via POST), so I had to find a way to cache the results without changing the input.

Attempt #1 - Varnish

So I thought that, since Varnish is pretty much the caching software, it would solve my problem (spoiler alert… it didn’t). However, I had a few hurdles to clear before I found that out.

Firstly, Varnish doesn’t give you access to the POST data out of the box, and secondly, I’d have to find a way to construct a key to store the cached result under. I did, however, find two GitHub repos that would let me do pretty much that…

With both of those compiled and Varnish running on my laptop, I wrote a simple VCL config file. This is what I ended up with…

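import parsereq;
import digest;

# Varnish 3 VCL syntax (vcl_fetch, req.request)

backend default {
    .host = "127.0.0.1";
    .port = "4567";
}

sub vcl_fetch {
    # cache backend responses for five seconds while testing
    set beresp.ttl = 5s;
}

sub vcl_recv {
    # parse the request so the POST body can be read later
    parsereq.init();

    # only try to cache POST requests, let GETs fall through
    if (req.request == "POST") {
        return (lookup);
    } else {
        return (pass);
    }
}

sub vcl_hash {
    # build the cache key from an MD4 digest of the POST body
    hash_data(digest.hash_md4(parsereq.body(post)));
    return (hash);
}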

Quickly going through it: the first two lines import the compiled vmods we found. The backend block just specifies where our actual app is (I was hosting one on port 4567). For testing I only wanted to cache for five seconds, which is set in the vcl_fetch block. To make the POST body available for hashing, we first need to initialise parsereq; that’s the first line in vcl_recv.

Since we’re only interested in caching POST requests, we handle those here and let everything else drop straight through to the backend. We can’t use the default cache key (built from the URL and host), because it ignores the body, so we need to make our own. That happens in vcl_hash, which takes an MD4 digest of the POST body (I would have used MD5, but it wasn’t working correctly at the time).

Now this looks simple enough to actually work, but Varnish will strike you down with great vengeance. It’ll read your config and even follow the rules you’ve set, but when it goes off to the backend to fetch the response to your first request, it converts your lovely POST request into a standard GET. That’s problematic and very unhelpful, and just to make things worse, it also throws away the POST body, so even if your app could cope with a GET, there’s no longer any data to work from. Awesome.

That’s really as far as I got with Varnish. It was quite disappointing to get that far only to be shot down by a technicality at the final post, though I can’t really blame them for simply enforcing the RFC.

Attempt #2 - Nginx

You can ignore the Ruby-specific (Passenger) lines below; this is the Nginx config I used to get everything working in my test environment. My setup was very simple: a Sinatra app that returned a random number whenever a POST request was sent to it. When building something like this, keep it as simple as possible, otherwise you’ll just end up fighting against yourself.

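worker_processes 1;
daemon off;

events {
    worker_connections 1024;
}

http {
    include mime.types;
    default_type application/octet-stream;
    sendfile on;
    keepalive_timeout 65;
    client_max_body_size 50k;

    # Passenger/Ruby specifics, not needed for the caching itself
    passenger_root /Users/lloyd/.rvm/gems/ruby-1.9.3-p385@nginx-test/gems/passenger-3.0.19;
    passenger_ruby /Users/lloyd/.rvm/wrappers/ruby-1.9.3-p385@nginx-test/ruby;

    # On-disk cache with a 1MB key zone called "small";
    # entries not accessed for 1 minute are removed
    proxy_cache_path /Users/lloyd/Code/nginx-test/cache levels=1:2 keys_zone=small:1m inactive=1m;

    # Front-end server: all requests are sent here
    server {
        listen 8080;
        server_name localhost;

        # serve a matching file from disk if one exists, otherwise proxy
        location / {
            try_files $uri @passenger_backend;
        }

        location @passenger_backend {
            proxy_pass http://127.0.0.1:8081;
            proxy_cache small;
            # GET and HEAD are always cached as well
            proxy_cache_methods POST;
            # include the request body in the key
            proxy_cache_key "$request_uri|$request_body";
            proxy_buffers 8 32k;
            proxy_buffer_size 64k;
            # with no status codes listed, only 200, 301 and 302 are cached
            proxy_cache_valid 5s;
            # serve the stale copy while a fresh one is being fetched
            proxy_cache_use_stale updating;
            # report HIT, MISS, etc. back to the client
            add_header X-Cached $upstream_cache_status;
        }
    }

    # Back-end server: the Sinatra app, run by Passenger
    server {
        listen 8081;
        server_name localhost;
        root /Users/lloyd/Code/nginx-test/public;
        passenger_enabled on;
        rack_env development;
    }
}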

The way this works is that Nginx listens for requests on 8080 and our backend runs on 8081. We direct all our requests at 8080, and unless the URI matches a file on disk, try_files hands the request to the @passenger_backend location, where the proxy configuration kicks in.

Any request that isn’t cached, or that we don’t have a cached response for yet, needs to be sent on to the main app on 8081; that’s what proxy_pass does. We explicitly tell Nginx to cache POST requests with proxy_cache_methods, but GET and HEAD are always added to that list, and the directive can’t exclude them.

You will need additional logic to stop those requests being cached, but for this blog post we’ll be ignoring that side of the problem: this config assumes every request is a POST, and everything is cached the same way. We specify a cache lifetime of 5 seconds, but because proxy_cache_valid lists no status codes, only 200, 301 and 302 responses are cached, so if you send back a 404, a 50X and so on, it won’t be cached at all.
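One way to add that logic, sketched here as an assumption rather than something from my test setup, is to map the request method to a flag and pass it to proxy_no_cache and proxy_cache_bypass, both standard directives of Nginx’s proxy module. The variable name $skip_cache is made up for illustration; the map block must live in the http context, and the two directives go inside the existing @passenger_backend location.

```nginx
# In the http block: turn the request method into a hypothetical $skip_cache flag
map $request_method $skip_cache {
    default 1;   # GET, HEAD and everything else: stay out of the cache
    POST    0;   # only POST responses are looked up and stored
}

# Added inside the existing location @passenger_backend block
location @passenger_backend {
    # ...existing proxy_pass / proxy_cache directives stay as they are...
    proxy_cache_bypass $skip_cache;   # non-empty and not "0": don't answer from the cache
    proxy_no_cache     $skip_cache;   # non-empty and not "0": don't store the response
}
```

Both directives are needed: proxy_cache_bypass on its own only skips the lookup, so a GET response could still be written to the cache and later served to a POST that produces the same key.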

proxy_cache_key "$request_uri|$request_body";

To actually cache POST requests we need to adjust the cache key, because the default ($scheme$proxy_host$request_uri) doesn’t include anything from the request body. So we add the body here; Nginx takes an MD5 hash of the resulting string and uses it to look up future requests. Bear in mind that if you have a large request body, you’ll be hashing it and looking it up on every request. The body also has to fit in memory: $request_body is only populated when Nginx has read the body into its in-memory buffer (sized by client_body_buffer_size). A body that spills over into a temporary file leaves $request_body empty, so different requests to the same URI can end up sharing one cache entry, which is definitely not what you want. Play around with that buffer size so it fits your request bodies, but don’t go overboard; keep it realistic, as it directly affects memory usage.
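A minimal sketch, assuming request bodies never need to exceed the 50k limit already set with client_max_body_size, is to make the in-memory body buffer the same size, so every body Nginx accepts stays in memory and ends up in $request_body. The 50k figure simply mirrors the existing limit; it isn’t a tuned value.

```nginx
http {
    # Anything larger than 50k is rejected outright with a 413
    client_max_body_size    50k;
    # Keep every accepted body in memory so $request_body is always populated;
    # a body written to a temp file would leave $request_body empty in the cache key
    client_body_buffer_size 50k;
}
```

The trade-off is memory: each in-flight request can hold up to 50k of body, which is why the limit should track the largest realistic SOAP request rather than an arbitrarily large number.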

add_header X-Cached $upstream_cache_status;

I’ve also added a header called X-Cached, which contains the value of $upstream_cache_status: usually HIT or MISS, although you’ll also see values such as EXPIRED or UPDATING. It tells you how each response was handled, either served from the cache or fetched from the backend, and it can save hours of debugging, especially in production, where you may need to know what happened.
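For the production case, the same variable can also go into the access log, so there’s a record of hits and misses after the fact rather than only in live responses. This is a sketch; the format name cache_status and the log path are placeholders.

```nginx
# In the http block: a hypothetical format that records the cache outcome per request
log_format cache_status '$remote_addr [$time_local] "$request" '
                        '$status $upstream_cache_status';

# In the server block listening on 8080 (placeholder log path)
access_log /var/log/nginx/cache.log cache_status;
```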