Caching CouchDB requests with Nginx

Caching frequently performed searches can make your app run much faster with very little change to your code

Glynn Bird
Nov 23, 2018

Apache CouchDB was born on the web. Its HTTP/HTTPS API is not a bolt-on afterthought — it is the way of interacting with the database built in from the ground up. Let’s take the use-case of CouchDB being used as a back-end database in a traditional client/server web app:

Web-app architecture schematic

Web users interact with a web page, sending HTTP requests to one of a number of application servers. The application, needing data to render the page, makes an HTTP request to CouchDB to fetch the data and then responds in kind to the client.

If the same request is being made to CouchDB over and over again in a short time frame, then the database simply answers each request. Under production loads and to avoid overworking the database, developers may choose to cache data in their app rather than make a round-trip to the database. This is suitable for:

  • data that doesn’t change very often e.g. a database of US zip codes
  • slices of data that are accessed frequently, where it doesn’t matter if the user sees a slightly stale version of the query result. This is very application-dependent, but let’s imagine your e-commerce site has a list of three special offers on the front page. As the front page is accessed frequently, it makes little sense to query the database for every page render.
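
As a point of comparison, in-app caching of this kind can be as small as a map with expiry times. The sketch below is only an illustration: the TtlCache class and its injectable clock are hypothetical names, not part of any CouchDB library.

```javascript
// Minimal in-process cache with a time-to-live (TTL).
// Hypothetical helper for illustration only.
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;            // injectable clock, handy for testing
    this.entries = new Map();  // key -> { value, expires }
  }

  // Return the cached value for `key`, or call `loader` (for example a
  // database query) and remember its result for `ttlMs` milliseconds.
  async get(key, loader) {
    const hit = this.entries.get(key);
    if (hit && hit.expires > this.now()) {
      return hit.value;
    }
    const value = await loader();
    this.entries.set(key, { value, expires: this.now() + this.ttlMs });
    return value;
  }
}
```

A front-page handler could then call something like cache.get('special-offers', queryDatabase), where queryDatabase stands in for whatever function performs the real query. The drawback is that every application server keeps its own copy of the cached data.
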
Caching is like hoarding stuff that you need quick access to later. Photo by Denise Johnson on Unsplash

There are many ways to implement a cache. In this article I’ll show how an Nginx proxy can be set up to cache HTTP requests, taking some of the load off your CouchDB service and getting data to your app quickly.

What is Nginx?

Nginx is an open-source web server. At its simplest it can serve out a tree of static files over HTTP. It can also be configured as a “reverse proxy”: it sits between a client and a server and transparently routes traffic between them, caching some of the content so that a future repeat request can be served from the locally cached data.

Web-app architecture with Nginx as a reverse proxy.

In our application we’ll configure Nginx as a reverse proxy and place it between our application servers and CouchDB. Rather than connecting directly to CouchDB, the application will connect to Nginx, which will either return cached content or make the CouchDB request and return that.

Nginx can be installed in two places:

  • on the same machine as your application code (your app will connect to a port on “localhost”).
  • or, on a separate machine on your network, shared between multiple instances of your application server.

The former approach is simpler, but the latter allows multiple application servers to share the same cache pool.

Installing Nginx

Follow the installation instructions for your platform — on my Mac I used brew:

brew install nginx

Configuring Nginx

The configuration for Nginx belongs in a file called nginx.conf. We're going to leave the installed configuration as is and create a new one with the following content:

N.B. Change the hostname to the hostname of your CouchDB service in the proxy_pass line of the configuration file.
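
A minimal sketch of what such a configuration might look like is shown here. It is an illustration under assumptions, not the exact file used for the tests in this article: MYACCOUNT.cloudant.com, the couch_cache zone name, the /tmp/nginx_cache path, the sizes and the 10-minute validity are all placeholders.

```nginx
# Sketch only: hostname, paths, sizes and timings are placeholders.
worker_processes 1;

events {
  worker_connections 1024;
}

http {
  # Log the cache status (HIT, MISS, EXPIRED, ...) against each request.
  log_format cache '$remote_addr - $upstream_cache_status [$time_local] '
                   '$status "$request" $body_bytes_sent '
                   '"$http_referer" "$http_user_agent" "$http_x_forwarded_for"';
  access_log /usr/local/var/log/nginx/access.log cache;

  # On-disk cache: 10 MB of keys in shared memory, at most 1 GB of data,
  # entries evicted after 60 minutes without being requested.
  proxy_cache_path /tmp/nginx_cache levels=1:2 keys_zone=couch_cache:10m
                   max_size=1g inactive=60m use_temp_path=off;

  server {
    listen 8080;

    location / {
      # Change this to the hostname of your CouchDB service.
      proxy_pass https://MYACCOUNT.cloudant.com;
      proxy_ssl_server_name on;

      proxy_cache couch_cache;
      proxy_cache_methods GET HEAD POST;
      # Include credentials and the request body in the key so that different
      # users, or different POST queries, do not share a cache entry.
      # $request_body is only populated when the body fits in memory.
      proxy_cache_key "$request_method|$http_authorization|$request_uri|$request_body";
      proxy_cache_valid 200 10m;

      # Tell the client whether the response came from the cache.
      add_header X-Cache-Status $upstream_cache_status;
    }
  }
}
```

The proxy_cache_valid line sets the cache window for successful responses; lengthen or shorten it to suit how fresh your data needs to be.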

We can then run nginx with the command:

$ nginx -c $PWD/nginx.conf

and stop it with:

$ nginx -s stop

To monitor nginx's logs, simply tail the log file:

$ tail -f /usr/local/var/log/nginx/access.log

Testing the nginx proxy with curl

I like to set up an environment variable containing the URL of my CouchDB/Cloudant service to save typing. In this case, the URL needs to be of the form:

$ export COUCH_URL="http://USERNAME:PASSWORD@localhost:8080"

Notice that:

  • We are using http not https. Nginx is serving out HTTP only - it will use HTTPS to communicate with CouchDB from there but uses HTTP to service its clients.
  • We need to include our CouchDB username & password in the URL. Nginx will pass on the authentication headers we supply.
  • We use localhost on port 8080 as our hostname when we want to communicate with Cloudant via the proxy.

Now we can test the connection by visiting the top of the Cloudant API service:

$ curl $COUCH_URL/
{"couchdb":"Welcome","version":"2.1.1","vendor":{"name":"IBM Cloudant","version":"7410","variant":"paas"},"features":["geo","scheduler","iam"]}

If we repeat the request with the -i command-line switch, we can see whether the data is coming from CouchDB or from the cache by looking at the X-Cache-Status header:

$ curl -i $COUCH_URL
HTTP/1.1 200 OK
Server: nginx/1.15.6
Date: Fri, 16 Nov 2018 10:07:44 GMT
Content-Type: application/json
Content-Length: 144
Connection: keep-alive
Cache-Control: must-revalidate
X-Couch-Request-ID: 6204aa106f
X-Frame-Options: DENY
Strict-Transport-Security: max-age=31536000
X-Content-Type-Options: nosniff
X-Cloudant-Request-Class: unlimited
X-Cloudant-Backend: bm-cc-us-south-11
Via: 1.0 lb1.bm-cc-us-south-11 (Glum/1.66.0)
X-Cache-Status: HIT

{"couchdb":"Welcome","version":"2.1.1","vendor":{"name":"IBM Cloudant","version":"7410","variant":"paas"},"features":["geo","scheduler","iam"]}

In the nginx logs, you should see "HIT" or "MISS" against each entry:

127.0.0.1 - MISS  [16/Nov/2018:10:06:41 +0000]  200 "GET / HTTP/1.1" 144 "-" "curl/7.54.0" "-"
127.0.0.1 - HIT [16/Nov/2018:10:07:44 +0000] 200 "GET / HTTP/1.1" 144 "-" "curl/7.54.0" "-"

The first fetch was a “MISS”, the second a “HIT”.
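
To get a feel for how effective the cache is over a longer run, the log can be tallied with a few lines of Node.js. This is a rough sketch that assumes the log format shown above, with the cache status as the third whitespace-separated field; tallyCacheStatus and hitRatio are hypothetical helper names.

```javascript
// Count cache statuses in nginx access-log text. Assumes the cache status
// (HIT, MISS, ...) is the third whitespace-separated field on each line.
function tallyCacheStatus(logText) {
  const counts = {};
  for (const line of logText.split('\n')) {
    const fields = line.trim().split(/\s+/);
    if (fields.length < 3) continue; // skip blank or malformed lines
    const status = fields[2];
    counts[status] = (counts[status] || 0) + 1;
  }
  return counts;
}

// Fraction of requests served from the cache (0 when there are no requests).
function hitRatio(counts) {
  const total = Object.values(counts).reduce((a, b) => a + b, 0);
  return total === 0 ? 0 : (counts.HIT || 0) / total;
}
```

Feeding it the contents of the access log (for example via fs.readFileSync) gives a count per status and the share of requests that never reached CouchDB.
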

Try fetching some data and repeating the request to get the cached version. We can use the time command to get an idea of how much the cache is speeding things up, for example:

$ time curl -s "$COUCH_URL/cities/_all_docs?limit=500" > /dev/null
real 0m0.849s
user 0m0.007s
sys 0m0.007s
$ time curl -s "$COUCH_URL/cities/_all_docs?limit=500" > /dev/null
real 0m0.020s
user 0m0.007s
sys 0m0.006s

The first request took about 850 ms; the second (cached) request took 20 ms, roughly a 40-fold speed-up.
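
The same measurement can be made from Node.js. The sketch below assumes Node.js 18 or later, where fetch and performance are built in; timeRequest is a hypothetical helper, and the credentials in the usage comment are placeholders.

```javascript
// Time one HTTP GET in milliseconds and report the cache status header.
// `fetchImpl` defaults to the global fetch built into Node.js 18+; it is
// injectable so the helper can be exercised without a network.
async function timeRequest(url, headers = {}, fetchImpl = fetch) {
  const start = performance.now();
  const response = await fetchImpl(url, { headers });
  await response.arrayBuffer(); // read the whole body, as curl does
  return {
    status: response.status,
    cache: response.headers.get('x-cache-status'),
    ms: performance.now() - start,
  };
}

// Usage sketch. Node's fetch rejects URLs with embedded credentials, so
// they are sent in an Authorization header instead:
//
//   const auth = 'Basic ' + Buffer.from('USERNAME:PASSWORD').toString('base64');
//   const url = 'http://localhost:8080/cities/_all_docs?limit=500';
//   timeRequest(url, { authorization: auth }).then(console.log);
```

Running the usage snippet twice in quick succession should show the cache status change from MISS to HIT, with a much smaller ms value on the second run.
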

Putting the cache to work in your app

Using the Nginx-powered cache in your own app is as simple as feeding a different URL to the Node.js library:

sample code that directs reads to Nginx and writes to the database directly.

The above code makes two objects: one to handle read-only requests via the Nginx proxy, the other for writes that connects directly to the database. The root path of this app performs a query via the proxy, outputting the result.
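
An app along those lines might be sketched as follows. To stay dependency-free, this version swaps in Node's built-in http module and fetch (Node.js 18+) in place of a CouchDB client library, so it is an approximation of the idea rather than the article's own code. The names couchClient and createApp, the COUCH_DIRECT_URL variable and port 3000 are assumptions made for the example; the cities database comes from the curl test earlier.

```javascript
const http = require('node:http');

// Build a tiny CouchDB client around a base URL such as
// "http://USERNAME:PASSWORD@localhost:8080". Credentials are moved into an
// Authorization header because Node's fetch rejects URLs that embed them.
function couchClient(baseUrl, fetchImpl = fetch) {
  const url = new URL(baseUrl);
  const headers = { 'content-type': 'application/json' };
  if (url.username) {
    const userPass = `${decodeURIComponent(url.username)}:${decodeURIComponent(url.password)}`;
    headers.authorization = 'Basic ' + Buffer.from(userPass).toString('base64');
  }
  const origin = url.origin; // scheme, host and port, without credentials
  return {
    async request(method, path, body) {
      const response = await fetchImpl(origin + path, {
        method,
        headers,
        body: body === undefined ? undefined : JSON.stringify(body),
      });
      return response.json();
    },
  };
}

// Two clients: reads go through the Nginx proxy, writes go straight to CouchDB.
function createApp(proxyUrl, directUrl, fetchImpl = fetch) {
  const reader = couchClient(proxyUrl, fetchImpl);
  const writer = couchClient(directUrl, fetchImpl);
  const server = http.createServer(async (req, res) => {
    if (req.url !== '/') {
      res.writeHead(404);
      res.end();
      return;
    }
    try {
      // Root path: run a cacheable query via the proxy and output the result.
      const result = await reader.request('GET', '/cities/_all_docs?limit=5');
      res.writeHead(200, { 'content-type': 'application/json' });
      res.end(JSON.stringify(result));
    } catch (err) {
      res.writeHead(502, { 'content-type': 'application/json' });
      res.end(JSON.stringify({ error: err.message }));
    }
  });
  return { reader, writer, server };
}

// Usage sketch (COUCH_DIRECT_URL is a hypothetical second variable holding
// the direct https:// URL of the database):
//
//   const app = createApp(process.env.COUCH_URL, process.env.COUCH_DIRECT_URL);
//   app.server.listen(3000);
//   app.writer.request('POST', '/cities', { name: 'Example' });
```

Keeping the reader and writer as separate objects makes it hard to send a write through the cache by accident.
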

Running this app has the same performance profile as the curl tests: cached data is retrieved much faster than running a query on a database cluster on the other side of the world.

When to use caching

Employing caching is a trade-off between the speed of returning results and the freshness of the data returned. If you know your data isn’t changing frequently, then a generous cache window (say an hour or a day) may be used. If it’s important that fresh data is surfaced to your users quickly, then a shorter window (say 5 or 10 minutes) may be better.

Caching works well when handling “peaky” traffic: let’s say a particular page on your site becomes popular because of the success of a marketing campaign. It’s better in this case to cache the pertinent content and deliver the results quickly, rather than wasting your database resources producing the same results over and over again.

Caching can help take the load off your expensive primary data store by bringing cheaper and faster resources to bear instead. Oh, and cached data is returned faster.

The nginx configuration caches all GET & HEAD requests by default. I added POST to the proxy_cache_methods configuration to catch query API calls, which use the POST /db/_find method. This may have unintended consequences if you route writes through this proxy, e.g. POST /db/_bulk_docs or POST /db. I recommend sending only read requests through the proxy; any API calls that modify data should be sent directly to CouchDB.
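
One way to enforce that rule in application code is a small guard that decides, per request, whether the proxy may be used. This is a sketch: isCacheable is a hypothetical helper, and apart from _find the list of read-only POST endpoints is an assumption on my part that you should adjust to the calls your app actually makes.

```javascript
// POST endpoints that only read data. `_find` is the case discussed in the
// article; `_all_docs` and view queries are included here as assumptions.
const READ_ONLY_POST = [
  /^\/[^/]+\/_find$/,
  /^\/[^/]+\/_all_docs$/,
  /^\/[^/]+\/_design\/[^/]+\/_view\/[^/]+$/,
];

// True when a request is safe to send through the caching proxy.
function isCacheable(method, path) {
  const verb = method.toUpperCase();
  const pathname = path.split('?')[0]; // ignore the query string
  if (verb === 'GET' || verb === 'HEAD') return true;
  if (verb === 'POST') return READ_ONLY_POST.some((re) => re.test(pathname));
  return false; // PUT, DELETE and any other POST modify data
}
```

Requests for which the guard returns false, such as POST /db/_bulk_docs or POST /db, would then be sent to the direct database URL.
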

reads go through Nginx, writes go directly to the database
