Linux Page Cache Perils

[Reposted from Blogger]

I am sharing my experience benchmarking the Voldemort server, which is effectively a Java key-value storage server running on Linux (referred to as the ‘server’ from here on). The application handles put(k,v) and get(k), like a standard HashMap; the only differences are that these calls go over the network and the entries need to be persistent. What this post covers is generic and could apply to most Java server applications.

The goal was to run a workload against the server so that requests are served from disk, exercising the worst-case path. As easy as that may sound, certain things get in the way:

  • OS page cache: Linux caches the files the server writes and reads (unless you do direct I/O and build your own cache on top of it).
  • JVM memory management: the JVM heap shares the same physical memory as the page cache, and you don’t have very fine control over how much memory the JVM will take.
  • This interplay is tricky, as we will see below, when it comes to actually controlling the page cache size (I know you are not supposed to) so that our test parameters are met.
  • In my case, we have a BDB JE cache on top of the JVM, which makes the page cache pointless, since we end up caching the same data twice. But that is a separate issue; let’s leave it out for now.

Let’s use some notation for the different knobs and parameters in play here.

RAMSIZE: actual physical memory on the machine
JVMMAX: maximum size of the JVM heap, configured using -Xmx
JVMSTART: start size of the JVM heap, configured using -Xms
JVMUSED: actual memory that the JVM uses

Let’s explore our options.

Generate lots of data

This is a practical option: if you generate several times more data than RAMSIZE, a large portion of your workload will come off disk. The problem is that on modern servers with close to 100GB of RAM (plus slow SAS disks), this process takes a long time. For example, you need to generate about 1TB of data to make sure that, on average, only 10% of the workload is served from cache.
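To put rough numbers on this, here is a minimal Python sketch of the arithmetic. It assumes keys are accessed uniformly at random and that all of RAMSIZE ends up acting as cache, both of which are simplifications; the function names are made up for illustration.

```python
def cache_hit_fraction(ram_gb, data_gb):
    """Approximate fraction of requests served from cache.

    Assumes uniform random access and that all RAM is used as cache.
    """
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    return min(1.0, ram_gb / data_gb)


def data_needed_gb(ram_gb, max_hit_fraction):
    """Data set size (GB) needed to keep the cache hit fraction at or below the target."""
    if not 0 < max_hit_fraction <= 1:
        raise ValueError("max_hit_fraction must be in (0, 1]")
    return ram_gb / max_hit_fraction


if __name__ == "__main__":
    # The example from the text: 100GB of RAM, at most 10% of requests from cache.
    print(data_needed_gb(100, 0.10))
    print(cache_hit_fraction(100, 1000))
```

A skewed key distribution (hot keys) would make the real hit rate higher than this estimate, so treat the result as a lower bound on how much data you need.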

Use a memory hogger (see mlock()) to clamp down RAMSIZE - JVMMAX
The idea is to lock down a portion of the RAM, virtually shrinking it, so that Linux feels the memory shortage and does not populate the page cache as much.
But the JVM does not consume JVMMAX bytes right away, so JVMMAX - JVMUSED remains available for Linux to use as it pleases.
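A hogger of this kind could look roughly like the Python sketch below. It is not the tool I used, just an illustration: hog_size and hog_memory are hypothetical names, the code assumes Linux, and mlock() will fail unless the process has a high enough RLIMIT_MEMLOCK (or runs as root).

```python
import ctypes
import mmap
import os

GIB = 1024 ** 3


def hog_size(ramsize, jvmmax):
    """Bytes to pin so that roughly jvmmax bytes of RAM stay available."""
    return max(0, ramsize - jvmmax)


def hog_memory(num_bytes):
    """Pin num_bytes of anonymous memory with mlock().

    The memory stays pinned only while the returned mapping (and this
    process) stays alive.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # assumes a Linux libc
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    # Private anonymous mapping: page-aligned and not backed by any file.
    buf = mmap.mmap(-1, num_bytes,
                    flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    if libc.mlock(addr, num_bytes) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf
```

You would call something like pinned = hog_memory(hog_size(100 * GIB, 32 * GIB)) and then keep that process sleeping for the duration of the benchmark.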

Set JVMSTART=RAMSIZE
The idea here is to force the JVM heap to be as large as the RAM, leaving little space for the page cache.
But, as we saw above, the JVM does not actually take up this much memory, even if you explicitly set the start size.
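One way to check this on Linux is to compare the JVM process’s resident set size against the configured -Xms. The sketch below reads VmRSS from /proc/<pid>/status; the helper names are illustrative, and RSS covers more than just the heap, so it is only an approximation of JVMUSED.

```python
def parse_vmrss_kb(status_text):
    """Return the VmRSS value (in kB) from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:    524288 kB".
            return int(line.split()[1])
    return None


def resident_kb(pid):
    """Resident set size of a running process in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())
```

Calling resident_kb with the server’s pid shortly after startup, and again under load, shows how far actual usage is from the start size you asked for.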

Hog up RAMSIZE - JVMUSED
The idea is to measure the actual amount of memory the server uses for the workload with a dry run, and then clamp down the rest using a hogger.
But the JVM thinks it is crunched for memory (let’s not go deep into GC tuning here) and starts GCing a lot, potentially affecting the experiment.

Tune GC so you won’t GC
The idea (a bit far-fetched) is to tune the server’s GC settings so that it won’t GC in the case above.
But this is quite hard to do in practice, and it breaks down if you want to run different workloads and your GC settings are workload-specific.

Disable page cache altogether
The idea is to periodically run sync; echo 1 > /proc/sys/vm/drop_caches, which should drop all of the clean page cache (writing 1 frees the page cache; writing 3 also frees dentries and inodes).
But this takes up some CPU, and if the server is CPU-bound, you might be altering the experiment.
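For reference, the periodic drop could be scripted along these lines. This is a sketch with made-up function names; it has to run as root, and it writes 1, which asks the kernel to free the page cache only.

```python
import os
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path=DROP_CACHES):
    """Flush dirty pages to disk, then ask the kernel to drop the clean page cache."""
    os.sync()  # dirty pages cannot be dropped, so write them out first
    with open(path, "w") as f:
        f.write("1\n")


def drop_periodically(interval_s, rounds, path=DROP_CACHES):
    """Drop the page cache `rounds` times, sleeping interval_s seconds after each drop."""
    for _ in range(rounds):
        drop_page_cache(path)
        time.sleep(interval_s)
```

The path parameter is there only so the function can be pointed at a scratch file when trying it out without root. The shorter the interval, the less the cache can refill between drops, but the more CPU and I/O the dropping itself costs.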

So, what’s the verdict, you ask? :) Attempt one of the above if the “buts” somehow don’t apply in your particular scenario, or simply generate lots of data.