Linux Page Cache Perils

Vinoth Chandar
Sep 16, 2018

[Reposted from Blogger]

I am sharing my experience benchmarking the Voldemort server, effectively a Java key-value storage server application on Linux (referred to as the ‘server’ from here on). The application handles put(k,v) and get(k), like a standard HashMap; the only difference is that these are calls over the network, and the entries need to be persistent. What this post talks about is generic and could apply to most Java server applications.

The goal here was to run a workload against the server in such a way that it comes off disk, so we exercise the worst-case path. As easy as that may sound, certain things get in our way:

  • OS Page Cache — Linux caches files the server writes/reads (unless you do direct I/O and build your own cache on top of it)

Let us use some notation for the different knobs/parameters that play together here.

RAMSIZE: actual physical memory on the machine
JVMMAX: maximum JVM heap size, configured using -Xmx
JVMSTART: initial JVM heap size, configured using -Xms
JVMUSED: actual memory that the JVM uses

Let’s explore our options.

Generate lots of data

This is a practical option, since if you generate several times more data than RAMSIZE, a large portion of your workload will come off disk. The problem is that on modern servers (+ slow SAS disks) with close to 100GB of RAM, this process takes a long time, i.e. you need to generate 1TB of data to make sure that only 10% of the workload comes out of the cache on average.
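The sizing argument above can be sketched as a quick calculation. This is a rough model only: it assumes keys are accessed uniformly at random and that the page cache can hold up to RAMSIZE worth of data (in reality the JVM takes its share too). The function name is made up for illustration.

```python
def dataset_size_for_hit_ratio(ram_bytes: int, cache_hit_ratio: float) -> float:
    """Dataset size at which roughly `cache_hit_ratio` of reads hit the cache.

    Assumption: uniform random access, and a page cache that can hold at
    most `ram_bytes` of the dataset.
    """
    if not 0 < cache_hit_ratio <= 1:
        raise ValueError("cache_hit_ratio must be in (0, 1]")
    return ram_bytes / cache_hit_ratio


GB = 10**9
# 100GB of RAM, and we want only 10% of reads served from the cache
size = dataset_size_for_hit_ratio(100 * GB, 0.10)
print(f"{size / GB:.0f} GB")  # about 1000 GB, i.e. 1TB
```

Pushing the cache share down further gets expensive quickly: 5% would need about 2TB on the same machine.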

Use a memory hogger (see mlock()) to clamp down RAMSIZE - JVMMAX
The idea is to clamp down a portion of the RAM, virtually shrinking it, so that Linux feels the shortage of memory and does not populate the page cache as much.
But the JVM does not consume JVMMAX bytes right away, hence JVMMAX - JVMUSED remains available for Linux to use as it pleases.
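For reference, a memory hogger can be sketched in a few lines. This is a minimal illustration, not the tool used in the original experiment: it calls mlock(2) through ctypes on Linux, and the name `hog_memory` is hypothetical. Locking large amounts of memory normally requires root or a raised RLIMIT_MEMLOCK.

```python
import ctypes
import mmap
import os


def hog_memory(num_bytes: int) -> mmap.mmap:
    """Map `num_bytes` of anonymous memory and pin it in RAM with mlock(2).

    The memory stays locked only while the returned mapping is kept alive,
    so the caller must hold on to it for the whole experiment.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of this process, incl. libc
    libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
    libc.mlock.restype = ctypes.c_int

    region = mmap.mmap(-1, num_bytes,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS)
    buf = ctypes.c_char.from_buffer(region)  # only used to get the address
    rc = libc.mlock(ctypes.addressof(buf), num_bytes)
    err = ctypes.get_errno()
    del buf  # release the buffer export so the mapping can be closed
    if rc != 0:
        region.close()
        raise OSError(err, os.strerror(err))
    return region


# Usage idea (values are placeholders): pin RAMSIZE - JVMMAX and then wait.
# ram = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
# held = hog_memory(ram - jvm_max_bytes)
# input("memory pinned, press Enter to release")
```

Even with the hogger in place, whatever the JVM has not yet touched out of its JVMMAX budget is still free memory as far as Linux is concerned.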

The next idea is to force the JVM heap to be as large as the RAM, leaving little space for the page cache.
But, as we saw before, the JVM does not actually allocate that much, even if you explicitly set the start size.

Another idea is to measure, with a dry run, the actual amount of memory the server uses for the workload, and then clamp down the rest using a hogger.
But then the JVM thinks that it’s crunched for memory (let’s not go deep into GC tuning here) and starts GCing a lot, potentially affecting the experiment.
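One way to do the dry-run measurement on Linux is to read the resident set size (the VmRSS line) of the server process from /proc/<pid>/status and size the hogger as RAMSIZE - JVMUSED. The sketch below assumes VmRSS is a good enough proxy for JVMUSED; the function names are hypothetical.

```python
import re


def parse_vm_rss_bytes(status_text: str) -> int:
    """Extract VmRSS (reported in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    if match is None:
        raise ValueError("no VmRSS line found")
    return int(match.group(1)) * 1024


def read_jvm_used_bytes(pid: int) -> int:
    """Read the current resident set size of process `pid` (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_rss_bytes(f.read())


def hogger_size_bytes(ram_bytes: int, jvm_used_bytes: int) -> int:
    """RAMSIZE - JVMUSED: how much the hogger should pin, never negative."""
    return max(ram_bytes - jvm_used_bytes, 0)
```

In practice you would sample this at the peak of the dry run and leave some margin for the kernel and other processes rather than pinning every remaining byte.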

Tune GC so you won’t GC
The idea (a bit far-fetched) is to tune the server GC settings so it won’t GC in the case above.
But it’s quite hard to do in practice, and what if you want to run different workloads and your GC settings are workload-specific?

Disable page cache altogether
The idea is to periodically do a sync; echo 1 > /proc/sys/vm/drop_caches which should drop all of the page cache (writing 1 frees the page cache, and the sync first writes back dirty pages, which would otherwise not be dropped).
But this takes up some CPU, and if the server is CPU-bound, you might be altering the experiment.
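A minimal sketch of the periodic drop in Python, assuming the process runs as root on Linux. It writes 1 to drop_caches, the value the kernel documentation gives for freeing the page cache (2 frees dentries and inodes, 3 frees both). The function names and the interval are illustrative, and the path is a parameter only so the code can be tried against an ordinary file.

```python
import os
import time

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(path: str = DROP_CACHES) -> None:
    """Write back dirty pages, then ask the kernel to drop the clean page cache."""
    os.sync()  # drop_caches only frees clean pages, so flush dirty ones first
    with open(path, "w") as f:
        f.write("1\n")  # 1 = page cache only


def drop_periodically(interval_seconds: float, rounds: int,
                      path: str = DROP_CACHES) -> None:
    """Drop the page cache `rounds` times, sleeping between drops."""
    for _ in range(rounds):
        drop_page_cache(path)
        time.sleep(interval_seconds)
```

A shorter interval keeps the cache emptier but costs more CPU and sync traffic, which is exactly the trade-off described above.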

So, what’s the verdict, you ask? :) Attempt one of the above if, in your particular scenario, the “buts” somehow don’t apply, or simply generate lots of data.

