Caching: File system

Leon Fayer
Web Performance for Developers
3 min readNov 21, 2017
Toronto, Canada © Leon Fayer

Perhaps the most effective caching mechanism is in-memory cache. Tools like memcached and redis give a great out-of-the-box capability to store and retrieve your content to/from memory and effectively manage TTL. But what can you do when you don’t have those tools available?

Never fear, you can implement your own on-disk cache. While not as effective as in-memory cache, it would still provide a significant relief to system under heavy load.

The basics

The premise is simple. Take any function that retrieves, generates and returns content from any source (such as database queries, API calls and third-party content providers) and add a conditional to it.

sub generateContent() {
if (has_cache($cache_file, $cache_ttl)) {
return _read_cache($cache_file);
} else {
return _write_cache($cache_file);
}
}

The conditional will check if the cache file exists and if it does, get content from cached file instead. If cached content doesn’t exist, generate content and store it in cache. Simple, right?

_read_cache and _write_cache functions are as simple as you imagine them to be. It’s a basic file manipulation: open, read/write, close.

###### FS cachingsub _read_cache {
my $cache_file = shift;
open (my $fh, '<', $cache_file)
or die "Can't open file doe reading $!";
read $fh, my $cache_content, -s $fh;
close $fh;
return $cache_content;
}
sub _write_cache {
my $cache_file = shift;
my $content = generateContent();
open(my $fh, '>', $cache_file)
or die "Could not open file '$cache_file' $!";
print $fh $content;
close $fh;
return $cache_content;
}

The only remaining missing piece is cache validation. Unfortunately, because you’re building a cache system from scratch, unlike in-memory tools that manage your TTL for you, you need to implement your own TTL management. You can simply check the last updated timestamp of the cache file against your desired cached time. Done!

sub has_cache {
my $cache_file = shift;
my $cache_ttl = shift; #in seconds
if (-f $cache_file) {
# in seconds
my $last_updated = (stat($cache_file))[9];
my $now = int (gettimeofday);
my $diff = $now - $last_updated;
return 1 if ($now - $last_updated <= $cache_ttl);
}
return 0;
}

The caveat

There are only two hard things in Computer Science: cache invalidation and naming things — Phil Karlton

The advantage of this cache is that it’s persistent, even if you reboot the machine it’s on, the cache files will still be available. The drawback — it’s locally disk-bound. Which, among other things associated with disk bound processes, means you need to have a separate cache on each of the nodes. Which, in turn, means that you need to make sure you invalidate the cache across every node on-change. There are numerous ways of doing it, like setting up listeners on each node or implementing timed or on-demand cache purge instead of on-change triggers. Whichever way you decide to implement it, be mindful of the problem you’re facing. Otherwise you’ll be left with a problem worse than stale content — inconsistent content.

--

--

Leon Fayer
Web Performance for Developers

Technologist. Cynic. Of the opinion that nothing really works until it works for at least a million of users.