A Reliable Approach to Unreliable Caching Environments

This article is part of the Academic Alibaba series and is taken from the WWW 2019 paper entitled “Counterintuitive Characteristics of Optimal Distributed LRU Caching Over Unreliable Channels” by Guocong Quan, Jian Tan, and Atilla Eryilmaz. The full paper can be read here.
When you first open an app, you’re pulling a bunch of data from a server — pictures, text, and all kinds of other objects that make the app work. If you had to repeat that process every time you accessed the app, you’d waste valuable seconds that you could have spent checking your email, battling virtual Vikings, or doing whatever else you do with apps.
Fortunately, we can use caching to avoid downloading data from a server over and over. With caching, some data from the server is stored locally, meaning you can access it faster the next time.
One of the most common caching policies is least-recently-used (LRU) caching, which keeps only the most recently used data in the cache. When new data arrives and the cache is full, the item that has gone unused the longest is evicted to make room. Ensuring that these caching functions work well is key to maintaining system performance.
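To make the eviction rule concrete, here is a minimal LRU cache sketch in Python, using an OrderedDict to track recency. The capacity and the keys are made up for illustration; this is not the paper's implementation.

```python
# A minimal LRU cache sketch. Capacity and keys are hypothetical.
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # keys ordered from least to most recently used

    def get(self, key):
        if key not in self.items:
            return None  # cache miss
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")         # "a" becomes most recently used
cache.put("c", 3)      # cache is full, so "b" (least recently used) is evicted
print(cache.get("b"))  # None -> cache miss
```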
The Catch in Caching
Caching doesn’t always work perfectly. Sometimes data is requested but isn’t found in the cache. This is called a cache miss. To minimize cache misses, researchers focus on optimizing two main areas: cache organization (that is, the way memory is allocated in the cache) and data placement (that is, dispatching requests for data to the right caches).
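As a concrete example of what gets measured, the sketch below replays a made-up request trace against the LRUCache class sketched above and reports the miss ratio. The trace and capacity are illustrative assumptions, not data from the paper.

```python
# Hypothetical sketch: count misses over a toy request trace, reusing the
# LRUCache class from the sketch above. Trace and capacity are made up.
requests = ["a", "b", "a", "c", "b", "a", "d", "a"]
cache = LRUCache(capacity=2)
misses = 0
for key in requests:
    if cache.get(key) is None:
        misses += 1                        # cache miss: fetch from the server
        cache.put(key, "data-for-" + key)  # then store the fetched data locally
print("miss ratio:", misses / len(requests))
```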
Plenty of valuable research has been done on cache optimization for data centers and wired Web servers. But less reliable environments, such as edge computing, wireless, and mobile networks, introduce new challenges: communication errors, mobility, fading, and more. In these settings, a cache miss can occur even when the requested data is actually in the cache, simply because the cache holding it is temporarily unreachable. For this reason, in unreliable environments each data item is generally stored in multiple caches.
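The toy simulation below illustrates why replication helps: if each cache holding a copy is reachable only with some probability, the chance that at least one copy is reachable grows with the number of copies. The failure probability and copy counts here are assumptions for illustration.

```python
# Toy sketch: each replica is reachable with probability (1 - p), independently.
# With d copies, the chance that at least one copy is reachable is 1 - p**d.
import random

def lookup_succeeds(copies, p, rng):
    # Succeeds if at least one of the replicas is reachable.
    return any(rng.random() > p for _ in range(copies))

rng = random.Random(0)
p = 0.3  # assumed per-cache failure probability
for d in (1, 2, 3):
    hits = sum(lookup_succeeds(d, p, rng) for _ in range(100_000))
    print(f"{d} copies -> reachable fraction ~ {hits / 100_000:.3f}")
```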
Considering these characteristics, should unreliable environments use the same caching techniques that work in reliable environments? In collaboration with Ohio State University, Columbus, the Alibaba tech team set out to answer this question.
Hashing Out the Answers
The team considered a single-hop, multi-cache distributed system that dispatches data items across multiple cache spaces. Dispatching relied on a technique called random hashing, with each data item assumed to be mapped to a set of caches rather than a single one, as would be the case in a typical unreliable environment. By analyzing this multi-cache system, the team developed a model that captures cache organization and data placement simultaneously. They also derived an effective method for calculating cache miss ratios.
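A simple way to picture this dispatching is hashing each item's key, with different salts, until d distinct cache indices are chosen. The sketch below is a generic illustration of hashing to a set of caches; the paper's exact scheme may differ.

```python
# Hypothetical sketch of random hashing to a set of d caches. Salted SHA-256
# hashes stand in for whatever hash family the real system would use.
import hashlib

def caches_for(key, num_caches, d):
    # Derive d distinct cache indices from salted hashes of the item's key.
    chosen = []
    salt = 0
    while len(chosen) < d:
        digest = hashlib.sha256(f"{key}:{salt}".encode()).hexdigest()
        idx = int(digest, 16) % num_caches
        if idx not in chosen:
            chosen.append(idx)
        salt += 1
    return chosen

print(caches_for("photo-123", num_caches=8, d=3))  # three distinct cache indices
```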

The team encountered surprising differences between what works best in reliable and unreliable environments. The most counterintuitive results came from looking at how to optimize cache allocation. One might expect that allocating space equally in each of the multiple caches would yield the best performance, as has been verified for reliable environments. But the team actually found that in unreliable environments, allocating space unequally was best for performance, even when the caches were otherwise identical.
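The claim can be explored with a rough simulation: fix the total cache space, replicate every item in two LRU caches whose channels fail independently, and compare an equal split against a skewed one. Everything below (the Zipf-like popularities, the failure probability p, the sizes, and the LRUCache class reused from the earlier sketch) is an illustrative assumption, not the paper's analytical model, so which split wins will depend on the parameters.

```python
# Rough harness to compare cache-space splits under unreliable channels.
# Reuses the LRUCache sketch from above; all parameters are made up.
import random

def simulate(sizes, p, n_items=200, n_reqs=50_000, seed=1):
    rng = random.Random(seed)
    weights = [1.0 / (i + 1) for i in range(n_items)]  # Zipf-like popularity
    caches = [LRUCache(capacity=s) for s in sizes]     # item replicated in each cache
    misses = 0
    for _ in range(n_reqs):
        item = rng.choices(range(n_items), weights=weights)[0]
        hit = False
        for cache in caches:
            if rng.random() > p:            # this replica's channel is up
                if cache.get(item) is not None:
                    hit = True
                cache.put(item, item)       # refresh/insert on the reachable replica
        if not hit:
            misses += 1
    return misses / n_reqs

p = 0.3  # assumed per-cache failure probability
print("equal   split (50, 50):", simulate([50, 50], p))
print("unequal split (70, 30):", simulate([70, 30], p))
```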

The team’s analysis also showed that in unreliable environments, data items of different popularities should be replicated to different degrees across nodes for better performance.
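As a hypothetical illustration of that idea, a system might assign more copies to hotter items, as in the sketch below. The rank thresholds are invented; the paper derives analytically how replication should actually vary with popularity.

```python
# Hypothetical popularity-aware replication rule; the thresholds are invented.
def copies_for(popularity_rank, num_caches):
    if popularity_rank < 10:       # hottest items: replicate everywhere
        return num_caches
    if popularity_rank < 100:      # warm items: a couple of copies
        return min(2, num_caches)
    return 1                       # cold items: a single copy

for rank in (0, 50, 500):
    print(rank, "->", copies_for(rank, num_caches=4), "copies")
```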
The Future of Caching
Alibaba’s insights into caching techniques can help engineers develop and implement more efficient caching solutions. What does this mean for you? Fewer cache misses mean fewer delays in fetching data, which means less waiting around for apps and more time for virtual Viking battles!
