Sushi — a Machine Learning Based Cache Algorithm

By Zhenyu Song, Fei Gao, Qizhe Cai

Princeton Systems Course
Feb 9, 2019

--

0. Motivation

Caching is a general technique in computer system design. Memory caching systems and content delivery networks (CDNs) are two popular network caching applications today. A memory caching system is usually deployed in a datacenter; it offloads traffic from the backend database servers and reduces application server request latency. A content delivery network leverages edge servers that reside near end users to provide low-latency internet access. An analysis of Facebook's photo caching shows the caching layer serves 90.1% of total traffic.

However, previous cache algorithms are designed based on worst-case guarantees, simple statistical generative models, or heuristics. None of them mines the request pattern and leverages it. It is plausible that an algorithm A works well enough in most cases, but it is unclear whether it is the best (or approximately the best) on a given trace B. Here we argue that machine learning helps: by using machine learning, we can construct a cache model that fits the actual access pattern. The intuition is that we can explore a larger algorithm space (model space) in order to get higher performance. The tradeoff is the metadata storage and computation overhead versus the accuracy improvement.

We formulate the cache eviction process as a reinforcement learning problem. The cache state represents all objects in the cache and their request timestamps. When an eviction happens, the algorithm must select one item based on the cache state and evict it, so as to minimize the number of misses (the miss rate). Note that the cache state is influenced by the algorithm's decisions, which makes this more than a simple supervised learning problem.

To simplify the problem, we assume that there exists some way to map each object to a score that determines its eviction rank, without knowing the request timestamps of other objects. In fact, we already know that the optimal offline algorithm, Belady, only needs the next access time, which is independent across objects in settings like a CDN cache. This means we can build a deep network that takes the timestamps of one object as input and predicts that object's score. Classic algorithms such as LRU, FIFO, and LFU are in fact models implemented this way.

To be more specific, let's look at the network implementations. Here p_i represents the i-th past interval, i.e., the time from the i-th most recent request to the current timestamp. A higher score means a higher probability of being evicted. For FIFO, the score depends only on the oldest past interval, with a positive coefficient. For LRU, the score depends only on the most recent interval, again with a positive coefficient (a long time since the last access means evict). For LFU, a hidden layer first computes sub-scores indicating whether each p_i is within some threshold T, and the final score depends on how many p_i fall within the threshold, with a negative coefficient. Therefore, many heuristic cache algorithms are equivalent to some network structure. However, since these algorithms never fit their structure to the request trace, it is quite possible they are not optimal. In short, our key insight is to use trace data to explore the space of possible network structures in order to get a better cache hit rate.
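
For concreteness, here is a minimal sketch (not the authors' code; the LFU threshold value is illustrative) of these three policies written as score functions over one object's past intervals:

```python
# Three classic policies as score functions over one object's past
# intervals. p[0] is the most recent interval, p[-1] the oldest tracked.
# A higher score means the object is more likely to be evicted.

def fifo_score(p):
    # FIFO: positive weight on the oldest tracked interval.
    return p[-1]

def lru_score(p):
    # LRU: positive weight on the most recent interval
    # (a long time since the last access -> evict).
    return p[0]

def lfu_score(p, threshold=100.0):
    # LFU: count how many tracked requests fall within a window T
    # (threshold is an illustrative value); the count gets a negative
    # weight (frequently requested -> keep).
    return -sum(1 for interval in p if interval <= threshold)

# Example: a hot object and a cold one, three tracked intervals each.
hot, cold = [5, 40, 90], [500, 900, 2000]
assert lru_score(cold) > lru_score(hot)  # LRU evicts the cold object
assert lfu_score(cold) > lfu_score(hot)  # so does LFU
```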

1. Design

First, we notice that we do not need to include the request timestamps of all cached items in the cache state (as in the general formulation above). If we had to read every cache line's request timestamps to choose a victim, the cost would be prohibitive. Instead, when we need to evict an item, we randomly sample several cache lines and select one of them to evict according to the policy. The number of cache lines we sample is called the "sample rate" in our design. The typical sample rate is 64, and in the evaluation section we examine how the sample rate affects performance.
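
A minimal sketch of this sampling step, assuming a `score` function that maps an item's past intervals to an eviction score (as in the policies described below):

```python
# Sampled eviction: score only a small random sample instead of
# scanning the whole cache.
import random

SAMPLE_RATE = 64  # number of cache lines sampled per eviction

def pick_victim(cache, score):
    # cache: dict mapping key -> list of past intervals for that key
    keys = random.sample(list(cache), min(SAMPLE_RATE, len(cache)))
    # evict the sampled key with the highest eviction score
    return max(keys, key=lambda k: score(cache[k]))
```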

Based on this general form, we implemented three policies, using linear regression, a ranking model, and reinforcement learning, respectively.

Linear Regression

If we knew the future accesses to the cache, then the optimal policy (Belady's policy) could achieve the highest possible hit rate. The algorithm is simple: evict the item whose next access is farthest in the future. In other words, if we know the future request timestamps of the cache lines, we can build the optimal replacement policy. So our goal reduces to predicting the future request timestamp.
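
For concreteness, a minimal sketch of Belady's choice over a sampled candidate set, assuming the simulator can look ahead in the trace (names like `next_access` are ours):

```python
# Belady on the sampled candidates. next_access[k] holds the trace
# position of key k's next request; keys never requested again default
# to +inf and are evicted first.
import math

def belady_victim(sample, next_access):
    # evict the candidate whose next request is farthest in the future
    return max(sample, key=lambda k: next_access.get(k, math.inf))
```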

Here, we simply assume that the future request time interval (future request timestamp — current timestamp) can be predicted by the linear combination of the past intervals (current timestamp — past request timestamp). The number of the past intervals we store is set to 16 by default. Given an existing cache access trace, the first step is applying Belady policy to it and generate the training data. When we need to evict a cache line, Belay algorithm will look at the future access time of all the items in the sample set and choose the one with the largest future interval to evict. Then we collect the past intervals and the future interval for those items in the sample set to train our linear model. After a period, amount of time, we could get enough data to do the linear regression, get the 16 coefficients, and use it to predict the future access time.
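
A minimal sketch of this training step, assuming the logged features and labels described above (function names are ours):

```python
# Fit the 16 coefficients from (past intervals, future interval) pairs
# logged during the Belady simulation.
import numpy as np

def fit_linear_model(past_intervals, future_intervals):
    # past_intervals: (n_examples, 16); future_intervals: (n_examples,)
    X = np.asarray(past_intervals, dtype=float)
    y = np.asarray(future_intervals, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit
    return coef  # the 16 coefficients

def predicted_future_interval(coef, p):
    # At eviction time, evict the sampled item with the largest prediction.
    return float(np.dot(coef, p))
```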

Rank Model

The linear model does not fully utilize the information in the sample set. In total we have 64 × 16 past intervals per sample set (64 sampled items, 16 past intervals each), yet we predict each item's future interval separately from its own 16 past intervals. The relationships between different items are discarded. The rank model addresses this problem.

In the rank model, we treat the sample set as a whole. We no longer predict a future interval for each item; instead, the model directly outputs which item should be evicted, given all the past intervals in the sample set. The model is a 3-layer MLP (multilayer perceptron). Every past interval in the sample set is an input to the MLP, and each output is the probability of evicting the corresponding cache line. Because the network is fully connected, we reduced the sample rate to 4 and the number of past intervals to 3 to keep the model small.
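
A sketch of the rank model's forward pass under these dimensions; the hidden width (32) is an illustrative choice, and the weights shown are random placeholders rather than trained values:

```python
# Rank model forward pass: 4 sampled items x 3 past intervals = 12
# input features; the output is a probability over the 4 candidates.
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = 0.1 * rng.normal(size=(12, 32)), np.zeros(32)
W2, b2 = 0.1 * rng.normal(size=(32, 32)), np.zeros(32)
W3, b3 = 0.1 * rng.normal(size=(32, 4)), np.zeros(4)

def rank_forward(sample_intervals):
    # sample_intervals: (4, 3) past intervals of the 4 sampled items
    x = np.asarray(sample_intervals, dtype=float).reshape(-1)
    h1 = np.maximum(0.0, x @ W1 + b1)    # hidden layer 1 (ReLU)
    h2 = np.maximum(0.0, h1 @ W2 + b2)   # hidden layer 2 (ReLU)
    logits = h2 @ W3 + b3
    e = np.exp(logits - logits.max())    # softmax over the 4 candidates
    probs = e / e.sum()
    return int(np.argmax(probs)), probs  # index of the item to evict
```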

Reinforcement Learning

A serious problem lies in the previous two models: both rely on training data generated by simulating the Belady policy. When we actually run our linear model or rank model in the cache, the items kept in the cache obviously differ from those the Belady policy would have kept, so it is not appropriate to use a model trained on Belady's decisions to make predictions on states produced by our own policy. A typical example: if the past intervals show that an item was frequently accessed in the past but has gone cold recently, it should be evicted. However, we can never learn this pattern from the Belady trace, because under Belady the item would have been evicted right after its last access, and the "cold" period never appears in the training data.

Since pre-generated training data cannot always be appropriate, we do not generate training data in advance; instead, we use reinforcement learning in the new model. The basic idea is to use the reward of an action to update the weights that generated the action. After each eviction, we evaluate the reward of that action based on the subsequent change in byte hit rate, and use the reward to compute a "label" for the eviction. The model is trained continuously throughout the simulation, and the training signal always pushes toward a higher byte hit rate.
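
The post above does not pin down the exact update rule; the following is a REINFORCE-style sketch under that reading, with a linear scorer for simplicity:

```python
# Policy-gradient eviction: sample an eviction from a softmax over
# per-item scores, then, once the reward (e.g. the change in byte hit
# rate) is observed, push the weights toward or away from that choice.
import numpy as np

rng = np.random.default_rng(0)

def choose_eviction(w, sample_features):
    # w: (16,) weights; sample_features: (n_candidates, 16) past intervals
    X = np.asarray(sample_features, dtype=float)
    logits = X @ w
    e = np.exp(logits - logits.max())
    probs = e / e.sum()
    action = rng.choice(len(probs), p=probs)  # stochastic eviction choice
    return action, probs, X

def reinforce_update(w, action, probs, X, reward, lr=1e-3):
    # Gradient of log softmax(action): x_action - E[x] under probs.
    grad_log_pi = X[action] - probs @ X
    return w + lr * reward * grad_log_pi
```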

2. Implementation

We implement our caching algorithms by modifying Redis, an in-memory key-value store. Redis can be thought of as a more complex version of memcached, where the operations are not just SETs and GETs but operations on rich data types such as lists, sets, and other ordered data structures. Redis also supports durability by taking periodic snapshots or by logging memory-modifying operations as they are processed. Importantly for us, Redis already evicts keys by random sampling and lets users configure the sample size.

We store the most recent access timestamps for each item, currently sixteen per item, and add support for evicting keys based on our linear regression policy. Our prototype achieves hit rates similar to our simulation of the linear regression policy. All these features require about 150 lines of C code. In the future, we will add the ranking and reinforcement learning policies to the store.
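
The patch itself is C inside Redis; the following Python sketch only illustrates the per-key bookkeeping it performs (names are ours, not Redis APIs):

```python
# Per-key timestamp history: each key keeps its 16 most recent access
# timestamps, which are turned into past intervals at eviction time.
from collections import deque

N_TIMESTAMPS = 16
PAD = 1e12  # sentinel for missing history, so rarely seen keys score old

def record_access(history, key, now):
    # history: dict key -> deque of recent access timestamps, newest first
    ts = history.setdefault(key, deque(maxlen=N_TIMESTAMPS))
    ts.appendleft(now)

def past_intervals(history, key, now):
    # p[0] = time since the most recent request, and so on.
    ts = history.get(key, deque())
    return [now - t for t in ts] + [PAD] * (N_TIMESTAMPS - len(ts))
```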

3. Evaluation

We evaluate our algorithms in both a simulator and our modified Redis. We use the Memcachier trace, which consists of tcpdump captures from their memcached servers. After cleaning, it contains traces from more than 400 applications. After sorting the traces by size in ascending order, we pick trace no. 50 for the experiments because of its convenient size: larger traces take too long to run end to end, and smaller traces give less reliable results. Since object size is not our focus (and an algorithm for variable object sizes can be readily derived from the unit-size case), we set every object's size to 1. In this case there is no difference between byte hit rate and object hit rate, so we simply call it hit rate.

Simulation Evaluation

For simulation, we compare our proposed algorithms against five others: LRU, GDSF, SLRU, Belady, and Random. GDSF uses a global aging variable to "inflate" out-of-date objects in the cache; SLRU uses multiple LRU segments to capture repeated-hit information; Belady is the optimal offline algorithm; Random evicts a random object from the cache.

The simulation evaluation shows the hit rate of each algorithm as the cache size changes. For all algorithms, the hit rate grows as cache capacity increases. Belady is always far above the others, while GDSF, SLRU, and Linear Regression beat LRU. Interestingly, as cache capacity increases, Linear Regression gradually overtakes SLRU and approaches GDSF. The Ranking and Reinforcement Learning policies perform similarly to Random, which may be caused by implementation bugs.

Experiment Evaluation

Throughput vs. Sample Size

In this experiment, we evaluate how the eviction sample size affects the throughput of the store. We run 1-second cycles with 50% writes and 50% reads, and measure throughput by counting the number of requests sent to the store. The horizontal axis is the sample size in number of objects, increasing from 4 to 64. The vertical axis is throughput, in requests received by the store per second. The graph shows that as the sample size increases, the throughput of the store gradually decreases.

Hit Rate vs. Cache Size

In this experiment, we compare the linear regression policy to LRU. The horizontal axis is the cache size in number of objects. The vertical axis is object hit rate, measured as the percentage of successful GET requests. The graph shows that regardless of the maximum object capacity, the store using the linear regression policy achieves a higher object hit rate than the one using LRU.
