We at Vimeo offer tools and technology to host, distribute, and monetize videos. And with so many videos being created every day, our storage requirements are always on the increase. Did you ever wonder how we tackle this problem? Like you (probably), we look to the cloud; that’s where we store our video source files. Additionally, we include the location of the source file in a database with the metadata of every video on Vimeo.
Which means that serving a video is a two-step process. The first step is to query the metadata database so that we know where to look for the video file. The second step is to access that file in cloud storage.
The problem is, as the number of requests goes up, we need to query the metadata database more and more. You’d hardly notice it on a smaller scale, but as the load increases, performance goes down. Furthermore, to handle the load, we’d eventually need to increase the number of database nodes, which is an expensive proposition.
One way to overcome this dilemma is to cache the metadata, because it rarely changes. Caching not only lowers our latency, but it also increases our capacity, because we can serve more requests in a shorter amount of time. To implement our latency and capacity caching needs, we chose groupcache.
Groupcache is a caching and cache-filling library that is available for Go. Written by Brad Fitzpatrick, the author of memcached, groupcache is an alternative to memcache that outperforms memcached in a number of scenarios. Like memcached, groupcache is a distributed cache, and it shards by keys to determine which peer is responsible for the key. However, unlike memcached, groupcache has a cache-filling mechanism during cache misses. The cache filling is coordinated with peers so that only the peer responsible for the key populates the cache and multiplexes the data to all the callers. The singleflight package helps with the cache filling by coalescing multiple requests for the same key, so only one call is made to the database, mitigating the thundering herd problem.
Another important feature of groupcache is that it doesn’t need to run on a separate set of servers. It acts as a client library as well as a server and connects to its own peers. This reduces the number of configurations required to deploy an application, which makes deployments easier. However, this presents a new challenge when applications are running in a dynamically provisioned environment: discovering and monitoring groupcache peers.
Peer discovery and monitoring
The groupcache instances need to communicate with each other to request data from peers who are responsible for the specified key. We use Kubernetes for container orchestration, and Kubernetes assigns an IP address to every Pod (the smallest deployable Kubernetes object). We can’t pass the addresses of all the Pods to the application, because IP addresses are dynamically assigned every time that we deploy. Furthermore, we wouldn’t be able to inform the instances of the changes in the peers (namely the peers that were added or removed) if we passed the IP addresses to the application at startup. How can the groupcache instances discover other running instances and monitor for changes in peers?
We leveraged Kubernetes features to track groupcache peers to overcome this obstacle. Each Kubernetes Pod is an instance of the application, and since groupcache runs within the application, each Pod also serves as a groupcache peer. We deploy the Pods with application-specific labels and namespaces, which enables us to track changes (creates, updates, and deletes) with the watch API. By tracking the Pods, we can add the newly created or updated Pods to our groupcache peers and remove the deleted Pods. Each instance scans for changes in the Pods and gets the IP addresses of these Pods. After acquiring the IP addresses, groupcache instances can request data from peers and access the distributed cache.
With groupcache running without separate servers and Kubernetes tracking Pod changes, we have one more benefit: we can horizontally scale the application. Groupcache uses consistent hashing, so we can easily add more peers without significantly impacting performance and use the cache that already exists. Additionally, the allocated memory for the groupcache cache is divided and distributed across all instances. As we get more requests and need to handle the increased load, we can deploy additional Pods without incrementing the memory for each Pod. This keeps Pod size to a minimum while still affording us the ability to cache the data and handle the increased volume.
While groupcache has quite a few improvements from memcached, there is one area where it kind of falls short: groupcache doesn’t support versioned values. Without versioned values, a key/value pair is immutable. While this is great for ensuring that the cache isn’t overwritten, it becomes a problem when the cached data changes; we would keep serving stale data until the cache is evicted from groupcache using the least recently used (LRU) policy. While we can temporarily serve stale data when the metadata occasionally changes, we need a technique to invalidate the keys based on time so that we don’t serve stale data forever.
We came up with a cache key creation technique that involves adding a caching window to the cache key to invalidate the keys over time. This solution is based on an expiration period constant, which acts as the maximum amount of time that the cache key would be used in the cache before expiring. We added jitter to the cache window to spread out the cache expiration and prevent all the requests from going to the database at the same time, because all keys would be invalidated at the beginning of each cache window. The jitter and the cache window number were calculated as follows:
jitter = key % expiration_period
cache_window_number = time.Now() + jitter / expiration_period
We can perform mathematical operations on our key, because it is a universally unique identifier (UUID) consisting of 32 hexadecimal digits. We use the modulus operation to calculate the jitter, so that we always get a number less than the maximum caching time. This technique calculates the same jitter every time that we get a request for the same cache key and changes the jitter based on the key. The jitter is then used to determine the caching window number.
The cache window number is appended to the end of the original key (key_cache-window-number), and the appended string acts as the cache key. By utilizing the current time, we ensure that the same cache window number is calculated for some period of time. By design, the cache window number quotient changes after enough time has passed, resulting in a new cache key (key_cache-window-number-2). The new cache key causes a cache miss, forcing the request to go to the database for data. The constant periodic expiration of cache keys ensures that we are continually renewing our cache and not serving stale data.
You might be wondering how our performance changed after implementing groupcache, inserting the groupcache peer discovery, and utilizing the cache window solution to create cache keys. To measure the difference in performance, we decided to use the orijtech fork of groupcache, because this fork has OpenCensus metrics and other improvements.
The figure above captures the moments after deploying our application with groupcache to production, when we first started seeing requests with latency below 1 ms. Following the traces for these submillisecond requests confirmed that the data being served was from the cache instead of the database. Cache misses still took place, as seen by the higher latency requests. However, the cache-filling mechanism stored the data from the cache misses in the cache and served that data for future requests. We also witnessed the cache-hit ratio rise as the overall cache size increased. Observing this behavior over a few hours also proved that the cache keys weren’t expiring at the same time with our cache window solution. We also noticed that when the request volume decreased, the cache size decreased, as the LRU policy evicted the cache that we were no longer using.
Overall, groupcache is a great solution for implementing our capacity and latency caching needs and maintaining our performance with increasing request volume. But we also see where the implementation can stand a little improvement.
We believe in contributing to open-source projects, so we decided to fork from orijtech and create our very own Galaxycache. Be on the lookout for a post about this, which adds new features to groupcache and improves usability and configurability.
Do you like working on intriguing projects? We’re hiring!