Implementing Cache With Go

Jaeeun Cho
7 min read · Jul 22, 2022


Previous Story: A Newbie Gopher’s Server Developing Experience

Client-Server Caching

Introduction

Concept

A cache is one of the fundamental concepts in computer science. Consider a component that needs some external resource: it requests the resource from an external source, receives it, and uses it. These steps take time. When the component needs the resource again, it is inefficient, time-wise, to request it again. Instead, the component can save the result somewhere locally and reuse it, since reading local data is almost always faster than requesting external data. This strategy is the basic concept of a cache. We can find examples in memory, CPU caches, and server-side caches like Redis.

Different Use Cases

A cache in a web service is used to reduce the latency of data requests. Instead of querying the DB every time, the service saves the result of the first query and reuses it afterward. The right caching strategy depends on the characteristics of the data. Some data is relatively static, such as statistics or calculation results; other data changes frequently, such as a comment section or a social media feed.

The best case is caching data that rarely changes. Take monthly statistics: last month's stats will not change for a month, so if we cache them, we may not have to check the DB for a month.

Some Dumb Design

For fast-changing data, it's better to take caution when there are duplicated servers. Check out the design above, and take a comment-section service as an example. Suppose User A posts a comment, then A decides to delete it, while User B tries to reply to it. A and B may be routed to different servers, so A's deletion may not be propagated to B's server's cache. The result is something like this: Cache A and Cache B hold different data, and the DB doesn't know which one is the truth. The integrity of the data is broken.

Better Way

In that case, using a single external cache, as in the picture above, solves the problem: the redundant servers all share one cache.

Restrictions

A cache is faster than a DB, but it is significantly smaller in size. This is because a DB stores data on a drive while a cache stores data in memory, and each inherits the characteristics of its medium, including volatility: if the host stops, the cache loses all of its data, while the DB loses none.

Since a cache sits in memory, its space is limited, and it needs to choose which data to keep. Back in CS lectures, I heard terms like LRU (Least Recently Used), LFU (Least Frequently Used), and FIFO (First In, First Out). These are the "which one to drop" criteria, collectively called the eviction policy.

Design & Implementation

Requirement

  • Key-Value Storage: The cache should provide a read function that takes a key and returns a value, as well as a write function that takes a key and a value. These functions should run in average O(log N) time, where N is the number of keys.
  • LRU Eviction Policy: Since the cache is limited in space, some data must be evicted when the cache is full. Implement this with the LRU algorithm.
  • Time To Live (TTL): Each key-value pair has a TTL. Once the TTL has passed, the pair should be evicted.

API Design

Key-value storage means that if I request a key, the cache returns the value for that key if it exists. The hash-map abstract data type works the same way. Consider an application that provides the following API:

func Get(key string) (value []byte, hit bool)
func Put(key string, value []byte) (hit bool)
  • Get: Read by key. If the provided key exists in the cache, the matching value is returned; if not, it just returns hit=false. For the LRU policy, the key is marked as most recently used, which places it furthest from eviction.
  • Put: Write by key. If the provided key exists, its value is replaced with the new value; if not, a new key-value pair is created. Since this call can add data, it can overflow the cache, in which case the LRU policy evicts the least recently used key-value pair. The newly added or modified key is marked as most recently used.
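The intended call pattern is cache-aside: try Get first, and on a miss fetch from the backing store and Put the result. Here is a minimal sketch of that pattern; stubCache and loadUser are hypothetical names, and the map-backed stub only mirrors the Get/Put signatures, ignoring LRU and TTL for brevity.

```go
package main

import "fmt"

// stubCache is a map-backed stand-in with the same Get/Put signatures
// as the API above; it ignores eviction and TTL for brevity.
type stubCache struct{ m map[string][]byte }

func (c *stubCache) Get(key string) (value []byte, hit bool) {
	v, ok := c.m[key]
	return v, ok
}

func (c *stubCache) Put(key string, value []byte) (hit bool) {
	_, ok := c.m[key]
	c.m[key] = value
	return ok
}

// loadUser follows the cache-aside pattern: check the cache first,
// fall back to the (simulated) database on a miss, then cache the result.
func loadUser(c *stubCache, id string) []byte {
	if v, hit := c.Get(id); hit {
		return v
	}
	v := []byte("row-for-" + id) // pretend this came from the DB
	c.Put(id, v)
	return v
}

func main() {
	c := &stubCache{m: make(map[string][]byte)}
	fmt.Println(string(loadUser(c, "42"))) // miss: hits the "DB", then caches
	fmt.Println(string(loadUser(c, "42"))) // hit: served from the cache
}
```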

Data Structures

Design Concept

I used two different data structures, a hash-map and a doubly linked list, to implement both the key-value read/write feature and the LRU policy.

  • Hash-map: The hash-map is the most widely used key-value data structure. Go provides it out of the box as the map[<type>]<type> type.
  • Doubly Linked List: The LRU ordering can be maintained with a doubly linked list.

Binding these two data structures together provides both the key-value feature and the LRU policy; refer to the design-concept picture above. The hash-map's key is the string key, and its value is a pointer to a node in the linked list. The node holds the value for that key.

When the user calls Get(), the cache looks the key up in the hash-map, follows the pointer to the node in the linked list, reads the value, does the LRU bookkeeping, and returns the value to the user.

Likewise, when Put() is called, the cache looks the key up in the hash-map and either follows the pointer, replaces the value, and does the LRU bookkeeping, or inserts a new key into the hash-map and a new node into the linked list.
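The map-plus-list design above can be sketched in a few dozen lines using Go's standard container/list package. This is an illustrative single-threaded sketch, not the post's actual code: names like entry and LRUCache are assumptions, and TTL is omitted.

```go
package main

import (
	"container/list"
	"fmt"
)

// entry is the payload stored in each list node.
type entry struct {
	key   string
	value []byte
}

// LRUCache binds a hash-map to a doubly linked list:
// the map gives O(1) average lookup, the list keeps recency order.
type LRUCache struct {
	capacity int
	table    map[string]*list.Element // key -> node in the list
	order    *list.List               // front = most recently used
}

func NewLRUCache(capacity int) *LRUCache {
	return &LRUCache{
		capacity: capacity,
		table:    make(map[string]*list.Element),
		order:    list.New(),
	}
}

func (c *LRUCache) Get(key string) (value []byte, hit bool) {
	el, ok := c.table[key]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el) // mark as most recently used
	return el.Value.(*entry).value, true
}

func (c *LRUCache) Put(key string, value []byte) (hit bool) {
	if el, ok := c.table[key]; ok {
		el.Value.(*entry).value = value
		c.order.MoveToFront(el)
		return true
	}
	if c.order.Len() >= c.capacity {
		// Evict the least recently used node (back of the list).
		lru := c.order.Back()
		delete(c.table, lru.Value.(*entry).key)
		c.order.Remove(lru)
	}
	c.table[key] = c.order.PushFront(&entry{key: key, value: value})
	return false
}

func main() {
	c := NewLRUCache(2)
	c.Put("a", []byte("1"))
	c.Put("b", []byte("2"))
	c.Get("a")              // "a" becomes most recently used
	c.Put("c", []byte("3")) // cache is full: evicts "b", the LRU key
	_, hit := c.Get("b")
	fmt.Println(hit) // false: "b" was evicted
}
```

Note that Get mutates the list (MoveToFront), which is exactly why reads also need locking once concurrency enters the picture, as discussed below.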

Concurrency Control

Since a cache is designed for very frequent access, there will be multiple accesses at the same time, so there is always a chance of concurrency problems.

In my design there are two different data structures, and they are not always synchronized: during execution there is a tiny time gap between the modification of the hash-map and the modification of the linked list. Check out the following example.

One Case of Concurrency Problem
  1. Start from this condition: the cache is currently full, and the least recently used key is 1. That means that if a new key is added, key 1 and its value will be evicted.
  2. User A calls Put() with a new key, 101. The hash-map lookup finds that 101 doesn't exist, so the cache decides to evict 1 and add 101.
  3. In the meantime, User B calls Put() with key 1. The hash-map confirms that key 1 exists, so the cache decides to modify its value.
  4. A's call proceeds: it deletes node 1 from the linked list and key 1 from the hash-map.
  5. Right after that, B's call tries to access node 1's address and finds that the address is no longer valid. It panics and the application goes down.

The easiest way to prevent this from happening is to use a mutex. Check out the following code.

func (s *CStorage) Get(key string) (data []byte, hit bool) {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	n, ok := s.table[key]
	if !ok {
		return nil, false
	}
	if n.ttl.Before(time.Now()) {
		// Lazily evict an expired node on access.
		s.evict(n)
		s.size--
		return nil, false
	}
	return n.data, true
}

This code is the definition of Get(). The mutex lock appears on the first line, and the mutex unlock on the second as a defer (a Go keyword that postpones a statement's execution to the end of the function). The same pattern is applied to all other functions that access the data storage, such as Put, Delete, Clear, and so on.

By using the mutex, each execution is isolated from the others, ensuring safe data access.

Time To Live

Currently, TTL is implemented passively. That means that when a data access function (Get, Put) is executed, it checks whether the TTL has expired and deletes the node if so. It also means that an expired node still exists in the data structures until it is touched.

This method avoids the CPU cost of traversing all nodes periodically, but the cache is likely to hold expired values.

Most of the time this is okay, because an expired node is likely to be in a "least recently used" state anyway. Still, it is useful to have a function that cleans expired nodes out of the data structures, so I made the RemoveExpired() function.

func (s *CStorage) RemoveExpired() int64 {
	s.mutex.Lock()
	defer s.mutex.Unlock()
	var count int64 = 0
	now := time.Now()
	for _, n := range s.table {
		if n.ttl.Before(now) {
			// Same expiry path as Get; deleting map entries while
			// ranging over a Go map is allowed.
			s.evict(n)
			s.size--
			count++
		}
	}
	return count
}

This function is called periodically to clean up all expired nodes.

Results

The implemented Go package can be imported by other Go projects. I also made a standalone cache application that provides a gRPC API. Check out this repository.

Conclusion

I had this great chance to revisit the concept of a cache and implement it in Go. A cache is a good tool for reducing a component's latency: it is limited in space, but it is fast.

An actual cache module can be implemented with a hash-map and a doubly linked list. There were some tricky concurrency problems, so I had to use a mutex. I also mixed passive and active ways of removing expired data.

As is widely known, reading a concept in a textbook is different from actually implementing it.

The Go stories continue.

Next Story: Weird Go Enum Leads to Lesson: Different Project Structure

