Understanding The Memcached Source Code — LRU I

Published in

source code

6 min readDec 12, 2018

Slab allocator is the core module of the cache system, which largely determines how efficient the bottleneck resource, memory, can be utilised. The other 3 parts, namely, LRU algorithm for entry expiration; and an event driven model based on libevent; and the consistent harsh for data distribution, are built around it.

Slab I
Slab II
Slab III
LRU I (this article)
LRU II
LRU III
Event driven I

More often than not, the LRU algorithem is combined with a hash map, and is referred to as a

LRU Cache

In a LRU-cache, the hash map enables the fast accessing of the cached objects; and LRU avoids the cache to grow infinitely by marking expired, or so called, least recently used objects. Next we look at how LRU works from a high level standpoint.

Linked List

Technically, LRU algorithm works on a linked list, whenever an list entry is used (accessed or updated), it is removed from the list and be attached to the list head. In this way, the closer an element is to the list tail, the less recently used it is. Hence it is easy to remove irrelevant or “expired” elements from the tail based on certain configuration.

Harsh map

Linked list is slow when it comes to element access, hence another data structure is employed. We have seen how linked list “strings” chunks in slabs to make free lists. In an LRU cache, the mechanism is similar, however, it is the hash map entries instead of chunks in slabs got wired up this time, which looks like:

We can also flatten the linked list, and make the structure a bit more clear,

Core data structure — item

typedef struct _stritem {
    /* Protected by LRU locks */
    struct _stritem *next;
    struct _stritem *prev;
    /* Rest are protected by an item lock */
    struct _stritem *h_next;    /* hash chain next */
    rel_time_t      time;       /* least recent access */
    rel_time_t      exptime;    /* expire time */
    int             nbytes;     /* size of data */
    unsigned short  refcount;
    uint8_t         nsuffix;    /* length of flags-and-length string */
    uint8_t         it_flags;   /* ITEM_* above */
    uint8_t         slabs_clsid;/* which slab class we're in */
    uint8_t         nkey;       /* key length, w/terminating null and padding */
    /* this odd type prevents type-punning issues when we do
     * the little shuffle to save space when not using CAS. */
    union {
... // scr: cas
        char end; // scr: flexible array member indicating the item header "end"
    } data[];
    /* if it_flags & ITEM_CAS we have 8 bytes CAS */
    /* then null-terminated key */
    /* then " flags length\r\n" (no terminating null) */
    /* then data with terminating \r\n (no terminating null; it's binary!) */
} item;

Properties in use

next, prev, slabs_clsid - item_link_q, item_unlink_q
it_flags - do_item_link, do_item_unlink
time, refcount - do_item_link
h_next - assoc_insert, assoc_delete
nkey - assoc_delete

Memory layout of an item chunk

We have mentioned item chunk in do_slabs_free. With the help of this data structure, we can now examine the chunk more closely.

Next we read the relevant code that performs the above discussed LRU operations.

do_item_link

int do_item_link(item *it, const uint32_t hv) { // scr: --------> 1)
...
    it->it_flags |= ITEM_LINKED;                // scr: --------> 2)
    it->time = current_time;

... // scr: stat

    /* Allocate a new CAS ID on link. */
... // scr: cas
    assoc_insert(it, hv);                       // scr: --------> 3)
    item_link_q(it);                            // scr: --------> 4)
    refcount_incr(&it->refcount);               // scr: --------> 5)
... // scr: stat

    return 1;
}

1) hv is supposed to be the shortened “hashed value”.

2) Set ITEM_LINKED in it->it_flags, and set current time to it->time.

The field it_flags is used in do_slabs_free and do_slabs_alloc

3) Insert the item to hash map.

4) Insert the item to linked list.

5) Increase the reference count.

This field is initialized as 1 in do_slabs_alloc

It is worth noting here that reference count indicates how many sub-modules are using the same resource, so as to determine when to actually deallocate the resource (In this particular case, item is referred by both slab and LRU). I have written this article that explains the a similar mechanism of C++.

item_link_q — add to linked list

item_link_q is a thread safe wrapper of the work horse method do_item_link_q.

static void do_item_link_q(item *it) { /* item is the new head */
    item **head, **tail;
    assert((it->it_flags & ITEM_SLABBED) == 0);

    head = &heads[it->slabs_clsid];           // scr: ----------> 1)
    tail = &tails[it->slabs_clsid];
    assert(it != *head);
    assert((*head && *tail) || (*head == 0 && *tail == 0));
    it->prev = 0;                             // scr: ----------> 2)
    it->next = *head;
    if (it->next) it->next->prev = it;
    *head = it;
    if (*tail == 0) *tail = it;
    sizes[it->slabs_clsid]++;                 // scr: ----------> 3)
    return;
}

1) Get the head and tail of the respective LRU linked list indicated by slabs_clsid. Note that the slabs_clsid is salted with the type of the queue, hence each slab group may enlist multiple lists.

2) Standard operations of “adding an element to the front”.

3) Increase the global array sizes.

assoc_insert — add to hash map

int assoc_insert(item *it, const uint32_t hv) { // scr: again, hv -> hash value
    unsigned int oldbucket;

... // scr: expanding related operations
    } else {
        it->h_next = primary_hashtable[hv & hashmask(hashpower)]; // scr:  1)
        primary_hashtable[hv & hashmask(hashpower)] = it;         // scr:  2)
    }

... // scr: expanding related operations
}

1) Deal with potential conflict. If there is no, the h_next will be set to null.

2) Set the item to the bucket in primary_hashtable.

The expanding logic omitted here will be covered in following articles.

do_item_unlink

void do_item_unlink(item *it, const uint32_t hv) {
    MEMCACHED_ITEM_UNLINK(ITEM_key(it), it->nkey, it->nbytes);
    if ((it->it_flags & ITEM_LINKED) != 0) {
        it->it_flags &= ~ITEM_LINKED;         // scr: ----------> 1)
... // scr: stat
        assoc_delete(ITEM_key(it), it->nkey, hv); // scr: ------> 2)
        item_unlink_q(it);                    // scr: ----------> 3)
        do_item_remove(it);                   // scr: ----------> *)
    }
}

1) Clear ITEM_LINKED in it->it_flags.

2) Remove the item from hash map.

3) Remove the item from linked list.

*) The actual releasing of an item will be covered in the next article.

item_unlink_q — remove from linked list

Likewise, item_unlink_q is a thread safe wrapper of the work horse method do_item_unlink_q.

static void do_item_unlink_q(item *it) {
    item **head, **tail;
    head = &heads[it->slabs_clsid];           // scr: ----------> 1)
    tail = &tails[it->slabs_clsid];

    if (*head == it) {                        // scr: ----------> 2)
        assert(it->prev == 0);
        *head = it->next;
    }
    if (*tail == it) {
        assert(it->next == 0);
        *tail = it->prev;
    }
    assert(it->next != it);
    assert(it->prev != it);

    if (it->next) it->next->prev = it->prev;
    if (it->prev) it->prev->next = it->next;
    sizes[it->slabs_clsid]--;                 // scr: ----------> 3)
    return;
}

1) Same, get the head and tail of the respective LRU linked list indicated by slabs_clsid.

2) Standard operations of “removing an element from a linked list”.

3) Decrease the global array sizes.

assoc_delete — remove from hash map

static item** _hashitem_before (const char *key, const size_t nkey, const uint32_t hv) {
    item **pos;
    unsigned int oldbucket;

... // scr: expanding related operations
    } else {
        pos = &primary_hashtable[hv & hashmask(hashpower)]; //scr:1)
    }

    while (*pos && ((nkey != (*pos)->nkey) || memcmp(key, ITEM_key(*pos), nkey))) {
        pos = &(*pos)->h_next; // scr: -------------------------> 2)
    }
    return pos;
}

...

void assoc_delete(const char *key, const size_t nkey, const uint32_t hv) {
    item **before = _hashitem_before(key, nkey, hv);

    if (*before) {
        item *nxt;
...
        nxt = (*before)->h_next; // scr: -----------------------> 3)
        (*before)->h_next = 0;   /* probably pointless, but whatever. */
        *before = nxt; // scr: ---------------------------------> 4)
        return;
    }
    /* Note:  we never actually get here.  the callers don't delete things
       they can't find. */
    assert(*before != 0);
}

1) Get the hash bucket using hv.

2) Go through the conflict chain and compare the key. Note that the result value is the address of the next member of the element before the found one. When there is no conflict, the address is the bucket itself.

3) Set the next element after the found one to temporary variable nxt.

4) Update the next member of the element before the found one.

Take home

Try this.

Originally published at holmeshe.me.