Understanding The Memcached Source Code-Event Driven III
The slab allocator is the core module of the cache system; it largely determines how efficiently the bottleneck resource, memory, is utilised. The other three parts, namely, the LRU algorithm for entry expiration, the event-driven model based on libevent, and the consistent hash for data distribution, are built around it.
Slab I
Slab II
Slab III
LRU I
LRU II
LRU III
Event driven I
Event driven II
Event driven III (this article)
We continue examining the other two operations, i.e., create and delete, in the event-driven context. It is also a good chance to revisit the core data structure (conn) and look at the properties involved.
Properties in discussion
wbuf
- the address of the write buffer for simple response output (e.g., STORED). used by out_string
wcurr
- not very useful as it points to the same address as wbuf. used by conn_write
wsize
- the total size of the write buffer. used by out_string to detect buffer overflow
wbytes
- the length of data populated in the write buffer. set in out_string; used by conn_write when writing it to the "real" output buffer iov
write_and_go
- set to conn_new_cmd in the very last step to form a "state loop"
ritem
- one of the essential properties for data reading. set to the address of the data portion of the item created by the create command
rlbytes
- one of the essential properties for data reading. set to the length of the data still needed
item
- a reference to the actual item created by the create command
noreply
- determined by the command. we assume it is set to false
Create
As usual, we start with a command sent to a Memcached server.
> add test 0 60 11 (\r\n)
> hello world
As mentioned in LRU III, two passes are involved in processing this command: the first pass creates an empty item after reading the first line, and the second populates the item with the value contained in the second line. The division exists because the line breaks entered in telnet trigger multiple I/O events.
In fact, most of the logic involved in this post has been discussed before, e.g., in LRU III and Event Driven II, so this post only fills in the missing parts and connects the dots.
For the first command,
> add test 0 60 11 (\r\n)
The Memcached instance outputs the following lines (this time we omit the output for accepting the new connection):
27: going from conn_new_cmd to conn_waiting
27: going from conn_waiting to conn_read
27: going from conn_read to conn_parse_cmd
27: Client using the ascii protocol
<27 add test 0 60 11
27: going from conn_parse_cmd to conn_nread
The logic for command reading and parsing (conn_new_cmd to conn_parse_cmd) is the same as described in Event Driven II. The difference is that process_update_command is invoked after command parsing. Though the method has been examined in LRU III, it is worth noting that its last step updates the session context for the next state (conn_nread), which handles the actual data reading.
static void process_update_command(conn *c, token_t *tokens, const size_t ntokens, int comm, bool handle_cas) {
... // LRU III
c->item = it;
c->ritem = ITEM_data(it);
c->rlbytes = it->nbytes;
c->cmd = comm;
conn_set_state(c, conn_nread);
}
Next, we look at the second line,
...
> hello world
and its associated output
> NOT FOUND test
>27 STORED
27: going from conn_nread to conn_write
27: going from conn_write to conn_new_cmd
...
The key code fragment for the state switching above is the conn_nread case:
static void drive_machine(conn *c) {
...
case conn_nread:
if (c->rlbytes == 0) { // scr: ---------------------> 5)
complete_nread(c);
break;
}
...// scr: error handling
/* first check if we have leftovers in the conn_read buffer */
if (c->rbytes > 0) { // scr: -----------------------> 1)
int tocopy = c->rbytes > c->rlbytes ? c->rlbytes : c->rbytes;
if (c->ritem != c->rcurr) {
memmove(c->ritem, c->rcurr, tocopy);
}
c->ritem += tocopy;
c->rlbytes -= tocopy;
c->rcurr += tocopy;
c->rbytes -= tocopy;
if (c->rlbytes == 0) {
break;
}
}
/* now try reading from the socket */
res = read(c->sfd, c->ritem, c->rlbytes); // scr: --> 2)
if (res > 0) {
...// scr: stat
if (c->rcurr == c->ritem) {
c->rcurr += res;
}
c->ritem += res;
c->rlbytes -= res;
break;
}
if (res == 0) { /* end of stream */ // scr: --------> 3)
conn_set_state(c, conn_closing);
break;
} // scr: ------------------------------------------> 4)
if (res == -1 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
if (!update_event(c, EV_READ | EV_PERSIST)) {
...// scr: error handling
}
stop = true;
break;
}
...// scr: error handling
break;
...
1) Check if there is leftover data from the command-read phase; if so, consume it directly. More specifically, if the command is entered fast enough (e.g., pasted directly into telnet), the data portion is coalesced with the command line by a single read.
2) Read the data into the memory pointed to by ritem.
3) If the connection is closed (FIN) in the middle of the read, close the session.
4) If the data is split across multiple reads, register the event again to listen for more data and suspend the state machine.
5) Normal termination: the read finished, so call complete_nread, which is covered in LRU III.
The missing part in LRU III is out_string. Combined with conn_write, it functions as a simpler version of process_get_command for "simple responses". The actual data writing is handled by the conn_mwrite block (note the fall through in the switch), which, as discussed, changes the state back to conn_new_cmd. Next we discuss the process in detail.
out_string
static void out_string(conn *c, const char *str) {
size_t len;
...//scr: not applicable
if (settings.verbose > 1)
fprintf(stderr, ">%d %s\n", c->sfd, str);
/* Nuke a partial output... */
c->msgcurr = 0; // scr: ---------------------------> 1)
c->msgused = 0;
c->iovused = 0;
add_msghdr(c);
len = strlen(str); // scr: ------------------------> 2)
if ((len + 2) > c->wsize) {
/* ought to be always enough. just fail for simplicity */
str = "SERVER_ERROR output line too long";
len = strlen(str);
}
memcpy(c->wbuf, str, len); // scr: ----------------> 3)
memcpy(c->wbuf + len, "\r\n", 2);
c->wbytes = len + 2;
c->wcurr = c->wbuf;
conn_set_state(c, conn_write); // scr: ------------> 4)
c->write_and_go = conn_new_cmd; // scr: -----------> 5)
return;
}
1) Initialize the iov; the mechanism and add_msghdr were discussed in the last post.
2) Calculate string length, and be paranoid for survival.
3) Populate wbuf with the output string and point wcurr to wbuf.
4) Indicate the next state, conn_write.
5) Set the last state to conn_new_cmd and form the "loop".
conn_write
...
case conn_write:
/*
* We want to write out a simple response. If we haven't already,
* assemble it into a msgbuf list (this will be a single-entry
* list for TCP or a two-entry list for UDP).
*/
if (c->iovused == 0 || (IS_UDP(c->transport) && c->iovused == 1)) {
if (add_iov(c, c->wcurr, c->wbytes) != 0) { //scr>1)
if (settings.verbose > 0)
fprintf(stderr, "Couldn't build response\n");
conn_set_state(c, conn_closing);
break;
}
}
/* fall through... */
case conn_mwrite:
...// scr: discussed
switch (transmit(c)) {
case TRANSMIT_COMPLETE:
if (c->state == conn_mwrite) {
...// scr: discussed
} else if (c->state == conn_write) {
if (c->write_and_free) {
...// scr: not applicable
}
conn_set_state(c, c->write_and_go); // scr: > 2)
} else {
...// scr: discussed
}
break;
...
1) Add the content of the "write buffer" to the actual iov.
2) Pick up write_and_go (i.e., conn_new_cmd) and set it as the next state.
Delete
Next, we send the delete command to remove the entry we just added.
> delete test
And the output this time is
28: going from conn_read to conn_parse_cmd
<28 delete test
> FOUND KEY test
>28 DELETED
28: going from conn_parse_cmd to conn_write
28: going from conn_write to conn_new_cmd
28: going from conn_new_cmd to conn_waiting
28: going from conn_waiting to conn_read
Like add, the entry point of this command (after parsing) is process_delete_command, which was fully covered in LRU III. Furthermore, out_string is called within process_delete_command to trigger the state switches from conn_write back to conn_new_cmd.
To go
Originally published at https://holmeshe.me on June 2, 2019.