strlen(buf);

Not as simple as you’d think

Webmon
Late Night Programming

--

TLDR; don’t write your own custom libc function replacements unless you really know what you are doing. libc functions are highly optimized to take advantage of cache lines, pipelining and branch predictions.

Let’s take a look at strlen, one of the simplest functions in libc, or so you’d think.

A correct and simple implementation of strlen looks something like this:

size_t
my_strlen(const char* src)
{
const char* s = src;
while (*s) s++;
return (s — src);
}

It looks fine on the surface but performance wise it’s far from optimal. Walking the memory buffer 1-byte at a time is much slower than walking it word at a time.

Here’s the FreeBSD version (glibc’s implementation is similar)

/*
* Portable strlen() for 32-bit and 64-bit systems.
*
* Rationale: it is generally much more efficient to do word
* length
* operations and avoid branches on modern computer systems, as
* compared to byte-length operations with a lot of branches.
*
* The expression:
*
* ((x - 0x01....01) & ~x & 0x80....80)
*
* would evaluate to a non-zero value iff any of the bytes in
* original word is zero.
*
* On multi-issue processors, we can divide the above expression
* into:
* a) (x - 0x01....01)
* b) (~x & 0x80....80)
* c) a & b
*
* Where, a) and b) can be partially computed in parallel.
*
* The algorithm above is found on "Hacker's Delight" by
* Henry S. Warren, Jr.
*/
/* Magic numbers for the algorithm */
#if LONG_BIT == 32
static const unsigned long mask01 = 0x01010101;
static const unsigned long mask80 = 0x80808080;
#elif LONG_BIT == 64
static const unsigned long mask01 = 0x0101010101010101;
static const unsigned long mask80 = 0x8080808080808080;
#else
#error Unsupported word size
#endif
#define LONGPTR_MASK (sizeof(long) - 1)
/*
* Helper macro to return string length if we caught the zero
* byte.
*/
#define testbyte(x) \
do { \
if (p[x] == '\0') \
return (p - str + x); \
} while (0)
size_t
strlen(const char *str)
{
const char *p;
const unsigned long *lp;
long va, vb;
/*
* Before trying the hard (unaligned byte access) way
* to figure out whether there is a nul character, try see
* if there is a nul character is within this accessible
* word first.
*
* p and (p & ~LONGPTR_MASK) must be equally accessible
* since they always fall in the same memory page, as long
* as page boundaries is integral multiple of word size.
*/
lp =(const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);
va = (*lp - mask01);
vb = ((~*lp) & mask80);
lp++;
if (va & vb)
/* Check if we have \0 in the first part */
for (p = str; p < (const char *)lp; p++)
if (*p == '\0')
return (p - str);
/* Scan the rest of the string using word sized operation */
for (; ; lp++) {
va = (*lp - mask01);
vb = ((~*lp) & mask80);
if (va & vb) {
p = (const char *)(lp);
testbyte(0);
testbyte(1);
testbyte(2);
testbyte(3);
#if (LONG_BIT >= 64)
testbyte(4);
testbyte(5);
testbyte(6);
testbyte(7);
#endif
}
}
/* NOTREACHED */
return (0);
}

The implementation might look like an overkill for something as simple as strlen but once you understand it, it’s pretty straight forward and worth the extra mile.

Implementation details

  1. round down the string ptr to the nearest word.
lp = (const unsigned long *)((uintptr_t)str & ~LONGPTR_MASK);

E.g., if str is 0x800c00007, lp will be 0x800c00000

2. Check if there’s NULL in any of the word’s bytes, somewhere between 0x800c00000 and 0x800c00007, using Hacker’s delight trick per the comment (dry read but very useful book).

va = (*lp - mask01); vb = ((~*lp) & mask80);
if (va & vb) { ... }

If NULL is detected, then check which byte it is in the word and return the length, otherwise step 3. The check is done starting at the original str ptr handed to the function, of course. This means that we can detect a NULL in a byte that’s outside our buffer boundary (because we rounded down) but once we check the buffer we won’t find it, which is fine as it isn’t the common case.

3. Iterate over the rest of the str buffer word at a time, and check if there’s a NULL in the word. Once found, check which byte it is using testbyte MACRO and return the string length found.

That’s it!

Let’s do a quick performance test

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <assert.h>
const int BUF_SIZE = 1024 * 1024 * 256;size_t
my_strlen(const char* src)
{
const char* s = src;
while (*s) s++;
return (s - src);
}
int
main(int argc, char** argv)
{
char* buf = malloc(BUF_SIZE);
assert(buf != NULL);
for (int i = 0; i < BUF_SIZE; i++)
buf[i] = 'i';
struct timeval tm1, tm2;
size_t len = 0;
gettimeofday(&tm1, NULL);
len = strlen(buf);
gettimeofday(&tm2, NULL);
long usec = (tm2.tv_sec * 1000000.0 + tm2.tv_usec) - \
(tm1.tv_sec * 1000000.0 + tm1.tv_usec);
printf("strlen took: %ld usec (len: %ld)\n", usec, len);
/* measure my_strlen */
gettimeofday(&tm1, NULL);
len = my_strlen(buf);
gettimeofday(&tm2, NULL);
usec = (tm2.tv_sec * 1000000.0 + tm2.tv_usec) - \
(tm1.tv_sec * 1000000.0 + tm1.tv_usec);
printf("my_strlen took: %ld usec (len: %ld)\n", usec, len);
return 0;
}
strlen took: 54764 usec (len: 268435455)
my_strlen took: 259363 usec (len: 268435455)

Conclusion

Measure twice before you replace a libc function.

--

--

Webmon
Late Night Programming

Website monitoring and escalation service. Monitor your websites and online services, and be the first to know when your service is down.