A way to minimize errors and make your C code easier to read — The less trivial parts

Markus Gothe
HiMinds
13 min read · Aug 7, 2020

In the previous article, I went through some trivial, but not obvious, examples, although the ones related to the stack and the page size have more in common with what follows here.

This time I will give the reader a deeper understanding of what can go wrong with even basic idioms, in not so obvious ways, and provide some more background on related problems and mistakes. I recommend reading the previous article first to get a grip on the principles outlined there.

Since much of the article focuses on standard C libraries and the differences between them, it helps to have at least 3 to 5 years of professional (or equivalent) experience, preferably from coding on multiple software platforms.

Let’s start!

Compiler flags and linters

Make sure that your own code passes an aggressive set of compiler warnings, e.g. 'gcc -Wall -Wextra -Werror' and optionally '-Wpedantic'. The latter option is for those who like to inflict pain on their soul and have unnecessary trouble at compile-time. Usually, you also want to turn off certain, less serious, warnings via the '-Wno-' prefix, e.g. '-Wno-pointer-sign' to turn off pointer signedness warnings.

Using the compiler to find potential bugs has become popular with the rise of clang, which has a great linter built in: clang-tidy. More recently, GCC 10 gained a static analyzer, invoked with '-fanalyzer'. However, your mileage may vary since it's new and mostly untested. If the compiler provides a linter, my general standpoint is to use it whenever possible. Cross-compiling and similar setups usually require toolchain support; otherwise, at least the clang-tidy linter is very helpful in finding potential issues. There are half a dozen linters built on top of clang/LLVM as well, notably IKOS from NASA and Facebook's Infer.

A platform-independent linter is cppcheck, which is decent and simple to use but gives a lot of false positives and has got some internal bugs when handling complex preprocessor macros. Try it and see if you like it or not, it’s usually better than nothing.

There are commercial alternatives like Coverity, CodeSonar and PVS-Studio. In general, they are much more reliable at finding bugs than the open-source alternatives, but the licensing tends to be semi-expensive. Depending on how critical bugs are, the cost of licensing might be money well spent, especially for a larger company.

If working with third-party code I advise against using too strict compiler flags/linting, as you might perceive something as a bug that is not, or introduce a bug when trying to fix the warning. If the code is well-tested and robust, the old saying "if it ain't broke, don't fix it" makes perfect sense.

The importance of endianness

Sooner or later when writing network-related code one will become aware of "endianness", or byte-ordering as it's also called. Especially if one uses an x86 CPU as the hardware platform and then wants to interface the machine and the code with a network appliance. If we don't account for endianness then we will be the guy who broke the expected behaviour and made interoperability difficult. We don't want to be that guy.

To give a brief explanation of what endianness is: it is how the CPU represents multi-byte values in memory, a hard-wired convention for how it interacts with the system memory and represents it internally in terms of bytes.

On what's called a "big-endian" system the hexadecimal number '0x12345678' will be stored as is, but on a "little-endian" system it will be stored as '0x78563412'. Usually this is transparent to the end-user and programmer. However, when applications interface two different endiannesses the problem becomes obvious; hence the convention that all "little-endian" systems must convert all data that is not represented as bytes (which are opaque by definition in this context) to "big-endian". There are formats where you do it the other way around as well; notably the MAC layer in the 802.11 standards for WiFi.

One should also be aware of the old and obscure "PDP-endian" for historical reasons. The hexadecimal number above would be stored as '0x34127856' in "PDP-endian". This shows that the hard-wired ordering can be selected quite arbitrarily by the hardware designers. To make things easier, avoiding some of the related issues, virtually all hardware vendors have used either "little-endian" or "big-endian" for the last 40 years. Some chipsets are capable of both (known as "bi-endian") and are configured for a specific endianness via the hardware layout.
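To see which byte order your own machine uses, a small sketch like the following (my own helper, not from any standard header) inspects the first byte of a known 32-bit pattern:

```c
#include <stdint.h>

/* Return 1 on a little-endian host, 0 on a big-endian one.
 * We store a known 32-bit pattern and look at its first byte:
 * a little-endian machine puts the least significant byte first. */
static int is_little_endian(void)
{
    const uint32_t probe = 0x12345678u;
    const uint8_t *bytes = (const uint8_t *)&probe;
    return bytes[0] == 0x78;
}
```

On an x86 workstation this returns 1; on a classic big-endian MIPS or PowerPC target it returns 0.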

Fortunately, C provides standard functions for converting to and from "big-endian", namely ntohs()/htons() for 16-bit numbers and ntohl()/htonl() for 32-bit numbers. They are defined according to the CPU's endianness by the toolchain and are very convenient for making sure data is always in "big-endian", and for converting from "big-endian" back to the host order. Note that you cannot use them to convert to or from "little-endian"; for that you will have to implement the same logic with bit-shifting or use the BSD-specific function pairs (also available in most Linux C libraries): htole16()/le16toh() and htole32()/le32toh().
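If the htoleXX()/leXXtoh() pairs are unavailable on your platform, the bit-shifting logic can be sketched as below; these bswap helpers are my own illustration, not standard functions:

```c
#include <stdint.h>

/* Swap the byte order of a 32-bit value using shifts and masks.
 * This works the same regardless of the host's endianness. */
static uint32_t bswap32(uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |
           ((x & 0x0000FF00u) <<  8) |
           ((x & 0x00FF0000u) >>  8) |
           ((x & 0xFF000000u) >> 24);
}

/* Swap a 16-bit value the same way. */
static uint16_t bswap16(uint16_t x)
{
    return (uint16_t)((x << 8) | (x >> 8));
}
```

On a big-endian host, htole32() is equivalent to bswap32(); on a little-endian host it is a no-op.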

When using the ntohX()/htonX() function pairs the trick is to read them as “network to host” and “host to network” followed by the type (short or long), this way one will know which one of them to use for the data.

IPv6 is a little bit tricky since it uses 128 bits for addressing and is by definition "big-endian". The function pairs described above take 16-bit and 32-bit integers, which makes sense for IPv4 addresses. However, there is a trick to use the function pairs on IPv6 addresses if necessary. IPv6 addresses are usually defined as a union of four 32-bit integers, eight 16-bit integers and sixteen 8-bit bytes. By accessing the individual 32-bit integers and changing the endianness of each of them we are actually able to get the correct result. The following example illustrates how to use ntohl() to implement this:

void ipv6_ntohl(struct in6_addr *ipv6_addr)
{
    ipv6_addr->s6_addr32[0] = ntohl(ipv6_addr->s6_addr32[0]);
    ipv6_addr->s6_addr32[1] = ntohl(ipv6_addr->s6_addr32[1]);
    ipv6_addr->s6_addr32[2] = ntohl(ipv6_addr->s6_addr32[2]);
    ipv6_addr->s6_addr32[3] = ntohl(ipv6_addr->s6_addr32[3]);
}

Bit-fields and bit numbering

A concept related to endianness is bit numbering: how the bits in a byte are packed and accessed. It's usually opaque since most C functions operate on whole bytes; however, sometimes we need to address individual bits. Traditionally this was done by bit-shifting the bytes, until the C89 standard. C89 wasn't really much of a standard in this respect, but a recognition that structures needed more efficient handling of bits, and it introduced bit-fields for the purpose; from then on C had to deal with bit numbering as well. Consider the following example from 'netinet/ip.h':

struct iphdr
{
#if __BYTE_ORDER == __LITTLE_ENDIAN
    unsigned int ihl:4;
    unsigned int version:4;
#elif __BYTE_ORDER == __BIG_ENDIAN
    unsigned int version:4;
    unsigned int ihl:4;
#else
# error "Please fix <bits/endian.h>"
#endif
    uint8_t tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t ttl;
    uint8_t protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
    /* The options start here. */
};

In the example above we can see that the header length and version together fit in one byte. When accessing the two 4-bit fields we don't want to use complex bit-shifting. Instead, two separate bit-field definitions are provided, one per supported endianness, with the correct bit order; this allows the rest of the code to access the individual bits of the byte without caring about endianness or bit numbering. The compiler itself will emit the necessary bit-shifting during compilation and reduce the complexity of the code. Some people would, of course, argue that bit-shifting is not complicated at all and can be abstracted with macros. I agree with this criticism to a certain extent; it will, however, require some extra handling of the memory accesses and make the code more difficult to read and modify. It's not coding for readability.
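To make the bit-shifting alternative concrete, here is a small sketch (my own helpers, not from any system header) extracting the same two fields from the first byte of an IPv4 header:

```c
#include <stdint.h>

/* The first byte of an IPv4 header packs the version in the high
 * nibble and the header length (in 32-bit words) in the low nibble.
 * For a typical packet the byte is 0x45: version 4, IHL 5. */
static uint8_t ip_version(uint8_t first_byte)
{
    return first_byte >> 4;      /* high nibble */
}

static uint8_t ip_ihl(uint8_t first_byte)
{
    return first_byte & 0x0F;    /* low nibble */
}
```

Note that these shifts operate on byte values, so they are endianness-independent; the bit-field version above achieves the same thing declaratively.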

The example above illustrates bit numbering and its implications for bit-fields, but a more practical use of bit numbering can be seen in serialized communication buses together with "bit-banging" techniques, where you actually have to keep the order correct. The most common usage is transmitting data over a GPIO or an I2C bus. A less obvious example is from a friend who implemented "frequency-shift keying" modulation via "bit-banging" to encode/decode a protocol sent over traditional telephone lines.

When using bit-fields in C, keep in mind that the actual memory layout is not guaranteed to be the same across compilers. Usually, the same result can be achieved between compilers by explicitly instructing them how to map a structure in memory. You may need to define some preprocessor pragma directives to guarantee the structure is always packed, if that's desired. Another fact to consider is that a bit-field cannot span across the hardware's word size. This might raise some issues when using a 64-bit workstation for development and then trying to run the code on a 32-bit system.
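As a sketch of explicit packing, GCC and clang accept an attribute (MSVC instead understands the #pragma pack directive); the struct name here is my own example:

```c
#include <stdint.h>

/* Without packing, the compiler would normally insert 3 bytes of
 * padding after 'flag' to align 'value' on a 4-byte boundary,
 * making the struct 8 bytes. With the attribute it is exactly 5. */
struct packed_msg {
    uint8_t  flag;
    uint32_t value;
} __attribute__((packed));

/* The pragma-based alternative, understood by MSVC as well:
 *   #pragma pack(push, 1)
 *   struct packed_msg { ... };
 *   #pragma pack(pop)
 */
```

Beware that packed structures can force the compiler to emit unaligned memory accesses, which are slow or even trapping on some embedded architectures.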

Bit numbering, in general, is always the same for “little-endian” machines, but there do exist some exotic “big-endian” machines which use the same bit numbering scheme as “little-endian” machines. GCC provides definitions for these corner cases. In the worst case, you will have to invoke the compiler with the correct definition, e.g. ‘gcc -D__LITTLE_ENDIAN_BITFIELD=1’. Making the example above slightly more portable would be as follows:

struct iphdr
{
#if defined(__LITTLE_ENDIAN_BITFIELD)
    unsigned int ihl:4;
    unsigned int version:4;
#elif defined(__BIG_ENDIAN_BITFIELD)
    unsigned int version:4;
    unsigned int ihl:4;
#else
# error "Please fix <asm/byteorder.h>"
#endif
    uint8_t tos;
    uint16_t tot_len;
    uint16_t id;
    uint16_t frag_off;
    uint8_t ttl;
    uint8_t protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
    /* The options start here. */
};

Reentrant functions

Over the past 25 years there has been an increasing demand for reentrant functions, and the POSIX standards provide a lot of them as alternatives to the non-reentrant ones; sometimes they are also provided as extensions to the standard C library.

Even if the code you write is single-threaded and non-asynchronous, try using the reentrant variants and familiarize yourself with them. They usually take an extra argument that you need to pass. In multi-threaded code, always use them, or your results may vary. The non-reentrant behaviour might show itself when calling a function in nested loops as well.
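One pair worth knowing is localtime() versus its POSIX sibling localtime_r(), which writes into a caller-supplied struct tm instead of a shared static one; the wrapper function below is my own sketch:

```c
#include <time.h>

/* Return the current year, or -1 on failure. localtime_r() fills the
 * caller-owned 'result' buffer, so concurrent callers cannot clobber
 * each other the way they can with localtime()'s static struct tm. */
static int current_year(void)
{
    time_t now = time(NULL);
    struct tm result;

    if (localtime_r(&now, &result) == NULL)
        return -1;
    return result.tm_year + 1900;
}
```

The extra argument is the whole trick: the state lives on the caller's stack rather than in library-private static storage.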

A common example is char *ether_ntoa(const struct ether_addr *addr) with the not-so-portable GNU extension char *ether_ntoa_r(const struct ether_addr *addr, char *buf). As one can see, the latter is the reentrant variant: instead of returning a pointer to a static buffer it takes the buffer as an argument, puts the result in it, and returns a pointer to that buffer. This means we can create the following fully reentrant code.

char macbuf[18] = {0};
struct ether_addr *broadcast = ether_aton("ff:ff:ff:ff:ff:ff");
char *macbufptr = ether_ntoa_r(broadcast, macbuf);

Now the observant reader will notice that it is not fully reentrant! Good catch! So let’s make it fully reentrant once and for all.

char macbuf[18] = {0};
struct ether_addr tmp_broadcast;
struct ether_addr *broadcast = ether_aton_r("ff:ff:ff:ff:ff:ff", &tmp_broadcast);
char *macbufptr = ether_ntoa_r(broadcast, macbuf);

Finally, we don’t have to worry about multithreading, asynchronous calls and potential concurrency issues. At least not in terms of reentrancy. To further illustrate the problems with non-reentrant functions I will mention the strtok() function as a prime example. Consider the following code:

tokptr = strtok(bufptr, ":");
while(tokptr != NULL) {
    subtokptr = strtok(tokptr, "\n");
    tokptr = strtok(NULL, ":");
}

This logic is not even possible to implement correctly, due to strtok() being non-reentrant and saving its state internally. So for reentrancy we need to use strtok_r(), which saves the state in a separate pointer instead. A working variant of the example would be:

tokptr = strtok_r(bufptr, ":", &saveptr1);
while(tokptr != NULL) {
    subtokptr = strtok_r(tokptr, "\n", &saveptr2);
    tokptr = strtok_r(NULL, ":", &saveptr1);
}

Premature optimizations — A tribute to Donald Knuth

One common micro-optimization, used 20-30 years ago and still today in some cases like compression/decompression of data, is to make a for-loop always compare with zero (which is how branching usually is implemented in hardware, and hence in assembly language: comparison with 0).

The statement if(i == 100) will actually compile into:

int tmp_i = i - 100;
if(tmp_i == 0)

This gives us an extra instruction, and if there are many iterations in a loop you might get a noticeable (but really small) overhead, especially on embedded systems.

The common way to optimize this is to take a loop like this:

for(int i = 0; i < 1000; i++) {
    ...
}

and rewrite it as:

for(int i = 1000; i > 0; i--) {
    ...
}

In some cases it does work! But it depends entirely on the logic carried out inside the loop, and one should avoid this kind of premature optimization unless the variable 'i' is unused inside the loop; any decent compiler these days will figure out by itself whether this transformation is safe.
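When the loop body genuinely does not depend on the iteration order, summing an array for instance, the two forms are interchangeable; a sketch with hypothetical helper names:

```c
#include <stddef.h>

/* Sum an array counting up, the natural way to write it. */
static long sum_up(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same sum counting down toward zero, the "optimized" form.
 * Note the i - 1 indexing so the unsigned counter never wraps. */
static long sum_down(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = n; i > 0; i--)
        s += a[i - 1];
    return s;
}
```

With optimizations enabled, a modern compiler will typically generate near-identical code for both, which is exactly why the manual rewrite is rarely worth it.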

likely()/unlikely() or how I learned to stop worrying and love sanity checking

If you look at some Linux kernel code you might see something like:

if(likely(ptr != NULL)) {
    ...
}

Is this a new keyword in C, one might ask? No, it's not, but it's a cleverly defined macro. If supported by your version of GCC it is defined like this:

#define likely(x) __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

This is a branch prediction hint to the compiler. It might do nothing on certain architectures, but those that support it will optimize the branch as taken or not taken. On the MIPS architecture, these instructions are commonly referred to as "branch-likely" variants, meaning the hardware CPU pipeline will start fetching the code inside the branch and stall and restart if the prediction turns out wrong.

I've received some responses to the statement above in relation to the x86 CPU architecture. One reader pointed out that GCC will rearrange the actual assembly output on the x86 architecture in order to optimize the CPU-caching mechanisms for the branch being taken or not.

It is really difficult to know whether the code will benefit from this kind of optimization. If it is regular sanity checking and the code is time-critical, I'd say use the macros; otherwise don't. Compilers these days are pretty advanced, and swing modulo scheduling alone might be a more helpful performance tweak; however, you need to explicitly enable it in GCC with the '-fmodulo-sched' option.
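If you want to carry the macros into portable code, a common sketch is to fall back to plain evaluation on compilers that lack __builtin_expect:

```c
/* likely()/unlikely() hints that degrade gracefully: on compilers
 * without __builtin_expect they evaluate the condition unchanged.
 * The !! normalizes any truthy value to exactly 0 or 1. */
#if defined(__GNUC__) || defined(__clang__)
# define likely(x)   __builtin_expect(!!(x), 1)
# define unlikely(x) __builtin_expect(!!(x), 0)
#else
# define likely(x)   (!!(x))
# define unlikely(x) (!!(x))
#endif
```

Typical usage is guarding an error path that almost never fires, e.g. if (unlikely(buf == NULL)) return -1;.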

A related concept in the Linux kernel is “jump label keys” which is documented at www.kernel.org/doc/Documentation/static-keys.txt.

Dynamic memory — Thanks a heap!

An example of dynamic allocation gone bad:

char *buf = malloc(256*sizeof(char));
*buf = '\0';

A valuable lesson about dynamic memory is that one should always check that malloc()/calloc()/realloc() doesn't return NULL. They might fail under heavy load and this needs to be handled. It might not be evident on a desktop or a server, but it usually is on an embedded system, especially if the page size is huge. A more correct solution is to write:

char *buf = malloc(256*sizeof(char));
if(buf == NULL) {
    return -ENOMEM;
}
*buf = '\0';

The same goes for free(), it’s not uncommon to see code like:

free(ptr);
...

A more stable way is to write the following (note that free() has been required to cope with NULL since C89, but that is not always honoured by embedded implementations):

if(ptr != NULL) {
    free(ptr);
    ptr = NULL;
}
...
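This check-free-reset pattern is common enough that many codebases wrap it in a macro; a sketch (the name SAFE_FREE is my own, not a standard one):

```c
#include <stdlib.h>

/* Free a pointer and reset it to NULL, so an accidental second
 * SAFE_FREE() is a no-op and a use-after-free becomes an easily
 * diagnosed NULL dereference instead of silent corruption. */
#define SAFE_FREE(p)              \
    do {                          \
        if ((p) != NULL) {        \
            free(p);              \
            (p) = NULL;           \
        }                         \
    } while (0)
```

The do-while(0) wrapper makes the macro behave like a single statement, so it composes safely with if/else without braces.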

The switch-case idiom

Python's lack of the switch-case idiom has led to greater misunderstanding than ever: many decent programmers believe that it always evaluates to if-else statements. For GCC this is sometimes true and sometimes not; more important to understand is that it is a way to implement both jump tables and (via jump tables) computed goto in C. The properties of computed goto can in turn be used to implement co-routines in C; notably A. Dunkels' protothread implementation.

So why does this idiom matter? Performance, mostly; but as said, it is also the way to implement co-routines in plain C, which is a form of co-operative multitasking. Don't try to implement that yourself for production use; use the protothread macros instead.

Tom Duff came up with the ingenious solution of interweaving the switch-case idiom with the do-while idiom almost 40 years ago, and figured out it would really speed up some code. Today memcpy() is typically optimized, so the solution doesn't make much sense for this kind of task. But it's one of the most beautiful pieces of C code ever written:

send(to, from, count)
register short *to, *from;
register count;
{
    register n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to = *from++;
    case 7:      *to = *from++;
    case 6:      *to = *from++;
    case 5:      *to = *from++;
    case 4:      *to = *from++;
    case 3:      *to = *from++;
    case 2:      *to = *from++;
    case 1:      *to = *from++;
            } while (--n > 0);
    }
}

It's a weird construct, called "Duff's device", that's not trivial to understand at first sight. What it actually does is avoid unconditional branching and reduce the number of instructions, via loop unrolling of the inner loop and jumping to the correct label in the unrolled loop. It's both beautiful and mystic; it's certainly not coding for readability.
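For readers who want to poke at the technique, here is a sketch adapted to copy ordinary memory. Note the deliberate change from the original: Duff's 'to' pointed at a memory-mapped output register and was never incremented, whereas this variant increments both pointers and so behaves like a small memcpy(); it assumes count > 0, as the original did:

```c
/* Duff's device adapted for memory-to-memory copies.
 * The switch jumps into the middle of the unrolled do-while,
 * handling the remainder (count % 8) on the first pass. */
static void duff_copy(short *to, const short *from, int count)
{
    int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++; /* fall through */
    case 7:      *to++ = *from++; /* fall through */
    case 6:      *to++ = *from++; /* fall through */
    case 5:      *to++ = *from++; /* fall through */
    case 4:      *to++ = *from++; /* fall through */
    case 3:      *to++ = *from++; /* fall through */
    case 2:      *to++ = *from++; /* fall through */
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}
```

In production code, just call memcpy(); this is for understanding the construct, not for beating the C library.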

A bonus with the idiom itself is that, unless you are using it for Duff's device or co-routines, it will increase the readability of your code. I'd say use the idiom as much as you can, but no more!

However, if you do use it, make sure there is a default case and that there is no implicit fall-through by accident (cut 'n' paste, typo, etcetera). Sometimes a fall-through might be justified; just remember to write a comment that it is intended!

switch(c) {
case 'a':
    ...
    break;
case 'b':
    ...
    break;
case 'c': /* fall through */
default:
    ...
    break;
}

The built-in linters in GCC and clang will recognize this comment and avoid emitting a warning (or an error when '-Werror' is used).

Summary and moving forward

The main ideas from the article can be summarized as follows:

  • Consider the endianness of the hardware architecture
  • Be careful when using dynamic memory allocations
  • A competent compiler, correctly used, is a great friend
  • Always keep reentrancy in mind
  • “Premature optimization is the root of all evil”

The examples in this article are more complex than the previous ones and require some more understanding of how different standard C libraries implement certain functions. Usually one takes it for granted that a portable application shouldn't need to care about this; however, this has been proven wrong over and over during my years of professional experience.

The next article will provide some examples on par with these and have a focus on multithreading.
