Choosing “Some C++” Over C

This article has been updated. Details are at the bottom.

Or, maybe I should have called this “C++ for Lovers of C,” as this is a call to maintain straightforward, easy-to-reason-about approaches while we solve complex problems. I want the ability to reason — without hand waving — about the code I write, and I find C falling short.

Traditional C++ Earned Its Bad Reputation

The problem with just picking C++ (in an unqualified way) is that most criticism of it has been legitimate, at least for traditional C++. Whether it was the obsession in the ’90s with object orientation and exceptions or the template errors that often took up an entire terminal window, there are rough edges to both the early C++ standards and how people used them. When someone’s experience is with ’90s C++, this comes to mind:

A massive template error from building a C++ project. Image Source

But, these rough edges (and intractable template errors) are avoidable, unlike the problems in C that get worse with modern event and library programming. Let’s look at how C++ can fix the weakest parts of C without implementing any classes, exceptions, or complex templates in your code.

Libraries and Event-Driven Architecture

I don’t need to make the case for using events in application design; software like nginx, Varnish, and curl has become an indispensable part of the modern web.

Let’s use curl, one of the most popular HTTP client libraries, as an example of how C’s limitations hurt event-driven designs. If a program wants to make a request with libcurl and use a callback function to handle the result (along with other data), the process is:

  1. Pack any necessary state for the callback function into a struct.
  2. Initialize a curl handle.
  3. Set the handle to use the callback function via a function pointer.
  4. Set the handle to send a pointer to the struct. The handle expects a void *, so the struct’s pointer is implicitly converted to void *, which discards its type information.
  5. Have the callback function receive the pointer and cast it back to the type of the struct.

If you’d like to see a full demonstration — including the same limitation in expat — look no further than an example I wrote that ships with curl. Here’s another example that doesn’t even involve a library.

In an effort to support generic callback functions with “user data,” type safety dies in the process. If I change what type the callback expects without changing the type of pointer I set on the curl handle, I get undefined behavior (and probably crashes).
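The five steps above can be sketched without libcurl. In this minimal sketch, lib_run and lib_callback_t are hypothetical stand-ins for a C library’s API (they are not curl functions), and request_state is an assumed name for the packed state. The code is written C-style but compiles as C++:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical C-style library API (a stand-in, not libcurl itself).
typedef void (*lib_callback_t)(const char *event_data, void *user_data);

void lib_run(lib_callback_t cb, void *user_data) {
    cb("hello", user_data);  // The library only ever sees a void *.
}

// Program code: state packed into a struct (step 1).
struct request_state {
    std::size_t bytes_seen;
};

void on_event(const char *event_data, void *user_data) {
    assert(user_data != nullptr);
    // Step 5: cast back. Nothing verifies that user_data really points
    // to a request_state; the compiler accepts any object pointer here.
    request_state *state = static_cast<request_state *>(user_data);
    state->bytes_seen += std::strlen(event_data);
}

std::size_t run_request() {
    request_state state = {0};
    lib_run(on_event, &state);  // Step 4: implicit conversion to void *.
    return state.bytes_seen;
}
```

If on_event were later changed to expect a different struct while run_request kept passing &state, this would still compile without a warning, which is exactly the hazard described above.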

C++ has better answers, all of which provide type safety. Here’s an example using lambdas to handle both user data and data from the library’s callback:

#include <iostream>
#include <string>
#include <vector>
#include <functional>
// Library code
using library_callback_t =
    std::function<void(const std::string& event_data)>;
void library_func_with_callback(library_callback_t cb) {
    std::string lib_data("hello");
    cb(lib_data);
}
// Program code
typedef std::vector<int> user_data_t;
void my_callback_impl(user_data_t& user, std::string from_lib) {
    user.push_back(from_lib.size());
    std::cout << user.size() << ',' << from_lib << '\n';
}
int main() {
    user_data_t user;
    // Create the lambda:
    auto my_callback = [&] // <- Use a ref for local vars, like "user"
        // The rest is similar to a normal function
        // but with no name and an "auto" return type:
        (const std::string& event_data) {
            my_callback_impl(user, event_data);
        };
    // my_callback now matches library_callback_t, which requires
    // a std::function with one parameter of type "const std::string&"
    // and a return type of "void".
    user.push_back(1);
    library_func_with_callback(my_callback);
    return 0;
}

The output (with the & in the lambda’s [] making user a reference):

2,hello

Of course, you could also just write the full callback handler into the lambda, but the implementation above allows a straightforward conversion from the traditional “function pointer and void * for user data” of C.
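For comparison, here is a sketch of the “full handler in the lambda” variant. It reuses the library shape from the example above; collect_sizes is a hypothetical helper name introduced only for illustration:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Same library shape as in the example above.
using library_callback_t =
    std::function<void(const std::string& event_data)>;

void library_func_with_callback(library_callback_t cb) {
    std::string lib_data("hello");
    cb(lib_data);
}

// Hypothetical helper: registers an inline handler and returns what it collected.
std::vector<int> collect_sizes() {
    std::vector<int> user;
    user.push_back(1);
    // The handler body lives in the lambda; "user" is captured by reference.
    library_func_with_callback([&user](const std::string& event_data) {
        user.push_back(static_cast<int>(event_data.size()));
    });
    assert(user.size() == 2);
    return user;
}
```

Capturing user by name (rather than a blanket [&]) makes it explicit which local state the callback touches.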

Scope and Cleanup

An illustrated broom inside curly brackets. Image Source

Many modern C applications already avoid memory leaks by using non-standard variable attributes like cleanup. They’re so awkward to directly use that programs resort to deep stacks of macros to automatically free memory when variables go out of scope.

Even then:

  • Forget about handling nested allocations (like containers) generically; the limitations of macros and C types mean there has to be a macro for each permutation of container and content, like _cleanup_linked_list_of_strings_.
  • The macro has to be employed every time a pointer gets instantiated:
    _cleanup_free_ char *e = NULL;
  • There’s no guarantee that the pointer is allocated (or even set to NULL) when it runs cleanup. Cleanup can check for NULL, but what if it’s not initialized at all? It’s easier to solve initialization and cleanup together.
  • It only works with Clang and GCC. If you want your code to work with other compilers, you have to switch between manual and automatic cleanup.

C++, of course, has destructors, which address all of the concerns above with no macros, no boilerplate, and standardization among compilers.
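As a rough illustration, the “linked list of strings” case needs no per-type macro. In this sketch, tracked is a hypothetical type that exists only to make destruction observable by counting live instances; ordinary code would simply store std::string:

```cpp
#include <cassert>
#include <list>
#include <string>
#include <utility>

// Hypothetical type that counts live instances so cleanup is observable.
struct tracked {
    static int live;
    std::string name;
    explicit tracked(std::string n) : name(std::move(n)) { ++live; }
    tracked(const tracked& other) : name(other.name) { ++live; }
    ~tracked() { --live; }
};
int tracked::live = 0;

int live_inside_scope() {
    std::list<tracked> items;     // Nested allocations: nodes plus strings.
    items.emplace_back("first");
    items.emplace_back("second");
    assert(tracked::live == 2);   // Both elements are alive here.
    return tracked::live;
}   // items leaves scope: every node and every string is freed.
```

After live_inside_scope() returns, tracked::live is back to zero without any cleanup call at the use site.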

Allocation Ownership and References

Destructors and cleanup only handle specific, simpler cases. Things get much muddier with libraries, where both the program and library may touch the same data, facing these challenges:

  1. Moving ownership (and freeing) of data from the caller to the library.
  2. Moving ownership from the library to the caller, often requiring the caller to use a bespoke function to free it.
  3. Distributing ownership between parts of the library and caller, which requires an implementation of explicit reference counting that callers also have to participate in.
  4. None of the above, as when data is passed by reference.

All of these challenges exist in large, modular applications as well. The solution in C usually ends up being a combination of:

Reasoning About Hand-off

Two relay participants hand off a baton. Image Credit

C++ used to be nearly as bad until C++11 landed and added move semantics (which C++14 improved on further). For these examples, I’ll use std::string, but any type will work (assuming it allows the necessary move and copy operations, which any simple class or struct will).

You’ll need these includes for the examples below:

#include <iostream>
#include <memory>
#include <string>   // std::string
#include <utility>  // std::move

1. Hand-off from Caller to Function

In modern C++, a function can explicitly take over ownership:

void library_func(std::string give_it_to_me) {
    std::cout << give_it_to_me << '\n';
}

A caller can then, without “copying,” hand off its object. (Internally, this rips out the guts of my_giveaway and transplants them to a freshly constructed give_it_to_me. The my_giveaway object remains a valid std::string, but its contents are unspecified, so it’s useless for most purposes until it’s assigned a new value.)

int main() {
    std::string my_giveaway("hello");
    library_func(std::move(my_giveaway)); // Hands off my_giveaway
    return 0;
}

At a lower (pointer) level, this can also happen with std::unique_ptr:

void library_func_p(std::unique_ptr<std::string> give_it_to_me) {
    std::cout << *give_it_to_me << '\n';
}

I can now allocate data behind that (smart) pointer and hand it off:

int main() {
    std::unique_ptr<std::string> my_giveaway
        = std::make_unique<std::string>("hello");
    library_func_p(std::move(my_giveaway));
    return 0;
}

Or, because we can rely on std::make_unique to return the type we want, it can be less messy:

int main() {
    auto my_giveaway = std::make_unique<std::string>("hello");
    library_func_p(std::move(my_giveaway));
    return 0;
}
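What the caller is left with after the hand-off can be checked directly. This is a small sketch in which consume is a hypothetical sink that mirrors library_func_p but returns the string’s size instead of printing it:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sink, mirroring library_func_p: takes ownership, reports the size.
std::size_t consume(std::unique_ptr<std::string> give_it_to_me) {
    return give_it_to_me->size();
}   // The string is freed here, when give_it_to_me is destroyed.

bool caller_is_empty_after_handoff() {
    auto my_giveaway = std::make_unique<std::string>("hello");
    std::size_t n = consume(std::move(my_giveaway));
    assert(n == 5);
    // The standard guarantees that a moved-from std::unique_ptr is null.
    return my_giveaway == nullptr;
}
```

Calling consume(my_giveaway) without std::move doesn’t compile, because std::unique_ptr can’t be copied; the hand-off has to be spelled out at the call site.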

2. Hand-off from Function to Caller

So, you want your caller to take over responsibility for something you’ve allocated? No problem (and an opportunity to show C++14 support for deduced auto return types):

auto library_func_ret() {
    return std::make_unique<std::string>("hello");
}

The caller looks like this:

int main() {
    auto my_value = library_func_ret();
    std::cout << *my_value << '\n';
    return 0;
}

Not only do we avoid copying the string data, but C++ copy elision lets the compiler construct the returned std::unique_ptr<std::string> directly in main()’s my_value rather than in a temporary inside library_func_ret() (since C++17, this elision is guaranteed when returning a temporary like this). This is super efficient, easy to read, and has a compiler-enforced lifetime for the allocation.
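The “bespoke function to free it” case from the list of challenges can be handled the same way by giving std::unique_ptr a custom deleter. In this sketch, lib_make_greeting and lib_free_greeting are hypothetical stand-ins for a C library’s allocate/free pair, and lib_free_calls exists only to make the cleanup observable:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <memory>

// Hypothetical C-style library pair: allocate with one function, free with another.
char *lib_make_greeting() {
    char *p = static_cast<char *>(std::malloc(6));
    if (p != nullptr) std::memcpy(p, "hello", 6);
    return p;
}
int lib_free_calls = 0;  // Counts frees so the effect is observable.
void lib_free_greeting(char *p) {
    ++lib_free_calls;
    std::free(p);
}

// The deleter type records which function releases the allocation.
struct greeting_deleter {
    void operator()(char *p) const { lib_free_greeting(p); }
};
using greeting_ptr = std::unique_ptr<char, greeting_deleter>;

greeting_ptr make_greeting() {
    return greeting_ptr(lib_make_greeting());
}

std::size_t greeting_length() {
    greeting_ptr g = make_greeting();
    assert(g != nullptr);  // This sketch does not handle allocation failure.
    return std::strlen(g.get());
}   // lib_free_greeting runs here automatically.
```

The caller never has to remember which function frees the data; that knowledge travels with the pointer’s type.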

3. Reference Counting

Sometimes, ownership can’t be as simple as a baton passing from function to, say, a library. You can’t entirely avoid the problems of reference counting (or, if you prefer, garbage collection), but it’s no good for every library and application to invent its own solution, often ones that are manual and clunky.

Modern C++ ties reference counting into copy and destructor operations, allowing data to flow between libraries and programs without constant reinvention and re-implementation.

A library can allocate/instantiate some data, track it internally, and return it to a caller:

auto library_func_ret2() {
    auto retval = std::make_shared<std::string>("hello");
    track_instances(retval); // Adds to a container the library uses.
    return retval;
}

If we continue to use auto, the caller doesn’t change:

int main() {
    auto my_value = library_func_ret2();
    std::cout << *my_value << '\n';
    return 0;
}

The data will get freed once there are no more references. To avoid circular references, there’s std::weak_ptr, which can observe an object owned by a std::shared_ptr without adding to its reference count.
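Here is one way track_instances might look if the library only needs to observe its instances. This is an assumption for the sake of the sketch: registry is a hypothetical container, and a library that must keep instances alive would store std::shared_ptr instead:

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical library-side registry; it observes instances without owning them.
std::vector<std::weak_ptr<std::string>> registry;

void track_instances(const std::shared_ptr<std::string>& p) {
    registry.push_back(p);  // Converting to weak_ptr leaves the owner count unchanged.
}

std::shared_ptr<std::string> library_func_ret2() {
    auto retval = std::make_shared<std::string>("hello");
    track_instances(retval);
    return retval;
}

bool freed_after_last_owner() {
    auto my_value = library_func_ret2();
    assert(my_value.use_count() == 1);   // The registry is not an owner.
    assert(!registry.back().expired());  // The string is still alive.
    my_value.reset();                    // Last owner gone: the string is freed.
    return registry.back().expired();
}
```

To use a tracked instance, the library would call lock() on the std::weak_ptr, which returns an empty std::shared_ptr if the data is already gone.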

4. No Actual Hand-off

This isn’t a major challenge in C, but C++ references express the intent more directly. When a reference is appropriate, the goal is aliased access to the same data (however the compiler finds most efficient), not forced indirection through a pointer (as in C). Sure, a C compiler can look at a pointer being passed, determine that it’s only dereferenced, and optimize, but isn’t that kind of silly? It’s also easy to stumble into undefined behavior when using C-style const pointers, let alone the constant checking for NULL values.
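A short sketch of the difference, using hypothetical function names: the reference version has no null case to handle, while the pointer version has to decide what a null argument means.

```cpp
#include <cassert>
#include <string>

// A reference parameter always refers to an object and needs no dereference syntax.
void append_suffix(std::string& target, const std::string& suffix) {
    target += suffix;  // Operates directly on the caller's object.
}

// The pointer version must decide what a null argument means.
bool append_suffix_p(std::string *target, const std::string *suffix) {
    if (target == nullptr || suffix == nullptr) return false;
    *target += *suffix;
    return true;
}

std::string demo_references() {
    std::string s("hello");
    append_suffix(s, ", world");            // A temporary binds to the const reference.
    assert(!append_suffix_p(&s, nullptr));  // The null case only exists for pointers.
    return s;
}
```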

A Precedented Change

Adopting a subset of C++ to smooth out C’s rough edges happened for GCC in 2013. A core member of PostgreSQL has asked if the same would be good for their own work. Other projects explicitly use C++ but establish a restricted subset.

In any case, there aren’t advantages to using “plain old C” versus C with some C++ features, especially if the code uses GCC/Clang extensions that shed any portability benefits of C90 or C99.

Updates

  • Update 2017-04-03: Use a lambda (instead of std::bind) for the callback example.
  • Update 2017-04-02: Stylistic improvements using comments from Reddit.