Higher-Order Functions in C

Viktor Daróczi
Nerd For Tech
Published in
12 min readApr 17, 2021

The concept of higher-order functions is key in Functional Programming, but like recursive functions, they are also used to implement certain algorithms, programming patterns, or even writing kernel drivers. Although C is not the language that supports many FP concepts out of the box, higher-order functions can be implemented using function pointers and also with compiler support. In the present article, I attempt to illustrate this with easy to follow examples.

What kind of animal is a higher-order function?

A higher-order function is a function that accepts other functions as arguments or returns a function itself. Since C relies heavily on use of pointers, it also allows such pointers to refer to runnable code chunks like functions. In fact functions themselves are pointers to the code they encompass, so assigning a function to a function pointer is totally valid as we will see. Then such pointers can be passed as arguments to functions, or be returned by functions like any other variables. This is how higher-order functions are implemented in C.

Pointing functions

A function is declared according to the following:

void fun_name(void);
^ ^ ^
| | `----- parameter type
| `-------------- function name
`------------------- return type

This line declares a function called fun_name that takes no parameter, and returns nothing. If it took int and returned double, it would be declared like this: double fun_name(int param); where param is the integer parameter’s name (not mandatory to provide in declarations). Easypeasy.

Now we want a pointer to such a function, because we want to pass it as an argument to other functions, or return them from higher-order functions. Here’s a pointer to our latest function:

double (*fun_ptr)(int param);
^ ^^ ^ ^
| || | `------ parameter name
| || `---------- parameter type
| |`------------------- pointer name
| `-------------------- pointer marker
`---------------------------- return type

That means the declaration of a function pointer is almost the same as that of the function it points to. The only difference is that the pointer name is now parenthesized to avoid confusion with the return type (which could be a pointer itself). It also means that we need to know the type of the pointed function exactly. This provides some type safety.

The function jugglenaut

The previous section showed the type of a function pointer, now let’s see how we add that into the function signature.

void use_fptr(double (*fun_ptr)(int param));
^ ^ ^ ^ ^ ^
| | | | | `------- pointed fn param
| | | | `----------- pointed fn param type
| | | `-------------------- pointer parameter name
| | `----------------------------- pointed fn return type
| `-------------------------------------- function name
`------------------------------------------- function return type

Imagine our parameter list contains two function pointers, then they may look something like this:

void use_fptr(void (*fptr1)(void), void (*fptr2)(void));

Now the tricky part: returning a function pointer. To make things clearer, I assume our function will not take any params, but returns a pointer to a function that takes an int, and returns a double. It can be declared like this:

double (* fn_returns_fptr(void))(int param);
^ ^ ^ ^ ^ ^
| | | | | `------ returned fptr param
| | | | `---------- ret. fptr param type
| | | `----------------- function param type
| | `--------------------------------- function name
| `----------------------------------- ret. pointer marker
`------------------------------------------- ret. fptr return type

In the above examples, all the parentheses are mandatory, otherwise it cannot be told what are the asterisks referring to. So we’re not simply putting the return type to the left of the function name, but because of the return type is a pointer to a function, we put the pointed functions return type to the left, and parameter list to the right, embedding our function into its return type, like an onion. This is already enough to confuse anybody just learning function pointers, but this can be complicated even further. Imagine a function that takes more than one function pointers as arguments, and returns a function pointer, which takes two function pointers as arguments, and returns a function pointer… ad infinitum. A shorter example might suffice:

double (* fun(double (*p1)(int), double (*p2)(int)))(int);

A function called fun that takes two function pointers, each taking an int and returning a double, while itself returning one of those function pointers. Not that fun actually. Imagine it returned a function pointer which takes a function pointer. This is what James Hetfield calls ‘absolute horror’.

Luckily C provides means to simplify this. Type aliases come and save the day.

Distilled function passing

Since functions and pointers are just types, we can also define aliases for them, and later refer to them using that name, instead of the clunky types. In fact the function pointer syntax gets unbearably complicated so easily, that I wouldn’t be surprised having already made mistakes while presenting them. So get rid of that risk, and use type aliases whenever possible. We can create a type alias for our int taking double returning function pointer like this:

typedef double (*int_to_double)(int);
^ ^ ^^ ^
| | || `---- pointed function param type
| | |`------------------- type alias
| | `-------------------- pointer marker
| `---------------------------- pointed function return type
`------------------------------------ type aliasing keyword

This is not a new type, it’s an alias int_to_double to an existing type, which is a function pointer. Then we can use this in parameter lists and as return types, so that James Hetfield doesn’t need to sleep with one eye open. The same terrifying function declaration can be simplified like this:

int_to_double fun(int_to_double p1, int_to_double p2);

Much better. Then we can create more meaningful typedefs like (style intentionally changed to what may be familiar from other modern languages, but that doesn’t make a difference):

typedef int (*IntBiFunction)(int, int);
typedef void (*IntConsumer)(int);
typedef void (*Callback)(void);
typedef bool (*IntPredicate)(int);
typedef int (*IntSupplier)(void);

Passing them around is a breeze:

int op(int a, int b, IntBiFunction fun) {
return fun(a, b);
}
void iter(IntConsumer fun, int x) {
for (unsigned int i = 0; i < x; ++i) {
fun(i);
}
}
callback writeln(char *str) {
return lambda(void, (void), { printf("%s\n", str); });
}

Wait! Lambda? Yepp.

What we can do

We already can do a lot of stuff with our functions and function pointers.

Passing them as arguments:

int add(int a, int b);
int sub(int a, int b);
int result = op(2, 1, add); // => 3
result = op(2, 1, sub); // => 1

Assigning them to (pointer) variables:

IntBiFunction addptr = add;
IntBiFunction subptr = sub;
int result = op(2, 1, addptr); // => 3
result = op(2, 1, subptr); // => 1

Returning them from functions:

IntBiFunction op(char c) {
if (c == '+') {
return add;
} else {
return sub;
}
}

Putting them into arrays:

typedef void (*Action)(void);void pull(void);
void push(void);
void idle(void);
Action actions[] = { pull, push, idle };

Comparing them:

if (fptr == push) {

Etc.

There’s one thing we can’t do though, define them anonymously.

What we also can do

So while we still can’t define functions anonymously, compilers already went forward, and are capable of doing some magic which brings us closer to anonymous and lambda functions. By the way, it’s about time to say a few words about the terminology:

Anonymous functions are functions that are defined without a name, often inline, and therefore sometimes called ‘inline functions’ too, however that’s not exactly the point, and may cause confusion with C’s native concept of inline functions, which is a different kind of animal. The other name for anonymous functions is lambda functions, which comes from the math theory of lambda calculus, where functions have no name.Another term, frequently occurs together with the previous ones, is closure. These are anonymous functions that also bind the scope, so that they capture the variables from their environment in the function body to be used even when it’s called from outside of this scope.The support for assigning functions to variables, passing them as arguments, returning them from functions, and putting them in data structures, especially using lambdas and closures, is called having first-class functions.

Why do we care? Because such functions are more cost efficient being easier to be created — in fact, they can be defined on the fly — and passed to higher-order functions as arguments, which enables writing some algorithms and patterns, while also facilitates parallel computing. Having these concepts at hand, also helps us to write code in the declarative style, stating what we’re trying to achieve instead of giving instructions on how to achieve that. Without higher-order functions, implementing some logic would be more complicated than it’s necessary.

So how it can be done?

GCC’s approach

Using the GNU C compiler, we can actually define functions on the fly when we’re about to call them. But before I show you how, let me tell you about the features it requires. These are non-standard features, that GCC provides:

  1. Statement expressions

The simplest statement expression is something like this:

({ char *s = "Hello"; s; })

Where the brackets are the container for the expression, and the last expression (in our case only the variable s) provides the value that the statement expression evaluates to. The point is, that within the statement expression we can do more than just initializing a variable. Consider the following:

({ char *s; s = calloc(20, sizeof(char)); if (argc > 1) { strcpy(s, "We take no arguments"); } else { strcpy(s, "Welcome to GCC"); } s; })

This is the same expression, but the value it evaluates to, changes dynamically. And as any expression, it can be assigned to variables, or passed as function arguments.

2. Nested functions

“A nested function is a function defined inside another function.” — stated in the GNU compiler docs. We don’t need to know much more than that.

The real goodness comes when we combine these two features:

({ void prn(int x) { printf("x = %d\n", x); } &prn; })
^ ^ ^
| | `------- return value
`----------------------------------------`--------- nested func.

Now the value our statement expression is a pointer to a function we’ve just created nested.

C has more to offer, than it might seem first. Luckily, C supports macros, small replacement rules for the preprocessor, that — among other things — can transform the syntax, we like more, to the one that the compiler can understand. This permits us to write our own implementation of anonymous functions.

It’s not too difficult to see the pattern in the above expression, that we can trust on a macro, to create a statement expression, that returns a pointer to a function that has no name. Or it can call it whatever it likes, we don’t care. That’s our way to get anonymous functions. The original author did a pretty good job on writing that macro, and presenting its usage in this article, yet he received enough angry comments to fill a whole chapter on lambdas, so that I feel the need to clarify a few things here, but I still owe you to quote the macro and the usage here:

#define lambda(lambda$_ret, lambda$_args, lambda$_body)\
({\
lambda$_ret lambda$__anon$ lambda$_args\
lambda$_body\
&lambda$__anon$;
\})

The macro has three parameters, a return type, parameter list (args), and the body of the function, then those are placed into a statement expression, where a pointer to the newly created function will be the value of the expression. We can use this macro as follows:

iter(lambda(void, (int x), { printf("%d\n", x); }), 5);

That means our lambda function has only one parameter x of type int, and returns nothing. Of course the type of our anonymous function needs to match the type of the parameter we pass as. This provides some type safety.

Now the hard part: defending the concept. A few sidenotes on the margin of the comments posted below the original article:

  • If you don’t like lambdas, that’s OK — some don’t like macros either.
  • Not liking something doesn’t make it invalid.
  • If you’ve never used lambdas so far, it’s never too late to start, but…
  • Nobody forces you to use them.
  • Is it possible to use them wrong? Definitely.
  • Should we conceal this technique because it can be used wrong? Definitely not. Even malicious code is better to be exposed.
  • There’s a reason why statement expressions and nested functions are implemented in GCC — use them if you find them useful.
  • Nobody turns into a better developer by not knowing something.

With that in mind, we can proceed to our next section…

Clang’s approach

GNU is not left alone exposed to rants from angry developers because of implementing compiler features that can be used like lambda functions. In fact, Clang goes even further, and provides anonymous functions, that are also capable to capture the context that is actually used in the body, so they can be considered closures!

Let’s see that in action.

typedef void (^Callback)(void);Callback writeln(char *str);
void call(Callback callback);
// ...
Callback my_callback = ^ { printf("I was called!"); };
int y = 5;
iter(^ (int x) { printf("%d + %d = %d\n", y, x, y + x); }, 5);

As we can see the syntax is much cleaner than that of GCC. The key is ^ (=λ, lambda), which tells us we’re dealing with an anonymous function. The return type can be omitted if it’s void, and when it’s omitted, the parameter list also can be omitted if it’s empty — (void). So the syntax is essentially the same as of function pointers, the asterisk replaced with a caret, but we’re not restricted anymore to declare our function first, it can can be defined on the fly. In the last line, we can see how the local variable of y is captured in the function body, and becomes available the time our closure is called from within the iter function, even if it was not in its scope, which is pretty decent.

The feature is called Blocks in Clang, and Apple advocated it. Y tho? They call it ‘a better way to do multicore’. That means they saw the potential in combining the highly demanded features of Functional Programming, that facilitate parallel computation, and the speed and hardware closeness of C.

However, using Blocks outside of macOS is not trivial. BlocksRuntime is usually not part of the Clang (LLVM) distribution, and needs tweaking to enable. Even then it may come with bloated features. For this reason, there exists a debloated clone of the library, which can be downloaded and built separately.

Pushing the limits

These are cutting edge features, which push the limits of the C programming language. At the same time, it also means that their usability is limited to the extent the compiler is supported. That undermines portability, and should not be used in cases where portability is key. Using GCC’s approach requires to turn on level 2 optimizations and enabling extensions for the C99 standard. Compile like:

gcc -O2 -o fun.exe --std=gnu99 fun.c

Clang also requires a language extension, which is not trivial to acquire. We’re most likely need to compile it to ourselves, and then we can compile our programs like this:

clang -fblocks -o fun-clang.exe fun-clang.c -lBlocksRuntime

If we manage to make our program to compile, it still can crash if we tried to use local variables in GCC, or if we carelessly casted function pointers to void* and back, which can easily make development a pain.

Nevertheless, even when the usability of such techniques is limited, it’s definitely worth keeping an eye on them, for they are being used for certain purposes, and expected to be used even more.

Higher-order functions on the other hand are nothing new, and already can be found in the standard library (bsearch, qsort), in the Linux kernel where driver interfaces are defined using a table of first-class functions, and in popular libraries like GTK as support for callbacks. They solve certain issues ruling complexity without being complicated, help in parallel computing challenges, and are definitely worth learning them regardless of in which programming language do we actually use them.

Although C is not famous for supporting these concepts, and while there are more applicable languages out there, which provide better support for first-class and higher-order functions, we still can use them in C as well, if we find it reasonable, and learned the tools that are required to do so.

Conclusion

C is an extremely powerful language, and without much exaggeration, it’s pretty safe to say, that C runs the world. It’s everywhere, but so are programming paradigms. Hardly any serious programming language can successfully restrict itself to one paradigm. We can learn more paradigms and learn about the things in which they excel. If we find valid use cases for concepts from other paradigms, we can benefit from using them regardless of how the language is used most often. Luckily, C provides support for the most important features regarding higher-order functions and first-class functions, which even can be extended with the help of the great compilers available to us.

Nevertheless, we should keep in mind the pros and cons of applying these concepts in C, and that there are other options as well. C interoperability is key for many programming languages, that expands the range of our options even further.

For comparison, we can find out-of-the-box support for these concepts (and more) in C++, C#, and Java among others. For hardcore functional programming in C, we can embed Lisp into C. And if we’re willing to leave C behind, there are languages designed from scratch around these concepts, like OCaml and Haskell.

No matter where we put our focus on, we still can benefit from foreign concepts, and use them where they fit most. Getting to know more just makes us a better developer.

--

--

Viktor Daróczi
Nerd For Tech

A software engineer focusing on the fun part of computing.