Advanced C binding using ocaml-ctypes and dune

Romain Beauxis
Dec 1, 2019


I was working on an OCaml binding for libsrt last summer, to add support for SRT real-time input and output to liquidsoap, and came across the need to access the sys/socket.h C API.

I had already decided to use the very elegant ocaml-ctypes module for the SRT binding, so I went with it and created an ocaml-sys-socket module using it as well. It was a very interesting experience that I would like to describe here!

ocaml-ctypes

The idea behind OCaml ctypes is to create a binding against a C library without having to write C code, or as little as possible. The most straightforward way of using it is via libffi, which provides access to dynamically-loaded libraries.
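For comparison, the libffi mode needs no generated code at all. Here is a minimal sketch, assuming the ctypes and ctypes-foreign packages are installed, that binds the C library's strlen at run time. strlen is only an illustrative choice and is not part of ocaml-sys-socket:

```ocaml
open Ctypes
open Foreign

(* Resolve strlen from the already-loaded C library and call it via libffi.
   Ctypes.string converts the OCaml string to a NUL-terminated char*. *)
let strlen = foreign "strlen" (string @-> returning size_t)

let () =
  (* size_t comes back as an unsigned integer; convert it for printing. *)
  Printf.printf "%d\n" (Unsigned.Size_t.to_int (strlen "liquidsoap"))
```

Running it prints 10. The price of this convenience is that everything, including struct layouts, is resolved by ctypes at run time, without the C compiler ever seeing the library's headers.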

The second way of using it is to let the module generate the basic C stubs required to build and link against a shared library. This is the mode we’re going to use here. In this mode, the programmer has to describe the C headers of the library they intend to bind to using dedicated OCaml modules, operators and types. From that description, ocaml-ctypes is able to generate the required glue for the binding.

One advantage of using ocaml-ctypes is that the resulting bindings make as few assumptions as possible about the OCaml C interface API. This is pretty nice, particularly since the OCaml compiler is evolving quickly these days (which is awesome!), and it would also help if multicore support is added to the compiler one day, which would undoubtedly change the C interface API quite a bit.

dune

dune (formerly jbuilder) is a build system for OCaml projects that has recently risen to much popularity, particularly due to its tight integration with the rest of the OCaml ecosystem, such as ocamlfind and opam.

My personal motto in programming in general is that “Simple things should be simple, but complex things should be possible”. dune certainly does not fit into that category but, rather, makes some complex things extremely easy to set up. It’s the kind of tool that will make your life incredibly easier when what you intend to do fits well within its workflow, but it might not be easy to bend to some very specific niche uses. We will see one such case below.

At any rate, it’s been an amazing experience learning how to use dune, and the resulting code and build system are remarkably short and elegant, yet very powerful.

socket.h

socket.h is the Unix header that describes the C API for various socket operations: IPv4 and IPv6 as well as Unix domain sockets. Windows also provides an API mimicking it, which makes most code using it easily portable to Windows.

Most network-based C libraries refer to socket.h to describe the type of socket that can be used with their API so it’s an important entry point for a lot of network operations and one that would be nice to support as generically as possible in OCaml.

The catch, though, is that, most likely for historical reasons¹, the POSIX specification only partially defines some of the required data structures and types. This makes it possible to write C code using them, but does not give enough information to write C bindings without using the compiler to parse the actual system-specific headers of the running host.

For instance, here’s how the sockaddr structure is specified:

The <sys/socket.h> header defines the sockaddr structure that includes at least the following members:

    sa_family_t  sa_family  address family
    char         sa_data[]  socket address (variable-length data)

Likewise, here’s what is specified about the size of the socklen_t data type:

<sys/socket.h> makes available a type, socklen_t, which is an unsigned opaque integral type of length of at least 32 bits.

Thus, in order to know the exact offset of sa_family inside the sockaddr structure or the actual size of a socklen_t integer, one has to include the OS-specific header, parse its definitions for that specific OS and, only then, is it possible to compute that offset or data size. Let’s see how it’s done in our binding now!

Putting it together

The C binding requires four separate passes:

  • The constants pass, which computes and exports specific constants and data sizes from the C headers.
  • The types pass, which, given the system-specific constants and sizes exported in the previous phase, defines the actual C data structure bindings.
  • The stubs pass, where we define the actual bindings to the C functions that we wish to export in our API.
  • Finally, the last pass cleans up the output of the stubs pass to export a relevant public API, specific to OCaml (and ocaml-ctypes), meant for the users of the module.

dune makes each of these steps fairly easy to integrate into the next one, defining compilation elements and binaries to build before moving to the next pass.

Constants pass

During that pass, we compute and export all required C values defined in the headers. We also add our own constants, which give us the sizes that the POSIX specifications leave up to the OS. Here’s the OCaml code for it:

Pretty straightforward! Some of these constants are defined by the POSIX headers and some are custom-defined for our needs, for instance SOCKLEN_T_LEN. Here’s how they are extracted, using the dune build configuration for gen_constants_c:
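To give an idea of the shape of such a pass, here is a minimal sketch of a constants description and its generator, assuming ocaml-ctypes' stub-generation library (ctypes.stubs). The module name Def mirrors the text, but its contents, the SA_FAMILY_LEN macro and the generate_c function are illustrative assumptions rather than the actual ocaml-sys-socket code. The custom constants are plain C macros added to the emitted prelude, so the C compiler evaluates them against the target's headers:

```ocaml
(* Hypothetical constants description: each [constant] names a C expression
   whose value is only known once compiled against the target's headers. *)
module Def (F : Cstubs.Types.TYPE) = struct
  open F

  (* Values defined by the POSIX headers themselves. *)
  let af_inet = constant "AF_INET" int
  let af_inet6 = constant "AF_INET6" int
  let af_unix = constant "AF_UNIX" int
  let sock_stream = constant "SOCK_STREAM" int

  (* Custom values, defined as macros in the C prelude below. *)
  let socklen_t_len = constant "SOCKLEN_T_LEN" int
  let sa_family_len = constant "SA_FAMILY_LEN" int
end

(* C prelude: the headers to parse plus our own size macros, cast to int
   so that they match the [int] type used in [Def]. *)
let c_prelude =
  [ "#include <sys/socket.h>";
    "#define SOCKLEN_T_LEN ((int) sizeof(socklen_t))";
    "#define SA_FAMILY_LEN ((int) sizeof(sa_family_t))" ]

(* Emit a C program which, once compiled and run, prints an OCaml module
   containing the actual values of the constants. *)
let generate_c (fmt : Format.formatter) : unit =
  List.iter (fun line -> Format.fprintf fmt "%s@\n" line) c_prelude;
  Cstubs_structs.write_c fmt (module Def);
  Format.pp_print_flush fmt ()

let () = generate_c Format.std_formatter
```

The printed C program is what gets compiled and executed against the target's headers; its output is the OCaml module holding the actual values.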

This OCaml code makes use of ocaml-ctypes to build a binary that exports the OCaml interface defined by Sys_socket_constants.Def. Once compiled, its output looks like this:

The files used to describe how to build this binary using dune are located in a separate generator directory. Here’s the entry to build this one:

This executable is compiled during the next phase. Let’s move on to it now!

Types pass

During that phase, we use the constants exported during the previous phase to describe the various C structures and types. This is by far the most complex part of the code, making use of first-class modules and several OCaml tricks.

First, let’s look at how we tell dune that we need to generate the .ml file exporting our required constants from the previous pass:

With only this information, if the code refers to a Sys_socket_generated_constants module, dune knows that this module needs to be generated and how to do it. We will explain the use of the exec.sh wrapper later.

Now that we can make use of the exported constants in our OCaml code, let’s see how we define the Socklen module, exporting abstract types and interface to use socklen_t integers:
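A minimal sketch of that idea, assuming only the core ctypes library: two candidate implementations of socklen_t are packaged as first-class modules, and the one matching the size reported by the constants pass is selected. The SOCKLEN signature, socklen_of_size and the hard-coded socklen_t_len are hypothetical names standing in for the generated constants:

```ocaml
open Ctypes

(* Common interface for every possible socklen_t representation. *)
module type SOCKLEN = sig
  type t
  val t : t typ
  val of_int : int -> t
  val to_int : t -> int
end

module Socklen32 : SOCKLEN = struct
  type t = Unsigned.UInt32.t
  let t = typedef uint32_t "socklen_t"
  let of_int = Unsigned.UInt32.of_int
  let to_int = Unsigned.UInt32.to_int
end

module Socklen64 : SOCKLEN = struct
  type t = Unsigned.UInt64.t
  let t = typedef uint64_t "socklen_t"
  let of_int = Unsigned.UInt64.of_int
  let to_int = Unsigned.UInt64.to_int
end

(* Pick the implementation matching the size, in bytes, of socklen_t. *)
let socklen_of_size : int -> (module SOCKLEN) = function
  | 4 -> (module Socklen32)
  | 8 -> (module Socklen64)
  | n -> failwith (Printf.sprintf "unsupported socklen_t size: %d bytes" n)

(* Hypothetical stand-in for the value exported by the constants pass. *)
let socklen_t_len = 4

module Socklen = (val socklen_of_size socklen_t_len : SOCKLEN)
```

Since Socklen.t is abstract, the rest of the binding only ever goes through of_int and to_int, whichever width the host uses.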

As you can see, we make use of first-class modules and the size of the socklen_t integer to define the right API for the compiling host. Now let’s see how we define the sockaddr interface:

Here, too, we make use of the size of sa_family as exported previously to define the right structure fields.
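A simplified sketch of the same trick for sockaddr, assuming the core ctypes library: the C type of the sa_family field is chosen from a hypothetical, hard-coded sa_family_len value and exposed to OCaml as a plain int through a view. Note that plain Ctypes computes this layout itself, so the sketch is only accurate where the declared fields are the real ones, as on Linux; BSD-derived systems, for instance, place an extra sa_len byte before sa_family. That is precisely why the binding hands its description to the ocaml-ctypes types generator instead, letting the C compiler supply the real offsets:

```ocaml
open Ctypes

(* Hypothetical stand-in for the generated constant: sizeof(sa_family_t)
   is 2 on Linux and 1 on BSD-derived systems. *)
let sa_family_len = 2

(* Expose sa_family as an OCaml int, whatever its C width. *)
let sa_family_t : int typ =
  match sa_family_len with
  | 1 -> view ~read:Unsigned.UInt8.to_int ~write:Unsigned.UInt8.of_int uint8_t
  | 2 -> view ~read:Unsigned.UInt16.to_int ~write:Unsigned.UInt16.of_int uint16_t
  | n -> failwith (Printf.sprintf "unsupported sa_family_t size: %d bytes" n)

(* struct sockaddr with the historical 14-byte sa_data. *)
type sockaddr
let sockaddr : sockaddr structure typ = structure "sockaddr"
let sa_family = field sockaddr "sa_family" sa_family_t
let sa_data = field sockaddr "sa_data" (array 14 char)
let () = seal sockaddr
```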

As the next step, we need to compile this interface again to export the right offsets for the various structures that have been defined. That’s dune’s job again!

First, the generator code:

And the build instructions:

Once compiled, the exported .ml looks like this:

As you can see, this exports all the offsets required to access the fields inside a sockaddr_t structure. We’re now ready to move to the final stage, which is the actual binding stubs!

Binding stubs

As the first step in this pass, just like with the previous ones, we need to configure dune to build the exported .ml code from the types pass:

And we can now define the proper bindings. Here’s what it looks like:

As you can see, we’re exporting the getnameinfo function, which takes various arguments, including a pointer to a sockaddr_t structure and a couple of socklen_t integers, making use of the data types and structures previously defined. The exact specification of this function is documented in the getnameinfo(3) man page. We can now define our top-level API.
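To make this concrete, here is a minimal sketch of such a description, assuming ocaml-ctypes' Ctypes.FOREIGN module type. In stub-generation mode, a functor of this shape is what gets passed to Cstubs.write_c and Cstubs.write_ml. For brevity, the sketch declares sockaddr as an opaque structure and assumes a 32-bit socklen_t instead of using the host-specific types from the previous pass. The Recorder module is a hypothetical implementation of the signature that only records the bound names, which lets us check the description without compiling any C:

```ocaml
open Ctypes

(* Simplified stand-ins for the types produced by the types pass. *)
type sockaddr
let sockaddr : sockaddr structure typ = structure "sockaddr"
let socklen_t = typedef uint32_t "socklen_t"

(* int getnameinfo(const struct sockaddr *sa, socklen_t salen,
                   char *host, socklen_t hostlen,
                   char *serv, socklen_t servlen, int flags); *)
module Bindings (F : Ctypes.FOREIGN) = struct
  open F

  let getnameinfo =
    foreign "getnameinfo"
      (ptr sockaddr @-> socklen_t @-> ptr char @-> socklen_t
       @-> ptr char @-> socklen_t @-> int @-> returning int)
end

(* Hypothetical FOREIGN implementation that records bound names only. *)
let bound_names = ref []

module Recorder = struct
  type 'a fn = 'a Ctypes.fn
  type 'a return = 'a
  let ( @-> ) = Ctypes.( @-> )
  let returning = Ctypes.returning
  type 'a result = unit
  let foreign name (_ : _ fn) = bound_names := name :: !bound_names
  let foreign_value name (_ : _ typ) = bound_names := name :: !bound_names
end

module Described = Bindings (Recorder)
```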

Final API

Building upon the previous modules, we export various idiomatic OCaml APIs that users of the binding can now use to build new bindings against the socket.h APIs.

Just like with the previous steps, first we need to configure the build system:

This time, we need ocaml-ctypes to generate two compilation units: a .ml file describing the API exported during the stubs phase, as well as the C code to glue it with the C APIs. Here’s the code for that generator:

The exported .ml and .c files are omitted here for simplicity, but readers can generate them themselves from the ocaml-sys-socket repository if they are curious about their actual content.

We can now export our top-level API:

That’s it! We now have ocaml-ctypes specific data types and structures that can be used to interface with the host’s native socket.h APIs. Note that we also worked on top of the original low-level binding to getnameinfo to export a higher-level function more idiomatic to the OCaml language.
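As an illustration of that last layer, here is a minimal sketch of the buffer handling such a higher-level function needs, assuming the core ctypes library. The helper names, the 1025 and 32 buffer sizes (glibc's NI_MAXHOST and NI_MAXSERV) and the raw parameter, which stands in for the low-level getnameinfo binding with its sockaddr and flags arguments already applied, are all assumptions:

```ocaml
open Ctypes

(* Read the NUL-terminated string a C function wrote into a char buffer. *)
let string_of_buffer (buf : char CArray.t) : string =
  coerce (ptr char) string (CArray.start buf)

(* Map a C status code (0 on success) to an OCaml result. *)
let result_of_code (code : int) ~(ok : unit -> 'a) : ('a, int) result =
  if code = 0 then Ok (ok ()) else Error code

(* Hypothetical higher-level wrapper: allocate the output buffers, call the
   low-level function and turn its outputs into OCaml values. *)
let name_info ~(raw : char ptr -> int -> char ptr -> int -> int) :
    (string * string, int) result =
  (* Assumed sizes: glibc's NI_MAXHOST and NI_MAXSERV. *)
  let host = CArray.make char ~initial:'\000' 1025 in
  let serv = CArray.make char ~initial:'\000' 32 in
  let code =
    raw (CArray.start host) (CArray.length host)
      (CArray.start serv) (CArray.length serv)
  in
  result_of_code code ~ok:(fun () ->
      (string_of_buffer host, string_of_buffer serv))
```

Returning a result instead of a raw status code keeps error handling explicit on the OCaml side, while callers never touch a C buffer directly.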

Lagniappe: cross-compilation to Windows

On Windows platforms, liquidsoap is compiled using ocaml-cross-windows and, since Windows does have compatible socket APIs, we also wanted to look at cross-compiling for the Windows target, which is where we hit a snag with the current dune support.

The problem is that, at each intermediate step, in the case of a cross-compilation, the compiled binaries need to use the target’s OS headers and not the host’s headers; otherwise we end up using offsets specific to, e.g., Debian in a Windows binary.

In this case, this means that the compiled .exe binaries need to be Windows binaries and that we need to execute them as native Windows binaries, using Wine.

dune has truly amazing support for cross-compiling, which we do not cover here, but, unfortunately, its primitives for building and executing binaries do not yet cover this use case. Thus we had to trick it into compiling things the way we wanted, which is why we are using the exec.sh wrapper. Here’s its code:

Now you can go back to the previous dune files and see how this wrapper allows executing binaries according to the system that the corresponding ocamlopt compiler has been configured to build for.
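The decision such a wrapper makes can be sketched in OCaml, assuming only the standard library (4.10 or later for List.find_map): read the system field of the compiler's configuration, as printed by ocamlopt -config, and if it targets Windows while the host does not, prefix the command with wine. The function names and the list of Windows system values are assumptions, not a transcription of exec.sh:

```ocaml
(* Look up a "key: value" entry in the output of ocamlopt -config. *)
let config_value (config : string) (key : string) : string option =
  String.split_on_char '\n' config
  |> List.find_map (fun line ->
         match String.index_opt line ':' with
         | Some i when String.sub line 0 i = key ->
             Some (String.trim (String.sub line (i + 1) (String.length line - i - 1)))
         | _ -> None)

(* Assumed set of "system" values for compilers producing Windows binaries. *)
let targets_windows (config : string) : bool =
  match config_value config "system" with
  | Some ("mingw" | "mingw64" | "win32" | "win64") -> true
  | _ -> false

(* Build the command line used to run a freshly compiled generator. *)
let command_for ~(config : string) ~(host_is_windows : bool) (exe : string)
    (args : string list) : string list =
  if targets_windows config && not host_is_windows then "wine" :: exe :: args
  else exe :: args
```

In an actual wrapper, config would be the captured output of ocamlfind ocamlopt -config for the compiler in use, and host_is_windows would simply be Sys.win32.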

Conclusion

It’s been a fun time working on this binding! It’s amazing to see the level of detail that can be achieved with ocaml-ctypes using its provided primitives. Ultimately, the binding is very clean and elegant, with very few low-level assumptions.

Likewise, the simplicity and power of the dune build system makes this very fluid to build. Without it, each of the described steps above would have been much more painful to execute and compile.

[1]: My bet is that, at the time the POSIX specifications were being written, there were already several inconsistent socket.h headers out in the wild among the various historical UNIX flavors.


Romain Beauxis

Senior Software Engineer, OCaml and media streaming enthusiast.