Jeff Hiner
Aug 17 · 10 min read

A practical guide to FFI using bindgen (part 1 of 2)

Cargo containers loaded on ships in front of a dock
Photo by Nilantha Ilangamuwa on Unsplash

Today I want to dig into one of the difficulties we ran into while trying to rewrite our IoT Python code in Rust: specifically FFI, or the “Foreign Function Interface”, the bit that allows Rust to interact with other languages. When I tried to write Rust code to integrate with C libraries a year ago, the existing documents and guides often gave conflicting advice, and I had to stumble through the process on my own. This guide is intended to help future Rustaceans work through the process of wrapping C libraries for use from Rust, and to familiarize the reader with the most common problems we encountered while doing the same.

In this guide we’re going to discuss how to expose C library functions to Rust using bindgen. We’ll also talk a bit about the limitations of this automatic toolset, and how to check your work. Fair warning: implementing FFI correctly is Rust Hard Mode. If you’re new to Rust, please don’t start here. Work through the book, write some practice code, and come back after you’re thoroughly comfortable with the borrow checker.

The motivation

To back up, I need to explain why we at Dwelo needed to do this in the first place.

For our rewrite project, we wanted to integrate with a vendor-supplied C library that would be responsible for talking to our Z-Wave chip via a standard vendor-specified protocol over a serial port. This serial communications protocol was complicated and difficult to implement correctly, and also subject to strict timing constraints — bytes sent to the serial port were essentially transmitted through the radio directly. Sending the wrong bytes at the wrong time could hang the radio chip completely. There was a reference document several hundred pages long containing a specification for transmission and acknowledgements, retransmission logic, error handling, timing intervals, and so on. The original Python code had implemented this protocol from scratch (incorrectly), and this implementation represented a sizable chunk of the bugs in our legacy stack. On top of that, the radio chipset vendor was pushing back on certification unless we could demonstrate we implemented the protocol correctly. By coincidence, the provided reference libraries (implemented in C) were guaranteed to comply with the spec. Quite plainly, the vendor C code seemed like the shortest path to business success.


Rust natively supports linking against C libraries and calling their functions directly. Of course, any function imported thus requires the unsafe keyword to actually call (because Rust can’t guarantee its invariants or correctness) but that’s an inconvenience we can punt until later.

The Rust Nomicon will tell you that you can import function definitions or other global symbols by declaring them in extern blocks, as long as the names and signatures line up exactly. This is technically correct but not all that helpful. Typing in function definitions by hand is completely stupid bonkers, and makes no sense when we have a perfectly good set of header files with the declarations in them. Instead, we’re going to use a tool to generate the Rust signatures from our library’s C header files. Then we’re going to run some test code to verify it’s working correctly, tweak things until it looks right, and finally bake the whole thing into a Rust crate. Let’s begin.
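For comparison, here is roughly what the hand-written approach looks like for a single function. This is only a sketch: it borrows strlen from the C standard library (which Rust programs already link against on most hosted platforms) rather than anything from our vendor library, and the c_strlen helper is a name made up for illustration.

```rust
use std::ffi::CString;
use std::os::raw::c_char;

// Hand-written declaration. Nothing checks this against the real C
// prototype, so a mistake here compiles fine and misbehaves at runtime.
extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical helper: length in bytes of `text` as measured by C's strlen.
fn c_strlen(text: &str) -> usize {
    // CString appends the NUL terminator that strlen relies on.
    let owned = CString::new(text).expect("text must not contain NUL bytes");
    // Calling any imported function requires `unsafe`.
    unsafe { strlen(owned.as_ptr()) }
}

fn main() {
    println!("strlen says: {}", c_strlen("walrus"));
}
```

One declaration is manageable. A few hundred of them, each needing to match the header exactly, is where a generator starts to pay for itself.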

Bindgen

The most commonly used tool to generate Rust signatures from C headers is bindgen. Our goal is to create a bindings.rs file representing the library’s public API (its public functions, structs, enumerations, etc). We will configure our crate to include that file. Once the crate builds, we can then import that crate into any project to invoke our C library’s functions.

What you’ll need:

  • A functioning cargo setup. I assume if you’re compiling Rust code at all that you have this.
  • A working C compiler and pkg-config for dependency resolution.
  • Header file(s) corresponding to the library functions you want to use.
  • If you have the source code that’s great; this example assumes you are building the library from source. Otherwise you’ll need the path to the static or dynamic library you’re linking to, if it’s not in your system path.
  • An amount of patience corresponding to the size of the library’s API.

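Installing the command-line bindgen tool is as simple as the following. (Current releases publish the executable as the bindgen-cli crate; on older releases the command was cargo install bindgen.)

cargo install bindgen-cli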

On my Debian laptop I also needed to manually apt install clang, though your mileage may vary.

Setting up your crate

Our new library crate will contain the dirty business of building and exporting the native C library’s unsafe functions. Again, leave any safe wrappers for another crate — this not only speeds up compilation, but it also makes it possible for ̶m̶a̶s̶o̶c̶h̶i̶s̶t̶s̶ other crate authors to minimally import and use just the raw C bindings. The standard Rust naming convention for FFI crates is lib<XXXX>-sys.

We’re going to create a build.rs file that will be used with the cc crate to compile and link our bindgen exports. Let’s put our library source code in a subdirectory called src and our associated include files in a subdirectory called include. Next, let’s make sure our Cargo.toml is set up:

[package]
name = "libfoo-sys"
version = "0.1.0"
links = "foo"
build = "build.rs"
edition = "2018"

[dependencies]
libc = "0.2"

[build-dependencies]
cc = { version = "1.0", features = ["parallel"] }
pkg-config = "0.3"

Next we’ll populate the build.rs file. The following is going to look a bit weird — we are writing a Rust program that will output a script to stdout; cargo will directly use this script to build our crate.

If you’re linking against an already-compiled library guaranteed to be in the system path, your build.rs might be as simple as this:

fn main() {
    println!("cargo:rustc-link-lib=foo");
}
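If the prebuilt library lives outside the system path, the build script also has to tell the linker where to look. A minimal sketch, assuming a made-up FOO_LIB_DIR environment variable that points at the directory holding libfoo. The link_directives helper is also hypothetical; the cargo:rustc-link-search and cargo:rerun-if-env-changed lines are standard build-script output.

```rust
use std::env;

/// Build the lines cargo should see for linking against libfoo.
/// `lib_dir` is an optional directory to add to the linker search path.
fn link_directives(lib_dir: Option<&str>) -> Vec<String> {
    let mut lines = Vec::new();
    if let Some(dir) = lib_dir {
        lines.push(format!("cargo:rustc-link-search=native={}", dir));
    }
    lines.push("cargo:rustc-link-lib=foo".to_string());
    lines
}

fn main() {
    // FOO_LIB_DIR is an assumption for this sketch, not a cargo convention.
    println!("cargo:rerun-if-env-changed=FOO_LIB_DIR");
    let lib_dir = env::var("FOO_LIB_DIR").ok();
    for line in link_directives(lib_dir.as_deref()) {
        println!("{}", line);
    }
}
```

Keeping the directive list in a plain function makes it easy to see exactly what cargo will be told, which helps when a link step fails for no obvious reason.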

Most of the time, though, you’ll want to at least use some sort of package configuration to ensure the library is actually installed and the linker can find it. In many cases, your library is small enough to be built as a static library by cargo itself. The pkg-config crate helps with library and dependency configuration, and cc handles the dirty work of building C code from within cargo. Both crates run configuration and build steps before they output the lines that cargo needs. In our example our source code uses zlib, so we use pkg-config to find and import an appropriate version. The sample code below also shows how to add compiler flags and preprocessor definitions.

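fn main() {
    // Locate zlib (version 1.2 or newer) and emit the linker flags for it.
    // The pkg-config package name for zlib is "zlib".
    pkg_config::Config::new()
        .atleast_version("1.2")
        .probe("zlib")
        .unwrap();

    let src = [
        "src/file1.c",
        "src/otherfile.c",
    ];

    // Compile the C sources into a static libfoo and link it into the crate.
    let mut builder = cc::Build::new();
    let build = builder
        .files(src.iter())
        .include("include")
        .flag("-Wno-unused-parameter")
        .define("USE_ZLIB", None);
    build.compile("foo");
}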

Finally, you will need a src/lib.rs file to actually compile our bindings. Here we will disable warnings for C naming conventions that don’t line up with Rust, and then just macro include our generated file:

#![allow(non_upper_case_globals)]
#![allow(non_camel_case_types)]
#![allow(non_snake_case)]
use libc::*;

include!("./bindings.rs");

Generating the bindings

While the bindgen user guide seems to guide you toward generating the bindings on the fly within build.rs, in practice you will need to edit the generated output before releasing it into a crate. Generating one or more files via the command line and committing the output to your repository will give you the most control.

The initial attempt at generation might look something like this:

bindgen include/foo_api.h -o src/bindings.rs

For a real header with more than a few API calls, this is unfortunately going to generate way more definitions than we want or need. The command line that generated part of the bindings.rs for our project at Dwelo wound up being something closer to this:

bindgen include/foo_api.h -o src/bindings.rs '.*' --whitelist-function '^foo_.*' --whitelist-var '^FOO_.*' -- -DUSE_ZLIB

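Convincing the generator to give you only what’s necessary and not barf on undefined symbols is a trial and error process. Consider doing generation in stages and concatenating the results. If you are on a newer bindgen release, note that the --whitelist-* flags have been renamed to --allowlist-* (for example --allowlist-function and --allowlist-var).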

It’s powerful, but not perfect

When you pass a header to bindgen, it will invoke the Clang preprocessor and then greedily convert every symbol definition it can see. You will need to make adjustments at the command line, and refactor the resulting output.

Original Makefile/CMake extras

After the -- on the bindgen command line, you can add whatever flags you’d normally add to a compiler when building against the library. Sometimes these will be extra include paths, and sometimes they will be necessary when headers have #ifdef guarded definitions. For our vendor library, failing to define OS_LINUX hides a bunch of symbols we need. (What, did you think legacy code is going to use standard compiler defines like __linux__ instead of making things up? Sorry, comedy hour is down the hall and up the stairs.) If your generated output is mysteriously missing functions, check your defines.

Headers that include standard headers

Bindgen is very aggressive about generating definitions for every available symbol in the preprocessor output, even generating definitions for transitive system-specific dependencies that you do not want. This means if your header includes stddef.h or time.h (or includes another header that does) you will wind up with a bunch of extra crap in the generated output. It’s even worse when compiling C++ code, as C++ compilers apparently must export every symbol used from std even when it’s not necessary or desired.

Your crate should only expose what’s in the library API, not what happened to be in system header files or the standard library where you did your generation. This one is a pain, particularly if your library’s functions and constants don’t follow any kind of naming convention. The only way around this is with whitelist regex and lots of trial and error.

Preprocessor #defines

#define FOO_ANIMAL_UNDEFINED 0
#define FOO_ANIMAL_WALRUS 1
#define FOO_ANIMAL_DROP_BEAR 2
/* Argument should be one of FOO_ANIMAL_XXX */
void feed(uint8_t animal);

This looks contrived, but this is an obfuscated version of a pattern that’s pervasive through our vendored C library.

In C this works fine, because when you include the header into your source you can just use something like FOO_ANIMAL_WALRUS directly when a function calls for it. The C compiler will implicitly convert the literal 1 to uint8_t and the code works. Of course the original author should have created an enum typedef for clarity and used that, but they didn’t, and that’s still legal C code we have to deal with. Here is what bindgen generates for that header:

pub const FOO_ANIMAL_UNDEFINED: u32 = 0;
pub const FOO_ANIMAL_WALRUS: u32 = 1;
pub const FOO_ANIMAL_DROP_BEAR: u32 = 2;
extern "C" {
    pub fn feed(animal: u8);
}

Although bindgen is clever enough to recognize the symbols as constants, there are still a few issues. The first is that bindgen has to guess the type for each FOO_ANIMAL_XXX. It’s apparently guessed u32 in this case (which not only doesn’t match our function parameter, but is also technically wrong). This leads to the other issue: Rust will require us to explicitly cast FOO_ANIMAL_WALRUS to u8 when calling feed. Not very ergonomic, is it? To fix this, we need to change the types on the generated consts to match the function definition. We’ll fix the enumeration issue later in the safe wrapper.
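A minimal sketch of the fix: the constants below show what the generated file might look like after hand-editing the types to u8. Since the vendor library isn't available here, feed is a Rust stand-in with the same signature (it just records its argument) so the snippet compiles on its own. LAST_FED is likewise made up for the demonstration.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hand-edited: bindgen emitted these as u32; changed to match feed().
pub const FOO_ANIMAL_UNDEFINED: u8 = 0;
pub const FOO_ANIMAL_WALRUS: u8 = 1;
pub const FOO_ANIMAL_DROP_BEAR: u8 = 2;

// Stand-in state so the sketch can observe what was passed.
static LAST_FED: AtomicU8 = AtomicU8::new(FOO_ANIMAL_UNDEFINED);

// Stand-in for the C function. The real one lives in the library and is
// only declared (not defined) inside an extern "C" block.
pub unsafe extern "C" fn feed(animal: u8) {
    LAST_FED.store(animal, Ordering::SeqCst);
}

fn main() {
    // No `as u8` cast needed now that the constant types match.
    unsafe { feed(FOO_ANIMAL_WALRUS) };
    println!("last fed: {}", LAST_FED.load(Ordering::SeqCst));
    unsafe { feed(FOO_ANIMAL_DROP_BEAR) };
    println!("last fed: {}", LAST_FED.load(Ordering::SeqCst));
}
```

Callers can still pass any u8 they like, which is the enumeration problem left for the safe wrapper.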

Some structs should just be opaque

Our vendored library passes a pointer to a context object for nearly every function other than initialization. (Let’s call it foo_ctx_t for now.) This is a widely-used pattern and perfectly reasonable. But because of an implementation flaw our header file defines foo_ctx_t instead of forward-declaring it. This unfortunately leaks the internals of foo_ctx_t. That leak then transitively forces us to know and define a bunch of other dependent types we don’t care about.

Rust doesn’t really allow separate declaration and definition for structs. Unlike C, we can’t just declare foo_ctx_t in Rust without providing a definition for it, and the Rust compiler has to recognize the name foo_ctx_t in order to use a pointer to it as a function arg. But we can use workarounds to avoid having to define it completely. As of this writing there are two alternatives; neither is perfect, but both are at least functional in practice.

We can replace the struct definition with an enumeration type that has no variants, which conveniently will give you a compile error if you accidentally try to construct it or use it as anything but a pointer target. This makes type purists upset because we’re technically lying to the compiler, but it does work:

pub enum foo_ctx_t {}

Or we can replace its innards with a private zero-size type field. This is what bindgen does by default, and it’s fine as long as you don’t rely on mem::size_of:

pub struct foo_ctx_t {
    _unused: [u8; 0],
}
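To see why the size_of caveat matters, here is a small sketch using the zero-size form. The type can only be handled through pointers, and Rust reports its size as zero no matter how large the real C struct is. The FooPollFn alias is hypothetical and exists only to show the opaque type in a function signature.

```rust
use std::mem::size_of;
use std::os::raw::c_int;

// Opaque stand-in for the C struct. The private field keeps code outside
// this module from constructing a value of this type.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct foo_ctx_t {
    _unused: [u8; 0],
}

// Hypothetical signature: the context is only ever used behind a pointer.
pub type FooPollFn = unsafe extern "C" fn(ctx: *mut foo_ctx_t) -> c_int;

fn main() {
    // Rust believes the struct is empty; the real size is known only to C.
    println!("size_of::<foo_ctx_t>() = {}", size_of::<foo_ctx_t>());
    // Pointers to it are ordinary thin pointers, the same size as usize.
    println!("pointer size = {}", size_of::<*mut foo_ctx_t>());
    println!("fn pointer size = {}", size_of::<FooPollFn>());
}
```

In practice this means allocation and cleanup of the context have to stay on the C side, since Rust has no idea how much memory it needs.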

Const-correctness

Bindgen will convert C const pointers into Rust *const and undecorated C pointers into *mut. If the original code was const correct, this works out just fine. If not, it can cause headaches later on when trying to create safe wrappers. Fix the library, if possible.

The example below can be easily used inside a Rust unsafe block with a normal (immutable) reference to time_t and a mutable reference to tm:

// Generated from <time.h>
extern "C" {
    pub fn gmtime_r(_t: *const time_t, _tp: *mut tm) -> *mut tm;
}

You don’t technically have to modify the C library to change a pointer to *const in an extern Rust definition. In fact, the symbol table for C libraries doesn’t even have a parameter list, so the linker has no way to confirm your function parameters are correct at all (this is not the case for C++ symbols, thankfully). If you do modify the Rust pointer types, you are responsible for verifying that the library actually upholds the const invariants, meaning it never writes through those pointers.
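Calling gmtime_r needs platform-specific definitions of time_t and tm, so this sketch shows the same const/mut split with memcpy from the C standard library instead. The declaration is written by hand from the standard C prototype, and copy_bytes is a hypothetical helper name.

```rust
use std::os::raw::c_void;

// Declared by hand for this sketch, following the C standard prototype:
// void *memcpy(void *dest, const void *src, size_t n);
extern "C" {
    fn memcpy(dest: *mut c_void, src: *const c_void, n: usize) -> *mut c_void;
}

/// Copy `src` into `dest` using C's memcpy.
fn copy_bytes(dest: &mut [u8; 4], src: &[u8; 4]) {
    unsafe {
        memcpy(
            dest.as_mut_ptr() as *mut c_void, // from &mut: C may write here
            src.as_ptr() as *const c_void,    // from &: C must only read
            src.len(),
        );
    }
}

fn main() {
    let mut dest = [0u8; 4];
    copy_bytes(&mut dest, &[1, 2, 3, 4]);
    println!("{:?}", dest);
}
```

Because the borrow checker guarantees the shared and mutable references don't alias, the no-overlap requirement of memcpy is satisfied for free. That only works when the const annotations in the binding are truthful.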

Sharp edges

If your functions have return values for errors, do yourself a favor now and ensure the #[must_use] annotation is attached to each one. This will at least give some indication if callers forget to check the return value for errors, and it will help later when we wrap everything in safe layers.
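Here is a sketch of what that annotation looks like. Every name in it (foo_send, FOO_OK, FOO_ERR_BUSY) is invented for illustration, and foo_send is defined in Rust as a stand-in so the snippet runs without the C library.

```rust
use std::os::raw::c_int;

// Hypothetical return codes, in the style of a generated bindings file.
pub const FOO_OK: c_int = 0;
pub const FOO_ERR_BUSY: c_int = -1;

// Stand-in defined in Rust so the sketch runs on its own. In bindings.rs
// the same attribute goes directly above the declaration inside the
// extern "C" block.
#[must_use = "foo_send reports failure through its return code"]
pub unsafe extern "C" fn foo_send(byte: u8) -> c_int {
    if byte == 0 { FOO_ERR_BUSY } else { FOO_OK }
}

fn main() {
    // Writing `foo_send(0x2a);` as a bare statement inside the unsafe
    // block would trigger the `unused_must_use` warning. Binding the
    // result and checking it does not.
    let rc = unsafe { foo_send(0x2a) };
    if rc != FOO_OK {
        eprintln!("foo_send failed with code {}", rc);
    }
}
```

The attribute only produces a warning, not an error, so it is a nudge rather than a guarantee.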


Write a README.md file detailing exactly how you invoked bindgen, and commit it to the repository. Trust me, you’ll want this later when you realize something is missing.

Add a couple unit tests for sanity, then try running cargo test. Bindgen helpfully creates some tests of its own to make sure the generated struct alignments are okay. You can also run cargo doc --open on your crate to get a high-level view of what you’re exporting, and double check that you’re not accidentally exposing the wrong things.
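A layout sanity check can be as small as the sketch below. The struct foo_reading_t is hypothetical, and the expected numbers are assumptions that would normally come from the C side (for example, from printing sizeof and offsetof in a tiny C program). In a real crate these assertions would live in #[test] functions rather than main.

```rust
use std::mem::{align_of, size_of};

// Hypothetical struct standing in for something from bindings.rs.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct foo_reading_t {
    pub timestamp: u32,
    pub value: i16,
    pub flags: u8,
}

/// Byte offset of `flags` from the start of the struct.
fn flags_offset() -> usize {
    let r = foo_reading_t { timestamp: 0, value: 0, flags: 0 };
    let base = &r as *const foo_reading_t as usize;
    let field = &r.flags as *const u8 as usize;
    field - base
}

fn main() {
    // 4 bytes + 2 bytes + 1 byte + 1 byte of trailing padding.
    assert_eq!(size_of::<foo_reading_t>(), 8);
    assert_eq!(align_of::<foo_reading_t>(), 4);
    assert_eq!(flags_offset(), 6);
    println!("layout matches the expected C layout");
}
```

If one of these fails after regenerating the bindings, it usually means a compiler define or packing pragma differs between the bindgen run and the actual library build.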

All this being said, these manual steps are necessary because bindgen is doing the best it can with the information it has. The generation process will expose every small structural issue in your C library.

When you’re all finished, hopefully you’ll be left with a not-quite-abominable Rust package that exposes your raw library API via unsafe Rust. You’re halfway there! Next up, we’ll talk about how to take these bindings and guard them behind ergonomic and safe wrappers so our application code can’t use them incorrectly.

Dwelo Research and Development

All the Dwelo R&D news that is fit to render.

Thanks to kakaner

Written by Jeff Hiner

I’m an IoT software engineer at Dwelo, a company that is working to make smart apartments a reality.
