🏂 Ruby Bindings and Extensions
A practical guide to going native with C
Ruby is a wonderful language for so many reasons. It’s expressive, easy to read, ideal for modelling complex domains and there’s a broad and active community maintaining the language and libraries.
However, it can occasionally be preferable to reach down to a lower level language (like C) since Ruby’s expressivity comes at the cost of some overheads. You may have a particular method that gets called very frequently and does not benefit from any high-level constructs (raw mathematics calculations, for instance); or it may be necessary to integrate directly with a C library because it performs functions that are unavailable in pure Ruby.
When we set out to use H3, Uber’s geospatial indexing library, at Stuart, the project was less than a year old and there were no Ruby bindings available. C isn’t a go-to language for most of the engineering team, but we’re always keen to learn, so we decided the best way forward was to build the bindings ourselves.
Aut inveniam viam aut faciam! (Find a way, or make one!) — Hannibal ⚔️
There are two common approaches to working with C from Ruby:
- Write a native extension directly in C and use MKMF to manage it;
- Bind directly with an existing C component using FFI.
We’ll briefly discuss MKMF with a self-contained example before taking a deeper dive into the world of FFI and how we’ve used it to integrate H3 support into our Ruby applications.
🚨 NOTE: It’s assumed that you have a working knowledge of Ruby and a cursory understanding of C. If you’ve never even looked at C, take a quick look at this primer before continuing.
🔌 Part One — Native Extensions with Make Makefile
When interfacing with C from Ruby, people most commonly reach for MKMF, which ships with the Ruby standard library. The convention is to have a subdirectory in your gem directory called ext which contains your C code. This naming helps clarify that the code within is not Ruby like the rest of the codebase.
This directory also contains a file called extconf.rb which is responsible for generating a Makefile. In turn, the Makefile compiles the C code into a format that Ruby can call functions on directly (known as a “Native Extension”).
MRI Ruby is itself written in C, so its header file provides a lot of helpful macros and functions to marshal data back and forth between Ruby and C.
Let’s work through an example (full source code is here).
This is a typical gem project layout with the addition of the ext subdirectory containing our C. Let’s start with the C code, then work our way out to the supporting Ruby files.
Making sense of the C
There’s quite a lot going on here!
We begin by including the ruby.h and math.h header files. The Ruby header contains a lot of useful type definitions and macros that we’ll be needing, and the math header contains standard mathematical functions.
The method declaration for c_method takes two arguments: the receiver and rb_val. The first argument is always the Ruby receiver object (whether this is an object, class, or module). This is because C has no objects, so the receiver has to be passed in explicitly.
You’ll notice also that the arguments are of type VALUE. This is the C type Ruby uses to refer to any Ruby object: a pointer-sized unsigned integer that either points at the object or, for immediates such as small integers, encodes it directly. In Ruby we don’t care about explicit typing but since C is statically typed we need the VALUE type in order to have syntactically valid C.
However, we do need to convert the given VALUE to something more specific before we can do anything useful with it. This is where the NUM2DBL macro comes in. Assuming that the given argument is a Ruby number, NUM2DBL converts it into a C-friendly double precision float. Once we’re finished doing our maths, we convert the double back to a VALUE with DBL2NUM and return the result to Ruby.
The Init_XYZ() function is a “magic” function invoked by Ruby to initialise our C code, where XYZ is the name of our C module, i.e. “example”. To initialise, we define our module using rb_define_module and we use rb_define_singleton_method to set up our c_method method, linking it to the actual C function and indicating that we expect only one argument.
The equivalent code in Ruby would look something like this:
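A minimal sketch of such an equivalent follows. The text does not say which calculation c_method performs, so a square root is assumed purely for illustration, and the module name Example is inferred from the “example” extension name.

```ruby
# Pure-Ruby counterpart to the C extension described above.
# Assumption: the "maths" is a square root; the real calculation is not given.
module Example
  # Accepts any Ruby number, mirroring NUM2DBL's conversion to a C double.
  def self.c_method(rb_val)
    Math.sqrt(Float(rb_val))
  end
end

puts Example.c_method(16) # prints 4.0
```

Float(rb_val) plays the role of NUM2DBL here: it raises for arguments that are not numeric, much as the macro would.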
So that’s it for the C! Let’s figure out the rest of the support files in Ruby.
We’ll start with the all-important extconf.rb.
MKMF can do a lot of things to set up a Makefile, but most of the time the simplest case (as above) does all we need. We just give it the ext directory name and file name combo and it figures out the rest, provided our C file correctly defines that magic function and is syntactically valid.
Other features, such as linking third-party libraries before the C code is built, adding extra headers and other build configuration, are available through further MKMF methods if needed.
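For reference, the simplest case described above might be sketched as follows. The path ext/example/extconf.rb and the target name example/example are assumptions based on the “example” module used in this walkthrough.

```ruby
# ext/example/extconf.rb (assumed path, following the layout described above)
require 'mkmf'

# Writes a Makefile that compiles the C sources in this directory into a
# shared object installed as example/example. The last path segment must
# match the Init_example function defined in the C source.
create_makefile('example/example')
```

When more is needed, MKMF helpers such as have_library, have_header and dir_config are called before create_makefile to check for and configure external dependencies.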
In order to compile our C code into a binary object that Ruby can use, we use the rake-compiler gem. By setting up a Rake task to handle compilation, we simply run rake compile and our C code is packaged up into the lib/example directory, ready to be called alongside our Ruby code.
Building native extensions on gem install
In your .gemspec file, you can provide a path to the extconf.rb file and it will build your C extension when the gem is installed. Just set the extensions property in your spec and gem install handles the rest.
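A minimal sketch of such a gemspec is shown below. The gem name, version, author and file globs are hypothetical; only the extensions line is the part the text refers to.

```ruby
# example.gemspec (hypothetical name, version and paths)
spec = Gem::Specification.new do |s|
  s.name       = 'example'
  s.version    = '0.1.0'
  s.summary    = 'Demonstrates a native extension that is built on install'
  s.authors    = ['Example Author']
  s.files      = Dir['lib/**/*.rb', 'ext/**/*.{c,h,rb}']
  # RubyGems runs each listed extconf.rb, then make, during `gem install`.
  s.extensions = ['ext/example/extconf.rb']
end
```

The C sources have to be listed in files as well, since they are compiled on the installing machine rather than shipped as a binary.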
And that’s about all we’ll cover for MKMF.
Once you have the basic structure, you can do whatever you wish in C and expose each new function to Ruby by adding another rb_define_singleton_method line inside the Init function.
🧲 Part Two — The Foreign Function Interface
This approach differs significantly from MKMF:
- Since the intention is to call functions directly within existing binaries, there is no need to compile any native extensions (except for the FFI gem itself when it gets installed);
- You don’t need to write any C, just FFI’s DSL for invoking C functions and building/accessing structs and data collections;
- It’s compatible with non-MRI Rubies like JRuby (if you use MKMF then MRI is your only available Ruby);
- There is extra protection from injury: FFI handles memory management for you, so memory you allocate through FFI is released when the FFI object that owns it gets garbage collected. More on this later.
We’ll discuss FFI usage via examples from the bindings we wrote at Stuart for H3.
But first, let’s explain some more details about H3 so the upcoming code samples make sense in context.
What is H3?
From Uber’s own literature:
The H3 geospatial indexing system is a multi-precision hexagonal tiling of the sphere indexed with hierarchical linear indexes.
In a nutshell, H3 is a way of identifying any part of the surface of the earth with a unique hexagon. Each hexagon has a unique identifier, known as the H3 index, and a resolution level. At the coarsest resolution (level 0), it’s possible to cover Earth with 110 hexagons and 12 pentagons (think of the way a football is stitched together! ⚽️💨). These top level cells are known as the base cells.
Each of these hexagons can then be broken into seven smaller hexagons. Repeating the process recursively down to the finest supported resolution allows indexing to the square-metre level of accuracy. In total, there are over 600 trillion unique H3 indexes on Earth.
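As a rough check on that figure, the sketch below assumes the cell-count formula given in the H3 documentation: 2 + 120 × 7^r cells at resolution r. Pentagons have six children rather than seven, so the count sits slightly below 122 × 7^r. The method name is hypothetical, and summing over all sixteen resolutions is only one plausible reading of the “over 600 trillion” total.

```ruby
MAX_RESOLUTION = 15

# Assumed formula: number of H3 cells at a given resolution.
def h3_cell_count(resolution)
  2 + 120 * 7**resolution
end

per_resolution = (0..MAX_RESOLUTION).map { |r| h3_cell_count(r) }

puts per_resolution.first # prints 122 (110 hexagons + 12 pentagons)
puts per_resolution.last  # prints 569707381193162
puts per_resolution.sum   # prints 664658611392032
```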
So, of all shapes, why hexagons?
Well, there are only three polygons that tessellate regularly: the equilateral triangle, the square, and the regular hexagon. Hexagons are unique in that each neighbouring hexagon is the same distance away (if you measure from the centre). Triangles and squares don’t have this property.
As shown below, a triangle has 12 neighbours at 3 differing distances, a square’s 4 diagonal neighbours are further away than its 4 remaining neighbours, and a hexagon’s 6 neighbours are all equidistant from its centre.
So why is this useful? Well, Uber uses H3 for surge pricing by tracking rides in real time, bucketing them to their containing hexagons, and dynamically adjusting prices based on supply and demand in those regions.
Hexagons were an important choice because people in a city are often in motion, and hexagons minimize the quantization error introduced when users move through a city.
If you want to learn more, check out this talk by Joseph Gilley:
Really cool stuff — major respect to Uber for open-sourcing this! Read the Uber Engineering H3 page to learn more.
Building the bindings
Ok, enough hexagons! Back to Ruby bindings.
We started out with the MKMF approach when we began writing our Ruby bindings for H3. It worked reasonably well but resulted in quite a lot of boilerplate. The decision was made to move over to FFI and take advantage of the DSL and automatic memory management.
Let’s get our hands dirty with FFI by wrapping a simple function.
The H3 library defines a function called h3ToParent, which takes an H3 index and a resolution, then returns the parent H3 index at the given resolution, i.e. the larger hexagon which contains the given hexagon.
🚨 NOTE: These examples are purely demonstrative and may not work when executed in isolation. We encourage you to read the H3 Ruby source code on GitHub for a full working example, boilerplate included.
After telling FFI to load the H3 library, we use attach_function to allow the method to be called from Ruby.
- The first argument* gives the name we want to use when calling the method (so snake-case 🐍 instead of camel-case 🐫);
- The second argument is the actual name of the function in the C library so FFI can find it;
- The third argument is an array of types which informs FFI of the argument types we expect to be passing in (in order);
- The last argument is the expected type of the return value.
This kind of approach has a lot of advantages!
It’s easy to read, and we push the burden of validating/converting data types down to FFI itself so we can focus on the details (if we pass a string where it’s expecting an integer, it will raise ArgumentError for us).
*It’s also possible to call attach_function without this alias, so it can be called with the original camel-case.
FFI also allows us to work with types in a more fine-grained way. There’s a typedef method that functions as an alias e.g.
typedef :ulong_long, :h3_index
Now we can talk about H3 indexes instead of unsigned long longs when declaring functions with attach_function.
Complex Custom Types
We can also build up more complex types using FFI’s support for custom data converters.
Let’s say we want to validate the int argument (which represents a resolution) to ensure the value is within an acceptable range of resolutions, i.e. 0–15 inclusive.
Now we can use the Resolution class in our attach_function definitions and we’ll get validation errors if the number is out of range!
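The validation itself can be sketched in plain Ruby. This is an illustration rather than the bindings’ actual code: in a real FFI binding the class would also extend FFI::DataConverter and declare a native_type of FFI::Type::INT, both left out here so the sketch runs without the ffi gem. The constant name and error message are assumptions.

```ruby
# Sketch of the check a custom resolution type could perform.
class Resolution
  RES_RANGE = (0..15).freeze

  # With FFI::DataConverter, FFI calls to_native(value, context) on each
  # argument of this type before handing the result to the C function.
  def self.to_native(value, _context)
    unless value.is_a?(Integer) && RES_RANGE.cover?(value)
      raise ArgumentError, "resolution must be an Integer in #{RES_RANGE}, got #{value.inspect}"
    end

    value
  end
end

Resolution.to_native(9, nil) # passes through unchanged

begin
  Resolution.to_native(16, nil)
rescue ArgumentError => e
  puts e.message
end
```

Keeping the check inside the type means every attached function that takes a resolution gets the same validation without repeating it in each wrapper.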
Passing structs to C functions
Simple functions with only native types for arguments and return values are pretty straightforward to integrate. But what about functions that expect to receive a pointer to a struct as an argument?
Well, FFI has us covered there, too.
Let’s wrap a function, geoToH3, which expects a GeoCoord struct containing a pair of latitude/longitude coordinates.
In this method, we tell FFI to expect a pointer to a GeoCoord struct. We set one up using the DSL, populate the struct’s fields and then pass it right in.
You’ll notice we wrap the geoToH3 function within a Ruby method, geo_to_h3, rather than giving it a snake-case alias and calling it directly. This allows Ruby calling code to pass a 2-element array of degree coordinates, rather than needing to care about building a struct with radians coordinates.
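The conversion step that such a wrapper performs before filling the struct might look like the sketch below. The helper name is hypothetical and the coordinates are arbitrary sample values.

```ruby
# Hypothetical helper: validates a [lat, lon] pair given in degrees and
# returns the same pair in radians, ready to be copied into a struct.
def coords_to_radians(coords)
  unless coords.is_a?(Array) && coords.size == 2
    raise ArgumentError, 'expected a 2-element [lat, lon] array'
  end

  coords.map { |degrees| degrees * Math::PI / 180 }
end

lat, lon = coords_to_radians([51.5074, -0.1278]) # sample point, in degrees
puts format('lat=%.6f lon=%.6f', lat, lon)
```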
So that’s passing structs in. How about a function that returns a struct?
The h3ToGeo function does the inverse of geoToH3 and returns a GeoCoord struct with coordinates corresponding to the given H3 index.
There’s something subtle at play here.
The C function is declared to return void, i.e. it doesn’t return anything! This is weird territory for a Ruby programmer: we pass in a GeoCoord struct as an argument by reference and the h3ToGeo function updates the struct’s contents rather than returning us a fresh struct.
So, why is this?
Well, C libraries are often written in this way, and it’s to make the client code 100% responsible for memory management. If the function returned a pointer to a new struct, then the function would be responsible for allocating its memory on the heap. This means the client code would then be responsible for eventually freeing it later. This half-and-half responsibility can result in memory leaks, and also takes control away from the client code regarding how memory gets allocated in the first place.
Manipulating memory that the client code is responsible for is preferable, so that’s what good C library developers do. If you’re curious, read more about this (and other minimalist C approaches) here.
Pointers, Memory, and Arrays
At this point, let’s familiarise ourselves with a crucial difference between C and Ruby — how memory is allocated and managed.
Ruby is like staying in a hotel room where you needn’t concern yourself with cleaning up or cooking since you have house-keeping and room-service to look after you.
You simply create objects using .new and get on with your life, with no need to worry about allocating memory (just ask for ice cream and it will be brought to you).
Similarly, you don’t (usually) need to care about destroying objects when you’re finished with them. This is because Ruby’s garbage collector keeps an eye on your allocated objects and destroys them for you once nothing references them any more (housekeeping will get rid of that used ice cream dish).
By contrast, C is being stuck home alone. If you don’t clean up, things get messy; if you don’t cook, you don’t eat; if you’re not careful, you’ll get burned.
Using Pointers with FFI
Let’s take a look at a more involved example where we have to concern ourselves with memory allocation. Thankfully, FFI makes this as painless as possible.
The h3ToString function takes an H3 index in numerical form and converts it to the equivalent hexadecimal representation.
Now our Ruby is beginning to resemble C! 😱
We use FFI’s MemoryPointer class to initialise a piece of memory for us. We tell it that we expect the contents to be of type char, and that there will be a maximum of 17 characters (16 hexadecimal digits plus a null terminator character to indicate the end of the string has been reached). This allows FFI to calculate precisely how many bytes of memory it will need to allocate.
Once our memory buffer is ready, we pass the pointer into h3ToString, along with the expected size. C populates the memory for us, then on the Ruby side we use FFI::MemoryPointer#read_string to read all the characters until the null terminator character is encountered.
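To make the 17-byte sizing and the null-terminated read concrete, here is a sketch in plain Ruby. A zero-filled String stands in for the MemoryPointer buffer and String#unpack1 with 'Z*' stands in for read_string, so no FFI or H3 call is made. The index value and constant name are illustrative assumptions.

```ruby
H3_INDEX_STR_LEN = 17 # 16 hexadecimal digits + 1 null terminator

index = 0x8928308280fffff # sample 64-bit H3 index, for illustration only
hex   = index.to_s(16)

# Zero-filled buffer standing in for the memory FFI would allocate.
buffer = ("\0" * H3_INDEX_STR_LEN).b

# Stand-in for the C function writing its result into the buffer.
buffer[0, hex.bytesize] = hex

# 'Z*' reads bytes up to, but not including, the first null byte,
# which is how a null-terminated C string is read back.
puts buffer.unpack1('Z*') # prints 8928308280fffff
```

A 64-bit value never needs more than 16 hexadecimal digits, which is why 17 bytes always leaves room for the terminator.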
This approach does add a little overhead to writing client code, particularly when you have nested structs that need initialising, or if C returns a nested struct that you need to iterate over. However, it has a nice benefit in that MemoryPointer objects are automatically garbage collected (taking the allocated C memory with them). This frees us from the obligation to manually release memory so (hopefully 🤞) we don’t get any leaks.
Building nested structs
Finally, let’s consider using pointer arithmetic to build nested structs.
The GeoFence struct contains a pointer to an array of GeoCoord structs plus an integer count of how many structs are in the array. This allows arbitrary-shaped regions to be described.
This struct requires a bit more legwork to build!
The Ruby method build_geofence accepts an array of coordinate pairs. We set num_verts to be the size of this array, and we use FFI::MemoryPointer to initialise enough memory to hold that many GeoCoord structs.
Now the tricky part.
The memory is reserved, but currently empty. We need to fill it with GeoCoord structs and then populate the lat/lon values for each. The work here is done by GeoCoord.new(ptr + i * GeoCoord.size).
In this case, ptr is pointing to the memory we initialised to hold our array of structs. FFI memory pointers support pointer arithmetic, so ptr + x will calculate the memory location that is x bytes further on from the first location.
FFI structs can accept a memory location as an argument when initialised, so we pass it the correct memory location via ptr + i * GeoCoord.size.
In the first loop iteration i is zero, so the first GeoCoord struct is initialised with the memory location at the beginning of the memory region referenced by ptr. In the second iteration i is one, so the offset is 1 * GeoCoord.size. This allows the second GeoCoord struct to slot in right after the first. By the end, we have an array of structs referenced by the coords variable but held in contiguous memory by FFI and referenced by ptr.
Now we iterate over the given list_of_coords and populate the structs’ fields with the lat/lon values, converted to radians.
Finally, we set the verts field to equal the ptr variable, and we’re all set.
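The offset arithmetic can be tried out without FFI. In the sketch below a binary String stands in for the contiguous memory region, and each “struct” is assumed to be two native 8-byte doubles (lat, then lon), which gives a size of 16 bytes. The constant name and sample coordinates are hypothetical.

```ruby
GEO_COORD_SIZE = 16 # assumption: two 8-byte doubles, lat then lon

list_of_coords = [[51.5, -0.12], [48.85, 2.35], [40.41, -3.70]] # degrees
num_verts = list_of_coords.size

# One contiguous zero-filled region, standing in for the memory that
# FFI::MemoryPointer reserves for num_verts structs.
memory = ("\0" * (num_verts * GEO_COORD_SIZE)).b

list_of_coords.each_with_index do |(lat, lon), i|
  offset  = i * GEO_COORD_SIZE # same arithmetic as ptr + i * GeoCoord.size
  radians = [lat, lon].map { |degrees| degrees * Math::PI / 180 }
  memory[offset, GEO_COORD_SIZE] = radians.pack('d2') # write lat, lon in place
end

# Read the second struct back from its offset.
second_lat, second_lon = memory[1 * GEO_COORD_SIZE, GEO_COORD_SIZE].unpack('d2')
puts format('%.6f %.6f', second_lat, second_lon)
```

Each write replaces exactly 16 bytes, so the region never grows and every struct stays at a predictable offset, which is the property C relies on when it walks the array.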
🎯 Putting it all together
Running C code from Ruby can seem daunting at first, but it’s well worth the investment learning how to do it. As well as getting a glimpse of how Ruby works behind the scenes, C extensions and bindings give you powerful options and broaden the horizons of your application’s abilities.
There aren’t the same safety nets that come with pure Ruby but with careful testing and good coding practices, the pitfalls are manageable and FFI does a great job of protecting the programmer.
When deciding which approach to take, remember:
- Use MKMF ideally when you need full control of the C code, the amount of code needed is small, and you don’t need to support non-MRI Rubies;
- If you don’t want to write C directly, have a large separate existing project, want to target other Rubies, or need to use a precompiled/third-party C library, FFI really shines.
And there we are — with a bit of exploration and experimentation, we’ve extended our Ruby tooling so the whole engineering team can enjoy the power of H3 hexagons!
- Check out the Ruby bindings for H3 — much of the example code is drawn directly from it;
- Read the Uber H3 literature for all the details of how it works;
- The h3 header file is the best reference for the function signatures H3 makes available;
- This wonderful explanation of all the Ruby C macros for a full reference of what you can do with MKMF;
- The Ruby Guide for writing a Native Extension;
- Kim Burgestrand’s post on advanced topics in FFI;
- Kernighan & Ritchie’s classic, “The C Programming Language”;
- 🐝 What Is It About Bees And Hexagons?;
- Some pointers from XKCD 😂