Ruby Bindings and Extensions
A practical guide to going native with C
--
Ruby is a wonderful language for so many reasons. It's expressive, easy to read, ideal for modelling complex domains, and there's a broad and active community maintaining the language and libraries.
However, it can occasionally be preferable to reach down to a lower-level language (like C), since Ruby's expressivity comes at the cost of some overhead. You may have a particular method that gets called very frequently and does not benefit from any high-level constructs (raw mathematical calculations, for instance), or it may be necessary to integrate directly with a C library because it performs functions that are unavailable in pure Ruby.
Our Motivation
Here at Stuart, we wanted to integrate with a C library called H3, released and maintained by Uber. H3 uses hexagons to manage geospatial data (but more on that later).
The project was less than a year old when we started out and there were no Ruby bindings available. C isn't a go-to language for most of the engineering team, but we're always keen to learn, so we decided the best way forward was to build the bindings ourselves.
Aut inveniam viam aut faciam! (Find a way, or make one!) - Hannibal
Getting Started
There are two common approaches to working with C from Ruby:
- Write a native extension directly in C and use MKMF to manage it;
- Bind directly with an existing C component using FFI.
We'll briefly discuss MKMF with a self-contained example before taking a deeper dive into the world of FFI and how we've used it to integrate H3 support into our Ruby applications.
NOTE: It's assumed that you have a working knowledge of Ruby and a cursory understanding of C. If you've never even looked at C, take a quick look at this primer before continuing.
Part One: Native Extensions with Make Makefile
Most commonly when interfacing with C from Ruby, people reach for MKMF, a core feature of the Ruby standard library. The convention is to have a subdirectory in your gem directory called ext, which contains your C code. This naming helps clarify that the code within is not Ruby like the rest of the codebase.
This directory also contains a file called extconf.rb, which is responsible for generating a Makefile. In turn, the Makefile is used to compile the C code into a format that Ruby can call functions on directly (known as a "Native Extension").
MRI Ruby is itself written in C, so its header file provides a lot of helpful macros and functions to marshal data back and forth between Ruby and C.
An example
Let's work through an example (full source code is here).
This is a typical gem project layout with the addition of the ext subdirectory containing our C. Let's start with the C code, then work our way out to the supporting Ruby files.
Making sense of the C
There's quite a lot going on here!
We begin by including the ruby.h and math.h header files. The Ruby header contains a lot of useful type definitions and macros that we'll be needing, and the math header contains standard mathematical functions.
The method declaration for c_method takes two arguments, self and rb_val. The first argument is always the Ruby receiver object (whether this is an object, class, or module); it has to be passed explicitly because there are no objects in C.
You'll notice also that the arguments are of type VALUE. This is the C type Ruby uses to refer to any Ruby object; under the hood it is an unsigned integer wide enough to hold a pointer. In Ruby we don't care about explicit typing, but since C is statically typed we need the VALUE type in order to have syntactically valid C.
However, we do need to convert the given VALUE to something more specific before we can do anything useful with it. This is where the NUM2DBL macro comes in. Assuming that the given argument is a Ruby Numeric value, NUM2DBL converts it into a C-friendly double precision float. Once we're finished doing our maths, we then convert the double back to a Ruby Float using DBL2NUM and return the result back to Ruby.
Finally, the Init_XYZ() function is a "magic" function invoked by Ruby to initialise our C code, where XYZ is the name of our C module, i.e. "example". To initialise, we define our module using rb_define_module and we use rb_define_singleton_method to set up our c_method method, linking it to the actual C function and indicating that we expect only one argument.
The equivalent code in Ruby would look something like this:
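The original listing is not reproduced here, so the following is only a sketch of what a pure-Ruby counterpart might look like. It assumes, purely for illustration, that c_method returns the square root of its argument; the module and method names follow the text, while the body is a guess.

```ruby
# Hypothetical pure-Ruby counterpart of the C extension described above.
# Assumption: the C function takes the square root of its argument.
module Example
  # Defined on the module itself, mirroring rb_define_singleton_method.
  def self.c_method(value)
    # Float() turns an Integer or Rational into a double-precision Float,
    # much as NUM2DBL does on the C side.
    Math.sqrt(Float(value))
  end
end

puts Example.c_method(16)
```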
So that's it for the C! Let's figure out the rest of the support files in Ruby.
Support Files
We'll start with the all-important extconf.rb file.
MKMF can do a lot of things to set up a Makefile, but most of the time the simplest case (as above) does all we need. We just give it the ext directory name and file name combo and it figures out the rest, provided our C file correctly defines that magic function and is syntactically valid.
Other features, such as loading third-party libraries before the C code is linked, setting up additional headers, and other configuration, are available through various other MKMF methods if needed.
In order to compile our C code into a binary object that Ruby can use, we use the rake-compiler gem. By setting up a Rake task to handle compilation, we simply run rake compile and our C code is packaged up into the lib/example directory, ready to be called alongside our Ruby code.
Building native extensions on gem install
In the .gemspec file, you can provide a path to the extconf.rb file and it will build your C extension when the gem is installed. Just set the extensions property in your spec and gem install handles the rest.
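As a rough sketch, a gemspec wired up this way might look like the following. The gem name, version, author and file paths are assumptions chosen to match the layout described above; only the extensions line is the part that matters here.

```ruby
# Sketch of a gemspec for the example gem. Name, version, author and file
# paths are placeholders, not taken from the real project.
spec = Gem::Specification.new do |s|
  s.name       = 'example'
  s.version    = '0.1.0'
  s.summary    = 'Example gem with a native C extension'
  s.authors    = ['Example Author']
  s.files      = ['lib/example.rb', 'ext/example/example.c', 'ext/example/extconf.rb']
  # Tells RubyGems to run this extconf.rb and build the extension on install.
  s.extensions = ['ext/example/extconf.rb']
end

puts spec.extensions.inspect
```

In a real .gemspec file the specification is simply the last expression in the file; it is assigned to a variable here only so it can be inspected.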
And that's about all we'll cover for MKMF.
Once you have the basic structure, you can do whatever you wish with C code and expose each new function by adding another rb_define_singleton_method line inside the Init function.
If you need to learn more, check out the official documentation and also take a look at Ruby's header file to see what functions and macros are available.
Part Two: The Foreign Function Interface
Rather than write C directly within the project directory, the FFI gem makes it possible to directly call functions inside compiled libraries via the C library libffi.
This approach differs significantly from MKMF:
- Since the intention is to call functions directly within existing binaries, there is no need to compile any native extensions (except for the FFI gem itself when it gets installed);
- You don't need to write any C, just FFI's DSL for invoking C functions and building/accessing structs and data collections;
- It's compatible with non-MRI Rubies like JRuby (if you use MKMF then MRI is your only available Ruby);
- There is extra protection from injury: FFI handles memory management for you, so memory allocated through an FFI object is released when that object gets garbage collected. More on this later.
We'll discuss FFI usage via examples from the bindings we wrote at Stuart for H3.
But first, let's explain some more details about H3 so the upcoming code samples make sense in context.
What is H3?
From Uber's own literature:
The H3 geospatial indexing system is a multi-precision hexagonal tiling of the sphere indexed with hierarchical linear indexes.
In a nutshell, H3 is a way of identifying any part of the surface of the earth with a unique hexagon. Each hexagon has a unique identifier, known as the H3 index, and a resolution level. At the coarsest resolution level (level 0), it's possible to cover Earth with 110 hexagons and 12 pentagons (think of the way a football is stitched together!). These top-level cells are known as the base cells.
Each of these hexagons can then be broken into seven smaller hexagons. Repeating the process recursively down to the smallest supported resolution allows indexing to the square-metre level of accuracy. In total, there are over 600 trillion unique H3 indexes on Earth.
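The total quoted above can be sanity-checked with a little arithmetic. This sketch assumes, as we understand the H3 documentation, that there are always exactly 12 pentagons at every resolution and that a pentagon has only six children (five hexagons and one pentagon) rather than seven; the constant and method names are ours.

```ruby
# Counts H3 cells per resolution under the assumptions stated above:
# 7 children per hexagon, 6 per pentagon (5 hexagons + 1 pentagon).
PENTAGONS = 12
BASE_HEXAGONS = 110
MAX_RESOLUTION = 15

def cells_at(resolution)
  hexagons = BASE_HEXAGONS
  # Each step down: every hexagon yields 7 hexagons, every pentagon yields 5.
  resolution.times { hexagons = hexagons * 7 + PENTAGONS * 5 }
  hexagons + PENTAGONS
end

total = (0..MAX_RESOLUTION).sum { |res| cells_at(res) }
puts cells_at(0)
puts total
```

Summing every resolution from 0 to 15 gives a total comfortably above 600 trillion, in line with the figure quoted.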
So, of all shapes, why hexagons?
Well, there are only three polygons that tessellate regularly: the equilateral triangle, the square, and the regular hexagon. Hexagons are unique in that each neighbouring hexagon is the same distance away (if you measure from the centre). Triangles and squares don't have this property.
As shown below, a triangle has 12 neighbours at 3 differing distances, a square's 4 diagonal neighbours are further away than its 4 remaining neighbours, and a hexagon's 6 neighbours are all equidistant from its centre.
So why is this useful? Well, Uber uses H3 for surge pricing by tracking rides in realtime, bucketing them to their containing hexagons, and dynamically adjusting prices based on supply and demand in those regions.
Hexagons were an important choice because people in a city are often in motion, and hexagons minimize the quantization error introduced when users move through a city.
If you want to learn more, check out this talk by Joseph Gilley:
Really cool stuff; major respect to Uber for open-sourcing this! Read the Uber Engineering H3 page to learn more.
Building the bindings
Ok, enough hexagons! Back to Ruby bindings.
We started out with the MKMF approach when we began writing our Ruby bindings for H3. It worked reasonably well but resulted in quite a lot of boilerplate. The decision was made to move over to FFI and take advantage of the DSL and automatic memory management.
Let's get our hands dirty with FFI by wrapping a simple function.
The H3 library defines a function called h3ToParent, which takes an H3 index and a resolution, then returns the parent H3 index at the given resolution, i.e. the larger hexagon which contains the given hexagon.
NOTE: These examples are purely demonstrative and may not work when executed in isolation. We encourage you to read the H3 Ruby source code on GitHub for a full working example, boilerplate included.
After telling FFI to load the H3 library, we use attach_function to allow the method to be called from Ruby.
- The first argument* gives the name we want to use when calling the method (so snake-case instead of camel-case);
- The second argument is the actual name of the function in the C library so FFI can find it;
- The third argument is an array of types which informs FFI of the argument types we expect to be passing in (in order);
- The last argument is the expected type of the return value.
This kind of approach has a lot of advantages!
It's easy to read, and we push the burden of validating/converting data types down to FFI itself so we can focus on the details (if we pass a string where it's expecting an integer, it will raise ArgumentError for us).
*It's also possible to call attach_function without this alias, so the function can be called with its original camel-case name.
Custom Types
FFI also allows us to work with types in a more fine-grained way. There's a typedef method that functions as an alias, e.g.
typedef :ulong_long, :h3_index
Now we can talk about H3 indexes instead of unsigned long longs when calling the attach_function method.
Complex Custom Types
We can also build up more complex types using FFI's DataConverter module.
Let's say we want to validate the int argument (which represents a resolution) to ensure the value is within an acceptable range of resolutions, i.e. 0 to 15 inclusive.
Now we can use the Resolution class in our attach_function definitions and we'll get validation errors if the number is out of range!
Passing structs to C functions
Simple functions with only native types for arguments and return values are pretty straightforward to integrate. But what about functions that expect to receive a pointer to a struct as an argument?
Well, FFI has us covered there, too.
Let's wrap a function, geoToH3, which expects a GeoCoord struct containing a pair of latitude/longitude coordinates.
In this method, we tell FFI to expect a pointer to a GeoCoord struct. We set one up using the DSL, populate the struct's fields and then pass it right in.
You'll notice we wrap the geoToH3 function within a Ruby method, geo_to_h3, rather than giving it a snake-case alias and calling it directly. This allows Ruby calling code to pass a 2-element array of degree coordinates, rather than needing to care about building a struct with radian coordinates.
So that's passing structs in. How about a function that returns a struct?
The h3ToGeo function does the inverse of geoToH3 and returns a GeoCoord struct with coordinates corresponding to the given H3 index.
There's something subtle at play here.
The C function is declared to return void, i.e. it doesn't return anything! This is weird territory as a Ruby programmer: we pass in a GeoCoord struct as an argument by reference and the h3ToGeo function updates the struct's contents rather than returning us a fresh struct.
So, why is this?
Well, C libraries are often written in this way, and it's to make the client code 100% responsible for memory management. If the function returned a pointer to a new struct, then the function would be responsible for allocating its memory on the heap. This means the client code would then be responsible for eventually freeing it later. This half-and-half responsibility can result in memory leaks, and also takes control away from the client code regarding how memory gets allocated in the first place.
Manipulating memory that the client code is responsible for is preferable, so that's what good C library developers do. If you're curious, read more about this (and other minimalist C approaches) here.
Pointers, Memory, and Arrays
At this point, let's familiarise ourselves with a crucial difference between C and Ruby: how memory is allocated and managed.
Ruby is like staying in a hotel room where you needn't concern yourself with cleaning up or cooking since you have house-keeping and room-service to look after you.
You simply create objects using .new and get on with your life, with no need to worry about allocating memory (just ask for ice cream and it will be brought to you).
Similarly, you don't (usually) need to care about destroying objects when you're finished with them. This is because Ruby's garbage collector keeps an eye on your allocated objects and destroys them for you once nothing references them any more (housekeeping will get rid of that used ice cream dish).
By contrast, C is being stuck home alone. If you don't clean up, things get messy; if you don't cook, you don't eat; if you're not careful, you'll get burned.
Using Pointers with FFI
Let's take a look at a more involved example where we have to concern ourselves with memory allocation. Thankfully, FFI makes this as painless as possible.
The h3ToString function takes an H3 index in numerical form and converts it to the equivalent hexadecimal representation.
Now our Ruby is beginning to resemble C!
We use FFI's MemoryPointer class to initialise a piece of memory for us. We tell it that we expect the contents to be of type char, and that there will be a maximum of 17 characters (16 hexadecimal digits plus a null terminator character to indicate the end of the string has been reached). This allows FFI to calculate precisely how many bytes of memory it will need to allocate.
Once our memory buffer is ready, we pass the pointer into h3ToString, along with the expected size. C populates the memory for us, then on the Ruby side we use FFI::MemoryPointer#read_string to read all the characters until the null terminator character is encountered.
This approach does add a little overhead to writing client code, particularly when you have nested structs that need initialising, or if C returns a nested struct that you need to iterate over. However, it has a nice benefit in that MemoryPointer objects are automatically garbage collected (taking the allocated C memory with them). This frees us from the obligation to manually release memory so (hopefully) we don't get any leaks.
Building nested structs
Finally, let's consider using pointer arithmetic to build nested structs.
The GeoFence struct contains a pointer to an array of GeoCoord structs plus an integer count of how many structs are in the array. This allows arbitrary-shaped regions to be described.
This struct requires a bit more legwork to build!
The Ruby method build_geofence accepts an array of coordinate pairs. We set num_verts to be the size of this array, and we use FFI::MemoryPointer to initialise enough memory to hold that many GeoCoord structs.
Now the tricky part.
The memory is reserved, but currently empty. We need to fill it with GeoCoord structs and then populate the lat/lon values for each. The work here is done by GeoCoord.new(ptr + i * GeoCoord.size).
In this case, ptr is pointing to the memory we initialised to hold our array of structs. FFI memory pointers support pointer arithmetic, so ptr + x will calculate the memory location that is x bytes further on from the first location.
FFI structs can accept a memory location as an argument when initialised, so we pass it the correct memory location via .new.
In the first loop iteration i is zero, so the first GeoCoord struct is initialised with the memory location at the beginning of the memory region referenced by ptr. In the second iteration i is one, so the offset is 1 * GeoCoord.size. This allows the second GeoCoord struct to slot in right after the first. By the end, we have an array of structs referenced by the coords variable but held in contiguous memory by FFI and referenced by ptr.
Now we iterate over the given list_of_coords and populate the structs' fields with the lat/lon values, converted to radians.
Finally, we set the GeoFence struct's verts field to equal the ptr variable, and we're all set.
Phew!
Putting it all together
Running C code from Ruby can seem daunting at first, but it's well worth the investment of learning how to do it. As well as getting a glimpse of how Ruby works behind the scenes, C extensions and bindings give you powerful options and broaden the horizons of your application's abilities.
There aren't the same safety nets that come with pure Ruby, but with careful testing and good coding practices the pitfalls are manageable, and FFI does a great job of protecting the programmer.
When deciding which approach to take, remember:
- Use MKMF ideally when you need full control of the C code, the amount of code needed is small, and you don't need to support non-MRI Rubies;
- If you don't want to write C directly, have a large separate existing project, want to target other Rubies, or need to use a precompiled/third-party C library, FFI really shines.
And there we are: with a bit of exploration and experimentation, we've extended our Ruby tooling so the whole engineering team can enjoy the power of H3 hexagons!
Further reading
- Check out the Ruby bindings for H3; much of the example code is drawn directly from them;
- Read the Uber H3 literature for all the details of how it works;
- The h3 header file is the best reference for the function signatures H3 makes available;
- This wonderful explanation of all the Ruby C macros for a full reference of what you can do with MKMF;
- The Ruby Guide for writing a Native Extension;
- Kim Burgestrand's post on advanced topics in FFI;
- Kernighan & Ritchie's classic, "The C Programming Language";
- What Is It About Bees And Hexagons?;
- Some pointers from XKCD.