Fiddling with Ruby’s Fiddle

I enjoy playing around with Ruby’s internals so I can see how things really work under the hood. One of the great, GREAT (did I mention great?) features of Ruby is Fiddle, an extension that ships with the standard library. It lets you use Ruby to dig into the raw memory behind Ruby objects, which means you can use irb to do some super cool stuff. I have to thank Aaron Patterson (@tenderlove) for giving an excellent talk at a Ruby meetup in Vancouver and showing us some pretty cool stuff with Fiddle, which gave me the inspiration to play around with it.

While playing around with Fiddle, one of the things I learned was that you can get the actual pointer value of an object by taking its object id and shifting it one bit to the left. This gives you the pointer (or memory location) of the Ruby object. Note that this only holds on MRI before 2.7; from 2.7 on, object_id is no longer derived from the address.

str = "Hello world!"
str_ptr_int = str.object_id << 1
=> 140561723786320 # or wherever your machine put your string in memory
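On Rubies where the object id no longer maps to an address, Fiddle can hand you the address directly. Here is a minimal sketch, assuming MRI; object_address and object_at are helper names I made up, while Fiddle.dlwrap and Fiddle.dlunwrap are the real Fiddle calls doing the work:

```ruby
require 'fiddle'

# Hypothetical helper: the address of a Ruby object as an Integer.
def object_address(obj)
  Fiddle.dlwrap(obj)
end

# Hypothetical helper: the Ruby object living at a given address.
def object_at(address)
  Fiddle.dlunwrap(address)
end

str  = "Hello world!"
addr = object_address(str)

puts addr                         # some large integer, different on every run
puts object_at(addr).equal?(str)  # the round trip returns the very same object
```

The round trip is a handy sanity check before you start reading raw memory at that address.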

Given that we can get actual pointer values, Fiddle gives us some nice ways of using pointers so that we can peek inside the raw data of an object. One of the classes we will use is Fiddle::Pointer, which wraps a raw address. We can check that it holds the address we computed by doing this:

require 'fiddle'
str_ptr = Fiddle::Pointer.new(str_ptr_int)
str_ptr.to_i == str_ptr_int
=> true
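A Fiddle::Pointer can also be cast back into the object it points at, which is a nice way to convince yourself the address is right. A small sketch, again assuming MRI; pointer_to is a made-up helper name:

```ruby
require 'fiddle'

# Hypothetical helper: wrap an object's address in a Fiddle::Pointer.
# Assumes MRI, where Fiddle.dlwrap returns the address of the object.
def pointer_to(obj)
  Fiddle::Pointer.new(Fiddle.dlwrap(obj))
end

str     = "Hello world!"
str_ptr = pointer_to(str)

puts str_ptr.to_i == Fiddle.dlwrap(str)  # the pointer wraps the same address
puts str_ptr.null?                       # a live object never sits at address 0
puts str_ptr.to_value.equal?(str)        # casting back gives the original object
```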

Now that we have an actual pointer object, we can grab blocks of memory and examine the data. If you take a look at the C struct definition for RString, you will notice that (like every Ruby object) it begins with a struct RBasic. RBasic contains the flags and a pointer to the class of the object (which of course is an object itself). We can get both the flags and the pointer to the class by doing:

r_basic = str_ptr[0, Fiddle::SIZEOF_LONG*2]
_, klass_ptr_int = r_basic.unpack("Q2")
klass_ptr = Fiddle::Pointer.new(klass_ptr_int)
klass_ptr.to_value
=> String

The above code reads as many bytes as two longs occupy (since a VALUE is an unsigned long) and unpacks them as two unsigned 64-bit integers. The first value is the flags (which I’m ignoring), and the second is the pointer to the class object, which predictably turns out to be String.
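Here is the same header read wrapped up as a self-contained sketch. It assumes MRI, where RBasic starts with two pointer-sized fields (flags, then klass); klass_from_header and WORD are names I invented, and I use the "J" directive (pointer-width unsigned integer) so the sizes match on any platform:

```ruby
require 'fiddle'

WORD = Fiddle::SIZEOF_VOIDP  # bytes in one VALUE on this platform

# Hypothetical helper: read the class stored in an object's RBasic header.
# Assumes the MRI layout: flags first, klass second.
def klass_from_header(obj)
  ptr    = Fiddle::Pointer.new(Fiddle.dlwrap(obj))
  header = ptr[0, WORD * 2]                 # raw bytes of flags + klass
  _flags, klass_addr = header.unpack("J2")  # two pointer-width unsigned integers
  Fiddle.dlunwrap(klass_addr)               # turn the class address into the class
end

puts klass_from_header("Hello world!").inspect
```

Only pass heap-allocated objects such as strings or arrays; small integers, symbols, nil and friends are encoded in the VALUE itself and have no header to read.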

If you take another look at RString, you’ll see that the next block of data is a union. Depending on the size of the string, this union is either a struct holding the length of the string and a pointer to the string data, or an embedded character array holding the string data itself. What I’ve learned while using Fiddle with strings is that short strings are stored directly inside the string object, and their length is calculated accordingly. Let’s test this:

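# assumption: the embedded buffer holds three longs minus one byte (23 on a 64-bit build)
EMBEDDED_LENGTH = Fiddle::SIZEOF_LONG*3 - 1
str_value_data = str_ptr[Fiddle::SIZEOF_LONG*2, EMBEDDED_LENGTH]
=> "Hello world!\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00"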
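You can see the effect of embedding without touching raw memory at all by asking ObjectSpace how much memory a string accounts for. This is only a rough sketch: reported_size is a made-up helper, the exact numbers and the embedding threshold vary between Ruby versions, and all I am assuming is that a short string reports less than a much longer one:

```ruby
require 'objspace'

# Hypothetical helper: bytes Ruby attributes to this string.
def reported_size(str)
  ObjectSpace.memsize_of(str)
end

short = "Hello world!"
long  = "hello world!" * 200

puts reported_size(short)  # typically just the object slot
puts reported_size(long)   # larger: a bigger slot or a separate buffer
```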

Or for a bigger string, let’s get the string size according to where it is in the RString object:

str = "hello world!" * 20
# … get the str pointer like we did before
str_size_data = str_ptr[Fiddle::SIZEOF_LONG*2, Fiddle::SIZEOF_LONG]
str_size = str_size_data.unpack("Q")[0]
str_size == str.size
=> true

And the value:

str_value_data = str_ptr[Fiddle::SIZEOF_LONG*3, Fiddle::SIZEOF_LONG]
str_value_ptr_int = str_value_data.unpack("Q")[0]
str_value_ptr = Fiddle::Pointer.new(str_value_ptr_int)
str_value_ptr.to_s == str
=> true
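If you just want a pointer to a string’s bytes without walking the struct yourself, Fiddle::Pointer[] will give you one. The sketch below reads the data back through a bare address; read_bytes is a hypothetical helper, and nothing here depends on the RString layout:

```ruby
require 'fiddle'

# Hypothetical helper: read `length` raw bytes starting at `address`.
def read_bytes(address, length)
  Fiddle::Pointer.new(address)[0, length]
end

text     = "hello world!" * 20
data_ptr = Fiddle::Pointer[text]   # points at the string's character data

puts data_ptr.size == text.bytesize                     # Fiddle records the length
puts read_bytes(data_ptr.to_i, 5)                       # first five bytes
puts read_bytes(data_ptr.to_i, text.bytesize) == text   # the whole string
puts data_ptr.to_s == text                              # to_s reads up to the first NUL byte
```

Passing an explicit length is the safer habit, since to_s without one depends on a terminating NUL byte being present.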

This is incredibly cool, and suffice it to say you can gain some powerful insight into the internals of Ruby by using Ruby itself. But reading is only half the fun. What if we changed the value of the size?

str_ptr[Fiddle::SIZEOF_LONG*2, Fiddle::SIZEOF_LONG] = [10].pack("Q")
str.size == 10
=> true
p str
=> "hello worl"
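The write itself is nothing more than copying packed bytes to an offset. Here is a sketch of that mechanism on a scratch buffer instead of a live object, so a wrong offset cannot corrupt anything; overwrite_word is a name I made up:

```ruby
require 'fiddle'

# Hypothetical helper: store a 64-bit unsigned integer at `offset` and
# read it straight back through the same pointer.
def overwrite_word(ptr, offset, value)
  bytes = [value].pack("Q")              # 8 bytes in native byte order
  ptr[offset, bytes.bytesize] = bytes    # copy them into memory
  ptr[offset, bytes.bytesize].unpack("Q").first
end

# 16 bytes of scratch memory, released by Ruby when the pointer is collected.
buf = Fiddle::Pointer.malloc(16, Fiddle::RUBY_FREE)

puts overwrite_word(buf, 0, 10)  # → 10
puts overwrite_word(buf, 8, 42)  # → 42
```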

Or what if we changed the class of the object?

# Fixnum only exists on older Rubies (it was removed in 3.2)
fixnum_class_ptr_data = [Fixnum.object_id << 1].pack("Q")
str_ptr[Fiddle::SIZEOF_LONG, Fiddle::SIZEOF_LONG] = fixnum_class_ptr_data
str.class == Fixnum
=> true

Pretty awesome if you ask me.