Ruby’s Inspect Considered Harmful

Originally posted on the Agworld Developers Blog, hosted on Tumblr, on June 4, 2012.

Jason Hutchens
The Magic Pantry
May 23, 2022

--

If you’ve ever worked with C++ professionally then you’ve most likely read Scott Meyers’ wonderful Effective C++, a collection of specific ways of improving the code you write.

Consider, for example, Item 27: Minimise Casting. Lots of in-depth detail, followed by a few rules to follow when writing new code, or reviewing code written by your colleagues. I’ve worked at more than one place that derived their coding standards, and company hairdo, directly from Scott’s book.

I’ve often thought that there should be a similar collection of wisdom for Ruby. If there were, I’m sure it’d begin like this:

Item 1: Implement Inspect for all Non-Trivial Classes

All Ruby classes derive from Object, which provides to_s and inspect. Both return a string representation of the object instance. General wisdom is that to_s is for displaying the instance to the user, whereas inspect is for displaying the instance to the developer. After all, puts calls to_s, and irb calls inspect to display the return value of issued commands.
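
As a quick illustration of that split, here is a minimal sketch. The Point class is hypothetical and exists only for this example; string interpolation is used because it calls to_s, just as puts does.

```ruby
# Hypothetical class, used only to illustrate the to_s / inspect split.
class Point
  def initialize(x, y)
    @x = x
    @y = y
  end

  # Intended for the end user.
  def to_s
    "(#{@x}, #{@y})"
  end

  # Intended for the developer.
  def inspect
    "#<Point x=#{@x} y=#{@y}>"
  end
end

point = Point.new(1, 2)
user_view = "#{point}"          # interpolation calls to_s, as puts does
developer_view = point.inspect  # what irb shows for a return value
```

Here user_view is the friendly form and developer_view is the diagnostic one.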

If you don’t implement to_s yourself, the default implementation will be used, which simply returns a string containing the class name and the address of the instance:

VALUE
rb_any_to_s(VALUE obj)
{
    const char *cname = rb_obj_classname(obj);
    VALUE str;

    str = rb_sprintf("#<%s:%p>", cname, (void*)obj);
    OBJ_INFECT(str, obj);

    return str;
}

For example:

$ irb
>> Object.new.to_s
=> "#<Object:0x104a4b898>"

If you do implement to_s, it’s probably because you want to format the instance for display to the end user. But, in general, you probably don’t normally implement to_s, do you? You normally build up the string representation of the object manually, at the point where it’s needed. Right?

What’s more interesting is the default version of inspect. It uses the default version of to_s if there aren’t any instance variables. If there are, however, then it uses the class name and the address of the instance followed by a generated string:

static VALUE
rb_obj_inspect(VALUE obj)
{
    extern int rb_obj_basic_to_s_p(VALUE);

    if (TYPE(obj) == T_OBJECT && rb_obj_basic_to_s_p(obj)) {
        int has_ivar = 0;
        VALUE *ptr = ROBJECT_IVPTR(obj);
        long len = ROBJECT_NUMIV(obj);
        long i;

        for (i = 0; i < len; i++) {
            if (ptr[i] != Qundef) {
                has_ivar = 1;
                break;
            }
        }

        if (has_ivar) {
            VALUE str;
            const char *c = rb_obj_classname(obj);

            str = rb_sprintf("-<%s:%p", c, (void*)obj);
            return rb_exec_recursive(inspect_obj, obj, str);
        }
        return rb_any_to_s(obj);
    }
    return rb_funcall(obj, rb_intern("to_s"), 0, 0);
}

The generated string results from calling inspect on all of the instance variables recursively, with care taken to short-circuit the recursion to prevent infinite loops by emitting an ellipsis wherever an object that’s already part of the string being built is detected:

static VALUE
inspect_obj(VALUE obj, VALUE str, int recur)
{
    if (recur) {
        rb_str_cat2(str, " ...");
    }
    else {
        rb_ivar_foreach(obj, inspect_i, str);
    }
    rb_str_cat2(str, ">");
    RSTRING_PTR(str)[0] = '#';
    OBJ_INFECT(str, obj);

    return str;
}
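
The short-circuit is easy to observe from Ruby itself. The sketch below uses a hypothetical Node class that points at itself; the exact address will differ on every run, so only the shape of the string matters.

```ruby
# Hypothetical class with a single instance variable.
class Node
  attr_accessor :peer
end

node = Node.new
node.peer = node     # the object now refers to itself

# The default inspect renders the outer object, then detects the
# repeat and emits " ..." instead of recursing forever.
text = node.inspect
```

The result looks something like #<Node:0x... @peer=#<Node:0x... ...>>, with the inner ellipsis marking where the recursion was cut off.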

Astute readers will note that I skipped a third case: if a custom version of to_s has been defined, then inspect will use that instead. I glossed over that because this implementation is peculiar to 1.9.x, and differs from the 1.8.x behaviour which, according to Matz in this Ruby issue, is due to return in 2.x and beyond.

In either case, you can see how it works in this example (note the presence of @bar in the stringified return value of foo, the last command issued):

$ irb
>> class Foo; attr_accessor :bar; end
=> nil
>> foo = Foo.new ; foo.bar = 5 ; foo
=> #<Foo:0x1076fb398 @bar=5>

The built-in implementation of inspect may all seem well and good, and useful for debugging. It is handy to see the instance variables. But there’s a lot of custom C code in the Ruby interpreter to support it (I’ve omitted quite a bit), and you’d think it would be easy to leave inspect out of the core language altogether, given how easy it is to build a clone in pure Ruby. In fact, that’s precisely what awesome_print does.
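
To give a feel for how little Ruby is needed, here is a minimal sketch of such a clone. The PlainInspect module and its plain_inspect method are names invented for this example, object_id stands in for the memory address, and none of this is meant to reflect how awesome_print is actually implemented.

```ruby
# Hypothetical module: a rough pure-Ruby imitation of the default inspect.
module PlainInspect
  # `seen` records objects currently being rendered so that cycles
  # produce " ..." rather than infinite recursion.
  def plain_inspect(seen = {})
    head = "#<#{self.class}:0x#{object_id.to_s(16)}"
    return "#{head} ...>" if seen[object_id]

    seen[object_id] = true
    parts = instance_variables.map do |name|
      value = instance_variable_get(name)
      rendered = value.is_a?(PlainInspect) ? value.plain_inspect(seen) : value.inspect
      "#{name}=#{rendered}"
    end
    seen.delete(object_id)

    parts.empty? ? "#{head}>" : "#{head} #{parts.join(', ')}>"
  end
end

class Foo
  include PlainInspect
  attr_accessor :bar
end

foo = Foo.new
foo.bar = 5
summary = foo.plain_inspect   # shaped like "#<Foo:0x... @bar=5>"

foo.bar = foo                 # introduce a cycle
looped = foo.plain_inspect    # inner object is rendered as " ..."
```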

And what about really complex classes? If the string returned by inspect is many screens long, then it’s not really all that useful for debugging, is it?

All good points. But there’s another, insidious drawback to the default inspect that leads to severe performance issues. Both NameError and NoMethodError call inspect when constructing their exception message, and there’s simply no way to monkeypatch around that. Observe:

static VALUE
name_err_mesg_to_str(VALUE obj)
{
    VALUE *ptr, mesg;
    TypedData_Get_Struct(obj, VALUE, &name_err_mesg_data_type, ptr);

    mesg = ptr[0];
    if (NIL_P(mesg)) return Qnil;
    else {
        const char *desc = 0;
        VALUE d = 0, args[NAME_ERR_MESG_COUNT];

        obj = ptr[1];
        switch (TYPE(obj)) {
          case T_NIL:
            desc = "nil";
            break;
          case T_TRUE:
            desc = "true";
            break;
          case T_FALSE:
            desc = "false";
            break;
          default:
            d = rb_protect(rb_inspect, obj, 0);
            if (NIL_P(d) || RSTRING_LEN(d) > 65) {
                d = rb_any_to_s(obj);
            }
            desc = RSTRING_PTR(d);
            break;
        }
        if (desc && desc[0] != '#') {
            d = d ? rb_str_dup(d) : rb_str_new2(desc);
            rb_str_cat2(d, ":");
            rb_str_cat2(d, rb_obj_classname(obj));
        }
        args[0] = mesg;
        args[1] = ptr[2];
        args[2] = d;
        mesg = rb_f_sprintf(NAME_ERR_MESG_COUNT, args);
    }
    if (OBJ_TAINTED(obj)) OBJ_TAINT(mesg);
    return mesg;
}

Now, that’s an awful lot of C, but the sinister lines are the following two, from around the middle of the previous code sample:

d = rb_protect(rb_inspect, obj, 0);
if (NIL_P(d) || RSTRING_LEN(d) > 65) {

This calls inspect on the object and THROWS AWAY THE RESULT IF IT’S LONGER THAN SIXTY-FIVE CHARACTERS SORRY FOR SHOUTING!

Now, seriously, OMGWTF. You are going to hit that 65-character limit really easily, with the simplest class that contains a couple of instance variables. And what happens when inspect exceeds 65 characters? Well, in that case, NameError and NoMethodError just fall back to using the default implementation of to_s.

Sheesh.
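
To see how little it takes to cross that threshold, consider this small sketch. The Customer class and its values are made up for the example; the exact length depends on the address the interpreter prints.

```ruby
# Hypothetical class holding three short strings.
class Customer
  def initialize(name, email, plan)
    @name = name
    @email = email
    @plan = plan
  end
end

customer = Customer.new("Ada Lovelace", "ada@example.com", "standard")

# The default inspect includes the class name, the address and every
# instance variable, so it grows quickly.
description = customer.inspect
over_limit = description.length > 65
```

Three modest instance variables are already enough to push the default inspect past the limit.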

Exception in View Freezes Rails

Which brings us around to the insidious bug that I investigated when I first joined Agworld.

Every so often, in development and staging and production, our server would grind to a halt, chewing up CPU and allocating memory by the bucketload. The only solution was to hard kill the process. We found a bug report that seemed relevant and started contributing (although at the time the bug title was something that suggested WEBrick and Sprockets as the culprit, so it wasn’t as obvious as it is nowadays).

That bug was very active, and it was clear that other developers working on large-scale Ruby apps were suffering. After a few late nights, we tracked it down to NameError and NoMethodError, together with the fact that calling inspect on core Rails objects resulted in very long strings being built, especially if you’re running a large app with lots of routes and classes cached in memory.

Literally the following was happening: A hapless developer would make a typo in a view. They’d visit the erroneous page in their browser. Ruby would throw a NameError. Something up the call-chain would access the message accessor of the exception object. Ruby would call inspect on the object that threw the exception. Half an hour would pass as a massive string was constructed, as practically every object in the Ruby heap was stringified and concatenated. Ruby would calculate the length of the resulting string, determine that 193852374 or whatever is a bit more than 65, throw away the string that it had spent the last 30 minutes constructing, and finally call to_s on the object instead.
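
The wasted work can be made visible with a counter. This is only a sketch: the Heavy class is hypothetical, its inspect is a cheap stand-in for an expensive one, and the count you see depends on your interpreter. The Ruby versions discussed here consult inspect when the message is read, whereas newer releases changed how these messages are built, so the counter may well stay at zero there.

```ruby
# Hypothetical class whose inspect is long and counts its own calls.
class Heavy
  @inspect_calls = 0

  class << self
    attr_accessor :inspect_calls
  end

  # Stand-in for an expensive inspect: far longer than 65 characters.
  def inspect
    self.class.inspect_calls += 1
    "#<Heavy " + ("x" * 10_000) + ">"
  end
end

begin
  Heavy.new.no_such_method
rescue NoMethodError => error
  # On the interpreters described in this post, reading the message
  # is what triggers the call to inspect.
  message = error.message
end

puts "inspect was called #{Heavy.inspect_calls} time(s)"
puts message
```

Whatever the count, the long string never appears in the message, which is the whole problem: any work done to build it is discarded.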

Insanity.

We monkeypatched Rails to prevent all of this from happening, as follows:

module ActionDispatch
  module Routing
    class RouteSet
      alias inspect to_s
    end
  end
end

That is, we found that ActionDispatch::Routing::RouteSet was the gateway to the explosive inspect, and nipped things in the bud by redefining inspect at that point. The result for the hapless developer was an instant, useful error message and backtrace.
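
The same trick can be tried outside Rails. In this sketch the Registry class is hypothetical and merely stands in for any object that holds a lot of state.

```ruby
# Hypothetical class that holds a large collection.
class Registry
  def initialize
    @entries = Array.new(1_000) { |i| "entry-#{i}" }
  end
end

registry = Registry.new
before = registry.inspect.length   # grows with the number of entries

# Reopen the class and point inspect at the default to_s.
class Registry
  alias inspect to_s
end

after = registry.inspect           # now just "#<Registry:0x...>"
```

After the alias, inspect costs the same no matter how much the object contains.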

The fact that it took 7 months for this fix to finally make it into Rails 3.2.3 stable (and yet be mysteriously absent from the changelog) is another story.

Overall, though, the wisdom is this: you probably don’t want to use the built-in inspect. You probably should be using awesome_print when debugging (and you can even wire up irb to use it). Really long inspect strings aren’t that useful in log files. So why not patch Object#inspect with something more useful, or at least implement inspect on non-trivial classes by hand?
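
A hand-written inspect needn’t be elaborate. Here is one possible shape, using a hypothetical RouteTable class: summarise the state instead of dumping it.

```ruby
# Hypothetical class wrapping a large list of routes.
class RouteTable
  def initialize(routes)
    @routes = routes
  end

  # Report how much is held rather than rendering every element.
  def inspect
    "#<#{self.class.name} routes=#{@routes.size}>"
  end
end

table = RouteTable.new((1..10_000).map { |i| "/pages/#{i}" })
summary = table.inspect
```

The summary stays the same short length whether the table holds ten routes or ten thousand.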

Even better would be if the default version of inspect imposed an upper limit on the length of the generated string, or accepted parameters for maximum string length and recursion depth. Better still would be if the interpreter never used it internally; I really can’t fathom the wisdom of name_err_mesg_to_str preferring inspect to to_s.

Item 2: Prefer << to + When Concatenating Strings

Nah, I kid, I’m no Scott Meyers. But, seriously, you should hardly ever be using + for string concatenation. Think of the heap!
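
For the curious, the difference is easy to see by watching object identity. A minimal sketch:

```ruby
base = String.new("foo")            # an explicitly mutable string
original_id = base.object_id

base << "bar"                       # appends in place
same_object = base.object_id == original_id

combined = base + "baz"             # allocates a brand-new String
new_object = combined.object_id != base.object_id
```

Every + leaves another string behind for the garbage collector, whereas << keeps working on the one you already have.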

This article was originally posted on the Agworld Developers Blog, hosted on Tumblr, on June 4, 2012, and is almost certainly out of date by now.
