Objective C, Encoding and You

Objective C is a dynamically typed programming language that allows full runtime introspection as well as runtime modification. This means that you can examine, modify, and create classes, methods, instance variables (ivars), properties etc. while your program is running.

For example, the following code snippet creates a new Objective C class at runtime called “MyClass” which overrides the -description method to return @"Hello" .

Creating an Objective C class at runtime

<objc/runtime.h> contains all the functions needed to add ivars, class methods, properties and more should you want to create a more fully featured class.

Note that as far as the runtime is concerned, there is zero difference between ivars and properties declared in @interface declarations and @implementation declarations. The only difference is visibility as far as the compiler is concerned.

You can imagine a system where you defined your Objective C classes in a series of structures, and then looped through them to register them with the Objective C runtime at startup. Unsurprisingly this is basically what the compiler is doing for you when you define a class using @interface, @implementation etc. If you glance through <objc/runtime.h> you will see a number of structures such as struct objc_class which are for the most part what the compiler is generating for the runtime to register.

What I want to focus on in this article is how Objective C encodings work. Line 10 of the code snippet above has

class_addMethod(myClass, @selector(description), methodIMP, "@@:")

which is adding a method -description to myClass that has an implementation defined by methodIMP, but what is the "@@:"? According to the documentation for class_addMethod it is:

* @param types An array of characters that describe the types of the arguments to the method.

which is nice, but you need to look in the “Objective-C Runtime Programming Guide” to find some actual documentation about these characters. Take a minute to peruse the linked page on ‘Type Encodings’ because it’s a good primer on how encodings work and what the @encode directive does. Unfortunately the table of encodings is more than a bit out of date (e.g., how are block pointers encoded?), so for the real story we need to go to the clang source code for ASTContext::getObjCEncodingForTypeImpl to discover fun things, such as block pointers being encoded as @?, and that there are more than a few “historical/compatibility” issues with encodings. I expect some of those TODOs have existed a long time…

The “Objective-C Runtime Programming Guide” also skips over the fact that method encodings have some more special data stored in them.

Method blah = class_getInstanceMethod([NSObject class], 
@selector(init));
NSLog(@"%s", method_getTypeEncoding(blah));

will print @16@0:8. The first @ represents the return type, the second @ represents self, and the : represents _cmd. But what about the numbers?

Digging into the Objective C Runtime sources we find that the first number in a method encoding is the stack size of the arguments to the method, and the numbers after the argument types are offsets in memory to the values represented by those types. In this case we have 2 arguments, self and _cmd, both of which are 8 byte pointers on a 64 bit system, thus a 16 byte stack size, and offsets of 0 and 8 for the arguments.

Looking at a more complex method like

- (BOOL)writeToURL:(NSURL *)url atomically:(BOOL)useAuxiliaryFile encoding:(NSStringEncoding)enc error:(NSError **)error;

from NSString, we get a method encoding of c44@0:8@16c24Q28^@36. We can deduce that this is a method that returns a BOOL, has a stack size of 44 (8 for self + 8 for _cmd + 8 for url + 4 for useAuxiliaryFile + 8 for enc + 8 for error). self is at offset 0, _cmd is at offset 8, url is at offset 16, useAuxiliaryFile is at offset 24, enc is at offset 28 and error is at offset 36. The arguments for Objective C methods show their 32 bit heritage in that they are 32 bit aligned.
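To make the layout above concrete, here is a minimal C++ sketch that splits a method encoding into its return type, stack size, and per-argument offsets. The names (EncodedArg, MethodEncoding, parseMethodEncoding) are hypothetical, and the parser assumes simple type codes only: a struct or template name containing digits would confuse it, which the runtime's own parser handles properly.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// One argument slot in a method encoding: its type code and byte offset.
struct EncodedArg {
  std::string type;  // e.g. "@", ":", "c", "Q", "^@"
  int offset;
};

struct MethodEncoding {
  std::string returnType;
  int frameSize = 0;  // total stack size of the arguments
  std::vector<EncodedArg> args;
};

// Reads the run of non-digit characters starting at pos (a type code).
static std::string readType(const std::string& s, std::size_t& pos) {
  std::size_t start = pos;
  while (pos < s.size() && !std::isdigit(static_cast<unsigned char>(s[pos]))) {
    ++pos;
  }
  return s.substr(start, pos - start);
}

// Reads the run of digits starting at pos as a non-negative integer.
static int readNumber(const std::string& s, std::size_t& pos) {
  int value = 0;
  while (pos < s.size() && std::isdigit(static_cast<unsigned char>(s[pos]))) {
    value = value * 10 + (s[pos] - '0');
    ++pos;
  }
  return value;
}

// Hypothetical helper: splits "<return type><stack size>(<type><offset>)*".
// Assumption: type codes contain no digits (true for the examples in the text).
MethodEncoding parseMethodEncoding(const std::string& encoding) {
  MethodEncoding result;
  std::size_t pos = 0;
  result.returnType = readType(encoding, pos);
  result.frameSize = readNumber(encoding, pos);
  while (pos < encoding.size()) {
    EncodedArg arg;
    arg.type = readType(encoding, pos);
    arg.offset = readNumber(encoding, pos);
    result.args.push_back(arg);
  }
  return result;
}
```

Feeding it c44@0:8@16c24Q28^@36 yields a return type of c, a stack size of 44, and six argument slots, the last being ^@ at offset 36.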

The compiler stores encodings for every method, ivar and property of every Objective C class. Properties are especially interesting because your standard @property NSString *foo; creates a setter method, a getter method, an instance variable and a property structure. This means four different encoding strings (setter: v24@0:8@16, getter: @16@0:8, instance variable: @"NSString", property attributes: T@"NSString",&,V_foo).

As a side note, remember that the runtime also records strings for the method names and instance variable names so you also get foo, setFoo:, and _foo, and that since the language is dynamic, none of this can be dead-stripped by the linker. The Objective C runtime gets pretty verbose.

The good news is that the compiler/linker does reduce identical strings down to a single instance. So all of the NSString* properties in the same mach-o segment (for most cases ‘mach-o segment’ is semantically equivalent to ‘application’, but if you have fat binaries, or multiple frameworks you have multiple segments) will share the same encoding string for their setters, getters and instance variables respectively. In fact all methods that return void and take a single id argument will share the same encoding string (simple setters), as will all methods that return a single id and have zero arguments (simple getters). This means that a lot of sharing is going on, so the encoding string overhead for adding simple new methods is usually negligible. Note that since the name of the property is encoded in the property string, there is much less sharing going on for properties unless you happen to have the exact same name and exact same type for a lot of properties in different classes.

Since they don’t appear to be documented anywhere but the sources, here are the encoding characters for properties:
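As a rough illustration of how a property attribute string such as T@"NSString",&,V_foo breaks down, here is a minimal C++ sketch. The attribute codes in the map are the ones I understand the runtime to use (they also appear in the “Declared Properties” chapter of Apple's runtime guide); treat the list as an assumption to check against the clang sources. The function names are hypothetical, and the naive comma split only works for simple types, since C++ template names can contain commas themselves.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Assumed meaning of the leading character of each comma-separated attribute.
const std::map<char, std::string>& propertyAttributeNames() {
  static const std::map<char, std::string> names = {
      {'T', "type encoding"},
      {'V', "backing ivar name"},
      {'R', "readonly"},
      {'C', "copy"},
      {'&', "retain (strong)"},
      {'N', "nonatomic"},
      {'G', "custom getter selector"},
      {'S', "custom setter selector"},
      {'D', "@dynamic"},
      {'W', "weak"},
      {'P', "eligible for garbage collection"},
  };
  return names;
}

// Hypothetical helper: turns "T@\"NSString\",&,V_foo" into readable entries.
// Assumption: the type encoding itself contains no commas.
std::vector<std::string> describePropertyAttributes(const std::string& attributes) {
  std::vector<std::string> out;
  std::istringstream stream(attributes);
  std::string item;
  while (std::getline(stream, item, ',')) {
    if (item.empty()) continue;
    const std::map<char, std::string>& names = propertyAttributeNames();
    std::map<char, std::string>::const_iterator it = names.find(item[0]);
    std::string name = (it != names.end()) ? it->second : "unknown";
    std::string value = item.substr(1);  // whatever follows the code character
    out.push_back(value.empty() ? name : name + ": " + value);
  }
  return out;
}
```

For the NSString *foo property above, this produces three entries: the type encoding @"NSString", the retain flag, and the backing ivar name _foo.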

So aside from showing that properties may add some bloat to your application, and that encodings are a relatively arcane historical feature of Objective C, why do we care about encodings?

Structs.

Standard C struct encoding by itself is relatively benign.

struct aStruct {
int foo;
long bar;
NSString *bam;
};

encodes as {aStruct=iq@}. Interestingly, if you want to save a little space, you can do

typedef struct {
int foo;
long bar;
NSString *bam;
} aStruct;

and it will skip the name encoding and encode as {?=iq@}.

What happens if we start nesting structs such as:

struct bStruct {
struct aStruct struct1;
struct aStruct struct2;
};

@encode(bStruct) gives us

{bStruct={aStruct=iq@}{aStruct=iq@}}

which is starting to get a little more verbose, but still doesn’t seem worth writing this article about. Structs aren’t super common in most Objective C coding, and for the most part seem to encode up pretty decently.

What happens if we start considering Objective C++?

class aClass {
int foo;
long bar;
NSString *bam;
aClass();
~aClass();
int doSomething(long foo);
};

encodes to

{aClass=iq@}

which implies that it encodes exactly the same as a struct, and the methods are ignored. Nesting classes works the same as nesting structs.

What about templates? Something relatively simple like a std::map of std::string to std::string.

@encode(ObjCType) expands to:

{ObjCType=#{CppType={map<std::__1::basic_string<char>, std::__1::basic_string<char>, std::__1::less<std::__1::basic_string<char> >, std::__1::allocator<std::__1::pair<const std::__1::basic_string<char>, std::__1::basic_string<char> > > >={__tree<std::__1::__value_type<std::__1::basic_string<char>, std::__1::basic_string<char> >, std::__1::__map_value_compare<std::__1::basic_string<char>, std::__1::__value_type<std::__1::basic_string<char>, std::__1::basic_string<char> >, std::__1::less<std::__1::basic_string<char> >, true>, std::__1::allocator<std::__1::__value_type<std::__1::basic_string<char>, std::__1::basic_string<char> > > >=^{__tree_end_node<std::__1::__tree_node_base<void *> *>}{__compressed_pair<std::__1::__tree_end_node<std::__1::__tree_node_base<void *> *>, std::__1::allocator<std::__1::__tree_node<std::__1::__value_type<std::__1::basic_string<char>, std::__1::basic_string<char> >, void *> > >={__tree_end_node<std::__1::__tree_node_base<void *> *>=^{__tree_node_base<void *>}}}{__compressed_pair<unsigned long, std::__1::__map_value_compare<std::__1::basic_string<char>, std::__1::__value_type<std::__1::basic_string<char>, std::__1::basic_string<char> >, std::__1::less<std::__1::basic_string<char> >, true> >=Q}}}}}

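Whoa. That’s over 1KB (1242 bytes to be exact) of encoding for the one type. If I make a property out of this class I will get almost 5KB worth of runtime data, because the type shows up in each of the four encoding strings (setter, getter, instance variable and property attributes)!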

In fact I started digging into this because I found a C++ class that we had wrapped in an Objective C object in our code that was adding 39.4 KB worth of data to the runtime. It was by far the longest string constant we had in our app.

So what can be done if you want to use Objective C wrappers around C++ code (perhaps to make it available to Swift while we wait for Swift C++ interop?) and don’t want to pay the runtime price?

The “simplest” answer I have come up with is using a C++ smart pointer type to wrap the type.

For example, using std::unique_ptr for our type above expands to

{ObjCType=#{unique_ptr<CppType, std::__1::default_delete<CppType> >={__compressed_pair<CppType *, std::__1::default_delete<CppType> >=^{CppType}}}}

In most cases I use std::unique_ptr to own the pointer. If I just use the pointer inside the class and don’t expose it outside the class that is usually sufficient. In the cases where I need to pass it outside my object (or I want to pass it around inside my object by naked pointer for some reason) I created the stupidest “smart” pointer ever which I call objc_metadata_hider_ptr.

Most C++ gurus would laugh at this code, but you need objc_metadata_hider_ptr because if you pass a naked pointer type to any Objective C method you are going to get the type info back in your application. Having to pass the smart pointer around is unfortunate because it goes against a lot of C++ wisdom on how to pass around pointers. You are fine to pass the naked type to C/C++ functions, of course.

The Objective C runtime considers C++ pointers and C++ references to be the same thing as far as encodings are concerned. Foo& and Foo* both end up encoded as ^{Foo={…}}. Unfortunately objc_metadata_hider_ptr doesn’t work with C++ references. Luckily there is objc_metadata_hider_ref that does…
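The reference variant could be sketched along the same lines. Again this is a hypothetical reconstruction rather than the original code: it stores the address of the referenced object internally and converts back to T& on demand, on the assumption that the nested pointer is encoded only by name.

```cpp
#include <memory>

// Illustrative reconstruction (assumption): wraps a C++ reference so that
// Objective C method signatures see a small struct instead of Foo&.
template <typename T>
class objc_metadata_hider_ref {
 public:
  // Implicit on purpose, so a T& can be passed where the wrapper is expected.
  objc_metadata_hider_ref(T& ref) : ptr_(std::addressof(ref)) {}

  T& get() const { return *ptr_; }
  operator T&() const { return *ptr_; }  // converts back to a real reference

 private:
  T* ptr_;  // never null: it is always initialized from a reference
};
```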

You can use objc_metadata_hider_ptr anywhere you would normally pass a naked pointer. It works fine in properties, method calls, etc., and should basically compile away to nothing when optimized.

In the 39.4 KB example I referenced above, the encoding went down to 0.2 KB with a simple change to unique_ptr.

Luckily the compiler will prevent you from doing something dumb like declaring a @property with a std::unique_ptr. If it didn’t, the first time you accessed the value using self.foo your class would lose ownership of the pointer.

Unfortunately the smart pointer solution doesn’t help us in the case where we want to have a complex template directly in our Objective C class though. For example:

@interface Foo {
std::unique_ptr<std::map<std::string, std::string>> bar;
}
@end

is still going to blow up in size. The best solution I have found here is an intermediate struct

struct CppType {  
std::map<std::string, std::string> myMap;
};

and then wrapping that in a C++ smart ptr. I made it a struct (instead of a class) so that myMap was public by default. It’s not pretty, but it works.

Note that you don’t need a struct per C++ type in your Objective C class. You could have a single struct that wraps all of your C++ types.
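As a sketch of that single-struct approach, the C++ side could look something like the following. The names CppMembers, counts and makeMembers are hypothetical, and the second member is there purely to show several types sharing one wrapper.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical wrapper: every C++ member the Objective C class needs lives
// here, so only the short struct name should reach the runtime encodings.
struct CppMembers {
  std::map<std::string, std::string> myMap;
  std::vector<int> counts;  // illustrative second member
};

// Hypothetical factory, e.g. to be called from -init. The Objective C class
// would then declare a single ivar: std::unique_ptr<CppMembers> _members;
std::unique_ptr<CppMembers> makeMembers() {
  return std::unique_ptr<CppMembers>(new CppMembers());
}
```

Access then goes through the pointer, as in _members->myMap["key"], which costs one extra indirection in exchange for the smaller encoding.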

The real kicker to all this is that a lot of this runtime encoding information isn’t really used anymore. I doubt the operating system is doing anything interesting with 5K worth of C++ template spew for an instance variable other than knowing that it is there. Perhaps an __attribute__((generic_objc_encoding)) that set the encoding to ? for specific types would be a useful optimization. (I found this bug, which seems relevant; its reporter apparently came up with a solution similar to mine.)

Final Note

Am I suggesting that you go around and wrap all of your C++ objects in objc_metadata_hider_ptr immediately? No. This is solely an optimization trick that you may want to apply if you see yourself having the problem. The easiest way to look for it is to run strings Foo.app/Foo | grep -e '{' and see what pops up. You may be surprised.

You may also want to track https://bugs.llvm.org/show_bug.cgi?id=39888 which is a clang-tidy check for watching for large Objective C encodings.