Reverse engineer a Python object
I faced an interesting challenge at work the other day. I felt like sharing because it might save a few hours for others, or reveal some insights about the Python internals.
There was a function which takes an object (let’s call it Foo) as an argument. I knew how the serialized object looks like, I had it in JSON format. Setting up the whole Python environment and provisioning the database looked like too much trouble, so I decided to restore the original object based on the JSON response. There must be a way, right?
It was pretty straightforward to parse the JSON as an object (of objects). What the following piece of code does it basically recursively turns each dict of the JSON into an object where the properties serve as the keys of the original dict. (Credits to this SO answer)
So far so good, but there was a surprise in the function I wanted to run:
Wait a minute. So this code expects a dictionary in some cases.. What to do now? I was sure about one thing: I don’t want to implement my own JSON parser. Of course I could write a more elaborate object_hook callback function instead of that lambda, but it felt like I’d micromanage things that way. Why not parse it as it is, then align it for the requested format? It’s easy to convert a named tuple to a dict with the ._asdict() call. So I tried
foo.bar.bazs = foo.bar.bazs._asdict()
But of course it raised an AttributeError, since namedtuples are immutable.
Ok, so I googled “mutable namedtuple” and this SO answer suggested me to use recordclass. How cool is this? It overcomes the standard library’s limitation. The only downside is that it’ll be another dependency. Well not quite the only one, as we’ll see. I tried to run the original function:
And it turns out the keys are meant to be objects, not just the ids of those objects. Luckily my JSON response also contains a list of baz objects. So I could just create a dict from them:
This raises an error saying that type: ‘X’ is unhashable. What is hashability? From the Python Glossary:
An object is hashable if it has a hash value which never changes during its lifetime (it needs a
__hash__()method), and can be compared to other objects (it needs an
__cmp__()method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their
So this is not an Excalibur. Namedtuples are immutable for a reason. their hash value is based on their property value’s hash value, and it never changes. Note: Namedtuples are only hashable if their property values are hashable. See:
(The dict object is mutable, therefore it’s not hashable => the namedtuple is not hashable).
It is possible to compare recordclass objects, and it works nicely, however the hash value is for the object’s entire lifetime. For instance you expect
to work, but that’s granted by the namedtuple’s immutable nature. The two separate instances point to the same value.
I could have somehow automagically combine these mutable and immutable data structures together, in the json.loads’s object_hook, but again, it seemed like too much work.
I recently finished a book called Fluent Python and I remembered there was a code recipe for a nametuple like object:
so this returns an object, which is hashable by default (according the glossary), and mutable.
I tried to run the code, no luck as it lacks of the method _asdict(). I could quickly fix this by adding the following method:
This is all good, made my code run. Of course, the hash function is not working as expected. But that’s not too hard to fix. Since most of my objects had an id property, we can say that two objects are equal if their id’s are the same. If there’s no id, we can still rely on the Python object id:
Of course this implementation won’t account for property changes, but we can make this assumption that two instances are equal if they have the same id (or object id).
I hope you enjoyed this reading. If something’s unclear or I missed something, feel free to reach out to me.