Moving from Inheritance to Composition in Django

A Not-So-Simple Transition

Rowan Hale
Skilljar Engineering
10 min read · Jun 5, 2019


By Rowan W. Hale, a Software Development Engineer at Skilljar

Come, gather ‘round the fire and let me tell you a story of a headache past — a problem annoying enough that even months later I find it hard to think about it without subconsciously shaking my head. My hope is that in hearing about my past follies, you can save yourself a headache or two in the future. And if not, maybe you’ll still learn a little bit about data migrations or object-oriented design or Django. Let’s get started.

It was 2019, and in the wee months of the year, I had picked up some work to add more functionality to one of our online learning systems. This included adding a new data model, one that was intrinsically related to an existing model. The relationship was very much parent-child, so my CS instincts kicked in and I decided to use inheritance to make this new data model a child of the existing one.

I started my dev work, went through a design review, and got things into a workable but not-yet-finished state within a week or so. At this point, I had a conversation with a colleague about how we had done some similar work in the past, and noted that we had chosen to use composition, rather than inheritance, to model the relationship between related objects. I considered that it might be worth switching over to composition, but a few things stopped me:

  1. For this particular case, inheritance seemed to be a better logical model
  2. We had to meet a deadline that was fast-approaching
  3. I was 90% of the way done with the work and didn’t want to start over

After I had the changes going through review and the deadline was a day or two away, that same colleague and I had another conversation. It was at this point that they convinced me it was worth using composition for this model — largely to keep our internal decisions consistent and since the downsides were minimal, if any. The deadline was too close to pivot, so I talked with the product manager and decided to keep it as-is so that we could meet the deadline. I then committed to making some changes to transition from an inheritance model to a composition model for the following week.

It’s worth taking a moment to talk about the data-model-level differences between the inheritance and composition models. In the inheritance model, Django is using multi-table (as opposed to single-table) inheritance, meaning that each child object has an entry in the child table, which has a reference ID field that links the entry in the child table to the entry in the parent table. In this model, the parent table contains only data that is defined in the parent class, and the ID in the parent entry is the ID of the overall object (the child table has no unique ID field of its own; it just references the parent entry for joining purposes). In the composition model, the child object still has its own table, but it also has a unique ID field for each entry, in addition to a field that references the parent object (which is now a foreign key, rather than part of the same object). See the following diagram for reference.
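To make the two layouts concrete, here is a minimal sketch of the tables using Python's built-in sqlite3 module. The table names (course, livecourse_inherited, livecourse_composed) and the starts_at column are hypothetical stand-ins, not our real schema, and SQLite is only used so the snippet runs anywhere. The course_ptr_id column follows the naming Django uses for the parent link in multi-table inheritance, and parent_id is the explicit foreign key described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL
    );

    -- Inheritance (multi-table): the child row has no id of its own.
    -- Its primary key is the reference to the parent row.
    CREATE TABLE livecourse_inherited (
        course_ptr_id INTEGER PRIMARY KEY REFERENCES course (id),
        starts_at TEXT NOT NULL
    );

    -- Composition: the child row has its own id plus an explicit foreign key.
    CREATE TABLE livecourse_composed (
        id INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL REFERENCES course (id),
        starts_at TEXT NOT NULL
    );
    """
)


def column_names(table):
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


inherited_columns = column_names("livecourse_inherited")
composed_columns = column_names("livecourse_composed")
print(inherited_columns)
print(composed_columns)
```

The only structural difference is the extra id column on the composed table, which is exactly the column that turned out to be so painful to add later.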

Given that Django used multi-table inheritance, it appeared that we would already have the object-specific data in its own table. Theoretically, this meant that the work should only involve severing the implicit connection between the two tables, and instead making it explicit (by adding the “parent_id” foreign key). Sounds simple, right? Well, it wasn’t. For a number of reasons. And the resulting work ended up taking over a week and spanned four different deployments. See, I’m already shaking my head again. Ugh. Let me outline and discuss the various reasons that things went sideways:

1. Dropping inheritance in Django is non-trivial

For those who aren’t aware, Django keeps track of your data model in two ways:

1) by looking at the Python definition of your model and

2) by building the model back up from scratch by applying all of your data model migrations in order.

It’s very easy to drop inheritance from the Python model (just stop inheriting from a parent class), but the migration-based model wasn’t so simple. When Django creates a parent-child relationship, it adds a specific line to its migration, adding the optional “bases” parameter to its migration’s CreateModel call to specify what the parent class is. However, Django provides NO WAY of manipulating this concept of “bases” after a model has been created. The result is that we could remove inheritance in the Python model, but Django would still think that the models were related when building its migration-based view of the world. This wouldn’t necessarily be a problem, but….

2. … Django doesn’t allow field overrides on child classes

If your parent class has a field “foo”, your child class cannot have its own field “foo”. It can still inherit from the parent, but the child can’t redefine what that field name means for it. The result of this is that when breaking the Python-level inheritance and generating a Django migration (so that you can migrate your database), Django tries to add an “id” field to the child class. We realized this is because the child class is no longer using the “id” of its parent (again, remember this is multi-table inheritance under the hood), so the child class now needs an “id” field of its own.

BUT Django’s migration-based model still thinks that the child class has a parent, and since it won’t allow field overrides on child classes, it won’t allow the child to have its own “id” field. So Django is unable to resolve this, and needs manual intervention in the form of … TIME TRAVEL!

The only way to work through this issue is to go back to the original migration that set up the parent-child relationship, and comment out the line of the migration’s CreateModel call that specified the “bases” parameter — in effect, making Django forget that it ever defined a parent-child relationship. With this gaslighting done, inheritance could finally be broken! That is, if it weren’t for the fact that …

3. … Django improperly deals with adding AutoFields after a model’s initial creation

This meant that even after time traveling and gaslighting Django so that we could add the “id” field to the child class (without Django thinking we’re overriding a parent’s field), generating and running a valid migration caused issues. Specifically, when adding a new field to an existing model, the field either has to be nullable or has to have a default value. AutoFields by nature can’t be nullable, so we need to provide a default when adding them.

BUT, since AutoFields are going to define their own default when adding the sequence to the database, you end up in a funky position:

  • If you don’t explicitly define a default for the AutoField, Django won’t let you generate a migration …
  • But if you DO define a default, Django will successfully generate a migration, but error out when applying the migration, because it now thinks it has two default values:
      1. The one we’ve explicitly defined, and
      2. The one that Django chooses when creating the database sequence.

Ok, so adding an AutoField to an existing table can’t be done in a single step, but we can just define the field as an IntegerField with a default. Then, in a subsequent migration, we can change the field to an AutoField and drop the explicit default. Annoying, but workable — right? Wrong! You may have dealt with the “two defaults” problem, but you forgot to deal with the problem of…

4. Django has issues when setting the database sequence for an AutoField that is morphed from a field with a default

After going through the previous migration steps and then trying to create new instances of this child object, I was getting errors that the “id” field was null when it shouldn’t be. But how could this be? We explicitly said the field couldn’t be null and we then provided a default, so why was this field null?

I decided to take a look at the generated SQL for the migration that changed the field from an IntegerField to an AutoField. In the SQL, Django was creating a new sequence and then setting the default for “id” as the next value in the sequence. Sounds reasonable. But there was a line at the end of the SQL that I wasn’t expecting …

The final SQL command dropped the default for the “id” field, because moving from an IntegerField to an AutoField dropped the “default” param in the Python model definition. Once again, Django was getting confused by the differences between the Python models and the database models. Luckily, you can add arbitrary SQL as part of a migration, so I was able to add a line to re-set the default for the “id” field as the next value of the previously-generated sequence.

So after all of this, I had finally moved from an inheritance model to a composition model, but at the cost of a significant time investment, as well as some of my sanity. But let’s take a step back and consider what lessons I learned from this experience:

1. Be conscientious when considering using inheritance in Django

Inheritance in Django already has its problems (such as capping the depth of inheritance at 2 levels), but this experience pretty firmly placed me in the camp of “avoid inheritance in Django at all costs”. Inheritance is always going to be tempting to pursue because it’s such a natural logical model for so many relationships. However, in Django specifically, I would caution you to consider if the model would make more (or even just as much) sense using a composition model, ESPECIALLY if you think that you would ever change a model from inheritance to composition.

2. Make sure to allocate enough time for tasks and favor over-estimating time costs

Part of why there was such a hesitance to switch from an inheritance model to composition, even after talking to my colleague, was because of the timeline. The deadline was approaching, and it was unlikely that we’d have enough time to pivot. So we took the safe route and continued with what we had been doing to ensure that we met our deadline. If we had a bit more time, it would have been a much easier decision to pivot.

While I don’t necessarily think our original time estimates were bad, there was some ambiguity around trying out an inheritance model since our models were entirely composition-based up until that point. When faced with ambiguity, I recommend over-estimating time costs to ensure you can plan against the worst case. Running short on time (especially for customer-facing hard deadlines) can easily lead to a cycle of people making bad decisions. Bad decisions back people into corners, and people who are backed into corners make bad decisions.

3. Sunk-cost fallacy is a pain in the ass

One other major reason for my hesitation to switch to using composition was that I had already done most of the work for the inheritance implementation. Even if we had more time before the deadline, it’s likely that the investment I had already put forth would have stopped me from abandoning the completed work. This is a textbook example of the Sunk Cost Fallacy at work: a bad decision seems much better than it is, simply because resources have been poured into it already.

If you ignore the sunk cost, it becomes clear that continuing on the original path requires a small amount of work, but results in a bad solution. Alternatively, switching, while requiring a larger amount of work, would result in a much better solution. Even though it requires more work, most people would push for the stronger solution in this case (assuming the volume of extra work isn’t prohibitive and there aren’t other restrictions like an approaching deadline).

And thus ends my tale. Hopefully you will learn from my mistakes and now better understand some of the issues associated with Django’s migration process and internal data models.

Interested in learning more about Skilljar? Check us out at https://www.skilljar.com/. We’d love to chat with you if you’re interested in joining the team. You can learn more about open positions at https://www.skilljar.com/about/careers/
