Since 1.0, Django has supported model inheritance. It’s a neat feature, and can go a long way towards increasing flexibility in your modeling options.
However, model inheritance also offers a really excellent opportunity to shoot yourself in the foot: concrete (multi-table) inheritance. If you’re using concrete inheritance, Django creates implicit joins back to the parent table on nearly every query. This can completely devastate your database’s performance.
To refresh, if you’ve got models like:
    class Person(Model):
        name = CharField()
        ...

    class Manager(Person):
        department = CharField()
        ...
That’s concrete inheritance. Django creates two tables: a Person table and a Manager table. Since the name field only exists on the Person table, every time you look up a Manager, Django has to generate a join against the Person table to get the Manager’s name. These joins tend to be “hidden” (they’re created automatically), which means that what look like simple queries often aren’t.
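To make the hidden join concrete, here’s a rough sketch that mimics the two-table layout using only Python’s standard sqlite3 module rather than Django itself. The table and column names (person, manager, person_ptr_id) are assumptions modelled on the kind of schema multi-table inheritance produces; your actual table names will depend on your app.

```python
import sqlite3

# Hypothetical schema, modelled on the layout multi-table inheritance
# produces: the child table holds only its own fields plus a pointer
# to the parent row.
SCHEMA = """
CREATE TABLE person (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE manager (
    person_ptr_id INTEGER PRIMARY KEY REFERENCES person (id),
    department TEXT NOT NULL
);
"""


def load_managers(conn):
    # "name" lives only on person, so reading a manager needs a join.
    return conn.execute(
        "SELECT p.id, p.name, m.department "
        "FROM manager AS m "
        "INNER JOIN person AS p ON m.person_ptr_id = p.id "
        "ORDER BY p.id"
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Alice')")
conn.execute(
    "INSERT INTO manager (person_ptr_id, department) VALUES (1, 'Engineering')"
)

concrete_rows = load_managers(conn)
print(concrete_rows)
```

With Django doing this for you, the join never shows up in your own code, which is exactly why it’s easy to miss.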
If, on the other hand, you’ve got models like:
    class Person(Model):
        name = CharField()
        ...

        class Meta:
            abstract = True

    class Manager(Person):
        ...
That’s abstract inheritance and Django only creates a single table, the Manager table. Any model that subclasses Person will have the Person’s fields copied onto the child object. This means that looking up a Manager doesn’t require an extra join.
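For comparison, here’s the same kind of rough sketch for the abstract case, again using plain sqlite3 instead of Django. The table and column names are assumptions for illustration; the point is only that the parent’s fields sit directly on the child’s table, so there’s nothing to join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: with an abstract parent there is no person table;
# the inherited "name" column is copied onto the manager table itself.
conn.execute(
    "CREATE TABLE manager ("
    "    id INTEGER PRIMARY KEY,"
    "    name TEXT NOT NULL,"
    "    department TEXT NOT NULL"
    ")"
)
conn.execute(
    "INSERT INTO manager (id, name, department) VALUES (1, 'Alice', 'Engineering')"
)

# Only one table exists in the database.
table_names = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# Looking up a manager is a plain single-table SELECT.
abstract_rows = conn.execute(
    "SELECT id, name, department FROM manager ORDER BY id"
).fetchall()

print(table_names)
print(abstract_rows)
```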
In nearly every case, abstract inheritance is a better approach for the long term. I’ve seen more than a few sites crushed under the load introduced by concrete inheritance, so I’d strongly suggest that Django users approach any use of concrete inheritance with a large dose of skepticism.
Comments:
Not to mention that with concrete inheritance, serialization will only include the child class's fields, not the parent's. That bit me recently when trying to inherit from django.contrib.comments.models.Comment.
If this is a gotcha, perhaps we should issue a warning? :-)
As the former lead dev of The Texas Tribune, I can say we used concrete inheritance without any issues, and it made life a lot easier than having polymorphic relationships between our various content types. We had a base "content" class with several subclasses (story, blog post, video, etc.) and several hundred thousand records. We were seeing page load times of 1-2 seconds on our home page, un-cached, and it was pulling around 100-150 records with various fk and m2m relationships, running on EC2.
Given that I'm not a dba, it would seem there are performance implications both ways. Are generic foreign keys that much less performance intensive? I think you would end up with about the same number of joins using that methodology too, and it makes getting to your subclass objects more difficult imo than just being able to have a foreign key to the base class.
Recommendations?
Kindest regards, Brandon