Turning the Tables: How to Get Along with your Object-Relational Mapper
I have a confession to make: I love SQL. It’s older than me, but it’s still the best query language in common use by wide margin. For a while in the early aughts it seemed like it was being supplanted, but it has resurged and now even the newest most high-tech databases on the planet use it. It’s an example of what’s right in software development.
Most web apps don’t rely on straight SQL however. They hide SQL behind a curtain called an object-relational mapper, or ORM. ORMs aren’t inherently evil. Treated carefully, they can even be your friend. Unfortunately most of the best practices concerning ORMs advocate a usage that’s, at best, suboptimal, and at worst, downright dangerous.
Many MVC frameworks (especially Ruby on Rails) advocate a usage of ORMs that causes more problems than it solves. I call it database denial, and it’s unfortunately common.
Of all the mistakes made in ORM design, the most egregious is the Active Record pattern. In this style of ORM, a class called BlogPost is both a representation of a database table called blog_post and a domain model. A domain model is just a class that represents something in the real world. For example, a class called User is a domain model representing a real-life user.
If you read enough about object-oriented design patterns, this sounds like good practice. It’s a single class responsible for holding both data and the methods that use that data. This is encapsulation, one of the three main tenets of OOP. In web apps though, what looks like a good idea is actually a trap. It’s the cause of several profound but fixable problems.
- God Classes
There are some model classes that inevitably get used everywhere. Usually these are things like User, Product, and Comment that are your most commonly accessed tables and hold the data that most of your domain logic operates on. In any application of non-trivial size, these will grow to hundreds or even thousands of lines of code if you strictly follow the conventions of most frameworks, which advocate skinny controllers and fat models. This makes refactoring intimidating and makes your programs hard to predict. Seeing dozens of methods next to each other with no implicit understanding of how they’re related and what routes they’re used by makes forming a coherent understanding of your program impossible, and makes on-boarding new developers slow and frustrating.
2. Would you like a stub with that query?
Putting your domain logic in your model typically leads to mixing that logic with the database queries for that table. In the most extreme cases, a single model can encapsulate domain logic, database queries, database transactions, sending emails, enqueuing background jobs and a dozen of other things. This is an egregious violation of of the single responsibility principle.
It’s also a disaster for testing. Domain logic is the part of your program that most benefits from unit tests. If it’s mixed with queries, you have to stub those queries in your tests, which makes testing hard, time-consuming and dissuades developers from testing at all. If you can tease apart your domain logic and your queries and put them in separate classes, the classes with the domain logic become embarrassingly testable. Make the classes with your domain logic pure (no side effects) and testability comes naturally.
3. Where do I JOIN?
One of the primary benefits of an ORM is that it abstracts away the details of querying. There’s one problem with that: querying is the most important thing your web app does, and it shouldn’t be abstracted away. If you don’t think about SQL when you use and ORM, you won’t use JOINs and your querying will be an inefficient mess. “But Brad, I can write code faster this way”. Yes, but the moment people start to use your site in significant numbers, your app won’t scale, your page loads will take ten seconds and you’ll spend half your development budget on DevOps solutions to the bottlenecks in your terrible code.
It’s not hard to write SQL friendly code, you just have to use architectures that encourage it. The great misconception about ORMs is that if you use them to generate efficient SQL, it slows you down. In practice, with just a basic understanding of good SQL, it’s not slower at all. For that to be true, however, your class design must be optimized for it, and most major MVC frameworks aren’t built with that goal.
4. View Insanity
In typical MVC best practices, a controller instantiates a model and passes it to a view. This is pulling the pin on the hand-grenade. Views use the properties on the models, which, if you’re not careful, fire more queries. Querying from the view layer is the most damning violation of Separation of Concerns imaginable. In the worst case scenario, a single request might trigger queries in the Model, View and Controller layers. When you do this, it’s impossible to reason about what queries are fired during the course of a single request. What if a query throws an error? There’s no elegant way to recover in a view. Even worse, you’ve now fallen into the trap of N+1 queries, which is what happens when you access a property that fires a query on every item in a list. Good bye reasoning about code. So long gracefully recovering from errors. Adios performance. Hopefully you’ll have time for a rewrite before you have to scale that architecture.
All of these problems are symptom of the same sickness: database denial. This is what happens when the goal of your OOP design is to hide the fact that your app is powered by a relational database. This doesn’t work because ORMs are a leaky abstraction. They can’t be used properly without an understanding of what they’re doing under-the-hood.
Using objects that abstract away the details of querying indeed means you can create a web app without knowing SQL, but that should not be the goal. Mixing domain logic with querying, in particular, leads to problems far beyond just bad SQL, as we’ve already shown. Among the justifications for database denial are:
- I can build things faster
The problem with faster is that eventually it slows you down. You can’t eliminate complexity, you can only sweep it under the rug to be tackled later. When it’s unavoidable that you lift up the rug and fix the problems you’ve created, the result will be more wasted time than you originally saved. In the mean time you’ll have unhappy customers because your pages will take forever to load. Your code will also be buggy because database denial leads to poor testability (see problem #2 above). Fat models and haphazard querying from multiple layers also means INSERTs and UPDATEs can’t be grouped together in a single transaction, increasing the risk that bad data will sneak its way into your database.
If this is a school project or a personal blog, fine. But most people I know who build software intend it to be used by thousands, or millions of people. At that scale, you can’t deploy code that doesn’t optimize querying or you risk not being able to scale your database. If your primary consideration is building in a hurry, this isn’t as bad as it sounds. Once you choose an architecture that’s database friendly and get used to thinking in queries, you can develop just as fast, possibly even faster since it’s easier to figure out what’s actually happening with each request at the database level.
2. What if I want to switch databases later?
This one really makes me laugh. Databases like PostgreSQL and MongoDB are so different in their strengths, their weaknesses and the architecture behind using them properly, that no amount of abstraction can make your code suitable for both. Changing databases is a full rewrite, not something you do on a whim. I’d go as far as saying your database has more impact on how you write your web app than even your choice of programming language. Abstractions that deny this fact are the ultimate YAGNI. It’s like claiming a tank is the best vehicle for picking your kids up from school on the off chance you need to drive through a war zone.
3. Everything you can do with SQL you can do with an ORM
Not even close. SQL is by huge margin the world’s most under-appreciated software technology. Think functional languages are the end-all-be-all of concise, powerful data manipulation? They don’t hold a candle to the amazing things you can do in a few lines of SQL. It was invented in 1974 and its still the most common language for querying because in 41 years, no one has come up with anything better.
All ORMs come up laughably short trying to have feature parity with SQL. ActiveRecord, the ORM that ships with Ruby on Rails, can’t do a LEFT JOIN with more than one condition. All fail terribly on complex sub queries. Most ship with no ability to do bulk INSERTs. If you happily use an ORM without complaining about its shortcomings, you haven’t studied enough SQL.
Ok, ok, so what should I do?
Ready for the shocker? Despite my vitriolic rant against ORMs, I do not advocate ditching them. The reason they’re bad is because there’s a fundamental impedance mismatch between object-oriented class design — which models things hierarchically — and databases — which model things relationally. There’s no easy way to bridge that chasm. ORMs, warts and all, are the best thing we have. You just have to change how you use them to avoid the aforementioned problems. I call this approach turning the tables.
- Ditch the Domain models
As I’ve mentioned before, allowing your models to be both an abstraction over a database table and a domain model at the same time is a mistake. Take the methods out of the model and put them elsewhere so they don’t trigger queries. Then they can be easily tested without stubbing.
Yes, I’m advocating the anemic model pattern, which some brilliant engineers whose books I love decry, but remember, this is specific to server-side, SQL-backed, MVC web apps, which are different than other types of programs because they’re fundamentally stateless. I don’t advocate this because I think it’s ideal, but rather because it’s the best that can be done given less-than-ideal Active Record ORMs.
2. Repositories over Active Record
Active Record style ORMs aren’t the only type of ORM. The other common style are ORMs based on the repository pattern. Categorically, this style of ORM is better. They have their own drawbacks and anti-patterns which I’ll save for another article, but they have fewer inherent snafus than Active Record based ORMs. If you have to use Active Record models, treat them like repositories by using them only for querying. Keep your domain logic out of your models, so that models are only an abstraction of a database table, and nothing more.
3. Operation-Based Slicing
I covered this in a previous article: use your classes to represent operations, not things. In my apps, my classes are not User, Product and Comment, they’re CreateUserCommand, AddProductToCartCommand and PublishCommentCommand. It’s classes as verbs. Each class has only a single public method and they can be composed to form data transformation pipelines. Slicing your app this way allows you to create smaller classes that are easily grokkable. All the methods related to a single operation are in one place where they can be refactored confidently.
Most importantly, do not hold state in instance variables. This allows you to model web apps for what they really are, a series of data transformations between the database and the browser. Since classes as actions makes them similar to functions, which are also actions, you might even call this functional object-oriented design.
This is not a complete divorce from OOP. You can use inheritance for cross-cutting concerns, and if you’re using a statically typed language, you’ll probably want to use polymorphism for dependency injection. It is eschewing encapsulation though. Patterns that store intermediate state in instance variables were created for desktop apps, where objects were instantiated and then stuck around either on the screen or in memory. In server-side web, everything gets created and destroyed in a single request, so there’s no need for that.
4. Use straight SQL where needed
Some things that SQL does awesomely you just can’t do with an ORM. Writing SQL friendly code means using SQL where appropriate. You can’t get around it. You might even enjoy it.
Don’t Deny the Database
The most useful, actionable and distilled advice I can give for using an ORM is to treat it like a thin layer over SQL, and don’t let it trick you into believing the database isn’t there. Yes, your team will need to learn SQL, but that’s inevitable anyway if you want to scale, so you might as well get on board. Use class design to support your database’s strengths: JOINs, transactions, db-level validation, foreign key constraints. Don’t use abstractions that sacrifice database best practices in exchange for some perceived but rarely realized benefit such as rapid development, or database swap-ability.
Long after your favorite MVC web framework has fallen out of favor, supplanted by something newer and better, relational databases powered by SQL will still be there, doing the same great job they always have. If your design is SQL friendly, they’ll be your best ally. Deny them and risk a litany of problems that aren’t easily fixed. Make it a habit and you might even find that you love SQL too.