Rails teams, what are you meant to do with structure.sql?

Published in

Tanda Product Team

5 min readSep 12, 2018

One of the big unsolved mysteries about building a Rails app in a team — at least, in my team — is what you should do about the structure.sql file. I feel like it is far and away the thing that confuses both new and experienced developers.

If you are reading this, please tell me I’ve missed something really obvious about how you’re meant to work with this file.

Why not use schema.rb?

Just to get this out of the way first, we aren’t using schema.rb because we want to support custom PostgreSQL types & constraints, which schema.rb doesn’t support. That said, I think some of the problems we have here would exist even with that.

With that in mind, here’s the problem:

If you have multiple people share a development database, your structure.sql will gradually become more wrong

structure.sql is supposed to describe the correct database schema for a given branch. An example of where this falls apart:

I’m on a branch where I add a new column. I write a migration, run it, and commit the updated structure.sql to my branch, then make a pull request.
While I’m doing this, you create a new branch to change a column default. You write a migration, run it, and commit the updated structure.sql file. Then you make a pull request.
The updated structure.sql that you commit includes my new column.
My pull request is delayed, but yours is approved right away. You merge yours to master.
structure.sql on master now includes my column even though my PR hasn’t merged. It also now includes my “insert into schema_migrations” line, which means I’ll get a merge conflict on my PR (which is more annoying than an actual problem).

structure.sql is generally used for one thing only: to generate a test database before loading seed data and running tests. In this case, the test database will still be generated fine — unless you have a unit test that counts how many columns are in one of your tables, you’ll probably not get any problems. So it’s tempting to be like “eh, it’s fine”. But here’s a not fine example:

I add a new foreign key, write a migration, run, commit, PR.
Tests fail on my PR because the foreign key finds some invalid tests. I fix them on my branch.
You create a new branch and add a new column. You commit the updated structure.sql which includes my foreign key. “Foreign keys are my friend because they ensure referential integrity” you tell yourself, and commit. Your tests fail and you have no idea why.

All these problems can also come up if you’re working on two separate branches yourself. So they can also happen if the database isn’t “shared”.

Why not just give each developer their own database?

It makes it harder for developers to collaborate (“i just replicated the issue in this account, check it out on your local”)
It doesn’t solve the problem if a single developer is making multiple schema changes on seperate branches.
It means each developer needs to learn how to administer their own database. (ie. it means more work for the ops team)
It costs more.
It makes it much harder to replicate real issues with only a seed dataset.

I mean, I’m not fundamentally opposed to this, though I think there are downsides. Does everyone else just do this?

Why can’t everyone only commit their changes to structure.sql rather than committing the whole lot?

This sounds nice in theory but, in my experience, fails in practice. We tried telling everyone to do this, but what ended up happening was that as soon as one person didn’t do it properly the whole thing started to fall apart. (Because as soon as one person gets it wrong we are back in the state described above.)

We found that this was particularly hard to work with when people got merge conflicts. Resolving a conflict would generally require you to understand your changes and changes that had happened on master since you last merged in (since you’d want to keep stuff that’s on latest master and keep your changes). In an ideal world where everyone is only committing their work this is okay; as soon as someone gets it wrong this spirals out of control quickly. A simpler solution is often just to re-run a db:structure:dump so that structure.sql gets re-written without merge conflict indicators, and then commit that.

Finally, our structure.sql is a 13,000 line file. Some git clients crashed showing diffs in it. I don’t necessarily endorse it but I see why you’d just git add . at that point.

Am I just giving up too easily here? Is the answer just to be more strict about this?

Why don’t you build a microservice that stores the correct schema for each branch? Then remove the file from git and grab it in a commit hook.

Sure, there’s plenty of options if you go down this path. But I’d love to find a solution that is the Rails way, and I doubt this would be it.

So what is the solution?

That’s where I’m stuck. At the moment our workflow is pretty much what’s described above, plus jabout once a week someone will describe to someone else that “yes, structure.sql on master doesn’t really resemble production, yes, we are aware of it, no, nobody has good ideas on how to fix it. So just commit the whole file and then if your tests break deal with that.”

ps. I hoped that by writing this doc I’d think of a good solution but no luck so far. Over to you, internet — how do other teams handle this?

While I’m here, a few other niggling issues we’ve mostly worked around:

Everyone needs the same version of pg_dump

We enforce this with the rake task defined here, so it’s only a small nuisance now. Prior to this task, we’d sometimes get people commit a wildly different structure.sql which resulted in big, confusing diffs.

You don’t want to commit every database feature

Specifically, you probably don’t want custom views in your structure.sql, since you probably don’t use them in your app. We have a rake task (defined here) that removes these when you migrate.

structure.sql should ideally only include schema_migrations entries from the branch you are on

We also have a rake task for this (also here). But it doesn’t work perfectly, and even when it does work, you still have the problem of the corresponding schema being incorrectly committed (outlined above).

If you forget to commit changes to structure.sql, your tests fail in confusing ways

Some developers see the huge diff on structure.sql, balk at it, and decide to just not commit the file. Rails won’t let you create a test database if the latest version in your structure.sql doesn’t match the newest file in your db/migrate folder. Instead you get an error. And thus, we have this in our dev wiki:

Rails teams, what are you meant to do with structure.sql?

Written by Alex Ghiculescu