Nowadays, it is practically the industry norm for relational database schemas to be managed through application code instead of manually written and maintained SQL scripts. Pretty much every RDBMS has tooling, developed officially by vendors or by the community, to do this through what are known as “database migrations”. However, this strategy wasn’t as popular 5 to 10 years ago or more, especially in smaller and less experienced teams that weren’t regularly exposed to best practices and techniques, and were pressured by the business to just deliver results quickly.
This was the case with my current team when I joined. One of the company’s software products is a very sizeable project that has been under development for over 10 years, and the engineers who first worked on it all that time ago chose the typical “Database First” approach. This might have been an easy decision back then, since there were few (if any) robust tools to do it any other way in the technology stack they were working with.
As with any commercially distributed data-driven application, developing, releasing and updating the product came with a lot of database-related challenges that a Database First approach doesn’t solve well. Here I will describe how the team chose to address those challenges when the project started, the difficulties and problems that arose as a result, and how I managed to successfully migrate to a Code First approach that drastically improved the entire cycle. It is important to note that our product is web-based, on-premises software, meaning that each of our clients installs an instance of it on a server that belongs to them, inside their own network.
This was the situation when I joined the team:
- All developers in the team, plus the development and QA servers, were connected to a single shared database on a network-accessible database server.
- Necessary changes in the schema were manually crafted by each developer in SQL scripts that they would apply to the shared database.
- These SQL scripts would then be checked in to version control inside a `database` folder. Each developer had to name the file using a convention: a timestamp followed by a descriptive name of what the script did.
- When schema changes were applied to the database by a developer, they would need to update the model in the code.
- On each release, the implementation / support team would grab the SQL scripts from that folder whose timestamps were newer than the latest release, and apply those to the database of each client that needed an update.
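To make that last step concrete, here is a rough sketch of what the implementation team effectively did by hand at release time. The file names, the 8-digit timestamp format, and the cutoff value are assumptions for illustration; the article only says the convention was a timestamp followed by a description.

```javascript
// Illustrative sketch only: the exact timestamp format and file names
// are assumptions, not the team's actual convention.
// Scripts are assumed to be named <timestamp>_<description>.sql.

// Return the scripts newer than the last release, in the order they
// must be applied (lexicographic sort works for fixed-width timestamps).
function scriptsSinceRelease(fileNames, lastReleaseTimestamp) {
  return fileNames
    .filter((name) => /^\d{8}_.+\.sql$/.test(name))            // enforce the convention
    .filter((name) => name.slice(0, 8) > lastReleaseTimestamp) // newer than the release
    .sort();                                                   // apply oldest first
}

const files = [
  '20181201_add_orders_table.sql',
  '20190110_add_invoice_index.sql',
  'notes.txt', // non-conforming file, ignored
  '20190114_drop_legacy_column.sql',
];

console.log(scriptsSinceRelease(files, '20181231'));
// → ['20190110_add_invoice_index.sql', '20190114_drop_legacy_column.sql']
```

Of course, nothing like this script existed back then; the selection was done manually, which is exactly where many of the problems below came from.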
If you have any familiarity or experience working with data-driven applications, you already know how terrible the previous workflow is and all the problems it can cause. Its main issue is that it is almost entirely manual, with very little automation. Any process manually executed by humans is prone to errors, and errors happened constantly. Here’s a rundown of everything that could go wrong with this approach (and that, therefore, went wrong):
- Mistakes caused by a developer would impact the entire team, since the database was shared.
- Sometimes developers working on different branches and different features had to work with the same database entities. This was not possible with that set-up without conflicts or difficulties.
- Often the work that other developers were doing in the running app affected the scenarios another developer was working on. I particularly remember a few times where the whole team had to stop running their local instances of the app (and therefore, stop working) so that one developer could debug a new feature he was working on without being affected.
- Developers would not follow the SQL script file naming convention, or would follow it inconsistently. Since there was no automated tool to check or validate this, these mistakes would occasionally slip by, causing a lot of confusion and problems later down the road, both for the development team when building new features and for the implementation team when preparing releases.
- Developers would make changes to the database and forget to roll them back if they changed their approach.
- Developers would make changes to the database and forget to check in their changes to version control.
- Developers would forget to update the model in the code to match their schema changes, resulting in run-time errors that sometimes were not caught by anyone on our team and made their way into production.
- When doing code reviews, each reviewer had to keep an eye out for any missed detail related to the database workflow, increasing the mental load and overall difficulty of code reviews while also decreasing their effectiveness.
- Due to all this, no one had real certainty that the state of the shared development database would match the state of a production database after an update.
All these issues cost the company thousands of dollars in losses and, what’s worse, damaged the image of the product and the company in the eyes of our clients. They also caused a lot of frustration for every member of the technical teams who had to work in these conditions, myself included.
We reached a point where we were about to start working on a large new piece of functionality, which would require the focused effort of the entire team and a lot of database-related work. It was clear to me that with the workflow then in place, going into this new project would be a nightmare. I managed to convince the business folks and was granted a small window of time to do something about it. The idea was to shift from the Database First approach to a Code First approach and target each of the weaknesses listed above, while retaining compatibility with all the instances of the product already in production; obviously without any data loss, and with minimal or no breaking changes.
We were using .NET Framework and Entity Framework, and thankfully Microsoft had already developed tools for this process (https://msdn.microsoft.com/en-us/library/jj200620(v=vs.113).aspx), which I thought would be sufficient. I used them to reverse engineer our new code-based model, and then scaffolded a migration that would presumably be enough to recreate the entire schema from scratch. However, after examining it I promptly realized that a few configurations are simply not possible to represent with Entity Framework’s Code First technology or the reverse engineering tools, mainly stored procedures and some complicated indices. At that point we had to compare the generated schema with the schema of our existing shared database to look for the differences, then retrieve those differences and manually add them to the generated migration. It actually took us several months to get this migration to the point where it would faithfully recreate the database, since newly missed details would pop up from time to time.
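Conceptually, that comparison step boils down to diffing two maps of object name → SQL definition, one from the existing database and one from the generated schema. In practice we inspected the actual SQL with database tooling, but a toy sketch of the idea (the object names and definition strings below are invented) looks like this:

```javascript
// Toy sketch of the schema comparison step. Given two maps of
// { objectName: definitionSql }, report what the generated schema is
// missing or defines differently. Names and definitions are invented.
function diffSchemas(existing, generated) {
  const missing = [];   // objects (e.g. stored procedures) absent from the generated schema
  const differing = []; // objects present in both but defined differently
  for (const [name, definition] of Object.entries(existing)) {
    if (!(name in generated)) {
      missing.push(name);
    } else if (generated[name] !== definition) {
      differing.push(name);
    }
  }
  return { missing, differing };
}

const existingDb = {
  'dbo.Orders': 'CREATE TABLE Orders (...)',
  'dbo.usp_GetReport': 'CREATE PROCEDURE usp_GetReport ...', // not reverse engineered by EF
};
const generatedByEf = {
  'dbo.Orders': 'CREATE TABLE Orders (...)',
};

console.log(diffSchemas(existingDb, generatedByEf));
// → { missing: ['dbo.usp_GetReport'], differing: [] }
```

Everything in `missing` or `differing` then had to be manually folded back into the generated migration, which is why the process dragged on for months.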
As I said, the point of this initial migration was to produce a SQL script that could create a database from scratch, with the schema exactly as it stood just before switching to the Code First approach, replacing the historical, manually maintained `database` folder. We would then use this script to create the database locally in each of our isolated development environments, and also when installing the latest version of the software for new clients. In other words, this migration was merely transitional and was deleted after the desired SQL script was generated.
What was left was an additional script to seed the database with the necessary basic data for the app to function properly. After that was done, we removed the transitional code migrations and checked in the isolated SQL scripts, which we would call with a Node.js script to drop and recreate the database in any environment when needed. We then updated our deployment process to run pending migrations automatically on release.
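Our actual Node.js script is tied to our environment, but the core of it is just running the two scripts in order: schema first, then seed data. Here is a stripped-down sketch under stated assumptions; `runSql` stands in for a real SQL Server client (such as the `mssql` package), and the file names are invented:

```javascript
// Stripped-down sketch of the drop-and-recreate helper. `runSql` stands in
// for a real database client (e.g. the `mssql` npm package); file names
// are assumptions for illustration.
async function recreateDatabase(runSql, schemaScript, seedScript) {
  // Drop and recreate the schema from the generated script...
  await runSql(schemaScript);
  // ...then seed the basic data the app needs to function.
  await runSql(seedScript);
  return [schemaScript, seedScript]; // scripts applied, in order
}

// Usage with a fake runner that just records what it was given:
const log = [];
const fakeRunner = async (script) => { log.push(script); };

recreateDatabase(fakeRunner, 'create_schema.sql', 'seed_data.sql')
  .then(() => console.log(log)); // → ['create_schema.sql', 'seed_data.sql']
```

Keeping the runner injectable like this made the same helper usable against any environment's local database.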
From that point on, we followed the typical Code First Migrations workflow: changing the schema by making changes to our model in the code and scaffolding migrations to represent them. It was essential that these migrations never included changes that already existed in the schema from the Database First era; otherwise they would fail when applied to the already created database of a previous version of the application. This was one of the biggest sources of danger, since the rest of the developers on the team had built the instinct of adding changes to migrations, including ones that were already part of the previous version, without considering backwards compatibility.
So, to compare with the previous workflow, the situation after we finished was:
- Every developer AND server was connected to its own local database. No more database sharing.
- Changes in the database were automatically derived from changes in the code. No more manual scripting.
- Migrations are generated and applied by a tool, so no more inconsistent naming or related problems. If a developer forgot to add a migration, the application would simply not run, making the omission immediately evident.
- Schema related changes only needed to happen once, in the code.
- The implementation & support team no longer had to deal with database versioning themselves since it was already handled by the application itself and the deployment process.
Basically every weakness of the previous process was directly eliminated with this effort, and all the resulting issues vanished. This not only allowed us to develop faster, but also with more confidence, less stress, fewer bugs and of course, less budget loss.