Stored Procedures as a Litmus Test
I conduct screening interviews for a variety of technical positions at a variety of companies. One of the things I am always worried about with any technical hire is that their education stopped years ago, with no effective way of learning new things. I wrote about why I think this happens in “Your Lead Architect Doesn’t Really Understand Microservices”, but suffice it to say, it’s far too common, and given the rapid advancement of technological tools, you should not hire someone who hasn’t advanced their tool set in the past 10+ years.
One screening device I’ve used a number of times as a “litmus test” for “does this person learn new things in technology” is whether the candidate would use stored procedures in a greenfield application. I think there are reasonable decisions to continue using stored procedures in applications that already rely upon them, but I have viewed stored procedures as a widely-accepted architectural anti-pattern for at least the past five years (and probably closer to ten now).
Before I go any further, let me explain what I mean by “litmus test”. I do not mean that if you happen to think stored procedures are good for greenfield development, you would be automatically disqualified from any technology job. What I’m actually most interested in is how you react when I start asking you questions about the problems with stored procedures that have caused most people to stop using them. If you engage with those problems and show that you understand them, and the real issue is simply that you haven’t followed the architectural community’s discussions of stored procedures for the past 5+ years, then the question becomes whether that gap matters for the specific job you would have (e.g., likely not problematic for a VP of IT Operations; likely problematic for a Senior Application Architect). If, however, you double down on stored procedures without good responses to the reasons they have been largely abandoned, then you would be disqualified for not being able to adapt to new, widely-accepted technologies that are objectively better.
Recently, I have been involved in a few debates over whether viewing stored procedures as a bad architectural choice for greenfield applications is an appropriate belief to have. Is it still defensible to build an application that relies on stored procedures that are tightly bound to the database server (e.g., the database server runs the stored procedures; updates to the stored procedures are commits to the database master; certain/many requests to the database must use the stored procedures)?
I have spent a fair amount of time thinking through stored procedures and reading many articles on stored procedures again, but ultimately, my opinion has not changed. I don’t think there’s any good general case for using stored procedures in a greenfield application today. This article is my explanation as to why.
Widely Accepted as Anti-Pattern by Top Architects
It’s worth pointing out that, as far as I can find, no keynote at a software architecture conference, and no blog like High Scalability that covers the architectures of highly-available, widely-used systems, advocates stored procedures today. Perhaps the most likely candidate to use stored procedures would be the somewhat-anachronistic Stack Overflow, which runs on bare metal, uses a Microsoft stack (including single-master SQL Server clusters), and doesn’t have what most would call a “cloud-native” architecture. Yet we can see from a recent blog post on Stack Overflow’s architecture that “Stack Overflow has only 1 stored procedure in the database and [the lead architect] intend[s] to move that last vestige into code.” That’s right: stored procedures are a “vestige” of an older architectural pattern according to Nick Craver.
I’ve spent many hours now over the past few weeks trying to find anyone who writes or speaks about application architectures who has recommended stored procedures in the past five years, and I think the most recent example I found was from 2008 (and even that was quite lukewarm). (I am not including vendor documentation or developer-advocates from database vendors; I’m talking about respected practitioners who aren’t compensated for promoting a particular technology). I don’t necessarily think that this should be dispositive, but it should carry some weight, especially if we are interested in promoting continuing education through practitioners sharing knowledge of what works.
But Couldn’t It Be a Good Idea for Non-Web-Scale Applications?
Let’s assume that good architectures for web-scale applications — which is the main pool for most of the experts who speak on application architectures today — are just different from good architectures for the average application. Let’s assume they’re completely different things, and we have to examine the issue of stored procedures as an architectural choice de novo for the “average” application. How would we do it?
We should start by deciding which criteria for choosing an architecture are valid, and which are invalid.
Given that the whole goal here is to identify which technical professionals are learning new things, we should not accept “the team knows it well” as a valid criterion for our architectural choices. If you let that consideration control, then you are fundamentally rejecting continuing education as a goal in technical hires, since continuing education necessarily means learning things that the team does not already know.
I would also reject application speed as a criterion: if this is an average, non-web-scale application, it should be fairly trivial to make it performant for its users. (Some people argue that stored procedures are faster; they can obviously also make things slower, because you’re executing potentially-unnecessary code on master database servers, which are very hard to scale. And in general, the first step to improving the performance of an application that uses many stored procedures and is not performant is to move the stored procedures into application code, as described in the Stack Overflow blog post above.) So let us ignore application performance as a criterion.
So what would be valid criteria for various application architectural choices? Here are several that I think are relevant:
- Development velocity: what architectural choices can we make so that we can develop things faster? (a) Note that “with fewer people” is usually faster, since coordination/communication costs are really high in software development. (b) Note also that faster development almost always means “easier to QA” and “easier to deploy”
- Minimize technical debt, maximize agility down the road: what architectural choices can we make so that we can make changes to this application easily? So that we aren’t locked into assumptions that change in the future?
- Easy to staff: what architectural choices can we make so that it is easy to staff this development effort and easy to replace team members down the road? What choices can we make that are easier for the average developer/IT staff member to do successfully (versus requiring expensive, hard-to-find top-tier talent)? Note also that minimizing different skills required means minimizing the number of people needed, which should also mean easier to staff (all other things being equal).
There are a few other criteria that seem like they could be in play, but are less important, like making architectural choices to attract certain talent, or making architectural choices to reduce cost, but I don’t think either of those are nearly as important as the above. (And the main cost is people, so if you succeed in having an easy-to-staff, fast development that maximizes agility down the road, then I think you’ve addressed the most important cost issues anyhow).
We also need to identify the alternative(s) to stored procedures so we know what we’re comparing.
So why are stored procedures used, and what are the alternatives to them?
Answer: we store and retrieve state (data) from a database. In relational databases, we normalize our data storage into multiple tables that relate to each other. And while relational databases often have the ability to create required connections between tables through foreign keys, they do not have the ability to enforce stronger requirements on the data going into and coming out of the database. So, for example, if there is a regular need to insert records into multiple tables at once that relate a certain way, and need certain clean-up and/or validation, inserting those records through stored procedures can centralize within the database layer a certain treatment of how those records work together. And the same thing can be true about retrieving data as well.
The primary alternative to stored procedures is to move the logic that executes within the stored procedure into application code that runs within the application that would otherwise be calling the stored procedure. Usually, the role of the stored procedures is played by an object-relational mapper (ORM), which, at the application layer (instead of the database layer), enforces the way that data is stored so that it has the correct cleaning/validation/relations between tables.
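To make the comparison concrete, here is a minimal sketch of the application-layer approach, using only Python’s standard-library `sqlite3`. The `orders`/`order_items` schema and the `save_order` function are hypothetical illustrations: the validation, clean-up, and multi-table transaction that a stored procedure would centralize inside the database live instead in ordinary application code.

```python
import sqlite3

# Hypothetical two-table schema of the kind a stored procedure
# might otherwise insert into and validate together.
def create_schema(conn):
    conn.executescript("""
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL);
        CREATE TABLE order_items (
            id INTEGER PRIMARY KEY,
            order_id INTEGER NOT NULL REFERENCES orders(id),
            sku TEXT NOT NULL,
            qty INTEGER NOT NULL
        );
    """)

def save_order(conn, customer, items):
    # Application-layer validation/clean-up -- the work a stored
    # procedure would otherwise perform inside the database.
    customer = customer.strip()
    if not customer:
        raise ValueError("customer is required")
    if not items:
        raise ValueError("an order needs at least one item")
    with conn:  # one transaction: both tables succeed or fail together
        cur = conn.execute(
            "INSERT INTO orders (customer) VALUES (?)", (customer,))
        order_id = cur.lastrowid
        conn.executemany(
            "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
            [(order_id, sku, qty) for sku, qty in items],
        )
    return order_id
```

A real ORM generates this kind of code for you, but the architectural point is the same either way: the logic lives in the application, versioned and deployed with the rest of the code.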
Note that in a microservices-based application, an organization might choose to encapsulate the ORM as essentially a database proxy service and have other microservices (and applications) hit an API that is backed by the ORM.
You can also see non-relational databases as being an alternative to stored procedures; e.g., document storage in something like MongoDB is to some extent a rejection of stored procedures/reworking of the concept of ORMs, where instead of converting objects to/from relational tables, the application is instead just storing the object itself into the non-relational database. Note also that none of the most popular non-relational databases (e.g., Mongo, Cassandra, Elasticsearch, Neo4j, Dynamo) have support for traditional stored procedures.
Let’s Compare: Development Velocity
From a development standpoint, we should see better velocity if our developers (a) need to know fewer things, (b) have fewer dependencies upon other people (especially internal IT), and (c) can test code quickly. Stored procedures present problems to each of these velocity goals.
First, stored procedures are written in SQL, which is (sadly) less and less known amongst developers. Given the rise of ORMs across all development stacks, it’s been more than 10 years since new developers have had to use SQL to write applications. So as a practical matter, in order to develop applications that use stored procedures, you will need developers in whatever language you’re writing your application in, as well as separate stored-procedure developers writing SQL. Note also that ORMs don’t play all that well with stored procedures (since they’re both basically solving the same problem), so your developers who are familiar with just saving their objects with ORMs will need to write custom code that interacts with the stored procedures. None of this helps with development velocity.
Second, because stored procedures are deployed separately from the application, and, because there is usually only one database master, the deployment of stored procedures is usually done separately from application deployments and by separate staff — e.g., a database administrator, who reviews and deploys the stored procedures on her own time. These restrictions almost always also exist in development, test, and QA environments, meaning that any changes to stored procedures cannot be done as quickly and have blocking dependencies on other people. In contrast, ORMs are just part of regular application code and don’t create any dependencies or requirements on any IT staff beyond whatever existing dev/test/QA/prod processes already exist for the application servers.
Third, testing stored procedures (as well as getting effective error reports from them) is substantially harder than testing and getting errors from the ORM, which is just part of the application code, and it requires completely separate processes. Writing unit and feature tests against the application and ORM is just like writing any other tests against the application: you’ll use the automated testing framework you’re already using. And, when unit-testing the ORM, you can use a mock database source (e.g., SQLite) for speed and simplicity. If you’re using stored procedures, you’ll have to build entirely different tests that run on separately-instantiated versions of your database, which will add a lot of time to your test suite, likely more than doubling your overall testing time.
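A rough sketch of what that looks like in practice, assuming a hypothetical `save_user` data-access function (in a real codebase this would be your ORM-backed save/load code): an in-memory SQLite database stands in for the real server, so the test needs no separately-instantiated database at all.

```python
import sqlite3
import unittest

# Hypothetical application-layer data access function under test.
def save_user(conn, email):
    email = email.strip().lower()  # clean-up a stored procedure might otherwise do
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    return email

class SaveUserTest(unittest.TestCase):
    def setUp(self):
        # An in-memory SQLite database stands in for the real database,
        # so the suite runs in milliseconds with no database server.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (email TEXT NOT NULL UNIQUE)")

    def test_email_is_normalized(self):
        save_user(self.conn, "  Alice@Example.COM ")
        row = self.conn.execute("SELECT email FROM users").fetchone()
        self.assertEqual(row[0], "alice@example.com")
```

This runs under the same `python -m unittest` invocation as every other test in the application; an equivalent stored-procedure test would instead need a provisioned database instance with the current procedure deployed to it.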
It’s very hard for me to see any way to argue that development velocity is going to be faster with stored procedures: you’re forcing development into a different environment that is necessarily more protected (by the DBAs) and harder to test. Choosing stored procedures means you need more skills and people on your team, and it forces a cross-team dependency that’s just not necessary with ORMs.
Let’s Compare: Minimize technical debt, maximize agility down the road
We would like to make architectural decisions that allow us to make changes to our applications down the road without having to be stopped by crippling technical debt. Anything that locks us into particular choices that we may want to change is a potential problem here, although we must acknowledge that we have to make some choices that may be hard to change just to get things out the door in a timely fashion.
This desire is one of the main reasons for the rise of microservices in application architectures. If you are able to break your application into different, independently-deployable services, you make it easier to rewrite and change the application, as you can attack the rewrite/change one microservice at a time. (Note that the above link to Martin Fowler’s excellent breakdown of microservices does a good job in explaining that starting with monoliths may be better, especially if you’re not quite sure where you’re going to end up).
If you do decide upon a microservices architecture, it’s hard to understand why you would choose to use stored procedures. Stored procedures attached to a database are monolithic: they make it harder to make changes to the business logic (in the stored procedures) or to the data structure (in the database). A true microservices architecture decouples the database from the business logic that writes to the database, which will enable changing the database or the business logic writing to it without having to change the other.
Let’s Compare: Easy to staff
One common challenge facing every company developing software today is that the demand for great software developers far outstrips the supply. So it’s worth picking architectural patterns that work fine if you are hiring average software developers, and that ensure that even a number of quite bad software developers can’t bake serious technical debt into your application through your architectural choices.
We’ve already talked about the fact that using stored procedures requires additional skills versus using ORMs, so let’s set that aside for this topic.
Instead, let’s ask whether it’s easier to write good stored procedures, or whether it’s easier to write good application code that uses an ORM.
This is to some extent a trick question — using an ORM in application code is trivial. You just instantiate your object, and then you load it from the database. Or you save it to the database. All validations and clean-ups you write against the ORM are just done as normal application code within your save/load functions.
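As an illustration of that save/load shape, here is a toy ActiveRecord-style class in plain Python over stdlib `sqlite3`. It is a stand-in for a real ORM such as SQLAlchemy or Django’s ORM (the `User` class and its schema are invented for this sketch), but it shows the point: validations and clean-ups are just ordinary application code inside `save`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# Toy ActiveRecord-style model; a real ORM generates this plumbing.
class User:
    def __init__(self, name, id=None):
        self.id = id
        self.name = name

    def save(self):
        # Validation/clean-up is normal application code, not SQL.
        self.name = self.name.strip()
        if not self.name:
            raise ValueError("name is required")
        cur = conn.execute("INSERT INTO users (name) VALUES (?)", (self.name,))
        conn.commit()
        self.id = cur.lastrowid
        return self

    @classmethod
    def load(cls, user_id):
        row = conn.execute(
            "SELECT id, name FROM users WHERE id = ?", (user_id,)).fetchone()
        return cls(id=row[0], name=row[1]) if row else None
```

Any average developer can read, write, and unit-test this; no separate SQL dialect or DBA review cycle is involved.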
In contrast, stored procedures require a skilled DBA to write SQL, which is far less straightforward than application code. A stored-procedure-heavy architecture is either terrible because of bad stored procedures, or it requires substantial talent from the DBA writing the stored procedures to do them properly.
But Wait — Other Reasons to Use Stored Procedures?
There are a few arguments for stored procedures that I ran into in my readings that aren’t addressed by anything I’ve written above, but that I also find unconvincing; this section is for those.
But wait! What if we have lots of different applications that want to pull the data?
One argument I heard for stored procedures was that their reusability makes them simpler and better when many different applications (especially analytical applications running models) are all hitting the database, as opposed to a single application.
I don’t find this argument particularly compelling compared with a standard microservices architecture. After all, it’s often common that when you’re having multiple applications hit a single database, asking for the same type of information, you also want to do a number of other things with those requests (like use your centralized authentication systems, log them centrally, and apply rate limits and other types of controls on them) that can’t all be done in stored procedures.
So this only seems like a good argument if you’re expecting to have a monolithic architecture, and you can’t rely upon everyone accessing your data to do so through an application (which could use an ORM); in other words, you have a lot of people just connecting to the database directly, and you want these people to be able to write to and read from the database with those direct connections. This seems like a highly questionable concept (since you can’t turn on the general query log for performance reasons, how will you know who’s doing what to your database?) and one likely to violate whatever kind of security policy you’re supposed to have.
And add to that: most analytical models end up getting run against data sources that are downstream read replicas of the transactional master databases — this is why we have seen the rise of columnar databases (Vertica, Redshift, BigQuery) as well as parallel-processing systems like Hadoop and full-text search stores like Elasticsearch. You’ll end up ETLing the data in bulk (which won’t use stored procedures) into another system to do your analysis.
But wait! Don’t stored procedures enforce better data quality and consistency than application code because they run within the database itself?
I have also heard the argument that because stored procedures run within the database, they do a better job of making sure data is stored properly within the database tables than ORMs, which are more disconnected and abstracted from the database.
The problem with this argument is that stored procedures don’t actually do a reliable job of making sure data is stored properly in tables. Unlike enforced foreign keys and a database’s STRICT mode, stored procedures can be changed, and they don’t enforce anything on data already within the tables. So even if a stored procedure is enforcing perfect data quality at time N, (a) it may not have at time N-1, and (b) people can still insert data into tables without using the stored procedure. In these ways, stored procedures are no different from the ORM: neither guarantees anything about the data within the tables. And both stored procedures and ORM-based code can enforce clean-up when loading data, so it’s hard to see any advantage or difference there either.
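The distinction is easy to demonstrate with stdlib `sqlite3` (the `orders`/`order_items` schema here is invented for the sketch): a direct INSERT bypasses any stored procedure or ORM validation, but a constraint enforced by the database engine itself still rejects the bad row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement lives in the engine itself
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE order_items (
    id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id))""")

# This direct INSERT goes around any stored procedure or ORM,
# but the enforced foreign key still rejects the orphan row.
try:
    conn.execute("INSERT INTO order_items (order_id) VALUES (42)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Only declarative constraints like this one apply to every write path, past and future; procedural checks (in a stored procedure or an ORM alike) only apply to writes that happen to go through them.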
Stored Procedures as Vestige
For all the above reasons, I just can’t see a good argument for using stored procedures in greenfield applications, unless it is that you have a team that knows them, and learning new architectural patterns that would give the team substantial benefits in speed and staffing is not worth the effort. But given that every organization appears to need software development as a core competency, I find that hard to believe and painfully short-sighted.