Remove DATABASE FIRST APPROACH!

5 min readFeb 4, 2019

Damn how I enjoyed the good old times when I spent days to compose an entity-relationship diagram first!

I remember using actual designer tools next and transform the ERD into database diagram to export to the desired engine’s format.

You had to normalize, denormalize, optimize by data usage, make vocabulary tables, figure out the best types for columns, reliable engine for tables, foreign keys, primary keys, indexes, views, …

It sucked. It really sucked. In an era where the database was the biggest bottleneck and most expensive part of an application, you had to decide how to store data properly before spending time on anything else in the backend.

Or did you?

I remember we have been using those ER diagrams to visualize each customer’s business model and go over and over it again to clarify with the client.
Even defining queries appeared before the code …
Helped a bunch though!

Now that databases are details in most cases, you can completely ignore them until production by putting the implementation of data manipulation on a separate layer outside of your domain (http://blog.cleancoder.com/uncle-bob/2012/08/13/the-clean-architecture.html).
There is even a system that focuses on business first: domain-driven design (https://en.wikipedia.org/wiki/Domain-driven_design)!

What I am saying is, you can react faster to the market, respond earlier to the customer and you can also build a demo way quicker than before. And more importantly: you do not need to hold a full-day meeting just to kickstart a project!

And yet, the planning phase in some companies still start with a detailed, overly optimized relational database model. Sigh!

Choose the right tool for the job!

How would you choose the right one when the details are unknown?

Why would you stick with an RDBMS when the data volume, representation, network throughput, traffic, scalability, liability needs are unknown or unstable?

It is another representation of the data, decided by the needs of the customer, heavily focused on UI and interactions.

Roughly a decade ago

RDBMS promised ACID in the cost of scalability and high-availability. MySQL and such scaled horizontally better than vertically, carrying a single point of failure issue. The lack of use-cases and computing power along with the average internet speed and popularity made slow queries acceptable.

When reliability and speed became a bigger factor, master-slave replications (made reading faster but introduced an inconsistency between writes and reads) and multi-master (increases reliability) stacks spread.

Same data is stored in every machine of the cluster.

How would you send gigabytes of data across networks fast and reliable?
What about running queries swift on huge sets?

Multiple answers introduced several solutions with different bottle-necks, making clusters messy and expensive to operate.

Locks appeared on multiple tables, queries stuck in queues and we were frowning waiting for results.

Most of the applications used key-value caches to avoid database calls to speed up the delivery process. These only lived in the memory and were not reliable, durable enough.

Memcache, Redis, and friends evolved into a system we now call NoSQL. With distributed, easily scaled, fault-tolerant and high-available promises, their popularity, and usage spread fast.

Would you use expensive RDBMS cluster to have a scalable luxury on top of your fast delivery layer?

NoSQL taught us something important: scheme and stored relations are not always necessary.

Imagine spending valued time on mapping extraordinary amount of fields and indexes, etc to a table structure and on a later phase, the designed UI always moves these values together.
The client might need to read them frequently, therefore, it could require a simpler, easier and faster cache.

UI and other representations reveal how you want to show and deliver the data. Storing it in a different format, just because you used to design that first, creates the most expensive tech-debt to fix.

It makes difficult to change or apply subsequent features as it ties the hands of the developers and operations as well.

Refactoring the data sets without altering other collaborators in the project is barely achievable.

But how might one move storage design to a later stage?

In a nutshell: you don‘t give a damn about that structure.

I know, rapid development tools nowadays promote the active record and ORM patterns. Both require a specific understanding of the database model. One might say these are anti-patterns and I would agree, but don’t rush this thought!

The thing is, you can easily define an entity, a business model without knowing the database representation. You do not need to know how the connection is made or how the data is serialized!

You need to focus on the business and its problem first.

Business rarely interested in database representation, it cares about accessing data.

When calling storage in your code is required, depend on interfaces and add their fake (or non-production) implementation later (use dependency inversion principle). You might use in-memory collections, I/O stores, SQLite, basically anything that lets you forget the “how” part.

Create repository, command, query, criteria and other interfaces in your domain service or application layer and move the temporary implementation to the infrastructure layer.

Are these new things to you? Leave a comment and check back in a few days, I might get you covered!

I know, we still need to communicate with the client, record all the requirements of the application, visualize it and put it on a paper. But instead of focusing on the actual implementation, you could discuss the domain, the events that make up the business. Keep the ERD if you want but concentrate on all those details that make up most of what a typical customer cares about.

No need to jump into technical details. Discuss the problems without mentioning databases, tables, access-control!

Infrastructure should be driven by the code, built upon business requirements.

Do I think database-first still viable?

Everything is suitable in the right circumstances. There are other popular and better methods though, focusing on delivering based on domain needs rather than waterfalling the development by database models.

Do you have any rational opinion on why I should design my database first? Leave a comment!

You still kickstart your projects with storage first approach? The “I’ve been doing it for years and it works” is not rational comment, pick a better one :)