From Requirements to Database Schema

Published in

Random Life Journal

6 min readJul 20, 2020

One of the responsibilities of a backend engineer is to keep the data organized and make sure all the operations executed correctly. Most of the time, we will achieve that by DBMS (Database Management System), such as SQLite, MySQL, and PostgreSQL.

To keep the data organized, we need a schema or schematic diagram to describe how the DBMS should look like. As the skeleton of the whole data storage system, it will be extremely difficult to change or modify after the services were launched due to the risk of loss and damage to data.

Since the database schema design is such a daunting task, backend engineers will tend to figure out a general process or framework that help us avoid fatal design error and maybe some technical-debt. In the following content, I would try to design the database schema of a simple content platform like Medium.

Requirements Analysis

What is stored in the database?

data.

Yes, it is surely data, but what do they represent? It looks like a stupid question but I think this is a great entry point to start the database schema design. The reason why we need a database is to store something and make them a kind of resource that can be accessed or operated later.

Now think about the question from another aspect…

What resource should we store?

or even retrieve to the layer of the features of your service or product…

According to the design of our service/product, what kind of resource should we build for our system?

While cooperating with non-engineer teammates, we often talk about the requirements, not the actual schema. This implies that converting the requirements to resources properly is a key ability, and also the first thing I will do.

Almost all the modern services are used-based, which means we need an account to sign in for advanced usage. Although Medium allows guests to read public stories without log in, if you want to clap for the story or follow the author, an account is necessary.

In addition to users, the protagonist of the content platform should be a resource, too. We need storage for all the stories, which makes it possible to load the stories for readers and update stories for authors.

We can also provide tags for each story, and the publications to group up authors in a similar domain. These mechanisms may slightly increase the complexity, but it will significantly improve the efficiency of creating value for both sides of the content platform.

Now we have sorted out some resources that we need to build for a content platform: users, stories, tags, and publications. Different engineers have different ways to analyze, but I prefer to think from the target object of the operation. It may not the best, but it helps.

Database Schema

Recall the above content, the database schema is the skeleton of DBMS that defined how the data should be stored. And that is the next issue…

How to store the resources we have mentioned above?

I will use Python code and some ORM tools to present the schema because it will take a lot of effort to plot the diagram and modify it step by step, I don’t want to do that.

Here is a beginner-friendly article for the concept of ORM:

An Overview of Object Relational Mapping

Writing great software relies not just on the code, but how that code is able to manage and interact with key data…

medium.com

ER Model

Entity-Relationship Model shapes a way of thinking that can portrait the database schema more easily, just take all of the resources as entities first:

Users: hold the profiles of users.
Stories: hold the content of stories.
Tags: hold the tags collection for stories.
Publications: hold information about publications.

Next, focus on the relationships of different entities:

Users can follow other users.
Users can follow more than one publication.
Each story will have one and only one author.
Each story can be marked with more than one tag.
Each story will be included in at most one publication.
Each publication should belong to one owner.
Each publication can group up more than one editor.

The function of Medium is far more complicated than what I listed above, I just extract part of it to practice the skill of system design here. With the entities and relationships, schema could be designed more completely by checking each item within the list.

ORM

First, is the user entity that contains the standard profile like name and introduction, and the following mechanism can be achieved by ManyToManyField within Django ORM directly.

But if we only rely on the implicit table create automatically by ManyToManyField, it would be difficult to append a new key to the intermediate table. So I would prefer specifying a table via the “through” parameter, after all, explicit is better than implicit.

The publication entity are similar to the user, they both have name and introduction, but need more attribute to describe the members it grouped up.

There’s only one owner who owns the publication, and multiple editors authorized to publish or remove stories. In fact, Medium has another role called “writer”, who can submit their story to the publication, but I am not concern about that now, two different roles are already enough to represent the advantage of the intermediary table.

Even though the owner and editors are all the relationships from the entity of publication to the user, but they are different roles with different authorities. And one of the solutions is to append another key “level” to specify the attribute, it should be one of the roles of owner and editor.

Finally, it’s the entities of the story and the tag. Each story must have an author who is just the user publish the story, but it might not need to be adopted into a publication, so we should allow NULL for this attribute.

Since the recommended system is not smart enough to feed us the articles we interested in precisely only depend on the title and the content, tags are really a great solution.

We spend a few seconds to mark the story with several keywords and make the direct-matching algorithm become feasible, this is the consensus by cognition of all platform users.

Conclusion

The above is the brief steps that how I convert the requirements to a database schema, and could be organized into general sections below:

- Requirements Analysis
    - Resources for service or products
    - Resources to access and operate- Database Schema
    - Entities-Relationships Model
    - Django ORM or other implemtaion and management tool you prefer
    - Test the behaviours of your database

hope it helps you :)