Basic Guide to Design a Better Database Schema

Harendra Chhekur
Oct 20, 2020 · 9 min read

Things you should keep in mind while designing database schema for your application.

So let’s talk about designing a database schema, there are actually two major problems beginners mostly do while designing.

  1. The first and very common problem is not dividing the table into multiple table, instead using one common table to have lots of data. Which actually cause few problems since our table gonna have huge amount of data, that could actually make data read/write operations to pretty slow because MySql generally does linear search if data is not indexed. It also make the management of data little harder because we have made it complete mess xP
  2. The second again very common problem is diving the schema into so many tables which could actually make your queries little faster but not in every case because we’re actually having lots of table which mean have to apply lots of joins while retrieving data, that could actually make your queries slower and little bit complex too. Other problem is having management issue because of so many tables and little bit of wastage of storage too.

There is nothing right and wrong about design it’s all about good design so we ‘ll discuss few tips and tricks to design a good database schema.

There is very simple approach to roughly decide, how many tables you should have is to total number of unique entity your application gonna have. Number of tables could be more than that too while normalizing the database schema, we’ll discuss further in the post how to do that too, and number of tables could be less than the unique number of entities too, so you guys are thinking how, right?

so let’s take a simple example to understand this scenario, suppose you have entities like employee, supervisor or manager. Here we have two entities but there is no need to make the separate tables for both the entities because both are the employee to a company with just different privileges so we can just have a self relation to represent both the entities with just one table.

So the thumb rule for roughly deciding number of tables is

# of table = # of unique entities your application have

if you aren’t familiar with the basics of MySql then I would recommend to have a look at this Basics of MySql

let’s discuss the theoretical approach first, wait theoretical approach could little bit tedious to understand so let’s discuss the practical approach instead which is easy to understand too.

  • Whenever you feel that you’re gonna have redundant values in any of your column that directly indicates that your design is not good enough and you should separate that column from the current table so that it won’t have redundant values.

why: because this leads to few problems like updating a common value having multiple records, suppose you have records something like this

USER
╔══════╦══════╦════════════╗
║ id ║ name ║ department ║ ╠══════╬══════╬════════════╣
║ 1 ║ Alex ║ Eng ║
║ 2 ║ Bob ║ Design ║
║ 3 ║ Ada ║ Eng ║ ╚══════╩══════╩════════════╝

so here you can see we have two records having same department Eng but what if you want to change the name of department from Eng to Engineering so you have to update both the rows, query will look something like this:

UPDATE USER SET department='Engineering' WHERE department='Eng';

this query will update both the record which means if we have N records then query will take around O(N) to update all the records but does it really need to run on O(N)? no. We can solve this problem by separation of table and will run in O(1) which will be fast enough, right! let’s see how to do that

solution: so we are going to make a separate table for department which will have one to many relationship with the user .

USER                                 DEPARTMENT
╔══════╦══════╦═══════════════╗ ╔═════╦═══════╗
║ id ║ name ║ department_id ║ ║ id ║ name ║ ╠══════╬══════╬═══════════════╣ ╠═════╬═══════╣
║ 1 ║ Alex ║ 1 ║ ║ 1 ║ Eng ║
║ 2 ║ Bob ║ 2 ║ ║ 2 ║ Design║
║ 3 ║ Ada ║ 1 ║ ╚═════╩═══════╝ ╚══════╩══════╩═══════════════╝

here you can see, we have made department as separate entity have linked with the user with one to many relationship, so now if we want to change the name of department from Eng to Engineering then query will look something like this:

UPDATE DEPARTMENT SET name='Engineering' WHERE id=1;

here this query will just take O(1) operation because it is just going to update one record only.

  • Whenever you feel that you’re gonna have null in some any of your column then again it directly indicates that your database schema is not good enough because any column should not contain null values whatever may happen, but based on your use case if you are getting null values in some your rows then you should separate that column to a new table.

why: there are actually two main reasons to avoid null values. First is you can’t index that column if it contains null values, so this is the major concern because it makes force your queries for full search.

USER
╔══════╦══════╦═══════════════╗
║ id ║ name ║ certificate ║ ╠══════╬══════╬═══════════════╣
║ 1 ║ Alex ║ Ruby Adv ║
║ 2 ║ Bob ║ NULL ║
║ 3 ║ Ada ║ React Basics ║
║ 4 ║ Sheva║ Ruby Adv ║ ╚══════╩══════╩═══════════════╝

here our certificate column contain null values so you can’t index this column, so problem occurs if you want all the record where certificate is Ruby Adv

SELECT * FROM USER WHERE certificate='Ruby Adv';

so this query actually gonna take O(# of records in table) because it’ll do complete search.

solution: so we are going to make a separate table for certificates having one to many relationship with user

USER                 CERTIFICATES
╔══════╦══════╗ ╔═════════╦══════════════╗
║ id ║ name ║ ║ user_id ║ name ║ ╠══════╬══════╣ ╠═════════╬══════════════╣
║ 1 ║ Alex ║ ║ 1 ║ Ruby Adv ║
║ 2 ║ Bob ║ ║ 4 ║ React Basics ║
║ 3 ║ Ada ║ ║ 3 ║ Ruby Adv ║
║ 4 ║ Sheva║ ╚═════════╩══════════════╝
╚══════╩══════╝

here you can see we have made the certificates as a separate table so now there is no null values so can easily index CERTIFICATES.name column and query will use that index for binary search , so let’s have a look at query

SELECT USER.name FROM USER AS U, CERTIFICATES AS C WHERE C.name='Ruby Adv' AND C.user_id=U.id;

so now this query will use Binary Search to search the records instead complete search and will work in O(log(# of record in table)) .

why: Another reason is wasting of storage because if your column contains null value which means it stores nothing but if you have defined your column as CHAR or INT or something, it’ll use the space you have defined for each record like if you have define CHAR[100] then for every row, it’ll gonna use this amount of memory whether it contains some value or not, even in case of VARCHAR which actually takes the storage based on the length of value stored in it. It is also gonna take at least 1 Byte of memory to just store the length because it is kinda mandatory, VARCHAR always use some extra memory to store the length of string stored in it.

solution: Separate that column and make a new table out of it so it’ll just contain the values for the rows that are actually have to.

  • Whenever you feel that you’re gonna have multiple values in any of your single column, which means you want to store multiple id’s or string in a single column whatever reason may be but this is not a good sign for a good design. So let me tell you what people generally does when they have to store multiple values in single column they make that column as a VARCHAR and stores the comma separated strings.

why: This actually leads to a problem where you can’t fetch all the records from that column in just one go, which means you have to write multiple queries to solve this which is not that efficient.

USER                           PRODUCTS
╔══════╦══════╦═════════╗ ╔═════════╦══════════════╗
║ id ║ name ║ products║ ║ id ║ name ║ ╠══════╬══════╬═════════╣ ╠═════════╬══════════════╣
║ 1 ║ Alex ║ 1 ║ ║ 1 ║ Mouse ║
║ 2 ║ Bob ║ 1,2 ║ ║ 2 ║ Pen ║
║ 3 ║ Ada ║ 2 ║ ║ 3 ║ Mobile ║
║ 4 ║ Sheva║ 3 ║ ╚═════════╩══════════════╝
╚══════╩══════╩═════════╝

here you can see we have multiple values in products column which just represents what are the products a user have. Here Bob have two products 1,2 if you have to fetch all the products Bob has, it is not possible with one query because products contain space separated string so you have to split that.

$row = mysqli_query("SELECT products FROM USER WHERE id=2", $con);
$p = array_map('intval', explode(',', $row['product']));
$p = implode("','",$p);
$products = mysqli_query("SELECT name FROM PRODUCTS WHERE id IN(${$P})", $con);

so here we have to break the query in two parts because we have stored comma separated string in the column, let’s talk how can we solve that.

solution: So instead of storing multiple values in a single column, we’ll create a separate new table for the same so that we can store multiple values as separate rows in this table and linked to user with one to many relationship

USER            PRODUCTS                   U_P_ASSOCIATION
╔══════╦══════╗ ╔═════════╦══════════════╗ ╔═════╦═══════╗
║ id ║ name ║ ║ id ║ name ║ ║u_id ║ p_id ║ ╠══════╬══════╣ ╠═════════╬══════════════╣ ╠═════╬═══════╣
║ 1 ║ Alex ║ ║ 1 ║ Mouse ║ ║ 1 ║ 1 ║
║ 2 ║ Bob ║ ║ 2 ║ Pen ║ ║ 2 ║ 1 ║
║ 3 ║ Ada ║ ║ 3 ║ Mobile ║ ║ 2 ║ 2 ║
║ 4 ║ Sheva║ ╚═════════╩══════════════╝ ║ 3 ║ 2 ║
╚══════╩══════╝ ║ 4 ║ 3 ║
╚═════╩═══════╝

so here we have created one more table just to store the association between user and product to define one to many relationship and avoid multiple values in a single column, so now let’s try to fetch all the products Bob have

SELECT P.name FROM USER AS U, PRODUCT AS P, U_P_ASSOCIATION AS A WHERE U.id=2 AND A.u_id=U.id AND P.id=A.p_id;

so now we can fetch all the products Bob have in just one query, which is easy to maintain and faster than our previous solution too.

  • Whenever you have to define some constants or you have some finite number of values which is to be used in different tables, you should never use those values in place itself. Always make a separate table to the finite number of records or may be for constants then reference those into different tables.

why: if you actually store those constants in records itself as value then it actually leads to a very similar problem we have seen at first point, redundancy and because of redundancy management becomes tough like when you have to update those constants, so you have to update it everywhere it is used, which is actually gonna take so much time.

solution: Store them in a separate table and use references so that whenever you have to update the values it’ll be super easy, you just need to update it only one place and it’ll automatically take care of rest of the things.

You should always keep the entity relationship diagram of you database schema somewhere, which actually helps you to easily understand the whole database in small amount of time and other members of the team too.

why: let’s suppose you have to make changes in your database schema after a long time then it’ll very difficult to know the table and attributes of the table, basically if there is no documentation of your database schema then it’ll going to be pretty hard to update the schema along with the time. Having a ER diagram makes the updation process handy ; )

How: there are actually various way to maintain the documentation about your schema like ER diagram, Tabular form etc…

it depends on you whatever makes more sense to you but always keep the documentation of database schema to make the whole updation process smooth!

The Startup

Get smarter at building your thing. Join The Startup’s +731K followers.