How to Make a Good Database Design?

Peter Lee
The Startup
Published in
8 min readJul 24, 2020

How to make a good database design? Why we should create a good design database? Database design is an essential skill of a software engineer. In some interviews, the interviewer can ask you a few questions about it. As far as I know, we have some database principles. There are a lot of definitions about them and you can search on google for more details. Based on my experience, I’ll write it simply.

After reading this article, you will understand things:

  • What is a good database design? Why we should create a good design database? How to make a good database design?
  • Design process
  • Define and use some rules
  • Normalization Rules
  • Integrity Rules
  • Column Indexing
  • Some notes and advice when we design a database

Database Design Overview

Firstly, What is database design?

“Database Design is the organization of data according to a database model. The designer determines what data must be stored and how the data elements interrelate.” Source: wikipedia.org

Database design is a part of the Design Process when we develop software. Before doing database design, we have to complete software architecture (N-tier layer, Microservice, …) at the high-level. Database design is a very important step at the low-level. Design Process often creates by Senior Software Engineer or Software Architect who has a lot of experience in the IT field.

Development Process. Source: Internet.

With a medium or big system, we usually choose and combine some databases to achieve our purpose. We need to support transactions and relationships: MySQL or PostgreSQL or SQL Server. We need to save flexible data: MongoDB(unstructured data). Support caching (Redis: key-value, sorted set, list, ..), support full-text searching(Elastic Search, …), and so on.

Depends on your project, you should choose and combine some databases appropriately and wisely. There’s not the best database, only have database appropriately. We should take advantage of databases and know the limit/issues of them. In this article, I’ll only write about DBMS(Database Management System): MySQL. The reason is it’s complex more than NoSQL database such as MongoDB, Redis, and so on.

In some projects, the Senior Software Engineer or Solution Architect could request to make a Class Diagram and ERD (Entity Relationship Diagram). What the difference between the Class Diagram and ERD?

  • The class diagrams are used to represent the main object or building block of the system. They are used to show the relationship of one class with another and also represent the attributes of the system.
  • However, and ERD is more of a database in the form of tables. They don’t show individual relationships but relationship sets as well as sets of entities. They show the type of information that needs to be stored in the database.

In my opinion, we should make ERD and don’t create Class Diagrams unless we have some special reasons. This depends on your project.

What is a good database design?

A properly designed database provides you with access to up-to-date, accurate information. Because a correct design is essential to achieving your goals in working with a database, investing the time required to learn the principles of good design makes sense.

Key points of good database design:

4 key points of good database design.

Design Process

You should make sure you make the right decisions by using these guidelines. In my opinion, the design process includes the following steps:

Design Process
  • Step 1: Define the purpose of the database based on business requirements. Example: you wanna build a system displaying Olympic Tokyo 2020 information(news, results, live matches, and so on): Summer Olympic Games — Tokyo 2020, Summer Paralympic Games — Tokyo 2020.
  • Step 2: Find and organize the information required. Example: Summer Olympic Games — Tokyo 2020: https://odf.olympictech.org/2020-Tokyo/tokyo_2020_OG.htm and Summer Paralympic Games — Tokyo 2020: https://odf.olympictech.org/2020-Tokyo/tokyo_2020_PG.htm
  • Step 3: Define and use some rules: name conventions (ex: lowercase), all tables are required: id field, created_at, and updated_at field, and so on.
  • Step 4: Divide the information into tables, specify the primary keys. Example: games_competition_group, games_competition, games_event, games_event_phase, games_unit(match), and so on.
  • Step 5: Determine the relationships among tables: one-to-one, one-to-many, many-to-many
  • Step 6: Refine your design & Normalize the design: analyze your design for errors. Create the tables and add a few records of sample data. See if you can get the results you want from your tables. Make adjustments: adding more columns or removing columns if needed, create a new table for optional data using a one-to-one relationship, split a large table into two smaller tables, and so on.
  • Step 7: Adding Indexing: single column or multi-columns.

Define and use some rules

In my experience, we should define and use some rules when we make a design database. Every member in the team has to abide by them. Here are the rules of my team:

  • Choose Engine MySQL: InnoDB and charset: utf8 or utf8mb4.
  • Naming conventions: all names are snake cases (lowercase). Example: games_event, games_result, games_event_phase, and so on
  • All tables are required: id field (PRIMARY KEY and AUTO_INCREMENT), created_at, and updated_at field. Except for some special cases.
  • All tables have the same prefix. Example: games_ or sports_ or empty prefix.
  • All tables have to name in Singular or Many. Example: games_person or games_people, games_textblock or games_textblocks.
  • All Columns: have to add a comment for column and always set the default for integer (Ex: 0), varchar field (Example: ‘’), and unify in the whole system.
  • Id field: PRIMARY KEY set int(10) unsigned (Not set int(11) 2147483648 max value and -2147483648 min value)
  • Boolean field: choose tinyint(1) type. Should be set `display` name. Ex: `display` tinyint(1) unsigned NOT NULL DEFAULT ‘1’ COMMENT ‘`0`: hidden/false, `1`: visible/true’
  • Status field: choose tinyint type. For example, tinyint(2)
  • Int field: choose int or mediumint or bigint if determine you will work with large data. Example: A case study of views on youtube. Example: we should use int(3) for weight (kg) of athele instead int(5). Because int(3) has maximum 999. There isn’t any athele who is 999 kg. Ref: https://dev.mysql.com/doc/refman/5.7/en/integer-types.html
  • Text field: choose default varchar(255) type. Depends on large of text you should choose tinytext — 255 Bytes(255 characters), text — 64KB (65,535 characters), mediumtext — 16MB (16,777,215 characters) and longtext — 4GB (4,294,967,295 characters). Ref: https://chartio.com/resources/tutorials/understanding-strorage-sizes-for-mysql-text-data-types/. BLOBs are an alternative type of data storage that share matching naming and capacity mechanisms with TEXT objects. However, BLOBs are binary strings with no character set sorting, so they are treated as numeric values while TEXT objects are treated as character strings. This differentiation is important for sorting information. BLOBsare used to store data files like images, videos, and executables.
  • Naming Indexing: single column format: column_name_idx, multiple column format: column_a_column_b_idx. If has >=3 columns, we should choose a name appropriately
  • Naming Unique Indexing: single column formart: column_name_unique, multiple column formart: column_a_column_b_unique. If has >=3 columns, we should choose a name appropriately.
  • Relationships: to avoid eager/lazy loading. We should not use @ManyToOne, @ManyToMany, @OneToMany in your code. Instead of that, we use the foreign key as an integer field.
  • Metadata: If you’re using MySQL <= 5.6: Using JSON in String; Otherwise, you’re using MySQL => 5.7: Using JSON Data Type which provides JSON Column Indexing. References: https://dev.mysql.com/doc/refman/5.7/en/json.html
  • Dynamic column/attributes: It’s flexible data and the table contains columns: id, foreign_key, type(long, text, integer), property/attribute, and value. Otherwise, we can consider using Entity-Attribute-Value (EAV). Ref: https://inviqa.com/blog/understanding-eav-data-model-and-when-use-it
  • Partition Data: Split data into a lot of tables if you have large data.
  • NULL Value: NULL is not a data type — this means it’s not recognized as an “int”, “date”. Arithmetic operations involving NULL always return NULL (69 + NULL = NULL). Finally, NULL is simply a place holder for data that does not exist(missing information and inapplicable information). We only use IS NULL or IS NOT NULL (don’t use =, <>).

Here is an example:

An example of creating a table in DB.

Normalization Rules

Apply the so-called normalization rules to check whether your database is structurally correct and optimal.

  • First Normal Form(1NF): this is known as the atomic rule. Use the one-to-many relationship to follow 1NF.
  • Second Normal Form(2NF): it’s 1NF and every non-key column is fully dependent on the primary key.
  • Third Normal Form(3NF): it’s 2NF and the non-key columns are independent of each other. Ex: price and discount column in the product table.

Note: We have a lot of higher Normal Forms: it’s the 3NF and higher normal form. In my opinion, you should use 3 above rules that are enough. Some times, we can break these rules. For example, you need to save metadata in a column data JSON type (this violates 1NF)or we wanna high performance in the report feature: we’ll add more columns that can take a long time to calculate the final result (this violates 3NF).

Integrity Rules

You should also apply the integrity rules to check the integrity of your design:

  • Entity Integrity Rule: the primary key can’t contain NULL. Otherwise, it can’t uniquely identify the row. (includes: multiple column’s primary keys).
  • Referential Integrity Rule: each foreign key value must be matched to a primary key value in the table referenced (or parent table). Most RBDMS can be set up to perform the check and ensure referential integrity but I highly recommend you doing it manually with medium/big project.
  • Business Logic Rule: besides the above two general integrity rules, there could be integrity (validation) pertaining to the business logic. Example: competition code or unit code or phone number have to correct format before inserting into tables. These could be carried out in validation rule(for the specific column) or programming logic.

Column Indexing

You could create an index on selected columns to facilitate data searching and retrieval. An index is a structured file that speeds up data access for reading but my slow down for updating. Notice that the index needs to be rebuilt whenever a record is changed, which results is overhead associated with using indexes.

The index can be defined on a single column, a set of columns. You could build more than one index in a table. Most RDBMS builds index on the primary key automatically.

Note: The EXPLAIN statement provides information about how MySQL executes statements. Ref: https://dev.mysql.com/doc/refman/5.7/en/explain-output.html

Some notes and advice

  • You should learn and apply the above design steps to your project appropriately.
  • Besides, Applying normalization and integrity rules is very important. Don’t forget them when you design !!!.

References

--

--