A Brief Guide to Database Normalization

Understanding the Basics and Advanced Levels of Database Normalization

Leah Nguyen
6 min readJul 18, 2023

If you’re new to database design, you’ve probably heard about database normalization. This is the process of organizing data in a database so that it is consistent, efficient, and easy to manage. There are several levels of normalization, each with their own benefits and use cases. In this article, we’ll cover the basics of normalization, including first normal form (1NF), second normal form (2NF), third normal form (3NF), and other advanced normal forms.

Normal Form

Concept of normalization and normal forms were introduced, after the invention of the relational model. Database normalization is an essential procedure to avoid inconsistency in a relational database management system. It should be performed in the design phase. To achieve this, redundant fields should be refactored into smaller pieces.

Normals forms are defined structures for relations with set of constraints that relations must satisfy in order to detect data redundancy and correct anomalies. There can be following anomalies while performing a database operation:

  • insert: data is known but can not be inserted
  • update: updating data requires modifications in multiple tuples (rows)
  • delete: deleting some data causes some other data to be lost

First Normal Form has initial constraints, further normal forms like 2NF, 3NF, BCNF, 4NF, 5NF would add new constraints cumulatively. In other words, every 2NF is also in 1NF; every relation in 3NF is also in 2NF. If all group of relations are represented as sets, following figure can be drawn:

First Normal Form (1NF)

First normal form (1NF) is the simplest level of normalization. It involves ensuring that each table in the database has a primary key and that each column in the table contains atomic values. In other words, each row in the table should have a unique identifier, and each value in the table should be indivisible.

Let’s take an example to understand this better. Consider a table that stores information about employees. The table might have columns like employee_id, name, address, and phone_number. However, the address column could contain multiple values, like street name, city, state, and zip code.

Example Table

To bring this table to 1NF, we need to split the address column into separate columns, each containing a single value.

1NF Output

Second Normal Form (2NF)

Second normal form (2NF) builds on the foundation of 1NF and involves ensuring that each non-key column in a table is dependent on the primary key. In other words, there should be no partial dependencies in the table.

Let’s continue with our employee table example. Suppose we add a column for department to the table. If we find that the value in the department column is dependent on the employee_id and name columns, but not on the phone_number column, we need to split the table into two tables, one for employee information and one for department information.

2NF Output

Third Normal Form (3NF)

Third normal form (3NF) builds on the foundation of 2NF and involves ensuring that each non-key column in a table is not transitively dependent on the primary key. In other words, there should be no transitive dependencies in the table.

Let’s take another example. Consider a table that stores information about books. The table might have columns like book_id, title, author, and publisher.

However, the publisher column could be dependent on the author column, rather than on the book_id column. To bring this table to 3NF, we need to split it into two tables, one for book information and one for author information.

3NF Output

BCNF — Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is a higher level of normalization than 3NF. It is used to eliminate the possibility of functional dependencies between non-key attributes. A table is in BCNF if and only if every determinant in the table is a candidate key.

To understand BCNF better, consider a table that stores information about students and their courses. The table might have columns like student_id, course_id, instructor, and instructor_office. In this table, the determinant is course_id, and the non-key attribute is instructor. However, a course can have multiple instructors, so there is a possibility of functional dependencies between non-key attributes. To bring this table to BCNF, we need to split it into two tables, one for course information and one for instructor information.

BCNF Output

Fourth Normal Form (4NF)

Fourth Normal Form (4NF) is the highest level of normalization and is used to eliminate the possibility of multi-valued dependencies in a table. A multi-valued dependency occurs when one or more attributes are dependent on a part of the primary key, but not on the entire primary key.

To understand 4NF better, consider a table that stores information about employees and their skills. The table might have columns like employee_id, skill, and proficiency_level. In this table, the primary key is a combination of employee_id and skill. However, the proficiency level is dependent on the skill, but not on the entire primary key. To bring this table to 4NF, we need to split it into two tables, one for employee information and one for skill information.

4NF Output

Fifth Normal Form (5NF)

Fifth normal form (5NF) is the highest level of normalization and is also known as Project-Join Normal Form (PJNF). It is used to handle complex many-to-many relationships in a database.

In a many-to-many relationship, where each table has a composite primary key, it is possible for a non-trivial functional dependency to exist between the primary key and a non-key attribute. 5NF deals with these situations by decomposing the tables into smaller tables that preserve the relationships between the attributes.

To understand this better, consider a database that stores information about movies and their actors. The tables might have columns like movie_id, actor_id, character_name, and salary. In this database, it is possible for a non-trivial functional dependency to exist between the primary key (movie_id, actor_id) and the salary attribute.

To bring this database to 5NF, we need to decompose the tables into smaller tables. For example, we might create tables for movies, actors, and characters, and then use a join table to connect them. Each table would have a single primary key, and the join table would include foreign keys to the other tables.

5NF Output

Reflection

Today, many organizations rely on databases to store, manage, and retrieve their data. In order to ensure that the data is organized in a way that is both efficient and consistent, normalization is often used. There are several levels of normalization that can be applied, with 1NF, 2NF, and 3NF being the most commonly used.

In addition to 1NF, 2NF, and 3NF, there are also advanced normalization techniques such as Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), and Fifth Normal Form (5NF). BCNF is used to eliminate the possibility of functional dependencies between non-key attributes. 4NF is used to eliminate the possibility of multi-valued dependencies in a table. 5NF, also known as Project-Join Normal Form (PJNF), is used to handle complex many-to-many relationships in a database.

While these levels of normalization can provide further data consistency and management benefits, they can also result in more complex table relationships, slower queries, and larger numbers of tables. Therefore, it’s important to carefully consider the use cases and benefits of each technique before applying them in database design.

Normal Forms Comparison Table

--

--