DATABASE NORMALIZATION

Arnab Bera
6 min readMay 14, 2023

--

What is DataBase Normalization?

Database normalization is the process of organizing data into tables in such a way that the results of using the database are always unambiguous and as intended. It is a process of organizing data in a database to minimize redundancy and dependency. Normalization is the process of breaking down a database into smaller, more manageable tables and establishing relationships between them. The goal of normalization is to minimize data redundancy, eliminate data inconsistencies, and improve data consistency.

Types Of Normalization :

There are 6 types of Normalization :

  • 1NF — First Normal Form
  • 2NF — Second Normal Form
  • 3NF — Third Normal Form
  • BCNF — Boyce Normal Form
  • 4NF — Fourth Normal Form
  • 5NF — Fifth Normal Form
Normalization Types

1NF — First Normal Form:

First Normal Form (1NF) is the most basic level of normalization, that every column in the table contains atomic values. Atomic values are indivisible values that cannot be further broken down into smaller values. In other words, every column in a 1NF table must contain only one value.

Employee Table without 1NF:

Without 1NF

Employee Table with 1NF:

EMPLOYEE table into 1NF

2NF — Second Normal Form:

Second Normal Form (2NF) is a database normalization technique that requires a table to be in 1NF and every non-key column in the table to be dependent on the entire primary key, rather than on only a part of it. In simpler terms, 2NF ensures that there is no partial dependency of non-key attributes on the primary key.

In above table, the primary key is the “Order ID” column. However, the “Product Name” and “Price” columns are dependent only on the “Product ID” column, and not on the whole primary key. This violates the 2NF rule. To bring this table into 2NF, we need to split it into two tables:

In the new design that satisfies the 2NF, the “Product Name” and “Price” columns have been moved to a separate table that is related to the original table by the “Product ID” key. This eliminates the duplication of product information in the original table.

3NF — Third Normal Form:

Third Normal Form (3NF) is a database normalization technique that requires a table to be in 2NF and eliminates any transitive dependencies between non-key columns. In simpler terms, 3NF ensures that each non-key column in a table is dependent only on the primary key and not on other non-key columns. This helps to avoid data redundancy and maintain data consistency by keeping the data in separate tables based on the relationships between the data.

In the above table, the primary key is composed of the “Student ID” and “Course ID” columns. However, the “Instructor Name” and “Instructor Office” columns are dependent only on the “Course ID” column, and not on the whole primary key. This violates the 3NF rule.

To bring this table into 3NF, we need to split it into three tables:

The new design that satisfies the 3NF eliminates any transitive dependencies between non-key columns. The “Instructor Name” and “Instructor Office” columns have been moved to a separate table that is related to the original table by the “Course ID” key, which ensures that each non-key column in a table is dependent only on the primary key and not on other non-key columns.

BCNF — Boyce Normal Form:

Boyce-Codd Normal Form (BCNF) is a database normalization technique that requires a table to be in 3NF and eliminates any non-trivial dependencies between candidate keys and non-key columns. In simpler terms, BCNF ensures that there are no redundant data dependencies between the candidate keys and non-key columns in a table. BCNF is more strict than 3NF because it eliminates all possible functional dependencies between the candidate keys and non-key columns, ensuring that the table is free from any redundancy and dependencies.

In the above table, the primary key is composed of the “Employee ID” and “Department ID” columns. However, the “Manager ID” column is dependent only on the “Department ID” column, and not on the whole primary key. This violates the BCNF rule.

To bring this table into BCNF, we need to split it into two tables:

The new design separates the “Manager ID” column from the employee information table and puts it in a separate table that only contains department-related information. This ensures that the “Manager ID” attribute is functionally dependent on the “Department ID” attribute, which is the primary key of the new table. By doing this, we eliminate the possibility of redundancy and ensure that the data is structured in a way that meets the BCNF rule, which helps to improve data integrity and consistency.

4NF — Fourth Normal Form:

Fourth Normal Form (4NF) is a database normalization technique that requires a table to be in Boyce-Codd Normal Form (BCNF) and eliminates any multi-valued dependencies between non-key columns. In simpler terms, 4NF ensures that there are no redundant data dependencies between non-key columns that can be expressed as sets of data. 4NF is stricter than BCNF because it further normalizes the data by removing any redundant or unnecessary data that exists in the table.

5NF — Fifth Normal Form:

Fifth Normal Form (5NF), also known as Project-Join Normal Form (PJNF), is a technique used in database normalization to eliminate complex join dependencies between tables. In simpler terms, 5NF helps to ensure that data is organized in a way that eliminates the need for combining data from multiple tables to retrieve complete information. This is achieved by breaking down tables into smaller and more specific ones, where each table contains only unique and non-redundant data. The main goal of 5NF is to improve the efficiency and consistency of the database design, making it ideal for managing large-scale and complex data relationships. 5NF is considered the highest level of normalization, and it ensures that the database can handle complex queries and operations with ease.

Conclusion:

Database normalization is a fundamental process used in database design to reduce data redundancy, improve data consistency, and ensure data integrity. It involves dividing large, complex tables into smaller, more manageable ones, and ensuring that each table adheres to a set of well-defined normalization rules. The ultimate goal of normalization is to create a database that is easy to maintain, update, and query, while ensuring that the data is accurate, consistent, and free from redundancies. By applying normalization techniques, organizations can improve their data management processes, reduce errors, and achieve better insights from their data.

--

--