Normalization

Arindam Panja
6 min readSep 18, 2023

--

Normalization is the process of minimizing redundancy from a relation or set of relations

What is Normalization?

  • Normalization is the process of organizing the data in the database.
  • Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion Anomalies.
  • Normalization divides the larger table into smaller ones and links them using relationships.
  • The normal form is used to reduce redundancy from the database table.

Why do we need Normalization?

The main reason for normalizing the relations is to remove these anomalies. Failure to eliminate anomalies leads to data redundancy and can cause data integrity and other problems as the database grows. Normalization consists of a series of guidelines that help to guide you in creating a good database structure.

Here are the important points regarding normal forms in DBMS:

  1. First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should contain only a single value, and each column should have a unique name. The first normal form helps to eliminate duplicate data and simplify queries.
  2. Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute be dependent on the primary key. This means that each column should be directly related to the primary key, and not to other columns.
  3. Third Normal Form (3NF): 3NF builds on 2NF by requiring that all non-key attributes are independent of each other. This means that each column should be directly related to the primary key, and not to any other columns in the same table.
  4. Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that ensures that each determinant in a table is a candidate key. In other words, BCNF ensures that each non-key attribute is dependent only on the candidate key.
  5. Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not contain any multi-valued dependencies.
  6. Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a table into smaller tables to remove data redundancy and improve data integrity.

Normal forms help to reduce data redundancy, increase data consistency, and improve database performance. However, higher levels of normalization can lead to more complex database designs and queries. It is important to strike a balance between normalization and practicality when designing a database.

The advantages of using normal forms in DBMS include:

  • Reduced data redundancy: Normalization helps to eliminate duplicate data in tables, reducing the amount of storage space needed and improving database efficiency.
  • Improved data consistency: Normalization ensures that data is stored in a consistent and organized manner, reducing the risk of data inconsistencies and errors.
  • Simplified database design: Normalization provides guidelines for organizing tables and data relationships, making it easier to design and maintain a database.
  • Improved query performance: Normalized tables are typically easier to search and retrieve data from, resulting in faster query performance.
  • Easier database maintenance: Normalization reduces the complexity of a database by breaking it down into smaller, more manageable tables, making it easier to add, modify, and delete data.

Disadvantages of Normalization

  • You cannot start building the database before knowing what the user needs.
  • The performance degrades when normalizing the relations to higher normal forms, i.e., 4NF, and 5NF.
  • It is very time-consuming and difficult to normalize relations of a higher degree.
  • Careless decomposition may lead to a bad database design, leading to serious problems.

1. First Normal Form –

If a relation contains a composite or multi-valued attribute, it violates the first normal form or a relation is in the first normal form if it does not contain any composite or multi-valued attribute. A relation is in first normal form if every attribute in that relation is a single-valued attribute.

ID   Name   Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
  • In the above table, Course is a multi-valued attribute so it is not in 1NF. Below Table is in 1NF as there is no multi-valued attribute
ID   Name   Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3

2. Second Normal Form (2NF)

The first condition for the table to be in the Second Normal Form is that the table has to be in the First Normal Form. The table should not possess partial dependency. The partial dependency here means the proper subset of the candidate key should give a non-prime attribute.

STUD_NO            COURSE_NO        COURSE_FEE
1 C1 1000
2 C2 1500
1 C4 2000
4 C3 1000
4 C1 1000
2 C5 2000
  • {Note that, there are many courses having the same course fee. } Here, COURSE_FEE cannot alone decide the value of COURSE_NO or STUD_NO; COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO; COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO; Hence, COURSE_FEE would be a non-prime attribute, as it does not belong to the one only candidate key {STUD_NO, COURSE_NO} ; But, COURSE_NO -> COURSE_FEE, i.e., COURSE_FEE is dependent on COURSE_NO, which is a proper subset of the candidate key. Non-prime attribute COURSE_FEE is dependent on a proper subset of the candidate key, which is a partial dependency and so this relation is not in 2NF. To convert the above relation to 2NF, we need to split the table into two tables such as : Table 1: STUD_NO, COURSE_NO Table 2: COURSE_NO, COURSE_FEE
Table 1                                    Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000

3. Third Normal Form (3NF)

  • The first condition for the table to be in the Third Normal Form is that the table should be in the Second Normal Form.
  • The second condition is that there should be no transitive dependency for non-prime attributes, which indicates that non-prime attributes (which are not a part of the candidate key) should not depend on other non-prime attributes in a table. Therefore, a transitive dependency is a functional dependency in which A → C (A determines C) indirectly, because of A → B and B → C (where it is not the case that B → A).
  • The third Normal Form ensures the reduction of data duplication. It is also used to achieve data integrity.

4. Boyce CoddNormal Form (BCNF)

Boyce Codd Normal Form is also known as 3.5 NF. It is the superior version of 3NF and was developed by Raymond F. Boyce and Edgar F. Codd to tackle certain types of anomalies that were not resolved with 3NF.

The first condition for the table to be in Boyce Codd Normal Form is that the table should be in the third normal form. Secondly, every Right-Hand Side (RHS) attribute of the functional dependencies should depend on the super key of that particular table.

{ BC->D, AC->BE, B->E }

Explanation:

  • Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attributes of the relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation, so there will be only 1 candidate key {AC}.
  • Step 2: Prime attributes are those attributes that are part of candidate key {A, C} in this example and others will be non-prime {B, D, E} in this example.
  • Step-3: The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite attributes.

Conclusion

This article took you through what database normalization is, its purpose, and its types. We also look at those types of normalization and the criteria a table must meet before it can be certified to be in any of them.

It is worth noting that most tables don’t exceed the 3NF limit, but you can also take them to 4NF and 5NF, depending on requirements and the size of the data at hand.

--

--