Normalization of Database, the Easy Way

Gursewak Singh
The Startup
Published in
8 min readAug 5, 2020

20% of the world’s data is stored in a Structured way in the form of Relational Database.

Photo by Jess Bailey on Unsplash

Being a Computers student you might have come across one of the most important topics in Database Management Systems that is Database Normalization. Normalization being a hot topic of discussion in Campus Placements Interviews, for PSUs & gate exams, for research as well. In this blog, I will discuss the various Normalization techniques & will show you how can you find the Highest Normal forms for the tables very easily.

Before jumping to the main topic let me introduce you to the evolution of DBMS first. Earlier, File System management was used where data was stored in terms of files. File System management does not have protocols for data redundancy, data inconsistency, concurrency of users, security & searching data from the files. So DBMS software came into the light where structured data is stored in terms of tables & it provides a systematic way for creating and managing databases. Now it has become very easy for users to store, process & analyze their data with the help of DBMS. In the database, data is stored in the form of tables which are also known as Relations, so we call it RDBMS, where R stands for Relational.

What Exactly is Database Normalisation?

Database Normalisation is a systematic technique of organizing the data in the database such that the above-mentioned protocols are met, most importantly eliminating the Data Redundancy. In simple words, the data in the tables should be stored logically such that later our operations like insertion, deletion, and the update should not cause any kind of ill effects also known as anomalies.

In Normalisation, we decompose the tables into multiple tables to remove the data redundancy & unwanted anomalies.

So How Can We Normalise our Database?

Using normalization techniques:

  1. First Normal Form
  2. Second Normal Form
  3. Third Normal Form
  4. BCNF (Boyce and Codd Normal Form )

1. First Normal Form (1NF):

Each attribute of a table must have atomic (single) values only then we can say that the table is in 1NF.

In the following example, Kiara has two phone numbers stored in a single cell that violates the 1NF rule.

So we need to make sure that the cell in the table should not hold multiple values, only then we can say that table is in 1NF.

2. Second Normal Form

For a table to be in 2NF :

  • It should be in 1NF.
  • Non-prime attributes of the table should not depend on the proper subset of any of the candidate key or we can say all the non-prime attributes should be fully functionally dependent on the Candidate key & not on parts of Candidate Key.

Before that you should know the following terms:

  1. Functional Dependency(FD): Method that describes the relationship between the attributes in a relation. X →Y, where X determines Y & Y is dependent on X. Example: Student Id → Student Name, here student name is determined by the Student ID.

So when we say Student Name is dependent on the Student ID then which one of the following is not valid for this functional dependency?

So highlighted one below is the answer to the above question, as Student Name is derived from the Student ID, we can’t have different Student names with the same Student ID.

2. Primary Key: I guess most of us would be aware of Primary Key, a minimal set of attributes that uniquely identify a tuple(row) in a relation.

3. Candidate keys: Keys that are not taking part in Primary Key formation but can serve as Primary Key. So it can also identify a tuple in the relation. You can use the Closure Method to identify the Candidate Keys when Functional dependencies are given.

Let’s find the Candidate key for R( ABCD ) given FD{ A → B, B →C, C →A, C→D} using Closure Method:

Step 1: Take Closure of A represented as A+, we need to check what can be derived from A. Now see FD: A →B i.e From A we can determine B. So,

A+ = AB ( as A can be determined from A, Reflexive Property)

Step 2: If A → B & B→C, then by the Transitive property, A→C ( A can determine C) & similarly A → D. So,

A+ = ABCD , So A can determine all the attributes in R( ABCD). Hence A is Candidate Key for sure. But we should find all the candidates key in a table when we are dealing with Normalization. Similarly when we take the closure of other attributes as:

B+ = CAD

C+ = CA

D+ = D

Please Note: here (AB)+ can determine all the attributes of the relation but still it’s not the Candidate key , because subset of (AB)+ that is B alone can determine all the attribute so (AB)+ is not the minimal set of attibutes, so it’s not a candidate key.

4. Prime & Non-Prime Attributes :

Given R(ABCD) & Candidate key as BC.

Prime Attributes: Involved in the formation of Candidate Key (B, C)

Non-Prime Attributes: These are not involved in the formation of Candidate Key. ( A, D)

Problems that we might expect :

Check whether the following relation is in 2NF or not:

R(A, B, C, D, E, F) given FD as {C → F, E →A, EC → D, A →B }.

Solution:

Step1: Find all the possible Candidates Keys.

See RHS of the FD, I can see only FADB {C → F, E →A, EC → D, A →B }. can be determined from the attributes on the LHS but comparing it with R(ABCDEF) I can see CE is not getting derived from any attribute present in LHS of FD. So C & E will surely be a part of Candidate keys because they are not getting determined from any other attributes.

So let’s take the closure of CE as

(CE)+ = CE

(CE)+ = CEF (C → F )

(CE)+ = ACEF (E →A)

(CE)+ = ACDEF (EC → D)

(CE)+ = ABCDEF (A →B)

So CE is determining all the attributes of the Relation. It is a Candidate key & no other Candidate key exists, so we can say,

Prime Attributes = C & E

Non-Prime Attributes = A B D F

For a table to be in 2NF: Non-Prime attribute should not be dependent on a proper subset of Candidate Key.

Subset of Candidate key are {{C},{E},{CE}}

Proper Subset of Candidate Key : {{C},{E}}

From given FD (C → F, E →A ). Clearly we can see that non-prime attributes(F & A ) are determined from the part of the Candidate key ( C & E). So the given relation is not in 2NF.

3. Third Normal Form :

For a table to be in 3NF :

  • It should be in 2NF.
  • There should be no transitive dependency in the table or we can say, no non-prime attribute should be derived from a non-prime attribute in a relation. They should be determined from the Candidate or Super key.

R( PQRS ), FD {PQ →R, R →S}

Candidate key: PQ

Prime attributes: P,Q

Non-prime atrributes: R,S

Since R is a non-prime attribute R can be determined from another non-prime attribute. So this table is not in 3NF.

4. BCNF (Boyce and Codd Normal Form ):

Also known as a special form of 3NF as BCNF puts a restriction on 3NF. In BCNF each functional dependency should be derived from Candidate Key or Super Key.

Data Redundancy decreases as we move from 1 NF to BCNF.

Let’s Practice Few Questions:

Find the highest normal form for R(MNOPQR) and

FD {MN→O, O→PQ, Q→R, R→M}

Solution:

Find Candidate Key with Closure method:

OPQRM attributes can be derived from other attributes but N can’t be derived so N will be part of a Candidate key. Let’s check it’s Closure:

N+ = N, but it can’t determine anything else. So let’s check closure for MN,

(MN)+ = MNOPQR so MN is our Candidate Key. So let us continue,

See which attributes are determining attribute M ( check if M comes on RHS in FD, if so replace it the source attribute in the above Candidate key )

(RN)+ = ? ( as R → M, so M can be replaced by R )

so (RN)+ = MNOPQR, RN closure can determine all the attributes in relation so RN is a candidate key.

See which attributes are determining attribute R( check if R comes on RHS in FD if so replace it the source attribute in the above Candidate key ) & check it’s closure.

(QN)+ = ?( as Q→ R, so R can be replaced by Q).

(QN)+ = QNRMOP, QN closure can determine all the attributes in relation so QN is a candidate key.

Similarly, we can find ON is also a Candidate key as from FD(O→PQ), (ON)+ can be found in a similar fashion.

So the Candidate Keys are MN, ON, RN, QN.

Prime attributes: M, N, O, R, Q

Non-Prime attributes: P

So we have all the candidate keys now, so we will move ahead & our very first step will be to check for BCNF then 3NF after that 2NF. We generally don’t check for 1 NF & assume that the table is in 1 NF.

So the table is in 1 NF.

This completes the Normalization Part of Database!!

I hope you liked this article. Feel Free to reach out to me at medha.rwt@gmail.com. Suggestions for any betterment of this article is highly welcome. Do let me know your reviews about this blog. And tell me the next topic you want me to write for you guys.

Till then Stay Safe & Keep Reading.

--

--