A Complete Database Normalization Tutorial

A Complete Guide on Functional dependencies, keys, Minimal Cover, BCNF, 1-NF, 2-NF, and 3-NF

Anuradha Wickramarachchi
The Startup
9 min readAug 27, 2020

--

I got the opportunity to be a tutor, for the last two years delivering Relational Databases. One important area of study in this particular subject was the database normalization. Students often find it challenging because they tend to miss out on important building block theories. Database normalization is a crucial step in structured data storage and data-warehousing for bigdata related applications.

In this article, I wish to gather all theories into one place making the understanding much easier. I hope this article helps you understand and apply functional dependencies in real applications.

Photo by Campaign Creators on Unsplash

Functional Dependencies and Keys

A functional dependency is a constraint between two sets of attributes. In laymen's terms, they represent what attributes are related to what other attributes. For example, imagine a STUDENT database table for a university.

In this table, we can see the following relationships

  • If you know id, we can determine all the other fields.
  • Knowing first_name and last_name can retrieve other fields.

However, the chance of repeated names exists in a university. Hence, the only appropriate functional dependency is as follows.

We are certain that knowing gender or age will give us no clue as to what the other fields must be. Also, any combination of attributes including id can be a unique identifier for the university STUDENT table. Hence, we call them keys. However, the simplest key remains to be id, which we call as a candidate key. There could be several candidate keys, where one is picked to be the primary key. Let us consider the following functional dependencies.

Here, R represents the relation schema with attributes A, B, C, D, and E. F represents the functional dependencies among those attributes. Our first task is to find the candidate keys.

Note that I use the notation (ABE)+. This is called the closure, which means all the possible attributes that can be inferred from what we know in left. Pretty simple. Keep the following simple rules in mind as they will come in handy.

Let us compute the closure for the attributes in the following schema and decide a primary key.

Note: attributes within a primary key are called prime attributes.

Key-finding Steps

  • Obtain closures for individual attributes
  • Consider combinations with 2 attributes (try the hint to reduce the dirty work)
  • Expand and continue until a key is found

Now that we are aware of keys and how to find them let’s move on to the minimal cover algorithm. Many students find this quite challenging. However, if the steps are followed carefully the answer is guaranteed, without any gimmicks.

Minimal Cover

Minimal Cover is compressed of the fundamental set of functional dependencies that can represent the entire set of dependencies. Consider the following example. Here we will try to remove the redundant or inferrable functional dependencies.

How can we check if a functional dependency is redundant?

Step 1: Write the right-hand side as singular attributes

Step 2: Simplify the left-hand side

Step 3: Remove redundancies

In order to remove redundancies, we follow a similar approach to that of step 2. If something to be removed, that dependency must be included among the rest of the dependencies.

So the final answer or the closure is.

The order really matters in Minimal Cover computations. However, the answer will always be able to construct all the functional dependencies that existed.

Now that we have layered a solid understanding of functional dependencies, keys, and minimal cover we can proceed to study the normalization.

First Normal Form (1-NF)

This is the most naive normal form and most likely the data is already having this property satisfied.

  • Table attributes must contain single values
  • Each record must be unique

Consider the following table

We can see that the above table has multiple telephone numbers recorded for Robert. We can bring the table into the first normal form by splitting the entry for Robert into two rows.

Should the table contain duplicate rows, in 1-NF you will discard one of them to remove the redundancies.

Second Normal Form (2-NF)

The 2-NF normalization requires the following conditions to be satisfied

  • Be in 1-NF
  • Remove subsets of data belonging to multiple rows and place them in separate tables
  • Use foreign keys to ensure the relationships are preserved

In our 1-NF table, you can see that records for Robert are duplicated. If he had more data, all of them will be duplicated consuming more space and inconsistencies. Hence, we can decompose the table into two more tables to attain 2-NF.

No in the USERS table we have user details and associated phone numbers in the table below. Note that USER PHONE NUMBERS table has id attribute which is a foreign key to the USERS table.

Photo by Marius Masalar on Unsplash

Third Normal Form (3-NF)

3-NF decomposition has several steps that involve the theories we discussed before. A relational schema R is said to be in 3-NF if its functional dependencies in the form X -> A satisfy one of the following conditions.

  • X is a superkey
  • A is a prime attribute

Let us walk through the steps while following an example.

Step 1: Compute the Minimal Cover

Step 2: Group the functional dependencies by their left-hand side

Step 3: For each grouped functional dependencies, gather them to form a Relational Shema

Step 4: Remove redundant schemas

Step 5: If the set of schemas do not have a superkey, add a super key table

Step 6: Project all the functional dependencies into the final set of schemas

Photo by Scott Graham on Unsplash

Boyce-Codd Normal Form (BCNF)

A relational schema is said to be in BCNF if its functional dependencies are in the form X -> A such that X is a superkey. Note that this condition is more strict than 3-NF. Hence, there is a chance that some functional dependencies may not be visible in the final decomposition. This is something noteworthy!

Let's go through each step for BCNF with an example.

Step 1: Identify the non-trivial functional dependencies that violate BCNF.

Step 2: Break the violating functional dependency X -> Yinto two schemas as R-Y and XY.

BCNF can also depend on the order of decomposition. Hence, multiple answers can exist!

Concluding Remarks

Database normalization is an easy task once the basic steps are understood. For BCNF one could easily use a tree and keep on decomposing. Although BCNF can kill certain functional dependencies, you must look for the closure of existing attributes to see if they are really lost or hidden. This is one major mistake I have seen among students.

I hope this was a useful article for readers with interest. Cheers!

--

--