Database — Normalization (Part 7)

The process of organizing your database through a set of defined steps.

Omar Elgabry
OmarElgabry's Blog
5 min readSep 15, 2016

--

This is what normalization is all about; organizing your data — cashbackauthority

Wish you already came along the last part Database — Design: Logical Design (Part 6)

This is a process where you take your database design, and you apply a set of formal criteria of rules called “Normal Forms”. And we step through them first normal form, second normal form, and third normal form.

You usually will end up creating a few new tables as part of the process. But, the end result of normalization is to make your database easier to edit, easier to maintain, and preform operations, remove any duplicates, & more reliable to work with.

First Normal Form

First normal form requires that every column should have one, and only one value; there is no multi-valued attribute, and there shouldn’t be repeating groups of data.

So, for example, if you have a customer, and this customer can have two emails; the main email and the alternative one. Now, you can just put two columns in your table, each column represent on of these emails (repeating group ☹).

Alternatively, you can have two rows for this customer, one with the main email, and the other row with the alternative email, while keeping the other columns’ values as they are (repeating group ☹).

But, “What if you want to extend this, and add more emails?”. We don’t want to have a table that has repeating group of data.

So, what are going to do is to extract the customer emails in a new table. This new table will have a foreign key points to the primary key of the original table, and another column for the email values. Both the foreign key and the email value are together form the primary key of the new table.

First Normal Form

Now, we can say that our tables are in the first normal form, let’s make it in the second normal form.

Second Normal Form

The problem arise when you have a composite primary key in a table. The second normal form requires that all of the non-primary columns have to be dependent on the entire (composite) primary key.

An example could be, let’s say you have a table that stores books information. A book could have a name, author, and date released.

According to the requirements, we can have book that’s written by more than one author, but we can’t have more than one book with the same book name and the same author. So, both name and author form a composite primary key.

A book table

Now, the date column relies only on the name of the book, it doesn’t care about who wrote this book, it cares about the name of the book. And if you noticed we’ve written the date “2011” more than one time with “Database Systems” book. This is not a good practice when you have duplicates.

So, what we are going to do is to create a separate table, with one-to-many relationship with the original table, it has the book name as a primary key, and all the dependent attributes (like date).

Now, the book name in the book table is a foreign key that points to the primary key of the new created table, and every non-primary column in the book table (if exist) depends on the whole composite primary key; name and author, and not one of them.

Second Normal Form

Now, we can say that our tables are in the second normal form, let’s make it in the third one.

Third Normal Form

The third normal form is concerned with the non-key attribute that rely on another non-key attributes, and not the primary key.

An example for that could be with the course table, where every course has an name as a primary key, room number and capacity.

Third Normal Form

Now, it’s obviously noted that the capacity depends on the room, it has nothing to do with the course name. This means we have a non-key attribute that depends on another non-key attribute.

Why this sounds like a problem? Well, “What happens if somebody reached into this table, and they changed that room for a specific course, but they didn’t change the capacity?”.

So, what we are going to do is the same, we will create a new table with one-to-many relationship with the original table, it has the room number as a primary key, and all the dependent attributes like the capacity.

Now, the room number in the course table is a foreign key that points to the primary key of the new created table.

Third Normal Form

Derived Attribute

Another example of third normal form is when you have a (non-primary) derived attribute that depends on other non-primary attributes.

We would remove that derived column form this table, as we can figure it out when we need to. One of the main reasons for this is to prevent any conflicts, any inaccurate data to be inserted mistakenly.

Many database systems offer you the option of defining a computed or calculated column. It’s not actually stored in the database, it is a convenient read-only fiction. It’s value is automatically calculated based on the other columns in the table, and you may find that useful from time to time.

Denormalization

Sometimes you will hit the situation where you will break the normalization rules for performance improvements, because normalization is often involves splitting data into multiple tables.

If you are following along, you remember at the first normal form, when we explained the example of emails. “What happens if we added two email columns to the customer table?.”

Technically, this can be described as breaking first normal form. It’s a repeating group. But in practice, you may find it more convenient to just allow an email1 and email2 columns, because you’re sure there won’t be a flexible number of email addresses.

This will improve your performance, and save you from creating more tables.

Wrapping Up

Now, we can say we have an organized, more reliable database, and this will improve the consistency of your data. In the next tutorial, we are going to cover basics of “Structured Query Language (SQL)”.

--

--

Omar Elgabry
OmarElgabry's Blog

Software Engineer. Going to the moon 🌑. When I die, turn my blog into a story. @https://www.linkedin.com/in/omarelgabry