1NF

Published in

Javarevisited

6 min readDec 27, 2021

Learn First Normal Form of relational database design

Design your database like pro — Photo by Jan Antonin Kolar on Unsplash

In this story, we will start to talk about the design of relational databases. The whole serial will include 1NF, 2NF, and 3NF, but the main hero of this story will be the so-called 1NF (First Normalization Form).

As you know, relational databases based on SQL are most used in today’s systems. They are powerful because of their ACID properties.

During my professional career, I’ve seen many relational databases. I must tell you one thing, they often had a bad design. Many programmers don’t have enough education or are not interested at all in designing databases.

Because of that, many poor things can be seen. A lot of redundancies across databases, so inconsistencies across databases are pretty often. Then people start doing manual interventions over databases with SQL scripts, etc… Lots of bugs in production because of inconsistencies in database, lot of manual maintaining of the database.

A properly designed database should go through a normalization process. Usually, it is enough if the database goes through 3 normal forms: 1NF, 2NF, and 3NF. Or almost 3 normal forms. Usually, 1NF and 2NF are mandatory. 3NF is good, but sometimes we are not so strict and somewhere (on some table) allow violation of it because of performance benefits.

The process of normalization will end with more tables in the database, however, it will lower data redundancy, and because of that, data integrity is much easier to maintain. Minimizing data redundancy is one of the key objectives of normalization.

Let’s start with one example. Let’s imagine we have some IT school with a lot of students, courses, and instructors (instructors will come in the following stories). Let’s say some junior programmer has started to design it. And STUDENT table was created.

Initial STUDENT table — Non normalized table — Initial STUDENT table

This table contains the list of courses for each student. The ID column is a primary key of the STUDENT table.

A table is in 1NF if it satisfies the following conditions:

There are no duplicated rows in the table
Entries in a column are of the same kind
Contains only atomic values
There are no repeating groups

An atomic value is a value that cannot be divided. A repeating group means that a table contains two or more columns that are closely related. For example, if the STUDENT table has columns COURSE_1, COURSE_2, and COURSE_3 (instead of column COURSES), then it will violate the rule for repeating groups.

If all tables in the database are not at least in 1NF, it is considered a very poor design and then we shouldn’t use the relational database at all! Databases that do not require 1NF are often called NoSQL databases. Then you should use some NoSQL database instead and you will probably lose the best of relational databases: ACID properties.

Let’s now redesign the STUDENT table and put it into 1NF. We will rename the column COURSES into COURSE (singular name) and put each student’s course into a separate row. Also, column NAME will be split into 2 columns, FIRST_NAME and LAST_NAME. Probably, sometimes we will need the student’s first name separately from the last name, and vice versa. And then what to do with, for example, Ana Maria Johnson? How to programmatically decide what is the first name, and what is the last name? Some people can have a first name with 1 word, and the last name with 2 words. We are stuck. Also, it is not a solution to have all 3 columns: NAME, FIRST_NAME, and LAST_NAME. Because it violates the 1NF rule about repeating groups. So, 1NF of STUDENT table could be:

Redesigned STUDENT table is now in 1NF — STUDENT table in 1NF

The redesigned table is now in 1NF. However, 1NF is not enough, there are many problems here.

What are the problems with the current design of the STUDENT table? Well, with this design we have a so-called insert, update, and delete anomalies.

What is an insert anomaly? If we want to add a new student, then that student should have at least one enrolled course. Otherwise, we can’t insert a new student (assuming the COURSE column is not nullable). Or, if column COURSE is nullable, then we must set it to NULL.

What is an update anomaly? Let’s assume the student with ID 1 (Ana Maria Johnson) gets married. And she gets the last name of her husband. Now we should update all rows where Ana Maria appears. Thus maintaining data integrity is more difficult.

What is a delete anomaly? For example, if we delete the student with ID 1 (Ana Maria Johnson), then we will not have any student who enrolled JavaScript course. So we will lose data that tells that the JavaScript course exists.

Also, with this design, each time the existing student enrolls new course, we will repeat common student data (ID, FIRST_NAME, LAST_NAME). So we will have a lot of data redundancies which will increase database size, and the performance of the database will be hit.

One more problem is the primary key. Each table in a relational database should have a primary key. We already said that the ID column is the primary key here (can be autogenerated ID from database). Now we have rows with duplicated IDs. So we need a new primary key on the table. What it should be here? All 4 columns (ID, FIRST_NAME, LAST_NAME, COURSE) should be part of the new primary key! Not very optimal.

Our database normalization should continue with 2NF and 3NF. Why is then 1NF so fundamental? Because it is the boundary for using the relational database at all! SQL is not made to efficiently query columns with multiplied values. If your database is not in 1NF, then you should use a NoSQL database!

In my career, I’ve seen relational databases which are not in 1NF. Even these databases were created by experienced programmers. How did this happen? The answer is — lack of education! They usually don’t know how to implement a many-to-many relationship between database entities. What is a many-to-many relationship? Well, 1 student can enroll in many courses. And 1 course can be attended by many students. This is a many-to-many relationship. Because of lack of education, they design this relationship something like this:

Even experienced developers make such bad design of many-to-many relationship! — Bad design of many-to-many relationship

Then they somehow make this work. Usually by a lot of code in program language they use. Parsing values of columns with multiplied values, splitting them by separator, etc… Also, how will you make a SQL query which, for example, gets all students who enrolled in some particular course? Not easy! Not to speak about the performance of that query.

By the way, the right way to represent a many-to-many relationship in a relational database is to introduce a new table. So-called Join Table. This we will see in stories that will come as follow-up stories. We will also see the power of database normalization and get known to 2NF and 3NF.

Conclusion

1NF is not enough. We should go further in the process of database normalization. But it’s fundamental form because it is the boundary between SQL and NoSQL databases!

1NF

Conclusion

Written by Aleksandar Danilovic