Database Series | Why Database needs to be normalized before adding queries on it.

Camina Delacruz
Sep 15 · 4 min read
Photo by David Pupaza on Unsplash

As you know, Database Normalization is the process of structuring a database, commonly a relational database. You might be wondering what is very important to reduce data redundancy and improve data integrity. It was first proposed by Edgar Frank “Ted” Codd, an English computer scientist who developed the relational model for database administration while working for IBM. This model play as the theoretical foundation for relational databases and relational database management systems. As part of his relational model.

Normalization entails organizing a database’s columns (attributes) and tables (relations) to assure that database integrity constraints properly carry out their dependencies. It is accomplished by applying proper rules either by synthesis (creating a new database design) or decomposition (improving an existing database design).

Description of normalization

Photo by Scott Graham on Unsplash

The process of structuring data in a database is known as normalization. This comprises the creation of tables and the establishment of linkages between those tables per rules aimed to safeguard the data and make the Database more adaptable by avoiding redundancy and inconsistent dependency.

Redundant data wastes disk space and creates maintenance problems. If data exists in more than one location to be modified, the data must be changed in all areas in the same way. Reversing a customer’s address is significantly easier to implement if the data is only saved in the Customer’s table and nowhere else in the Database.

What is an “inconsistent dependency”?

While it is normal for a user to search in the Customers table for a specific customer’s address, it may not make sense to seek the income of the person who calls on that Customer. Because the employee’s paycheck is associated with or dependent on the employee, it should be transferred to the Employees table. Data can be challenging to obtain because the data’s path may be absent or broken due to conflicting dependencies.

There are some guidelines to follow when it comes to database normalization. Each rule is referred to as a “normal form.” The Database is said to be in “first normal form.” If the first three rules are followed, the Database is considered in “third normal form.” Although other levels of normalization are possible, the third normal form is regarded as the highest level necessary for most applications.

As with many formal rules and specifications, real-world scenarios do not always allow for perfect compliance. Normalization, in general, necessitates the creation of new tables, which some customers find inconvenient. If you decide to break one of the first three normalization criteria, make sure your application anticipates any potential complications, such as redundant data and conflicting dependencies.

The following descriptions include examples.

First normal form (1NF)

-Remove repetitive sets from individual tables.

-Make a new table for each piece of relevant data.

-Use the main key to identify each collection of connected data.

To store similar data, do not utilize numerous fields in a single table.

Second normal form (2NF)

-Separate tables should be created for sets of values that apply to numerous records.

-Utilize a foreign key to connect these tables.

Records should not rely on anything other than the main key of a table (a compound key, if necessary)

Third normal form (3NF)

-Remove fields that are not reliant on the key.

Values in a record that are not part of the key for that record do not belong in the table. In general, consider putting those fields in their table whenever the contents of a group of fields may apply to more than one record in the table.

Boyce and Codd Normal Form (BCNF)

The Normal Form of Boyce and Codd is a more advanced variant of the Third Normal Form. This form addresses a specific type of anomaly that 3NF does not handle. BCNF refers to a 3NF table that does not have several overlapping candidate keys. For a table to be in BCNF.

-R must be in 3rd Normal form

-and for each functional dependency ( X → Y ), X should be a super Key.

Fourth Normal Form (4NF)

When a table is in the Boyce-Codd Normal Form,

-it is said to be in the Fourth Normal Form.

-It lacks Multi-Valued Dependency.

The Fourth Normal Form instruction is available here. However, before moving on to the fourth normal form, we recommend first grasp the other normal conditions.

And the Database progresses, we also have what we call a Denormalization, an approach used to improve the performance of a previously normalized database. Denormalization is the process of trying to enhance a database’s read speed at the expense of some write performance by adding duplicate copies of data or grouping data. It is frequently motivated by performance or scalability in relational database applications that must do many reading operations. Unnormalization, also known as an unnormalized relation or non-first normal form (N1NF or NF2), is a database data model that does not meet any of the database normalization constraints provided by the relational model. in that the benefits of denormalization can only be fully realized on a normalized data model.

Analytics Vidhya

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem

Camina Delacruz

Written by

BS Computer Science Major in Application Development in University of Makati. Customer Representative of Transunion Graphic Artist and Writer

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem