SELECT to BCNF: SQL Commands and Database Normalization Explained

Wasim Akram
8 min readOct 28, 2023

--

What is SQL?

SQL (Structured Query Language) is a powerful tool that helps you communicate with and manage databases. It’s like a universal language for databases. With SQL, you can ask questions, retrieve information, add new data, change existing data, and organize data in ways that make sense to you.

  • SQL Commands are divided into various categories.

DDL Commands

DDL stands for Data Definition Language, is used to define,manage and control the structure and organization of a database. DDL commands are responsible for creating, altering, and deleting database objects, such as tables, indexes, views, and schemas.

Here are some common DDL commands and their functions:

CREATE- This command is used to create new database objects. For example, you can create tables, views, indexes, and schemas.

CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
Department VARCHAR(50)
);

ALTER- This is used to modify the structure of existing database objects. You can use it to add, modify, or delete columns, change data types, or rename objects.

ALTER TABLE Employees
ADD Email VARCHAR(100);

DROP- It is used to delete database objects. It can be used to remove tables, indexes, views, or even entire databases.

DROP TABLE Employees;

TRUNCATE- This command is used to remove all rows from a table, but it retains the table structure. It’s faster than DELETE for removing all data.

TRUNCATE TABLE Employees;

DML Commands

DML stands for Data Manipulation Language that focuses on manipulating or interacting with data stored in a database. DML commands allow you to insert, retrieve, update, and delete data in database tables.

Here are some common DDL commands and their functions:

INSERT- The INSERT statement is used to add new rows or records to a table. You provide the values for each column you want to populate.

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

UPDATE- The UPDATE statement is used to modify existing data in a table. You specify the columns to update, provide new values, and use a condition to identify which rows should be changed.

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

DELETE- The DELETE statement is used to remove rows from a table based on a specified condition. The basic syntax is as follows:

DELETE FROM table_name
WHERE condition;

TCL Commands

TCL stands for Transaction Control Language that is used to manage and control transactions within a database management system.

Transactions are sequences of one or more SQL statements that are treated as a single unit of work. TCL commands are essential for ensuring the integrity and consistency of a database during data modification operations.

There are two primary TCL commands: COMMIT and ROLLBACK.

COMMIT- The COMMIT command is used to save all the changes made during a transaction and make those changes permanent. Once a COMMIT is issued, the changes cannot be rolled back, and they become a permanent part of the database.

BEGIN; -- Start a transaction
UPDATE Customers
SET Balance = Balance - 100
WHERE CustomerID = 123;
COMMIT; -- Save the changes

ROLLBACK- The ROLLBACK command is used to undo the changes made during a transaction and return the database to its state before the transaction began. It cancels the transaction and ensures that no changes are applied to the database.

BEGIN; -- Start a transaction
DELETE FROM Orders
WHERE OrderID = 456;
ROLLBACK; -- Undo the changes

DCL Commands

DCL stands for Data Control Language used to control access and permissions within a database management system.

There are two primary DCL commands: GRANT and REVOKE.

GRANT- The GRANT command is used to give specific permissions or privileges to users or roles within a database.

GRANT SELECT, INSERT ON Employees TO UserA;

REVOKE- The REVOKE command is used to take away previously granted permissions or privileges from users or roles. It allows database administrators to restrict or revoke access to database objects.

REVOKE DELETE ON Customers FROM UserB;

DQL Commands

DQL stands for Data Query Language used for querying and retrieving data from a database.

There is one primary DQL command in SQL, and that is:

SELECT- The SELECT statement is used for querying and retrieving data from one or more tables. It allows you to specify which columns you want to retrieve and apply filtering and sorting criteria.

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Aggregate Functions

Aggregate functions in SQL are special functions that perform a calculation on a set of values and return a single, summarizing result.

COUNT- The COUNT function is used to count the number of rows in a result set. It can be used to count all rows or the number of rows that meet a specific condition.

SELECT COUNT(*) FROM Orders;

SUM- The SUM function calculates the sum of a numeric column in a result set.

SELECT SUM(Price) FROM Products;

AVG- The AVG function computes the average (mean) value of a numeric column in a result set.

SELECT AVG(Salary) FROM Employees;

MAX- The MAX function retrieves the maximum (highest) value in a specified column.

SELECT MAX(Score) FROM ExamResults;

MIN- The MIN function retrieves the minimum (lowest) value in a specified column.

SELECT MIN(Price) FROM Products;

GROUP BY- The GROUP BY clause is used to group rows that have the same values in specified columns into summary rows. It is typically used in combination with aggregate functions (e.g., SUM, COUNT, AVG) to perform calculations on groups of rows.

SELECT Department, AVG(Salary)
FROM Employees
GROUP BY Department;

HAVING- The HAVING clause is used to filter the results of a GROUP BY query based on the results of aggregate functions. It allows you to specify conditions that must be met by grouped rows.

SELECT Department, AVG(Salary)
FROM Employees
GROUP BY Department
HAVING AVG(Salary) > 50000;

ORDER BY- The ORDER BY clause is used to sort the result set by one or more columns in ascending (ASC) or descending (DESC) order.

SELECT ProductName, UnitPrice
FROM Products
ORDER BY UnitPrice DESC;

Filtering Data

There are few more commands that are commonly used in conjunction with the SELECT statement to filter and manipulate data.

WHERE- The WHERE clause is used to filter rows based on a specified condition. It allows you to select rows that meet specific criteria.

SELECT * FROM Customers
WHERE Country = 'USA';

AND- The AND operator is used to combine multiple conditions in a WHERE clause. It requires that all conditions must be true for a row to be included in the result set.

SELECT * FROM Orders
WHERE ShipCountry = 'USA' AND ShipCity = 'New York';

OR- The OR operator is used to combine multiple conditions in a WHERE clause. It requires that at least one of the conditions is true for a row to be included in the result set.

SELECT * FROM Products
WHERE Category = 'Electronics' OR Category = 'Appliances';

BETWEEN- The BETWEEN operator is used to specify a range of values. It is often used in combination with the WHERE clause to filter rows within a specified range.

SELECT * FROM Employees
WHERE Salary BETWEEN 40000 AND 60000;

LIKE- The LIKE operator is used for pattern matching within string values. It is often used with wildcard characters to find matching patterns.

SELECT * FROM Products
WHERE ProductName LIKE 'S%';

IN- The IN operator is used to specify a list of values to compare against in a WHERE clause. It allows you to filter rows that match any of the provided values.

SELECT * FROM Customers
WHERE Country IN ('USA', 'Canada', 'Mexico');

IS NULL- The IS NULL operator is used to filter rows where a column’s value is NULL, meaning it has no value or is undefined.

SELECT * FROM Orders
WHERE ShipDate IS NULL;

IS NOT NULL- The IS NOT NULL operator is used to filter rows where a column’s value is not NULL, meaning it has a defined value.

SELECT * FROM Products
WHERE ExpirationDate IS NOT NULL;

JOINS

In SQL, JOIN is a powerful operation that combines rows from two or more tables based on a related column between them. JOINs are essential for retrieving data from multiple tables in a database and for creating comprehensive queries that involve data from different sources.

Here are some common types of JOINs:

INNER JOIN-

  • An INNER JOIN returns only the rows that have matching values in both tables.
  • It’s used to combine rows from two or more tables based on a related column, excluding rows with no match.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
INNER JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

LEFT JOIN-

  • A LEFT JOIN returns all the rows from the left table and the matching rows from the right table.
  • If there’s no match in the right table, NULL values are used.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

RIGHT JOIN-

  • A RIGHT JOIN is the opposite of a LEFT JOIN. It returns all the rows from the right table and the matching rows from the left table.
  • If there’s no match in the left table, NULL values are used.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
RIGHT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

FULL JOIN-

  • A FULL JOIN returns all the rows from both tables and combines them based on matching columns.
  • If there’s no match, NULL values are used for the missing data.
SELECT Employees.FirstName, Employees.LastName, Departments.DepartmentName
FROM Employees
FULL JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Database Normalization

Database normalization is a database design technique used to minimize data redundancy and improve data integrity by organizing data into separate related tables. The primary goal of normalization is to create a well-structured, efficient, and maintainable database schema. It helps in reducing anomalies in data storage, making it easier to manage and query the database.

Database normalization is usually divided into different normal forms (e.g., First Normal Form, Second Normal Form, Third Normal Form, Boyce-Codd Normal Form) with each form addressing specific issues related to data redundancy and data integrity. Here’s a brief overview of the most commonly discussed normal forms:

First Normal Form (1NF):

  • Data is organized into tables with rows and columns.
  • Each column contains only atomic (indivisible) values.
  • Eliminates repeating groups by ensuring each cell contains a single, meaningful piece of data.

Second Normal Form (2NF):

  • Builds on 1NF.
  • All non-key columns are fully functionally dependent on the entire primary key.
  • Eliminates partial dependencies, ensuring that each non-key column depends on the entire primary key.

Third Normal Form (3NF):

  • Builds on 2NF.
  • All attributes are functionally dependent only on the primary key.
  • Eliminates transitive dependencies, ensuring non-key columns do not depend on other non-key columns.

Boyce-Codd Normal Form (BCNF):

  • A more advanced form that builds on 3NF.
  • All non-trivial functional dependencies must have a superkey as their left-hand side.
  • Ensures that each non-key attribute depends only on superkeys.

Conclusion

In Conclusion, SQL commands are the language through which we interact with and shape our data. They allow us to retrieve, manipulate, and manage information, making it a powerful tool for data professionals.

Database normalization, on the other hand, is the art of organizing data. It reduces redundancy, enforces integrity, and ensures data remains consistent and dependable.

Together, SQL commands and normalization form the core of efficient data management. They transform raw data into actionable insights and enable databases to adapt and grow with changing needs. These concepts are the keys to unlocking the full potential of data-driven decision-making.

--

--