Mastering Data Manipulation Language (DML) in SQL

DataScience Nexus
9 min readOct 14, 2023

--

In the world of databases, understanding how to manipulate data is crucial. We’ve already explored Data Definition Language (DDL) in a previous article, which helps us define the structure of our database. Now, we’re diving into Data Manipulation Language (DML), the set of SQL statements that enable us to work with the data within the tables. In this article, we’ll focus on key DML statements, such as SELECT, INSERT, UPDATE, and DELETE, and discover their powerful capabilities.

Data Manipulation Language

SELECT: Retrieving Data

The SELECT statement is your gateway to extracting data from your database tables. You can use it to retrieve specific records or the entire table’s content. While it might seem unnecessary to retrieve only a part of the data when the entire table is visible, imagine dealing with a massive table containing millions of rows. This is where SELECT’s true power lies — the ability to extract precisely the data that matches your criteria. Let’s dive deeper into this essential SQL statement.

-- Create the BookstoreDB database
CREATE DATABASE BookstoreDB;
USE BookstoreDB;

-- Create the Authors table
CREATE TABLE Authors (
AuthorID INT PRIMARY KEY,
AuthorName VARCHAR(100)
);

-- Create the Books table
CREATE TABLE Books (
BookID INT PRIMARY KEY,
Title VARCHAR(100),
PublicationYear INT,
AuthorID INT,
Price DECIMAL(8, 2)
);

-- Insert data into the Authors table
INSERT INTO Authors (AuthorID, AuthorName)
VALUES
(1, 'Jane Austen'),
(2, 'George Orwell'),
(3, 'J.K. Rowling');

-- Insert data into the Books table
INSERT INTO Books (BookID, Title, PublicationYear, AuthorID, Price)
VALUES
(1, 'Pride and Prejudice', 1813, 1, 9.99),
(2, '1984', 1949, 2, 12.99),
(3, 'Harry Potter and the Sorcerer''s Stone', 1997, 3, 14.99),
(4, 'Sense and Sensibility', 1811, 1, 9.99),
(5, 'Animal Farm', 1945, 2, 10.99);

Five practice questions and answers based on this database:

Question 1:

Question: How can you retrieve the titles of all books in the “Books” table?

Answer: You can retrieve the titles of all books using the SELECT statement:

SELECT Title FROM Books;

Question 2:

Question: What SQL query would you use to retrieve the names of authors who have written books published after 1900?

Answer: To retrieve the names of authors with books published after 1900, you can use a JOIN between the “Authors” and “Books” tables, along with a WHERE clause:

SELECT AuthorName
FROM Authors
JOIN Books ON Authors.AuthorID = Books.AuthorID
WHERE Books.PublicationYear > 1900;

Question 3:

Question: Can you retrieve a list of books, including their titles, authors, and prices, where the price is less than $10?

Answer: To retrieve a list of books with titles, authors, and prices under $10, you can use the SELECT statement with a JOIN and a WHERE clause:

SELECT Books.Title, Authors.AuthorName, Books.Price
FROM Books
JOIN Authors ON Books.AuthorID = Authors.AuthorID
WHERE Books.Price < 10.00;

Question 4:

Question: How would you find the total number of books in the “Books” table?

Answer: To find the total number of books, you can use the COUNT function with the SELECT statement:

SELECT COUNT(*) AS TotalBooks
FROM Books;

Question 5:

Question: Can you write a SQL query to retrieve the titles of books authored by “Jane Austen”?

Answer: To retrieve the titles of books authored by “Jane Austen,” you can use a JOIN with the “Authors” and “Books” tables and a WHERE clause to filter by the author’s name:

SELECT Books.Title
FROM Books
JOIN Authors ON Books.AuthorID = Authors.AuthorID
WHERE Authors.AuthorName = 'Jane Austen';

These questions and answers demonstrate how the SELECT statement can be used to retrieve specific data from a database and how it can be combined with other SQL clauses for more complex queries.

INSERT: Adding Data

The INSERT statement is your tool for adding new records or rows to a table. It goes hand in hand with the keywords INTO and VALUES. You can specify the columns you’re adding information to, or insert data into all columns. The choice depends on the context and your data. If your table contains only a few fields, you can omit specifying column names in the statement. Here’s how you can add a new record to your table:

INSERT INTO Sales VALUES (1, '2017-10-11');

This line of code registers the first record in the dataset. It will become the first row in your “Sales” table. You can add more records similarly:

INSERT INTO Sales VALUES (2, '2017-10-27');

UPDATE: Modifying Data

UPDATE is the statement you need to modify existing data within your tables. It allows you to make changes based on specified criteria. Let’s consider an example:

UPDATE Sales
SET PurchaseDate = '2017-12-12'
WHERE PurchaseNumber = 1;

In this code, we are updating the purchase date for the record with PurchaseNumber = 1. The entire row is updated, and the date is changed to 2017–12–12. The power of UPDATE lies in its ability to make targeted changes to your data.

DELETE: Removing Data

DELETE is your tool for removing data from your tables. It functions similarly to the TRUNCATE statement from DDL, but with one significant difference. While TRUNCATE deletes all records from a table, DELETE allows you to specify precisely what you want to remove. Consider this example:

DELETE FROM Sales;

This statement would remove all records from the “Sales” table, akin to TRUNCATE. However, when you take advantage of the optional WHERE clause, you can be selective:

DELETE FROM Sales
WHERE PurchaseNumber = 1;

In this case, only the record with PurchaseNumber = 1 is deleted, leaving the other records intact.

Summary: Key DML Statements

To summarize, Data Manipulation Language in SQL primarily consists of the following four elements:

  1. SELECT… FROM…: Retrieve data from tables.
  2. INSERT INTO… VALUES…: Add new records to tables.
  3. UPDATE… SET… WHERE…: Modify existing data based on criteria.
  4. DELETE… FROM…: Remove data from tables, optionally specifying what to delete.

Mastering these DML statements is essential for effective data manipulation in SQL. In our next article, we’ll explore Data Control Language and further enhance our understanding of SQL’s capabilities.

Expert-level interview questions based on the “EmployeeDB” database along with their answers:

A simple database called “EmployeeDB” with two tables: “Employees” and “Departments.” Here’s how you can create and populate this database in SQL:

-- Create the EmployeeDB database
CREATE DATABASE EmployeeDB;
USE EmployeeDB;

-- Create the Employees table
CREATE TABLE Employees (
EmployeeID INT PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
DepartmentID INT
);

-- Create the Departments table
CREATE TABLE Departments (
DepartmentID INT PRIMARY KEY,
DepartmentName VARCHAR(50)
);

-- Insert data into the Departments table
INSERT INTO Departments (DepartmentID, DepartmentName)
VALUES
(1, 'HR'),
(2, 'Sales'),
(3, 'Engineering');

-- Insert data into the Employees table
INSERT INTO Employees (EmployeeID, FirstName, LastName, DepartmentID)
VALUES
(1, 'John', 'Doe', 2),
(2, 'Jane', 'Smith', 1),
(3, 'Bob', 'Johnson', 3),
(4, 'Alice', 'Williams', 2);

-- Display the data in the Employees table
SELECT * FROM Employees;

-- Display the data in the Departments table
SELECT * FROM Departments;

Q1: What is the purpose of the “EmployeeDB” database in this example?

A1: The “EmployeeDB” database is a sample database used to demonstrate and practice Data Manipulation Language (DML) operations in SQL. It contains two tables, “Employees” and “Departments,” and serves as a fictional employee management system.

Q2: How many tables are there in the “EmployeeDB” database?

A2: There are two tables in the “EmployeeDB” database: “Employees” and “Departments.”

Q3: What information is stored in the “Employees” table?

A3: The “Employees” table stores information about employees, including their EmployeeID, FirstName, LastName, and DepartmentID.

Q4: How can you retrieve all the records from the “Employees” table using SQL?

A4: You can retrieve all the records from the “Employees” table using the SQL query: SELECT * FROM Employees;

Q5: What does the “DepartmentID” column in the “Employees” table represent?

A5: The “DepartmentID” column in the “Employees” table represents the department to which an employee belongs. It is a foreign key that links employees to their respective departments in the “Departments” table.

Q6: How would you add a new employee named “Ella Rodriguez” to the “Employees” table using SQL?

A6: You can add a new employee named “Ella Rodriguez” to the “Employees” table with the SQL INSERT statement, like this:

INSERT INTO Employees (EmployeeID, FirstName, LastName, DepartmentID)
VALUES (5, 'Ella', 'Rodriguez', 3);

Q7: What SQL statement would you use to update the last name of an employee with EmployeeID 1 to “Smith”?

A7: You can update the last name of an employee with EmployeeID 1 to “Smith” using the SQL UPDATE statement as follows

UPDATE Employees
SET LastName = 'Smith'
WHERE EmployeeID = 1;

Q8: How can you retrieve a list of departments from the “Departments” table in alphabetical order?

A8: To retrieve a list of departments from the “Departments” table in alphabetical order, you can use the SQL query:

SELECT * FROM Departments
ORDER BY DepartmentName;

Q9: What is the primary key of the “Departments” table?

A9: The primary key of the “Departments” table is the “DepartmentID” column. It uniquely identifies each department.

Q10: How can you delete an employee with EmployeeID 2 from the “Employees” table using SQL?

A10: You can delete an employee with EmployeeID 2 from the “Employees” table with the SQL DELETE statement, like this:

DELETE FROM Employees
WHERE EmployeeID = 2;

Q11: In the “Employees” table, we have a foreign key relationship between the “EmployeeID” and the “DepartmentID.” Explain how you would enforce referential integrity in this relationship, and what actions should be taken when an employee is deleted or a department is updated.

A11: To enforce referential integrity in this relationship, you can set up foreign key constraints. When an employee is deleted, you can either restrict the deletion (no deletion if related records exist), set the DepartmentID to NULL, or cascade the deletion (delete related records). When a department is updated, you can similarly restrict the update, set the DepartmentID to a new value, or cascade the update. The choice depends on the desired business rules and database design.

Q12: Explain the difference between an INNER JOIN, LEFT JOIN, and RIGHT JOIN, and provide an SQL query that retrieves a list of employees and their departments, including employees without assigned departments.

A12: An INNER JOIN returns only the rows with matching values in both tables. A LEFT JOIN returns all rows from the left table and the matched rows from the right table. A RIGHT JOIN does the opposite, returning all rows from the right table and the matched rows from the left table. To retrieve a list of employees and their departments, including those without assigned departments, you can use a LEFT JOIN like this:

SELECT Employees.EmployeeID, FirstName, LastName, Departments.DepartmentName
FROM Employees
LEFT JOIN Departments ON Employees.DepartmentID = Departments.DepartmentID;

Q13: How would you optimize a query that retrieves the department with the most employees in the “Departments” table? Provide both a standard SQL query and a query specific to your preferred database system.

A13: One way to optimize this query in standard SQL is to use the following subquery to find the department with the most employees:

SELECT DepartmentName
FROM Departments
WHERE DepartmentID = (
SELECT DepartmentID
FROM Employees
GROUP BY DepartmentID
ORDER BY COUNT(*) DESC
LIMIT 1
);

Optimizing this query for a specific database system (e.g., PostgreSQL, MySQL) may involve using system-specific functions or features to improve performance.

Q14: Explain the concept of ACID properties in the context of a database transaction. How would you ensure data consistency and integrity in a multi-step SQL transaction when, for example, both an employee’s information and department’s information need to be updated simultaneously?

A14: ACID (Atomicity, Consistency, Isolation, Durability) properties ensure the reliability of database transactions. To ensure data consistency and integrity in a multi-step transaction, you can use transactions and rollbacks. In SQL, you can use the BEGIN TRANSACTION, COMMIT, and ROLLBACK statements. For simultaneous updates to an employee’s information and department’s information, you can enclose the SQL statements within a transaction block. If any step fails, you can issue a ROLLBACK to undo the changes, maintaining data consistency.

Q15: Can you implement a stored procedure that calculates the average salary for each department in the “Employees” table and updates a separate “Departments” table with these averages? Provide the SQL code for creating the stored procedure and an example of how to execute it.

A15: To create a stored procedure for this task, you can use SQL syntax specific to your database system. Here’s a simplified example using PostgreSQL:

CREATE OR REPLACE FUNCTION CalculateAverageSalary() RETURNS VOID AS $$
BEGIN
UPDATE Departments AS d
SET AverageSalary = (
SELECT AVG(Salary)
FROM Employees AS e
WHERE e.DepartmentID = d.DepartmentID
);
END;
$$ LANGUAGE plpgsql;

You can then execute the stored procedure as follows:

SELECT CalculateAverageSalary();

This will update the “Departments” table with the average salary for each department based on the data in the “Employees” table.

These questions and answers provide a basic understanding of the “EmployeeDB” database and the DML operations you can perform on it using SQL.

Thank you! for reading. And please don’t forget to follow and subscribe me

--

--

DataScience Nexus

"DataScience Nexus: Connecting Insights and Innovations" Need a writer ? Contact me!