SQL Concepts Every Data Professionals Should Know For An Interview:

Mohammad Aftab
8 min readDec 28, 2023

--

In order to ace the data interview you will need to master essential SQL concepts, categorized into three main areas. First, it Basic CRUD Operations, which are foundational actions to interact with database tables. They include commands like SELECT for data retrieval, INSERT for adding new records, UPDATE for altering existing data, and DELETE for record removal, with accompanying SQL examples for each.

Next, Joins, a critical component for combining data across tables. You will need to be able to implement various join types such as INNER JOIN, LEFT JOIN, and SELF JOIN, amongst others, each with its unique purpose and utility. Examples are provided to illustrate the joins’ functionalities, especially when connecting tables like orders and customers.

Lastly, you will need to understand Aggregate Functions & Grouping, which allow users to perform calculations and summarize data. This encompasses functions like COUNT, SUM, and AVG, and the concept of grouping results with GROUP BY. Furthermore, additional SQL topics like subqueries, common table expressions (CTEs), and handling null values are also key to having a comprehensive understanding of SQL’s versatility and power.

Basic CRUD Operations:

  1. SELECT (to retrieve data)

Explanation: The SELECT statement is used to fetch data from one or more tables in a database.

SELECT first_name, last_name FROM employees WHERE department_id = 10;

2. INSERT (to add new records)

Explanation: The INSERT statement is used to add new rows of data to a table.

INSERT INTO employees (first_name, last_name, department_id) VALUES (‘John’, ‘Doe’, 10);

3. UPDATE (to modify existing records)

Explanation: The UPDATE statement is used to modify the values in one or more rows of a table based on specific conditions.

UPDATE employees SET department_id = 20 WHERE last_name = ‘Doe’;

4. DELETE (to remove records)

Explanation: The DELETE statement is used to remove one or more rows from a table based on specific conditions.

DELETE FROM employees WHERE last_name = ‘Doe’ AND first_name = ‘John’;

Joins

Assuming we have two tables: orders and customers.

1. INNER JOIN (for rows that match in both tables)

Explanation: Combines rows from both tables where the specified join condition is met.

SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;

2. LEFT (OUTER) JOIN (for all rows from the left table and matching rows from the right)

Explanation: Returns all rows from the left table and the matching rows from the right table; if no match exists, NULL values are returned for right table’s columns.

SELECT orders.order_id, customers.customer_name
FROM orders
LEFT JOIN customers ON orders.customer_id = customers.customer_id;

3. RIGHT (OUTER) JOIN (for all rows from the right table and matching ones from the left)

Explanation: Returns all rows from the right table and the matching rows from the left table; if no match exists, NULL values are returned for left table’s columns.

SELECT orders.order_id, customers.customer_name
FROM orders
RIGHT JOIN customers ON orders.customer_id = customers.customer_id;

4. FULL (OUTER) JOIN (for rows that have matching rows in one of the tables)

Explanation: Combines rows from both tables, returning all rows when there is a match in either table.

SELECT orders.order_id, customers.customer_name
FROM orders
FULL OUTER JOIN customers ON orders.customer_id = customers.customer_id;

5. CROSS JOIN (for Cartesian product between tables)

Explanation: Returns the Cartesian product of the sets of rows from the joined tables.

SELECT orders.order_id, customers.customer_name
FROM orders
CROSS JOIN customers;

6. SELF JOIN

Explanation: A self join joins a table with itself. It is useful to find rows in the same table that share a specific attribute or set of attributes.

(Assuming employees table with columns employee_id, employee_name, and manager_id):

SELECT e1.employee_name AS Employee, e2.employee_name AS Manager
FROM employees e1, employees e2
WHERE e1.manager_id = e2.employee_id;

Note: Always use aliases especially in self joins to differentiate and easily refer to the columns of the table being joined with itself.

Aggregate Functions

Assuming we have a table: sales with columns product_id, product_name, and units_sold.

1. COUNT

Explanation: Returns the number of rows that match the specified criteria.

SELECT COUNT(product_id)
FROM sales;

2. SUM

Explanation: Returns the sum of the values in a numeric column.

SELECT SUM(units_sold)
FROM sales;

3. AVG

Explanation: Calculates the average of the values in a numeric column.

SELECT AVG(units_sold)
FROM sales;

4. MAX

Explanation: Returns the highest value in a numeric column.

SELECT MAX(units_sold)
FROM sales;

5. MIN

Explanation: Returns the lowest value in a numeric column.

SELECT MIN(units_sold)
FROM sales;

6. GROUP BY

Explanation: Groups the rows in the table based on one or more columns and then aggregates data using functions like COUNT, SUM, AVG, etc.

Grouping sales by product_name and getting the total units sold for each product

SELECT product_name, SUM(units_sold) AS total_units_sold
FROM sales
GROUP BY product_name;

Note: When using GROUP BY, the columns that are not part of the aggregation function must be mentioned in the GROUP BY clause.

Subqueries

Nested queries within queries, used in the WHERE, FROM, or SELECT clauses.:

Assuming we have two tables:

- employees with columns `employee_id`, `name`, and `department_id`.

- departments with columns `department_id` and `department_name`.

1. Nested Queries within Queries:

Explanation: These are queries embedded within another query, typically within the WHERE, FROM, or SELECT clauses. They can help filter, structure, or augment data based on the results of the inner query.

SELECT name
FROM employees
WHERE department_id IN (SELECT department_id FROM departments WHERE department_name = 'HR');

This fetches the names of employees working in the HR department.

SELECT dept_info.department_name, count(emp.name) as number_of_employees
FROM (SELECT department_id, department_name FROM departments) AS dept_info
JOIN employees AS emp ON emp.department_id = dept_info.department_id
GROUP BY dept_info.department_name;

This query returns the number of employees in each department.

SELECT name,
(SELECT department_name FROM departments WHERE department_id = employees.department_id) as department_name
FROM employees;

This assigns the department name to each employee.

2. CTEs (Common Table Expressions):

Explanation: A CTE is a named temporary result set that exists only for the duration of the query. It’s often used to simplify complex queries, break down large queries into modular blocks, or to reuse calculations. They are defined using the WITH clause followed by the CTE name.

WITH DepartmentCTE AS (
SELECT department_id, department_name
FROM departments
WHERE department_name = 'HR'
)
SELECT employees.name
FROM employees
JOIN DepartmentCTE ON employees.department_id = DepartmentCTE.department_id;

This uses a CTE to fetch the names of employees in the HR department.

Note: CTEs can be especially useful for recursive queries, though the above example doesn’t showcase recursion.

Set Operations

SQL set operations manipulate and compare rows across multiple queries. They provide methods to combine, intersect, or differentiate results from distinct queries.

UNION: This operation merges the results of two or more queries, excluding duplicates.

  • Example: If table_A and table_B list names, the query SELECT name FROM table_A UNION SELECT name FROM table_B would yield a list of names appearing in either table, without repetition.

UNION ALL: It’s similar to UNION but retains duplicates.

  • Example: Using the same tables, SELECT name FROM table_A UNION ALL SELECT name FROM table_B would produce a list of all names, including those that appear in both tables.

INTERSECT: It fetches rows common to both queries.

  • Example: If you want names that appear in both table_A and table_B, SELECT name FROM table_A INTERSECT SELECT name FROM table_B would provide the shared names.

EXCEPT (or MINUS in some databases): This retrieves rows exclusive to the first query, omitting those found in the second.

  • Example: To find names only in `table_A` and not in `table_B`, you’d use `SELECT name FROM table_A EXCEPT SELECT name FROM table_B`. This would return names unique to `table_A`.

Window Functions

Window functions in SQL allow for complex computations across set frames or “windows” of rows, relative to the current row within a query’s result set. Unlike aggregate functions, which return a single value for each group, window functions return multiple values, often operating within partitions while maintaining the original number of rows in the result.

ROW_NUMBER(), RANK() and DENSERANK():

These functions assign a unique number to each row based on specified ordering criteria, with ROW_NUMBER() giving a distinct number to each row, RANK() potentially skipping numbers in case of ties, and DENSERANK() ensuring no numbers are skipped.

SELECT name, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) as row_num,
RANK() OVER (ORDER BY salary DESC) as rank,
DENSE_RANK() OVER (ORDER BY salary DESC) as dense_rank
FROM employees;

This would list employees, ordering them by salary, and assigning row numbers, ranks, and dense ranks accordingly.

LEAD(): It provides access to a subsequent row’s value.

SELECT name, salary,
LEAD(salary) OVER (ORDER BY salary) as next_salary
FROM employees;

For each employee, this would show their salary and the salary of the employee who earns the next higher amount.

LAG(): It accesses data from a preceding row.

SELECT name, salary,
LAG(salary) OVER (ORDER BY salary) as previous_salary
FROM employees;

Similar to the previous example, but now each entry shows the prior employee’s salary.

NTILE(n): It divides the result set into “n” roughly equal parts.

SELECT name, salary,
NTILE(4) OVER (ORDER BY salary DESC) as quartile
FROM employees;

This would split the employees into 4 groups or quartiles based on their salary.

PARTITION BY: Used within window functions, it divides the result set into partitions to perform calculations within.

SELECT department, name, salary,
AVG(salary) OVER (PARTITION BY department) as avg_dept_salary
FROM employees;

This provides the average salary for each department next to every employee’s details.

ORDER BY: Within window functions, it determines the sequence of rows for computations.

SELECT name, date_joined,
ROW_NUMBER() OVER (ORDER BY date_joined) as join_sequence
FROM employees;

This would number employees based on the sequence in which they joined, with the earliest joiner as number 1.

Data Types and Type Conversion

SQL data types specify the kind of data that can be stored in a particular column of a database table, ensuring consistent data storage and preventing unintended data types from being saved. These types also influence the operations and functions that can be applied to the data in the column.

VARCHAR: A variable-length character data type that can store alphanumeric values. Example Function — CONCAT()

SELECT CONCAT(first_name, ' ', last_name) as full_name FROM users;

This function combines the `first_name` and `last_name` columns to create a `full_name` for each user.

INT: An integer data type for storing whole numbers. Example Function — ABS():

SELECT ABS(negative_value) as positive_value FROM numbers;

The ABS() function returns the absolute value of an integer, turning negative numbers into positive ones.

DATE/TIME: A data type for storing date and/or time values.

INSERT INTO logs (message, logged_time) VALUES ('User logged in', NOW());

The NOW() function returns the current date and time, useful for timestamping events or changes.

Handling Nulls

  1. NULL in SQL:

In SQL, NULL is a special marker used to indicate that a data value does not exist in the database. It’s important to understand that NULL is not equivalent to an empty string, zero, or any other default value. Instead, it represents the absence of a value. Hence, comparisons involving NULL often require special functions or conditions, as NULL is not equal to anything, including itself.

2. COALESCE:

The COALESCE function returns the first non-NULL value in a list of expressions.

If we want to return a default value when a column value is NULL, COALESCE can be useful.

SELECT COALESCE(username, 'Guest') as user_name FROM users;

In the above query, if username is NULL, the output will be Guest.

3. NULLIF:

The NULLIF function compares two expressions and returns NULL if they are equal, otherwise it returns the first expression.It can be useful in situations like avoiding division by zero.

SELECT dividend / NULLIF(divisor, 0) as result FROM numbers;

In this query, if the divisor is 0, the NULLIF function will return NULL, thus preventing a division by zero error. If the divisor is non-zero, the division proceeds as normal.

Understanding how NULL behaves and using functions like COALESCE and NULLIF are essential for handling missing or undefined data in SQL, allowing developers to maintain data integrity and avoid potential runtime errors.

Source: Google

--

--

Mohammad Aftab

AWS Data Engineer | Photographer 📸| Data Enthusiast Writes about Data Engineering | Data Science | AWS | Snowflake ❄ | Pyspark | Python | Airflow | Power BI |