Advanced SQL Techniques: Subqueries, Joins, and Aggregate Functions

Kunu Wako
Published in Learning SQL
9 min read · Dec 27, 2022

Introduction

As a data student or professional, you know that SQL is a crucial tool for querying and manipulating data in relational databases. In this tutorial, we will delve into some of the more advanced features of SQL, including subqueries, joins, and aggregate functions. These techniques will allow you to perform complex operations on your data and extract insights that would be difficult or impossible using basic SQL commands.

We will start by covering subqueries, which are nested queries that allow you to perform operations on data within a SELECT, INSERT, UPDATE, or DELETE statement. Subqueries are a powerful way to filter, aggregate, and update data, and they are an essential tool for any advanced SQL user.

Next, we will look at the various types of joins available in SQL. Joins allow you to combine data from multiple tables, and they are a crucial technique for working with large and complex datasets. We will explore inner, outer, cross, and self-joins.

Finally, we will cover aggregate functions, which are used to perform calculations on sets of data. Aggregate functions are an essential tool for analyzing and summarizing data, and we will look at common functions such as SUM, AVG, MAX, and MIN. These functions allow you to extract insights from your data by performing calculations such as totals, averages, and maximum and minimum values.

By the end of this tutorial, you should have a solid understanding of subqueries, joins, and aggregate functions, and you should be able to apply these techniques to your own data analysis tasks. Whether you are a data analyst, data scientist, or database administrator, these advanced SQL techniques will help you unlock the full potential of your data.

So let’s get started! In the next section, we will dive into the world of subqueries and explore how to use them to perform complex operations on your data.

Subqueries

A subquery is a SELECT statement that is nested within another SELECT, INSERT, UPDATE or DELETE statement. Subqueries are used to perform operations on data within the context of the outer query. They allow you to perform complex operations that would be difficult or impossible using basic SQL commands.

There are three types of subqueries: scalar, row, and table.

A scalar subquery returns a single value: one row with one column. For example, the following query returns the maximum salary in the employees table, and it becomes a scalar subquery once it is nested inside another statement:

SELECT MAX(salary) FROM employees;

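A row subquery returns a single row, which may hold several columns; when that row has just one column, it is also a scalar subquery. For example, the subquery below returns one row containing the maximum salary, and the outer query uses it to find the employee (or employees, in case of a tie) with the highest salary: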

SELECT * FROM employees WHERE salary = (SELECT MAX(salary) FROM employees);

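A table subquery returns a table of data: any number of rows and columns. It usually appears after IN or EXISTS, or in a FROM clause. The query below finds all employees who earn more than the average salary; its inner query is still scalar (one average value), and it is the outer query that produces a multi-row result: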

SELECT * FROM employees WHERE salary > (SELECT AVG(salary) FROM employees);
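To make the three shapes concrete, here is a minimal sketch that runs one subquery of each kind against an in-memory SQLite database through Python's sqlite3 module. The sample rows and the id, name, and department_id columns are assumptions made for illustration, and the row-value comparison used for the row subquery is not supported by every database.

```python
import sqlite3

# Hypothetical sample data; only the employees table and salary column come from the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER, salary REAL)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Ada", 10, 90000), (2, "Ben", 10, 60000), (3, "Cy", 20, 60000), (4, "Di", 10, 60000)],
)

# Scalar subquery: one row, one column (the maximum salary).
top_earners = conn.execute(
    "SELECT name FROM employees WHERE salary = (SELECT MAX(salary) FROM employees)"
).fetchall()

# Row subquery: one row, two columns, compared as a row value.
same_as_ben = conn.execute(
    "SELECT name FROM employees "
    "WHERE (department_id, salary) = "
    "(SELECT department_id, salary FROM employees WHERE id = 2) "
    "ORDER BY name"
).fetchall()

# Table subquery: several rows and columns, used as a derived table in FROM.
dept_averages = conn.execute(
    "SELECT department_id, avg_salary FROM "
    "(SELECT department_id, AVG(salary) AS avg_salary "
    " FROM employees GROUP BY department_id) AS d "
    "ORDER BY department_id"
).fetchall()

print(top_earners, same_as_ben, dept_averages)
```

The row subquery matches Ben himself as well as Di, because both share department 10 and a salary of 60000.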

Subqueries can be used in a variety of contexts, including WHERE clauses, SELECT clauses, FROM clauses, and HAVING clauses. For example, you might use a subquery in a WHERE clause to filter records based on the results of another query:

SELECT * FROM orders WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA');

This query returns all orders placed by customers in the USA. The subquery in the WHERE clause returns a list of customer IDs for customers in the USA, and the outer query uses this list to filter orders placed by those customers.
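The two steps described above can be sketched in Python with the sqlite3 module: first the inner query on its own, then the full statement. The sample rows are hypothetical, and the tables contain only the columns the query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the customers and orders tables.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 3), (103, 1);
""")

# Step 1: what the subquery alone returns (the list of USA customer IDs).
usa_customer_ids = [
    row[0]
    for row in conn.execute("SELECT id FROM customers WHERE country = 'USA' ORDER BY id")
]

# Step 2: the outer query keeps only the orders whose customer_id is in that list.
usa_orders = conn.execute(
    "SELECT id, customer_id FROM orders "
    "WHERE customer_id IN (SELECT id FROM customers WHERE country = 'USA') "
    "ORDER BY id"
).fetchall()

print(usa_customer_ids, usa_orders)
```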

You can also use subqueries to update or insert data. For example, you might use a subquery in an INSERT statement to insert records based on the results of another query:

INSERT INTO sales_by_month SELECT month, SUM(sales) FROM orders GROUP BY month;

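This query inserts a row into the sales_by_month table for each month in the orders table, containing the month and the total sales for that month. Note that INSERT INTO ... SELECT does not create the target table: sales_by_month must already exist, with columns that match the SELECT list.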
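As a rough check of how INSERT INTO ... SELECT behaves, the sketch below runs the statement in SQLite twice: once before the target table exists and once after creating it. The sample rows and the total_sales column name are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data for the orders table.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, month TEXT, sales REAL);
    INSERT INTO orders (month, sales)
    VALUES ('January', 100), ('January', 50), ('February', 70);
""")

insert_sql = "INSERT INTO sales_by_month SELECT month, SUM(sales) FROM orders GROUP BY month"

# The statement fails while sales_by_month does not exist.
try:
    conn.execute(insert_sql)
    insert_failed = False
except sqlite3.OperationalError:
    insert_failed = True

# Create the target table first (total_sales is a hypothetical column name), then insert.
conn.execute("CREATE TABLE sales_by_month (month TEXT, total_sales REAL)")
conn.execute(insert_sql)

monthly_totals = conn.execute(
    "SELECT month, total_sales FROM sales_by_month ORDER BY month"
).fetchall()
print(insert_failed, monthly_totals)
```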

Subqueries are a powerful tool for performing complex operations on your data, and they are an essential part of the advanced SQL toolkit. However, it’s important to use them wisely, as they can be slow and resource-intensive if not used correctly. Here are a few tips for optimizing the performance of subqueries:

  • Do not assume that a subquery in a WHERE clause is slow. Many query optimizers rewrite simple IN and EXISTS subqueries as joins internally, so check the execution plan before rewriting a query by hand.
  • Prefer NOT EXISTS to NOT IN when the subquery can return NULL: NOT IN matches no rows at all if the list contains a NULL. IN and EXISTS usually perform similarly, and which one is faster depends on the database and the data.
  • Watch out for correlated subqueries. A correlated subquery references a column from the outer query, so it is logically evaluated once for each row of the outer query, and it can be slow on large tables if the optimizer cannot rewrite it as a join.
  • Use EXPLAIN or EXPLAIN ANALYZE (the exact command varies by database) to understand the execution plan of your query and identify any performance bottlenecks.
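Performance aside, NOT IN and NOT EXISTS can return different results when the subquery yields a NULL. The sketch below illustrates this with hypothetical sample data, and also asks SQLite for a plan. It assumes SQLite, where the command is EXPLAIN QUERY PLAN rather than EXPLAIN ANALYZE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; order 101 has no customer (NULL).
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'USA'), (2, 'Canada'), (3, 'USA');
    INSERT INTO orders VALUES (100, 1), (101, NULL);
""")

# Customers without orders, written with NOT IN: the NULL in the list
# makes the comparison unknown for every non-matching customer.
not_in_rows = conn.execute(
    "SELECT id FROM customers "
    "WHERE id NOT IN (SELECT customer_id FROM orders) ORDER BY id"
).fetchall()

# The same question with a correlated NOT EXISTS is not affected by the NULL.
not_exists_sql = (
    "SELECT id FROM customers c "
    "WHERE NOT EXISTS (SELECT 1 FROM orders o WHERE o.customer_id = c.id) "
    "ORDER BY id"
)
not_exists_rows = conn.execute(not_exists_sql).fetchall()

# SQLite's plan output; other databases use EXPLAIN or EXPLAIN ANALYZE.
plan = conn.execute("EXPLAIN QUERY PLAN " + not_exists_sql).fetchall()

print(not_in_rows, not_exists_rows)
for step in plan:
    print(step)
```

Here the NOT IN query returns no rows, while the NOT EXISTS query returns customers 2 and 3.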

These tips will help you keep your subqueries fast and efficient and avoid slowing down your database. In the next section, we will continue our exploration of advanced SQL techniques by looking at the various types of joins available in SQL.

Joins

Joins are a crucial technique for combining data from multiple tables in a database. They allow you to retrieve data from multiple tables in a single query, making it easy to work with large and complex datasets.

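This tutorial covers four kinds of joins: inner, outer, cross, and self. The first three are distinct join types, while a self-join is any of them applied to a table joined with itself.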

An inner join combines rows from two tables based on a matching condition, and it is the most commonly used type of join. It returns only the rows that satisfy the join condition and discards the rest, so a row with no match in the other table does not appear in the result. For example, you might use an inner join to list every employee who belongs to a department, together with that department:

SELECT employees.*, departments.* FROM employees INNER JOIN departments ON employees.department_id = departments.id;

This query returns one row for each employee who has a matching department, with the columns from both tables side by side. Employees without a matching department, and departments without employees, are left out of the result.
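The following sketch runs an inner join in SQLite with hypothetical sample data: one employee has no department and one department has no employees, so you can see which rows the join keeps. It selects only the two name columns rather than every column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Cy has no department, Legal has no employees.
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Engineering'), (20, 'Sales'), (30, 'Legal');
    INSERT INTO employees VALUES (1, 'Ada', 10), (2, 'Ben', 20), (3, 'Cy', NULL);
""")

# Only pairs that satisfy the ON condition are returned.
inner_rows = conn.execute(
    "SELECT employees.name, departments.name "
    "FROM employees INNER JOIN departments "
    "ON employees.department_id = departments.id "
    "ORDER BY employees.id"
).fetchall()
print(inner_rows)
```

Neither Cy nor Legal appears in the output, and no NULL values are produced.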

An outer join is similar to an inner join, but it also keeps rows that have no matching row in the other table, filling the missing columns with NULL. There are three types of outer joins: left, right, and full.

A left outer join includes all rows from the left table (the first table in the FROM clause) and only the matching rows from the right table. For example, you might use a left outer join to find all employees and their departments, even if some employees do not belong to a department:

SELECT employees.*, departments.* FROM employees LEFT OUTER JOIN departments ON employees.department_id = departments.id;

A right outer join is similar to a left outer join, but it includes all rows from the right table (the second table in the FROM clause) and only the matching rows from the left table.

A full outer join includes all rows from both tables, even if there is no matching row in the other table.
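To compare the three outer joins side by side, the sketch below uses SQLite with hypothetical sample data. Because older SQLite builds lack RIGHT and FULL OUTER JOIN, it is written under the assumption that only LEFT JOIN is available: the right join is expressed as a left join with the tables swapped, and the full join is emulated as the UNION of the two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: Cy has no department, Legal has no employees.
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Engineering'), (20, 'Sales'), (30, 'Legal');
    INSERT INTO employees VALUES (1, 'Ada', 10), (2, 'Ben', 20), (3, 'Cy', NULL);
""")

# Left outer join: every employee, with NULL (None) where no department matches.
left_sql = (
    "SELECT e.name, d.name FROM employees e "
    "LEFT OUTER JOIN departments d ON e.department_id = d.id"
)
left_rows = conn.execute(left_sql + " ORDER BY e.id").fetchall()

# Equivalent of a right outer join: swap the tables and use a left join.
right_sql = (
    "SELECT e.name, d.name FROM departments d "
    "LEFT OUTER JOIN employees e ON e.department_id = d.id"
)
right_rows = conn.execute(right_sql + " ORDER BY d.id").fetchall()

# Emulated full outer join: the union of both results (UNION removes the overlap).
full_rows = conn.execute(left_sql + " UNION " + right_sql).fetchall()

print(left_rows, right_rows, sorted(full_rows, key=str))
```

Where the database supports it, FULL OUTER JOIN expresses the last query directly.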

A cross join combines every row from the first table with every row from the second table, creating a Cartesian product. Cross joins are not commonly used, as the result grows quickly: joining a table of m rows with a table of n rows produces m × n rows.

Here is an example of a cross join:

SELECT * FROM employees CROSS JOIN departments;

This query returns a row for every combination of employee and department, even if there is no relationship between the two.
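A quick way to see the size of a cross join is to count its rows. The sketch below uses SQLite with hypothetical sample data: three employees and three departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: 3 employees and 3 departments.
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (10, 'Engineering'), (20, 'Sales'), (30, 'Legal');
    INSERT INTO employees VALUES (1, 'Ada', 10), (2, 'Ben', 20), (3, 'Cy', NULL);
""")

# No ON condition: every employee is paired with every department.
pairs = conn.execute(
    "SELECT employees.name, departments.name FROM employees CROSS JOIN departments"
).fetchall()

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
department_count = conn.execute("SELECT COUNT(*) FROM departments").fetchone()[0]
print(len(pairs), employee_count * department_count)
```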

A self-join is a type of join that combines rows from a table with other rows from the same table. It is useful for comparing rows within a table or for creating hierarchical data structures.

Here is an example of a self-join:

SELECT e1.name AS employee, e2.name AS manager FROM employees e1 INNER JOIN employees e2 ON e1.manager_id = e2.id;

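This query returns a row for each employee who has a manager, pairing the employee’s name with the manager’s name. Employees whose manager_id is NULL, such as the person at the top of the hierarchy, are left out because this is an inner join.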
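The sketch below runs the self-join above in SQLite on a small, hypothetical hierarchy, then repeats it as a left join to show how an employee without a manager could be kept in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hierarchy: Ada has no manager, Ben and Cy report to Ada, Di reports to Ben.
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ada', NULL), (2, 'Ben', 1), (3, 'Cy', 1), (4, 'Di', 2);
""")

# The same table appears twice under two aliases: e1 (employee) and e2 (manager).
with_manager = conn.execute(
    "SELECT e1.name AS employee, e2.name AS manager "
    "FROM employees e1 INNER JOIN employees e2 ON e1.manager_id = e2.id "
    "ORDER BY e1.id"
).fetchall()

# A left join keeps employees without a manager, with None in the manager column.
everyone = conn.execute(
    "SELECT e1.name AS employee, e2.name AS manager "
    "FROM employees e1 LEFT JOIN employees e2 ON e1.manager_id = e2.id "
    "ORDER BY e1.id"
).fetchall()

print(with_manager, everyone)
```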

Outer joins are useful when you want to keep rows that have no matching row in the other table. Left outer joins are the most commonly used type of outer join, as they include all rows from the left table and only the matching rows from the right table. Right outer joins are less common because any right join can be written as a left join with the tables swapped, and full outer joins are needed less often and can produce many NULL values.

In summary, joins let you retrieve data from multiple tables in a single query, which makes it easier to work with large and complex datasets. Inner, outer, cross, and self-joins each have their own use cases and trade-offs, and understanding them helps you choose the right join for each question you ask of your data.

In the next section, we will continue our exploration of advanced SQL techniques by looking at aggregate functions, which are used to perform calculations on sets of data.

Aggregate Functions

Aggregate functions are used to perform calculations on sets of data. They allow you to extract insights from your data by performing calculations such as totals, averages, and maximum and minimum values. There are several common aggregate functions in SQL, including SUM, AVG, MAX, and MIN.

The SUM function returns the sum of a set of values. For example, you might use SUM to find the total sales for a particular month:

SELECT SUM(sales) FROM orders WHERE month = 'January';

The AVG function returns the average of a set of values. For example, you might use AVG to find the average salary of all employees:

SELECT AVG(salary) FROM employees;

The MAX function returns the maximum value in a set of values. For example, you might use MAX to find the highest salary of all employees:

SELECT MAX(salary) FROM employees;

The MIN function returns the minimum value in a set of values. For example, you might use MIN to find the lowest salary of all employees:

SELECT MIN(salary) FROM employees;

These are just a few of the aggregate functions available in SQL. Other common ones include COUNT and, in many databases, STDDEV and VARIANCE. Keep in mind that aggregate functions ignore NULL values; only COUNT(*) counts every row.
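As an illustration, the sketch below computes the four aggregates from this section, plus two forms of COUNT, in a single query against SQLite. The sample rows are hypothetical, and one salary is deliberately NULL to show how missing values are treated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data; Cy's salary is unknown (NULL).
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        (1, 'Ada', 90000), (2, 'Ben', 60000), (3, 'Cy', NULL), (4, 'Di', 30000);
""")

# Several aggregates can be computed in one pass over the table.
total, average, highest, lowest, row_count, salary_count = conn.execute(
    "SELECT SUM(salary), AVG(salary), MAX(salary), MIN(salary), "
    "COUNT(*), COUNT(salary) FROM employees"
).fetchone()

print(total, average, highest, lowest, row_count, salary_count)
```

The average is taken over the three known salaries, not over all four rows, and COUNT(salary) is one less than COUNT(*).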

You can use aggregate functions in combination with GROUP BY clauses to perform calculations on subsets of data. For example, you might use a GROUP BY clause to calculate the total sales for each month:

SELECT month, SUM(sales) FROM orders GROUP BY month;

This query returns a row for each month, showing the total sales for that month.
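Here is a minimal sketch of the grouped query, run in SQLite with hypothetical sample rows. It adds COUNT(*) to show that each group gets its own set of aggregate values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: two January orders and one February order.
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, month TEXT, sales REAL);
    INSERT INTO orders (month, sales)
    VALUES ('January', 100), ('January', 50), ('February', 70);
""")

# One output row per distinct month; the aggregates are computed within each group.
sales_per_month = conn.execute(
    "SELECT month, SUM(sales), COUNT(*) FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(sales_per_month)
```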

You can also use aggregate functions in combination with HAVING clauses to filter groups based on the results of an aggregate function. For example, you might use a HAVING clause to find all departments with an average salary greater than $50,000:

SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;

This query returns a row for each department whose average salary is greater than $50,000, showing the average salary for that department. The HAVING clause filters out any department with an average salary of $50,000 or less.
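The sketch below runs the HAVING query in SQLite on hypothetical sample data. One department averages exactly 50000, which makes the boundary of the > comparison visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical sample data: averages are 70000 (Engineering), 45000 (Sales), 50000 (Legal).
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, department TEXT, salary INTEGER);
    INSERT INTO employees (department, salary) VALUES
        ('Engineering', 80000), ('Engineering', 60000),
        ('Sales', 40000), ('Sales', 50000),
        ('Legal', 50000), ('Legal', 50000);
""")

# HAVING is applied after grouping, so it can test the aggregate value of each group.
well_paid_departments = conn.execute(
    "SELECT department, AVG(salary) FROM employees "
    "GROUP BY department HAVING AVG(salary) > 50000 "
    "ORDER BY department"
).fetchall()
print(well_paid_departments)
```

Only Engineering is returned; Legal, at exactly 50000, does not pass the > 50000 test.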

In summary, aggregate functions are essential for analyzing and summarizing data in SQL. They allow you to extract insights from your data by performing calculations such as totals, averages, and maximum and minimum values.

Conclusion

In this tutorial, we explored some of the more advanced features of SQL, including subqueries, joins, and aggregate functions. These techniques allow you to perform complex operations on your data and extract insights that would be difficult or impossible using basic SQL commands.

We started by looking at subqueries, which are nested queries that allow you to perform operations on data within a SELECT, INSERT, UPDATE, or DELETE statement. We covered the three types of subqueries: scalar, row, and table, and we discussed how to use them in a variety of contexts.

Next, we examined the different types of joins available in SQL: inner, outer, cross, and self. We explored how to use these joins to combine data from multiple tables, and we discussed the pros and cons of each type of join.

Finally, we covered aggregate functions, which are used to perform calculations on sets of data. We looked at common aggregate functions such as SUM, AVG, MAX, and MIN, and we discussed how to use them in combination with GROUP BY and HAVING clauses to analyze and summarize data.

With these techniques in your toolkit, you are ready to apply them to your own data analysis tasks.
