In the realm of SQL (Structured Query Language), accessing and manipulating data from multiple tables is a common task. Two primary methods for achieving this are Joins and Subqueries. While they serve similar purposes, they differ in their approach and usage. Let’s delve into the details of Joins and Subqueries, understand their characteristics, and compare their advantages and disadvantages.
What are Joins?
Joins are powerful constructs in SQL used to combine records from two or more tables. These tables are linked through common fields, and a join operation fetches data based on specified conditions. Joins can be of various types, including INNER JOIN, OUTER JOIN, LEFT(OUTER) JOIN, RIGHT(OUTER) JOIN and CROSS JOIN. Here’s a brief overview:
- INNER JOIN: Returns only the rows that have matching values in both tables.
- OUTER JOIN: Returns all rows from both tables, including unmatched rows.
- LEFT(OUTER) JOIN: Returns all records from the left table, and the matched records from the right table
- RIGHT(OUTER) JOIN: Returns all records from the right table, and the matched records from the left table
- CROSS JOIN: Produces a Cartesian product of the two tables, resulting in a combination of all rows.
This diagram illustrates the different types of SQL joins, showing how they determine the way records from two tables (TABLE1 and TABLE2) are combined based on their common attributes: Inner Join retrieves only the matching records in both tables, Left Join includes all records from TABLE1 and the matched records from TABLE2, Right Join does the opposite, and Full Outer Join selects all records with a match in either table.
Joins are specified in the FROM clause of a query and can significantly enhance the efficiency of data retrieval. They are essential for fetching data from multiple tables based on relationships.
What is a Subquery?
A Subquery, also known as an Inner query or Nested query, is a query nested within another SQL query. It is enclosed within the WHERE clause and serves various purposes, such as filtering rows, calculating values, or retrieving data conditionally. Subqueries can be used in conjunction with SELECT, INSERT, UPDATE, DELETE, or CREATE VIEW statements.
Why JOINS are preferred over Subqueries
The journey of an SQL query starts with parsing and tokenization, where the query is broken down into individual elements such as keywords (e.g., SELECT, FROM, WHERE) and operators (e.g., =, >, <). Following this, the database management system (DBMS) analyzes the query to devise optimal execution plans for data retrieval. With the execution plan in place, the DBMS begins the process of data retrieval. If the query involves multiple tables, the DBMS performs join operations to combine the relevant data. Subsequently, filtering conditions specified in the WHERE clause are applied to assess each row’s eligibility based on user-defined criteria. Additionally, common aggregation functions like SUM, MIN, MAX, AVG are utilized to perform calculations on grouped data. Upon completion of these operations, the DBMS generates the final result set, culminating the query execution process.
This entire process discusses the execution of a single query. Join operations are seamlessly integrated within the query execution steps. However, when using a subquery, the process involves executing the entire inner query first. Afterward, the table generated from this query is utilized in the WHERE clause to execute the outer query. This fundamental difference underscores why joins are favored over subqueries, particularly when dealing with large volumes of data. Sometimes, though, subqueries are preferred over joins when dealing with smaller datasets or when the complexity of the data requires it, mainly because they offer easier readability.
Example: Let’s consider a scenario where we want to find the names of customers who have a loan of a certain type say “Personal”. We can write this query using both a join and a subquery and then compare the execution times.
USING JOINS
USING SUBQUERIES
Here we observe that the execution time of a subquery is nearly double that of performing joins for a dataset of 11,214 rows. Extrapolating this to the scale of real-time data generation, which can reach into the trillions, amplifies the potential impact, suggesting that the performance gap between subqueries and joins could be even more pronounced.
References:
- Medium Article: How SQL Query works? Execution Order Explained with Example
- W3School: SQL Joins
- DATABASE SYSTEMS-The Complete Book: Chapter 2
- https://github.com/Sanjukta2212/bankdb-analysis-and-mgmt/tree/main