SQL Questions for Data Jobs

5 Important SQL Concepts for Your Next Data Science Interview

SQL Interview Questions

Shashvat G
The Startup

--

Photo by Free To Use Sounds on Unsplash

SQL is everywhere. It is probably one of the most widely used languages today and also the primary language for retrieving data from databases. It came into existence after E.F. Codd laid the foundation of the modern relational database in his groundbreaking paper A Relational Model of Data for Large Shared Data Banks.

As a beginner, I was intimidated by different SQL( Structured Query Language) Flavors(Oracle SQL, MySQL, PostgreSQL) but I found out that the basic concept remains the same and they only differ in terms of syntax and implementation. However, In this post, we will use Oracle SQL for query and demonstration. Notably, NoSQL databases are popular nowadays as the majority of the data produced today is non-relational. Nevertheless, this does not mean SQL days are over. It’s not going anywhere. Not anytime soon. It is one of the skills most asked for in the Data job space.

Without wasting any more time, Let’s get started.

  1. SQL Conditional Selects — WHERE/HAVING Clause

As the name suggests, it’s about fetching data that satisfies a given condition.

SQL has a WHERE clause that enables us to filter records that do not meet our desired criteria. We also have a HAVING clause to filter records.

Why do we have two separate clauses for filtering? Well, they are different. Let’s understand this using an example.

Can you guess what’s going on in this query:

SELECTempno, ename, AVG(sal)FROMsalaryWHEREdeptno NOT IN (10,20,30)GROUP BY empno, enameHAVING AVG(sal) >= 10000ORDER BY empno DESC;

We are only fetching employee number, employee name, and their average salaries from the Salary table for employees who are not in department 10, 20 or 30, and whose average salary is greater than or equal to 10000 in decreasing order of employee number.

You probably get a clear picture now. SQL WHERE Clause filters out records before any grouping happens, on the other hand, the HAVING clause works on the grouped data and removes records that do not meet the filter criteria.

HAVING cannot be used as a replacement of WHERE clause, it’s only there as a supplement, at times you would not need where clause at all, and only a having clause would suffice. For instance, in the above example, if we had to select all departments, we can simply get rid of where clause and the query would be

SELECTempno, ename, AVG(sal)FROMsalaryGROUP BY empno, enameHAVING AVG(sal) >= 10000ORDER BY empno DESC;

2. Removing Duplicates in SQL

Let’s say, you are working with a Student database and the database has a table called Student which is something like this:

Image By Author

Now, The Student table has a duplicate record for SID 20. How do you delete the duplicate record?

Well, there are different ways to do it. One of them would be to identify and delete duplicates using ROWID.

ROWID is a unique identifier for each row stored in the database. It consists of the data block address in the data file in which the particular row resides. It is probably the fastest way to locate a row and yes, it is faster than the primary key lookup.

We can write a query something like this:

DELETE FROM STUDENT WHERE rowid NOT IN (SELECT MAX(rowid) FROM STUDENT GROUP By SID);

What’s happening here? Firstly, we are fetching the max of rowid split by Student IDs and then only keeping the next highest rowid. Please note that you can also use greater than (>) or less than(<) operator instead of NOT IN depending upon the requirement.

Another way to delete duplicates can be using a count function based on SID. We can simply find records that have a count greater than 1 split by their marks and subject.

DELETE FROM STUDENT WHERE SID IN (SELECT SID FROM STUDENT GROUP BY Subject, Marks HAVING COUNT(*) > 1));

If you just want to see the list of Student id which are duplicate, you can try this one:

SELECT SID
FROM (
SELECT SID,COUNT(SID) As count
FROM STUDENT
GROUP BY SID
) As sid_count
WHERE count > 1

The above query uses inline views which is nothing but a subquery in the FROM clause of a SELECT statement and selecting student ids that occur more than once in the student table.

3. SQL CASE

SQL CASE is essentially the EXCEL equivalent of IF-THEN logic. Let’s understand it with an example -

Suppose you are working for a hotel management company and you have a table Rooms that contain room id, room type, and charges per night. You want to categorize items into three classes — Standard(up to 2000), Deluxe(2000–4000), and Luxury(more than 4000 ) based on charges and return NA if the price is null sorted in descending order of type.

The query can be written as:

SELECT room_type, room_price, 
CASE
WHEN room_price > 0 AND room_price < 2000 THEN ‘Standard’
WHEN room_price >= 2000 AND room_price <= 4000 THEN ‘Deluxe’
WHEN room_price > 4000 THEN ‘Luxury’
ELSE ‘NA’
END room_category
FROM Rooms
ORDER BY room_type DESC ;

Note: If you wish to create a pivot table in SQL, one way to do it is by using a combination of CASE and Aggregate functions like COUNT.

4. SQL Window Functions

SQL window function performs a calculation across a set of table rows that are somehow related to the current row. This is achieved using an aggregate function but it doesn’t return single output aggregated over a set of rows.

You have an employee table with columns — employee and their respective salary. How do you find the Nth highest salary?

SQL Window function can be used to achieve that.

SELECT * 
FROM
(
SELECT ENAME, SAL, dense_rank()
OVER(ORDER BY sal desc) as rank
FROM
Employee
)
WHERE
rank = &N;

You can enter your desired N and it should return the Nth highest salary. Notably, the above query uses dense rank, you can also get the desired result with rank.

Maybe, Try replacing dense rank with rank and find out the difference between them?

You can read more on the Window functions here.

5. SQL Joins

Joins come into the picture when you need to get data from multiple tables and combine them, may be based on a certain condition or some common column. There are several types of joins available — INNER, OUTER(LEFT, RIGHT), SELF JOIN.

For instance, Let’s say you work for an e-commerce company and you have two different tables for the order management- orders table and product table.

Order and Customers tables have below structure:

********************Orders***************| Order_ID |  ORDER_DATE     | CUSTOMER_ID|********************************************|       1 |       2017-04-03 |          10 ||       2 |       2016-01-22 |          20 ||       3 |       2020-06-23 |          30 ||       4 |       2019-01-07 |          40 |***************************************************************Customers****************| CUSTOMER_ID|  CUSTOMER_NAME     | COUNTRY|********************************************|       10 |      DOMINIC B |          UK ||       30|       PRAGYA C  |          IN ||       40|       JACK D    |          FR ||       50|       KRISHNA M |          IT |********************************************

How do you get a list of customer names and their respective order dates?

Joins can help you here.

SELECT cust.CUSTOMER_NAME, ord.ORDER_DATE
FROM
Orders ord
INNER JOIN
Customers cust
ON
ord.CUSTOMER_ID=cust.CUSTOMER_ID;

If you want to practice further, here’s a common question asked in SQL interviews —

How do you find a list of employees and their managers if given an employee table? Make necessary assumptions.

Hint: Use Self Joins

Bonus Section

  • SELECT DISTINCT Statement

The SELECT DISTINCT statement can be useful when you want to return unique values ignoring duplicates.

UNION is a set operator in SQL and is used to combine result sets of two queries given they have the same number of columns, similar datatype, and in the same order.

UNION ALL is essentially the same as UNION except it includes duplicates, unlike UNION.

These functions aggregate over a column and return a single value.

Ex — AVG(), COUNT(), MAX(), MIN() and SUM(). They can also be used to gather basic statistics.

Conclusion

SQL is quite popular and often a common requirement in Data jobs — be it Data scientist or Analyst. In this post, we covered some important SQL concepts that should help you prepare for your next interview.

Clearly, there is more to SQL and hopefully, we will cover more in the next posts. Stay Tuned 😊

Happy to hear your thoughts on this one!

If you found this useful and know anyone you think would benefit from this, please feel free to send it their way.

--

--

Shashvat G
The Startup

Data Scientist | Analyst who aspires to continuously learn and grow in Data Science Space. Find him on LinkedIn https://www.linkedin.com/in/shashvat-gupta/