Nerd For Tech
Published in

Nerd For Tech

All Important Window Functions in SQL for Interview

When I started my interview preparation a few months back, I realized mastering analytical functions was extremely important to get hired in any analytical firm.

I started looking for resources on the internet and realized there was hardly any organized data resource available. So I decided to write these functions down here just in case anyone is preparing for an interview.

Introduction

I have a basic expectation that you are experienced in writing SQL queries. Before Moving to the functions let’s first talk about some basic concepts.

I assume you know the order of execution for SQL commands. In case you don’t let’s discuss it once. The query execution starts from “FROM” clause and ends in the order as shown below.

  1. FROM clause
  2. ON clause
  3. OUTER clause
  4. WHERE clause
  5. GROUP BY clause
  6. HAVING clause
  7. SELECT clause (Window Functions are executed with the select clause)
  8. DISTINCT clause
  9. ORDER BY clause
  10. TOP clause

The syntax is self-explanatory since the window function is written within a select clause hence is executed along with the select statement. An important point to remember is that you cannot write a where clause on the column created by the windows functions because where is executed before select. For example, in the below example we cannot write where condition on rnk column since where will be executed before rnk column is created.

select empno, ename, job, hiredate, sal,
rank()
over( order by hiredate
)rnk

from emp
order by sal;

Here rnk is just another column name. the calculations will be made by the function and results will be displayed in the column.

The standard syntax of a windows function is

window_function (<column_name>) OVER (
[ PARTITION BY expr_list ]
[ ORDER BY order_list ]
[ windows clause )

You will get used to this later. Just take a look at it and move on.

Ranking Functions

There are multiple functions that can be used for ranking the rows. When the value is the same, Rank() function will display the same rank value but the next value after that will be equal to the row number for example 1,1,3, and so on if the first two values are the same.

Whereas in dense_rank() we values are continuous. For example, 1,1,2 and so on if the first two values are the same

In the row_number() function the values are according to the row value 1,2,3 and so on even if the first two values are the same.

Rank Function

Table structure

Question Statement: Calculate the rank according to the hire date of the employee and order by the salary of the employee

select empno, ename, job, hiredate, sal,
rank()
over( order by hiredate
)rnk

from emp
order by sal;

Dense_Rank

Question Statement: Calculate the rank according to the salary of the employee and order by the salary of the employee in ascending order

select empno, ename, job, hiredate, sal,
rank()over(order by sal) as rnk,
dense_rank()
over( order by sal
)as sal_dense_rnk

from emp
order by sal
;

Row_Number

Question Statement: Calculate the rank according to the salary of the employee and order by the salary of the employee in ascending order. Use Empno when sal is the same.

select empno, ename, job, hiredate, sal,
row_number() over(
order by sal, empno) as row_num,
rank()over(order by sal) as rnk,
dense_rank()
over( order by sal
)as sal_dense_rnk

from emp
order by sal
;

Percent_Rank, Cume_Dist, NTILE

Table Structure

Problem Statement: Show me distributions of Hollywood movie box office sales using PERCENT_RANK, CUME_DIST, and NTILE (quartile, decile, tertile, etc)

select name, gross_sales,
percent_rank()
over(
order by gross_sales
) as pct,
cume_dist()
over(
order by gross_sales
) as cum_dist,
ntile(4)
over(
order by gross_sales) as ntil
from movies;

Dealing With Nulls

Table structure

Table 1:Moons

Table 2: Planets

Problem Statement: Rank the planets (mars, venus, pluto, mercury, neptune)by their radius in Desc order. Nulls should be displayed last.

select p.name as planet,m.name as moon,  m.radius,
row_number()
over(
order by m.radius desc
) as rnk
from
planets p, moons m
where p.name =m.planet_name(+)
and p.name in ('Mars','Venus','Pluto', 'Mercury', 'Neptune')
select p.name as planet,m.name as moon,  m.radius,
row_number()
over(
order by m.radius desc nulls last
) as rnk
from
planets p, moons m
where p.name =m.planet_name(+)
and p.name in ('Mars','Venus','Pluto', 'Mercury', 'Neptune');

Partition Clause

Partition clause is similar to group by however while both clauses aggregates data, Partition by would not reduce the number of rows while group by rolls up the data. partition by simply creates a window on which we can perform our calculations. Let’s understand this through examples.

Problem statement: i need departmental salaries running total by employee name

Table Structure:

select deptno,empno,ename, job,sal,
sum(sal)
over(
partition by deptno
order by ename) as total
from emp
order by deptno, ename;

Windowing aggregation

We will start using windows clauses now. We can perform mathematical operations on the range of rows.

Problem statement: i need a running total of total ocean coverage across the world, largest to smallest

Table Structure:

select name, type, square_km,        
sum(square_km) over ( order by square_km desc
rows between unbounded preceding and current row ) as tot from water;

Problem statement: I need a running total of total ocean coverage for water type across the world, largest to smallest

select name, type, square_km,
sum(square_km) over (
partition by type
order by square_km desc
rows between 1 preceding and 1 following
) as tot
from water
order by square_km desc
;

Problem statement: 6 months moving average of tweets per day.

select dte, tweet_millions,
avg(tweet_millions)
over(
order by dte
range between interval '6' month preceding and current row) as average
from tweets;

Problem Statement: Show me life expectancy throughout history in the UK compared to 5 years on either side

select recorded_year, age,
first_value(age)
over(
order by recorded_year
range between interval '5' year preceding and interval '5' year following
)as fv,
last_value(age)
over(
order by recorded_year
range between interval '5' year preceding and interval '5' year following
) as lv
from life_expectancy;

Problem Statement: compare daily uber trips in new york city with 2nd lowest/Highest from July to September

Table Structure:

select trip_Date, trip_count,nth_value(trip_count, 2)
over(
order by trip_count
rows between unbounded preceding and unbounded following
)as secnd_lowestfrom trips
order by trip_date

Dealing with Nulls

Problem statement: Fill in the blanks for the null values in DEPTNPO table and count the employees according to departments.

select seq, empno, ename, job, mgr, hiredate, sal, deptno,
last_value(deptno ignore nulls)
over(order by seq) as dept
from external_emp
select dept, count(dept)from 
(select seq, empno, ename, job, mgr, hiredate, sal, deptno,
last_value(deptno ignore nulls)
over(order by seq) as dept
from external_emp) agroup by dept
order by dept

Lag and Lead

This is one of the most important windows functions. In my recent SQL interview, I was asked 3 questions related to lead and lag. The lag function will just provide the data of the one row above.

LAG(expression [,offset[,default_value]]) OVER(ORDER BY columns)
SELECT seller_name, sale_value,LAG(sale_value) OVER(ORDER BY sale_value) as previous_sale_valueFROM sale;

The lead function will do the opposite. It will provide me the values of 1 row below. Let’s do a complex problem now.

Problem statement: Only show the dates when the status of the order changed for that particular order. remove the remaining data.

Table structure

select order_id, status_date,lead_status,status,
lag(status_date)
over(partition by order_id order by status_date ) from_date,
status_date to_date
from
(select
order_id,
status_date,
status,
lag(status) over (partition by order_id order by status_date) lag_status,
lead(status) over (partition by order_id order by status_date) lead_status
from ORDERS)a

where lag_status is null
or lead_status is null
or lead_status <> status
order by order_id, from_date nulls first

Conclusion

These functions will make your life easier if you can use them wisely. I have been asked about window functions in all of my interviews so make sure to practice them. Let me drop the youtube channel name that greatly helped in mastering these functions. You can click here to watch the videos

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Anmol Hans

Anmol Hans

Geopolitics and Data Science enthusiast.