The Northwind database- Part 2
CASE WHEN and GROUP BY
I`m going to work with Microsoft’s sample database for SQL Server named Northwind. It describes a fictional store and its customers, suppliers, and orders. You can find the original database in Microsoft’s GitHub repository. The database has been slightly modified.
Classifying Data with CASE WHEN and GROUP BY
To show the availability level for each product in a report, but we don’t want to simply print the units_in_stock
column. The column contains a precise number of units available, but we only need an overall level, like 'low'
or 'high'
.
SELECT
product_id,
product_name,
units_in_stock,
CASE
WHEN units_in_stock > 100 THEN 'high'
WHEN units_in_stock > 50 THEN 'moderate'
WHEN units_in_stock > 0 THEN 'low'
WHEN units_in_stock = 0 THEN 'none'
END AS availability
FROM products;
We create a brand-new column named availability
. You can see that we use a special construction called CASE WHEN
. It returns different values based on the conditions you specify.
Exercise
We want to create a report measuring the level of experience each Northwind employee has with the company. Show the first_name
, last_name
, hire_date
, and experience
columns for each employee. The experience
column should display the following values:
'junior'
for employees hired after Jan. 1, 2014.'middle'
for employees hired after Jan. 1, 2013 but before Jan. 1, 2014.'senior'
for employees hired on or before Jan. 1, 2013.
Non-matching objects: Our store is now introducing a new shipping cost policy: Any package shipped to the US or Canada is free! Now we want to create a report that shows the updated shipping cost for selected orders. For countries other than the US and Canada, the shipping_cost
value is NULL
.
Fallback values
So, orders shipped outside the USA and Canada had NULL
shipping_cost
values. This is because we didn't specify the column value for other countries.
Northwinds’ shipping cost for countries other than the US and Canada is $10.00.
SELECT
order_id,
customer_id,
ship_country,
CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
ELSE 10.0
END AS shipping_cost
FROM orders
WHERE order_id BETWEEN 10720 AND 10730;
We enriched our CASE WHEN
construction with an ELSE
. The ELSE
is executed when no other conditions are satisfied. Thanks to this, all other countries will get a column value of 10.0
instead of NULL
.
Exercise
We want to show the following basic customer information (from the customers
table):
customer_id
company_name
country
language
The value of the language
the column will be decided by the following rules:
'German'
for companies from Germany, Switzerland, and Austria.'English'
for companies from the UK, Canada, the USA, and Ireland.'Other'
for all other countries.
Exercise
Let’s create a report that will divide all products into vegetarian and non-vegetarian categories. For each product, show the following columns:
product_name
category_name
diet_type
:
'Non-vegetarian'
for products from the categories'Meat/Poultry'
and'Seafood'
.'Vegetarian'
for any other category.
Custom grouping
Before we introduce free shipping to the USA and Canada, we’d like to know how many orders are sent to these countries and how many are sent to other places. Take a look:
SELECT
CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
ELSE 10.0
END AS shipping_cost,
COUNT(*) AS order_count
FROM orders
GROUP BY shipping_cost;
Because we now group by the CASE WHEN
construction, we can add a COUNT(*)
column in the SELECT
clause.
Exercise
Create a report that shows the number of products supplied from a specific continent. Display two columns: supplier_continent
and product_count
. The supplier_continent
column should have the following values:
'North America'
for products supplied from'USA'
and'Canada'
.'Asia'
for products from'Japan'
and'Singapore'
.'Other'
for other countries.
Custom grouping
Exercise
We want to create a simple report that will show the number of young and old employees at Northwind. Show two columns: age
and employee_count
.
The age
column has the following values:
'young'
for people born after Jan. 1, 1980.'old'
for all other employees.
CASE WHEN with COUNT
SELECT
COUNT(CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN order_id
END) AS free_shipping,
COUNT(CASE
WHEN ship_country != 'USA' AND ship_country != 'Canada' THEN order_id
END) AS paid_shipping
FROM orders;
The query will show:
There’s a CASE WHEN
construction inside the COUNT()
function. For each row, the CASE WHEN
construction checks the value in ship_country
. If it's 'USA'
or 'Canada'
, the order_id
is passed to COUNT()
and counted. If there's a different value in ship_country
, CASE WHEN
returns a NULL
– and a NULL
value isn't counted by COUNT()
. This way, the free_shipping
column will only count orders shipped to the USA or Canada. The paid_shipping
column is constructed in a similar way.
You can see that the technique above involves creating a separate column for each group. The query produces a different result than the query with the
CASE WHEN
in theGROUP BY
clause, which listed each group as a row, not a column.
Exercise
How many customers are represented by owners (contact_title = 'Owner'
), and how many aren't? Show two columns with appropriate values: represented_by_owner
and not_represented_by_owner
.
Exercise
Washington (WA) is Northwind’s primary region. How many orders have been processed by employees in the WA region, and how many by employees in other regions? Show two columns with their respective counts: orders_wa_employees
and orders_not_wa_employees
GROUP BY with CASE WHEN
We can also use the same technique to split the results into multiple groups:
SELECT
ship_country,
COUNT(CASE WHEN freight < 40.0 THEN order_id END) AS low_freight,
COUNT(CASE WHEN freight >= 40.0 AND freight < 80.0 THEN order_id END) AS avg_freight,
COUNT(CASE WHEN freight >= 80.0 THEN order_id END) AS high_freight
FROM orders
GROUP BY ship_country;
By combining COUNT(CASE WHEN...)
with GROUP BY
, we've created a more advanced report that shows the number of orders with low, average, and high freight in each country.
Exercise
We need a report that will show the number of products with high and low availability in all product categories. Show three columns: category_name
, high_availability
(count the products with more than 30
units in stock) and low_availability
(count the products with 30
or fewer units in stock).
CASE WHEN with SUM
SELECT
SUM(CASE WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 1
END) AS free_shipping,
SUM(CASE WHEN ship_country != 'USA' AND ship_country != 'Canada' THEN 1
END) AS paid_shipping
FROM orders;
In the above query, we used
SUM()
withCASE WHEN
instead ofCOUNT()
. TheCASE WHEN
construction inside theSUM()
function is very similar to that insideCOUNT()
, but you can see that we pass a1
toSUM()
when the condition is satisfied. This is a bit different fromCOUNT()
, where we passed in the column name.Despite their minor differences, both
SUM()
andCOUNT()
produce identical results in this query.
Exercise
There have been a lot of orders shipped to France. Of these, how many order items were sold at full price and how many were discounted? Show two columns with the respective counts: full_price
and discounted_price
.
Summing of business values
Suppose we now want to show the total amount paid for each order alongside the amount paid for non-vegetarian products.
Note: Non-vegetarian products have a category_id
of 6 and 8.
SELECT
o.order_id,
SUM(oi.quantity * oi.unit_price * (1 - oi.discount)) AS total_price,
SUM(CASE
WHEN p.category_id in (6, 8) THEN oi.quantity * oi.unit_price * (1 - oi.discount)
ELSE 0
END) AS non_vegetarian_price
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_id
JOIN products p
ON p.product_id = oi.product_id
GROUP BY o.order_id;
So far, we wrote the queries with
SUM(CASE WHEN...)
in such a way that they could all be replaced with equivalentCOUNT(CASE WHEN...)
constructions. Here, however,SUM(CASE WHEN...)
is the only option – we want to sum certain values (oi.quantity * oi.unit_price * (1 - oi.discount)
) instead of merely counting rows.
Exercise
This time, we want a report that will show each supplier alongside their number of units in stock and their number of expensive units in stock. Show four columns: supplier_id
, company_name
, all_units
(all units in stock supplied by that supplier), and expensive_units
(units in stock with a unit price over 40.0
, supplied by that supplier).
Exercise
For each product, show the following columns: product_id
, product_name
, unit_price
, and price_level
. The price_level
the column should show one of the following values:
'expensive'
for products with a unit price above100
.'average'
for products with a unit price above40
but no more than100
.'cheap'
for other products.
Exercise
We would like to categorize all orders based on their total price (before any discount). For each order, show the following columns:
order_id
total_price
(calculated before discount)price_group
, which should have the following values:
'high'
for a total price over $2,000.'average'
for a total price between $600 and $2,000, both inclusive.'low'
for a total price under $600.
Exercise
Group all orders based on the freight
column. Show three columns in your report:
low_freight
– the number of orders where thefreight
value is less than40.0
.avg_freight
– the number of orders where thefreight
value is greater than equal to or40.0
but less than80.0
.high_freight
– the number of orders where thefreight
value is greater than equal to or80.0
.
Summary
- A
CASE WHEN
statement checks for one or more conditions and returns a value when it finds the first matching condition. If there is noELSE
clause and no matching conditions,CASE WHEN
returnsNULL
.
CASE
WHEN condition_1 THEN result_1
WHEN condition_2 THEN result_2
…
ELSE result
END
2. To add a new column, and thus a custom classification of business products, you can use CASE WHEN
in the SELECT
clause:
SELECT
CASE
WHEN … THEN …
END AS sample_column
FROM table;
3. You can use the custom classification to create your own grouping. In PostgreSQL you can use the CASE WHEN clause in the SELECT clause and use the alias in the GROUP BY:
SELECT
CASE
WHEN … THEN …
END AS sample_column,
COUNT(*) AS sample_count
FROM table
…
GROUP BY sample_column;
You can create a custom count of business objects using CASE WHEN
inside a COUNT()
or SUM()
function:
SELECT
COUNT(CASE
WHEN … THEN column_name
END) AS count_column
FROM table;
SELECT
SUM(CASE
WHEN … THEN 1
END) AS count_column
FROM table;
If you got all the way down here, thanks for taking the time to review my work, I hope you found it interesting. Any comments, notes and recommendations are welcomed either here or directly through my LinkedIn.