The Northwind database- Part 2

Mythilyram
10 min readApr 22, 2023

--

CASE WHEN and GROUP BY

I`m going to work with Microsoft’s sample database for SQL Server named Northwind. It describes a fictional store and its customers, suppliers, and orders. You can find the original database in Microsoft’s GitHub repository. The database has been slightly modified.

Classifying Data with CASE WHEN and GROUP BY

To show the availability level for each product in a report, but we don’t want to simply print the units_in_stock column. The column contains a precise number of units available, but we only need an overall level, like 'low' or 'high'.

SELECT
product_id,
product_name,
units_in_stock,
CASE
WHEN units_in_stock > 100 THEN 'high'
WHEN units_in_stock > 50 THEN 'moderate'
WHEN units_in_stock > 0 THEN 'low'
WHEN units_in_stock = 0 THEN 'none'
END AS availability
FROM products;

We create a brand-new column named availability. You can see that we use a special construction called CASE WHEN. It returns different values based on the conditions you specify.

Exercise

We want to create a report measuring the level of experience each Northwind employee has with the company. Show the first_name, last_name, hire_date, and experience columns for each employee. The experience column should display the following values:

  • 'junior' for employees hired after Jan. 1, 2014.
  • 'middle' for employees hired after Jan. 1, 2013 but before Jan. 1, 2014.
  • 'senior' for employees hired on or before Jan. 1, 2013.

Non-matching objects: Our store is now introducing a new shipping cost policy: Any package shipped to the US or Canada is free! Now we want to create a report that shows the updated shipping cost for selected orders. For countries other than the US and Canada, the shipping_cost value is NULL.

Fallback values

So, orders shipped outside the USA and Canada had NULL shipping_cost values. This is because we didn't specify the column value for other countries.

Northwinds’ shipping cost for countries other than the US and Canada is $10.00.

SELECT 
order_id,
customer_id,
ship_country,
CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
ELSE 10.0
END AS shipping_cost
FROM orders
WHERE order_id BETWEEN 10720 AND 10730;

We enriched our CASE WHEN construction with an ELSE. The ELSE is executed when no other conditions are satisfied. Thanks to this, all other countries will get a column value of 10.0 instead of NULL.

Exercise

We want to show the following basic customer information (from the customers table):

  1. customer_id
  2. company_name
  3. country
  4. language

The value of the language the column will be decided by the following rules:

  • 'German' for companies from Germany, Switzerland, and Austria.
  • 'English' for companies from the UK, Canada, the USA, and Ireland.
  • 'Other' for all other countries.

Exercise

Let’s create a report that will divide all products into vegetarian and non-vegetarian categories. For each product, show the following columns:

  1. product_name
  2. category_name
  3. diet_type:
  • 'Non-vegetarian' for products from the categories 'Meat/Poultry' and 'Seafood'.
  • 'Vegetarian' for any other category.

Custom grouping

Before we introduce free shipping to the USA and Canada, we’d like to know how many orders are sent to these countries and how many are sent to other places. Take a look:

SELECT 
CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 0.0
ELSE 10.0
END AS shipping_cost,
COUNT(*) AS order_count
FROM orders
GROUP BY shipping_cost;

Because we now group by the CASE WHEN construction, we can add a COUNT(*) column in the SELECT clause.

Exercise

Create a report that shows the number of products supplied from a specific continent. Display two columns: supplier_continent and product_count. The supplier_continent column should have the following values:

  • 'North America' for products supplied from 'USA' and 'Canada'.
  • 'Asia' for products from 'Japan' and 'Singapore'.
  • 'Other' for other countries.

Custom grouping

Exercise

We want to create a simple report that will show the number of young and old employees at Northwind. Show two columns: age and employee_count.

The age column has the following values:

  • 'young' for people born after Jan. 1, 1980.
  • 'old' for all other employees.

CASE WHEN with COUNT

SELECT 
COUNT(CASE
WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN order_id
END) AS free_shipping,
COUNT(CASE
WHEN ship_country != 'USA' AND ship_country != 'Canada' THEN order_id
END) AS paid_shipping
FROM orders;

The query will show:

There’s a CASE WHEN construction inside the COUNT() function. For each row, the CASE WHEN construction checks the value in ship_country. If it's 'USA' or 'Canada', the order_id is passed to COUNT() and counted. If there's a different value in ship_country, CASE WHEN returns a NULL – and a NULL value isn't counted by COUNT(). This way, the free_shipping column will only count orders shipped to the USA or Canada. The paid_shipping column is constructed in a similar way.

You can see that the technique above involves creating a separate column for each group. The query produces a different result than the query with the CASE WHEN in the GROUP BY clause, which listed each group as a row, not a column.

Exercise

How many customers are represented by owners (contact_title = 'Owner'), and how many aren't? Show two columns with appropriate values: represented_by_owner and not_represented_by_owner.

Exercise

Washington (WA) is Northwind’s primary region. How many orders have been processed by employees in the WA region, and how many by employees in other regions? Show two columns with their respective counts: orders_wa_employees and orders_not_wa_employees

GROUP BY with CASE WHEN

We can also use the same technique to split the results into multiple groups:

SELECT 
ship_country,
COUNT(CASE WHEN freight < 40.0 THEN order_id END) AS low_freight,
COUNT(CASE WHEN freight >= 40.0 AND freight < 80.0 THEN order_id END) AS avg_freight,
COUNT(CASE WHEN freight >= 80.0 THEN order_id END) AS high_freight
FROM orders
GROUP BY ship_country;

By combining COUNT(CASE WHEN...) with GROUP BY, we've created a more advanced report that shows the number of orders with low, average, and high freight in each country.

Exercise

We need a report that will show the number of products with high and low availability in all product categories. Show three columns: category_name, high_availability (count the products with more than 30 units in stock) and low_availability (count the products with 30 or fewer units in stock).

CASE WHEN with SUM

SELECT 
SUM(CASE WHEN ship_country = 'USA' OR ship_country = 'Canada' THEN 1
END) AS free_shipping,
SUM(CASE WHEN ship_country != 'USA' AND ship_country != 'Canada' THEN 1
END) AS paid_shipping
FROM orders;

In the above query, we used SUM() with CASE WHEN instead of COUNT(). The CASE WHEN construction inside the SUM() function is very similar to that inside COUNT(), but you can see that we pass a 1 to SUM() when the condition is satisfied. This is a bit different from COUNT(), where we passed in the column name.

Despite their minor differences, both SUM() and COUNT() produce identical results in this query.

Exercise

There have been a lot of orders shipped to France. Of these, how many order items were sold at full price and how many were discounted? Show two columns with the respective counts: full_price and discounted_price.

Summing of business values

Suppose we now want to show the total amount paid for each order alongside the amount paid for non-vegetarian products.

Note: Non-vegetarian products have a category_id of 6 and 8.

SELECT
o.order_id,
SUM(oi.quantity * oi.unit_price * (1 - oi.discount)) AS total_price,
SUM(CASE
WHEN p.category_id in (6, 8) THEN oi.quantity * oi.unit_price * (1 - oi.discount)
ELSE 0
END) AS non_vegetarian_price
FROM orders o
JOIN order_items oi
ON o.order_id = oi.order_id
JOIN products p
ON p.product_id = oi.product_id
GROUP BY o.order_id;

So far, we wrote the queries with SUM(CASE WHEN...) in such a way that they could all be replaced with equivalent COUNT(CASE WHEN...) constructions. Here, however, SUM(CASE WHEN...) is the only option – we want to sum certain values (oi.quantity * oi.unit_price * (1 - oi.discount)) instead of merely counting rows.

Exercise

This time, we want a report that will show each supplier alongside their number of units in stock and their number of expensive units in stock. Show four columns: supplier_id, company_name, all_units (all units in stock supplied by that supplier), and expensive_units (units in stock with a unit price over 40.0, supplied by that supplier).

Exercise

For each product, show the following columns: product_id, product_name, unit_price, and price_level. The price_level the column should show one of the following values:

  • 'expensive' for products with a unit price above 100.
  • 'average' for products with a unit price above 40 but no more than 100.
  • 'cheap' for other products.

Exercise

We would like to categorize all orders based on their total price (before any discount). For each order, show the following columns:

  1. order_id
  2. total_price (calculated before discount)
  3. price_group, which should have the following values:
  • 'high' for a total price over $2,000.
  • 'average' for a total price between $600 and $2,000, both inclusive.
  • 'low' for a total price under $600.

Exercise

Group all orders based on the freight column. Show three columns in your report:

  • low_freight – the number of orders where the freight value is less than 40.0.
  • avg_freight – the number of orders where the freight value is greater than equal to or 40.0 but less than 80.0.
  • high_freight – the number of orders where the freight value is greater than equal to or 80.0.

Summary

  1. A CASE WHEN statement checks for one or more conditions and returns a value when it finds the first matching condition. If there is no ELSE clause and no matching conditions, CASE WHEN returns NULL.
CASE 
WHEN condition_1 THEN result_1
WHEN condition_2 THEN result_2

ELSE result
END

2. To add a new column, and thus a custom classification of business products, you can use CASE WHEN in the SELECT clause:

SELECT 
CASE
WHEN … THEN …
END AS sample_column
FROM table;

3. You can use the custom classification to create your own grouping. In PostgreSQL you can use the CASE WHEN clause in the SELECT clause and use the alias in the GROUP BY:

SELECT 
CASE
WHEN … THEN …
END AS sample_column,
COUNT(*) AS sample_count
FROM table

GROUP BY sample_column;

You can create a custom count of business objects using CASE WHEN inside a COUNT() or SUM() function:

SELECT   
COUNT(CASE
WHEN … THEN column_name
END) AS count_column
FROM table;
SELECT 
SUM(CASE
WHEN … THEN 1
END) AS count_column
FROM table;

If you got all the way down here, thanks for taking the time to review my work, I hope you found it interesting. Any comments, notes and recommendations are welcomed either here or directly through my LinkedIn.

--

--