Dunder Mifflin’s competition reached out to me — Here’s what I found out

An analysis of the Parch and Posey paper company

Oreoluwa Olatunji
10 min readNov 16, 2023

ABOUT PARCH & POSEY

Parch & Posey, a fictional company created by Udacity as part of its SQL training, is a paper company that sells three kinds of paper to fortune 100 companies with the help of 50 sales reps spread across four regions in the United States.

PROBLEM STATEMENT

The stakeholders at Parch & Posey came together to discuss the company and see how it had fared over the years. I was tasked with making discoveries about the company to see where it had improved and where there was further room for improvement. The analysis was carried out using postgresql.

DATA

The Parch & Posey database schema contained 5 tables:
1). accounts.
2). orders.
3). web_events.
4). sales_reps.
5). region.
The Entity Relationship Diagram (ERD) can be seen below:

The ERD of the Parch & Posey Database

DATA EXPLORATION AND ANALYSIS

The exploration and analysis was broken down into 5 categories:
1.) Company analysis.
2). Region analysis.
3). Sales Rep analysis.
4). Web Events analysis.
5). Paper analysis.

Before beginning the analysis, I needed to find out the timeframe I was working with:

SELECT 
MIN(occurred_at) AS earliest_order,
MAX(occurred_at) AS latest_order
FROM orders;

The earliest order was placed on the 4th of December, 2013 and the latest order was placed on the 2nd of January, 2017.

It was also important to note the years and months present in the dataset:

SELECT 
DISTINCT DATE_part('year', occurred_at) AS year,
DATE_part('month', occurred_at) AS month
FROM orders
ORDER BY 1;
A section of the result

The years 2014, 2015 and 2016 had all 12 months present but 2013 had only December present and 2017 had only January present in the dataset.

With these information above, I could go ahead with the analysis.

Company analysis

  1. How many companies have an account with Parch & Posey?
SELECT COUNT(*) AS no_of_companies
FROM accounts;

A total of 351 companies have an account with Parch & Posey.

2. How many orders were made by companies in total?

SELECT COUNT(*) AS no_of_orders
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id;

A total of 6,912 orders were placed between the 4th of December, 2013 and 2nd of January, 2017.

3. What companies made the most orders?

SELECT a.name AS company, COUNT(*) AS no_of_orders
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;

Leucadia National placed 71 orders, making it the company with the most orders placed.

4. What company made no orders?

SELECT a.name AS company
FROM accounts AS a
LEFT JOIN orders AS o
ON a.id = o.account_id
WHERE o.total IS NULL;

Goldman Sachs Group placed no orders to get paper from Parch & Posey.

5. What companies bought the most paper?

SELECT a.name AS company, 
COUNT(*) AS no_of_orders,
SUM(total) AS no_of_papers
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 3 DESC
LIMIT 5;

With a total of 56,410 units of paper, EOG resources bought the most paper.

6. What are the most profitable companies in terms of revenue?

SELECT a.name AS company,
COUNT(*) no_of_orders,
SUM(total_amt_usd) AS revenue
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id
GROUP BY 1
ORDER BY 3 DESC
LIMIT 5;

EOG Resources proved to be a good source of income for the company with $382,873.30 in revenue.

7. What was the total number of paper bought, total revenue, average number of paper bought & average revenue per order?

SELECT 
SUM(total) AS total_paper_units,
ROUND(AVG(total), 2) AS paper_per_order,
SUM(total_amt_usd) AS total_revenue,
ROUND(AVG(total_amt_usd), 2) AS revenue_per_order
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id;

A total of 3,675,765 units of paper was purchased which amassed a revenue of $23,141,511.83.

8. How many companies had a minimum of the average number of orders and a revenue greater than the average revenue?

SELECT COUNT(*) AS no_of_companies
FROM(
SELECT
a.name AS company,
COUNT(*) AS no_of_orders,
SUM(o.total_amt_usd) AS revenue
FROM accounts AS a
JOIN orders AS o
ON a.id = o.account_id
GROUP BY 1
HAVING COUNT(*) >= 19 AND SUM(o.total_amt_usd) > (
SELECT ROUND(AVG(total_amt_usd), 2) AS revenue_per_order
FROM orders)
ORDER BY 2) AS t1;

A total of 134 companies made at least 19 orders and spent at least $3348.02 on all orders.

Analysis by region

  1. What are the regions present in the database?
SELECT name as region
FROM region;

Four regions — Northeast, Midwest, Southeast and West.

2. How many companies are present in each region?

SELECT 
r.name AS region,
COUNT(DISTINCT a.name) AS no_of_companies
FROM region AS r
JOIN sales_reps AS s
ON r.id = s.region_id
JOIN accounts AS a
ON a.sales_rep_id = s.id
GROUP BY 1;

Northeast had the most companies in its region, 106.

3. How many sales reps are present in region?

SELECT 
r.name AS region,
COUNT(*) AS no_of_reps
FROM region AS r
JOIN sales_reps AS s
ON r.id = s.region_id
GROUP BY 1;

Northeast also had the most sales reps in its region, 21.

4. What are the number of orders, papers purchased and revenue by region?

SELECT 
r.name AS region,
COUNT(*) AS no_of_orders,
SUM(o.total) AS papers_sold,
SUM(o.total_amt_usd) AS revenue
FROM region AS r
JOIN sales_reps AS s
ON r.id = s.region_id
JOIN accounts AS a
ON a.sales_rep_id = s.id
JOIN orders AS o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 3;

Companies in the Northeast placed the most orders (2,357), bought the most paper (1,230,378) and amassed the most revenue ($7,744,405.36).

5. What was the most popular channel in each region?

SELECT
region,
channel,
no_of_events
FROM(SELECT
r.name AS region,
w.channel AS channel,
COUNT(*) AS no_of_events,
RANK() OVER (PARTITION BY r.name ORDER BY COUNT(*) DESC) AS ranking
FROM region AS r
JOIN sales_reps AS s
ON r.id = s.region_id
JOIN accounts AS a
ON a.sales_rep_id = s.id
JOIN web_events AS w
ON w.account_id = a.id
GROUP BY 1, 2
ORDER BY 1, 3 DESC) AS t1
WHERE ranking = 1;

The direct method of marketing was the most popular across all regions.

Analysis by Sales Reps

  1. Who are the sales reps who handled the most orders?
SELECT 
s.name AS sales_rep,
COUNT(*) AS no_of_orders
FROM sales_reps AS s
JOIN accounts AS a
ON s.id = a.sales_rep_id
JOIN orders AS o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;

In the timeframe provided, Earlie Schleusner handled the most orders with a number of 335.

2. Who are the sales reps who had the most paper sales?

SELECT 
s.name AS sales_rep,
SUM(total) AS total_paper_sold
FROM sales_reps AS s
JOIN accounts AS a
ON s.id = a.sales_rep_id
JOIN orders AS o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;

Earlie Schleusner, who had the highest number of orders, sold the most number of paper, 174,430 units of paper sold.

3. Who are the sales reps who had the most revenue from paper sales?

SELECT 
s.name AS sales_rep,
SUM(total_amt_usd) AS revenue
FROM sales_reps AS s
JOIN accounts AS a
ON s.id = a.sales_rep_id
JOIN orders AS o
ON o.account_id = a.id
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5;

With the most orders and most number of papers sold, it is no surprise Earlie Schleusner made the most revenue for the company amassing $1,098,137.72 in total.

4. What channel was the most used by each sales rep?

SELECT
sales_rep,
channel,
no_of_events
FROM(
SELECT
s.name AS sales_rep,
w.channel AS channel,
COUNT(*) AS no_of_events,
RANK() OVER(PARTITION BY s.name ORDER BY COUNT(*) DESC) AS ranking
FROM sales_reps AS s
JOIN accounts AS a
ON s.id = a.sales_rep_id
JOIN web_events as w
ON w.account_id = a.id
GROUP BY 1, 2
ORDER BY 1, 3 DESC
) AS t1
WHERE ranking = 1;
A section of the results showing the sales reps and their preferred channels

All 50 sales reps used the direct method of marketing the most.

Analysis by Web Events

  1. What are the channels used by Parch & Posey?
SELECT DISTINCT(channel) AS channels
FROM web_events;

The channels listed above were used as part of the marketing strategies.

2. What was the most popular channel and its percentage?

SELECT 
channel,
no_of_events,
ROUND(no_of_events/(total_events + 0.01) * 100, 2) AS percentage
FROM(
SELECT
channel,
COUNT(*) AS no_of_events,
(SELECT COUNT(*) FROM web_events) AS total_events
FROM web_events
GROUP BY 1
ORDER BY 2 DESC
) AS t1;

The direct channel was the most popular, accounting for 58.39% of all events.

3. Channel usage by Companies.

SELECT
channel,
no_of_companies,
ROUND(no_of_companies/((SELECT COUNT(*) FROM accounts) + 0.01) * 100, 2) AS percentage
FROM(
SELECT
w.channel AS channel,
COUNT(DISTINCT a.name) AS no_of_companies
FROM web_events AS w
JOIN accounts AS a
ON a.id = w.account_id
GROUP BY 1
ORDER BY 2 DESC
) AS t1;

All companies that have accounts with Parch & Posey used the direct method at least once to communicate with the sales reps. On the other hand, only about 53% used twitter to communicate.

4. Channel usage by Sales Reps.

SELECT
channel,
no_of_reps,
ROUND(no_of_reps/((SELECT COUNT(*) FROM sales_reps) + 0.01) * 100, 1) AS percentage
FROM(
SELECT
w.channel AS channel,
COUNT(DISTINCT s.name) AS no_of_reps
FROM web_events AS w
JOIN accounts AS a
ON a.id = w.account_id
JOIN sales_reps AS s
ON s.id = a.sales_rep_id
GROUP BY 1
ORDER BY 2 DESC
) AS t1;

Banners and direct communication were utilised by all sales reps at least once.

Analysis By Paper

  1. What is the unit price of each type of paper?
SELECT 
ROUND(SUM(gloss_amt_usd)/SUM(gloss_qty), 2) AS gloss_unit_price,
ROUND(SUM(standard_amt_usd)/SUM(standard_qty), 2) AS standard_unit_price,
ROUND(SUM(poster_amt_usd)/SUM(poster_qty), 2) AS poster_unit_price
FROM orders;

The gloss paper sold for $7.49 per unit, poster at $8.12 and standard at $4.99, making standard paper the cheapest and poster paper the most expensive.

2. What is the sum total of all papers sold and revenue by paper?

SELECT
SUM(gloss_qty) AS gloss_paper_sold,
SUM(gloss_amt_usd) AS gloss_revenue,
SUM(standard_qty) AS standard_papers_sold,
SUM(standard_amt_usd) AS standard_revenue,
SUM(poster_qty) AS poster_papers_sold,
SUM(poster_amt_usd) AS poster_revenue
FROM orders;

Standard paper sold the most and also generated the most revenue selling 1,938,346 units and amassing $9,672,346.54 in revenue.

3. How did each paper perform by year?

SELECT 
DATE_PART('year', occurred_at) AS year,
SUM(gloss_qty) AS gloss_paper_sold,
SUM(gloss_amt_usd) AS gloss_revenue,
SUM(standard_qty) AS standard_papers_sold,
SUM(standard_amt_usd) AS standard_revenue,
SUM(poster_qty) AS poster_papers_sold,
SUM(poster_amt_usd) AS poster_revenue
FROM orders
GROUP BY 1
ORDER BY 1;

2013 and 2017 only had a month each in the table so the measurement would not be accurate. However, 2016 was the most profitable year across the different types of paper and there was an upward progression in revenue from 2014 to 2016.

4. How did the company perform by year?

SELECT 
DATE_PART('year', occurred_at) AS year,
SUM(total) AS papers_sold,
SUM(total_amt_usd) AS total_revenue
FROM orders
GROUP BY 1
ORDER BY 3 DESC;

As stated earlier, the years 2013 and 2017 were not properly represented. 2016 was the most profitable year ($12,864,917.92) and there was an upward projection in revenue from 2014–2016.

5. What was the running total of revenue by month and year?

SELECT
year,
month,
total_revenue,
SUM(total_revenue) OVER (ORDER BY year, month) AS running_total
FROM(
SELECT
DATE_PART('year', occurred_at) AS year,
DATE_PART('month', occurred_at) AS month,
SUM(total_amt_usd) AS total_revenue
FROM orders
GROUP BY 1,2
ORDER BY 1,2
) AS t1;
A section of the result displaying the running total

The dataset had a total of 38 months. The running total from December, 2013 to January, 2017 was calculated and the latest total for January, 2017 was $23,141,511.83.

INSIGHTS

1). The company had an increase in profits at the end of the year two years running, 2014 to 2015 and 2015 to 2016. Without, January — November, 2013, I was unable to ascertain revenue growth from 2013 to 2014.

2). For every sales rep in the Midwest and Northeast, there are 5 companies but in the Southeast, 1 sales rep to 9 companies and in the West, 1 sales rep to 10 companies. Hiring more sales reps is an avenue the company could pursue to look at improving revenue in the Southeast and West and also boosting client satisfaction.

CONCLUSION

I had a wonderful time using and understanding various SQL functions in this project. Further analysis will be done and in addition, an article on visuals will also be published subsequently. Thank you for reading!

UPDATE: The link to the further analysis can be found here and the link to the visualisation, here.

You can connect with me on LinkedIn. All csv files and codes used for this project can be found on my GitHub page.

--

--