Hypothesis Testing Using Northwind Database

Alla Gonzalez
4 min readJul 22, 2019

--

Northwind Traders Entity Relationship Diagram

Northwind Traders is a fictional company that was invented by Microsoft back in the year 2000 to showcase its SQL Server technology. Since then, the Northwind Traders database became a sandbox for many SQL newbies, and it also formed the foundation for this research.

What is Northwind Traders?

Running a few simple queries we find out that Northwind Traders is a company that employs 9 people in the United States and the United Kingdom. They offer 77 food products in 8 categories. The company works with 29 suppliers worldwide and ships to 88 customers worldwide using the services of 3 shipper companies. Northwind Traders processed 11,077 orders offering 10 discount tiers.

One of the main questions for any business in how to increase customer lifetime value which stands for net profit that the company gains out of the future relationship with the customer. While new customer acquisition proves to be costly and time-consuming, many defer to discounts as a means to creating loyalty in the age of a price-conscious consumer. That is why the question we want to answer first is whether the discount amount has a statistically significant effect on the quantity of a product in an order. If yes, at what level(s) of discount?

Based on the question we set, we then formulate the null and alternative hypotheses.

𝐻0: Discount amount does not have a statistically significant effect on the quantity of a product in an order.

𝐻1: Discount amount has a statistically significant effect on the quantity of a product in an order.

a = 0.05

To answer the question, we first do feature engineering, namely, create a Net Price column.

Next, let’s have a look at how many orders were sold at a discounted price and how big that discount was.

Turns out, 1%, 2%, 3%, 4%, and 6% discount only occurred in 8 cases, we’ll drop those rows and continue working with products sold at no discount or at a discount of 5%, 10%, 15%, 20%, and 25%.

When doing visualizations, we find out that our data is not normally distributed and is generally skewed right. Most of the orders weren’t discounted, and the number of orders sold at a discounted price looks to be more or less the same.

Quantity of units ordered at a specific discount tier
Discount histogram

Upon a closer look at orders sold at a discount, we find that the most popular discount is 5%. 5%, 15%, and 25% bring about the same unit quantity to the company. Out of the 5 discount tiers reviewed, a 5% discount brought the company the most revenue.

As we have 5 discount tiers, we’ll proceed with selecting ANOVA test, because it explains the total variance as combination of variances from different groups. The F-distribution will become our test statistic. We set our significance level at 0.05, so we reject the null hypothesis if F>Fa.

Since our F value is higher than our alpha value, we can reject the null hypothesis.

So what did we find out?

Discounts produce a statistically significant effect on the quantity of a product in an order based on the analysis of variance. The data shows that the 5% discount is the best way to move forward because this discount tier was used for the majority of the orders, it sells the highest number of units and brings the most revenue to the company.

--

--

Alla Gonzalez
0 Followers

Code. Run. Travel. Data Science student at Flatiron School.