Hypothesis testing with the Northwind database

Levi Raichik
Jul 25, 2019


There were a few questions I wanted to try to answer using this database, and I will present one of them here.

Does the discount amount have a statistically significant effect on the quantity of a product in an order? If so, at which discount level(s)?

Null Hypothesis: Discount does not affect the quantity of product ordered
Alternative Hypothesis: Discount does affect the quantity of product ordered
Alpha: 0.05

First I needed to determine whether we can reject the null hypothesis in general. For this I ran an ANOVA test.

The p-value was very low, so we can reject the null hypothesis that discounts do not affect the quantity sold.
I also checked the effect size. The difference in mean quantity sold between discounted and non-discounted orders was 5.68 units, with a Cohen's d of 0.39, which falls between the conventional thresholds for a small (0.2) and a medium (0.5) effect. The power of this test was about 1.0, so the risk of a Type II error (failing to detect a real effect) was negligible.
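The post does not show how the ANOVA was computed. As a minimal sketch, assuming each group is simply a list of order-line quantities, the one-way ANOVA F statistic compares the spread between group means with the spread inside the groups. To stay within the Python standard library, this sketch swaps in a permutation p-value (shuffle the group labels and count how often F is at least as large) instead of the F-distribution p-value that a routine such as `scipy.stats.f_oneway` would return. The function names are hypothetical.

```python
import random
from statistics import fmean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Between-group sum of squares: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_perm=2000, seed=0):
    """Share of label shuffles whose F is at least as large as the observed F."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Re-split the shuffled values into groups of the original sizes.
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)
```

For example, `f_statistic([[10, 12, 11, 9], [20, 22, 21, 19]])` gives an F of about 120 for two toy groups whose means differ by 10, and the permutation p-value for the same groups is small.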
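Cohen's d, as used here, is the difference in means divided by a standard deviation. A minimal sketch, assuming the common pooled-standard-deviation form (the post does not say which variant was used) and plain lists of quantities; `cohens_d` is a hypothetical helper name.

```python
from math import sqrt
from statistics import fmean, variance


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    # Pool the two sample variances (n - 1 denominators), weighted by degrees of freedom.
    pooled_var = ((n1 - 1) * variance(treatment) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (fmean(treatment) - fmean(control)) / sqrt(pooled_var)
```

For example, `cohens_d([3, 4, 5], [1, 2, 3])` returns 2.0: the means differ by 2 and both samples have a standard deviation of 1.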

Next, I needed to find out at which discount levels this holds, and to what extent.

My approach was to first count how many order lines there are at each discount level in the dataset.
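Counting rows per discount level is a simple frequency count. A minimal sketch, assuming the order lines have already been loaded as (quantity, discount) pairs; the toy rows and the `count_discount_levels` name are hypothetical, and the real Northwind order-detail table name differs between versions of the database. Rounding the discount guards against floating-point noise in stored values.

```python
from collections import Counter

# Hypothetical toy rows of (quantity, discount); real rows would come from the
# Northwind order-detail table.
sample_rows = [(12, 0.0), (10, 0.0), (5, 0.05), (9, 0.0), (40, 0.1), (35, 0.15), (2, 0.03)]


def count_discount_levels(rows):
    """Return how many order lines fall at each discount level, lowest level first."""
    counts = Counter(round(discount, 2) for _, discount in rows)
    return dict(sorted(counts.items()))
```

On the toy rows, `count_discount_levels(sample_rows)` returns `{0.0: 3, 0.03: 1, 0.05: 1, 0.1: 1, 0.15: 1}`.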

Based on these counts, I split the data into a discount and a no-discount dataframe, then grouped the 1-6% discounts (0.01 to 0.06) into a single category, since those levels occur only rarely. I also removed a few outliers.
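A minimal standard-library sketch of those steps, assuming (quantity, discount) rows. The outlier rule is a guess: the post does not say how outliers were chosen, so the z-score cut-off of 3 below is hypothetical, as are the function and label names. Keeping no-discount orders under their own "none" key stands in for the separate no-discount dataframe.

```python
from statistics import fmean, stdev


def discount_group(discount):
    """Map a discount rate to a group label; 1-6% discounts share one label."""
    d = round(discount, 2)
    if d == 0:
        return "none"
    if d <= 0.06:
        return "1-6%"
    return f"{round(d * 100)}%"


def split_and_group(rows):
    """Split (quantity, discount) rows into quantity lists keyed by discount group."""
    groups = {}
    for quantity, discount in rows:
        groups.setdefault(discount_group(discount), []).append(quantity)
    return groups


def trim_outliers(values, z_cut=3.0):
    """Drop values more than z_cut sample SDs from the mean (hypothetical rule)."""
    if len(values) < 3:
        return list(values)
    mu, sd = fmean(values), stdev(values)
    if sd == 0:
        return list(values)
    return [v for v in values if abs(v - mu) / sd <= z_cut]
```

Each group's quantity list would then be passed through `trim_outliers` before testing.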

Then I ran an ANOVA test across all the discount groups.

Based on these results, I can reject the null hypothesis for every discount level except Dis_1, which refers to the 10% discount.
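An overall ANOVA F-test only says that at least one group mean differs; deciding which levels differ needs per-level comparisons, and the post does not show which procedure produced the per-level results. One possible approach, sketched here with the standard library only, compares each discount group with the no-discount group using a two-sided permutation test on the difference in means, and applies a Bonferroni correction (dividing alpha by the number of comparisons) because several levels are tested. This is a swapped-in technique, not the author's method; `groups` is assumed to map labels such as "none" and "10%" to lists of quantities, and the function names are hypothetical.

```python
import random
from statistics import fmean


def perm_test_mean_diff(treatment, control, n_perm=5000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(fmean(treatment) - fmean(control))
    pooled = list(treatment) + list(control)
    n_t = len(treatment)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Treat the first n_t shuffled values as a random "treatment" group.
        if abs(fmean(pooled[:n_t]) - fmean(pooled[n_t:])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def compare_levels(groups, control_key="none", alpha=0.05, **kwargs):
    """Test each discount group against the control with a Bonferroni-adjusted alpha."""
    levels = [k for k in groups if k != control_key]
    adjusted_alpha = alpha / len(levels)
    results = {}
    for level in levels:
        p = perm_test_mean_diff(groups[level], groups[control_key], **kwargs)
        results[level] = (p, p < adjusted_alpha)  # (p-value, reject null?)
    return results
```

Each result is a (p-value, rejected) pair; with five discount groups the Bonferroni-adjusted threshold would be 0.05 / 5 = 0.01.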

This held up when I compared the effect size of each discount level against the quantity sold with no discount.
The difference in means, Cohen's d, and power for each level were:

1-6%
Difference in means: 5.01
Cohen’s d: 0.37
Power: 0.95

10%
Difference in means: 3.47
Cohen’s d: 0.26
Power: 0.69
As the difference in means and Cohen's d show, the effect of a 10% discount is small, which is consistent with the non-significant p-value above.
Its power is also below the conventional 0.8 threshold, so the risk of a Type II error is higher for this group: failing to reject the null here may itself be a Type II error.

15%
Difference in means: 6.55
Cohen’s d: 0.50
Power: 1.0

20%
Difference in means: 6.45
Cohen’s d: 0.49
Power: 0.99

25%
Difference in means: 7.29
Cohen’s d: 0.56
Power: 0.99
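The power figures above depend on the effect size, the group sizes, and alpha. The group sizes are not given in the post, so as a hedged sketch, the function below uses the normal approximation for a two-sided, two-sample test; library routines such as statsmodels' `TTestIndPower` use the exact noncentral t distribution instead and will give slightly different numbers. `approx_power` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test for effect size d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic under the alternative: d scaled by the effective sample size.
    noncentrality = abs(d) * sqrt(n1 * n2 / (n1 + n2))
    # Ignores the negligible chance of rejecting in the opposite tail.
    return z.cdf(noncentrality - z_crit)
```

For instance, `approx_power(0.5, 64, 64)` comes out at roughly 0.81, close to the textbook figure of 64 per group for 80% power at d = 0.5. Larger groups raise power for the same d.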

The 15%, 20%, and 25% discounts have similar differences in means (6.45 to 7.29) and Cohen's d values (0.49 to 0.56), so going above 15% does not look like a useful business practice for the products this company sells. The bar graph shows the same pattern.

With regard to profit, I could not get the company's cost for each item from this data, so I was not able to determine which discount level works best from a profit-margin standpoint.
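If the unit cost were known, the profit comparison would be simple arithmetic: each unit sells for the unit price times (1 - discount), so the profit on an order line is the quantity times (discounted price - unit cost). The sketch below assumes a hypothetical `unit_cost` per product, which the Northwind data does not provide; the helper names are hypothetical as well. A larger average quantity only helps if the discounted price still exceeds the cost.

```python
def line_profit(unit_price, quantity, discount, unit_cost):
    """Profit on one order line; unit_cost is a hypothetical input, not in the data."""
    revenue = unit_price * (1 - discount) * quantity
    return revenue - unit_cost * quantity


def break_even_discount(unit_price, unit_cost):
    """Discount rate at which the discounted price exactly equals the unit cost."""
    return 1 - unit_cost / unit_price
```

With a hypothetical price of 20.0 and cost of 12.0, `line_profit(20.0, 10, 0.25, 12.0)` is 30.0, and any discount above `break_even_discount(20.0, 12.0)`, about 0.4, loses money on every unit sold.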

I would suggest further work on the 10% discount: it seems odd that both a lower and a higher discount have a bigger effect on quantity sold while this specific discount does not.
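One way to make that further work conclusive is to collect enough 10% orders for adequate power. A rough sketch using the normal-approximation sample-size formula n = 2 * ((z_alpha + z_beta) / d)^2 per group, where z_alpha is the two-sided critical value and z_beta the quantile for the target power, assuming equal group sizes and a two-sided test; `n_per_group` is a hypothetical helper.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)           # about 0.84 for 80% power
    # Round up, since a fractional order line is not possible.
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)
```

With the 10% group's Cohen's d of 0.26, alpha = 0.05, and a target power of 0.8, `n_per_group(0.26)` returns 233 order lines per group under this approximation.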


Data scientist and machine learning engineer with a passion for turning raw data into useful, actionable insights. https://www.linkedin.com/in/levi-raichik/