A/B Testing Using the Northwind Dataset

Andy Martin del Campo
Analytics Vidhya
Published in
4 min readOct 16, 2019
Photo by Agence Olloweb on Unsplash

For my most recent project, I was tasked with using the fictional Northwind dataset to showcase hypothesis or A/B testing. For this blog post I wanted to look into the question “Does a country the order is shipped to have a statistically significant effect on discounts?”

First I had to develop my null and alternative hypotheses. The null hypothesis is that there is no effect on discounts based on where the order is being shipped to. This means that the average discount value is the same for all countries. The alternative hypothesis is that certain countries being shipped to have either a higher or lower average discount.

Now that the hypotheses have been established, the data needs to be explored. The data is stored in an SQLite file and needs to be converted to a pandas dataframe for easier viewing and manipulation.

def query_to_df(query):
“””Takes in a SQL query and returns a pandas dataframe with column names”””
cur.execute(query)
df = pd.DataFrame(cur.fetchall())
df.columns = [x[0] for x in cur.description]
return df

The above code is a function that takes in a SQL query and returns a pandas dataframe with column names ready to be worked on. The information I needed was the countries the orders were shipped to and the discount on each order. A simple SQL query if you are already familiar with writing them. For a reminder on SQL you can visit this post.

query4 = “””SELECT ShipCountry, OrderId, Discount
FROM [Order]
JOIN OrderDetail
ON [Order].Id = OrderDetail.OrderId;”””

With the pandas dataframe, I was able to determine all of the countries that had an order shipped to them and what discount level was given. The countries are considered as categorical variables and will be treated as such. The first test I want to run is an ANOVA test comparing the “ShipCountry” column to the “Discount” column

Analysis of variance (ANOVA) is a statistical technique that is used to check if the means of two or more groups are significantly different from each other. ANOVA checks the impact of one or more factors by comparing the means of different samples. In this case the samples will be each ‘ShipCountry’ and we want to see if there is any difference in the means of the ‘Discount’ per order.

Before the ANOVA test I plotted each ShipCountry’s Discount distribution:

country_list = df4.ShipCountry.tolist()
country_list = np.unique(np.array(country_list))
plt.figure(figsize=(10,8))
plt.title(‘Distribution of Discounts per ShipCountry’)
for country in country_list:
country_df = df4[df4[‘ShipCountry’] == country]
sns.distplot(country_df.Discount, label=country)

plt.legend();

The ANOVA test will tell if there is a statistical significance from the ShipCountry. The ANOVA test is relatively easy to run and can be interpreted easily as well.

import statsmodels.api as sm
from statsmodels.formula.api import ols
formula = ‘Discount ~ C(ShipCountry)’
lm = ols(formula, df4).fit()
table = sm.stats.anova_lm(lm, typ=2)
print(table)
sum_sq df F PR(>F)
C(ShipCountry) 0.609834 16.0 5.662696 3.781260e-12
Residual 14.390506 2138.0 NaN NaN

The above code returns a clean table from which to draw conclusions from. The two important values are the F value and the PR(>F) or p-value. The F-statistic measures if the means of different samples are significantly different. Using an F-table you can then find the p-value or in this case, it is already given. The value is very, very small (3.781260e-12). Much smaller than the usual threshold of 0.05. The null hypothesis can be rejected. There is probably a significant difference between certain countries and more tests need to be run.

A bar chart showing the average discount values can help to visualize the size of the differences in Discount averages:

In order to evaluate each country, further analysis is needed. The ANOVA test has told us there is a significant difference from country to country; however, the ANOVA test does not specify which countries. Post Hoc tests will have to be considered for further exploration. A few tests to choose from would be Welch’s T-Tests on each country with a Bonferroni correction or Tukey’s HSD.

--

--

Andy Martin del Campo
Analytics Vidhya

Aspiring Data Scientist with a background in Electrical Engineering and Networking. Passionate about motorcycles and coffee.