Q#57: Candy Production Increases

The following dataset shows the U.S. candy industry’s ‘industrial production index’.

Given the above data, determine if the production in 2015 is significantly higher than in 2016.

TRY IT YOURSELF

ANSWER

This question tests our data science skills, mainly retrieving data and setting up a hypothesis test. Since we are comparing the means of two independent groups (production in 2015 and production in 2016), we will use a two-sample T-Test.

Recall from statistics that a T-Test compares the means of two datasets in order to test a hypothesis about the difference between them. For more information, review "T-Test Definition" (investopedia.com).
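To make the definition concrete, here is a minimal sketch of the unequal-variance (Welch) form of the statistic, which is what equal_var = False selects in SciPy. The function name welch_t and the sample values are made up for illustration and are not the candy data. The sketch uses only the standard library and stops at the t statistic and degrees of freedom, since the p-value needs the t distribution that SciPy provides.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    # Sample variances (n - 1 in the denominator), one per group
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    # Squared standard error contributed by each group
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical values, chosen only so the arithmetic is easy to follow
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

The sign of the statistic follows the order of the arguments: it is negative here because the first group has the smaller mean.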

First, we will load the data into a Pandas DataFrame. We will specify the first column as our index with the index_col = 0 argument, and we will have Pandas parse that index as dates with the parse_dates = True argument. With a datetime index, we can easily isolate the two years of interest by passing the year as a string to the .loc method.

import pandas as pd

df = pd.read_csv('https://raw.githubusercontent.com/erood/interviewqs.com_code_snippets/master/Datasets/candy_production.csv', index_col = 0, parse_dates = True)
data_2015 = df.loc['2015']
data_2016 = df.loc['2016']

Next, we will use the SciPy stats module in Python to run our T-Test between the two groups.

from scipy import stats

tStat, pValue = stats.ttest_ind(data_2015, data_2016, equal_var = False)

The p-value is 0.46, which is not significant at the conventional 0.05 level, so we cannot support the hypothesis that the two groups are different. (Note: if it were significant, we would still have to check the means of both groups to determine the direction of the difference, that is, which year had the higher production.)
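The direction check in the note can be sketched as follows. Because the t distribution is symmetric, a two-sided p-value can be turned into a one-sided one by looking at the sign of the t statistic. The helper name one_sided_p_greater and all numbers below are hypothetical placeholders, not results from the candy data. Recent SciPy versions also accept an alternative argument to ttest_ind for one-sided tests, which we assume would give the same answer more directly.

```python
import statistics


def one_sided_p_greater(t_stat, p_two_sided):
    """One-sided p-value for the alternative 'first group mean > second group mean'."""
    # The two-sided p-value is split evenly between the two tails.
    if t_stat > 0:
        return p_two_sided / 2
    return 1 - p_two_sided / 2


# Hypothetical index values standing in for two years of production
first_year = [105.0, 110.0, 98.0, 120.0]
second_year = [100.0, 104.0, 97.0, 111.0]

# Step 1: compare the group means to see the direction of the difference
first_is_higher = statistics.mean(first_year) > statistics.mean(second_year)
print(first_is_higher)

# Step 2: convert an assumed two-sided result into a one-sided p-value
print(one_sided_p_greater(t_stat=1.2, p_two_sided=0.30))
```

A positive t statistic with the first year passed first means the first year's mean is the larger one, so only then can the one-sided p-value fall below the two-sided one.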
