Hypothesis Testing with Python

Apurva Misra
Published in Analytics Vidhya
3 min read · May 11, 2020

We start with a brief overview of the idea, then move on to the different kinds of tests and work through an example in Python.

Hypothesis testing is a way to draw statistical conclusions about a population from data collected on a sample that is much smaller than the population. A hypothesis is a statement about a parameter that we want to prove or disprove, hence the names:

Null Hypothesis = Ho = Status quo [For example: Treating humans with a particular sunscreen does not change the rate of getting burnt]

Alternative Hypothesis = Ha = Reason why data is being collected [For example: Treating humans with a particular sunscreen does change the rate of getting burnt]

Since the alternative hypothesis does not say in which direction the sunscreen changes the rate of getting burnt, this is a two-sided test.

Ho: Θ=Θo vs. Ha: Θ ≠ Θo

A one-sided hypothesis, on the other hand, is specific about the direction: for example, that the sunscreen decreases the rate of burning.

Ho: Θ ≥ Θo vs. Ha: Θ < Θo
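To make the difference between the two kinds of test concrete, here is a minimal sketch of how the p-value changes with the alternative hypothesis. It assumes the test statistic is approximately standard normal under Ho; the function name `p_values` and the observed value `-1.8` are made up for illustration.

```python
from statistics import NormalDist


def p_values(z):
    """Return (two_sided, lower_tail, upper_tail) p-values for a z statistic."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    lower = std_normal.cdf(z)            # P(Z <= z), used for Ha: theta < theta0
    upper = 1.0 - lower                  # P(Z >= z), used for Ha: theta > theta0
    two_sided = 2.0 * min(lower, upper)  # used for Ha: theta != theta0
    return two_sided, lower, upper


z_observed = -1.8  # hypothetical test statistic, for illustration only
two_sided, lower, upper = p_values(z_observed)
print(f"two-sided p-value:                 {two_sided:.4f}")
print(f"one-sided p-value (Ha: theta < theta0): {lower:.4f}")
```

With this particular value the one-sided p-value is below 0.05 while the two-sided one is not, which is why the direction of Ha should be fixed before looking at the data.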

Reference for details on p-values and Type 1 and Type 2 errors: https://www.abtasty.com/blog/type-1-and-type-2-errors/

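We want a hypothesis test to have a small significance level (α, the probability of a Type 1 error) and large power (1 − β, where β is the probability of a Type 2 error).

Common choices: α = 0.05 and β = 0.2 (that is, a power of 0.8)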
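As a rough illustration of how α and power relate, the sketch below approximates the power of a one-sided two-sample z-test with equal group sizes and a known common standard deviation. These are simplifying assumptions, and the function name and all the numbers passed to it are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n_per_group, alpha=0.05):
    """Approximate power of a one-sided two-sample z-test.

    effect: true difference in means (in the direction stated by Ha)
    sigma: common standard deviation, assumed known
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1.0 - alpha)            # critical value of the test
    standard_error = sigma * (2.0 / n_per_group) ** 0.5  # SE of the difference in means
    # Probability that the statistic lands in the rejection region when Ha is true.
    return std_normal.cdf(effect / standard_error - z_alpha)


# Hypothetical inputs: a true difference of 1 hour, sigma of 5 hours, 500 users per group.
power = power_one_sided_z(effect=1.0, sigma=5.0, n_per_group=500)
print(f"approximate power: {power:.3f}, so beta is about {1.0 - power:.3f}")
```

When the true effect is zero the function returns α itself, since rejecting a true Ho is exactly a Type 1 error.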

Imagine Netflix is testing the positioning of two tabs in its UI (user interface): “Browse DVDs” and “Watch Instantly”. It runs an experiment to collect data on whether the positioning of the tabs positively or negatively affects use of the streaming service.

Figure: Website interface (excuse the design skills)

Here,

  1. The metric of interest is average weekly streaming hours.
  2. The design factor is the UI, and its levels are {DVD tab first (1), Streaming tab first (2)}

Hence, the Hypotheses are:

Ho: μ1 ≥ μ2 & Ha: μ1 < μ2

where μ1 = average weekly streaming hours in the DVD tab first condition

μ2 = average weekly streaming hours in the streaming tab first condition

The dataset and the code are on GitHub (https://github.com/ApurvaMisra/statistics_experimental_design). Here I will only include snippets of the code and the results.

Netflix.csv contains observations of the number of hours per week that 1000 users (500 in each condition) stream video content.

Testing the hypothesis at a 5% significance level

The hypothesis test will be carried out with a t-test, which checks whether two means are reliably different from each other rather than differing purely by chance. To determine whether Student’s version (variances are equal) or Welch’s version (variances are unequal) of the t-test is appropriate, we first do an F-test for variances to compare σ1² and σ2² for equality.

Here, the hypotheses are:

Ho: σ1² = σ2² & Ha: σ1² ≠ σ2²

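The test statistic is F = (σ̂1²/σ1²) / (σ̂2²/σ2²), which follows an F distribution with n1−1 and n2−1 degrees of freedom when the data in each group are normally distributed,

where ^ denotes values obtained from the samples.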

If Ho is rejected we should use Welch’s t-test; otherwise Student’s t-test is appropriate.

Given that Ho: σ1²/σ2² = 1, the observed value of the statistic simplifies to the ratio of the sample variances: Fo = σ̂1²/σ̂2²

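The null hypothesis is rejected when Fo falls in either tail of the F distribution, that is, when

Fo > F(α/2, n1−1, n2−1) or Fo < F(1−α/2, n1−1, n2−1)

where F(q, n1−1, n2−1) is the value that an F-distributed variable with n1−1 and n2−1 degrees of freedom exceeds with probability q.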

Since the p-value from the F-test is very small (< 0.05), we reject the null hypothesis of equal variances and conclude that Welch’s t-test is appropriate here.
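For readers who want to try the F-test themselves, here is a minimal sketch using NumPy and SciPy. It is not the code from the repository: the helper name `f_test_equal_variances` is my own, and the two samples are synthetic normal data generated only so the example runs on its own (they are not the Netflix.csv observations).

```python
import numpy as np
from scipy import stats


def f_test_equal_variances(x1, x2):
    """Two-sided F-test of Ho: var1 == var2. Returns (F_observed, p_value)."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    f_obs = np.var(x1, ddof=1) / np.var(x2, ddof=1)  # ratio of sample variances
    df1, df2 = x1.size - 1, x2.size - 1
    lower_tail = stats.f.cdf(f_obs, df1, df2)  # P(F <= F_observed)
    upper_tail = stats.f.sf(f_obs, df1, df2)   # P(F >= F_observed)
    p_value = min(1.0, 2.0 * min(lower_tail, upper_tail))
    return f_obs, p_value


# Synthetic stand-ins for the two conditions (assumed values, not the real data).
rng = np.random.default_rng(0)
dvd_first = rng.normal(loc=20.0, scale=5.0, size=500)
streaming_first = rng.normal(loc=22.0, scale=7.0, size=500)

f_obs, p_value = f_test_equal_variances(dvd_first, streaming_first)
print(f"F observed = {f_obs:.3f}, p-value = {p_value:.3g}")
if p_value < 0.05:
    print("Reject Ho: variances differ, so use Welch's t-test")
else:
    print("Do not reject Ho: Student's t-test is reasonable")
```

Doubling the smaller tail probability is one common way to get a two-sided p-value for the F-test; it matches the two-tailed rejection rule above.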

Welch’s t-test:
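Welch’s statistic divides the difference in sample means by √(σ̂1²/n1 + σ̂2²/n2) and compares it against a t distribution whose degrees of freedom come from the Welch-Satterthwaite approximation. The sketch below spells that out for the one-sided hypothesis Ha: μ1 < μ2. As before, the helper name `welch_t_test_less` is my own and the data are synthetic placeholders, not the Netflix.csv observations.

```python
import numpy as np
from scipy import stats


def welch_t_test_less(x1, x2):
    """One-sided Welch t-test of Ho: mu1 >= mu2 vs Ha: mu1 < mu2.

    Returns (t_observed, degrees_of_freedom, p_value).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, n2 = x1.size, x2.size
    v1, v2 = np.var(x1, ddof=1), np.var(x2, ddof=1)  # sample variances
    se_sq = v1 / n1 + v2 / n2                        # squared standard error
    t_obs = (x1.mean() - x2.mean()) / np.sqrt(se_sq)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se_sq**2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    p_value = stats.t.cdf(t_obs, df)                 # lower tail, because Ha: mu1 < mu2
    return t_obs, df, p_value


# Synthetic stand-ins for the two conditions (assumed values, not the real data).
rng = np.random.default_rng(0)
dvd_first = rng.normal(loc=20.0, scale=5.0, size=500)
streaming_first = rng.normal(loc=22.0, scale=7.0, size=500)

t_obs, df, p_value = welch_t_test_less(dvd_first, streaming_first)
print(f"t = {t_obs:.3f}, df = {df:.1f}, one-sided p-value = {p_value:.3g}")
```

In SciPy 1.6 or later, `stats.ttest_ind(dvd_first, streaming_first, equal_var=False, alternative="less")` should give the same statistic and p-value in a single call.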

Figure: Code for the F-test and t-test
Figure: Corresponding p-values obtained

Hence, we can conclude that the average weekly streaming hours are larger in the “Streaming Tab First” condition relative to the “DVD Tab First” condition.

Disclaimer: “All data and scenarios were drawn from STAT 830: Experimental Design at the University of Waterloo, taught by Professor Stevens.”
