Hypothesis tests with Python

Valentina Alto
Published in Analytics Vidhya
5 min read · Sep 2, 2019


In my previous article, I discussed statistical hypothesis tests. They are pivotal in statistics and data science, since we can rarely analyze a whole population and instead have to ‘summarize’ huge amounts of data through samples.

Once we have our samples, which can be obtained with different techniques such as bootstrap sampling, the general goal is to make inferences about the true parameters of the original population by computing so-called statistics, or estimators, from the sample.
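To make the idea of estimators concrete, here is a minimal sketch of drawing a sample, computing two estimators from it, and using bootstrap resampling to gauge how much the sample mean varies. The population, the sample size of 200, the 1,000 bootstrap rounds and the seed are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)  # seed chosen only for reproducibility

# Hypothetical population: 100,000 values with true mean 3 and std 2
population = rng.normal(loc=3, scale=2, size=100_000)

# A simple random sample of 200 observations (without replacement)
sample = rng.choice(population, size=200, replace=False)

# Estimators computed from the sample
sample_mean = sample.mean()
sample_std = sample.std(ddof=1)  # ddof=1 uses the n - 1 denominator

# Bootstrap: resample the sample with replacement and recompute the mean
boot_means = np.array([
    rng.choice(sample, size=sample.size, replace=True).mean()
    for _ in range(1000)
])
boot_se = boot_means.std(ddof=1)  # bootstrap estimate of the standard error

print(f"sample mean: {sample_mean:.3f}, sample std: {sample_std:.3f}")
print(f"bootstrap standard error of the mean: {boot_se:.3f}")
```

The bootstrap standard error should land close to the sample standard deviation divided by the square root of the sample size, which is a handy sanity check.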

However, we need some kind of ‘insurance’ that our estimates are close to the true values. That’s why we use hypothesis tests.

In this article, I’m going to provide a practical example with Python, with randomly generated data, so that you can easily visualize all the potential outcomes of the test.

So let’s start by generating our data:

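import numpy as np
import matplotlib.pyplot as plt

# Population parameters: mean and standard deviation
mu, sigma = 3, 2
# Draw 10,000 values from a normal distribution
s = np.random.normal(mu, sigma, 10000)

# Histogram of the data, normalized so that it integrates to 1
count, bins, ignored = plt.hist(s, 30, density=True)
# Overlay the theoretical normal density for comparison
plt.plot(bins, 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(bins - mu)**2 / (2 * sigma**2)), linewidth=2, color='r')
plt.show()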

As you can see, I generated normally distributed data with mean = 3 and standard deviation = 2. Now, the idea is to extract a sample from this population and…
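As a rough sketch of what sampling from such a population and testing a hypothesis about its mean could look like, the snippet below draws a sample and runs a one-sample t-test with SciPy. The sample size of 100, the seed, the 5% significance level and the choice of `scipy.stats.ttest_1samp` are my assumptions for illustration, not details taken from the text above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility

# Population generated the same way as above: mean 3, std 2
mu, sigma = 3, 2
population = rng.normal(mu, sigma, 10000)

# Assumed sample size of 100, drawn without replacement
sample = rng.choice(population, size=100, replace=False)

# H0: the population mean equals mu; H1: it does not (two-sided test)
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu)

alpha = 0.05  # assumed significance level
print(f"sample mean = {sample.mean():.3f}")
print(f"t = {t_stat:.3f}, p-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

Since the null hypothesis is true by construction here, rerunning with different seeds should reject it only about 5% of the time, which is exactly what the significance level means.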
