Illustration with Python: Weak Law of Large Numbers

Chaya Chaipitakporn
Published in Analytics Vidhya
3 min read · Oct 20, 2019

The weak law of large numbers states that with a sufficiently large number of trials, there is a very high probability that the mean of the observations will be close to the expected value. In other words, as the number of trials goes to infinity, the sample mean converges in probability to the expected value.
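Restating the theorem in symbols (my notation, matching the description above):

```latex
\lim_{n \to \infty} P\left( \left| \bar{X}_n - \mu \right| < \varepsilon \right) = 1
```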

Here X̄n is the mean of n trials, μ is the expected value, and ε is any margin of error greater than 0. The full details of the theorem can be found in this link.

Before I show the code, there is one thing I want to point out: even though the theorem talks about the number of trials, we can apply it to sample size as long as each observation is independent and identically distributed. For example, suppose we are interested in the number of right-handed people and we want to study 100 samples. If we go to a mall, pick one person at a time, ask, and repeat 100 times (100 trials), the expectation will be the same as picking 100 people and asking them all at once. Because each observation has the same expected value, summing the expected values and dividing by the number of samples gives that same expected value. I will use Python code to illustrate the theorem in the following steps.
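This equivalence can be sketched numerically (a toy check with numpy; the `population` array here stands in for the mall's visitors and is my own illustration, not part of the steps below):

```python
import numpy as np

rng = np.random.default_rng(0)
# a gamma(shape=2, scale=2) population, the same distribution used below; its mean is 4
population = rng.gamma(2.0, 2.0, 1_000_000)

# "100 trials": pick one value at a time, 100 times, then average
one_at_a_time = [rng.choice(population) for _ in range(100)]
# "100 samples at once": pick 100 values in a single draw
all_at_once = rng.choice(population, size=100)

# both are averages of 100 i.i.d. draws, so both estimate the same expected value (4)
print(np.mean(one_at_a_time), np.mean(all_at_once))
```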

Steps:

1.) Create a gamma distribution with shape = 2 and scale = 2 as a population.

import numpy as np

shape, scale = 2., 2.  # mean = shape*scale = 4, std = scale*sqrt(shape) = 2*sqrt(2)
s = np.random.gamma(shape, scale, 1000000)
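As a quick sanity check on this population (my own aside, not part of the original steps): for a gamma distribution, the mean is shape × scale and the variance is shape × scale², so with shape = 2 and scale = 2 we expect a mean near 4 and a variance near 8.

```python
import numpy as np

shape, scale = 2., 2.
s = np.random.gamma(shape, scale, 1_000_000)

# for a gamma distribution, mean = shape * scale and variance = shape * scale**2
print(s.mean())  # close to 4
print(s.var())   # close to 8
```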

2.) Start with a sample size of 100; draw 50 samples and collect the mean of each, then increase the sample size by 500 and repeat until the sample size reaches 8100.

import random

samplemeanlist = []  # list of lists of sample means
l = []               # list of sample sizes, for the x-axis of the box plots
numberofsample = 50  # number of samples drawn at each sample size

# set sample size (i) between 100 and 8100, step by 500
for i in range(100, 8101, 500):
    # set x-axis
    l.append(i)
    # list of the mean of each sample
    ml = []
    # sample 50 times
    for n in range(0, numberofsample):
        # randomly pick from the population with sample size = i
        rs = random.choices(s, k=i)
        # calculate the mean of each sample and save it in the list of means
        ml.append(sum(rs) / i)

    # save the 50 sample means in samplemeanlist for the box plots
    samplemeanlist.append(ml)

3.) Plot a box plot for each sample size.

import matplotlib.pyplot as plt

# set figure size
plt.figure(figsize=(20, 10))
# plot box plots of the sample means at each sample size
plt.boxplot(samplemeanlist, labels=l)
# show plot
plt.show()

Box plots for each sample size: the y-axis is the mean of the samples, the x-axis is the sample size.

In the plot, we can see that as the sample size increases, the spread of the sample means decreases and the distribution concentrates around the expected value.
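This shrinking spread is quantifiable: the standard deviation of the sample mean falls off roughly like σ/√n. A quick numerical check (a sketch reusing the same gamma population; `sample_mean_spread` is a helper I made up for this illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.gamma(2.0, 2.0, 1_000_000)
sigma = s.std()

def sample_mean_spread(n, repeats=200):
    # standard deviation of `repeats` sample means, each from a sample of size n
    means = [rng.choice(s, size=n).mean() for _ in range(repeats)]
    return float(np.std(means))

# the observed spread should track sigma / sqrt(n)
for n in (100, 8100):
    print(n, sample_mean_spread(n), sigma / np.sqrt(n))
```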

The orange histogram is the distribution of sample means at a sample size of 8100; the blue histogram is the distribution at a sample size of 100.

I plot two histograms to compare the distributions of the sample means: the blue one is for a sample size of 100 and the orange one is for a sample size of 8100.
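The histogram comparison can be reproduced along these lines (a self-contained sketch; the seed and bin count are my own choices, not from the original notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
s = rng.gamma(2.0, 2.0, 1_000_000)

# 50 sample means at each of the two sample sizes compared above
means_100 = [rng.choice(s, size=100).mean() for _ in range(50)]
means_8100 = [rng.choice(s, size=8100).mean() for _ in range(50)]

plt.figure(figsize=(20, 10))
plt.hist(means_100, bins=20, alpha=0.5, label="sample size 100")
plt.hist(means_8100, bins=20, alpha=0.5, label="sample size 8100")
plt.legend()
plt.show()
```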

One last thing: what you should take away from this blog is that the sample size has a huge effect on how close the sample mean is to the expected value. If your study has a large sample size, the mean of your sample will be close to the population mean.

The code can be found in this link: Jupyter Notebook, Python file
