Introduction To Probability And Statistics in ML

Published in

Analytics Vidhya

3 min read · Jan 6, 2021

Statistics

Statistics is the field of mathematics that deals with the collection, tabulation, analysis, interpretation, and presentation of numerical data; the word is also used for the numerical data itself. It is a form of mathematical analysis that uses quantitative models to describe experimental data or real-world studies. Statistics is concerned with how data can be used to solve complex problems. Some people consider statistics to be a distinct mathematical science rather than a branch of mathematics.

Install Statistics

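The statistics module has been part of the Python standard library since version 3.4, so a separate install is normally unnecessary. The command below installs a backport intended for older Python versions: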

pip install statistics

Importing statistics

import statistics as st

Getting started

import statistics as st
import seaborn as sn

Mean() :
It is a measure of the average of all values in a sample set.

import statistics as st
import seaborn as sn

n=[2,23,4,55,5,54,5,54,6,15]
print("Mean of the number is :",st.mean(n))

out: Mean of the number is : 22.3
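
To make the definition concrete, here is a minimal sketch (not part of the original walkthrough) that computes the mean by hand as the sum divided by the count and compares it with st.mean. The name manual_mean is only illustrative.

```python
import statistics as st

n = [2, 23, 4, 55, 5, 54, 5, 54, 6, 15]

# The mean is the sum of the values divided by how many values there are.
manual_mean = sum(n) / len(n)

print("Manual mean:", manual_mean)
print("st.mean    :", st.mean(n))
```

Both lines should print the same value, since st.mean implements the same arithmetic average.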

Mode ():
It is the value that occurs most frequently in a sample set, that is, the value repeated most often in the data.

n=[34,5,5,5,54,343,5,45,4]
print("Mode of the number is :",st.mode(n))

out: Mode of the number is : 5
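
As a rough illustration of what "most frequent" means, the sketch below counts occurrences with collections.Counter from the standard library. It also shows st.multimode, which assumes Python 3.8 or later and returns every value tied for the highest count. The tie example list is made up for illustration.

```python
import statistics as st
from collections import Counter

n = [34, 5, 5, 5, 54, 343, 5, 45, 4]

# Count how often each value occurs; the mode is the value with the highest count.
counts = Counter(n)
value, frequency = counts.most_common(1)[0]
print("Most common value:", value, "seen", frequency, "times")

# multimode (Python 3.8+) returns all values that share the highest count.
tied = st.multimode([1, 1, 2, 2, 3])
print("Modes when there is a tie:", tied)
```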

Median() :
It is a measure of the central value of a sample set. The data set is ordered from lowest to highest value and the exact middle is taken; with an even number of values, the median is the average of the two middle values.

n=[34,5,5,5,54,343,456,56,6,56,4,5,45,4]
print("Median of the number is :",st.median(n))
 
out: Median of the number is : 20.0
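
The ordering step can be sketched by hand. The helper below, manual_median, is a hypothetical name used only for illustration; it sorts the data and picks the middle value, averaging the two middle values when the count is even.

```python
import statistics as st

def manual_median(values):
    """Return the middle value of the sorted data (illustrative helper)."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: a single middle element exists.
        return ordered[mid]
    # Even count: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2

n = [34, 5, 5, 5, 54, 343, 456, 56, 6, 56, 4, 5, 45, 4]
print("Manual median:", manual_median(n))
print("st.median    :", st.median(n))
```

With 14 values, the two middle entries of the sorted list are 6 and 34, which is why the result is 20.0 rather than a value that appears in the data.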

Variance :
Variance describes how much a random variable differs from its expected value. It is computed as the average of the squared deviations from the mean.

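$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Note that st.variance computes the sample variance, which divides by n - 1; st.pvariance divides by n instead.

n=[34,5,5,5,54,343,5,45,4]
print(st.variance(n))

out: 12010.527777777777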
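
The divisor matters in practice, so here is a small sketch that computes the sum of squared deviations once and divides it both ways. The variable names are illustrative and not taken from the article.

```python
import statistics as st

n = [34, 5, 5, 5, 54, 343, 5, 45, 4]

mean = sum(n) / len(n)
# Sum of squared deviations from the mean.
squared_deviations = sum((x - mean) ** 2 for x in n)

sample_variance = squared_deviations / (len(n) - 1)   # what st.variance reports
population_variance = squared_deviations / len(n)     # what st.pvariance reports

print("Sample variance    :", sample_variance, st.variance(n))
print("Population variance:", population_variance, st.pvariance(n))
```

On this small sample the two results differ noticeably, because a single large value (343) dominates the squared deviations.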

Normal Distribution

from scipy.stats import norm
data=norm.rvs(size=10000,loc=0,scale=1)
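# distplot is deprecated since seaborn 0.11; sn.histplot(data, bins=100, kde=True) is the closest current replacement
a=sn.distplot(data,bins=100,kde=True,color='blue',hist_kws={"linewidth":15,"alpha":1})
a.set(xlabel='Normal Distribution', ylabel='Frequency')

out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Normal Distribution')]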
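
For a numerical check without plotting, the sketch below uses statistics.NormalDist from the standard library instead of scipy. It assumes Python 3.8 or later; the seed value is arbitrary. It draws samples from a normal distribution with mean 0 and standard deviation 1 and confirms that the sample statistics land near those parameters.

```python
from statistics import NormalDist, mean, stdev

# Standard normal: mean 0, standard deviation 1 (same as loc=0, scale=1 above).
standard_normal = NormalDist(mu=0, sigma=1)
samples = standard_normal.samples(10000, seed=42)

print("Sample mean              :", mean(samples))
print("Sample standard deviation:", stdev(samples))

# Probability mass within one standard deviation of the mean (roughly 68%).
within_one_sigma = standard_normal.cdf(1) - standard_normal.cdf(-1)
print("P(-1 < X < 1):", within_one_sigma)
```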

Poisson Distribution

from scipy.stats import poisson
d=poisson.rvs(size=1000,mu=3)
a=sn.distplot(d,bins=100,kde=False,color='blue',hist_kws={"linewidth":15,"alpha":1})
a.set(xlabel='Poisson Distribution',ylabel='Frequency')

out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Poisson Distribution')]
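
To see what the mu parameter controls, the following sketch evaluates the Poisson probability mass function, P(X = k) = e^(-mu) * mu^k / k!, with the standard library only. The helper poisson_pmf is an illustrative name, and summing over k < 30 is an assumption that is safe for mu = 3 because the remaining tail is negligible.

```python
import math

def poisson_pmf(k, mu):
    """Probability of observing exactly k events when the expected count is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)

mu = 3
probabilities = [poisson_pmf(k, mu) for k in range(30)]

# The probabilities sum to (almost exactly) 1 and the expected value equals mu.
total = sum(probabilities)
expected_value = sum(k * p for k, p in enumerate(probabilities))

print("Total probability:", total)
print("Expected value   :", expected_value)
```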

Gamma Distribution

from scipy.stats import gamma
data=gamma.rvs(a=5,size=10000)
a=sn.distplot(data,kde=True,bins=100,color='blue',hist_kws={"linewidth":15,"alpha":1})
a.set(xlabel='Gamma Distribution',ylabel='Frequency')

out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Gamma Distribution')]
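
As a quick sanity check on the shape parameter, this sketch draws gamma samples with random.gammavariate from the standard library rather than scipy. It assumes shape 5 and scale 1, which matches gamma.rvs(a=5) with its default scale; the seed is arbitrary. For these parameters both the mean and the variance should be close to 5.

```python
import random
import statistics as st

rng = random.Random(0)  # fixed seed so the run is repeatable

# gammavariate(alpha, beta): alpha is the shape, beta is the scale.
samples = [rng.gammavariate(5, 1) for _ in range(10000)]

sample_mean = st.mean(samples)
sample_variance = st.variance(samples)

print("Sample mean    :", sample_mean)      # theory: shape * scale = 5
print("Sample variance:", sample_variance)  # theory: shape * scale**2 = 5
```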

Uniform Distribution

from scipy.stats import uniform
d=uniform.rvs(size=1000,loc=5,scale=20)
a=sn.distplot(d,bins=100,kde=True,color='blue',hist_kws={"linewidth":15,"alpha":1})
a.set(xlabel='Uniform Distribution',ylabel='Frequency')

out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Uniform Distribution')]
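
One detail worth spelling out: in scipy.stats.uniform, loc is the lower bound and scale is the width, so loc=5 and scale=20 describe the interval from 5 to 25. The sketch below mirrors that with random.uniform from the standard library; the seed is arbitrary and the variable names are illustrative.

```python
import random
import statistics as st

loc, scale = 5, 20
low, high = loc, loc + scale  # the interval is [loc, loc + scale] = [5, 25]

rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.uniform(low, high) for _ in range(1000)]

print("Smallest sample:", min(samples))
print("Largest sample :", max(samples))
print("Sample mean    :", st.mean(samples))  # theory: (low + high) / 2 = 15
```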