Introduction To Probability And Statistics in ML
Statistics
In everyday use, "statistics" simply means numerical data. As a field, statistics is the area of applied mathematics concerned with collecting, tabulating, analyzing, interpreting, and presenting numerical data. It applies quantitative models to experimental data or real-life observations, and it deals with how data can be used to solve complex problems. Some people consider statistics to be a distinct mathematical science rather than a branch of mathematics.
Install Statistics
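The statistics module is part of the Python standard library (Python 3.4 and later), so it needs no separate installation. The plotting examples further down also use seaborn and SciPy, which can be installed with the command
pip install seaborn scipy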
Importing the statistics module
import statistics as st
Getting started
import statistics as st
import seaborn as sn
Mean() :
The mean is the average of all values in a sample set: the sum of the values divided by how many values there are.
import statistics as st
import seaborn as sn
n=[2,23,4,55,5,54,5,54,6,15]
print("Mean of the number is :",st.mean(n))
out: Mean of the number is : 22.3
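As a minimal sketch of what st.mean computes, the same value can be obtained by summing the values and dividing by their count. The helper name manual_mean is hypothetical and only used for illustration.

```python
import statistics as st

n = [2, 23, 4, 55, 5, 54, 5, 54, 6, 15]

def manual_mean(values):
    """Hypothetical helper: sum of the values divided by how many there are."""
    return sum(values) / len(values)

print(manual_mean(n))  # 22.3
print(st.mean(n))      # 22.3
```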
Mode() :
The mode is the value that occurs most frequently in a sample set.
n=[34,5,5,5,54,343,5,45,4]
print("Mode of the number is :",st.mode(n))
out: Mode of the number is : 5
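The mode can be checked by counting occurrences directly. The sketch below assumes Python 3.8 or later, where st.multimode is available and st.mode returns the first mode encountered when several values are tied. The list named tied is made up for illustration.

```python
import statistics as st
from collections import Counter

n = [34, 5, 5, 5, 54, 343, 5, 45, 4]

# Count how often each value occurs; the mode has the highest count.
counts = Counter(n)
print(counts.most_common(1))  # [(5, 4)]
print(st.mode(n))             # 5

# Illustrative data with two equally frequent values.
tied = [1, 1, 2, 2, 3]
print(st.multimode(tied))     # [1, 2]
```

When a sample can have more than one most frequent value, st.multimode makes the tie visible instead of silently picking one of the candidates.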
Median() :
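The median is the central value of a sample set. The data set is ordered from lowest to highest value and the exact middle is taken. When the number of values is even, as in the example here, the median is the mean of the two middle values.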
n=[34,5,5,5,54,343,456,56,6,56,4,5,45,4]
print("Median of the number is :",st.median(n))
out: Median of the number is : 20.0
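The ordering step can be made explicit. The following sketch sorts the same list, picks the two middle values (the list has 14 entries, an even count), and averages them. The variable names are illustrative only.

```python
import statistics as st

n = [34, 5, 5, 5, 54, 343, 456, 56, 6, 56, 4, 5, 45, 4]

ordered = sorted(n)
mid = len(ordered) // 2

# Even number of values: the median is the mean of the two middle ones.
lower, upper = ordered[mid - 1], ordered[mid]
print(lower, upper)          # 6 34
print((lower + upper) / 2)   # 20.0
print(st.median(n))          # 20.0

# median_low and median_high return the middle values themselves.
print(st.median_low(n), st.median_high(n))  # 6 34
```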
Variance :
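Variance describes how much a random variable differs from its expected value. It is computed from the squared deviations of the values from their mean. st.variance returns the sample variance, which divides the sum of squared deviations by n - 1:
$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$
n=[34,5,5,5,54,343,5,45,4]
print(st.variance(n))
out: 12010.527777777777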
Normal Distribution
from scipy.stats import norm
# loc is the mean and scale is the standard deviation of the distribution
data=norm.rvs(size=10000,loc=0,scale=1)
# histplot replaces distplot, which is deprecated since seaborn 0.11
a=sn.histplot(data,bins=100,kde=True,color='blue',alpha=1)
a.set(xlabel='Normal Distribution', ylabel='Frequency')
out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Normal Distribution')]
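To connect the normal distribution to the mean and variance discussed earlier, the following sketch uses statistics.NormalDist from the standard library (an assumption: Python 3.8 or later) instead of SciPy. It draws samples from a distribution with mean 0 and standard deviation 1, then estimates both parameters back from the samples. The seed value 42 is arbitrary.

```python
from statistics import NormalDist

# Standard normal: mean 0, standard deviation 1 (same as loc=0, scale=1).
standard = NormalDist(mu=0, sigma=1)
samples = standard.samples(10000, seed=42)

# Estimate the parameters back from the generated data.
fitted = NormalDist.from_samples(samples)
print(fitted.mean, fitted.stdev)  # close to 0 and 1

# Probability of falling within one standard deviation of the mean.
within_one_sigma = standard.cdf(1) - standard.cdf(-1)
print(within_one_sigma)  # about 0.6827
```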
Poisson Distribution
from scipy.stats import poisson
# mu is the expected number of events per interval
d=poisson.rvs(size=1000,mu=3)
# histplot replaces distplot, which is deprecated since seaborn 0.11
a=sn.histplot(d,bins=100,kde=False,color='blue',alpha=1)
a.set(xlabel='Poisson Distribution',ylabel='Frequency')
out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Poisson Distribution')]
Gamma Distribution
from scipy.stats import gamma
# a is the shape parameter; scale keeps its default of 1
data=gamma.rvs(a=5,size=10000)
# histplot replaces distplot, which is deprecated since seaborn 0.11
a=sn.histplot(data,kde=True,bins=100,color='blue',alpha=1)
a.set(xlabel='Gamma Distribution',ylabel='Frequency')
out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Gamma Distribution')]
Uniform Distribution
from scipy.stats import uniform
# samples are drawn uniformly from the interval [loc, loc + scale], here [5, 25]
d=uniform.rvs(size=1000,loc=5,scale=20)
# histplot replaces distplot, which is deprecated since seaborn 0.11
a=sn.histplot(d,bins=100,kde=True,color='blue',alpha=1)
a.set(xlabel='Uniform Distribution',ylabel='Frequency')
out: [Text(0, 0.5, 'Frequency'), Text(0.5, 0, 'Uniform Distribution')]