Q-Q plot

Worksha
Published in Analytics Vidhya
4 min read · Apr 24, 2020

Contents

  1. Definition of Q-Q plot
  2. Q-Q plot interpretation
  3. Q-Q plot in Python
  4. Conclusion

Definition of Q-Q plot

A Q-Q (quantile-quantile) plot is a graphical method used to check whether two sets of data come from the same theoretical distribution or not. One of the two sets is generated by us, so we know its distribution. We then judge the distribution of the other set by comparing it against the generated data whose distribution is known.

After we draw a scatter plot using both data sets, if the points form a straight line we can say that both sets belong to the same distribution; otherwise they do not. The points need not form an exactly straight line; a nearly straight line is also accepted.

Q-Q plot interpretation

Initially, we assume a type of distribution for the data and generate one set of samples from that distribution. Next, we sort both data sets in non-decreasing order and select a few reasonable quantiles from each. We then treat the quantiles of one set as x-coordinates and those of the other as y-coordinates and draw a scatter plot. If the points form a nearly straight line, we can say both sets belong to the same distribution. If not, our assumption is wrong, and we need to revise it and repeat the process until the points line up.
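The steps above can be sketched as a small helper. This is only an illustration, not a standard API: the function name qq_points, the choice of 50 quantiles, and the sample sizes are all assumptions made for this example. It uses np.quantile so the two samples may have different lengths.

```python
import numpy as np

def qq_points(data, reference, n_quantiles=100):
    """Return matching quantiles of two samples as (reference_q, data_q)."""
    # Evenly spaced probabilities strictly inside (0, 1), so the extreme
    # minimum and maximum values are not used as plot points.
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    return np.quantile(reference, probs), np.quantile(data, probs)

rng = np.random.default_rng(0)
data = rng.normal(loc=5, scale=2, size=500)         # sample whose distribution we test
reference = rng.normal(loc=0, scale=1, size=2000)   # generated from the assumed distribution

ref_q, data_q = qq_points(data, reference, n_quantiles=50)

# A correlation close to 1 means the points lie near a straight line.
r = np.corrcoef(ref_q, data_q)[0, 1]
print(round(r, 3))
```

Plotting ref_q against data_q with plt.scatter gives the Q-Q plot itself; the correlation is just a rough numeric check of how straight the points are.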

For example, I have the data points data=[9, 4, 9, 1, 7, 5, 9, 5, 1, 1] and I need to know their distribution. I assume that the data is normally distributed, so I generate 10 data points from a normal distribution with mean=5: assume=[4, 5, 7, 6, 3, 4, 4, 5, 4, 6].

Now sort both data and assume. After sorting, data=[1, 1, 1, 4, 5, 5, 7, 9, 9, 9] and assume=[3, 4, 4, 4, 4, 5, 5, 6, 6, 7].

Here data and assume have the same length, so we don’t need to compute quantiles. We compute quantiles when the data is large or when data and assume differ in length. We also don’t need to plot every point; we can take a few quantiles from both data sets and plot those.

So in our example, pick 5 quantiles: the 1st, 3rd, 5th, 7th and 9th values of each sorted list. This gives data=[1, 1, 5, 7, 9] and assume=[3, 4, 4, 5, 6]. Plot the graph using the points (1,3), (1,4), (5,4), (7,5), (9,6).

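These points fall only roughly along a straight line. With so few points the plot is not conclusive: the rough line is consistent with our assumption that the data is normally distributed, but it does not confirm it. I used a very small data set here, so this may not be fully clear yet; the larger example in the next section should make it easier to understand.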
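The small example can be reproduced in a few lines. The lists are the ones from the text; the correlation at the end is an extra check added here as a rough measure of straightness and is not part of the original walkthrough.

```python
import numpy as np

data = [9, 4, 9, 1, 7, 5, 9, 5, 1, 1]
assume = [4, 5, 7, 6, 3, 4, 4, 5, 4, 6]

# Sort both lists in non-decreasing order.
data_sorted = sorted(data)
assume_sorted = sorted(assume)

# The 1st, 3rd, 5th, 7th and 9th values are at indices 0, 2, 4, 6, 8.
picked = range(0, 10, 2)
points = [(data_sorted[i], assume_sorted[i]) for i in picked]
print(points)  # [(1, 3), (1, 4), (5, 4), (7, 5), (9, 6)]

# Correlation between the x and y coordinates of the picked points.
xs, ys = zip(*points)
r = np.corrcoef(xs, ys)[0, 1]
print(round(r, 2))
```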

Q-Q plot in Python

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Our data: 1000 samples from a normal distribution (mean 2, standard deviation 10)
X = []
for i in range(1000):
    X.append(np.random.normal(2, 10))

# Assumed data: 1000 samples from a standard normal distribution
li = []
for i in range(1000):
    li.append(np.random.normal(0, 1))

# Find 100 quantiles (percentiles 0 to 99) of both data sets and plot them
quantile_li = []
quantile_x = []
for i in range(100):
    quantile_li.append(np.percentile(li, i))
    quantile_x.append(np.percentile(X, i))
plt.scatter(quantile_li, quantile_x)
plt.show()

Here, to aid understanding, I generated both the data and the assumed data from normal distributions. In the resulting plot the points lie close to a straight line, which means both data sets belong to the same type of distribution, even though their mean and standard deviation differ.
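The straight line also carries information about how the two samples differ. When the reference sample is standard normal, the slope of the line is roughly the standard deviation of the data and the intercept is roughly its mean. Here is a minimal sketch of that idea; the seed, the variable names and the use of np.polyfit are choices made for this illustration only.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(2, 10, size=1000)    # data: mean 2, standard deviation 10
ref = rng.normal(0, 1, size=1000)   # assumed data: standard normal

# Percentiles 1 to 99 of both samples.
percents = np.arange(1, 100)
q_x = np.percentile(x, percents)
q_ref = np.percentile(ref, percents)

# Fit a straight line q_x = slope * q_ref + intercept through the Q-Q points.
slope, intercept = np.polyfit(q_ref, q_x, 1)
print(round(slope, 2), round(intercept, 2))
```

With these settings the slope should land near 10 and the intercept near 2, up to sampling noise.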

Our source code can be shortened using stats.probplot:

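# Our data: 1000 samples from a normal distribution (mean 2, standard deviation 10)
X = []
for i in range(1000):
    X.append(np.random.normal(2, 10))

# Compare X against the theoretical normal distribution and draw the plot
prob = stats.probplot(X, dist=stats.norm, plot=plt)
plt.show()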

As both are the same distribution, the points lie along the same line.
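stats.probplot also returns the numbers behind the plot, which is handy when you want to inspect the result without drawing anything. The sketch below leaves out the plot argument; the seed and the variable names are assumptions made for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(2, 10, size=1000)

# With the default fit=True, probplot returns the plotted points and a
# least-squares line fitted through them.
(theoretical_q, ordered_x), (slope, intercept, r) = stats.probplot(x, dist=stats.norm)

print(len(theoretical_q), len(ordered_x))
print(round(slope, 2), round(intercept, 2), round(r, 4))
```

Here theoretical_q holds the quantiles of the assumed distribution, ordered_x holds the sorted data, and r is the correlation of the fitted line. An r very close to 1 corresponds to points that sit on a straight line.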

Let’s plot with different distributions:

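# Our data is still normal (mean 2, standard deviation 10)
X = []
for i in range(1000):
    X.append(np.random.normal(2, 10))

# This time compare X against a uniform distribution
prob = stats.probplot(X, dist=stats.uniform, plot=plt)
plt.show()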

In the above graph the points form a curve, as the two distributions are different.
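The difference between the two cases can also be seen in the correlation value that stats.probplot returns. This is a rough sketch with an arbitrary seed, not a formal test of normality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
x = rng.normal(2, 10, size=1000)

# Same data, two different assumed distributions.
_, (_, _, r_norm) = stats.probplot(x, dist=stats.norm)
_, (_, _, r_unif) = stats.probplot(x, dist=stats.uniform)

print(round(r_norm, 4), round(r_unif, 4))
```

The correlation is higher for the matching distribution. Note that it can stay fairly high even for the mismatched one, so looking at the shape of the plot remains the main check.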

Conclusion

A Q-Q plot is a graphical way to check whether data belongs to a given distribution or not. If the data belongs to the same theoretical distribution, the points form a nearly straight line.

Worksha develops tools to reduce manual effort in our daily life