What is Pearson’s Correlation Coefficient ‘r’ in Statistics | Analytics Steps

Ritesh Pathak
Published in Analytics Steps · Dec 17, 2020

Statistics, often shortened to “stats”, is used in many disciplines and in everyday life. Businesses, for example, rely on statistics to better understand their customers.

Statistics is a form of mathematical analysis that uses quantitative models, representations, and summaries of data obtained from experiments and real-life studies. It is also the study of methods for gathering, reviewing, and analyzing data and drawing conclusions from it. Over time, the field has developed a body of theory and a set of standard formulas.

One such concept is correlation. Correlation measures both the strength and the direction of the association between two variables. There are three main types of correlation coefficient, and one of the most widely used is Pearson’s correlation coefficient, which measures the linear relationship between two continuous variables.

In this blog, we will discuss Pearson’s correlation coefficient in detail, starting with definitions of statistics and correlation.

Later in the blog, we will look at the origin of Pearson’s correlation coefficient and how it is calculated. We will also briefly mention the two other main types of correlation measured in statistics.

What is Statistics?

Statistics is more than a branch of mathematics. It is the science of collecting, analyzing, presenting, and interpreting empirical data.

Statistics is a highly interdisciplinary field. Statistical research is applied in almost every scientific field, and research in those fields, in turn, motivates the development of new statistical methods and theory.

Statistics is used in disciplines such as psychology, business, the physical and social sciences, the humanities, government, and manufacturing. In business, it helps people make better-informed decisions. Statistics is commonly divided into two types: descriptive statistics and inferential statistics.

Descriptive statistics summarize a sample using measures such as the mean or standard deviation. Inferential statistics are used when the data is viewed as a subset of a larger population and the goal is to draw conclusions about that population.
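As a small illustration of descriptive statistics, the Python sketch below summarizes a sample with its mean and standard deviation using the standard-library `statistics` module. The scores are hypothetical and were made up only for this example.

```python
import statistics

# Hypothetical exam scores, invented purely for illustration.
scores = [62, 70, 75, 81, 88]

# Descriptive statistics summarize the sample itself.
mean_score = statistics.mean(scores)  # arithmetic mean
sd_score = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)

print(f"mean = {mean_score:.2f}, standard deviation = {sd_score:.2f}")
```

`statistics.stdev` treats the data as a sample; `statistics.pstdev` is the population version, which divides by n instead of n - 1.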

For a more detailed introduction to statistics, you can read our blog “What is Statistics? Types, Variance and Bayesian Statistics”.

“Statistics is the best area to be in because statistics are everywhere! They are all around us in our daily lives. It is important to be able to think critically about all of the data and information that surround us. Statistics and statistical thinking help us to make sense out of all of it.”

- Jeri Mulrow, Vice President, ASA

What is Correlation?

Correlation is a statistic that measures the relationship between two variables. It is used in many fields, including the finance and investment industries. It shows both the strength and the direction of the relationship and is expressed numerically by the correlation coefficient, whose value lies between -1.0 and +1.0.

A negative correlation coefficient means that as one variable increases, the other tends to decrease. A positive coefficient means that the two variables tend to increase or decrease together. A value of exactly -1.0 or +1.0 means that the data points lie on a straight line.


Pearson’s Correlation Coefficient ‘r’

In statistics, Pearson’s correlation coefficient is also referred to as Pearson’s r, the Pearson product-moment correlation coefficient (PPMCC), or bivariate correlation. It measures the linear correlation between two variables, and, like the other correlation coefficients, its value lies between -1.0 and +1.0.
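For reference, Pearson’s r is commonly defined as the covariance of the two variables divided by the product of their standard deviations. Written out for N paired scores with sample means x̄ and ȳ (notation added here for illustration), the definition is:

$$ r = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{N}(y_i - \bar{y})^2}} $$

Expanding the sums turns this into an equivalent form that needs only N and raw totals such as Σx, Σy, Σxy, Σx², and Σy², which is convenient for hand calculation.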

How is the Correlation coefficient calculated?

Using the formula proposed by Karl Pearson, we can quantify the linear relationship between two variables. For example, a child’s height tends to increase with age (although many other factors also affect growth).

We can measure the relationship between these two variables by calculating Pearson’s correlation coefficient r. The coefficient is most meaningful when certain requirements are met:

  • Scale of measurement should be interval or ratio
  • Variables should be approximately normally distributed
  • The association should be linear
  • There should be no outliers in the data

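The formula is:

$$ r = \frac{N\sum xy - \sum x \sum y}{\sqrt{\left[N\sum x^2 - \left(\sum x\right)^2\right]\left[N\sum y^2 - \left(\sum y\right)^2\right]}} $$

where:

N = the number of pairs of scores

Σxy = the sum of the products of paired scores

Σx = the sum of x scores

Σy = the sum of y scores

Σx² = the sum of squared x scores

Σy² = the sum of squared y scores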
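The raw-sum formula for r translates directly into code. The sketch below computes r from N, Σx, Σy, Σxy, Σx², and Σy² using only Python’s standard library. The function name `pearson_r` is hypothetical, and the ages and heights are invented to echo the child-growth example, not taken from real measurements.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the raw-sum formula:

    r = (N*Σxy - Σx*Σy) / sqrt((N*Σx² - (Σx)²) * (N*Σy² - (Σy)²))
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two pairs")
    n = len(xs)                                  # N: number of pairs
    sum_x = sum(xs)                              # Σx
    sum_y = sum(ys)                              # Σy
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # Σxy
    sum_x2 = sum(x * x for x in xs)              # Σx²
    sum_y2 = sum(y * y for y in ys)              # Σy²

    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when either variable is constant")
    return numerator / denominator


# Hypothetical ages (years) and heights (cm), invented for illustration only.
ages = [2, 4, 6, 8, 10]
heights = [86, 102, 115, 128, 138]

print(round(pearson_r(ages, heights), 3))  # prints 0.997
```

The raw-sum form is convenient by hand, but with large or floating-point data it can lose precision through cancellation; library routines such as `statistics.correlation` (Python 3.10+) work with deviations from the mean instead.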

What do the terms strength and direction mean in Statistics?

We have mentioned the terms ‘strength’ and ‘direction’ throughout the blog. Both have specific meanings in statistics, so let us discuss them in detail.

Strength: Strength describes how closely the two variables are related, that is, how consistently one variable changes when the other changes. Values near +1 or -1 indicate a strong relationship, while values near 0 indicate a weak linear relationship.

Direction: Direction indicates whether the linear relationship between the variables is positive or negative. If the line through the data slopes upward, the variables have a positive relationship; if it slopes downward, the relationship is negative.

The other two types of correlations

As mentioned above, there are three main types of correlation:

  1. Pearson Product Moment Correlation
  2. Spearman’s Rank Correlation
  3. Kendall Rank Correlation

To read the detailed blog, follow the link below.

Originally published at https://www.analyticssteps.com on December 17, 2020.


Ritesh Pathak
Analytics Steps

I am an enthusiast who is always eager to learn. I have a mass communication background that helps me explore different areas.