Pearson correlation: Methodology, Limitations & Alternatives — Part 3: Alternatives

Anthony Demeusy
6 min read · Jun 10, 2023


This article is the third and final one in a series of three articles providing a non-mathematical overview of Pearson correlation analysis. These articles use layman's terms, without diving into the mathematical aspects (nothing more complicated than y = ax + b), technical implementation, or coding. They are meant to serve as a guide to properly design, understand, and interpret a Pearson correlation analysis.

After reviewing the key properties of the Pearson correlation coefficient in the first article, and its limitations in the second, this article introduces some of the alternatives and complements that can be used to overcome those limitations.

III. Alternatives to the Pearson correlation coefficient

1. Spearman rank correlation and Kendall Tau

The Spearman rank correlation coefficient is one of the most intuitive alternatives to the Pearson correlation coefficient. It can simply be viewed as the Pearson correlation coefficient calculated between the ranks of the x and y values¹. Imagine that you replace the smallest value of x by 1, the next value by 2, the next by 3, and so on, and that you do the same for the values of y; the Spearman rank correlation is then simply the Pearson correlation between the two new lists of values.
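For readers who want to check this equivalence by themselves, here is a minimal sketch in Python. It goes slightly beyond the no-code scope of this series, assumes NumPy and SciPy are available, and uses made-up values:

```python
# Minimal sketch: the Spearman rank correlation equals the Pearson
# correlation computed on the ranks of the values (illustrative data).
import numpy as np
from scipy import stats

x = np.array([2.0, 15.0, 3.5, 8.0, 11.0])
y = np.array([1.2, 9.8, 2.0, 4.5, 7.1])

# Replace each value by its rank: 1 for the smallest, 2 for the next, ...
x_ranks = stats.rankdata(x)
y_ranks = stats.rankdata(y)

pearson_on_ranks, _ = stats.pearsonr(x_ranks, y_ranks)
spearman, _ = stats.spearmanr(x, y)

print(pearson_on_ranks, spearman)  # the two values are identical
```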

The Kendall Tau is another rank-based correlation measure that is often encountered. Both the Spearman rank correlation and the Kendall Tau are useful to evaluate whether y consistently increases or decreases when x increases, even if the relationship is not linear. Relationships in which the value of one variable consistently moves in the same direction when the other increases are called monotonic relationships. You can compare the Kendall Tau and the Spearman rank correlation coefficient with the Pearson correlation coefficient on the following examples.

Both the Spearman rank correlation and the Kendall Tau are equal to 1 in the first example and to -1 in the second. This indicates a perfectly monotonic relationship, increasing for +1 and decreasing for -1. As a comparison point, the Pearson correlation coefficient is 0.86 in the first example and 0.85 in the second, as it is affected by the non-linear character of the relationship between x and y. Now, what happens if the relationship isn't monotonic?

As expected, the values of Spearman rank correlation and Kendall’s Tau are affected by the fact that the relationship between x and y is non-monotonic and, just like Pearson’s coefficient, they are not helpful in detecting the association between the two variables in the second case.
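The same behaviour can be reproduced numerically. The sketch below is only an illustration: it assumes SciPy and uses a cubic and a parabolic relationship rather than the exact datasets shown above:

```python
# Illustrative sketch: Pearson, Spearman and Kendall Tau on a monotonic
# non-linear relationship and on a non-monotonic one (assumed datasets).
import numpy as np
from scipy import stats

x = np.linspace(-5, 5, 200)

# Monotonic but non-linear: Spearman and Kendall reach 1, Pearson does not.
y_cubic = x ** 3
print(stats.pearsonr(x, y_cubic)[0],
      stats.spearmanr(x, y_cubic)[0],
      stats.kendalltau(x, y_cubic)[0])

# Non-monotonic (symmetric parabola): all three coefficients drop to ~0.
y_parabola = x ** 2
print(stats.pearsonr(x, y_parabola)[0],
      stats.spearmanr(x, y_parabola)[0],
      stats.kendalltau(x, y_parabola)[0])
```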

2. Distance correlation & Maximal Information coefficient

In order to identify more complex, non-monotonic associations between variables, other metrics can be used, in particular the distance correlation² and the maximal information coefficient³. Unlike the metrics already discussed, the distance correlation and the maximal information coefficient take values ranging from 0 to 1, with higher values showing a stronger association between the variables. Without further ado, let’s see how they perform on a few examples:

For comparison, here is the outcome when they are applied to random noise:

Even though the value of the distance correlation tends to decrease on the more complex examples, both indicators are able to reveal the existence of a relationship between the two variables, even in the case of non-monotonic functions with a vertical axis of symmetry, for which the Pearson and rank-based correlation coefficients virtually drop to 0. Both the distance correlation and the maximal information coefficient come with their own challenges (note, for instance, that their value is not exactly 0 on random noise), but they are definitely great tools when it comes to identifying complex associations between variables.
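As an illustration, here is a short sketch showing how these two metrics can be computed in practice. It assumes the third-party dcor² and minepy packages are installed and uses synthetic data rather than the exact datasets plotted above:

```python
# Illustrative sketch: distance correlation (dcor package) and maximal
# information coefficient (minepy package) on synthetic data.
import numpy as np
import dcor
from minepy import MINE

def mic(x, y):
    m = MINE()            # default MINE estimator of the MIC
    m.compute_score(x, y)
    return m.mic()

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 500)

# Non-monotonic relationship: Pearson is ~0, but both metrics react clearly.
y_parabola = x ** 2
print(dcor.distance_correlation(x, y_parabola), mic(x, y_parabola))

# Baseline on pure random noise: the values are low but not exactly 0.
y_noise = rng.normal(size=x.size)
print(dcor.distance_correlation(x, y_noise), mic(x, y_noise))
```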

3. Predictive Power Score

The Predictive Power Score⁴ (PPS) is a potential alternative to the Pearson correlation coefficient that was proposed in a blog post by 8080 Labs⁴. There is still little documentation about it in the literature, but it is an interesting concept when the objective is to predict the value of a dependent variable based on the value of another variable. Unlike the indicators discussed so far, it is a non-symmetric indicator: the predictive power of x on y is not necessarily equal to the predictive power of y on x. That makes a lot of sense in many real-life use cases where a variable x can be used to predict another variable y, while the value of y can't be used to predict the value of x with the same accuracy. The value of the Predictive Power Score ranges from 0 to 1, with values close to 1 corresponding to a stronger predictive power.

Here are two examples using the same signal, first without noise and then with some noise:

Predictive Power Score examples

On the first graph, the Predictive Power Score of x on y is close to 1, indicating that the value of x can be used to predict the value of y with good accuracy. And indeed, knowing the value of x, it is possible to estimate y just by looking at the graph. For instance, if x = -10, y will be in the vicinity of 0.7. Conversely, knowing that y = 0.1 gives no indication as to whether x is closer to 2.5 or to -12.5, even though the uncertainty about the value of x is reduced, since a value of x = -5 seems unlikely in this case. The inability to predict the value of x knowing y is reflected by the Predictive Power Score of y on x, whose value is 0. The second chart represents the same dataset with some noise added to it. The noise decreases the accuracy of the predictions of y based on x, which is reflected by the Predictive Power Score of x on y dropping from 0.97 to 0.88.
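This asymmetry can be reproduced with the open-source ppscore package from the authors of the original post⁴, assuming it is installed. The sketch below only approximates the plotted signal with a noise-free cosine-like curve over a similar x range, not the exact dataset:

```python
# Illustrative sketch: asymmetric Predictive Power Score on a synthetic,
# noise-free periodic signal (assumed to resemble the plotted dataset).
import numpy as np
import pandas as pd
import ppscore as pps

x = np.linspace(-12.5, 2.5, 500)
df = pd.DataFrame({"x": x, "y": 0.5 * np.cos(x / 2.5) + 0.5})

print(pps.score(df, "x", "y")["ppscore"])  # close to 1: x predicts y well
print(pps.score(df, "y", "x")["ppscore"])  # close to 0: y cannot predict x
```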

4. Other common alternatives

So far, this article has described some of the most common alternatives and complements to the Pearson correlation coefficient, but many others exist. The table below recaps a few of them:

Other common alternatives to the Pearson correlation coefficient

Conclusion

The Pearson coefficient is ubiquitous in correlation and variable-association studies, and it is commonly used for feature selection in machine learning. It is definitely a tool that most people working with data should be aware of, but it is essential to be familiar with its limitations in order to avoid and recognize the associated pitfalls and misuses. It is also important not to lose sight of the fact that alternatives and complements exist, and that the suitability of an indicator, be it the Pearson coefficient or any other tool, should always be carefully considered based on the pursued objective and the conditions under which it has to be used.

If you found this article helpful, please show your support by clapping for this article and considering subscribing for more articles on machine learning and data analysis. Your engagement and feedback are highly valued as they play a crucial role in the continued delivery of high-quality content.

You can also support my work by buying me a coffee. Your support helps me continue to create and share informative content. It's a simple and appreciated gesture that keeps the momentum going: Buy Me a Coffee.

References
[1] Everitt, B. S., and Anders Skrondal. 2010. The Cambridge Dictionary of Statistics: Fourth Edition. Cambridge, England: Cambridge University Press.

[2] Theory — dcor 0.6 documentation. (n.d.). Readthedocs.Io. Retrieved June 10, 2023, from https://dcor.readthedocs.io/en/latest/theory.html

[3] Reshef, D. N., Reshef, Y. A., Finucane, H. K., Grossman, S. R., McVean, G., Turnbaugh, P. J., Lander, E. S., Mitzenmacher, M., & Sabeti, P. C. (2011). Detecting novel associations in large data sets. Science (New York, N.Y.), 334(6062), 1518–1524. https://doi.org/10.1126/science.1205438

[4] RIP correlation. Introducing the Predictive Power Score. (n.d.). 8080labs.com. Retrieved December 28, 2022, from https://8080labs.com/blog/posts/rip-correlation-introducing-the-predictive-power-score-pps/


Anthony Demeusy

Project Manager, Biomedical Engineer, Data science & AI enthusiast