Matthews Correlation Coefficient is The Best Classification Metric You’ve Never Heard Of

Boaz Shmueli
Towards Data Science
7 min read · Nov 22, 2019

Congratulations! You’ve built a binary classifier: a fancy-schmancy neural network running on 128 GPUs with their own dedicated power station, or perhaps a robust logistic regression model that runs on your good old ThinkPad. You’ve designed the model and fed it the data; now the time has finally come to measure the classifier’s performance.

Don’t get me wrong: ROC curves are the best choice for comparing models. However, scalar metrics remain popular in the machine-learning community, the four most common being accuracy, recall, precision, and F1-score. They are ubiquitous in textbooks, web articles, and online courses, and they are the metrics most data scientists are familiar with. But a couple of weeks ago, I stumbled upon another scalar metric for binary classification: the Matthews Correlation Coefficient (MCC). Following my “discovery”, I asked around and was surprised to find that many people in the field are not familiar with this classification metric. As a born-again believer, I’m here to spread the gospel!

Let’s start with a quick overview of the “Famous Four” metrics, including a discussion of why they are sometimes not very useful, or even downright misleading. Following that, I’ll introduce MCC.
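
For reference, here is a minimal sketch of how the five metrics named above can be computed from the four confusion-matrix counts, using their standard textbook definitions. The function name `binary_metrics`, the example counts, and the choice to return 0 when a denominator is zero are assumptions made for illustration; they do not come from the article.

```python
import math


def binary_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, precision, F1, and MCC from confusion-matrix counts.

    tp/fp/fn/tn are the numbers of true positives, false positives,
    false negatives, and true negatives. Any ratio with a zero
    denominator is reported as 0.0 (a common convention, assumed here).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    # MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical, heavily imbalanced example: 91 positives, 9 negatives.
example = binary_metrics(tp=90, fp=4, fn=1, tn=5)
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

With made-up counts like these, the first four metrics all come out high while MCC is noticeably lower, because MCC also accounts for how poorly the minority (negative) class is handled.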

Written by Boaz Shmueli

I enjoy explaining stuff. Postdoc Scholar at Academia Sinica
