Phi Coefficient A.K.A. Matthews Correlation Coefficient (Binary Classification)

Christiaan Defaux
3 min read · Jan 27, 2020


In machine learning and data science, we often run into problems where we’re trying to classify binary (two-class) data. In such cases, you might fit the data to a model which successfully (or so it appears) classifies your data. However, one common issue is a class imbalance in the data, and this can lead to a misuse or misunderstanding of the evaluation metrics.

Enter the phi coefficient (mean square contingency coefficient, φ), otherwise known as the Matthews correlation coefficient (MCC). As a brief aside, the phi coefficient was first introduced by Karl Pearson; the equivalent MCC was introduced by Brian W. Matthews in 1975 and is often used in bioinformatics. Both are measures of association between two binary variables where:

  • at least one of the variables is nominal (named categories, e.g. black/white)
  • both variables are dichotomous (two opposing categories, e.g. dead/alive, or a continuous variable split into two levels, e.g. pass/fail at above/below 60%)

The measures are often used with contingency tables to assess the correlation between the variables. Their benefit is that they are balanced measures, meaning they can provide insight even when there is a class imbalance. Accuracy (the proportion of correct predictions), for example, is not informative when there is a large class imbalance, as the sketch below shows.
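To make that concrete, here is a minimal sketch (the 95/5 class split and the always-majority classifier are made-up assumptions for illustration): a model that never predicts the minority class still scores 95% accuracy, while MCC correctly reports zero skill.

from sklearn.metrics import accuracy_score, matthews_corrcoef

# Made-up imbalanced data: 95 negatives, 5 positives.
y_true = [0] * 95 + [1] * 5
# A degenerate "classifier" that always predicts the majority class.
y_pred = [0] * 100

print(accuracy_score(y_true, y_pred))     # 0.95 -- looks impressive
print(matthews_corrcoef(y_true, y_pred))  # 0.0  -- no predictive skill at all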

Calculating the Statistic

Whether using a confusion matrix or a contingency table, the calculation is essentially identical. When reading off a confusion matrix, the statistic is computed as:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives. This is the mathematical formula for MCC.
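As a quick sanity check of the formula (the function name mcc_from_counts and the example counts are made up for illustration), it can be computed directly from the four counts:

import math

def mcc_from_counts(tp, tn, fp, fn):
    # Numerator and denominator follow the formula above.
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Conventionally, MCC is taken to be 0 when the denominator is 0.
    return numerator / denominator if denominator else 0.0

print(mcc_from_counts(tp=90, tn=85, fp=15, fn=10))  # ~0.75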

For added efficiency, scikit-learn contains an MCC function in its metrics module, which can easily be accessed in Python by using:

from sklearn.metrics import matthews_corrcoef

matthews_corrcoef(y_true, y_pred, sample_weight=None)

where the parameters y_true and y_pred are the ground-truth (correct) target values and the estimated targets, respectively. The function supports both binary and multiclass labels; however, only in the binary case does the statistic relate to true and false positives and negatives.
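For instance, with small made-up label vectors (for illustration only):

from sklearn.metrics import matthews_corrcoef

# Made-up ground truth and predictions: TP = 2, TN = 4, FP = 1, FN = 1.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]

print(matthews_corrcoef(y_true, y_pred))  # 0.4666... (i.e. 7 / 15)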

Interpreting the Statistic

The range of these metrics is from -1 to 1, where:

  • 0 indicates no relationship
  • 1 indicates a perfect positive relationship
  • -1 indicates a perfect negative relationship

An MCC of 1 implies that FP and FN both equal 0, whereas an MCC of -1 corresponds to a classifier that always misclassifies, so TP and TN both equal 0, as the sketch below demonstrates.
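A minimal demonstration of the two extremes (the label vector is made up for illustration):

from sklearn.metrics import matthews_corrcoef

y_true = [1, 0, 1, 0, 1, 0]

# Perfect classifier: FP = FN = 0, so MCC = 1.
print(matthews_corrcoef(y_true, y_true))                   # 1.0

# Always-wrong classifier: TP = TN = 0, so MCC = -1.
print(matthews_corrcoef(y_true, [1 - y for y in y_true]))  # -1.0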

For the range of values between -1 and 1, there are some crude estimates or rules of thumb, which are as follows (a small helper that encodes them is sketched after the list):

  • .70 and higher — very strong positive relationship
  • .40 to .69 — strong positive relationship
  • .30 to .39 — moderate positive relationship
  • .20 to .29 — weak positive relationship
  • .01 to .19 — no or negligible relationship
  • 0 — no relationship
  • -.01 to -.19 — no or negligible relationship
  • -.20 to -.29 — weak negative relationship
  • -.30 to -.39 — moderate negative relationship
  • -.40 to -.69 — strong negative relationship
  • -.70 and lower — very strong negative relationship

*Note: these values are estimates and should be interpreted with respect to the intrinsic qualities of the data being analyzed.
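For convenience, the rules of thumb above can be encoded in a small helper (describe_mcc is a hypothetical name, and the thresholds simply restate the list):

def describe_mcc(mcc):
    # Thresholds restate the rule-of-thumb list above (hypothetical helper).
    strength = abs(mcc)
    if strength >= 0.70:
        label = "very strong"
    elif strength >= 0.40:
        label = "strong"
    elif strength >= 0.30:
        label = "moderate"
    elif strength >= 0.20:
        label = "weak"
    elif strength >= 0.01:
        return "no or negligible relationship"
    else:
        return "no relationship"
    direction = "positive" if mcc > 0 else "negative"
    return f"{label} {direction} relationship"

print(describe_mcc(0.47))   # strong positive relationship
print(describe_mcc(-0.25))  # weak negative relationship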
