Concordance Index as an Evaluation Metric

Published in

Analytics Vidhya

7 min read · Oct 8, 2019

All code related to this post can be found here.

The concordance index, or c-index, is a metric to evaluate the predictions made by an algorithm. It is defined as the number of concordant pairs divided by the total number of possible evaluation pairs. Let’s see through some examples what this definition means in practice.

We first install the lifelines library and do the usual imports:

Imagine we are a telecom operator with 5 customers (Alice, Bob, Carol, Dave, and Eve) and we’re trying to predict who is going to unsubscribe from our services (also known as churn) first. Suppose that Alice churned after 1 year, Bob after 2 years, Carol after 3 years, Dave after 4 years, and Eve after 5 years. Our algorithm made the following prediction: Alice will churn after 1 year, Bob after 2 years, Carol after 3 years, Dave after 4 years, and Eve after 5 years.

That’s a perfect prediction. In that case, the concordance index is equal to its maximum value 1.

However, the concordance index is interested in the order of the predictions, not in the predictions themselves. Assume now that the algorithm predicted: Alice will churn after 2 years, Bob after 3 years, Carol after 5 years, Dave after 8 years, and Eve after 14 years. In that case, the concordance index is still equal to its maximum value 1, since the order of the predictions is right (despite the fact that none of the predictions is right in isolation!).

This is very different from other evaluation measures such as mean squared error or mean absolute error. In fact, if the predictions are any strictly increasing function of the actual churn times, the concordance index will still be equal to 1. Here is an example with the logarithm function:

Let’s consider the case where the algorithm got the order of the predictions completely wrong. Let’s say the algorithm predicted that Eve will churn first after 1 year, Dave after 2 years, Carol after 3 years, Bob after 4 years, and Alice after 5 years, so the order is completely reversed (although notice that Carol did leave after 3 years!). In that case, the concordance index will be equal to its minimum value 0.

In fact, the concordance index will be zero whenever the predictions are a strictly decreasing function of the actual churn times. Here is an example for the multiplicative inverse function:

Now comes the interesting part. What does the concordance index do when the order of the predictions is neither completely right nor completely wrong? What score would you give to the following prediction?

Prediction 1: Alice will leave after 3 years, Bob after 2 years, Carol after 1 year, Dave after 5 years, and Eve after 4 years.

Well, the concordance index lists all the possible pairs. In our case there are 10 possible pairs: (Alice, Bob), (Alice, Carol), (Alice, Dave), (Alice, Eve), (Bob, Carol), (Bob, Dave), (Bob, Eve), (Carol, Dave), (Carol, Eve), (Dave, Eve). It then counts for how many of these pairs the predicted order matches the actual order. For Prediction 1, the following pairs are ordered incorrectly:

Alice is predicted to churn after Bob, while Bob churned after Alice.
Alice is predicted to churn after Carol, while Carol churned after Alice.
Bob is predicted to churn after Carol, while Carol churned after Bob.
Dave is predicted to churn after Eve, while Eve churned after Dave.

All the other pairs are ordered correctly. There are 6 concordant pairs out of 10 possible pairs, so the concordance index is equal to 6/10 or 0.6.
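The pair counting above can be sketched in plain Python. This is an illustration of the definition for the case with no ties and no censoring, not the algorithm any particular library uses.

```python
from itertools import combinations

names = ["Alice", "Bob", "Carol", "Dave", "Eve"]
actual = {"Alice": 1, "Bob": 2, "Carol": 3, "Dave": 4, "Eve": 5}
predicted = {"Alice": 3, "Bob": 2, "Carol": 1, "Dave": 5, "Eve": 4}  # Prediction 1

concordant, discordant = [], []
for a, b in combinations(names, 2):
    # A pair is concordant when the actual and predicted times are ordered the same way.
    same_order = (actual[a] - actual[b]) * (predicted[a] - predicted[b]) > 0
    (concordant if same_order else discordant).append((a, b))

c_index = len(concordant) / (len(concordant) + len(discordant))
print(discordant)
print(c_index)
```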

What happens in case of ties? Each pair whose predictions are tied counts as half a concordant pair:

Another interesting characteristic of the concordance index is that it supports right censoring, i.e., the case when, by the end of the study, the event of interest (for example, in medicine ‘death of a patient’ or in our example ‘churn of a customer’) has only occurred for a subset of the observations. Notice that knowing that an event of interest hasn’t occurred yet still gives us information: the event, if it happens, will happen at a time greater than or equal to the censoring time. Let’s see this through an example.

Up to now, we never had to specify whether our observations were right censored, because by default the concordance index function assumes that all events are observed. Let’s change that.

Suppose that Alice churned after 1 year, Bob after 2 years, Carol hasn’t churned after 3 years, Dave churned after 4 years, and Eve after 5 years.

Our prediction is the following:

Prediction 2: Alice will churn after 1 year, Bob after 2 years, Carol after 3 years, Dave after 5 years, and Eve after 4 years.

Notice that the fact that Carol hasn’t churned after 3 years tells us that her churn time, if she churns at all, will be greater than or equal to 3 years. Therefore, we cannot give a score to the pairs (Carol, Dave) and (Carol, Eve), since we simply don’t know who churned first in each of them. That leaves only 8 comparable pairs. Dave is predicted to churn after Eve, while Eve churned after Dave. All the other pairs are ordered correctly. There are 7 concordant pairs out of 8 comparable pairs, so the concordance index is equal to 7/8 or 0.875.

In our code, we need to specify whether the event was observed (True) or hasn’t been observed (False):

Let’s make random predictions and see which concordance index we obtain.

Concordance indexes of random predictions.

Random predictions give a mean concordance index of approximately 0.5. So basically a model that hasn’t learned anything gives a mean concordance index of 0.5.

Difference of concordance indexes between libraries: lifelines, scikit-survival, and PySurvival

To do this comparison, we install PySurvival and scikit-survival:

Notice that lifelines gives the concordance between the actual event times and the predicted scores (a higher score means a later event), while scikit-survival gives the concordance between the actual event times and the predicted risks (a higher risk means an earlier event). So, over the same lists, the two results are complements of each other:

Concordance index (lifelines) = 1 - Concordance index (scikit-survival)

Scikit-survival also gives the number of concordant pairs, number of discordant pairs, number of pairs having tied estimated risks, and the number of comparable pairs sharing the same time.

Let’s see a more interesting example:

PySurvival’s concordance index needs a model as input, so it’s harder to see what it does. Let’s try it with a random survival forest:

We notice that the three concordance indexes give us the same result (notice that we added a minus sign in front of the predicted risks passed to lifelines). Let’s look at the additional results provided by PySurvival’s concordance index:

We notice that PySurvival counts each pair twice, as (Alice, Bob) and as (Bob, Alice), while scikit-survival counts it only once. That explains the difference in the total number of pairs (558+43+179=780 and 780*2=1560). Similarly, PySurvival’s number of concordant pairs counts each concordant pair twice and each pair with tied risks once, i.e., as half a concordant pair (2*558+179=1295).
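The arithmetic can be checked with a few lines of Python. Reading 558 as the concordant pairs, 43 as the discordant pairs, and 179 as the pairs with tied risks is an assumption inferred from the sums above and from the order in which scikit-survival reports its counts.

```python
# Counts as reported once per pair (assumed labels, see the note above).
concordant, discordant, tied_risk = 558, 43, 179

pairs_once = concordant + discordant + tied_risk   # each pair counted once
pairs_twice = 2 * pairs_once                        # each pair counted in both orders

# Counting every pair twice: a concordant pair contributes 2, a tied pair contributes 1.
weighted_twice = 2 * concordant + tied_risk

c_from_doubled = weighted_twice / pairs_twice
c_from_single = (concordant + 0.5 * tied_risk) / pairs_once

print(pairs_once, pairs_twice, weighted_twice)
print(round(c_from_doubled, 4), round(c_from_single, 4))
```

Both ways of counting lead to the same ratio, which is why the libraries can agree on the concordance index while reporting different pair counts.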

Conclusions

The concordance index is a useful metric to evaluate the predictions made by an algorithm. It can handle right censoring, i.e., the case when, by the end of the study, the event of interest (for example, in medicine ‘death of a patient’ or in our example ‘churn of a customer’) has only occurred for a subset of the observations. We have seen three libraries (lifelines, scikit-survival, and PySurvival) that implement the concordance index, with some subtle differences between them.

Sources

Harrell F.E. Jr., Lee K.L., Mark D.B., “Multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors”, Statistics in Medicine, 15(4), 361–87, 1996.
Lifelines’ concordance index
Scikit-survival’s concordance index
PySurvival’s concordance index

Written by Alonso Silva Allende