Understanding ROC AUC (part 2/2)

matt johnson · Published in Building Ibotta · Jan 31, 2019 · 6 min read

In my last post, I motivated the reason for computing ROC AUC.

It turns out there are more informative interpretations of ROC AUC than just “area under the curve”. Namely, ROC AUC is…

  1. The probability a positive observation is ranked higher than a negative observation
  2. The Mann-Whitney U Statistic (aside from a constant)

Let’s start with the first. In mathematical notation, this means:
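ROC AUC = P( s(x+) > s(x-) )

where s(·) is the model’s score, x+ is a randomly chosen positive observation, and x- is a randomly chosen negative observation.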

I won’t derive this formally here, as a derivation is available here, and a simpler geometric derivation can be found here. This probabilistic interpretation is easier to communicate:

Given a randomly chosen pair of one positive and one negative observation, ROC AUC is the probability your model ranks the positive observation higher than the negative one.

Second Interpretation

The second interpretation has its roots in 1945, when Frank Wilcoxon, while at the American Cyanamid Company, published a short paper in the journal Biometrics Bulletin on comparing the means of two groups without assuming the data were generated from a particular distribution (I found it interesting that this paper uses only five references!).

The idea is to look at how observations rank when both groups (i.e., our 1s and 0s) are pooled together. If both groups were drawn from the same distribution, you’d expect their ranks to be mixed together pretty uniformly. To compute the probability of observing these data, you compare how the data actually ranked against the total number of ways they could have been ranked. The test statistic comes down to summing the ranks of the positive observations. We’ll call this test statistic W, that is, the Wilcoxon rank sum statistic:

Computes the sum of ranks for all positive observations. N+ is the number of positive observations.
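In symbols, writing rank(x) for the rank of observation x in the pooled sample:

W = Σ rank(x+), summed over all N+ positive observations x+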

Two years later, in the journal The Annals of Mathematical Statistics, H. B. Mann and D. R. Whitney at Ohio State University took Frank’s work and generalized it (by the way, their paper, which can be found here, uses only three references). In this paper, the U statistic counts the number of times a positive example is greater than a negative example. The authors give:

The Mann-Whitney U Statistic. Note that it involves the Wilcoxon Rank Sum statistic. N+ is the number of positive observations, and N- is the number of negative observations.
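In our notation, with rank 1 assigned to the observation the model scores highest, this is:

U = N+N- + N+(N+ + 1)/2 - W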

Can we transform U into a probability? Well, what is the total number of times a positive could rank higher than a negative? This corresponds to an arrangement like:

[1, 1, 1, 1, 0, 0, 0, ….]

For each ‘0’ (there are N- of them) we count the number of 1s to its left, and in this arrangement we always count N+ of them. Thus, the maximum number of times a positive is greater than a negative is N+N-. So we divide by this product:
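U / (N+N-)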

But recall this is the first interpretation! Therefore:
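ROC AUC = U / (N+N-) = ( N+N- + N+(N+ + 1)/2 - W ) / (N+N-)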

We see ROC AUC is a direct measure of the Mann-Whitney U statistic — aside from a constant!

We can also recast this a bit to get an interpretation that is useful in judging the utility of ROC AUC in recommender systems (which are concerned with ranking items). Note that:
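N+(N+ + 1)/2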

is equal to the sum of ranks we would get if we had ranked all positive examples first (if this is not clear to you, see this). Let’s denote this quantity W*:

Corresponds to the rank sum of the best possible ranking

This quantity is what we would like our classifier to achieve, while W is the sum of positive ranks our classifier actually produced. So we get:
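ROC AUC = ( N+N- + W* - W ) / (N+N-) = 1 - (W - W*) / (N+N-)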

Thus, aside from some constants, we see ROC AUC measures the difference between the best possible ranking and the actual ranking! Also note that ROC AUC is invariant if the sum of positive ranks remains constant. So if you had:

[1, 0, 0, 0, 0, 0, 0, 1]

This is equivalent to a ranking of:

[0, 1, 0, 0, 0, 0, 1, 0]

And the ROC AUC would be the same, since the positive ranks sum to 9 in both cases (1 + 8 and 2 + 7). So while ROC AUC is fine for evaluating binary classification algorithms, it’s not really optimal for evaluating recommendation systems. In fact, it’s even worse than that: Charles Martin pointed out to me that offline metrics may not work at all because they frequently do not correlate with expected user behavior (see Thorsten Joachims’ talk).

If a positive example at the top of the list moves down by K positions and a positive example at the bottom of the list moves up by K positions (a swap a customer would most likely never even see), ROC AUC remains unchanged. But in recommender systems, we care about positive observations remaining at the top of the list (e.g., the top 25). So we want a metric that penalizes positive examples for leaving the top 25 positions (such as NDCG@25).

Computational Benefit

The final detail to consider here is that this formulation actually permits a much faster computation of ROC AUC. For example, it seems scikit-learn (one of the most popular machine learning packages available) leverages the trapezoidal rule to calculate ROC AUC. But we demonstrated earlier that one can calculate it via:
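ROC AUC = ( N+N- + N+(N+ + 1)/2 - W ) / (N+N-)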

But instead of summing over the positive ranks, we can convert this into a dot product. Define I as the index vector over all observations (i.e., [1, 2, 3, …, N]). Now, let R be the vector of labels ordered from the model’s top-ranked observation down (e.g., [1, 0, 0, 1, 1, …]), so that position i holds the label of the observation ranked i. Then:
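W = I · R, so ROC AUC = ( N+N- + N+(N+ + 1)/2 - I · R ) / (N+N-)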

This is a vectorized implementation and should be pretty fast in languages such as Python. Of course, we cannot escape the O(n log n) cost of sorting, but nevertheless, this approach is still faster. Let’s compare. In the following simple experiment, we generate vectors of 1s and 0s of increasing length, with random proportions of 1s and 0s, and pretend these vectors are our final (ranked) predictions. Also, to verify the ROC AUC is the same in both implementations, we assert equality up to 13 decimal places (code for these plots can be found here).
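Here’s a minimal sketch of this dot-product computation in Python (a sketch only, not the benchmark code linked above; the helper name rank_sum_auc is just for illustration). It takes the ranked label vector R directly, as in the experiment, and checks the result against scikit-learn’s roc_auc_score:

import numpy as np
from sklearn.metrics import roc_auc_score

def rank_sum_auc(r):
    """ROC AUC from a 0/1 label vector r ordered best-first (position 1 = top prediction)."""
    r = np.asarray(r)
    n_pos = r.sum()
    n_neg = r.size - n_pos
    idx = np.arange(1, r.size + 1)                    # the index vector I = [1, 2, ..., N]
    w = idx.dot(r)                                    # Wilcoxon rank sum of the positives, W = I . R
    u = n_pos * n_neg + n_pos * (n_pos + 1) / 2 - w   # Mann-Whitney U
    return u / (n_pos * n_neg)

# Sanity check: a random 0/1 vector treated as the ranked predictions, paired with
# strictly decreasing scores so scikit-learn sees exactly the same ordering (no ties).
rng = np.random.default_rng(0)
r = rng.integers(0, 2, size=10_000)
scores = np.arange(r.size, 0, -1)
assert abs(rank_sum_auc(r) - roc_auc_score(r, scores)) < 1e-13

If the predictions were real-valued scores with possible ties, you would average the ranks of tied scores (e.g., with scipy.stats.rankdata) to stay consistent with roc_auc_score.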

Timings measured on a 2017 MacBook Pro, 3.1 GHz Intel Core i5, 16 GB memory

As the number of observations increases, the computational benefit of the dot-product implementation becomes more pronounced. In fact, let’s plot the difference in times as a function of the number of observations:

This difference may seem negligible, but in large data applications, these savings might be significant.

Thanks for reading!

Interested in working at Ibotta? Check out https://ibotta.com/careers

Also, my technical blog can be found here: http://mattshomepage.com/
