Discounted Cumulative Gain: the ranking metrics you should know about

Maeliza S.
3 min read · May 24, 2020


Kaggle Leaderboard using DCG

The Discounted Cumulative Gain (DCG) is a measure of ranking quality used in information retrieval. Unlike pure classification use cases where you are simply right or wrong, in a ranking problem you are more or less right or wrong. Relevance here denotes how well the retrieved set of documents answers the initial query.

Classical use cases of ranking problems can be:

  1. Web search: you want to surface the most relevant documents for a query
  2. Query-code match: you have a natural language query and you want to match it with the closest code snippet

We assume a document can be relevant, marginally relevant, or not relevant.

The DCG measures how much gain you accumulate from the predicted ranking of the retrieved documents: the more relevant a document and the higher it is ranked, the more it contributes.

However, the set of relevant documents can vary with query complexity. For example: “best restaurants in London” vs “best 1-star seafood restaurant in London”. To account for these variations, the normalized DCG (nDCG) has been introduced.

The math behind the DCG metric

The version of DCG most frequently used in practice (by web search companies, Kaggle, GitHub) is:

DCG_p = Σ_{i=1}^{p} (2^rel_i − 1) / log2(i + 1)

where rel_i is the graded relevance of the document at position i and p is the rank position up to which the ranking is evaluated.

DCG emphasizes highly relevant documents appearing early in the result list using the logarithmic scale for reduction.
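
To make the discount concrete, here is a minimal sketch of the exponential-gain formula above (the helper dcg_exponential is just an illustrative name, not a library function):

import numpy as np

def dcg_exponential(relevances):
    # Gain of each document: 2^rel - 1, discounted by log2(rank + 1)
    relevances = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, len(relevances) + 1)
    return np.sum((2 ** relevances - 1) / np.log2(ranks + 1))

# The same documents, well ranked vs. badly ranked
print(dcg_exponential([3, 2, 0]))  # relevant documents early -> higher DCG
print(dcg_exponential([0, 2, 3]))  # relevant documents late -> lower DCG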

nDCG_p = DCG_p / IDCG_p

where IDCG_p is the DCG of the ideal ranking (the documents sorted by true relevance).

The nDCG compares your predicted ranking to the ideal ranking (IDCG): the predicted ranking’s DCG is divided by the ideal DCG. This measure gives the relative performance of a model given the complexity of a query.
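
Under the same assumptions as the sketch above, nDCG can be computed by dividing the predicted ranking’s DCG by the ideal one:

# True relevances in the order produced by the model
predicted_order = [2, 3, 0]
# Ideal order: the same relevances sorted from most to least relevant
ideal_order = sorted(predicted_order, reverse=True)

ndcg = dcg_exponential(predicted_order) / dcg_exponential(ideal_order)
print(ndcg)  # < 1 because the most relevant document is not ranked first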

DCG with scikit-learn

Scikit-learn sums the true scores ranked in the order induced by the predicted scores, after applying a logarithmic discount.

from sklearn.metrics import dcg_score
from sklearn.metrics import ndcg_score
import numpy as np

Let’s take the simple example of a five-document corpus:

# Corpus: true relevance scores, on a scale from 0 to 10
true_relevance = {'d1': 10, 'd2': 9, 'd3': 7, 'd4': 6, 'd5': 4}
# Predicted relevance scores
predicted_relevance = {'d1': 8, 'd2': 9, 'd3': 6, 'd4': 6, 'd5': 5}
# Relevance lists processed as 2D arrays (one row per query)
true_rel = np.asarray([list(true_relevance.values())])
predicted_rel = np.asarray([list(predicted_relevance.values())])

The output metrics are:

dcg_score(true_rel, predicted_rel)
>> 22.906106392129793
ndcg_score(true_rel, predicted_rel)
>> 0.9826797611735933

Let’s add a sixth document, d7, to our corpus and rebuild the arrays:

# True relevance with the new document d7
true_relevance = {'d1': 10, 'd2': 9, 'd7': 8, 'd3': 7, 'd4': 6, 'd5': 4}
# Predicted relevance with the new document d7
predicted_relevance = {'d1': 8, 'd2': 9, 'd7': 7, 'd3': 6, 'd4': 6, 'd5': 5}
true_rel = np.asarray([list(true_relevance.values())])
predicted_rel = np.asarray([list(predicted_relevance.values())])

The new output metrics are:

dcg_score(true_rel, predicted_rel)
>> 26.048067158648237
ndcg_score(true_rel, predicted_rel)
>> 0.9852119447374988
# the well-ranked new document slightly improves both metrics

DCG fluctuates a lot with the length of the corpus. Using the normalized metric gives you a much more stable and comparable number for assessing your model.
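
Another way to keep scores comparable across corpora of different sizes is to evaluate at a fixed cutoff: both dcg_score and ndcg_score accept a k parameter that only considers the top k positions of the predicted ranking. For example, reusing the arrays from above:

# Only the top 3 ranked documents contribute to the score
dcg_score(true_rel, predicted_rel, k=3)
ndcg_score(true_rel, predicted_rel, k=3)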

Discounted Cumulative Gain rewards placing highly relevant documents early in the ranking. If you want to account for query complexity and compare across queries, the normalized DCG is the metric to go for.

Happy ranking 😉
