Discounted Cumulative Gain: the ranking metrics you should know about
Discounted Cumulative Gain (DCG) is a relevance metric from information science and information retrieval. Unlike pure classification use cases, where a prediction is simply right or wrong, in a ranking problem a prediction is more or less right or wrong. Relevance denotes how well the retrieved set of documents answers the initial query.
Classical use cases of ranking problems can be:
- Web search: you want to retrieve the most relevant documents for a query
- Query-code match: you have a natural language query and you want to match it with the closest code snippet
We assume a document can be relevant, marginally relevant, or not relevant.
DCG measures how much information you gain from the ranking of the retrieved set of documents.
However, depending on the query's complexity, the set of available documents can vary. For example: “best restaurants in London” vs “best 1-star seafood restaurant in London”. To cover these variations, normalized DCG (nDCG) has been introduced.
The math behind the DCG metric
The most frequently used version of DCG (by web search companies, on Kaggle, and on GitHub) is:

DCG_p = Σ_{i=1}^{p} rel_i / log2(i + 1)

where rel_i is the graded relevance of the document at rank i.
DCG emphasizes highly relevant documents that appear early in the result list, using a logarithmic scale to discount documents ranked lower.
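As a sketch, DCG at rank p sums rel_i / log2(i + 1) over the ranked list. A minimal NumPy version (the `dcg` helper and the toy scores below are illustrative, not scikit-learn's implementation):

```python
import numpy as np

def dcg(relevances):
    """DCG = sum over ranks i of rel_i / log2(i + 1).

    Minimal sketch of the standard formulation; scikit-learn's
    dcg_score applies the same log2 discount.
    """
    rel = np.asarray(relevances, dtype=float)
    ranks = np.arange(1, len(rel) + 1)
    return float(np.sum(rel / np.log2(ranks + 1)))

# The same scores gain more when the best documents come first:
well_ordered = dcg([3, 2, 1])
badly_ordered = dcg([1, 2, 3])
```

Because early positions are barely discounted while later ones are penalized, `well_ordered` comes out strictly larger than `badly_ordered`.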
The nDCG is calculated against an ideal ranking: your predicted ranking's DCG is divided by the DCG of the ideal ordering (IDCG), so nDCG = DCG / IDCG. This measure gives the relative performance of a model given the complexity of a query.
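To make the normalization concrete, here is a small sketch (the helper names are made up for the example, and unlike scikit-learn's `ndcg_score` it does not average tied predicted scores):

```python
import numpy as np

def dcg(relevances):
    rel = np.asarray(relevances, dtype=float)
    return float(np.sum(rel / np.log2(np.arange(2, len(rel) + 2))))

def ndcg(true_scores_in_predicted_order):
    # IDCG: DCG of the true scores sorted descending (the ideal ranking)
    ideal = sorted(true_scores_in_predicted_order, reverse=True)
    return dcg(true_scores_in_predicted_order) / dcg(ideal)

perfect = ndcg([10, 9, 7, 6, 4])   # already in ideal order -> exactly 1.0
swapped = ndcg([9, 10, 7, 6, 4])   # top two swapped -> just below 1.0
```

An ideal ranking scores exactly 1.0, and any deviation pushes the score below 1, which is what makes nDCG comparable across queries of different difficulty.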
DCG with scikit-learn
Scikit-learn sums the true scores ranked in the order induced by the predicted scores, after applying a logarithmic discount.
from sklearn.metrics import dcg_score
from sklearn.metrics import ndcg_score
import numpy as np
Let’s take the simple example of a small corpus:
# Corpus: true relevance scores on a 0-10 scale
true_relevance = {'d1': 10, 'd2': 9, 'd3': 7, 'd4': 6, 'd5': 4}

# Predicted relevance scores
predicted_relevance = {'d1': 8, 'd2': 9, 'd3': 6, 'd4': 6, 'd5': 5}

# Relevance lists processed as 2D arrays (scikit-learn expects one row per sample)
true_rel = np.asarray([list(true_relevance.values())])
predicted_rel = np.asarray([list(predicted_relevance.values())])
The output metrics are:
dcg_score(true_rel, predicted_rel)
>> 22.906106392129793
ndcg_score(true_rel, predicted_rel)
>> 0.9826797611735933
Let’s add a new document, d7, to our corpus:
# True relevance
true_relevance = {'d1': 10, 'd2': 9, 'd7': 8, 'd3': 7, 'd4': 6, 'd5': 4}

# Predicted relevance
predicted_relevance = {'d1': 8, 'd2': 9, 'd7': 7, 'd3': 6, 'd4': 6, 'd5': 5}

true_rel = np.asarray([list(true_relevance.values())])
predicted_rel = np.asarray([list(predicted_relevance.values())])
The new output metrics are:
dcg_score(true_rel, predicted_rel)
>> 26.048067158648237
ndcg_score(true_rel, predicted_rel)
>> 0.9852119447374988
# ranking the new document correctly improves both metrics
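Both scikit-learn functions also accept a `k` parameter to truncate the evaluation to the top-k ranked documents, which is common in web search where only the first results page matters (the arrays below repeat the relevance scores from the d7 example above):

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score

true_rel = np.asarray([[10, 9, 8, 7, 6, 4]])
predicted_rel = np.asarray([[8, 9, 7, 6, 6, 5]])

# Only the three highest-ranked documents contribute to the score
dcg_at_3 = dcg_score(true_rel, predicted_rel, k=3)
ndcg_at_3 = ndcg_score(true_rel, predicted_rel, k=3)
```

Truncating at k always yields a DCG no larger than the full-list DCG, since the discounted tail is simply dropped.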
DCG fluctuates a lot with the length of the corpus. The normalized metric gives you a much more stable and comparable score to assess your model.
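You can see the difference by comparing both metrics as the corpus grows (a synthetic sketch: the random relevance scores and the noise level are made up for illustration):

```python
import numpy as np
from sklearn.metrics import dcg_score, ndcg_score

rng = np.random.default_rng(0)
for n in (5, 50, 500):
    true = rng.integers(0, 11, size=(1, n))      # graded relevance on a 0-10 scale
    pred = true + rng.normal(0, 1, size=(1, n))  # noisy but correlated predictions
    print(n, dcg_score(true, pred), ndcg_score(true, pred))
# DCG keeps growing with the corpus size; nDCG stays within [0, 1]
```

The raw DCG scales with how many documents are summed, so it is only comparable between rankings of similar length, while nDCG remains bounded between 0 and 1.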
Discounted Cumulative Gain rewards the relevancy of a ranking prediction. If you also want to take query complexity into account, the normalized DCG is the metric to go for.
Happy ranking 😉