Content Quality

Despite its importance, content quality is hard to define without devolving into circularity. What we mean by content quality is a measure or collection of measures that represents the content’s value independent of any searcher’s particular information need.

In other words, content quality complements relevance. While relevance measures how well the content matches the searcher’s information need, quality is a need-independent measure of the content’s utility or desirability.

Quality Complements Relevance

When a searcher pursues an information-seeking task, relevance is important. Not only important, but necessary: by definition, irrelevant content fails to satisfy the requirement of addressing the searcher’s information need.

But relevance, while necessary, is not sufficient.

Consider an analogy. If you’re hungry, then all food is relevant to your needs. But that doesn’t mean you’ll eat anything that’s put on your plate. You have preferences among potential food options. Moreover, while your preferences are personal, they are probably aligned with objective attributes, such as freshness or sweetness, that collectively represent the food’s quality.

Returning to search: relevance is necessary, but content quality, whether measured objectively or subjectively, is what makes a result sufficient.

Measuring Content Quality

Broadly speaking, there are two strategies for measuring content quality. The first is to use information available at indexing time. The second is to use searcher behavior as a source of implicit judgments.

How do we use information available at indexing time? In the simplest case, we have explicit quality judgments, such as ratings from experts or users. If not, we can derive a quality score from the data we have, such as estimating an image’s quality from its resolution. In general, quality measures can come from raw data, hand-tuned formulas, or machine-learned models.
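As a rough illustration of an index-time score, here is a minimal sketch in Python. It assumes a hypothetical image document with an optional average rating on a 5-point scale plus pixel dimensions. The function names, the 2-megapixel saturation point, and the rating scale are all assumptions made for the example, not values from any particular system.

```python
from typing import Optional


def resolution_score(width_px: int, height_px: int,
                     target_megapixels: float = 2.0) -> float:
    """Map image resolution to [0, 1], saturating at target_megapixels.

    The 2.0 megapixel target is an arbitrary, hand-tuned assumption.
    """
    megapixels = (width_px * height_px) / 1_000_000
    return min(megapixels / target_megapixels, 1.0)


def index_time_quality(avg_rating: Optional[float],
                       width_px: int, height_px: int,
                       max_rating: float = 5.0) -> float:
    """Prefer an explicit judgment; otherwise derive a score from raw data."""
    if avg_rating is not None:
        # Explicit quality judgment: normalize the rating to [0, 1].
        return avg_rating / max_rating
    # No explicit judgment: fall back to a score derived from resolution.
    return resolution_score(width_px, height_px)
```

A rated image uses its rating, while an unrated one falls back to resolution. A machine-learned model could replace either branch without changing how the score is consumed.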

Alternatively, we can use searcher behavior as a source of implicit quality judgments. The results that searchers engage with or skip provide a wealth of positive and negative implicit judgments. But we have to take such implicit judgments with a grain of salt. We should only draw them from relevant results, since a searcher’s dislike of an irrelevant result may not reflect on its quality. There’s also presentation bias: searchers can only engage with results they see and are more likely to engage with the top-ranked results. Nonetheless, searcher behavior is a cost-effective resource for measuring content quality.
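One common way to account for presentation bias is to compare the clicks a result actually received with the clicks we would expect given the positions where it was shown, a ratio sometimes called clicks over expected clicks. The sketch below assumes a hypothetical log of (document, rank, clicked) tuples that has already been restricted to relevant results. The position priors are made-up numbers; in practice they would be estimated from logs.

```python
from collections import defaultdict

# Hypothetical probability that a searcher engages with a result at each
# 1-based rank, independent of the result itself. Illustrative values only.
POSITION_PRIOR = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3, 5: 0.2}
DEFAULT_PRIOR = 0.1  # assumed prior for ranks below the ones listed above


def clicks_over_expected_clicks(impressions):
    """Return {doc_id: observed clicks / position-expected clicks}.

    impressions: iterable of (doc_id, rank, clicked) tuples, assumed to
    contain only results judged relevant to their queries.
    """
    clicks = defaultdict(float)
    expected = defaultdict(float)
    for doc_id, rank, clicked in impressions:
        expected[doc_id] += POSITION_PRIOR.get(rank, DEFAULT_PRIOR)
        if clicked:
            clicks[doc_id] += 1.0
    return {doc_id: clicks[doc_id] / expected[doc_id] for doc_id in expected}
```

A ratio above 1 suggests that searchers engage with a result more than its positions alone would predict, and a ratio below 1 suggests the opposite. With few impressions the ratio is noisy, so it is best treated as weak evidence.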

These two strategies work best together. The index is the best source for objective data, while search behavior aggregates searcher preferences. Moreover, we can distill what we learn from searcher behavior and bring it into the index. Conversely, we can train models by using historical search behavior as labels, and then apply the models to new or unseen content.
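One simple way to combine the two sources is to start from the index-time score and shift toward the behavioral score as evidence accumulates. The sketch below assumes both scores have already been normalized to the range 0 to 1. The function name and the prior strength of 100 impressions are assumptions chosen for illustration.

```python
def blended_quality(index_score: float, behavior_score: float,
                    impressions: int, prior_strength: float = 100.0) -> float:
    """Blend an index-time score with a behavior-derived score.

    With no impressions the result equals index_score; as impressions grow,
    the result approaches behavior_score. prior_strength is a hypothetical
    tuning knob: the impression count at which both sources weigh equally.
    """
    weight = impressions / (impressions + prior_strength)
    return weight * behavior_score + (1.0 - weight) * index_score
```

New or unseen content, which has no behavioral history, gets its index-time score, while established content is scored mostly by how searchers have responded to it.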

Combining Quality With Relevance

Content quality should play a key role in ranking search results. As I’ve written elsewhere, ranking and relevance are related but distinct concepts. Relevance measures whether a result addresses the searcher’s need, while ranking sorts relevant results based on searcher and business objectives.

To the extent that searcher and business objectives align, this amounts to sorting relevant results based on their quality. Once the search engine has established relevance, ranking should focus on query-independent signals, and content quality is the fundamental query-independent signal. We need a bit more nuance if we don’t model relevance as binary. Nonetheless, content quality should ensure that the ranking places more desirable relevant results ahead of less desirable ones. Most importantly, it should not override relevance.
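To make this concrete, here is a minimal sketch of ranking in which quality orders results but never overrides relevance. It assumes each result is a dictionary with hypothetical relevance and quality scores between 0 and 1. Bucketing graded relevance into coarse tiers is one possible way to handle non-binary relevance, not the only one, and the threshold and tier width are arbitrary.

```python
def rank_results(results, relevance_threshold=0.5, tier_width=0.25):
    """Drop irrelevant results, then sort by relevance tier and quality.

    results: list of dicts with 'id', 'relevance', and 'quality' keys.
    Quality only breaks ties among results in the same relevance tier,
    so it cannot promote a result past one in a higher tier.
    """
    relevant = [r for r in results if r["relevance"] >= relevance_threshold]

    def sort_key(result):
        tier = int(result["relevance"] / tier_width)  # coarse relevance bucket
        return (-tier, -result["quality"])            # higher tier, then quality

    return sorted(relevant, key=sort_key)


candidates = [
    {"id": "a", "relevance": 0.9, "quality": 0.2},
    {"id": "b", "relevance": 0.8, "quality": 0.9},
    {"id": "c", "relevance": 0.6, "quality": 1.0},
    {"id": "d", "relevance": 0.3, "quality": 1.0},  # below threshold: dropped
]
ranked_ids = [r["id"] for r in rank_results(candidates)]
```

Here "b" outranks "a" because both fall in the same relevance tier and "b" has higher quality. Despite its high quality, "c" stays below both, and "d" is excluded entirely. With binary relevance, the tier is constant and this reduces to sorting relevant results by quality.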


Summary

As we said in the introduction, content quality is hard to define without devolving into circularity. But that doesn’t diminish its importance. Content quality measures content independent of the searcher’s need, complementing relevance. While relevance is necessary, quality is what makes a result sufficient to satisfy the searcher. And in practice, a search engine should essentially rank relevant results by their quality.

Content Moderation



