Trash or Trend?

How we find the best performing news articles.

Christian Leschinski
Axel Springer Tech
9 min read · May 15, 2024


At Axel Springer National Media and Tech, we take care of all things data for our German media brands. A central part of this is to identify which articles perform well and which ones do not. This turns out to be much harder than one would expect, because counting clicks is not enough. Here is why this is the case, and what we do instead.

Counting clicks is not enough.

Not all clicks are created equal. Whether we talk about search results, product catalogs in online shops, or articles on a news website, people tend to click what is displayed with a large image on the top left of the page and they tend to ignore what is on the bottom right with only a small headline. This effect has given us the famous SEO aphorism: “The best place to hide a dead body is the second page of Google”.

All data produced by the internet is heavily influenced by how things are displayed. This effect is usually referred to as “position bias”. In the media industry, we also call it “promotion bias”, since other factors, such as the premium status of the articles, also play an important role.

The chart above shows this effect for one of our media brands. Each curve represents how the click-through rate (CTR) within a certain widget depends on the position of the article, ordered from the top left to the bottom right. The different colours represent different widgets such as “latest news” or “politics”.

As one can see, the CTR drops off quickly as we move further down the page. But there are more factors that influence the CTR.

  • The first article in a widget is often displayed more prominently than the others.
  • We have a freemium business model, which means there are premium articles that are only accessible to subscribers. These naturally have a much lower CTR.
  • People are often not in the mood or the right situation to watch a video, so videos tend to have a lower CTR.
  • If we consider the total number of clicks instead of the CTR, this also heavily depends on the traffic on the page in the respective time period.

Promotion bias is everywhere.

We refer to the selection of the articles and their placement on the page as curation. It is a key part of our journalistic product. Readers come to news websites to learn what is happening in the world. They usually do not come to read a specific article, but to find out what is going on and what is important. When they see something they are interested in, they may read specific articles to dive deeper.

Our newsrooms decide what is news and which topics deserve more attention than others. These decisions are as much a part of the product as the actual content of the articles.

But due to the promotion bias, curation is also a self-fulfilling prophecy. Modern newsrooms use a host of live metrics to determine which topics readers are interested in and which articles might have outlived the readers’ attention.

We refer to metrics like the click-through rate or the total number of page impressions an article received as popularity metrics. If readers mostly click what is presented to them most prominently, it is hard for these metrics to uncover hidden gems that are placed lower on the page, or to find news articles that readers appear to care less about than expected.

“Trendingness” vs Popularity

We had to address this issue when trying to build a news recommendation system to personalise the selection of articles presented to the readers according to their interests. Since selecting articles is called curation and the process is automatic, we call this “autocuration”.

What should autocuration show to readers if they come to the page for the first time? Probably the articles that average readers like best. But how do we identify these articles if promotion bias is everywhere and counting clicks is not enough?

Our solution to the promotion bias problem is “trendingness”.

There are two main ideas here:

1. The performance of an article should be compared to that of an average article that received the same promotion.

2. Relative performance metrics are better than absolute ones, because they can be compared between articles in different positions on the page.

It is pretty simple: If an article on the top of the page is expected to receive 1000 clicks and it gets 1100, then it is 10 percent better than expected — or 10 percent trending. Similarly, an article in a less prominent position might get only 9 clicks, even though it should have gotten 10 clicks, so it is trending with minus 10 percent.

To be able to implement this, we first have to determine how many clicks an article should have gotten given the promotion it received. Our solution is based on a machine learning model that takes factors such as the premium status, the total traffic and the position in which the article is displayed into account and learns to predict the number of clicks from past data.

Regression models predict the expected value of the target variable conditional on the values of the features. That means that if our features encode the promotion the article received, the predicted values should correspond to the behaviour of the average article with the same promotion.

Other features that might influence the performance, such as the topic or the author, are intentionally omitted. The trendingness is essentially the error of the model, after the promotion is accounted for. If a particular author tends to write very engaging articles, we want her articles to show up as trending. If the model learned how good the author is, her articles would not stand out anymore. We are looking to get an objective metric for the performance of an article and not to build a model that can explain the performance of articles in general.
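To make this more concrete, here is a minimal sketch of what such a promotion-only click model could look like. It uses scikit-learn’s HistGradientBoostingRegressor; the file name, column names and feature set are illustrative assumptions, not our production setup.

```python
# Minimal sketch of a promotion-only click model (illustrative assumptions,
# not our production pipeline). The features encode only the promotion an
# article received, never its topic or author.
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

# Hypothetical training data: one row per article, position and time window.
df = pd.read_parquet("article_impressions.parquet")  # assumed schema

features = [
    "position_on_page",  # 1 = top left, larger = further down the page
    "widget_id",         # e.g. "latest news" or "politics", label encoded
    "is_premium",        # paywalled articles naturally get fewer clicks
    "is_video",          # videos tend to have a lower CTR
    "page_traffic",      # total page impressions in the time window
]
target = "clicks"

model = HistGradientBoostingRegressor()
model.fit(df[features], df[target])

# The prediction is the expected click count of an average article
# that received exactly this promotion.
df["predicted_clicks"] = model.predict(df[features])
```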

Equipped with these predictions, we can simply calculate a score for the trendingness of an article as

trendscore = actual clicks / predicted clicks - 1

But what about variance?

But there is one more problem: these trendscores become very noisy if the number of predicted clicks is small. This can be the case in positions very low on the page, or during the night. If only 0.1 clicks are predicted, but there is actually a click, the article is trending by 900 percent. Conversely, if no click occurs, the article becomes -100 percent trending.

To fix this, we resort to using pseudocounts. Pseudocounts are often used in practice, but I’ve rarely seen them discussed in textbooks. They are a trick of the trade for data scientists.

The idea is to add a certain number K to the number of clicks the article got and to those it should have gotten. The trendscore then becomes

trendscore = (actual clicks + K) / (predicted clicks + K) - 1

It is easy to see how this fixes our variance problem if we plug in different values for K. If K=0, we have the original trendscore, and, as we just discussed, the article with one click against 0.1 predicted clicks is trending by 900 percent.

With a pseudocount of K=10, the same article is only considered 8.9 percent trending: (1 + 10) / (0.1 + 10) - 1 ≈ 0.089.
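As a small illustration, the whole trendscore with pseudocounts fits into a tiny function; the numbers below are just the examples from above.

```python
def trendscore(clicks: float, predicted_clicks: float, k: float = 0.0) -> float:
    """Relative over- or under-performance vs. the promotion-adjusted
    expectation, smoothed with a pseudocount k (k=0 gives the raw score)."""
    return (clicks + k) / (predicted_clicks + k) - 1.0

# Low-traffic slot at night: 0.1 predicted clicks, one actual click.
print(trendscore(1, 0.1))        # 9.0   -> trending by 900 percent
print(trendscore(1, 0.1, k=10))  # 0.089 -> only 8.9 percent trending

# The top-of-page example from above is barely affected by the pseudocount.
print(trendscore(1100, 1000))        # 0.10  -> 10 percent trending
print(trendscore(1100, 1000, k=10))  # 0.099 -> still roughly 10 percent
```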

Pseudocounts can be considered a brutish trick of the trade, but there’s actually quite a bit of theory supporting them. For example, if you want to be fancy about it, pseudocounts are what you get if you impose a beta distribution as a Bayesian prior on the quantities that you’re trying to estimate. In frequentist statistics, you would consider this a regularization technique.
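To make that connection concrete for a rate such as the CTR (this is only meant to illustrate the general idea; the trendscore applies the same kind of smoothing to click counts): with a Beta(α, β) prior on the click-through rate, the posterior mean after observing the data is

posterior CTR = (clicks + α) / (impressions + α + β)

so α and β behave exactly like pseudo-clicks and pseudo-non-clicks that are added before any data has been observed.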

The figure above shows the distribution of trendscores, with and without pseudocounts. It is clear that the pseudocounts do a great job in taming outliers.

There is an inherent asymmetry in the trendscore, since an article cannot be worse than -100 percent, whereas it can trend infinitely far in the positive direction. Furthermore, like many other metrics, clicks on news articles tend to have very large positive outliers. There are some articles that everybody wants to read! A trendscore of 0 means that the performance of the article is perfectly average. The distribution of the trendscores is therefore relatively symmetric around zero, but the average is slightly negative and there are some large positive trendscores.

The problem with introducing a pseudocount is that we trade in a variance problem for a pseudocount problem.

How do we pick the right K? There are a few different approaches:

  • Experiment with different values of K and pick one that tends to give good results across many cases (see the sketch after this list).
  • Try to come up with a formula that determines the right K, for example based on the total traffic and the position of the article on the page, or based on the predicted number of clicks.
  • Come up with an algorithm to automatically determine the best K for every article and at every point in time based on some kind of optimality criterion, similar to hyperparameter tuning in machine learning.
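Here is a rough sketch of the first approach; the quality criterion (rank correlation between the current trendscores and the CTR in the next time window) and the column names are assumptions made for illustration, not our actual tuning procedure.

```python
# Sketch: grid search over K (column names and criterion are assumptions).
# A "good" K is taken to be one whose trendscores best predict how well
# articles perform in the following time window.
import numpy as np
import pandas as pd
from scipy.stats import spearmanr

def trendscore(clicks, predicted_clicks, k):
    return (clicks + k) / (predicted_clicks + k) - 1.0

df = pd.read_parquet("hourly_article_stats.parquet")  # assumed schema

best_k, best_corr = None, -np.inf
for k in [0, 1, 5, 10, 20, 50]:
    scores = trendscore(df["clicks"], df["predicted_clicks"], k)
    corr, _ = spearmanr(scores, df["next_hour_ctr"])  # rank correlation
    if corr > best_corr:
        best_k, best_corr = k, corr

print(f"best K: {best_k} (rank correlation {best_corr:.3f})")
```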

What we learned.

Depending on the news cycle, the page overall can be trending.

The plot above shows the predicted number of clicks and the actual number of clicks for one of our pages. Since the predicted clicks are based on average articles from the recent past, the page overall can trend positive or negative, depending on the news cycle. If there is a news situation that readers are particularly interested in, they will click more than on average. If the news situation is calm, users might click less than usual.

Trendingness is relative to the environment an article is in.

Different widgets on the homepage of a news website typically have different editorial concepts. The highest positions are usually reserved for the most important articles across a wide range of topics, whereas lower widgets are often dedicated to a specific topic such as sports or politics. Articles are normally moved into these widgets when they seem less important, or when they become older and therefore lose relevance. That means an article competes with stronger articles when it is at the top of the page, whereas the competition is weaker in the thematic widgets lower down.

Consequently, the trendingness of an article can change considerably when it is moved from one widget to another. Imagine, for example, a sports article that performs slightly below average among the top news but is quickly moved down to the sports widget, where it is of major interest compared to other (and potentially older) sports articles.

Trendingness can be different between desktop and mobile.

Our experiments show that trendscores are much more predictive of what future readers will click if we determine them separately for desktop and mobile devices. Even if the selection of articles on the page is the same for both platforms, the page layout usually differs considerably.

Furthermore, it seems that there are topics that work better on mobile than on desktop and vice versa. This can be seen nicely in the example above, where we see the trendscores for the same article over time. While the performance was comparable across platforms for most of the day, the article performed much better on mobile devices in the evening hours.
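A simple way to read “separately” is to fit the click model and compute the scores per platform, so that the baseline of the average article reflects the respective page layout. The sketch below illustrates that idea under the same assumed column names as before; the fixed pseudocount of 10 is likewise just an example.

```python
# Sketch: one promotion-only click model per platform (assumed column names).
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor

df = pd.read_parquet("article_impressions.parquet")  # assumed schema
features = ["position_on_page", "widget_id", "is_premium", "is_video", "page_traffic"]

scored_frames = []
for platform, group in df.groupby("platform"):  # "desktop" / "mobile"
    model = HistGradientBoostingRegressor().fit(group[features], group["clicks"])
    group = group.copy()
    group["predicted_clicks"] = model.predict(group[features])
    group["trendscore"] = (group["clicks"] + 10) / (group["predicted_clicks"] + 10) - 1
    scored_frames.append(group)

scored = pd.concat(scored_frames)
```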

Conclusion

As the examples above show, the performance of news articles cannot be fully understood without accounting for promotion bias. Counting clicks is not enough to uncover hidden gems. Furthermore, when working with absolute numbers, journalists need an intimate understanding of what constitutes a high or a low number. This makes the metrics inaccessible and is an obstacle for newsrooms that want to become more data-driven. In contrast, trendscores can be interpreted immediately, without any prior knowledge of comparable data: an article simply performs better or worse than it should by a certain percentage.

Analytics also becomes much easier if analyses are based on trendscores that already account for promotion bias. When working with absolute numbers, it is likely that a lot of findings are due to the page structure and newsroom processes and not actually because of reader preferences.

Altogether, trendscores are a powerful tool to gain deeper insights about the performance of content.


Christian Leschinski
Axel Springer Tech

Data science lead and former time series researcher with an interest in everything around statistics, machine learning, media and economics.