Acast Case Studies: Listening Data

Acast Tech Blog · Feb 8, 2019

By Katie Rogers and Jonas Björk

Real-world examples of using data to deal with growth and change in podcasting.

What?

At Acast, we have always put accurate data at the forefront of our efforts to help grow a mature podcast marketplace. The breadth and variety of shows we host and monetise means we have a unique vantage point to analyse listening behaviour and changes as and when they occur.

With over 3,000 shows and more than 100 million monthly listens, our listening statistics register changes in audiences and platforms around the world. We see many different types of listening patterns, most of which we can correlate with known causes in the podcasting world:

  • The exponential growth of an indie podcast taking off and becoming a hit through word-of-mouth and favourable blog reviews.
  • A large audio streaming platform doubling down on its efforts to push people towards podcast listening.
  • The listening spike caused by one of our major publishing houses launching a daily news show, spending significant resources on its promotion and high production values.
  • The ‘summer lull’ that traditionally comes with the break in sports podcast coverage and the reduction in commuter listening to news programmes (but the increase in beach-side binge listening!)

However, there are types of growth where the old adage ‘too good to be true’ sometimes applies. In those situations, it’s on us to identify the pattern and work out what should be considered a ‘listen’: that is, a download or stream from a human being who intends to listen to a specific piece of content. Through a mixture of data analysis and market understanding, we can home in on what is real listening behaviour.

Why?

With these case studies we hope to illustrate how we deal with these variations in observed listening behaviours. By providing context based on real-world examples we have come across, we want to show what ongoing questions such as ‘what is a listen?’ actually mean for hosting services such as ours.

We believe this should be shared in the interest of transparency and collaboration. We aim to continually improve our methodology for storing and filtering data, and by publicly documenting those efforts we can all learn more about how shows are affected by external changes, and how best to manage them.

#1: New app, overnight success?

Anyone can make a podcatcher and submit it to the app store. Established audio companies can decide to enter the podcasting space, with teams of developers focused on optimising the listener experience for smooth playback. New modes of listening — home devices, speech-activated smart speakers, wearables — can be expected to find new ways of making efficient download requests from RSS feeds.

In all of these cases, the teams developing these products won’t necessarily have read the IAB podcast measurement guidelines, or want to prioritise those standards over improving people’s ability to access audio content.

During autumn 2018, we noticed a new user-agent (aka device) with listening figures that exploded overnight. We were registering hundreds of thousands of requests from IP addresses all over the world, with a sustained pattern rather than spikes of requests.
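The first step is simply spotting this kind of jump. Below is a minimal sketch of one way to do it, assuming a flat request log in CSV form with a date and a user-agent per row; the column names, thresholds and file format are illustrative rather than our actual pipeline.

```python
from collections import defaultdict
import csv

# Illustrative thresholds and a hypothetical CSV log with 'date' (YYYY-MM-DD)
# and 'user_agent' columns; not Acast's actual pipeline.
GROWTH_FACTOR = 10           # flag a 10x day-over-day jump
MIN_DAILY_REQUESTS = 50_000  # ignore low-volume user-agents

def daily_counts(log_path):
    """Count requests per user-agent and day."""
    counts = defaultdict(lambda: defaultdict(int))
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            counts[row["user_agent"]][row["date"]] += 1
    return counts

def suspicious_user_agents(counts):
    """Return (user_agent, day, previous_count, current_count) for sudden jumps."""
    flagged = []
    for ua, per_day in counts.items():
        days = sorted(per_day)
        for prev, curr in zip(days, days[1:]):
            if (per_day[curr] >= MIN_DAILY_REQUESTS
                    and per_day[curr] >= GROWTH_FACTOR * max(per_day[prev], 1)):
                flagged.append((ua, curr, per_day[prev], per_day[curr]))
    return flagged
```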

What could this be? We had some theories:

An established client had changed its user-agent without notice. But we found no significant drop in any other user-agent that correlated with this sudden growth.

A major app launch had successfully driven thousands of new users to become podcast listeners. This kind of growth would be unprecedented, but podcasting is a (relatively) young medium and sudden mainstream success shouldn’t be discounted. However, looking at the market, there were no signs or reports of an app or platform release big enough to generate this amount of new traffic.

We reached out to our contacts at the platform we believed was likely generating the traffic. After some back and forth, the client let us know that they had patched what we believe was a bug in the way downloads were requested. We then observed a steady decrease in traffic, following a slow rollout pattern, until by the end of 2018 the traffic had stabilised at roughly 20% of the initial spike.

Figure 1: Graph visualising the rapid growth and subsequent decrease of daily requests from a new user-agent

#2: Just how big is desktop listening?

To support a mature, professional podcast market, we need to reach a point where advertisers are confident enough to double down spending on podcasts. It’s therefore our responsibility to only serve ads on what we can reasonably be assured are actual human ‘listens’.

At Acast, our content team is constantly on the hunt for great shows to champion and support through monetisation. We want to seek out new and diverse voices, and partner with established shows looking to leverage our dynamic insertion technology.

In autumn, our content team signed a podcast with a reported 2 million monthly listens. Once the show was migrated onto the Acast system, we were able to get a better understanding of the spread of these listens. Two key points about their listening figures struck us as out of the ordinary:

  • In their first month on the Acast platform, 81% of these listens came from Windows desktop devices
  • A similar sized show from the same market had about 1% of their listens from Windows devices

We needed to understand more about where this traffic was coming from and, more importantly, whether we should count these requests as ‘listens’ to monetise through advertisements. By investigating the requests, we could see that many of them came with referrals from web-based ad networks.
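To illustrate the kind of check involved, here is a minimal sketch that tallies referrer domains and the share of Windows desktop traffic. It assumes request records with referrer and user-agent fields; the field names and the ad-network domain list are made up for the example.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical example: 'requests' is an iterable of dicts with 'referrer' and
# 'user_agent' fields from the request logs; the ad-network domains are made up.
KNOWN_AD_NETWORK_DOMAINS = {"ads.example.com", "clicks.example.net"}

def traffic_breakdown(requests):
    """Summarise referrer domains, Windows desktop share and ad-network share."""
    referrer_domains = Counter()
    windows_desktop = 0
    total = 0
    for req in requests:
        total += 1
        domain = urlparse(req.get("referrer") or "").netloc
        if domain:
            referrer_domains[domain] += 1
        if "Windows NT" in req.get("user_agent", ""):
            windows_desktop += 1
    ad_requests = sum(
        count for domain, count in referrer_domains.items()
        if domain in KNOWN_AD_NETWORK_DOMAINS
    )
    return {
        "top_referrers": referrer_domains.most_common(10),
        "windows_share": windows_desktop / max(total, 1),
        "ad_network_share": ad_requests / max(total, 1),
    }
```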

In other words, the requests to stream this show were being generated through purchased ads that linked directly to the show. These listens were generated by humans, but the vast majority of them had no intention of listening to a podcast when they clicked the ad, so the listens weren’t legitimate.

Episode releases were also being broadcast via mailing lists. We reached out to the podcaster to see if they were able to shed some light on this pattern of listening behaviour. Their response was that they had partnered with a company to help them with marketing on social media.

It is possible that the mailing lists were linking directly to the mp3 files and hence generated a lot of clicks. That, together with internet ads generating downloads of the mp3s, would explain this behaviour, since Windows holds roughly 80% of the desktop market. Understanding how this traffic was generated, and explaining to the podcaster that their listens are not legitimate, is not a short-term win-win: their listening numbers drop drastically and we lose inventory to sell ads on. However, to keep building trust in podcasting as a medium, we have to constantly improve the quality of the listening data.

Figure 2: Graph visualising the distribution of monthly requests from Windows based web browsers for a podcast with abnormal traffic patterns

#3: Mr. Opera

A podcaster reached out to us to report some strange behaviour in their listening stats. It was a series-based sports show that had stopped releasing during the summer, but was suddenly seeing a rapid increase in listens. The podcaster had used Acast’s Insights tool themselves to find out that the traffic was coming from the Opera web browser, and was asking us whether this was reasonable.

We investigated the requests on that show further and discovered that they were coming from similar IP ranges, and that each IP downloaded the same 20 episodes during the same minute. Within a single day we identified patterns showing when the bursts of downloads occurred: in this case, at 26 and 56 minutes past the hour. That pattern sometimes changed during the day, and sometimes the downloads even ceased for a couple of hours and then came back. Most podcast measurement guidelines define a cap on listens per hour or day, but for a show averaging around 5,000 listens per month, this traffic would still be very significant if the number of IPs is in the hundreds.
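A rough sketch of how such bursts can be surfaced from the raw logs is to group requests per IP and per minute and flag any IP that pulls many distinct episodes within the same minute. It assumes request records carrying an IP, an episode id and a timestamp; the names and the threshold are illustrative.

```python
from collections import defaultdict

# Rough sketch: 'requests' is an iterable of dicts with 'ip', 'episode_id' and a
# datetime 'timestamp'; the threshold is illustrative.
BURST_THRESHOLD = 15  # distinct episodes pulled by one IP within one minute

def find_burst_ips(requests):
    """Return (ip, minute, episode_count) for IPs downloading many episodes in one minute."""
    per_ip_minute = defaultdict(set)
    for req in requests:
        minute = req["timestamp"].replace(second=0, microsecond=0)
        per_ip_minute[(req["ip"], minute)].add(req["episode_id"])
    return sorted(
        (ip, minute, len(episodes))
        for (ip, minute), episodes in per_ip_minute.items()
        if len(episodes) >= BURST_THRESHOLD
    )
```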

Further analysis led us to identify the same pattern of requests from the same IPs on a similar sports show. A coincidence, perhaps, but it’s interesting to note that the two shows affected were from the same genre. A wild guess, but maybe a software engineer was developing a distributed RSS crawler and used their two favourite shows when testing.

It is hard to standardise how this should be dealt with. The IAB recommends keeping a blacklist of IPs that generate traffic like this. That makes sense in some cases, but in other cases those IPs end up assigned to someone doing something legitimate. It basically falls to each podcast platform to detect and block this type of bot traffic as best it can. An alternative approach is to rate limit the requests using a sliding window, where each incoming request increments the counter for the window and the request is discarded if the counter exceeds a threshold. By keeping a counter per IP, user-agent and show, a source would be rate limited as soon as it hit the threshold, and since the window is sliding it wouldn’t count any more requests until the traffic ceased for a period longer than the window size. This also removes the work of keeping IP blacklists updated. All podcast measurement initiatives mention rate limiting requests per IP and user-agent, but exactly which rate-limiting algorithm should be used is open to interpretation, and the results differ quite a lot depending on that choice.
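To make that concrete, here is a minimal sketch of such a sliding-window rate limiter. The window size, threshold and key structure are illustrative choices, not values mandated by any measurement guideline.

```python
import time
from collections import defaultdict, deque

class SlidingWindowRateLimiter:
    """Count requests per (ip, user_agent, show) within a sliding time window."""

    def __init__(self, window_seconds=3600, threshold=10):
        self.window = window_seconds
        self.threshold = threshold
        self.events = defaultdict(deque)  # key -> timestamps of recent requests

    def allow(self, ip, user_agent, show, now=None):
        """Return True if this request should still count as a listen."""
        now = time.time() if now is None else now
        timestamps = self.events[(ip, user_agent, show)]
        # Drop requests that have slid out of the window.
        while timestamps and now - timestamps[0] > self.window:
            timestamps.popleft()
        # Record the request even if it is over the limit, so a noisy source
        # stays limited until it is quiet for longer than the window.
        timestamps.append(now)
        return len(timestamps) <= self.threshold
```

Note that rejected requests still add their timestamps to the window, so a noisy source stays rate limited until it has been quiet for longer than the window size, which matches the behaviour described above.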

Figure 3: Graph visualising the number of daily requests from user agent mapping to Opera web browsers for a podcast with likely bot traffic.

Next gen data infrastructure at Acast

We are currently investing in and revamping our data infrastructure at Acast, in order to continue to lead the effort of increasing the accuracy of measurement in podcasting. There are many interesting initiatives ongoing in the ecosystem, e.g. RAD and Audible Impressions to name two. We are following them closely, and hope that together we can find a solution that gives both the podcatchers and the platforms incentives to report accurately.
