Measuring Podcast Trail Effectiveness

Daniel Hills
Jan 6 · 5 min read

Onward journeys are really important to BBC Sounds. Building better journeys on our app allows our users to better explore the huge range of audio content we have on offer. In-episode trailing is a great way to recommend shows to listen to next — but how do we know if it’s working?

In an ideal world we’d do this via a standard A/B test, where we split users randomly into control and treatment groups. However, in this specific case we’d also need to create two versions of the audio content, one with the trail and one without, which would require a significant amount of engineering resource.

Despite this, with a bit of extra analytical work, we can still measure the effectiveness of trails. This article will focus on how we applied this approach to a marketing campaign on That Peter Crouch Podcast, which trailed several different podcasts over a number of weeks.

Creating control & treatment groups

The critical feature of the podcast trails was that they occurred at the end of each episode. This might not necessarily be the ideal place to have trails, but it makes creating our testing groups much easier.

This is because it allows us to filter to users who completed at least 80% of the episode, as a proxy for engagement, and then split this group into control and treatment based on whether they heard the trail. If the trails had occurred in the middle of the episode this wouldn’t work: users in the control group would have heard less than half of the episode, so they probably wouldn’t have been that engaged and wouldn’t have been a suitable comparison group.

Visualisation of how we split users into control and treatment groups for our trail test
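The split above can be sketched in a few lines of pandas. This is a minimal illustration, not the actual BBC Sounds pipeline: the column names, the 80% engagement threshold, and the assumption that the trail starts at roughly 95% of the way through the episode are all placeholders.

```python
import pandas as pd

# Hypothetical listening data: one row per user for the episode,
# with the fraction of the episode they completed.
events = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5],
    "fraction_completed": [0.85, 0.95, 1.0, 0.82, 0.50],
})

ENGAGEMENT_THRESHOLD = 0.80  # proxy for an engaged listener
TRAIL_START = 0.95           # assumed point where the trail begins

# Keep only engaged listeners (completed at least 80% of the episode)...
engaged = events[events["fraction_completed"] >= ENGAGEMENT_THRESHOLD].copy()

# ...then split on whether they listened far enough to hear the trail.
engaged["group"] = engaged["fraction_completed"].apply(
    lambda f: "treatment" if f >= TRAIL_START else "control"
)
```

User 5 drops out entirely (not engaged), while the remaining users fall into control or treatment purely on how far they listened.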

One key assumption

This creates our control and treatment groups, but unfortunately there is still a key difference between them: the treatment group has listened to more of the episode than the control group. Therefore, if we want to test this rigorously we first need to test the following assumption:

Listening to slightly more of an episode does not make you more likely to listen to the trailed content.

To do this, we conduct an initial assumption test, where we split users into ‘fake’ control and treatment groups as follows:

Visualisation of how we split users into control and treatment groups for our assumption test

In this case, neither group has listened to the trail, and therefore we’re directly testing the assumption. If the assumption is true, we should measure no significant difference in conversion between the groups.

Defining conversions

Now that we’ve set up our control and treatment groups, we define a framework for measuring success based on what fraction of users convert, i.e. go on to listen to the trailed content within 2 weeks of hearing the trail.

It’s important to ignore users who have listened to the trailed content in the 13 weeks (1 quarter) prior to hearing the trail. This is to stop existing listeners of the trailed content muddying the analysis.

Pre- & post-trail listen periods
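The two windows can be captured in a small classification function. Again this is a sketch under assumed names; the real implementation would run over event logs, but the window logic (13 weeks back to exclude existing listeners, 2 weeks forward to count a conversion) follows the definition above.

```python
from datetime import datetime, timedelta

PRIOR_WINDOW = timedelta(weeks=13)      # 1 quarter before the trail
CONVERSION_WINDOW = timedelta(weeks=2)  # conversion must happen in here

def classify(trail_time, listen_times):
    """Classify one user given when they heard the trail and the
    times they played the trailed content.

    Returns 'excluded' (already a listener in the prior quarter),
    'converted' (listened within 2 weeks of the trail), or
    'not_converted'.
    """
    # Existing listeners muddy the analysis, so drop them first.
    if any(trail_time - PRIOR_WINDOW <= t < trail_time for t in listen_times):
        return "excluded"
    # A play within 2 weeks of the trail counts as a conversion.
    if any(trail_time <= t <= trail_time + CONVERSION_WINDOW for t in listen_times):
        return "converted"
    return "not_converted"
```

For example, a listen 3 days after the trail is a conversion, while a listen 4 weeks before the trail excludes the user from the test entirely.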

Measuring impact

Using the conversion definition we can therefore classify each user as either a success (they listened to the trailed content within 2 weeks of hearing the trail) or a failure (they didn’t).

Using the beta distribution, given by the parameters alpha (successes) & beta (failures), we can generate a probability distribution centred around our conversion rate:

Conversion Rate = Successes / (Successes + Failures)

Measuring impact is then a case of comparing the two beta distributions for control and treatment to see if there is any significant difference. The method I used is described in much more detail in this article.
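One common way to compare two beta distributions is Monte Carlo sampling: draw from each group’s distribution and estimate the probability that treatment beats control. This is a sketch of that general technique, not necessarily the exact method from the linked article, and the success/failure counts below are purely illustrative, not the campaign’s real numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def prob_treatment_beats_control(succ_c, fail_c, succ_t, fail_t, n=100_000):
    """Estimate P(treatment conversion rate > control conversion rate)
    by sampling from Beta(successes, failures) for each group."""
    control = rng.beta(succ_c, fail_c, n)
    treatment = rng.beta(succ_t, fail_t, n)
    return (treatment > control).mean()

# Illustrative counts: 44/1000 control conversions vs 63/1000 treatment.
p = prob_treatment_beats_control(44, 956, 63, 937)
```

If `p` is close to 1 (or close to 0), the two distributions barely overlap and the difference is significant; a value near 0.5 corresponds to the heavily overlapping plots shown for the assumption test.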

The assumption test

Dealing with the assumption test first, fortunately we saw no significant results for any of the 6 episodes tested. It’s important to note this doesn’t mean we can definitely say there is no effect from listening to more of an episode; it means we can’t detect any effect at the precision our sample size allows.

The plot below shows a comparison of two beta distributions for ‘That Captains Episode’, which shows a large overlap in distributions.

Beta distributions for assumption test

The actual results

With our assumption tested, the trails themselves showed significant uplifts across every episode. So basically, trailing works!

To give you an example of the effect size, using the same episode from the example above, we see the conversion rate has gone from 4.4% with no trail to 6.3% with a trail. Whilst the conversion rates are quite small in absolute terms, this represents a relative increase of 46%.

Beta distributions for trail test

It was promising to see uplifts across the board, since we trailed a range of content, from other football-specific shows, such as Football Daily, to less obvious recommendations, such as Radio 1’s Scott Mills Daily Podcast. This gives our marketing teams more confidence to take risks and recommend less typical onward journeys.

Finally, probably the most positive aspect of these results was that we saw larger increases in conversion rates, in some cases almost 200%, for both our under-35 and infrequent audiences. This makes intuitive sense, since these users tend to be less familiar with our content, but as these are target groups for BBC Sounds it was great to measure the effect directly.

Generalising the method

To summarise the above approach, the key steps to this method are:

1. Filter to users who completed at least 80% of the episode, as a proxy for engagement.
2. Split these engaged users into control and treatment based on whether they heard the trail.
3. Test the key assumption using ‘fake’ control and treatment groups, neither of which heard the trail.
4. Exclude users who listened to the trailed content in the 13 weeks before the trail, and count a conversion when a user listens within 2 weeks after it.
5. Compare the beta distributions of the two groups’ conversion rates to test for a significant difference.

Future trail testing at BBC Sounds

This analysis has given us clear evidence that our trailing works and a repeatable framework for future tests. With this measured, we now have the freedom to experiment and test out different trail hypotheses, such as whether trails placed elsewhere in an episode perform better.

This allows our marketing teams to have more confidence in their campaigns and take the necessary risks required to broaden our audiences’ listening habits.
