Find your new favorite podcast with personalized recommendations

How Breaker finds podcast doppelgängers and data sciences the shit out of them.

Published in Breaker · 6 min read · May 30, 2018

Breaker has always been an amazing social podcast app, thanks entirely to impressively hip and unnervingly attractive listeners like you. Recently, you may have noticed a fun new feature at the top of the Discover tab — personalized recommendations. I’m here to tell you the tale of where those recommendations come from.

Quick author check: I’m not your usual Erik or Leah blog writer. What?! you say. Who is this guy? My name is Tyler, and, I admit it, I love machine learning and data science. It’s even my job, over at Unbox Research, where we do custom machine learning work like building recommendation systems, making text classifiers, clustering data, and extracting user behavior or topic models from your data. With our powers of science and podcast combined, Erik and I built your friendly Breaker recommendations engine together.

Discovery is hard

I see great podcasts (think Serial or Radiolab) as the brightest stars peppering a velvety night sky. They stand out — they’re landmarks in an otherwise nebulous landscape. The problem is that there are so many other less obvious but beautiful stars of content worth visiting — and many more you aren’t that excited about. It’s hard to find those not-as-famous episodes that are a perfect match for your style because, unlike family shows like Rick & Morty or Game of Thrones, not everyone and their mom is talking about their favorite podcast all day. Podcasts can be weird and esoteric and delightful in ways that turn off some people but excite their equally weird niche audience. Plus, while we’re listing out negatives, if you scream for more great content in the dark black nothingness between podcasts, no one can hear you. Also, another minus, no oxygen.

How Breaker saves your (podcast) life

Breathe in the sweet, life-sustaining goodness of a glorious new podcast episode. How do we algorithmically find new episodes for you? The secret sauce is learning from savvy listeners with good taste.

Breaker recommendations are built primarily on likes. Essentially, if a very similar set of users likes both episode A and episode B, then those episodes are likely to have a similar kind of appeal. Based on that, fans of episode A will be recommended episode B, and vice versa. That’s the key idea.

A spoonful of mathiness

Once we dive into the details, some challenges arise. How do we mathematically quantify the similarity of users who liked various episodes, and how can we do that efficiently? How can we keep the data up to date as new user actions hit the system continuously?

When I look at the matrix, I see **Radiolab**, **This American Life**, **Welcome to Night Vale**.

The basic math considers all user actions as a giant sparse matrix. Each user gets their own row, and each episode has its own column. This complete matrix would be enormous if stored entirely in memory. But, like our vacuum between podcasts, it’s mostly empty: nearly every entry is a zero, indicating that most users have simply had no interaction yet with most available episodes.
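To make the "mostly empty" point concrete, here is a minimal sketch of one way to store such a matrix sparsely. The user names, episode IDs, and the dictionary-of-sets layout are all made up for illustration; this is not a description of Breaker's actual storage.

```python
from collections import defaultdict

# Hypothetical like events as (user, episode) pairs; all names are invented.
likes = [
    ("ana", "reply-all-102"),
    ("ana", "reply-all-103"),
    ("ben", "reply-all-102"),
    ("ben", "heavyweight-gregor"),
    ("cam", "reply-all-103"),
]

# Sparse "column" storage: for each episode, keep only the users who liked it.
# Every (user, episode) pair that is absent is an implicit zero.
likers_by_episode = defaultdict(set)
for user, episode in likes:
    likers_by_episode[episode].add(user)

# Compare what we store against the size of the full dense matrix.
users = {user for user, _ in likes}
dense_cells = len(users) * len(likers_by_episode)
stored_cells = sum(len(likers) for likers in likers_by_episode.values())
print(f"{stored_cells} stored entries instead of {dense_cells} dense cells")
```

Even in this toy case only 5 of the 9 cells need storing, and the gap grows quickly as the number of users and episodes climbs.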

The basic means of comparing two episodes relies on a normalized dot product between their columns in this matrix. Intuitively, this is very similar to asking what percentage of users who interacted with either episode happened to like both episodes. High scores here mean the episodes are probably very similar. Not only do we have a way to group episodes together, but we also have a sense of how related the episodes are. This will come in handy at recommendation time, since it allows us to put the best-fitting (highest-scoring) episode at the top of the list.
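As a rough sketch of that comparison, assume each episode column is a 0/1 vector of likes, represented here as a set of user names, and assume the normalization is the usual cosine one (the post doesn’t pin down the exact normalization). The function names and the sample data are hypothetical. A Jaccard-style overlap is included next to it, since that is the "percentage who liked both" intuition.

```python
from math import sqrt

def cosine_similarity(likers_a, likers_b):
    """Normalized dot product of two 0/1 like columns given as sets of users."""
    if not likers_a or not likers_b:
        return 0.0
    shared = len(likers_a & likers_b)  # dot product of two binary columns
    return shared / sqrt(len(likers_a) * len(likers_b))

def jaccard_overlap(likers_a, likers_b):
    """Share of users who liked either episode that liked both."""
    union = likers_a | likers_b
    return len(likers_a & likers_b) / len(union) if union else 0.0

# Made-up sets of users who liked three episodes.
episode_a = {"ana", "ben", "cam", "dee"}
episode_b = {"ben", "cam", "dee", "eli"}
episode_c = {"eli", "fay"}

print(cosine_similarity(episode_a, episode_b))
print(jaccard_overlap(episode_a, episode_b))
print(cosine_similarity(episode_a, episode_c))
```

Episodes A and B share three of their four fans, so they score 0.75 on the cosine measure and 0.6 on the overlap measure. A and C share nobody and score zero on both.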

Finally, when it’s time to update your personal recommendations, we take all of your most recent likes, add up the similarity scores of all the related episodes, and give you the top results from that computation.
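That last step can be sketched in a few lines. This is only an illustration under some assumptions: similarity is the cosine measure over sets of likers, episodes you’ve already liked are left out of the results, and the `recommend` function, its `top_n` parameter, and the sample data are all hypothetical.

```python
from collections import defaultdict
from math import sqrt

def similarity(likers_a, likers_b):
    """Cosine similarity of two 0/1 like columns given as sets of users."""
    shared = len(likers_a & likers_b)
    return shared / sqrt(len(likers_a) * len(likers_b)) if shared else 0.0

def recommend(recent_likes, likers_by_episode, top_n=3):
    """Sum similarity to each recently liked episode and return the top scorers."""
    already_liked = set(recent_likes)
    scores = defaultdict(float)
    for liked in already_liked:
        source_likers = likers_by_episode.get(liked, set())
        for candidate, likers in likers_by_episode.items():
            if candidate in already_liked:
                continue  # assumption: don't recommend what you already liked
            score = similarity(source_likers, likers)
            if score > 0:
                scores[candidate] += score
    # Highest total first; break ties by episode name for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [episode for episode, _ in ranked[:top_n]]

# Made-up data: episode -> set of users who liked it.
likers_by_episode = {
    "A": {"u1", "u2", "u3"},
    "B": {"u1", "u2"},
    "C": {"u3", "u4"},
    "D": {"u2", "u3", "u4"},
    "E": {"u5"},
}
print(recommend(["A", "B"], likers_by_episode))
```

Here episode D comes out ahead of C because it shares fans with both of the liked episodes, while E never shows up because nobody who liked A or B also liked it.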

If you’re curious, the key math looks a bit like this:
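The original formula image isn’t reproduced here, so what follows is a reconstruction from the description above rather than the exact expression. It assumes the entries of the user-by-episode matrix $M$ are 1 for a like and 0 otherwise, and that the normalization is the standard cosine one. The symbols $m_a$ (the column for episode $a$), $U_a$ (the set of users who liked episode $a$), and $L_u$ (the set of user $u$’s recent likes) are introduced just for this sketch.

```latex
% Similarity of episodes a and b: normalized dot product of their columns.
s(a, b) = \frac{m_a \cdot m_b}{\lVert m_a \rVert \, \lVert m_b \rVert}
        = \frac{\lvert U_a \cap U_b \rvert}{\sqrt{\lvert U_a \rvert \, \lvert U_b \rvert}}

% Recommendation score of a candidate episode e for user u:
% add up its similarity to each of the user's recent likes.
\mathrm{score}(u, e) = \sum_{a \in L_u} s(a, e)
```

The second form of $s(a, b)$ holds because the columns are 0/1 vectors: the dot product counts shared fans, and each squared norm counts an episode’s total likes.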

Note that you’ll have to like episodes, not just listen, so Breaker has a solid sense of what you want to hear more of! All of this happens behind the scenes, including some near-magic matrix update operations that are triggered every time you interact with Breaker.
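The post doesn’t spell out how those update operations work, so the following is purely a guess at the general shape. It keeps running like counts and co-like counts, so that one new like only touches the pairs involving that user’s other liked episodes, and similarities can be read off without rebuilding the matrix. The `LikeIndex` class and everything in it are hypothetical.

```python
from collections import defaultdict
from math import sqrt

class LikeIndex:
    """Keeps like counts and co-like counts up to date one like at a time."""

    def __init__(self):
        self.likes_by_user = defaultdict(set)  # user -> episodes they liked
        self.like_counts = defaultdict(int)    # episode -> number of likes
        self.co_likes = defaultdict(int)       # sorted episode pair -> shared likers

    def add_like(self, user, episode):
        if episode in self.likes_by_user[user]:
            return  # ignore a duplicate like
        # Only pairs involving this user's other liked episodes change.
        for other in self.likes_by_user[user]:
            self.co_likes[tuple(sorted((episode, other)))] += 1
        self.likes_by_user[user].add(episode)
        self.like_counts[episode] += 1

    def similarity(self, a, b):
        """Cosine similarity computed from the running counts."""
        shared = self.co_likes.get(tuple(sorted((a, b))), 0)
        if shared == 0:
            return 0.0
        return shared / sqrt(self.like_counts[a] * self.like_counts[b])

index = LikeIndex()
index.add_like("ana", "ep1")
index.add_like("ana", "ep2")
index.add_like("ben", "ep1")
before = index.similarity("ep1", "ep2")
index.add_like("ben", "ep2")  # one new like nudges the score upward
after = index.similarity("ep1", "ep2")
print(before, after)
```

The appeal of this design is that the cost of a new like scales with how many episodes that one user has liked, not with the size of the whole data set.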

Example: Reply All

On July 27th, 2017, Gimlet Media released episode #102 of their podcast Reply All, entitled Long Distance, in which a phone scammer gets reverse scammed. It’s a great two-part story, good enough to merit its own stories in Wired and Ars Technica. Based on this single episode, the new Breaker algorithm makes these episode suggestions as well:

The algorithm’s primary recommendation — if you haven’t already heard it — is actually to listen to part 2 of the source episode (Long Distance). It also recommends episodes like Gregor from the show Heavyweight — a show from the same podcast company as Reply All, and one that similarly peers into the drama and introspection of being human.

Keep in mind that when you check your own recommendations, they’ll actually be based on more than a single episode — assuming you’ve liked at least two episodes so far! And they change in the background every time you like a new episode. Like more episodes, get new recommendations.

The future of podcast-space exploration

I’m excited about the new world this Breaker feature opens for listeners — but at the same time, it’s just the beginning. There are many possible directions to explore from here. A few things we’re considering include:

Incorporating overall episode popularity in the recommendations ranking; currently, popularity is used only implicitly, through the user-likes matrix.
Learning how much particular listeners enjoy entirely new podcasts versus just finding the best episodes within the podcasts they already enjoy.
Learning which podcasts are most hop-in friendly, as opposed to those that really make the most sense to listen to sequentially.

The end goal is to get the skinny on what you, personally, would most love to hear. Essentially, it’s to find the great taste of a sophisticated curator who knows everything you like, and everything everyone else likes as well. This is a lofty goal, but we do have an awesome data set, thanks to you, and for now we’re content to let that good taste shine through to people who are about to discover something new and weird and wonderful.

So do yourself a solid: download Breaker, listen and like your favorite episodes, and let us know what you think of your personalized recommendations!