
How Much Fun is Maximum Fun?
I’m a big consumer of podcasts. Ever since I started living on my own while in graduate school, I’ve found that having funny and interesting people in my ears helps me get through the day. Even now that I’m cohabiting with my wife, I haven’t left my trusty podcasts behind. They’re great for the long commutes back and forth to San Diego, for reducing my stress while stuck in LA traffic, and for making me laugh while I cook, clean, and exercise.
I’m a fan and donor (support the things you love!) of one podcast network in particular, Maximum Fun, so much so that I’m a semi-regular participant in their Facebook group and subreddit. Recently, someone in the Facebook group asked for some data about the network, in particular the number of shows that have been published, in order to visualize the network’s growth. As someone who’s keen to keep my data analysis skills fresh and nimble, I took this as an opportunity to dive back into R. Here’s what I’ve done so far:
Podcasts are unique in that they’re basically just a simple RSS feed of audio files. That feed has data embedded in it (titles, publication dates, durations) that we can access and save. I’m pretty new to web scraping, but I was able to find a really nice example of how to scrape an RSS feed in R here. I adapted that to scrape and save data from each of the podcasts in the Maximum Fun network.
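My scraping was done in R, but the idea carries over to any language. As a rough illustration, here is a minimal Python sketch using only the standard library. It assumes each feed is ordinary RSS 2.0 with Apple’s iTunes tags (such as `itunes:duration`), and the two-episode feed is a made-up stand-in, not real Maximum Fun data.

```python
import xml.etree.ElementTree as ET

# Namespace used by Apple's podcast tags such as <itunes:duration>.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

def parse_feed(xml_text):
    """Return one dict per <item> with its raw title, pubDate and duration."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        episodes.append({
            "title": item.findtext("title"),
            "pub_date": item.findtext("pubDate"),
            # Namespaced tags are looked up in {namespace}tag form.
            "duration": item.findtext(f"{{{ITUNES_NS}}}duration"),
        })
    return episodes

# Hypothetical two-episode feed, for illustration only.
SAMPLE_FEED = """<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <pubDate>Mon, 09 Jan 2017 08:00:00 -0800</pubDate>
      <itunes:duration>1:02:03</itunes:duration>
    </item>
    <item>
      <title>Episode 1</title>
      <pubDate>Mon, 02 Jan 2017 08:00:00 -0800</pubDate>
      <itunes:duration>45:10</itunes:duration>
    </item>
  </channel>
</rss>"""

episodes = parse_feed(SAMPLE_FEED)
print(len(episodes), episodes[0]["title"], episodes[0]["duration"])
```

Reading each `<item>` as a unit means a missing tag shows up as `None` for that episode instead of shifting every later title or duration out of line.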
I probably could have written a function to run through all the shows, but instead I processed each show individually. That turned out to be useful, as a few of the shows had missing episodes, or titles and durations that didn’t match up.
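For comparison, the function-based route might look something like this Python sketch. It loops over several feeds, tags each episode with a show `id`, and sets aside any item missing a title or duration for a manual look. The show ids, the `make_feed` helper and the feed contents are all hypothetical; the compact `parse_feed` is repeated so the block runs on its own.

```python
import xml.etree.ElementTree as ET

ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"

def parse_feed(xml_text):
    """Parse each <item> separately so a missing field becomes None
    instead of misaligning titles and durations."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title"),
            "pub_date": item.findtext("pubDate"),
            "duration": item.findtext(f"{{{ITUNES_NS}}}duration"),
        }
        for item in root.iter("item")
    ]

def combine_feeds(feeds):
    """feeds maps a show id to its raw RSS text.
    Returns (complete episodes, episodes needing manual review)."""
    combined, flagged = [], []
    for show_id, xml_text in feeds.items():
        for ep in parse_feed(xml_text):
            ep["id"] = show_id  # label every episode with its show
            if ep["title"] and ep["duration"]:
                combined.append(ep)
            else:
                flagged.append(ep)
    return combined, flagged

def make_feed(*items):
    """Hypothetical helper: build a tiny RSS document from (title, duration)
    pairs; None leaves that tag out."""
    parts = []
    for title, duration in items:
        tags = ""
        if title is not None:
            tags += f"<title>{title}</title>"
        if duration is not None:
            tags += f"<itunes:duration>{duration}</itunes:duration>"
        parts.append(f"<item>{tags}</item>")
    return (f'<rss xmlns:itunes="{ITUNES_NS}"><channel>'
            + "".join(parts) + "</channel></rss>")

# Toy feeds: show_b has one item with no duration.
feeds = {
    "show_a": make_feed(("A1", "30:00"), ("A2", "31:15")),
    "show_b": make_feed(("B1", "1:00:00"), ("B2", None)),
}
combined, flagged = combine_feeds(feeds)
print(len(combined), [ep["title"] for ep in flagged])
```

Flagging instead of silently dropping keeps the same benefit I got from going show by show: the odd episodes surface for a human to check.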
Edit: I was able to find the feed for The Goosedown and the entire backlog of Bullseye/The Sound of Young America, and I have updated the data and visualizations to include them.
Once I had all the data scraped from the feeds, I combined it into one dataset of 4,202 episodes from 25 different shows. The date and duration variables were pretty messy, so I noodled around a bit and cleaned them up into something manageable. I’ve saved the final data in .RData and .csv formats if you want to play with it yourself.
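What “cleaning up” means depends on the feeds, but a common version is turning `itunes:duration` strings (which may look like `H:MM:SS`, `MM:SS` or plain seconds) into a number of seconds and RFC 822 `pubDate` strings into calendar dates. Here is a hedged Python sketch of that step plus a CSV export; the episodes are toy data and the column names are my own choice.

```python
import csv
import io
from email.utils import parsedate_to_datetime

def duration_to_seconds(raw):
    """Convert an itunes:duration value ('H:MM:SS', 'MM:SS' or '3600') to
    seconds; return None if it can't be parsed."""
    if raw is None:
        return None
    try:
        numbers = [int(p) for p in raw.strip().split(":")]
    except ValueError:
        return None
    if len(numbers) > 3:
        return None
    seconds = 0
    for n in numbers:  # fold hours/minutes/seconds left to right
        seconds = seconds * 60 + n
    return seconds

def parse_pub_date(raw):
    """Parse an RFC 822 pubDate into a date (in the feed's own time zone)."""
    return parsedate_to_datetime(raw).date()

def clean(episodes):
    return [
        {
            "id": ep["id"],
            "title": ep["title"],
            "date": parse_pub_date(ep["pub_date"]).isoformat(),
            "seconds": duration_to_seconds(ep["duration"]),
        }
        for ep in episodes
    ]

# Hypothetical raw episodes, as a scraper might return them.
raw = [
    {"id": "show_a", "title": "A1", "pub_date": "Mon, 02 Jan 2017 08:00:00 -0800", "duration": "45:10"},
    {"id": "show_b", "title": "B1", "pub_date": "Tue, 03 Jan 2017 09:30:00 +0000", "duration": "1:02:03"},
    {"id": "show_b", "title": "B2", "pub_date": "Tue, 10 Jan 2017 09:30:00 +0000", "duration": "3600"},
]
rows = clean(raw)

# Written to an in-memory buffer here; to save to disk, use
# open("episodes.csv", "w", newline="") instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "title", "date", "seconds"])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Storing durations as plain seconds makes every later total (hours per show, hours per month) a simple sum.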
Visualizations
Once we have all the data in a good format creating visualizations is actually pretty easy! Let’s start with a simple bar chart that plots the number of shows per month:
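Behind a chart like this is just a count of episode rows per calendar month. A minimal Python sketch of that aggregation, over a handful of hypothetical cleaned rows, with a crude text bar standing in for a real plot:

```python
from collections import Counter
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def episodes_per_month(rows):
    """Count episodes released in each (year, month), in chronological order."""
    counts = Counter((d.year, d.month) for _, d, _ in rows)
    return sorted(counts.items())

for (year, month), n in episodes_per_month(episodes):
    print(f"{year}-{month:02d}: {'#' * n} ({n})")
```

Months with no releases simply don’t appear here; a real bar chart may want those filled in as zeros.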

That’s not bad, but what if we wanted to know which shows were on the network over time? We can use the “id” variable we created in the initial scraping process to label each show. This visualization needs a more distinct color palette to differentiate between the shows, but I’ll leave it here for now:
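A stacked version needs the same monthly counts split by show `id`, so each show becomes one segment (and one color) of each bar. A Python sketch of that reshaping, again on toy rows:

```python
from collections import defaultdict
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def monthly_counts_by_show(rows):
    """Return {(year, month): {show_id: episode count}}, the shape a
    stacked bar chart needs."""
    table = defaultdict(lambda: defaultdict(int))
    for show_id, d, _ in rows:
        table[(d.year, d.month)][show_id] += 1
    # Convert to plain dicts so the result prints and compares cleanly.
    return {month: dict(shows) for month, shows in sorted(table.items())}

print(monthly_counts_by_show(episodes))
```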

What can we find out about each show? Let’s start with visualizing the total number of hours each podcast has published:
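Once durations are stored in seconds, total hours per show is a grouped sum divided by 3,600. A Python sketch on toy rows, skipping any episode whose duration couldn’t be parsed:

```python
from collections import defaultdict
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def hours_per_show(rows):
    """Sum episode durations per show and convert seconds to hours."""
    totals = defaultdict(int)
    for show_id, _, seconds in rows:
        if seconds is not None:  # skip durations that failed to parse
            totals[show_id] += seconds
    return {show_id: round(total / 3600, 2) for show_id, total in sorted(totals.items())}

print(hours_per_show(episodes))
```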

How about the number of episodes per show?
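Episodes per show is the simplest tally of all: count rows by `id`. In Python, `collections.Counter` does it in one line; the rows below are toy data.

```python
from collections import Counter
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def episodes_per_show(rows):
    """Count episodes for each show, most prolific first
    (ties keep the order in which shows first appear)."""
    return Counter(show_id for show_id, _, _ in rows).most_common()

print(episodes_per_show(episodes))
```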

I also got around to reformatting the data so that we could look at the number of shows and amount of content produced by Maximum Fun over time.
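There is more than one way to define “number of shows over time.” One option, which is my assumption here rather than necessarily what the chart uses, is a running count of shows that have launched by each month, where a show’s launch is the month of its first episode. A Python sketch on toy rows:

```python
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def shows_launched_by_month(rows):
    """Cumulative number of shows whose first episode falls in or before each month."""
    first_month = {}
    for show_id, d, _ in rows:
        month = (d.year, d.month)
        if show_id not in first_month or month < first_month[show_id]:
            first_month[show_id] = month
    months = sorted({(d.year, d.month) for _, d, _ in rows})
    return [(m, sum(1 for fm in first_month.values() if fm <= m)) for m in months]

print(shows_launched_by_month(episodes))
```

This count only ever grows, so it won’t reflect shows that later ended or left the network.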

What about the amount of content over time?
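Content over time can be read as a running total of published hours. A Python sketch that sums seconds per month on toy rows and accumulates them in chronological order:

```python
from collections import defaultdict
from datetime import date

# Toy cleaned rows (show id, release date, duration in seconds); not real data.
episodes = [
    ("show_a", date(2017, 1, 2), 2710),
    ("show_a", date(2017, 1, 9), 2800),
    ("show_b", date(2017, 1, 3), 3723),
    ("show_b", date(2017, 2, 7), 3600),
    ("show_c", date(2017, 2, 14), 3500),
]

def cumulative_hours_by_month(rows):
    """Running total of published hours at the end of each month."""
    monthly = defaultdict(int)
    for _, d, seconds in rows:
        if seconds is not None:  # skip durations that failed to parse
            monthly[(d.year, d.month)] += seconds
    running, result = 0, []
    for month in sorted(monthly):
        running += monthly[month]
        result.append((month, round(running / 3600, 2)))
    return result

print(cumulative_hours_by_month(episodes))
```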

Unfortunately, this doesn’t include some of the great shows that have moved on to be either independently operated or part of another network, but it’s still a pretty good approximation of the network’s growth over time.
I’ll likely keep noodling around with this data. There’s probably a lot more I can do with visualizing particular shows and the network as a whole. If you have ideas, get in touch!
Originally published at ernestoramirez.com.