Apple Podcasts: Poor for Privacy
Apple Podcasts is surprisingly keen to share information with podcast hosts, it turns out.
You’d think, from some of the coverage on privacy, that Apple is the saviour of the internet, and single-handedly saving us from those that would spread our personal data as far as it’ll go.
Apple is, undoubtedly, doing some good. The new privacy label in the App Store is certainly alarming some companies, as it should. The EFF, which I’m a personal member of, calls it “one more step in the right direction”, and I’d agree.
Apple are using their heft and scale to force change for online privacy. This is a good thing.
But not all of Apple’s products are as private as they should be. And one of them, which doesn’t have a privacy label since it’s part of the underlying OS, is built very poorly indeed when it comes to privacy.
Let’s talk about podcasts
When Michael Barbaro releases a new strangely-intonated show for the New York Times, the first thing that happens is that an audio file is uploaded to a company called ART19. The audio file sits on ART19’s servers.
Within minutes of Michael forcing his way through another rendition of the oddly-spaced “Here’s… what… else you-need tknwtdy”, ART19 make a change to a thing called an RSS feed: a small computer file that lists all the available episodes for that podcast.
A computer server working for Overcast, or Pocket Casts, or Google Podcasts, or even Spotify, checks this RSS feed quite often: probably every few minutes (roughly 400 times a day).
If Overcast’s server notices that there’s a new episode in this file, then it immediately tells every single app that is subscribed to The Daily, and within seconds, Overcast apps across the world are downloading the new audio.
Every podcast app works this way: a computer server in “the cloud” somewhere, continuously checking these podcast RSS feeds. When it changes, it either tells every app out there that there’s a new show to download, or it waits for the app to check in with it. “Any new shows for me?” “Yes, there’s a new edition of The Daily, and a new episode of Crime Junkie.”
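To make that central check concrete, here is a minimal sketch in Python. It is an illustration under assumptions, not how any particular app does it: the feed contents, the GUIDs and the `new_episodes` helper are all made up, and a real poller would fetch the feed over HTTP rather than read it from an inline string.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS feed standing in for what a podcast host publishes.
FEED_XML = """<rss version="2.0">
  <channel>
    <title>Example Show</title>
    <item>
      <title>Episode 2</title>
      <guid>ep-2</guid>
      <enclosure url="https://example.com/ep2.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 1</title>
      <guid>ep-1</guid>
      <enclosure url="https://example.com/ep1.mp3" type="audio/mpeg" length="456"/>
    </item>
  </channel>
</rss>"""


def new_episodes(feed_xml, seen_guids):
    """Return (guid, title, audio_url) for items not seen on earlier polls."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        guid = item.findtext("guid")
        if guid is None or guid in seen_guids:
            continue
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        fresh.append((guid, item.findtext("title"), audio_url))
    return fresh


# The server remembers what it saw on the previous poll...
seen = {"ep-1"}
# ...so only the newly published episode is passed on to subscribed apps.
fresh = new_episodes(FEED_XML, seen)
for guid, title, url in fresh:
    print(f"new episode {guid}: {title} -> {url}")
```

The point of the design is that only this one server ever talks to the podcast host about the feed. The apps talk to the server, and meet the host only when they fetch the audio.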
Every podcast app works this way.
Except Apple Podcasts.
Why’s Apple Podcasts different?
Apple Podcasts doesn’t use a computer server in the cloud for this sort of thing. Instead, by design, every copy of the Apple Podcasts app checks each RSS feed you’re subscribed to.
So your phone is checking directly with the podcast hosting company, every hour of every day by default, whether Michael Barbaro has uttered the words “We’ll be right back” one more time. If you subscribe to twenty podcasts, it’ll check twenty different RSS feeds.
This is quite bad for privacy, in a few different ways:
When you connect to my server, I know a few things about your device: an IP address and a useragent, which is a signature of the device that is asking for the data. A typical useragent is:
Podcasts/1530.3 CFNetwork/1209 Darwin/20.3.0
That tells me the versions of a few different pieces of software on your device. Which means…
- The IP address may, loosely or tightly, define your location or your household. There are plenty of ‘device graph’ data brokers who will be able to work out what company you work for, or whether you share your household with someone who fits the pattern of a 35-year-old woman.
That’s not that remarkable. Every proper podcast app has to, because of the way it works, allow the phone itself to download the audio file: and the useragent and the IP address are the only things they get there.
- Relatively unusually, the useragent for Apple Podcasts is helpfully translated into the language that your phone is set to; so “Podcasty”, or “Подкасти” also appear in the list.
This is a bit more unusual: but it’s not the worst thing in the world. It’s a bit more of an identifying piece of data, but that’s about as far as it goes.
- With Apple Podcasts, the podcast hosting company isn’t just getting this data every time a piece of audio is being downloaded. They’re getting it every time your phone checks the RSS feed.
- Because the useragent contains version numbers of three separate libraries, as well as the language that the user’s device is set to, it becomes quite easy to fingerprint. But it’s made much easier by the fact that — uniquely with Apple Podcasts — this data is sent every time your phone checks the RSS feeds.
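Here is a rough sketch, in Python, of how little work it takes to turn that useragent into a fingerprint. The `parse_user_agent` and `fingerprint` helpers are hypothetical names of my own, and the parsing assumes every token is a simple `Name/version` pair separated by spaces, which holds for the examples above but may not hold for every localised app name.

```python
def parse_user_agent(user_agent):
    """Split 'Name/version Name/version ...' into a {name: version} dict."""
    parts = {}
    for token in user_agent.split():
        name, _, version = token.partition("/")
        parts[name] = version
    return parts


def fingerprint(user_agent):
    """A crude fingerprint: the app name (which reveals the device language)
    plus the version of every component, in a stable order."""
    return tuple(sorted(parse_user_agent(user_agent).items()))


english = "Podcasts/1530.3 CFNetwork/1209 Darwin/20.3.0"
localised = "Подкасти/1530.3 CFNetwork/1209 Darwin/20.3.0"

print(parse_user_agent(english))
# Same software versions, different device language: a different fingerprint.
print(fingerprint(english) == fingerprint(localised))
```

On its own this only narrows things down to "everyone on the same build in the same language". It becomes more identifying once it is combined with which feeds are being checked, and from where.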
If I’m a big podcast host, chances are that a podcast listener might be listening to more than one of my shows. There might be hundreds of thousands of subscribers to The Daily, but how many also listen to Grumpy Old Geeks as well as comedy variety show Just Between Us? They’re all hosted by the same podcast host, as one example. And if they’re all checked by the same device every hour, a phone running a French language version of the Podcasts app, and version 1489.2, that combination is likely to be close to unique.
Meg, our listener to The Daily, who also enjoys geek talk and a girlie comedy chat podcast, is checking into the podcast host every hour. When she leaves the house, her phone keeps checking: but from a different IP address, one which is clearly from her cellphone company. When she gets to work, and connects to the office wifi, she again changes IP address. And the podcast hosting company knows. Every hour.
It’s not beyond the realms of possibility for the podcast hosting company to know when Meg leaves the house every day. And whether she’s working late tonight. And when she’s on holiday, or working from home.
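As a sketch of how that could work, assume the host has already grouped its hourly feed requests by fingerprint. Everything in this Python example is invented for illustration: the request log, the network labels, and the IP ranges (which are reserved documentation addresses). A real host would need an IP-to-network lookup from a data broker or similar.

```python
from ipaddress import ip_address, ip_network

# Hypothetical mapping from networks to what a data broker might label them.
# The ranges are reserved documentation addresses, not real networks.
NETWORKS = {
    ip_network("192.0.2.0/24"): "home broadband",
    ip_network("198.51.100.0/24"): "mobile carrier",
    ip_network("203.0.113.0/24"): "office wifi",
}

# Made-up hourly RSS requests from one fingerprinted device: (hour, IP).
REQUESTS = [
    (7, "192.0.2.10"),
    (8, "198.51.100.77"),
    (9, "203.0.113.5"),
    (10, "203.0.113.5"),
    (17, "203.0.113.5"),
    (18, "198.51.100.80"),
    (19, "192.0.2.10"),
]


def classify(ip):
    """Label an IP address by the first known network that contains it."""
    address = ip_address(ip)
    for network, label in NETWORKS.items():
        if address in network:
            return label
    return "unknown"


def movements(requests):
    """Return (hour, label) each time the device shows up on a new network."""
    changes = []
    previous = None
    for hour, ip in requests:
        label = classify(ip)
        if label != previous:
            changes.append((hour, label))
            previous = label
    return changes


timeline = movements(REQUESTS)
for hour, label in timeline:
    print(f"{hour:02d}:00 now on {label}")
```

Nothing here is clever. The hourly check-in does the hard part by supplying a regular stream of timestamps and IP addresses for the same device.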
But only with Apple Podcasts.
It isn’t clear where this data is going, nor is it done with consent
Apple Podcasts doesn’t tell you who hosts the podcast you’re subscribing to: it’s not surfaced anywhere in the app.
So you’ve no way of knowing whether this personal data is going to a nice, sensible podcast hosting company, or one that isn’t so nice.
There’s some open data (which, disclosure, I contribute to) which could tell the listener who is hosting what podcast. That can also tell you how else the company might use your data. Apple doesn’t show this data.
Now, that’s not too dissimilar to a web-browser; but the difference is that a web-browser isn’t reporting in every single hour telling NefariousPodcastHost where you are; and also never mentioning who NefariousPodcastHost is.
Those hourly checks are also visible to your mobile data provider, and your ISP, by the way; and since some podcast hosts use unique, identifiable domains for each podcast they host, it’s also advertising what you listen to, to your internet company or anyone else who can see that data.
Nobody else works this way
Every other app leaves the potentially privacy-problematic RSS polling to a central set of servers. It makes life much easier for the app — which can check with one central place every hour whether there are any new shows; or, better, be notified instantly when there are new shows to download.
Apple Podcasts is unique in that it both stores your podcast subscriptions centrally (so Apple knows which podcasts you’re subscribed to), but also polls directly to the podcast hosting company.
Apple’s approach also causes an interesting set of problems for podcast hosts: because, not only do they need to be careful with user privacy, they also need to be resilient to an awful lot of traffic.
I host my own podcast, which has about 1,500 downloads a day: and, yes, I see all those pings to my own RSS feed. (Since I only host one podcast, I’m unlikely to be able to track you, you’ll be glad to know.)
Overcast made 408 requests for my RSS feed yesterday. Amazon Music made 482. PodcastAddict, 379. And Apple Podcasts made… 3,438.
If I doubled the number of listeners to my podcast, Overcast would still make 400 or so requests, as would PodcastAddict. But Apple Podcasts would make almost 7,000 requests. And so it goes on.
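The arithmetic behind those figures can be sketched in a few lines of Python. The request counts are the ones quoted above. The big assumption, labelled in the code, is that every Apple Podcasts device checks exactly once an hour, which won’t be quite true in practice since phones go offline; `daily_requests` is a made-up helper for the illustration.

```python
CHECKS_PER_DAY = 24  # assumption: one check per hour, per device

apple_requests = 3438    # Apple Podcasts requests seen in one day
overcast_requests = 408  # Overcast's central poller, same day

# If every Apple Podcasts device polls hourly, the request count implies
# roughly this many subscribed devices.
implied_devices = apple_requests / CHECKS_PER_DAY


def daily_requests(listener_multiplier):
    """Estimate daily feed requests if the audience grows by a multiplier."""
    return {
        # Central polling: the request count ignores audience size.
        "Overcast": overcast_requests,
        # Per-device polling: the request count scales with the audience.
        "Apple Podcasts": round(
            implied_devices * listener_multiplier * CHECKS_PER_DAY
        ),
    }


print(round(implied_devices))
print(daily_requests(2))
```

Under that assumption, about 143 devices account for all of those Apple Podcasts requests, and doubling the audience gives 6,876 requests a day while Overcast stays at 408.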
I can see all kinds of Apple Podcasts clients connecting to my server, with different build numbers and from different IP addresses. And to be clear, of all the major podcast apps, it’s only Apple Podcasts that does this. Nobody else* disrespects the privacy of their users in this way.
Apple could do a lot to make its podcast app better: but not letting podcast hosts spy on their listeners, every hour of every day, would be a good first step. After all, most podcast hosts don’t even have clear privacy policies.
. . .
(* Of course, Spotify is overall much worse; though it doesn’t tell podcast hosts about you every hour.)