Do exclusive releases drive increases in copyright infringement?
Our new project looks at how discourse about copyright infringement shifts in response to changes in digital media markets. We want to find out what impact shutting down high-profile peer-to-peer sites or launching new subscription services has on consumer attitudes to infringement.
Our prior research (with Paula Dootson) confirmed what we often hear anecdotally: consumers frequently justify infringing copyright by pointing to the lack of cheap, easy, accessible channels for getting content legitimately. Consumers blame rightsholders for failing to meet market demand, and this encourages a social norm that infringing copyright, while illegal, is not morally wrongful.
We want to explore how market changes drive changes in attitude. To do this, we are looking at how people talk about copyright infringement on social media. Using machine learning techniques, we set out to measure the volume and proportion of statements in support of and in opposition to illicit downloading.
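To make "volume and proportion" concrete, here is a minimal sketch of how the two measures could be computed once each post has a label. The records, the label names ("pro", "anti", "irrelevant") and the `daily_summary` helper are all hypothetical and invented for illustration; this is not our actual pipeline or data.

```python
from collections import Counter, defaultdict

# Hypothetical (date, label) pairs, invented for illustration only.
labelled_posts = [
    ("2016-02-14", "pro"),
    ("2016-02-14", "pro"),
    ("2016-02-14", "anti"),
    ("2016-02-14", "irrelevant"),
    ("2016-02-15", "pro"),
    ("2016-02-15", "anti"),
    ("2016-02-15", "anti"),
]


def daily_summary(posts):
    """Count pro and anti posts per day and the pro share of relevant posts."""
    counts = defaultdict(Counter)
    for date, label in posts:
        # Irrelevant posts are dropped, mirroring the "not shown" bucket.
        if label in ("pro", "anti"):
            counts[date][label] += 1

    summary = {}
    for date, c in sorted(counts.items()):
        relevant = c["pro"] + c["anti"]  # always >= 1 for dates kept above
        summary[date] = {
            "pro": c["pro"],                   # volume of pro-infringement posts
            "anti": c["anti"],                 # volume of anti-infringement posts
            "pro_share": c["pro"] / relevant,  # proportion of relevant posts
        }
    return summary


summary = daily_summary(labelled_posts)
```

Tracking both numbers matters because a release can push up the raw volume of pro-infringement posts without changing their share of the conversation, and vice versa.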
This is a first look at some of our preliminary findings. It is work in progress and should not be relied upon, but we are excited about the potential to dig deeper and explore some of these issues.
We have been working with Max Kelsen, a data analytics agency, and using software developed by Crimson Hexagon to examine discussions about infringement on popular social media (at this stage, primarily but not exclusively Twitter). We used keyword searches designed to find explicit discussions about copyright infringement and downloading.
We use Crimson Hexagon’s Brightview supervised machine learning algorithm to classify tweets into three buckets: for copyright infringement (yellow in our graphs), against copyright infringement (blue), and irrelevant (not shown). We trained the monitor by hand-coding approximately 750 relevant tweets and excluding about 2000 irrelevant tweets. We did this iteratively: coding a sample first, then validating and improving the reliability of the algorithm by exploring the results. This is exploratory: while we are satisfied that the algorithm works well enough to help us understand trends, we have not rigorously validated the quantitative results.
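For readers unfamiliar with supervised text classification, the general idea of training on hand-coded examples can be sketched as follows. This is not Crimson Hexagon's algorithm, whose internals we do not describe here; it is a simple naive Bayes classifier swapped in purely for illustration, and the training sentences and label names are invented.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # class prior
            for word in tokenize(text):
                if word not in self.vocab:
                    continue  # skip words never seen in training
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented stand-ins for hand-coded tweets; a real training set is far larger.
train = [
    ("not paying for another subscription, just going to torrent the album", "pro"),
    ("if it is exclusive I will pirate it", "pro"),
    ("stop pirating the album and support the artist", "anti"),
    ("buy the album, illegal downloading hurts artists", "anti"),
    ("great concert last night", "irrelevant"),
    ("new phone arrived today", "irrelevant"),
]
texts, labels = zip(*train)
model = NaiveBayes().fit(texts, labels)

label_a = model.predict("I will just torrent it")
label_b = model.predict("support the artist and buy the album")
```

The iterative step described above corresponds to inspecting predictions like these, hand-coding the posts the model gets wrong, and refitting.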
The first striking thing to note is that pro-infringement discourse seems to fall off quite sharply after 2014, having built steadily since Twitter’s launch. This is true both in terms of volume and as a proportion of tweets. Explicit anti-infringement sentiment is visible, but at a much lower volume throughout most of our sample.
Note that explicit discussions about copyright infringement are quite low in volume compared to other discussions around popular media. Most discussions concern music downloading, which is consistent with our concurrent analysis of mainstream media, where stories about music downloading also dominate.
Exclusive releases are the first phenomenon we thought would be interesting to drill down into. One of Tidal’s key strategies this year was to get high-profile artists to release their albums exclusively on the platform. Tidal hopes to drive subscriber numbers up, but we suspect that its strategy will also drive up infringement. We think we might be able to see both short-term and long-term effects in our data.
Here we see a massive spike in pro-infringement discourse when Kanye released The Life of Pablo exclusively on Tidal. When we look at the data, there are many tweets that are highly critical of Kanye’s decision: many people were quite willing to say that they would download it illicitly rather than sign up for Tidal.
We need to be careful here: we see in many other places that large spikes in volume of pro-infringement tweets accompany many highly anticipated launches. This may not be distinctive to exclusive releases. We need to dig a little deeper, but Kendrick Lamar’s release of ‘untitled unmastered’ (which also started at the top of the Billboard 200) a few weeks later on all major digital channels did not produce much of a spike in pro-infringement tweets.
So let’s dig deeper, and have a look at Beyonce’s release of Lemonade as a Tidal exclusive on 24 April 2016. This move was so controversial that Beyonce ended up releasing on iTunes only 24 hours later — although Tidal is still the only streaming service that carries the album.
This one is fascinating. The first peak is Beyonce’s release of Lemonade, which also debuted at #1 on the Billboard 200. Here we see a big relative increase in pro-infringement discourse.
The second peak is Drake’s release of Views, also a Billboard chart topper. Views was released exclusively on Apple Music and iTunes. Here too we see a big increase in pro-infringement tweets — users were complaining a lot about iTunes/Apple Music exclusivity. What is fascinating to note here, though, is that we can see Drake fans also criticising people for illicitly downloading the album — making up much of the blue spike on 29 April.
Let’s just quickly compare to Taylor Swift’s release of 1989 on 27 October 2014 on iTunes, Google Play, and Amazon Music.
After an initial peak in pro-infringement discourse, Swift fans quickly started criticising others for illicit downloading. But what is really striking here is Swift’s decision to pull her other albums from Spotify on 3 November 2014. This prompted a lot of criticism from upset users who vocally supported infringement as a response.
Next steps and limitations
- First, I need to point out again that this is very preliminary exploration. We are excited about the possibility of using machine learning tools to help with this type of analysis, but we need to spend a lot more time refining the categories and validating the results.
- We are also only catching a relatively small proportion of discourse here — where people are explicitly talking about infringement. To be really useful, we need to look much more closely at other posts around media releases. There are a lot of tweets where people explain that they are buying these albums legitimately that we excluded because we couldn’t easily separate them from marketing tweets. We need to find a way to work through this data in a more comprehensive way.
- Overall, though, I am very excited about what we might be able to do here. The initial results are very promising. Next, we will try to do this same classification with the new batch of deep learning tools that are now available (e.g. Watson’s Natural Language Classifier), and spend some more time validating the results. This should dramatically increase the reliability of our classification and allow us to draw more robust conclusions.
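As an indication of what validating the results could involve, one common approach is to compare the classifier's labels with a held-out set of hand-coded posts and report precision and recall for each category. The sketch below assumes that approach; the labels and the `precision_recall` helper are hypothetical, and the data is invented rather than drawn from our sample.

```python
def precision_recall(gold, predicted, label):
    """Return (precision, recall) for one label, comparing predictions to hand-coded labels."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)  # correctly flagged
    fp = sum(1 for g, p in pairs if g != label and p == label)  # wrongly flagged
    fn = sum(1 for g, p in pairs if g == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented example: hand-coded labels versus classifier output for six posts.
gold = ["pro", "pro", "anti", "irrelevant", "pro", "anti"]
predicted = ["pro", "anti", "anti", "irrelevant", "pro", "pro"]

pro_precision, pro_recall = precision_recall(gold, predicted, "pro")
anti_precision, anti_recall = precision_recall(gold, predicted, "anti")
```

Low precision on the "pro" category would mean the yellow spikes in our graphs are inflated by misclassified posts; low recall would mean we are undercounting them.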