A review of Pitchfork reviews (part 1: exploratory and time-series analysis)

Daniel Cañueto
8 min read · Aug 13, 2019

An analysis of Pitchfork reviews across 20 years with Python

(The notebook of part 1 is here: https://www.kaggle.com/danielcanueto/pitchfork-reviews-analysis-part-1)

Reader:

Heh, I don’t know if I wanna waste some minutes of my life reading this.

My answer:

Read this if you wanna know about Pitchfork’s:

  • Review scores by genre
  • Percentage of reviews by genre
  • Review scores by review author
  • Trend changes in review scores over these years
  • The importance of the day of the week and the time of the year
  • The eras when they were bolder about giving <5.0 scores
  • Whether they really use the decimals as they should
  • And, in the 2nd part, analyses of their writing style!

Introduction

Hey! Did you know Pitchfork is building a (pay)wall? From the end of 2019 on, it’ll be necessary to pay to access their full archive.

How did I react to this news? Well, it feels like some rite of passage. Pitchfork has accompanied me for fifteen years. For years, it was almost the first website I read after waking up. I follow it less and less these days. But, still, it feels like a coming-of-age symbol. I guess that, for example, football fans feel the same when their idols start retiring.

In case you don’t know what I’m talking about, Pitchfork is an influential music website oriented to urban and alternative music, for young people and for people who still want to feel young. Their influence in setting and destroying trends, and their famous reviews, which use decimals in the scores and at times a verbose writing style (and other times not so verbose…), have caused love-or-hate reactions among thousands of music lovers during these twenty years.

Count me among the lovers. Without their mix of highbrow and lowbrow, for example, I would never have discovered Steve Reich or multiple genres of African music. Their end-of-year lists were mandatory reading to get a first good taste of styles I had not really listened to yet. Probably, without them, I would have restricted myself much more to a specific genre instead of jumping between rap, classical music, electronic music or many other styles, depending on what I needed at that moment of my life. They were the first drug that paved the way to other drugs like Resident Advisor or RateYourMusic.

If there is one thing I would miss from losing access to their archive, it would be their reviews. Fortunately, there is a dataset of Pitchfork reviews with the complete review text, the score, the author, the date, the artist, the album and other interesting information:

It is easy to see that this database is interesting just from the number of kernels using it. Some interesting analyses have been published based on this dataset (examples 1, 2 and 3). However, some of these analyses are a little outdated. Also, I feel they do not explore juicy information that I believe other w̶e̶i̶r̶d̶o̶s̶ fans like me will find interesting. By the way, it was a cool chance to refresh my Python Data Science skills.
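As a starting point, here is a minimal sketch of how the dataset can be loaded. The file, table and column names (database.sqlite, reviews, genres, reviewid, pub_date) are assumptions based on the public SQLite dump, so adjust them to whatever version you download.

```python
# Minimal loading sketch. Table and column names are assumptions based on
# the public SQLite dump of the Pitchfork reviews dataset.
import sqlite3
import pandas as pd

conn = sqlite3.connect("database.sqlite")  # path to the downloaded file
reviews = pd.read_sql_query("SELECT * FROM reviews", conn)
genres = pd.read_sql_query("SELECT * FROM genres", conn)
conn.close()

# One row per review, with its (first) genre attached.
df = reviews.merge(genres.drop_duplicates("reviewid"), on="reviewid", how="left")
df["pub_date"] = pd.to_datetime(df["pub_date"])
df["pub_year"] = df["pub_date"].dt.year
print(df[["artist", "title", "score", "author", "genre", "pub_date"]].head())
```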

So, let’s start with some exploratory and time-series analysis. In the second part, I’ll go deeper into Natural Language Processing to analyze their use of language.

Analysis

Warning: there will be some wild assumptions in my analysis, based on my years of experience reading the website. Maybe some of them are not completely valid.

Review scores depending on genre

Some sybarites of REAL music like ambient, drone, glitch pop, thrash metal or whatever are going to laugh at this, but experimental, jazz and global are the favorite genres of Pitchfork:
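A figure like this can be reproduced with a simple groupby. This is only a sketch, reusing the merged df from the loading snippet and its assumed genre and score columns:

```python
# Average review score per genre, sorted from least to most loved.
import matplotlib.pyplot as plt

genre_scores = (
    df.groupby("genre")["score"]
      .mean()
      .sort_values()
)
genre_scores.plot(kind="barh")
plt.xlabel("Average review score")
plt.tight_layout()
plt.show()
```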

This has a normal explanation: survivorship bias.

They only write reviews of these genres when they really believe the albums are worth spreading to the average reader. If not, Pitchfork prefers to serve them albums from their preferred genres.

Percentage of reviews by genre

Whoever has been following Pitchfork during these years knows about their gradual switch from indie rock fandom to rap and pop fandom. The analysis of the % of reviews for each genre clearly shows that the turning point is around 2010:

The graph suggests that the allergy to rock broke out quite suddenly between 2010 and 2012, with another sharp drop in 2016. This points to an editorial decision behind the drop.
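A sketch of how this per-genre share can be computed, again assuming the pub_year and genre columns of the merged df:

```python
# Percentage of reviews per genre and year.
import matplotlib.pyplot as plt
import pandas as pd

share = pd.crosstab(df["pub_year"], df["genre"], normalize="index") * 100
share.plot(figsize=(10, 5))
plt.ylabel("% of reviews")
plt.xlabel("Year")
plt.legend(title="Genre", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```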

Time-series evolution of Pitchfork review scores:

The next figure, built with the Facebook Prophet module, shows the average evolution of Pitchfork review scores across these 20 years and the importance of

  • the day of the week
  • the moment of the year

to try to predict the score:
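Here is a minimal sketch of how a decomposition like this can be produced with Prophet, assuming the merged df from the loading snippet (the exact model settings behind the original figure are not documented, so take these as defaults):

```python
# Trend / weekly / yearly decomposition of review scores with Prophet.
# Prophet expects a dataframe with columns named `ds` (date) and `y` (value).
from prophet import Prophet  # in 2019 this was `from fbprophet import Prophet`
import matplotlib.pyplot as plt

ts = df[["pub_date", "score"]].rename(columns={"pub_date": "ds", "score": "y"})

m = Prophet(weekly_seasonality=True, yearly_seasonality=True)
m.fit(ts)

forecast = m.predict(m.make_future_dataframe(periods=0))  # in-sample fit only
m.plot_components(forecast)  # trend, weekly and yearly panels
plt.show()
```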

  • General trend: During the first years, there was a trend towards higher scores. This reversed in 2005, towards lower scores, until 2009. In 2009, the trend reverses again towards a consistently higher average score. This change also seems correlated with the decrease in rock reviews.
  • Weekly: The best scores are clearly on Sunday. This is normal, as Sunday reviews are mostly retrospectives reclaiming old essential albums. More interestingly, scores are also higher on Fridays and Mondays. This fits the tendency of artists to release their albums on Fridays. Reviewers can review the best weekly albums from Friday to Monday and “scrape the bottom of the barrel” during the other days.
  • Yearly: there are two periods with the highest scores: the beginning of May and the end of October. Anyone aware of music release trends will have an intuition about potential causes. My intuitions: 1. around May, a lot of important albums are released before the high season of music festivals; 2. in October, many good albums are released to increase their chances of appearing in best-of-year lists.

When was there more boldness to give <5.0 scores?

We have observed that there was a period around 2008–09 when the average score decreased. Then the bad harvest ended, giving way to the greener and greener pastures of nowadays. Or a C-suite decided that it was not worth it to keep using half of the score range.

To evaluate when there was more boldness to give negative reviews, I’ve prepared a density plot of scores for each year. Density plots help us see the most typical scores and how often unusual scores appear (i.e., not the typical 6–8.5 ones). The higher the line, the more reviews with that score. This is how the distribution of scores has evolved during these years:

The dashed line marks the ‘5.0’ score, so it is easier to evaluate the % of Fs across the years.
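A sketch of a per-year density plot like this one, using seaborn and the assumed pub_year and score columns:

```python
# One score density per year, with a dashed reference line at 5.0.
import matplotlib.pyplot as plt
import seaborn as sns

g = sns.FacetGrid(df, row="pub_year", aspect=8, height=0.8, sharey=False)
g.map(sns.kdeplot, "score", fill=True, clip=(0, 10))  # older seaborn: shade=True
for ax in g.axes.flat:
    ax.axvline(5.0, linestyle="--", color="grey")
g.set(xlim=(0, 10))
plt.show()
```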

The boldness to give negative reviews is mostly gone. Almost no reviews deserve an F now; most sit comfortably between 7 and 8. Good job, PR people and safe-spacey artist fans, you got what you wanted.

The golden eras of boldness to give negative reviews seem to have been 2002–2003 and 2007–09.

Review scores depending on review author

I love this one:

  1. If you wanted a >8.0 review, @MarkRichardson was your man.
  2. If you were reviewed by @robmitchum, my condolences.
  3. I’d like to have @Marcissist as a teacher. Almost impossible to fail an exam.

As @Marcissist commented on this figure:

Anyway, I think some flower power and sadistic impulses are involved in the figure.
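For reference, a per-author breakdown like the one in the figure can be obtained with another groupby; this is a sketch using the assumed author and score columns, restricted to prolific reviewers:

```python
# Average score per reviewer, keeping only authors with a decent sample size.
author_scores = (
    df.groupby("author")["score"]
      .agg(["mean", "count"])
      .query("count >= 100")  # arbitrary threshold to skip one-review authors
      .sort_values("mean", ascending=False)
)
print(author_scores.head(10))  # the most generous reviewers
print(author_scores.tail(10))  # the harshest ones
```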

Finally, the most important question:

Are score decimals really representative of the album quality?

The next figure compares a histogram and a density plot of the review scores. To translate Data Science into English: the blue line shows the expected distribution of scores, and each bar shows the volume of reviews with a specific score. If reviewers are really using the decimals as expected, these two kinds of information should match.

Spoiler: They don’t.

There are specific scores with a much lower volume of reviews than expected, so the bar is much lower than the line. Almost all of them occur when the decimal is ‘.9’ (e.g., 4.9, 5.9, 6.9…). Use the white lines, which separate the ‘.9’ and ‘.0’ scores, to identify these ‘.9’ scores.
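A sketch of this histogram-vs-density comparison, plus a quick count of how often each final decimal appears, assuming the score column from the merged df:

```python
# Histogram of scores (one bar per 0.1 step) overlaid with a density estimate.
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots(figsize=(10, 4))
sns.histplot(df["score"], bins=101, binrange=(-0.05, 10.05), stat="density", ax=ax)
sns.kdeplot(df["score"], color="blue", ax=ax)
ax.set_xlabel("Review score")
plt.show()

# How often each final decimal appears: '.9' should be under-represented.
decimals = (df["score"] * 10).round().astype(int) % 10
print(decimals.value_counts().sort_index())
```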

So, yeah, like the typical teacher, reviewers like to give the score a little push and round it up so the student gets a good enough grade.

The other score with a much lower volume of reviews than expected is 8.1. Any Pitchfork freak like me knows the importance of an 8.1 score compared to an 8.2–8.3: it’s the difference between receiving the Best New Music label or not. So, it seems, reviewers might prefer giving either an 8.0 or a Best New Music review, but nothing in between. This way, they might avoid any backlash from artist PRs or fans.

Conclusion

It’d be easy to criticize Pitchfork based on this analysis.

My view is a different one, however. I see that these supposedly evil trendsetters are really just trend hoppers. I see more victims of the circumstances of the current age: an age when a negative comment is forbidden in any talent show, when artists have to scrub their Twitter accounts in case they want to present the Oscar awards someday. They are subject to the trends of the age and try to ride the waves as best they can to survive the crisis in music journalism.

Remember: in the second part, I’ll go deeper into Natural Language Processing to answer questions such as:

  • Which words are indicative of receiving a better or worse review?
  • Which reviews should have received a much better or worse score, judging by the wording used, than the one they got?
  • If I feed in a random text, which review is most similar to it?

