How We Rate Things: A One Bite Pizza Review Analysis

Brennan Bugbee
6 min read · Apr 2, 2020


The website One Bite (https://onebite.app/) is a pizza rating application built for and operated by Barstool Sports. It is the spin-off, GPS-enabled companion to a YouTube channel of the same name. The channel was born out of a love of eating pizza: the company’s owner, David Portnoy, and his cameraman Frankie travel New York and beyond to discover and rate as many pizza joints as possible. Each video review contains a descriptive breakdown and color commentary by Dave, who carefully assesses a wide range of characteristics the average pizza-goer might neglect to consider, such as crisp, flop, grease, char, style, presentation, and possible ties to the Mafia. This culminates in a decimal score on a scale of 0–10, 0 being a disastrous miss and 10 being the almost unreachable high-water mark of pizza excellence. Beginning in 2013 but really taking off in 2016, these reviews number almost 800 and, unassumingly, contain very useful data for statistical analysis.

Taste tests are subjective. User-supplied reviews converge on a rating by averaging thousands of scores, each based on an independently contrived scoring scale, whereas Dave is an independent and uniquely reliable resource for two reasons. First, he has demonstrated a relatively consistent pattern of scoring, with only a slight bias towards his favorite style of pizza (bar pizza), which is an unavoidable human trait. Second, that bias is mitigated by his selection of plain cheese pizza, which acts as a control in the review process. The question is: how accurate are Dave’s ratings, and how do they compare to the many thousands of user ratings the application has amassed? It’s useful to add some context for Dave’s scoring system and all its perceived biases. Here is his ratings distribution broken down by year to get started:

Table 1
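A per-year summary of this kind could be computed roughly as follows. This is only a sketch on a handful of made-up scores: the `Year` column and the `yearly_summary` helper are assumptions for illustration (only `DavesRating` appears in the article's own code), and pandas' `skew` is a bias-corrected sample skewness, which may not be the same definition behind the skew figure quoted in this article.

```python
import pandas as pd

# Made-up stand-in for the scraped reviews; "Year" is an assumed column name.
reviews = pd.DataFrame({
    "Year": [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
    "DavesRating": [7.1, 7.4, 6.8, 2.0, 7.6, 7.9, 7.2, 4.5],
})


def yearly_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count, mean, median, standard deviation and skew of the scores per year."""
    return df.groupby("Year")["DavesRating"].agg(
        ["count", "mean", "median", "std", "skew"]
    )


summary = yearly_summary(reviews)
print(summary.round(2))
```

With left-skewed scores like these, the mean sits below the median in each year and the skew column is negative.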

Graph 1 is a breakdown by year showing how the score distribution has varied over the past 5 years. The years 2013–2015 have been removed because they contain so few samples. Graph 1 is a great indicator of how his scoring system is devised: there is clear left skew (Pearson skew = -1.63 for the combined 798 reviews), so 5 is in fact not an average rating but a sub-par one, more akin to an A-F letter grade system where F is <60%. It is interesting to note that the standard deviation has been slowly declining over the years while the mean and median have increased. The pattern in this scoring system suggests that not much separates a good pizza from an excellent pizza, but a lot has to go wrong for a pizza to be considered of poor quality and taste (like kicking him off the property in the middle of a review: https://www.youtube.com/watch?v=IbQHPcMORQs). However, comparing Dave’s reviews to user reviews points to a deeper mystery about how people rate things. I’ll move on to the user vs. Dave comparison, but first I want to remove the outliers, which helps mitigate the skew and negates any scores brought about by Dave’s rebuke. Here, an outlier is any score more than 3 standard deviations from the mean:

import numpy as np
from scipy import stats
outliers = df[np.abs(stats.zscore(df.DavesRating)) > 3].index  # rows with |z-score| > 3
df.drop(outliers, inplace=True)
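To see what that filter does, here is a small self-contained sketch on made-up scores (twenty ordinary reviews plus a single 0.0). It computes the z-score by hand with `ddof=0`, which matches the default of `scipy.stats.zscore`, and keeps a filtered copy rather than dropping in place; the names `demo`, `z` and `trimmed` are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up scores for illustration only: twenty ordinary reviews and one 0.0.
scores = [7.0, 7.1, 7.2, 7.3, 7.4] * 4 + [0.0]
demo = pd.DataFrame({"DavesRating": scores})

# Population z-score (ddof=0), the same default scipy.stats.zscore uses.
ratings = demo["DavesRating"]
z = (ratings - ratings.mean()) / ratings.std(ddof=0)

# Keep rows within 3 standard deviations; the original frame is left untouched
# so the before/after distributions can be compared.
trimmed = demo[np.abs(z) <= 3]

print(len(demo), "->", len(trimmed))
print("skew before:", round(ratings.skew(), 2))
print("skew after:", round(trimmed["DavesRating"].skew(), 2))
```

One thing to keep in mind with this approach: in a small sample a single extreme score inflates the standard deviation it is measured against, so the 3-sigma rule only starts catching outliers once there are enough reviews.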

Here is the comparison for the distribution of Dave’s scores vs user scores:

Graph 2, 3

Graphs 2 and 3 pertain only to restaurants that have been rated by both parties. The quartiles are very similar, which can be explained by Graph 3, where the absolute value of the delta (User minus Dave) is plotted against the number of reviews. In general, the more user reviews a pizza joint has, the closer its average score is to Dave’s rating. Because Dave’s ratings and user-supplied ratings converge rather tightly, and users show a similar amount of skew to Dave’s rating system, the conclusion I draw is that Dave has an impressive ability to match the pizza ratings of the general public. One crucial caveat is that Dave normally travels to pizza joints based on recommendations, so those locations are likely to have higher than average ratings: they have an active fan base and are therefore likely to serve higher quality pizza. Through that lens, you might assume that the skew in his dataset is the result not of the rating system but of how he selects restaurants to sample. However, the full dataset of all 172,342 user ratings, shown below, mirrors the distribution of the sample set of Dave’s reviews.
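The delta-versus-review-count comparison can be sketched as below. The numbers are invented purely to show the mechanics and say nothing about the real results; the column names `UserRating` and `ReviewCount`, and the bucket edges, are assumptions.

```python
import pandas as pd

# Invented rows: one per restaurant rated by both Dave and the app's users.
both = pd.DataFrame({
    "DavesRating": [7.2, 6.5, 8.1, 7.8, 5.9, 7.4],
    "UserRating":  [8.9, 4.8, 7.6, 7.9, 6.8, 7.3],
    "ReviewCount": [2, 3, 40, 150, 12, 600],
})

# Absolute value of the delta (user average minus Dave's score).
both["AbsDelta"] = (both["UserRating"] - both["DavesRating"]).abs()

# Group restaurants into review-count buckets; the edges are arbitrary.
bins = [0, 10, 100, float("inf")]
labels = ["1-10", "11-100", "101+"]
both["Bucket"] = pd.cut(both["ReviewCount"], bins=bins, labels=labels)

# Average disagreement with Dave per bucket.
mean_delta = both.groupby("Bucket", observed=True)["AbsDelta"].mean()
print(mean_delta.round(2))
```

If user averages really do converge on Dave's score as reviews accumulate, the mean absolute delta should shrink from the first bucket to the last, as it does in this toy data.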

Graph 4, Table 2

Graph 4 and Table 2 describe all user ratings for all 18,500 restaurants on the list. Taken together, the user reviews center around a mean of 7.04 and have a moderately high skew of -0.9. For that reason I did not trust the results of a t-test between Dave and the other reviewers, but the data shows that 50% of the scores fall between 6.6 and 7.6. So where did this propensity to rate pizza on an academic grading scale, with a C- average and negative skew, come from? It even crops up in restaurant chain ratings too! Take MOD Pizza, for example: it has the highest number of reviewed restaurants in the dataset, at 113 locations with 591 total reviews, and the sample skew looks identical to the population skew in Graph 5:

Graph 5
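A chain-versus-population comparison like this one could be set up along the following lines. Again, this is a sketch on invented reviews: the `Name` and `Score` columns and the `describe_scores` helper are hypothetical, and only the MOD Pizza name comes from the article.

```python
import pandas as pd

# Invented layout: one row per user review, tagged with the restaurant name.
user_reviews = pd.DataFrame({
    "Name": ["MOD Pizza"] * 5 + ["Other A"] * 3 + ["Other B"] * 4,
    "Score": [7.0, 7.5, 6.5, 8.0, 3.0,
              7.2, 6.8, 7.6,
              7.1, 7.9, 6.6, 2.5],
})


def describe_scores(scores: pd.Series) -> pd.Series:
    """Size, mean, quartiles and skew for one set of scores."""
    return pd.Series({
        "n": len(scores),
        "mean": scores.mean(),
        "q1": scores.quantile(0.25),
        "median": scores.median(),
        "q3": scores.quantile(0.75),
        "skew": scores.skew(),
    })


population = describe_scores(user_reviews["Score"])
is_chain = user_reviews["Name"] == "MOD Pizza"
chain = describe_scores(user_reviews.loc[is_chain, "Score"])

# Side-by-side view of the chain sample against every review.
comparison = pd.DataFrame({"all users": population, "MOD Pizza": chain})
print(comparison.round(2))
```

The `q1` and `q3` rows give the range that holds the middle 50% of scores, and the `skew` row is the number to compare between the two columns.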

Logic would suggest that chain restaurants do their best to make a consistent product, and therefore that their user reviews would be normally distributed around the mean. McDonald’s, for example, has mass-produced a Big Mac that will look and taste the same no matter which franchise you visit. The inclination to give a higher rating when reviewing pizza, and possibly food items in general, is an interesting phenomenon. How does one man who prides himself on (mostly) fair and honest critique so closely characterize the results of 172,342 independent reviews? And why do we use a misleading scale resembling academic grades instead of using 5 as an average? It might be that people are more inclined to flatter than to denigrate, or that meeting the threshold for tasty pizza is easy to accomplish. Or maybe school has ingrained in us the notion that 50% is a failing score and not the average, which lies in the 70–80% range. In any case, Dave, the self-described “Pizza King”, has not demonstrated a significantly different approach to pizza ratings from the horde of users on his app, but that just might make him an excellent reference guide for pizza lovers to make reliable comparisons.

As a reward for reading this, here are the top-rated pizza joints according to both Dave and users:
