Dancing with the Stars (DWTS): Score Inflation?
Have you ever wondered whether the scores on DWTS drift from one season to the next? Thanks to my wife, I started watching DWTS at Season 3. Now at Season 27, I have begun to wonder if the scores have inflated over time. Sometimes it seems that couples do far better in the first few rounds than they used to. I decided to study this empirically for fun.
Scoring is generally done by 3 judges and sometimes a guest judge. These scores are combined with viewer votes to determine who moves on to the next round. I'll focus solely on the judges' scores because, through 27 seasons, it's been the same three judges.
Data Cleaning: I had to clean the data to remove group dances that added extra scores, and I normalized four-judge scores to a three-judge scale to keep comparisons even. I focused mainly on the score distribution across seasons and across weeks within a season.
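As a concrete sketch of that normalization (the helper name and scale are my assumptions, not the actual cleaning script), a four-judge total can be rescaled onto the usual three-judge, 30-point scale:

```python
def normalize_total(total: float, n_judges: int) -> float:
    """Rescale a combined judges' total onto the three-judge (max 30) scale.

    Hypothetical helper: with a guest judge, four 10-point scores sum to
    at most 40, so scaling by 3 / n_judges makes guest-judge weeks
    comparable to regular ones.
    """
    return total * 3 / n_judges

print(normalize_total(36, 4))  # a four-judge 36 becomes a three-judge 27.0
```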
There could be two causes of score inflation:
- The judges are softening up over time.
- The Stars are coming from more dance-oriented backgrounds, like musicians, ice skaters, or gymnasts.
If there is no inflation, neither of these theories matters. If there is score inflation, one could tease these apart by looking at individual Stars per season. I don't have scores by individual judges, but that information could be gathered. The fastest way to collect the data was the summary tables on Wikipedia, and I didn't want to spend a great deal of time digging deeper if there was no evidence in the overall scores.
My investigation follows the fail-fast philosophy: probe the riskiest part of an idea first to find out quickly whether it won't work, rather than discovering that after investing much more effort. For example, if Face ID worked on men but not women, we would not have continued developing the feature until we either fixed the problem or stopped working on it.
The distribution of scores within each season is fairly even, and across seasons there is no clear trend.
However, we also need to look week to week. As a season wears on, scores generally rise because the Stars become better dancers. The number of weeks per season varied, so I show the actual scores per week and then the scores normalized to a 13-week season:
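One simple way to put seasons of different lengths onto a common 13-week grid (my sketch, not necessarily the method used for the plots here) is linear interpolation over a season's per-week averages:

```python
import numpy as np

def to_13_weeks(weekly_avgs, target_weeks=13):
    """Resample one season's per-week average scores onto a 13-week grid
    via linear interpolation, preserving the first and last weeks."""
    weekly_avgs = np.asarray(weekly_avgs, dtype=float)
    old_grid = np.linspace(0.0, 1.0, len(weekly_avgs))
    new_grid = np.linspace(0.0, 1.0, target_weeks)
    return np.interp(new_grid, old_grid, weekly_avgs)

# A 10-week season stretched to 13 points; the endpoints are unchanged.
stretched = to_13_weeks([18, 20, 22, 24, 25, 26, 27, 28, 29, 30])
```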
Some seasons have higher initial scores, like Season 15, which was an All-Star season, but there is no clear trend of scores drifting in one direction or the other. Let's try a different colorization instead. Below, the colors are based on the minimum and maximum across weeks (each season normalized to 13 weeks).
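That colorization can be sketched as a min-max rescale of each week's column across seasons (a guess at the exact scheme; the real plot may differ):

```python
import numpy as np

def minmax_per_week(scores):
    """Rescale each week (column) of a seasons-by-weeks matrix to [0, 1]
    using that week's min and max across seasons, so each cell shows how
    a season compares with the other seasons in the same week."""
    m = np.asarray(scores, dtype=float)
    lo = np.nanmin(m, axis=0)  # per-week minimum across seasons
    hi = np.nanmax(m, axis=0)  # per-week maximum across seasons
    return (m - lo) / (hi - lo)

# Two seasons, two weeks: each column maps its lowest score to 0, highest to 1.
colored = minmax_per_week([[18, 24], [22, 30]])
```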
Again, Season 15 stands out, and it should, since it had the most capable Stars dancing. Now there seems to be a slight trend: the most recent ten seasons have higher scores per week than the earlier ones. Looking at the average, there is a definite jump suggesting scores have changed, but it doesn't look like a slow inflation. One could argue the Stars weren't as good between Seasons 7 and 12, but after that there is nothing particularly troublesome.
Let's look at two more pieces of information: the minimum score and the percentage of perfect scores. The number of perfect scores should increase as the weeks go on within a season, but if they occur early or at a higher rate, that could indicate a drift the average doesn't show well. Likewise, if the minimum score increases across seasons, that could indicate another kind of drift.
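Both metrics are one-liners over a season's dance totals (illustrative helpers; the names are mine):

```python
def perfect_rate(totals, perfect=30):
    """Fraction of dances in a season with a perfect three-judge total."""
    return sum(t == perfect for t in totals) / len(totals)

def season_minimum(totals):
    """Lowest three-judge total handed out in a season."""
    return min(totals)

print(perfect_rate([21, 24, 30, 30]))    # 0.5
print(season_minimum([21, 24, 30, 30]))  # 21
```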
For both metrics, there is no clear trend.
Extra Bonus: Judge Agreement
In looking at the distributions of scores per season, one trend is clear: the judges give the same score more often than not. This raises the question of what the point of multiple judges is if they mostly give identical scores. Agreement shows up when the total score is a multiple of 3.
Per season, we can compare how often all the judges agree (the total is a multiple of 3) with how often at least one disagrees.
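With only season totals available, the divisibility check is straightforward (a sketch; note that a multiple of 3 is necessary but not sufficient for full agreement, e.g. 7 + 8 + 9 = 24):

```python
def agreement_rate(totals):
    """Share of three-judge totals divisible by 3 -- the totals that are
    consistent with all three judges giving the same score."""
    return sum(t % 3 == 0 for t in totals) / len(totals)

print(agreement_rate([24, 25, 27, 26]))  # 0.5
```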
As a recap, we looked at box plots of score distributions across seasons, colorized plots of average score per week per season, those scores normalized to a 13-week season, scores normalized across weeks, and finally minimum and perfect scores. Across all these metrics, there is no clear indication that score drift is occurring. There was definitely a score jump, but the scores are not gradually increasing from one season to the next.