Predicting the 2022 World Cup result with Google Sheets and Zen
As you probably know, the World Cup is currently ongoing in Qatar. With that in mind, here at Pebble Stream, we decided to build a simple predictor using our free Google Sheets plugin, Zen.
You can access it here to copy it to your own Google Drive. You also need to install Zen to run it for yourself.
The Model
Our model has two inputs and multiple parameters. Let’s review them.
On the one hand, we have player data. We gathered it from footystats.org for both national leagues and international games. While our data is not exhaustive, it is sufficient for development.
On the other hand, we have games played during the World Cup itself.
We keep our strategy simple. Using the players’ statistics, we start by computing a score associated with each country, using only pre World-Cup data. Once this score (and the associated ranking) is known, we can use this to compute points to give after each game.
Pre-World-Cup scoring
We must compute scores at the player level to compute any team’s total score. After all, the data we have is Messi (get it?); for example, Portugal’s star player, Cristiano Ronaldo, will have stats from Manchester United and his national team. You can see the calculations in the `Player-scores` worksheet if you want to get to the dirty bits of how we did it. The player’s score depends on his position, and we take the maximum of their performance between their scores with their national team and their club. An additional parameter can be fiddled with in the `Parameters` sheet. We give a value to the professional league the players belong to, which becomes a multiplicative bonus (defaulting to 1). It is a subjective factor that might reveal our biases, but anybody using this model is encouraged to change them to their liking.
Now that we have a value for each player, we compute their team’s score using a `group-by` and a `sum` directive. Next, we group the players by country and sum their scores. This calculation is in the `Country-scores` worksheet.
As we said earlier, our data is incomplete. Also, in some cases, teams have a shortened roster (e.g., France only brought 25 players). So the next step is to normalize these scores by averaging them over the number of players in each lineup. See the `Country-normalized` worksheet. Finally, we use the `sort` directive to produce a pre-World Cup ranking.
Incorporating World Cup games
We specified earlier two inputs: players’ data and world cup games. How do we consider the latter? Matches can be appended to the `Game-actual-results` worksheet.
The games are evaluated in the `Scored-games` worksheet. We start by computing two things:
- The goal difference.
- Whether the game represents an upset (by upset, we mean whether the lower-ranking team in our pre-WC order beats or ties the higher-ranking one).
Obviously, the higher the goal difference, the more points the winning team gets. Similarly, if a game is an upset, then the number of points given to the winning team (and deducted from the losing team) increases significantly.
Once again, if you want the detail of the formula, we invite you to look at the spreadsheet.
Once this is done, we calculate the score adjustments in the `Adjust` worksheet and combine it with the `Country-normalized` one to produce the final scores. Finally, one last `sort` directive is used to produce the `Prediction` result sheet.
Fun things to try
Now that we understand how this Zen-enhanced spreadsheet works have some fun with it. Change the World Cup games results to indulge in `what if?` analysis. Or try removing recent matches to see the rankings earlier in the tournament. Fiddle with the sheet as much as you like. You can always return to our original spreadsheet to get a new copy.
Conclusion
Zen allows us not to worry about the rest of the spreadsheet anymore and only with the input data. Press the `Run all` button to recompute the predictions after modifying player data or adding a new game to the list.
While this model is far from perfect and is very much a toy project, it calculates an objective and reproducible ranking that is easy to understand and modify. For example, it would be relatively simple to change how we assign player scores and recompute the predictions in one click.