Tale of the Tape

Eric Schmidt
Analyzing the World Cup using Google Cloud
6 min read · Jul 15, 2018

One more match to go in our analysis of the World Cup using Google Cloud. Prior to and throughout the World Cup, we’ve talked through a lot of analysis workflows. We talked about data coverage and how we use GCP to manage all the data, turn it into predictive features, and finally model it. In the end, our models fall into one of four categories.

  1. Team based features
  2. Player based features
  3. A blend of 1 & 2
  4. ELO
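As a point of reference for the fourth category, here is a rough sketch of a generic ELO-style rating update. The rating scale and K-factor below are generic defaults for illustration, not our model’s actual parameters:

```python
def expected_score(rating_a, rating_b):
    """Expected score for team A against team B under a standard ELO model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=30):
    """Update both teams' ratings after a match.

    score_a is 1.0 for an A win, 0.5 for a draw, 0.0 for an A loss.
    The K-factor here is illustrative; football ELO systems often scale it
    by match importance and margin of victory.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Example: a 1900-rated side beats a 1850-rated side and gains a few points.
print(update_ratings(1900, 1850, 1.0))
```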

The blended version combining team and player based features performed the best on our test cases, but we thought it’d be interesting to run our models and compare results on a more interesting test set than a randomized sample of historical games. We are of course referring to the 2018 World Cup.

For each group stage game we consider three possible outcomes (win, loss, or draw) and only count the pick as correct if the highest projected outcome occurred. This hurt our performance somewhat, as teams needed to be very evenly matched for a draw to be the highest probability outcome. For the knockout rounds we used a different model that did not include the possibility of a draw; there, a pick was deemed correct if the team advanced, regardless of extra time or penalties.
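A minimal sketch of that scoring rule, assuming each model emits a probability per outcome (the dictionary layout here is illustrative, not our actual schema):

```python
def pick_is_correct(probs, actual):
    """Return True if the model's highest-probability outcome matched the result.

    probs:  dict mapping outcome -> probability,
            e.g. {"win": 0.48, "draw": 0.29, "loss": 0.23};
            knockout games carry only "win" and "loss".
    actual: the observed outcome, one of the keys in probs.
    """
    pick = max(probs, key=probs.get)
    return pick == actual


# Group stage example: the draw was not the top pick, so a drawn match
# counts as a miss even though it carried a 29% probability.
print(pick_is_correct({"win": 0.48, "draw": 0.29, "loss": 0.23}, "draw"))  # False
```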

Through the group stage, our four models all performed relatively evenly. Both the player based model and the ELO based model correctly picked 56.3% of games (though via very different sets of correctly picked games), while the team based model and the blended model did slightly better, correctly picking 58.3% and 60.4%, respectively.

The analysis below represents the accuracy of picking the favored team. This is a harsh way to judge model performance, as this form of scoring does not take the probability weight into account. For example, an incorrect pick made at a high projected win probability is a bigger error than one made at near even odds and should be scored “worse”.
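Log loss is one standard way to capture that weighting, and it’s what we plan to report after the final. A minimal sketch of the idea (not our actual evaluation code): a confident miss is penalized far more than a near coin-flip miss.

```python
import math


def log_loss(prob_of_actual):
    """Negative log-likelihood of the outcome that actually happened.

    A low probability assigned to the actual result yields a large penalty,
    which is exactly the weighting plain pick accuracy ignores.
    """
    return -math.log(prob_of_actual)


# Both picks below are equally "wrong" under the accuracy view, but the first
# model was far more confident in the wrong outcome and pays a bigger penalty.
print(log_loss(0.10))  # actual result given only 10% -> ~2.30
print(log_loss(0.45))  # actual result given 45%      -> ~0.80
```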

During the tournament we received some negative feedback when a favored team did not win. For example, we had Portugal at only a 6.5% margin in their match v Uruguay. When Uruguay won, the internet poo-pooed the estimate, declaring it “WRONG!!!” and a “TOTAL FAIL”. The England v Croatia match was a similar scenario that also yielded some tasty internet trolling when Croatia, an 8% underdog, knocked off England. Unfortunately, not everyone understands probabilities.

In the knockout stage (modulo the final match yet to be played) our blended model correctly picked the favorite in 66.7% of games, in a World Cup with several cataclysmic upsets. After the final match we will update this post with log-loss numbers to shed light on the precision of the models. For now, here is the simple accuracy view for each model broken out by group stage and knockout stage.

Detailed results from group play are below:

The glaring weakness from Group A was a serious overestimation of Egypt. Our models picked them to beat both Russia and Saudi Arabia, advancing to the knockout round — the hosts made sure that wasn’t going to happen and by the time the Pharaohs faced Saudi Arabia the group was decided.

In Group B we were mostly let down by a large number of draws and some poor finishing from Morocco. Iran failed to put a shot on target in the 2nd half against the Moroccans but won on an own goal, and even though the draw came in as high as 28% in Spain vs. Portugal, each model predicted a winning goal that never came. In the end, this was our worst group in terms of accurate projections.

In Group C, our player based model went 3–3 on France games (including our only correctly picked draw) but was only 1–3 on games that didn’t involve the eventual finalists. All of our non-ELO models picked an upset in the final group game, Australia versus Peru, a pick that ended up missing.

Unsurprisingly, all four models missed Croatia’s upset of Argentina and Iceland’s draw against Messi & co. Both our team based and blended models went 4–6 in Group D. Outside of those two games, our player based model went against the grain, picking against Croatia in the final group game, the start of Croatia outperforming our expectations.

Again in Group E we picked winners where there were none, leaving our models a combined 0–8 on the two Swiss ties.

Like the rest of the world, we were surprised by Germany’s early exit and as a result our models were 0–8 in their two losses. Our models were split on Sweden versus Mexico and in the end our player based and blended models got it right.

Group G

Group G was our most accurate group, with only one model missing one game. This makes sense, as there were no real upsets and no draws to trip things up. The one miss was our ELO model picking Panama over Tunisia in the group’s final game, which decided third place.

Group H, which was viewed by many as the most even group before the tournament, had the most variation in model picks. No single game had the same result chosen by all four models, and as a result at least one model correctly picked each game outside of the Senegal versus Japan draw.

We have struck out on four of the 15 knockout games: Russia’s all-time upset against Spain, Sweden’s win over Switzerland, Belgium’s quarter-final win over Brazil, and Croatia’s semi-final win over England. Outside of those games, at least two models have correctly picked each knockout game, including today’s third-place game.

In the knockout rounds our player based model has correctly picked 60% of games, while our other three models have correctly picked 66.7%.

With one game remaining, our model performances aggregated over both group play and the knockout rounds are summarized below:

  1. Blended model: 61.9%
  2. Team based model: 60.3%
  3. ELO based model: 58.7%
  4. Player based model: 57.1%
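As a quick sanity check on how those aggregates come together, here is a back-of-the-envelope reconstruction for the blended model. The per-stage pick counts are inferred from the percentages quoted above (48 group games and 15 knockout games played so far), not a separately published tally:

```python
# Inferred from the quoted 60.4% group stage and 66.7% knockout accuracy.
group_correct, group_games = 29, 48        # 29 / 48 ≈ 60.4%
knockout_correct, knockout_games = 10, 15  # 10 / 15 ≈ 66.7%

overall = (group_correct + knockout_correct) / (group_games + knockout_games)
print(f"{overall:.1%}")  # 61.9%
```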

For what it’s worth, all four models have picked France as the favorite tomorrow, the same way all four models picked England in the semis…
