World Cup Qatar 2022 Prediction

Maarten De Schryver
4 min read · Jan 28, 2023


Evaluation of a Wisdom of the Crowd Model and 3 Data-driven Models

Before the World Cup, we are inundated with prediction models trying to tell us who will win. After the World Cup, however, we rarely hear how those models actually performed. By comparing the performance of different models, we can learn what works and what doesn’t when it comes to prediction. And, to be honest, I also like a little competition: which model outperforms the others?

In addition to some data-driven models, I was able to test a unique one: the Wisdom of the Crowd model. It is based on data from SPORZA’s World Cup prediction game, where an average of 93,000 individual predictions per game were made on the site during the group stage, and it can offer a different perspective than the other models.

The Wisdom of the Crowd model is an interesting concept because it is based on the predictions of individual users rather than on specific statistics or other “objective” data. According to sports psychologist Prof. Markus Raab, fans use some simple heuristics to successfully predict the outcome of football matches:

1. Recognition heuristic: choose the country that is more associated with football.

2. Take-the-last-ranking heuristic: choose the country that went further in the last World Cup. If both went equally far, a draw is predicted.

But there are also some biases. Take-the-first heuristic: the country that is listed first (the ‘home team’) is declared the winner. This heuristic is more successful in regular league football, where there is a real home advantage. Furthermore, research by Na, Su and Kunkel (2019) shows that wins are overestimated while draws are underestimated. And, not surprisingly, people have an overly rosy view of their own country’s performance at the World Cup, which may lead to an overestimation of its chances of winning.

Four World Cup Qatar 2022 prediction models

Model 1: ELO model. This is a model I created five years ago for the 2018 World Cup. It uses only the teams’ ELO rating (a kind of alternative to the FIFA ranking) as its source of information; a rough sketch of the idea is given below, after the model descriptions.

Model 2: KULeuven model. This model, developed by the DTAI SPORTS ANALYTICS LAB of the KULeuven, starts with the ELO rating and supplements it with additional information on the defensive and offensive ratings and the cumulative market value of each team.

Model 3: Innsbruck model. This model was developed by researchers at the University of Innsbruck (well, of the five people involved, only one is actually based in Innsbruck; the others work in Dortmund, Luxembourg, Gent and Munich, so the name is a bit of a stretch). It is based on historical data, bookmaker data and team-specific details such as market value, FIFA ranking and team structure. It also takes country-specific socio-economic factors (such as population and GDP per capita) into account.

Model 4: Wisdom of the Crowd model. Here we use the data from SPORZA’s World Cup prediction game. On average, about 93,000 individual predictions per game were made on the site during the group stage. The percentages of participants predicting a home win, an away win or a draw are taken as the predicted probabilities.

I used only the data from the group stage, 48 matches in total, to evaluate the models.
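The internals of these models are not shown here, but the core idea of the simplest one, the ELO model, can be sketched in a few lines: the Elo rating difference between two teams is mapped to win, draw and loss probabilities. The flat draw share and the example ratings below are illustrative assumptions, not the actual model.

```python
def elo_expected_score(elo_a: float, elo_b: float) -> float:
    """Standard Elo expectation: the chance that team A beats team B, ignoring draws."""
    return 1.0 / (1.0 + 10 ** ((elo_b - elo_a) / 400.0))


def match_probabilities(elo_a: float, elo_b: float, p_draw: float = 0.25):
    """Turn an Elo rating difference into (win, draw, loss) probabilities.

    The flat 25% draw share is an illustrative assumption (roughly the average
    draw probability of the data-driven models in this article); the real ELO
    model may handle draws differently.
    """
    expected = elo_expected_score(elo_a, elo_b)
    p_win = expected * (1.0 - p_draw)
    p_loss = (1.0 - expected) * (1.0 - p_draw)
    return p_win, p_draw, p_loss


# Illustrative ratings only, not the actual Elo values used at the World Cup
print(match_probabilities(1800, 1700))  # -> (~0.48, 0.25, ~0.27)
```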

Some (expected) characteristics of the Wisdom of the Crowd model:

  • The average predicted probability of a draw is 15%, whereas for the data-driven models the averages range from 22% to 28%.
  • There is a more extreme focus on winning or losing: while the Innsbruck model estimates win probabilities between 8% and 79%, the crowd model’s estimates range from 0.2% to 99.4%.
  • Predictions by the crowd model are clearly in favor of the home team (+16%), although being the “home” team does not confer any real advantage during the World Cup.
  • The tendency to overestimate one’s own country is also reflected in the data: for example, the KULeuven model estimated a 45% winning probability for Belgium against Morocco, while 85% of SPORZA participants predicted a win for the Red Devils.

Evaluating prediction models

Each model can earn up to 100 points per game. The scoring algorithm takes into account the three predicted probabilities per game (win, loss, draw), and the points reflect the accuracy of the prediction. For example, if a model predicts a 100% chance of a win for team A but team B wins or the game ends in a draw, the model gets no points; if team A does win, it gets 100 points. The final score is the average over the 48 group games.

If predictions were made at random without any prior knowledge, a score of 33 would be expected. If one systematically put a 100% chance on the home team, one would get a score of 40. Scores above 50 indicate informed predictions.
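The exact scoring formula is not spelled out, but the numbers above are consistent with a simple linear rule: a model earns 100 times the probability it assigned to the outcome that actually occurred, averaged over all 48 games. The sketch below is a minimal implementation under that assumption; the function names and example numbers are mine.

```python
def game_score(p_home: float, p_draw: float, p_away: float, outcome: str) -> float:
    """Points for one game: 100 x the probability assigned to the realized outcome.

    This linear rule is an assumption that reproduces the examples above:
    a 100% correct prediction earns 100 points, a uniform 1/3 guess averages
    about 33, and always putting 100% on the home team averages about 40.
    """
    probs = {"home": p_home, "draw": p_draw, "away": p_away}
    return 100.0 * probs[outcome]


def tournament_score(predictions, outcomes) -> float:
    """Average score over all group-stage games."""
    return sum(game_score(*p, o) for p, o in zip(predictions, outcomes)) / len(outcomes)


# Two illustrative games: (p_home, p_draw, p_away) and the realized result
preds = [(0.45, 0.28, 0.27), (0.85, 0.10, 0.05)]
results = ["away", "home"]
print(tournament_score(preds, results))  # (27 + 85) / 2 = 56.0
```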

Conclusions

  • The Wisdom of the Crowd model significantly outperforms random guessing.
  • The three data-driven models perform better than the crowd model.
  • The simple ELO model holds up remarkably well: it can be considered almost as good as the two more advanced models.

To conclude, the Wisdom of the Crowd model appears to be a trusted partner for making better decisions. However, the data-driven models outperform the wisdom of the crowd, mainly because they are less influenced by the expected biases. Finally, adding more information does not really increase prediction performance. Simplicity is considered a virtue for a reason.

Johan Cruijff would conclude: “Toeval is logisch” (“coincidence is logical”).

References

Na, S., Su, Y., & Kunkel, T. (2019). Do not bet on your favourite football team: The influence of fan identity-based biases and sport context knowledge on game prediction accuracy. European Sport Management Quarterly, 19(3), 396–418. https://doi.org/10.1080/16184742.2018.1530689

Raab, M. (2012). Simple heuristics in sports. International Review of Sport and Exercise Psychology, 5(2), 104–120.
