Can we predict the scores for World Cup matches?

Lucas Calestini
Published in Sports Analytics
Jun 6, 2018

How do we go about predicting scores and results for a soccer World Cup tournament? What would be an adequate baseline model for estimating the winners and losers of each of the 48 group-stage matches?

Photo by Jannes Glas on Unsplash

One way to look at this problem is through a normalcy perspective: what would happen if every team behaved the same way as it has up until now? But before starting any analysis, we could have a long, inconclusive discussion about what normalcy really entails. Do we look at goals scored, matches won, shots on goal, possession, players? Or do we try to work out some sort of expected value from all previous encounters between the teams in each match?

At Torneo we decided to take a different approach. After some time looking at the statistics for each game, player and event, we realized that a true baseline model should aggregate different metrics and calculations made by different people: not only past overall statistics but also heuristics and domain expertise, much like an ensemble technique applied to an economics problem.

Arguably the best aggregator of opinions, calculations and perspectives when it comes to sports is the odds market. It combines intelligent analyses of outcome probabilities with adjustments for the bets placed on each side, accounting for market information and efficiencies, and the people involved have a financial incentive to be as accurate as possible. In other words, it starts from the base Vegas statistical models and adjusts them to the heuristics and opinions of bettors around the world.

We took the 3-way odds for each game in the World Cup group stage (win, loss or tie) from the 14 largest odds-makers in the world, averaged them and embedded the result into our pick-generator tool (aka pick for me). This way, whenever someone clicks on the generator, we pick the outcome based on the aggregated probability of each result. For instance, if someone clicked on the generator 10 times for a match of Brazil vs. Colombia where Brazil has a 70% chance of winning, they should see around 7 wins for Brazil.
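The averaging and sampling step can be sketched roughly as follows. This is a minimal illustration rather than the actual pick for me code: the odds values are made up, the function names are hypothetical, and it assumes decimal odds that are converted to probabilities and normalized to remove the bookmaker margin before averaging (the post does not say whether odds or probabilities were averaged).

```python
import random

def implied_probabilities(decimal_odds):
    """Turn one bookmaker's decimal 3-way odds into probabilities that sum to 1.

    Assumption: the bookmaker margin is removed by simple normalization.
    """
    raw = [1.0 / odd for odd in decimal_odds]
    total = sum(raw)  # slightly above 1 because of the bookmaker margin
    return [r / total for r in raw]

def average_probabilities(books):
    """Average the normalized (home, draw, away) probabilities across bookmakers."""
    per_book = [implied_probabilities(book) for book in books]
    return [sum(p[i] for p in per_book) / len(per_book) for i in range(3)]

def pick_outcome(probs, rng):
    """Draw one outcome with the aggregated probabilities as weights."""
    return rng.choices(["home", "draw", "away"], weights=probs, k=1)[0]

# Hypothetical (home, draw, away) decimal odds from three bookmakers.
books = [(1.40, 4.50, 8.00), (1.45, 4.40, 7.50), (1.42, 4.60, 7.80)]
probs = average_probabilities(books)

rng = random.Random(42)  # seeded only to make the sketch reproducible
picks = [pick_outcome(probs, rng) for _ in range(10)]
print(probs, picks)
```

Over many clicks, the share of "home" picks should approach the first aggregated probability, which is the behaviour described in the Brazil vs. Colombia example.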

Once the outcome was defined, we only had to decide on the score, and that’s when we looked at historical data on attack and defence strength for each team across all their international games in the past 4 years (mean goals scored and conceded by each team, relative to the mean for all teams). We then applied the expected goals to a Poisson distribution to obtain the final scores for the fixtures. This way we are able to create a baseline model where the outcome is decided by the odds-makers and the scores are decided by the past performance of all teams, respecting both the market information and the extrapolation of past performances.
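One common way to turn attack and defence strength into expected goals is sketched below. The exact formula used at Torneo is not given in the post, so this assumes the usual ratio form (a team's mean goals divided by the all-teams mean); the function names and all numbers are hypothetical.

```python
def strengths(goals_for, goals_against, matches, avg_goals):
    """Attack and defence strength relative to the all-teams average.

    A value above 1 means more goals scored (attack) or more goals
    conceded (defence) per match than the average team.
    """
    attack = (goals_for / matches) / avg_goals
    defence = (goals_against / matches) / avg_goals
    return attack, defence

def expected_goals(attack, opponent_defence, avg_goals):
    """Expected goals for a team against a given opponent (assumed formula)."""
    return attack * opponent_defence * avg_goals

# Hypothetical four-year totals; avg_goals is goals per team per match.
avg_goals = 1.25
bra_attack, bra_defence = strengths(40, 20, 20, avg_goals)
col_attack, col_defence = strengths(30, 25, 20, avg_goals)

xg_brazil = expected_goals(bra_attack, col_defence, avg_goals)
xg_colombia = expected_goals(col_attack, bra_defence, avg_goals)
print(xg_brazil, xg_colombia)
```

With these made-up inputs the expected goals come out at roughly 2.0 for Brazil and 1.2 for Colombia, and these are the values that would serve as the means of the Poisson distributions.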

Therefore pick for me works in two steps: first it draws a sample from the a priori probabilities for the outcomes of a match, and then it generates the score by drawing from the Poisson distributions of the expected goals (and here we made sure the Poisson draw respected the outcome picked).
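The second step can be sketched as below. The post does not say how the Poisson draw was made to respect the picked outcome, so this sketch assumes simple rejection sampling: draw both scores and retry until they agree with the outcome. All names are hypothetical, and the Poisson sampler is Knuth's textbook algorithm, used here to keep the example dependency-free.

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def outcome_of(home_goals, away_goals):
    """Classify a score as 'home', 'draw' or 'away'."""
    if home_goals > away_goals:
        return "home"
    if home_goals < away_goals:
        return "away"
    return "draw"

def sample_score(outcome, xg_home, xg_away, rng, max_tries=10_000):
    """Redraw scores until they match the outcome picked in step one."""
    for _ in range(max_tries):
        home = poisson_sample(xg_home, rng)
        away = poisson_sample(xg_away, rng)
        if outcome_of(home, away) == outcome:
            return home, away
    raise RuntimeError("no score matching the outcome was found")

rng = random.Random(7)  # seeded only to make the sketch reproducible
# Hypothetical expected goals of 2.0 (home) and 1.2 (away).
print(sample_score("home", 2.0, 1.2, rng))
```

Rejection sampling keeps the relative likelihood of scores within each outcome unchanged, which is why it is a natural choice here, although it can need many retries when the picked outcome is very unlikely under the expected goals.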

Photo by Emma Dau on Unsplash

We created pick for me with two types of users in mind: those who are not into the sport or teams in question, and those who want to double-check their picks against the market opinion without leaving our platform. The premise has never been to offer a result or a specific pick for games, but rather to provide users with a baseline model that can help them, in a probabilistic manner, to guess the outcomes of World Cup matches. As we also provide the mean score picked by all other users on our platform, we should start to see convergence between what people at Torneo are picking and what the odds markets are predicting. But first, let us see how the market reacts.
