Who will win the World Cup?

Published in DBS Bank. Live more, Bank less
3 min read · Jul 11, 2018

By Royce Teo, Managing Director & Group Head — Data Management at DBS Bank

Economists and analysts have their picks, and so do parrots and octopuses. Each time the World Cup comes along, many attempt to predict the winner. This year at DBS, we are doing so as well, albeit in a different way.

We’ve started a data science challenge in the bank to predict the winner of the World Cup. Simply put, our colleagues need to use data to predict the champion, the first and second runners-up, and the scores of the final and the third-place playoff.

We collected raw and historical World Cup data (going back to the 1930s!) and uploaded it into a database so participants would have a ready resource to analyse. We originally intended this not-so-serious challenge for data scientists only, but word spread so quickly that we were soon receiving hundreds of entries. So we opened it up to everyone in DBS, with one catch: you can’t “guess”, and your prediction must be backed by a data-driven approach, however complex or simple. It was a great way to get people started on data analytics by riding on the World Cup frenzy!

We’ve had a wide variety of approaches, ranging from deep learning algorithms and statistical models to some slightly non-traditional and unique methods.

Final count: 2,200 folks entered from across 14 countries, and 30% of the participants are women. The major upsets (Brazil, Germany, Italy) knocked out a massive proportion of contenders and only 28 participants remain.

Interestingly, as we reach the final few rounds, women now make up 40% of the remaining teams.

At one end, we have teams that have taken “data driven” to the extreme, collecting years of match results and player performance stats and applying deep learning and other machine learning algorithms, ranging from Genetic Based Machine Learning (GBML) and High Dimension CNN (HD-CNN) to reinforcement learning, to predict the outcome of this year’s World Cup.

Each of these techniques has its advantages. For instance, the HD-CNN (high-dimension CNN) approach stores historical match results in a tensor that captures dimensions such as match type (group match, final, friendly, etc.), team geography (European team, Asian team, etc.) and match location. A CNN model is then trained on this tensor to capture the relationship between any pair of teams, from which the match outcome can be predicted.
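To make the idea of storing results in a tensor a little more concrete, here is a minimal sketch in plain Python. The team's actual tensor layout isn't described here, so the axes (match type, region of each team, outcome), the names `MATCH_TYPES`, `REGIONS`, `OUTCOMES` and `build_tensor`, and the toy records are all assumptions made purely for illustration. The CNN training step is not shown.

```python
# Hypothetical axes for the tensor; the real entry may have used different ones.
MATCH_TYPES = ("group", "final", "friendly")
REGIONS = ("Europe", "Asia", "South America")
OUTCOMES = ("win_a", "draw", "win_b")


def outcome_of(goals_a, goals_b):
    """Label a result from the point of view of team A."""
    if goals_a > goals_b:
        return "win_a"
    if goals_a < goals_b:
        return "win_b"
    return "draw"


def build_tensor(matches):
    """Count outcomes in a 4-D nested list indexed as
    [match_type][region_a][region_b][outcome]."""
    tensor = [
        [[[0 for _ in OUTCOMES] for _ in REGIONS] for _ in REGIONS]
        for _ in MATCH_TYPES
    ]
    for match_type, region_a, region_b, goals_a, goals_b in matches:
        t = MATCH_TYPES.index(match_type)
        a = REGIONS.index(region_a)
        b = REGIONS.index(region_b)
        o = OUTCOMES.index(outcome_of(goals_a, goals_b))
        tensor[t][a][b][o] += 1
    return tensor


# Made-up records for illustration only, not real match data:
# (match_type, region_a, region_b, goals_a, goals_b)
toy_matches = [
    ("group", "Europe", "Asia", 2, 0),
    ("group", "Europe", "Asia", 1, 1),
    ("final", "Europe", "South America", 0, 1),
]

tensor = build_tensor(toy_matches)
```

A model such as a CNN would then take a tensor like this as its input. Adding further axes, for example match location, would follow the same pattern.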

We’ve also seen slightly unconventional approaches, such as one that repurposes the Elo rating system, a method originally invented to rate a chess player’s skill level and, from those ratings, predict the outcome of a game. One of our colleagues adapted this approach to predict the outcome of the World Cup.
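For readers curious about how an Elo-style rating works, here is a minimal sketch of the standard chess formulation applied to two teams. How our colleague adapted it for football isn't covered here, so the K-factor of 20, the starting ratings and the function names `expected_score` and `update_ratings` are assumptions for illustration only.

```python
def expected_score(rating_a, rating_b):
    """Expected score of side A against side B on the usual 400-point scale.

    The result is between 0 and 1; 0.5 means the sides are evenly matched.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=20.0):
    """Return the new (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.5 for a draw and 0.0 if A lost.
    k (assumed to be 20 here) controls how far a single result moves a rating.
    """
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# Two hypothetical, evenly rated teams; team A wins the match.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)
```

To predict a match, compare `expected_score` for the two teams: the higher value is the favourite. Beating a much stronger opponent moves the ratings more than beating a weaker one, which is what lets the ratings adjust as results come in.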

At the other end of the spectrum, one of the entries we received used Tarot cards to “forecast” the winner! While it’s safe to say that this is a bit of a stretch in meeting our criterion of being “data driven”, it remains to be seen whether Tarot cards will trump the pundits or deep learning!

Finally, to answer the question we posed in the title, according to our remaining teams, it’ll be France!

