SOCCER ELO: THE REBUILD

PART 1: INTRODUCTION

Matt Barger
Dec 12, 2016
Swipe Right Now for a Fun Soccer Night in Your Area! Img (and all apologies): New York Times

Ever look at a league table and say, “Yeah, but…”? The Seattle Sounders just did, to be fair. What if we could have a rankings system for sports that goes deeper than wins and losses? That takes into account home field advantage, margin of victory, importance of match, and, most importantly, can give you an accurate prediction of how the match should go?

Dr. Arpad Elo: Say no more fam

The Elo ratings system is a self-correcting algorithm that connects results, match importance, margin of victory, and other variables to reflect a soccer team’s value relative to its peers. It has been used in ranking systems for mostly two-player games, most famously chess (Dr. Arpad Elo himself was a chess master). With a universal ratings system, chess players from different geographies and league networks can get a feel for their relative skill levels coming into a match. The ratings system itself can be adapted to team sports, and has famously been used by Nate Silver’s FiveThirtyEight blog for time series-based analysis of performance in basketball, football, baseball, and tennis.

In fact, the Elo ratings system (ELO for short) is the engine of the FIFA Women’s World Rankings. In 1997, it was adapted to rate men’s international soccer results on EloRatings.net. This serves as a sensible alternative to the strange current FIFA World Rankings methodology for men’s teams:

Pending his corruption trial, of course. Image: Giphy

The inner workings of the Elo ratings system revolve around the following relationship:

Unexpected results (where the difference between actual and predicted result approaches 1) have a greater effect on the change in ELO rating than expected results.
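As a rough sketch of that relationship, assuming the usual Elo update of the form change = K × (actual − expected). The weight `k=40` and the function name `rating_change` are illustrative choices here, not something taken from EloRatings.net's code:

```python
def rating_change(actual, expected, k=40):
    """Elo-style update: a weight times the gap between actual and expected result.

    actual is 1 for a win, 0.5 for a draw, 0 for a loss.
    k=40 is an illustrative weight (an assumption).
    """
    return k * (actual - expected)


# A heavy favorite (expected result 0.9) winning barely moves the rating...
favorite_wins = rating_change(actual=1, expected=0.9)
# ...while a heavy underdog (expected result 0.1) winning moves it a lot.
underdog_wins = rating_change(actual=1, expected=0.1)

print(round(favorite_wins, 1), round(underdog_wins, 1))
```

The same win is worth roughly nine times as many points when it comes as a surprise, which is the self-correcting part of the system.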

Therefore, the closer we can mirror match results to their expected values, the more dramatic improbable results are, and the more accurate and predictive the ratings system is as a result. Let’s walk through an example to really drive home the basics and the effectiveness of this system.

EXAMPLE: Klinsmann’s Last Stand

[Cue] World’s Smallest Violin. Img: Telegraph

Neil Paine at FiveThirtyEight uses ELO to compare former U.S. Men’s National Team coach Jürgen Klinsmann to his predecessors. I have my own opinions on this piece, but that’s for another time. Let’s see how ELO takes on the two U.S. Men’s National Team games that cost Klinsmann his job.

EloRatings.net starts ratings around 1500 and processes the results of all available international matches to calculate relative strength. At the start of the CONCACAF World Cup Qualifying hex, the six participating teams were rated as follows:

Source: EloRatings.net

U.S. vs Mexico

Columbus, USA — 11 November 2016

Signature ball-watching defense from Captain Bradley in Columbus. Img: AP/Valley News.

Going into the first CONCACAF World Cup Qualifying Hexagonal match against Mexico (1885) in Columbus, Ohio, the U.S. was rated a relatively weaker team (1757) but was playing in a stadium in which they hadn’t lost to Mexico. ELO takes both of these things into account when calculating an expected result value:

For EloRatings.net, the home team advantage is treated as a constant, 100. The expected result for the U.S. in this case would be:
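A minimal sketch of that calculation, assuming the standard Elo logistic curve with a 400-point scale (an assumption, though it reproduces the figures quoted in this post). The function name `expected_result` is mine:

```python
def expected_result(rating, opponent_rating, home_advantage=0):
    """Expected result (between 0 and 1) under the standard Elo logistic curve.

    home_advantage is added to the rating difference: +100 when `rating`
    belongs to the home side, -100 when it is the away side, 0 on neutral ground.
    """
    diff = rating - opponent_rating + home_advantage
    return 1 / (1 + 10 ** (-diff / 400))


USA, MEXICO = 1757, 1885  # pre-match ratings quoted in the text

usa_at_home = expected_result(USA, MEXICO, home_advantage=100)   # roughly 0.46
usa_away = expected_result(USA, MEXICO, home_advantage=-100)     # roughly 0.21

print(f"U.S. in Columbus: {usa_at_home:.4f}")
print(f"U.S. in Mexico:   {usa_away:.4f}")
```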

In other words, the U.S. was a slight underdog, thanks to the home field advantage (Expected Result would be 0.2121 if the game were in Mexico). The expected result now gets plugged into the following equation to determine the number of points at stake:

For this match, a win would earn the U.S. 21.6 points. In the event of a draw, about two points would be added to the U.S.’s rating and subtracted from Mexico’s rating. In the event of a loss, the U.S. would lose 18.4 Elo points.
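Those three outcomes can be reproduced with a short sketch. It assumes a match weight of K = 40 for a World Cup qualifier, which is inferred from the numbers above rather than stated in the text, and it ignores any margin-of-victory adjustment:

```python
K = 40  # assumed weight for a World Cup qualifier, inferred from the figures above


def expected_result(rating, opponent_rating, home_advantage=0):
    """Standard Elo logistic curve (assumed), with home advantage added to the gap."""
    diff = rating - opponent_rating + home_advantage
    return 1 / (1 + 10 ** (-diff / 400))


expected = expected_result(1757, 1885, home_advantage=100)  # U.S. hosting Mexico

# The actual result is scored 1 for a win, 0.5 for a draw, 0 for a loss.
points = {
    name: K * (actual - expected)
    for name, actual in [("win", 1), ("draw", 0.5), ("loss", 0)]
}

for name, change in points.items():
    print(f"{name}: {change:+.1f}")
```

Because ratings are zero-sum in this setup, whatever the U.S. gains or loses is mirrored on Mexico's side.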

Considering the result was USA 1–2 MEX, the U.S. lost 18 points from its previous rating:

Mexico, on the other hand, gained these 18 points and improved to 1903.

Costa Rica vs U.S.

San Jose, Costa Rica — 15 November 2016

Coming to Spotify — Poorly Defending a Corner Kick: the Album. Img: Major League Soccer

Costa Rica beat Trinidad and Tobago at home in its first CONCACAF hex game, gaining 15 points to reach a rating of 1816. Playing in San Jose, Costa Rica, the U.S. loses its home field advantage, making the chances of victory much slimmer:

The U.S. then turned in one of their worst performances against an admittedly more favored Costa Rica team, losing 0–4. As a result, the change in points went as follows:

Interestingly enough, the U.S. lost nearly the same number of points on the road to Costa Rica as they did at home to Mexico. If the Costa Rica match were in, say, Houston, Team USA would have come into the match a slight favorite (Expected Result = 0.5331). If they turned in a 0–4 dumpster fire then (which, by the way, seemed likely given the team’s level of enthusiasm), they would have lost 40 ELO points as a result, a much more substantial cut (~2.3%) to their World ELO rating.
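The San Jose and Houston scenarios can be compared with a sketch like the one below. It assumes K = 40 and a goal-difference multiplier of 1.5 for a two-goal margin, 1.75 for three, and 1.75 + (N − 3)/8 for four or more, which is my reading of the EloRatings.net method; the helper names are hypothetical. The U.S. rating of 1739 is the pre-Mexico 1757 minus the 18 points lost.

```python
def expected_result(rating, opponent_rating, home_advantage=0):
    """Standard Elo logistic curve (assumed), with home advantage added to the gap."""
    diff = rating - opponent_rating + home_advantage
    return 1 / (1 + 10 ** (-diff / 400))


def goal_multiplier(goal_diff):
    """Assumed margin-of-victory multiplier applied to K."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    return 1.75 + (n - 3) / 8


def rating_change(actual, expected, goal_diff, k=40):
    """Points gained (positive) or lost (negative); k=40 is an assumed weight."""
    return k * goal_multiplier(goal_diff) * (actual - expected)


USA, COSTA_RICA = 1739, 1816  # U.S. rating after the Mexico loss (1757 - 18)

in_san_jose = expected_result(USA, COSTA_RICA, home_advantage=-100)
in_houston = expected_result(USA, COSTA_RICA, home_advantage=100)

loss_away = rating_change(0, in_san_jose, goal_diff=-4)  # the real 0-4 in San Jose
loss_home = rating_change(0, in_houston, goal_diff=-4)   # hypothetical 0-4 in Houston

print(f"San Jose: expected {in_san_jose:.4f}, change {loss_away:+.1f}")
print(f"Houston:  expected {in_houston:.4f}, change {loss_home:+.1f}")
```

The four-goal margin nearly doubles the weight, which is why a road loss as a clear underdog still costs about as much as the one-goal home loss to Mexico.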

In these two disappointing results, Team USA lost 7 ranking spots in the ELO World Ranking. Lowly Panama, who won and drew their first two matches, crept ever closer to the U.S. in Elo rating:

Source: EloRatings.net

What does this mean?

On balance, the rating gap for a neutral-ground game between the U.S. and Panama has moved 66 Elo points in Panama’s favor. In terms of expected result, in a neutral-ground game, the U.S. went from being 65.7% favorites to 56.7% favorites. The margin between the two teams shrank by 18 percentage points, from 31.4 pp to 13.4 pp.
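As a sanity check on those numbers, the expected results can be converted back into rating gaps. This sketch assumes the standard Elo logistic curve with a 400-point scale and no home advantage; `rating_gap` is a hypothetical helper name:

```python
import math


def rating_gap(expected):
    """Rating difference implied by an expected result on neutral ground
    (the inverse of the standard Elo logistic curve, assumed here)."""
    return 400 * math.log10(expected / (1 - expected))


before, after = 0.657, 0.567  # U.S. expected result vs Panama, from the text

# How far the rating gap moved toward Panama, in Elo points.
gap_shift = rating_gap(before) - rating_gap(after)

# Margin between the two teams' expected results, in percentage points.
margin_before = 100 * (before - (1 - before))
margin_after = 100 * (after - (1 - after))

print(f"Gap shift: {gap_shift:.0f} Elo points")
print(f"Margin: {margin_before:.1f} pp -> {margin_after:.1f} pp")
```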

But… what does that mean?

Is this the difference between winning and losing? Is this the difference between winning and drawing? Drawing and losing? Can we optimize ELO’s expectations of results to be more accurate?

ELO is an effective way to show a team’s relative strength to its peers through evaluating its past performances. Using 21 seasons of league data from ten top European domestic club leagues, this study will determine whether past performance is indeed indicative of future results. This project will be discussed in four parts in which I tell you how I:

  • Created an ELO ratings system for each league across 21 years of soccer (Part 2)
  • Used the findings in ELO to create a prediction model in order to predict match result and scores to simulate seasons (Part 3)
  • Ran the prediction model against other iterations of odds making, including other ELO models from the literature and odds from betting houses (Part 4)

Matt Barger

Soccer, one data point at a time. Curator of the Gringo Samba blog.