SOCCER ELO: THE REBUILD
PART 1: INTRODUCTION
Ever look at a league table and say, “Yeah, but…”? The Seattle Sounders just did, to be fair. What if we could have a rankings system for sports that goes deeper than wins and losses? That takes into account home field advantage, margin of victory, importance of match, and, most importantly, can give you an accurate prediction of how the match should go?
Dr. Arpad Elo: Say no more fam
The Elo ratings system is a self-correcting algorithm that connects results, match importance, margin of victory, and other variables to reflect a soccer team’s value relative to its peers. It has been used mostly in ranking systems for two-player games, most famously chess (Dr. Arpad Elo himself was a chess master). With a universal ratings system, chess players from different geographies and league networks can get a feel for their relative skill levels coming into a match. The ratings system can also be adapted to team sports, and has famously been used by Nate Silver’s FiveThirtyEight blog for time series-based analysis of performance in basketball, football, baseball, and tennis.
In fact, the Elo ratings system (ELO for short) is the engine of the FIFA Women’s World Rankings. In 1997, it was modified to codify men’s international soccer results on EloRatings.net. This serves as a sensible alternative to the strange current FIFA World Rankings methodology for men’s teams:
The inner workings of the Elo ratings system revolve around the following relationship:
Unexpected results (where the difference between actual and predicted result approaches 1) have a greater effect on the change in ELO rating than expected results.
Therefore, the closer we can mirror match results to their expected values, the more dramatic improbable results are, and the more accurate and predictive the ratings system is as a result. Let’s walk through an example to really drive home the basics and the effectiveness of this system.
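Before the example, here is a minimal sketch of the relationship described above, assuming the standard Elo update (new rating = old rating + K × (actual result − expected result)). The function name and the K value of 40 are illustrative assumptions, not something taken from EloRatings.net's code.

```python
def updated_rating(old_rating: float, k: float, actual: float, expected: float) -> float:
    """Standard Elo update: old rating plus K times (actual - expected).

    actual is 1.0 for a win, 0.5 for a draw, 0.0 for a loss.
    expected is the pre-match expected result, between 0 and 1.
    """
    return old_rating + k * (actual - expected)


# Hypothetical numbers: two teams rated 1500 each win a match, K assumed to be 40.
favorite_wins = updated_rating(1500, 40, actual=1.0, expected=0.9)  # expected win
underdog_wins = updated_rating(1500, 40, actual=1.0, expected=0.1)  # upset win

print(round(favorite_wins, 1), round(underdog_wins, 1))
```

The upset moves the rating nine times as far as the expected win, which is the self-correcting behavior in a nutshell.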
EXAMPLE: Klinsmann’s Last Stand
Neil Paine at FiveThirtyEight uses ELO to compare former U.S. Men’s National Team coach Jürgen Klinsmann to his predecessors. I have my own opinions on this piece, but that’s for another time. Let’s see how ELO takes on the two U.S. Men’s National Team games that cost Klinsmann his job.
EloRatings.net starts ratings around 1500 and processes the results of all available international matches to calculate relative strength. At the start of the CONCACAF World Cup Qualifying hex, the six participating teams were rated as follows:
U.S. vs Mexico
Going into the first CONCACAF World Cup Qualifying Hexagonal match against Mexico (1885) in Columbus, Ohio, the U.S. was rated a relatively weaker team (1757) but was playing in a stadium in which they hadn’t lost to Mexico. ELO takes both of these things into account when calculating an expected result value:
For EloRatings.net, the home team advantage is treated as a constant, 100. The expected result for the U.S. in this case would be:
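A minimal sketch of that calculation, assuming the usual logistic Elo form, expected = 1 / (10^(−dr/400) + 1), where dr is the rating difference plus the home advantage. The function and variable names are mine; the ratings and the 100-point constant come from the text above.

```python
def expected_result(rating: float, opponent_rating: float, home_advantage: float = 0.0) -> float:
    """Expected result (0 to 1) for the team with `rating`.

    home_advantage is added to the rating difference: +100 when this team
    is at home, -100 when the opponent is at home, 0 on neutral ground.
    """
    dr = rating - opponent_rating + home_advantage
    return 1.0 / (10 ** (-dr / 400) + 1.0)


# U.S. (1757) vs Mexico (1885), ratings as quoted in the article.
usa_in_columbus = expected_result(1757, 1885, home_advantage=100)
usa_in_mexico = expected_result(1757, 1885, home_advantage=-100)

print(round(usa_in_columbus, 4), round(usa_in_mexico, 4))
```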
In other words, the U.S. was a slight underdog, thanks to the home field advantage (Expected Result would be 0.2121 if the game were in Mexico). The expected result now gets plugged into the following equation to determine the number of points at stake:
For this match, a win would earn the U.S. 21.6 Elo points. In the event of a draw, about two points would be added to the U.S.’s rating and subtracted from Mexico’s rating. In the event of a loss, the U.S. would lose 18.4 Elo points.
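Those three outcomes can be reproduced with a short sketch. It assumes a match weight K of 40, a value inferred from the figures in this article rather than stated in it, and an expected result of about 0.4598 for the U.S. at home; the names are illustrative.

```python
K = 40                 # assumed match weight, inferred from the article's numbers
EXPECTED_USA = 0.4598  # assumed U.S. expected result at home vs Mexico


def rating_change(actual: float, expected: float, k: float = K) -> float:
    """Points gained (or lost, if negative): K * (actual - expected)."""
    return k * (actual - expected)


# actual result: 1.0 = win, 0.5 = draw, 0.0 = loss
changes = {
    label: round(rating_change(actual, EXPECTED_USA), 1)
    for label, actual in (("win", 1.0), ("draw", 0.5), ("loss", 0.0))
}
print(changes)
```

Because the exchange is zero-sum, whatever the U.S. gains or loses is mirrored in Mexico's rating.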
Considering the result was USA 1–2 MEX, the U.S. lost 18 points from its previous rating, dropping from 1757 to 1739.
Mexico, on the other hand, gained these 18 points and improved to 1903.
Costa Rica vs U.S.
Costa Rica beat Trinidad and Tobago at home in its first CONCACAF hex game, gaining 15 points to reach a rating of 1816. Playing in San José, Costa Rica, the U.S. loses its home field advantage, making the chances of victory much slimmer:
The U.S. then turned in one of its worst performances, losing 0–4 to an admittedly more favored Costa Rica team. As a result, the change in points went as follows:
Interestingly enough, the U.S. lost nearly the same number of points on the road to Costa Rica as they did at home to Mexico. If the Costa Rica match had been in, say, Houston, Team USA would have come into the match a slight favorite (Expected Result = 0.5331). If they had turned in a 0–4 dumpster fire then (which, by the way, seemed likely given the team’s level of enthusiasm), they would have lost 40 ELO points as a result, a much more substantial cut (~2.3%) to their World ELO rating.
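Here is a sketch of both scenarios. It assumes a base K of 40 and a goal-difference multiplier of 1.5 for a two-goal margin, 1.75 for three, and 1.75 + (N − 3)/8 for four or more. That scheme is my assumption; it is consistent with the 40-point figure quoted above, and the function names are illustrative.

```python
def expected_result(rating: float, opponent_rating: float, home_advantage: float = 0.0) -> float:
    """Logistic Elo expectation; home_advantage is +100 at home, -100 away."""
    dr = rating - opponent_rating + home_advantage
    return 1.0 / (10 ** (-dr / 400) + 1.0)


def goal_difference_multiplier(goal_diff: int) -> float:
    """Assumed scaling of K by margin of victory."""
    n = abs(goal_diff)
    if n <= 1:
        return 1.0
    if n == 2:
        return 1.5
    if n == 3:
        return 1.75
    return 1.75 + (n - 3) / 8


def rating_change(expected: float, actual: float, goal_diff: int, base_k: float = 40) -> float:
    """Points exchanged: margin-adjusted K times (actual - expected)."""
    return base_k * goal_difference_multiplier(goal_diff) * (actual - expected)


# U.S. (1739) loses 0-4 to Costa Rica (1816): actual = 0.0, goal_diff = -4.
loss_in_san_jose = rating_change(expected_result(1739, 1816, -100), 0.0, -4)
loss_in_houston = rating_change(expected_result(1739, 1816, +100), 0.0, -4)

print(round(loss_in_san_jose, 1), round(loss_in_houston, 1))
```

The same scoreline costs roughly twice as much at home because the expected result, and therefore the gap between expectation and reality, is twice as large.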
In these two disappointing results, Team USA dropped seven spots in the ELO World Ranking. Lowly Panama, who won and drew their first two matches, crept ever closer to the U.S. in Elo rating:
What does this mean?
On balance, the Elo gap between the U.S. and Panama has narrowed by 66 points. In terms of expected result in a neutral-ground game, the U.S. went from being 65.7% favorites to 56.7% favorites. The margin between the two teams’ expected results shrank by 18 percentage points, to 13.4 pp.
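As a quick check on those figures, the sketch below inverts the logistic Elo expectation to recover the rating gap implied by each neutral-ground probability. The inversion, gap = 400 × log10(p / (1 − p)), follows from the standard formula; the function name is mine.

```python
import math


def rating_gap(expected: float) -> float:
    """Rating difference implied by a neutral-ground expected result."""
    return 400 * math.log10(expected / (1 - expected))


before, after = 0.657, 0.567  # U.S. expected result vs Panama, from the text

gap_shift = rating_gap(before) - rating_gap(after)

# Margin between the two teams' expected results: p - (1 - p)
margin_before = before - (1 - before)
margin_after = after - (1 - after)

print(round(gap_shift), round(margin_before, 3), round(margin_after, 3))
```

The implied gap shrinks by about 66 points, and the margin goes from 31.4 pp to 13.4 pp, an 18 pp swing.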
But… what does that mean?
Is this the difference between winning and losing? Is this the difference between winning and drawing? Drawing and losing? Can we optimize ELO’s expectations of results to be more accurate?
ELO is an effective way to show a team’s relative strength to its peers through evaluating its past performances. Using 21 seasons of league data from ten top European domestic club leagues, this study will determine whether past performance is indeed indicative of future results. This project will be discussed in four parts in which I tell you how I:
- Created an ELO ratings system for each league across 21 years of soccer (Part 2)
- Used the findings in ELO to create a prediction model in order to predict match result and scores to simulate seasons (Part 3)
- Ran the prediction model against other iterations of odds making, including other ELO models from the literature and odds from betting houses (Part 4)