Predictive Modelling Football Games using Python and Excel ⚽️

Sam Iyer-Sequeira
Football Applied
38 min read · Jul 5, 2024

Predictive modelling in football games leverages the power of data analysis and statistical techniques to forecast match outcomes, offering insights into team performance and match dynamics. Utilizing Python and Excel, this approach incorporates the ELO rating system — a method initially developed for chess, now adapted to various sports — to evaluate team strengths and predict results for major football tournaments such as the EUROs, World Cup, Premier League, and Champions League. By systematically quantifying the performance of teams through historical data, match results, and head-to-head records, predictive models using ELO ratings provide a robust framework for forecasting outcomes, aiding analysts, coaches, and enthusiasts in making data-driven decisions and enhancing their understanding of the game. Through the integration of Python’s advanced data manipulation and machine learning capabilities with Excel’s versatile data handling and visualization tools, predictive modeling offers a comprehensive toolkit for analyzing and anticipating the thrilling events of football’s most prestigious competitions.

To understand the dynamics of the sport across club and international football, we’ve looked at the ELO ratings of club and national football teams across multiple tournaments to get an insight into the trends that have changed this sport.

The Elo system was invented as an improved chess-rating system over the previously used Harkness system,[1] but is also used as a rating system in association football, American football, baseball, basketball, pool, various board games and esports, and more recently large language models.

The difference in the ratings between two players serves as a predictor of the outcome of a match. Two players with equal ratings who play against each other are expected to score an equal number of wins. A player whose rating is 100 points greater than their opponent’s is expected to score 64%; if the difference is 200 points, then the expected score for the stronger player is 76%.[2]
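The percentages above come from the standard logistic Elo curve on a 400-point scale, which is the same formula the rating code later in this article uses. The short sketch below reproduces them; the function name `expected_score` is only illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo logistic curve."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


for gap in (0, 100, 200):
    # Team A is `gap` rating points stronger than team B
    print(gap, round(expected_score(1500 + gap, 1500), 2))
# 0 0.5
# 100 0.64
# 200 0.76
```

Only the rating difference matters, so 1600 vs 1500 gives the same expectation as 1100 vs 1000.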

Additionally, looking at historical ELO ratings, either over time or in recent major tournaments, tells us how much noise there is in a game between two teams: the closer the two ratings, the closer the expected score is to 50%, and the greater the uncertainty about who will win. For example, a knockout game between Spain and Georgia will be a less “noisy” match than one between France and Portugal. Home advantage also matters, and it will be accounted for in future work on augmented ELO calculations for football teams. However, as we know in this sport, great teams are almost as good away from home as they are at home.
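Home advantage is left for future work here, but one common way to model it is to add a fixed bonus to the home side's rating before computing the expected score. The sketch below assumes a hypothetical `home_advantage` parameter; its default of 100 points is purely illustrative, not a value fitted to any of the data in this article.

```python
def expected_home_score(home_rating: float, away_rating: float,
                        home_advantage: float = 100.0) -> float:
    """Elo expected score for the home side, with a fixed home bonus.

    home_advantage is a hypothetical, illustrative parameter in rating points.
    """
    adjusted_home = home_rating + home_advantage
    return 1 / (1 + 10 ** ((away_rating - adjusted_home) / 400))


# Two evenly rated teams: the bonus tilts the expectation towards the home side
print(round(expected_home_score(1500, 1500), 2))                    # 0.64
print(round(expected_home_score(1500, 1500, home_advantage=0), 2))  # 0.5
```

Setting `home_advantage=0` recovers the plain Elo expectation, so the adjustment can be switched off when comparing models.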

These ratings account for the relative strength of the opponent, so if the underdog beats the favourite, the underdog’s ELO rating increases by more than the favourite’s would have, had the favourite won.
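A small worked example makes this asymmetry concrete. It assumes a 1700-rated favourite, a 1500-rated underdog and the K-factor of 10 used later in this article; the gain for a win is K times (1 minus the winner's expected score).

```python
K = 10  # K-factor, matching the value used in the rating code in this article


def expected_score(rating_a, rating_b):
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


favourite, underdog = 1700, 1500

# Points gained for a win: actual score (1) minus expected score, scaled by K
underdog_gain = K * (1 - expected_score(underdog, favourite))
favourite_gain = K * (1 - expected_score(favourite, underdog))

print(f"Underdog win:  +{underdog_gain:.1f}")   # +7.6
print(f"Favourite win: +{favourite_gain:.1f}")  # +2.4
```

The two possible gains always add up to K, so the larger the rating gap, the more of those K points an upset is worth.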

Machine Learning

To create predictive models for different football tournaments, I followed a systematic process involving data cleaning, preparation, and analysis. Initially, I obtained raw CSV files from sources such as Kaggle and GitHub. These files required significant cleaning and preparation before being loaded into Python for analysis.

The first step involved filtering the datasets by season and merging relevant datasets, especially those comprising separate seasons. For example, in the case of Champions League data, each season’s data was stored in separate files. Considerable time was spent merging these files and ensuring consistency in team names. For instance, Atletico de Madrid was listed as “Atletico” in the 2021/22 dataset but as “Atletico de Madrid” in the 2023/24 dataset. Ensuring all team names were correct and avoiding duplicates was crucial before completing the merge functions.
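A minimal sketch of that merge step is shown below. It assumes each season sits in its own DataFrame with 'Home Team' and 'Away Team' columns, as in the Champions League data used later. The alias dictionary `TEAM_ALIASES` is hypothetical apart from the Atletico example above; in practice it is built by inspecting the unique names across seasons. The two frames at the bottom are toy data.

```python
import pandas as pd

# Hypothetical aliases: each variant spelling maps to one canonical name
TEAM_ALIASES = {
    'Atletico': 'Atletico de Madrid',
}


def merge_seasons(seasons: dict) -> pd.DataFrame:
    """Concatenate per-season DataFrames and harmonise team names."""
    frames = []
    for season, frame in seasons.items():
        frame = frame.copy()
        frame['Season'] = season  # keep track of where each match came from
        for col in ('Home Team', 'Away Team'):
            # Strip stray whitespace, then map known variants to one name
            frame[col] = frame[col].str.strip().replace(TEAM_ALIASES)
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


s2122 = pd.DataFrame({'Home Team': ['Atletico '], 'Away Team': ['Liverpool']})
s2324 = pd.DataFrame({'Home Team': ['Atletico de Madrid'], 'Away Team': ['Lazio']})
merged = merge_seasons({'2021/22': s2122, '2023/24': s2324})
print(sorted(set(merged['Home Team']) | set(merged['Away Team'])))
# ['Atletico de Madrid', 'Lazio', 'Liverpool']
```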

To illustrate this process, let’s consider the Premier League dataset. First, various libraries are imported, including pandas for data manipulation, plotly for interactive plotting, numpy for numerical operations, and seaborn and matplotlib for data visualization. The dataset is read into a DataFrame named premierleague from a CSV file located at a specified path. The column names of the dataset are printed to verify the structure.

Next, unique teams participating in the league are extracted by concatenating and stripping whitespace from the ‘Home’ and ‘Away’ columns. This ensures that all team names are consistently formatted. The code then counts the number of matches each team has played, both home and away, and stores these counts in a dictionary. This dictionary is converted to a DataFrame and sorted by the number of matches played, which is then printed.
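The loop in the code below rescans both columns once per team. An equivalent vectorised sketch stacks the home and away columns and counts each name with `value_counts`; it assumes the same 'Home' and 'Away' column names and uses a three-match toy table.

```python
import pandas as pd


def matches_played(matches: pd.DataFrame, home_col: str = 'Home',
                   away_col: str = 'Away') -> pd.DataFrame:
    """Count how many matches each team appears in (home or away)."""
    # Each match contributes one appearance for each of its two teams
    appearances = pd.concat([matches[home_col], matches[away_col]]).str.strip()
    counts = appearances.value_counts()  # sorted from most to fewest
    return counts.rename_axis('Team').reset_index(name='MatchesPlayed')


toy = pd.DataFrame({'Home': ['Arsenal', 'Chelsea', 'Arsenal'],
                    'Away': ['Chelsea', 'Arsenal ', 'Luton']})
print(matches_played(toy))
```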

import os
import json
import pandas as pd
import pydash
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import csv
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt # plotting
import seaborn as sns

premierleague = pd.read_csv("/Users/name/Downloads/premier-league-matches.csv", index_col=0)

premierleague.columns

unique_teams = pd.concat([premierleague['Home'].str.strip(), premierleague['Away'].str.strip()]).unique()

# Print unique teams
print(unique_teams)

premierleague_count = {}
for team in unique_teams:
    # Count matches where the team appears either at home or away
    premierleague_count[team] = ((premierleague['Home'].str.strip() == team) | (premierleague['Away'].str.strip() == team)).sum()

premierleague_count_df = pd.DataFrame(list(premierleague_count.items()), columns=['Team', 'MatchesPlayed'])

premierleague_count_df = premierleague_count_df.sort_values(by='MatchesPlayed', ascending=False)

print(premierleague_count_df)

Team MatchesPlayed
13 Newcastle 152
16 Aston Villa 152
2 Liverpool 152
3 West Ham 152
5 Tottenham 152
19 Manchester City 152
7 Brighton 152
8 Everton 152
10 Manchester Utd 152
11 Arsenal 152
17 Wolves 152
1 Crystal Palace 152
14 Chelsea 152
20 Brentford 114
18 Burnley 114
0 Fulham 114
15 Leicester City 114
12 Southampton 114
9 Leeds United 114
6 Sheffield Utd 76
23 Bournemouth 76
24 Nottingham Forest 76
4 West Brom 38
21 Watford 38
22 Norwich City 38
25 Luton 38

unique_teams = pd.concat([premierleague['Home'], premierleague['Away']]).str.strip().unique()
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10

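def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    # Expected score for team1 from the logistic Elo curve (400-point scale)
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    # Rating change: K times (actual score - expected score)
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change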

def update_elo_ratings(premierleague, elo_ratings, k_factor):
    for index, match in premierleague.iterrows():
        team1 = str(match['Home']).strip()
        team2 = str(match['Away']).strip()

        winner = str(match['Winner']).strip() if pd.notna(match['Winner']) else ''

        # Elo actual scores lie in [0, 1], like the expected score:
        # 1 = win, 0.5 = draw, 0 = loss
        if winner == team1:
            outcome_team1 = 1  # Win for team1
            outcome_team2 = 0  # Loss for team2
        elif winner == team2:
            outcome_team1 = 0  # Loss for team1
            outcome_team2 = 1  # Win for team2
        else:
            outcome_team1 = 0.5  # Draw
            outcome_team2 = 0.5  # Draw

        # Get current Elo ratings
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)

        # Calculate Elo changes and expected outcomes
        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)

        # Update Elo ratings in the dictionary
        elo_ratings[team1] += elo_change1
        elo_ratings[team2] += elo_change2

        # Also store the ratings and expected outcomes in the DataFrame
        premierleague.at[index, 'team1_rating'] = team1_rating
        premierleague.at[index, 'team2_rating'] = team2_rating
        premierleague.at[index, 'team1_newrating'] = elo_ratings[team1]
        premierleague.at[index, 'team2_newrating'] = elo_ratings[team2]
        premierleague.at[index, 'team1_expected'] = expected1
        premierleague.at[index, 'team2_expected'] = expected2
        premierleague.at[index, 'outcometeam1'] = outcome_team1
        premierleague.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings

# Extract unique teams
unique_teams = pd.concat([premierleague['Home'], premierleague['Away']]).astype(str).str.strip().unique()

# Initialize Elo ratings dictionary
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10

# Initialize Elo ratings columns in the matches DataFrame
premierleague['team1_rating'] = premierleague['Home'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
premierleague['team2_rating'] = premierleague['Away'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
premierleague['team1_newrating'] = None
premierleague['team2_newrating'] = None
premierleague['team1_expected'] = None
premierleague['team2_expected'] = None
premierleague['outcometeam1'] = None
premierleague['outcometeam2'] = None

# Update Elo ratings based on matches data
elo_ratings = update_elo_ratings(premierleague, elo_ratings, k_factor)

# Display updated Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")

# Print the updated DataFrame
print(premierleague[['id', 'Home', 'Away', 'team1_rating', 'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected', 'team2_expected', 'outcometeam1', 'outcometeam2']])

Fulham: 1557.6800903291848
Crystal Palace: 1683.6736569099126
Liverpool: 2082.7314296712552
West Ham: 1698.7301955073785
West Brom: 1059.7196687809512
Tottenham: 1858.145313214513
Sheffield Utd: 1124.2026004151253
Brighton: 1732.5280143643902
Everton: 1631.2939594027273
Leeds United: 1407.999628779109
Manchester Utd: 1883.9953917877838
Arsenal: 2081.416464584651
Southampton: 1335.7282741949216
Newcastle: 1831.402050161974
Chelsea: 1833.5910308595628
Leicester City: 1478.2040975236016
Aston Villa: 1827.2126916778773
Wolves: 1623.8548528825818
Burnley: 1364.9624793403116
Manchester City: 2251.2060169054635
Brentford: 1625.755805232274
Watford: 1083.483385699522
Norwich City: 1086.9355926849996
Bournemouth: 1473.8620200824955
Nottingham Forest: 1414.7249044520672
Luton: 1166.9603845553654

import matplotlib.pyplot as plt
import seaborn as sns

elo_df = pd.DataFrame(list(elo_ratings.items()), columns=['Team', 'Elo Rating'])
output_file = 'premelo.csv'
elo_df.to_csv(output_file, index=False)

elo_df = elo_df.sort_values(by='Elo Rating', ascending=False)

# Print DataFrame to ensure correctness
print(elo_df)

Team Elo Rating
19 Manchester City 2251.206017
2 Liverpool 2082.731430
11 Arsenal 2081.416465
10 Manchester Utd 1883.995392
5 Tottenham 1858.145313
14 Chelsea 1833.591031
13 Newcastle 1831.402050
16 Aston Villa 1827.212692
7 Brighton 1732.528014
3 West Ham 1698.730196
1 Crystal Palace 1683.673657
8 Everton 1631.293959
20 Brentford 1625.755805
17 Wolves 1623.854853
0 Fulham 1557.680090
15 Leicester City 1478.204098
23 Bournemouth 1473.862020
24 Nottingham Forest 1414.724904
9 Leeds United 1407.999629
18 Burnley 1364.962479
12 Southampton 1335.728274
25 Luton 1166.960385
6 Sheffield Utd 1124.202600
22 Norwich City 1086.935593
21 Watford 1083.483386
4 West Brom 1059.719669

plt.figure(figsize=(12, 8))
sns.barplot(x='Elo Rating', y='Team', data=elo_df, palette='viridis')

# Customize the plot
plt.title('Elo Ratings of Premier League Teams')
plt.xlabel('Elo Rating')
plt.ylabel('Team')

# Display the plot
plt.tight_layout()
plt.show()

elo_df.to_csv('filename.csv')

premelo = pd.read_csv("/Users/user/Downloads/filename.csv", index_col=0)

palette_name = "Set2"

plt.figure(figsize=(12, 8))
sns.barplot(x='Team', y='Elo Rating', data=premelo, palette=palette_name)
plt.xlabel('Team')
plt.ylabel('Elo Rating')
plt.xticks(rotation=45, ha='right')
plt.title('Elo Ratings of Teams')
plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt
import pandas as pd

# Assuming premelo is the DataFrame loaded above
# Sort the DataFrame by 'Elo Rating' in descending order
premelo_sorted = premelo.sort_values(by='Elo Rating', ascending=False).reset_index(drop=True)

plt.figure(figsize=(10, 8))
plt.scatter(premelo_sorted.index, premelo_sorted['Elo Rating'], color='blue', alpha=0.5)

# Label each point with the team name, slightly above the marker
for idx, row in premelo_sorted.iterrows():
    plt.annotate(row['Team'], (idx, row['Elo Rating']), textcoords="offset points", xytext=(0, 10), ha='center')

plt.xlabel('Rank (0 = highest rating)')
plt.ylabel('Elo Rating')
plt.title('Scatter Plot of Elo Ratings with Team Labels (Sorted)')
plt.grid(True)
plt.tight_layout()
plt.show()

The code then initializes Elo ratings for each team, starting with a base rating of 1000. The Elo rating system, a method for calculating the relative skill levels of players or teams in competitor-versus-competitor games, is implemented here. A K-factor of 10 is set; it scales every rating change, so a larger K makes ratings react more strongly to each result, while a smaller K keeps them more stable.
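To see the K-factor in isolation, the sketch below applies the same upset (a 1500-rated side beating a 1700-rated side) under different K values. Only K = 10 is used in this article; the other values are there purely for comparison.

```python
def rating_change(rating, opponent, score, k):
    """Elo update: K times (actual score - expected score)."""
    expected = 1 / (1 + 10 ** ((opponent - rating) / 400))
    return k * (score - expected)


for k in (10, 20, 40):
    # Change for the 1500-rated winner; the step scales linearly with K
    print(k, round(rating_change(1500, 1700, 1, k), 1))
# 10 7.6
# 20 15.2
# 40 30.4
```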

The calculate_elo_rating function is defined to compute the expected score and Elo change based on the current ratings of two teams and the match outcome. The update_elo_ratings function iterates over each match in the DataFrame, updating the Elo ratings based on the match results. For each match, the function determines the winner and calculates the new ratings and expected outcomes. These values are stored back into the DataFrame.
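A useful sanity check on any Elo implementation is that the two updates in a match cancel out, so the total rating pool stays at (number of teams × 1000). This holds when actual scores are 1 for a win, 0.5 for a draw and 0 for a loss, because the two expected scores also sum to 1. If results are instead coded 2/1/0, every match adds K points to the pool, which inflates ratings and rewards teams simply for playing more matches. The sketch below, with illustrative function names, checks both cases.

```python
def expected_score(rating_a, rating_b):
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def pool_change(rating1, rating2, score1, score2, k=10):
    """Total rating points added to the pool by one match."""
    change1 = k * (score1 - expected_score(rating1, rating2))
    change2 = k * (score2 - expected_score(rating2, rating1))
    return change1 + change2


print(pool_change(1600, 1450, 1, 0))      # 1/0.5/0 scale, home win: approximately 0
print(pool_change(1600, 1450, 0.5, 0.5))  # 1/0.5/0 scale, draw: approximately 0
print(pool_change(1600, 1450, 2, 0))      # 2/1/0 scale, home win: approximately 10 (= K)
```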

After updating the ratings, the code prints the final Elo ratings for all teams. A new DataFrame, elo_df, is created from these ratings, saved to a CSV file named 'premelo.csv', and then sorted by rating. Finally, a bar plot is generated using seaborn to visualize the Elo ratings of the teams, with the plot customized and displayed using matplotlib.

In summary, this code processes various match data, calculates and updates Elo ratings for each team, and visualizes the results, providing insights into the relative strengths of the teams based on their match performances. This process is consistently applied across different tournaments, with the main variation being the specific datasets used.

Premier League

When constructing an ELO rating model for Premier League clubs since the 2020/21 season, we choose not to include draws when calculating win probabilities due to the concept of regression to the mean. Regression to the mean is a statistical phenomenon where extreme or unusual outcomes are likely to be followed by more typical outcomes. In the context of football, this means that the frequency of draws can introduce noise and obscure the true strength differences between teams.

Excluding draws from win probability calculations helps to provide a clearer picture of a team’s actual performance capabilities. By focusing solely on wins and losses, the model can better gauge a team’s tendency to outperform or underperform relative to their opponents. This approach reduces the impact of random variability that can arise from draws, leading to more accurate predictions of future match outcomes.

However, while draws are excluded from win probability calculations, they are included when calculating ELO ratings. The ELO rating system is designed to reflect the overall performance and strength of a team over time. Draws, being a common result in football, provide valuable information about the competitiveness and relative skill levels of teams. Including draws in ELO ratings ensures that the ratings more accurately represent the balance between attacking and defensive capabilities, and overall team consistency.

To explain regression to the mean further: imagine a team that has an unusually high number of draws in one season. If we include draws in the win probability calculation, it might falsely indicate that the team is consistently on par with their opponents, when in reality, it could be an artifact of random variance. In the next season, the number of draws is likely to revert to a more typical level, leading to potential miscalculations in win probabilities if draws were considered.

Therefore, while win probabilities are calculated based solely on wins and losses to mitigate the noise from draws, ELO ratings take into account all match outcomes, including draws. This dual approach ensures that win probabilities are more stable and less prone to variance, while ELO ratings remain comprehensive and reflective of a team’s overall performance, including their ability to secure draws against various opponents. This balanced method allows the ELO rating model to provide a robust and nuanced assessment of Premier League clubs’ strengths and potential future performances.
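The code for the draw-free win probabilities is not shown here. One possible reading is sketched below: drop drawn matches, then compare how often the home side won with the Elo expected score, grouped into a few bands. If the ratings are well calibrated, sides that Elo favours should win most of their decisive matches. The `team1_expected` column mirrors the one created by the rating code above; the `result` column, its 'H'/'A'/'D' values and the band edges are hypothetical.

```python
import pandas as pd


def decisive_calibration(matches: pd.DataFrame) -> pd.Series:
    """Observed home win rate among decisive matches, by Elo expectation band.

    Assumes 'team1_expected' (home expected score, 0-1) and a hypothetical
    'result' column holding 'H' (home win), 'A' (away win) or 'D' (draw).
    """
    decisive = matches[matches['result'] != 'D']  # draws left out
    expected = decisive['team1_expected'].astype(float)
    # Illustrative bands: home side unfavoured, roughly even, favoured
    bands = pd.cut(expected, bins=[0, 0.4, 0.6, 1.0])
    home_won = decisive['result'] == 'H'
    return home_won.groupby(bands, observed=False).mean()


toy = pd.DataFrame({'team1_expected': [0.30, 0.35, 0.50, 0.70, 0.80, 0.75],
                    'result': ['A', 'H', 'D', 'H', 'H', 'A']})
print(decisive_calibration(toy))
```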

The scatter plot of ELO ratings for Premier League teams since the 2020/21 season offers a clear visualisation of the relative strengths of the clubs. At the top of the chart, Manchester City stands out with the highest ELO rating, reflecting their dominance of the Premier League, with consistent top performances and league titles. Following them are Liverpool and Arsenal, who also boast high ELO ratings. Liverpool’s consistently strong league results underpin their high rating, while Arsenal’s improvement and consistency in the top tier are also noteworthy. Because these ratings are built from Premier League matches only, European results, such as City’s and Liverpool’s deep Champions League runs, do not feed into them.

In the upper mid-tier, we find Manchester United, Tottenham, Chelsea, Newcastle, and Aston Villa, all clustered just above the 1800 ELO rating mark. Manchester United and Chelsea have had fluctuating performances but remain strong contenders. Tottenham and Newcastle’s ratings reflect their competitive performances, and Aston Villa’s very strong performance last season is also noteworthy. Slightly lower but still in the upper mid-tier, Brighton’s rating indicates their solid performances and ability to compete effectively against stronger teams.

The mid-tier teams, such as West Ham, Crystal Palace, Everton, Wolves, and Brentford, with ELO ratings around 1600–1700, represent the core mid-table clubs. These teams have shown the capability to challenge higher-ranked teams but lack the consistency to regularly break into the upper tiers. West Ham have also impressed in European competition, although those matches are not part of this Premier League-only rating.

In the lower mid-tier, we have Fulham, Leicester City, Bournemouth, and Nottingham Forest, with ELO ratings between 1400 and 1600. These teams have struggled for consistency. Leicester City, despite their past Premier League triumph, has seen a dip in form, while Fulham, Bournemouth, and Nottingham Forest have had to battle for survival in the league. Leeds United, despite their rich history, have shown mixed performances recently, reflected in their lower rating.

At the bottom of the chart are Burnley, Southampton, Luton, Sheffield United, Norwich, Watford, and West Brom, with the lowest ELO ratings, between roughly 1050 and 1370. These teams have frequent struggles in the league, battles against relegation, and less competitive performances, justifying their lower ratings. Teams like Norwich and West Brom have yo-yoed between the Premier League and Championship, impacting their stability and rating.

The significant gap between Manchester City and the bottom-tier teams like West Brom and Norwich highlights the disparity in performance and quality within the league. While the top teams consistently perform well in both domestic and European competitions, the bottom teams often fight to avoid relegation. The mid-tier teams show a lot of fluidity, with frequent movements between the mid and lower mid-tiers, indicating the competitive and unpredictable nature of the Premier League.

Transfer fees and Transfer Value

The scatter plot integrates ELO ratings with net spend since the 2020/21 season, providing a fascinating insight into how financial investments correlate with team performances. The trend line indicates a positive relationship between net spend and ELO rating, suggesting that higher financial investment generally correlates with better team performance. However, there are notable exceptions that reveal the nuances of value for money in football.
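The trend line and the "value for money" reading can be made concrete with a least-squares fit of ELO rating on net spend: teams above the line outperform what their spend predicts, and teams below it underperform. This is a sketch assuming a DataFrame with hypothetical columns `Team`, `NetSpend` and `Elo`. The net-spend figures behind the plot are not reproduced here, so the example uses toy numbers and placeholder team names.

```python
import numpy as np
import pandas as pd


def value_for_money(df: pd.DataFrame) -> pd.DataFrame:
    """Fit Elo = slope * NetSpend + intercept and rank teams by residual."""
    slope, intercept = np.polyfit(df['NetSpend'], df['Elo'], deg=1)
    out = df.copy()
    out['Predicted'] = slope * out['NetSpend'] + intercept
    # Positive residual: higher rating than the spend alone would suggest
    out['Residual'] = out['Elo'] - out['Predicted']
    return out.sort_values('Residual', ascending=False)


# Toy data with placeholder team names (not real figures)
toy = pd.DataFrame({'Team': ['A', 'B', 'C', 'D'],
                    'NetSpend': [0.0, 100.0, 200.0, 300.0],
                    'Elo': [1500.0, 1700.0, 1600.0, 1800.0]})
print(value_for_money(toy)[['Team', 'Residual']])
# Pearson correlation: how strongly spend and rating move together
print(np.corrcoef(toy['NetSpend'], toy['Elo'])[0, 1])
```

Ranking by residual rather than by raw rating separates "good because they spent" from "good relative to what they spent", which is the distinction drawn in the paragraphs that follow.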

At the top right, Chelsea stands out with a significant net spend yet a relatively lower ELO rating compared to Manchester City, Liverpool, and Arsenal. This suggests that Chelsea has not seen proportional returns on their investment, indicating lower relative value for money. In contrast, Manchester City, despite their high net spend, justifies the expenditure with the highest ELO rating, demonstrating efficient use of their resources.

Arsenal and Liverpool also show strong performance relative to their spend, positioning them as teams that have achieved good value for money. Tottenham and Newcastle, with moderate net spends, have achieved commendable ELO ratings, further indicating efficient resource utilization.

Teams like Brighton, Brentford, and West Ham are notable for their lower net spends coupled with respectable ELO ratings. Brighton and Brentford, in particular, showcase exceptional value for money, achieving competitive performance levels without significant financial outlay.

On the other hand, Everton and Wolves have net spends that are not fully reflected in their ELO ratings, indicating lower relative value for money. Leicester City, with a slight negative net spend, has managed to maintain a decent ELO rating, highlighting their efficiency in using resources.

At the lower end of the spectrum, teams like Watford, West Brom, and Sheffield United show low ELO ratings alongside negative or minimal net spends. While this indicates a struggle in performance, their financial prudence is notable.

Leeds United, Nottingham Forest, and Fulham show moderate net spends but have lower ELO ratings, suggesting that their financial investments have not yet translated into higher performance levels. Conversely, Southampton and Bournemouth, despite low net spends, have maintained mid-tier ELO ratings, demonstrating effective resource management.

Overall, while higher spending generally correlates with better performance, teams like Brighton, Brentford, and West Ham highlight how strategic spending and efficient management can yield competitive results without breaking the bank. Conversely, Chelsea and Everton exemplify how high expenditure does not always guarantee proportional success, emphasizing the importance of strategic resource utilisation in achieving value for money.

Champions League

The scatter plot provides a comprehensive comparison of the ELO ratings for Champions League teams since the 2020/21 season. The ELO rating system offers a valuable perspective on team performance and relative strength, taking into account the results of matches and the strength of opponents. This analysis reveals a clear stratification among the top European clubs, highlighting the dominance of certain teams and the competitive balance in different tiers.

At the very top, Manchester City, Real Madrid, and Bayern Munich stand out with the highest ELO ratings, all above 1650, with City and Real Madrid above 1700. These clubs have consistently performed at the highest level in domestic and European competitions, reflecting their strong squad depth, tactical acumen, and financial muscle. Their ELO ratings underline their status as the elite of European football.

Following this top tier, there is a cluster of teams including Chelsea, Liverpool, Borussia Dortmund, and Internazionale, with ELO ratings around 1400–1500. These teams have also shown strong performances, particularly in European competitions, but they are a notch below the very top. Chelsea and Liverpool, despite being highly competitive, sit roughly 300 points below their Premier League rival Manchester City. Borussia Dortmund and Internazionale have maintained their positions as top clubs in their respective leagues, and their ELO ratings reflect consistent performances in the Champions League.

import pandas as pd

path = '/Users/user/Downloads/filename.csv'

# Try UTF-8 first. 'latin1' (the same codec as 'iso-8859-1') never raises
# UnicodeDecodeError, because every byte maps to some character, so a
# fallback placed after it can never run. The accented names in this file
# appear to be Mac Roman-encoded (e.g. 'Atl\x8etico' should be 'Atlético').
try:
    df = pd.read_csv(path, encoding='utf-8')
except UnicodeDecodeError:
    df = pd.read_csv(path, encoding='mac_roman')

# Display the DataFrame
print(df.head())

# DataFrame
Match Number Round Number Date Location Home Team \
0 39 3 44138.83 Arena Herning Midtjylland
1 63 4 44160.83 Johan Cruijff ArenA Ajax
2 15 1 44454.83 Est‡dio JosŽ Alvalade Sporting CP
3 17 2 44467.74 Johan Cruijff ArenA Ajax
4 38 3 44488.83 Johan Cruijff ArenA Ajax

Away Team Group team1score team2score Winner id
0 Ajax Group D 1 2 Ajax 283
1 Midtjylland Group D 3 1 Ajax 307
2 Ajax Group C 1 5 Ajax 384
3 Besiktas Group C 2 0 Ajax 386
4 Dortmund Group C 4 0 Ajax 407

df.columns

# df.columns

Index(['Match Number', 'Round Number', 'Date', 'Location', 'Home Team',
'Away Team', 'Group', 'team1score', 'team2score', 'Winner', 'id'],
dtype='object')
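The accented names in this output (for example 'Est‡dio JosŽ Alvalade', and 'Atl\x8etico' in the team list) look like Mac Roman bytes decoded as Latin-1. If the file has already been loaded that way, the text can be repaired by re-encoding to Latin-1 and decoding as Mac Roman. The Date column also appears to hold Excel serial day numbers (days counted from 1899-12-30). Both points are inferences from the output above, so treat this as a sketch; the helper names are hypothetical.

```python
import pandas as pd


def fix_mac_roman(text: str) -> str:
    """Undo Mac Roman bytes that were mistakenly decoded as Latin-1."""
    return text.encode('latin1').decode('mac_roman')


def excel_serial_to_datetime(serial: pd.Series) -> pd.Series:
    """Convert Excel serial day numbers (e.g. 44138.83) to timestamps."""
    return pd.to_datetime(serial, unit='D', origin='1899-12-30')


print(fix_mac_roman('Atl\x8etico'))         # Atlético
print(fix_mac_roman('M\x9anchengladbach'))  # Mönchengladbach
print(excel_serial_to_datetime(pd.Series([44138.83])).dt.date[0])  # 2020-11-03

# Applied to the match data (column names as in df.columns above):
# for col in ['Location', 'Home Team', 'Away Team', 'Winner']:
#     df[col] = df[col].astype(str).map(fix_mac_roman)
# df['Date'] = excel_serial_to_datetime(df['Date'])
```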

unique_teams = pd.concat([df['Home Team'].str.strip(), df['Away Team'].str.strip()]).unique()

# Print unique teams
print(unique_teams)

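import pandas as pd

# Team names as printed above, including the mis-decoded accents
teams = ['Midtjylland', 'Ajax', 'Sporting CP', 'Dortmund', 'Besiktas', 'Rangers',
         'Antwerp', 'Arsenal', 'Sevilla', 'Liverpool', 'Atalanta', 'Atl\x8etico',
         'Salzburg', 'Milan', 'Porto', 'Man United', 'Feyenoord', 'Barcelona',
         'Juventus', 'Dynamo Kyiv', 'Ferencv\x87ros', 'Plzen', 'FC Porto', 'Paris',
         'Bayern', 'Lokomotiv Moskva', 'Lazio', 'Benfica', 'Internazionale',
         'Copenhagen', 'Galatasaray', 'M. Haifa', 'Club Brugge', 'Union Berlin',
         'Celtic', 'Krasnodar', 'Chelsea', 'Rennes', 'Man City', 'Malm\x9a', 'LOSC',
         'Real Madrid', 'Zenit', 'Leipzig', 'Dinamo Zagreb', 'Newcastle',
         'Shakhtar Donetsk', 'M\x9anchengladbach', 'Villarreal', 'Wolfsburg',
         'Young Boys', 'Frankfurt', 'Tottenham', 'Leverkusen', 'Napoli',
         'Real Sociedad', 'PSV', 'Crvena zvezda', 'Lens', 'Braga', 'Marseille',
         'Sheriff', 'Istanbul Basaksehir', 'Olympiacos']

# Map each mis-decoded name to its correct spelling
# ('Porto' and 'FC Porto' are the same club and could be merged here too)
name_map = {
    'Atl\x8etico': 'Atlético',
    'Ferencv\x87ros': 'Ferencváros',
    'Malm\x9a': 'Malmö',
    'M\x9anchengladbach': 'Mönchengladbach',
}

# Convert the list to a DataFrame (kept separate from the match data in df)
teams_df = pd.DataFrame(teams, columns=['Team'])

# Replace values using the mapping, then drop exact duplicates
teams_df['Team'] = teams_df['Team'].replace(name_map)
teams_df = teams_df.drop_duplicates().reset_index(drop=True)

# The same mapping can be applied to the match data, e.g.
# merged_df[['Home Team', 'Away Team', 'Winner']] = merged_df[['Home Team', 'Away Team', 'Winner']].replace(name_map)

# Check the result
print(teams_df)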

# merged_df is assumed to hold every Champions League season concatenated
# into one DataFrame (the merging step described earlier)
merged_df_count = {}
for team in unique_teams:
    # Count matches where the team appears at home or away
    merged_df_count[team] = ((merged_df['Home Team'].str.strip() == team) | (merged_df['Away Team'].str.strip() == team)).sum()

merged_df_count_df = pd.DataFrame(list(merged_df_count.items()), columns=['Team', 'MatchesPlayed'])

merged_df_count_df = merged_df_count_df.sort_values(by='MatchesPlayed', ascending=False)

print(merged_df_count_df)

Team MatchesPlayed
41 Real Madrid 50
38 Man City 48
24 Bayern 42
23 Paris 40
3 Dortmund 37
.. ... ...
16 Feyenoord 6
6 Antwerp 6
5 Rangers 6
4 Besiktas 6
63 Olympiacos 6

[64 rows x 2 columns]

unique_teams = pd.concat([merged_df['Home Team'], merged_df['Away Team']]).str.strip().unique()
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10



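def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    # Expected score for team1 from the logistic Elo curve (400-point scale)
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    # Rating change: K times (actual score - expected score)
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change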

def update_elo_ratings(merged_df, elo_ratings, k_factor):
    for index, match in merged_df.iterrows():
        team1 = str(match['Home Team']).strip()
        team2 = str(match['Away Team']).strip()

        winner = str(match['Winner']).strip() if pd.notna(match['Winner']) else ''

        # Elo actual scores lie in [0, 1], like the expected score:
        # 1 = win, 0.5 = draw, 0 = loss
        if winner == team1:
            outcome_team1 = 1  # Win for team1
            outcome_team2 = 0  # Loss for team2
        elif winner == team2:
            outcome_team1 = 0  # Loss for team1
            outcome_team2 = 1  # Win for team2
        else:
            outcome_team1 = 0.5  # Draw
            outcome_team2 = 0.5  # Draw

        # Get current Elo ratings
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)

        # Calculate Elo changes and expected outcomes
        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)

        # Update Elo ratings in the dictionary
        elo_ratings[team1] += elo_change1
        elo_ratings[team2] += elo_change2

        # Also store the ratings and expected outcomes in the DataFrame
        merged_df.at[index, 'team1_rating'] = team1_rating
        merged_df.at[index, 'team2_rating'] = team2_rating
        merged_df.at[index, 'team1_newrating'] = elo_ratings[team1]
        merged_df.at[index, 'team2_newrating'] = elo_ratings[team2]
        merged_df.at[index, 'team1_expected'] = expected1
        merged_df.at[index, 'team2_expected'] = expected2
        merged_df.at[index, 'outcometeam1'] = outcome_team1
        merged_df.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings

# Extract unique teams
unique_teams = pd.concat([merged_df['Home Team'], merged_df['Away Team']]).astype(str).str.strip().unique()

# Initialize Elo ratings dictionary
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10

# Initialize Elo ratings columns in the matches DataFrame
merged_df['team1_rating'] = merged_df['Home Team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
merged_df['team2_rating'] = merged_df['Away Team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
merged_df['team1_newrating'] = None
merged_df['team2_newrating'] = None
merged_df['team1_expected'] = None
merged_df['team2_expected'] = None
merged_df['outcometeam1'] = None
merged_df['outcometeam2'] = None

# Update Elo ratings based on matches data
elo_ratings = update_elo_ratings(merged_df, elo_ratings, k_factor)

# Display updated Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")

# Print the updated DataFrame
print(merged_df[['id', 'Home Team', 'Away Team', 'team1_rating', 'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected', 'team2_expected', 'outcometeam1', 'outcometeam2']])

import pandas as pd

# Dictionary of Elo ratings
elo_ratings = {
'Zenit' : 1014.4918377850844,
'Dynamo Kyiv' : 1001.903913611602,
'Rennes' : 982.506237387442,
'Chelsea' : 1459.7517053601473,
'Lazio' : 1158.072833226442,
'Barcelona' : 1320.753399792884,
'Paris' : 1412.2993759445933,
'Leipzig' : 1318.2214905428166,
'Salzburg' : 1136.9128125930872,
'Real Madrid' : 1736.7622315300077,
'Bayern' : 1658.7214847524956,
'Internazionale' : 1398.4888997746586,
'Olympiacos' : 1001.4068144292776,
'Man City' : 1767.312496709755,
'Ajax' : 1209.6900816339257,
'Midtjylland' : 993.611635079379,
'Lokomotiv Moskva' : 1001.9266194987879,
'Shakhtar Donetsk' : 1139.9268298945867,
'Atlético' : 1294.4030654183753,
'Mönchengladbach' : 1041.569620930675,
'Porto' : 1237.9398471730872,
'Marseille' : 1036.293489284177,
'Atalanta' : 1095.7687451905144,
'Liverpool' : 1453.4028821610639,
'Krasnodar' : 1022.506237387442,
'Istanbul Basaksehir' : 1002.5110820768597,
'Sevilla' : 1126.3516338316717,
'Club Brugge' : 1139.9245719157555,
'Dortmund' : 1418.7704482715144,
'Ferencváros' : 982.4955180073781,
'Juventus' : 1237.4391570408134,
'Man United' : 1145.348602543533,
'Young Boys' : 1048.5365714967872,
'Villarreal' : 1138.7614957825642,
'LOSC' : 1079.1216608720888,
'Malmö' : 991.2228867123092,
'Besiktas' : 977.4129186138842,
'Sheriff' : 1044.856893892438,
'Sporting CP' : 1105.0108111552483,
'Milan' : 1207.0742819604757,
'Benfica' : 1262.5441564901673,
'Wolfsburg' : 1024.1028098455306,
'Dinamo Zagreb' : 1020.9774046339113,
'Celtic' : 1021.4638286880178,
'Frankfurt' : 1064.3455538932083,
'Napoli' : 1244.3002859888456,
'Tottenham' : 1082.5705906963021,
'Plzen' : 986.9953022959236,
'Leverkusen' : 1030.3327493548486,
'Rangers' : 985.223331274488,
'Copenhagen' : 1078.3971001498862,
'M. Haifa' : 1016.2160343047493,
'Feyenoord' : 1034.8378513550604,
'Galatasaray' : 1032.9384998887665,
'Arsenal' : 1131.069791736582,
'Braga' : 1023.8237896625668,
'Real Sociedad' : 1101.0321863325937,
'Union Berlin' : 1005.0625596937681,
'Lens' : 1053.4711174511485,
'PSV' : 1071.2976713061082,
'Antwerp' : 1010.2466921944103,
'Newcastle' : 1037.2805346142013,
'Crvena zvezda' : 996.3967803963809,
'FC Porto' : 1115.590256486907
}

# Convert dictionary to DataFrame
elo_df = pd.DataFrame(list(elo_ratings.items()), columns=['Team', 'Elo Rating'])

# Save DataFrame to CSV
output_file = 'clelo_ratings.csv'
elo_df.to_csv(output_file, index=False)

elo_df = elo_df.sort_values(by='Elo Rating', ascending=False)

# Print DataFrame to ensure correctness
print(elo_df)

Team Elo Rating
13 Man City 1767.312497
9 Real Madrid 1736.762232
10 Bayern 1658.721485
3 Chelsea 1459.751705
23 Liverpool 1453.402882
.. ... ...
47 Plzen 986.995302
49 Rangers 985.223331
2 Rennes 982.506237
29 Ferencváros 982.495518
36 Besiktas 977.412919

[64 rows x 2 columns]
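The dictionary above lists both 'Porto' and 'FC Porto', which appear to be the same club recorded under two names, so that club's results are split across two separate ratings. One way to avoid this is to map aliases to a single canonical name in the match data *before* the ratings are computed. The sketch below is a minimal illustration under that assumption; `TEAM_ALIASES` and `normalise_team_names` are hypothetical names, and the default `Team1`/`Team2` column names mirror the EURO data used later in this article.

```python
import pandas as pd

# Hypothetical alias table: each variant spelling maps to one canonical name
TEAM_ALIASES = {
    "FC Porto": "Porto",
}

def normalise_team_names(matches, columns=("Team1", "Team2"), aliases=TEAM_ALIASES):
    """Return a copy of `matches` with whitespace stripped and aliases merged."""
    out = matches.copy()
    for col in columns:
        # Strip stray spaces first so " FC Porto" and "FC Porto" both match
        out[col] = out[col].astype(str).str.strip().replace(aliases)
    return out

# Usage: run this before building the Elo dictionary so each club has one rating
matches = pd.DataFrame({
    "Team1": ["FC Porto ", "Chelsea"],
    "Team2": ["Chelsea", "Porto"],
})
clean = normalise_team_names(matches)
print(sorted(pd.concat([clean["Team1"], clean["Team2"]]).unique()))
```

Merging after the fact (for example, averaging the two final ratings) would not be equivalent, because every Elo update depends on the ratings held at the time of each match.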

import matplotlib.pyplot as plt
import seaborn as sns
plt.figure(figsize=(5, 10))
sns.barplot(x='Elo Rating', y='Team', data=elo_df, palette='viridis')

# Customize the plot
plt.title('Elo Ratings of Champions League Teams')
plt.xlabel('Elo Rating')
plt.ylabel('Team')

# Display the plot
plt.tight_layout()
plt.show()

elo_df.to_csv('clelo3.csv')

clelo1 = pd.read_csv("/Users/user/Downloads/filename.csv", index_col=0)

import matplotlib.pyplot as plt
import seaborn as sns

# Assuming clelo1 DataFrame is already defined

# Define the palette
palette_name = "Set2"

# Create the plot
plt.figure(figsize=(20, 20)) # Increase figure size

# Create the bar plot
barplot = sns.barplot(x='Team', y='Elo Rating', data=clelo1, palette=palette_name)

# Customize the plot
plt.xlabel('Team', fontsize=20) # Increase font size for x-label
plt.ylabel('Elo Rating', fontsize=20) # Increase font size for y-label
plt.xticks(rotation=90, ha='center', fontsize=15) # Rotate x-tick labels and increase font size
plt.yticks(fontsize=15) # Increase font size for y-ticks
plt.title('Elo Ratings of Teams', fontsize=25) # Increase font size for title

# Add values on top of bars
for p in barplot.patches:
    barplot.annotate(format(p.get_height(), '.1f'),
                     (p.get_x() + p.get_width() / 2., p.get_height()),
                     ha='center', va='center',
                     xytext=(0, 9),
                     textcoords='offset points',
                     fontsize=12)  # Font size for the annotation

# Adjust layout
plt.tight_layout()

# Show the plot
plt.show()

import matplotlib.pyplot as plt
import pandas as pd

clelo1_sorted = clelo1.sort_values(by='Elo Rating', ascending=False).reset_index(drop=True)

plt.figure(figsize=(10, 8))
plt.scatter(clelo1_sorted.index, clelo1_sorted['Elo Rating'], color='blue', alpha=0.5)

for idx, row in clelo1_sorted.iterrows():
    plt.annotate(row['Team'], (idx, row['Elo Rating']), textcoords="offset points", xytext=(0, 10), ha='center')

plt.xlabel('Index')
plt.ylabel('Elo Rating')
plt.title('Scatter Plot of Elo Ratings with Team Labels since 2021 (Sorted)')
plt.grid(True)
plt.tight_layout()
plt.show()

Further down the list, teams like Barcelona, Paris Saint-Germain (PSG), Atletico Madrid, RB Leipzig, and Benfica have ELO ratings of roughly 1260–1410. These clubs are strong contenders in their domestic leagues and often progress to the knockout stages of the Champions League. Barcelona and PSG, in particular, have a rich history in European football but have experienced fluctuations in performance, which is reflected in their ELO ratings.

The middle tier consists of clubs like Juventus, Napoli, Ajax, and AC Milan, with ELO ratings of roughly 1200–1250. These teams have rich histories and have enjoyed success domestically and in Europe, but their recent performances have been more inconsistent. This inconsistency is mirrored in their ELO ratings, which are lower than those of the top and upper-middle tiers.

At the lower end of this group, clubs such as Lazio, Shakhtar Donetsk, and Porto have ELO ratings of roughly 1100–1160 (Porto appears twice in the data, as 'Porto' and 'FC Porto'; the 'FC Porto' entry is about 1116). While these teams are often competitive in their domestic leagues and qualify for the Champions League, they typically struggle to advance deep into the tournament. Their ELO ratings reflect their status as competitive but not among the top echelon of European clubs.

Overall, the ELO ratings since the 2020/21 season highlight the hierarchical nature of European football. The top clubs maintain their dominance through consistent high-level performances, while there is a noticeable drop-off as we move down the rankings. This stratification underscores the importance of financial resources, strategic management, and sustained success in maintaining high ELO ratings. The analysis also emphasizes the competitive nature of the Champions League, where even mid-tier clubs can challenge the elite on their day, adding to the excitement and unpredictability of the tournament.

European Championship

To analyse and interpret the ELO Ratings for teams that have participated in the European Championships, we’ve focused on data from the past three completed tournaments (EURO 2012, 2016, and 2020), together with the EURO 2024 matches played so far. This keeps the ratings from being overly influenced by older tournaments, maintaining relevance and accuracy. Most players take part in only three to five international tournaments over their careers, and since each team plays at most seven games per tournament, using EURO 2020 alone would have produced high variance and unreliable results.

import pandas as pd
import matplotlib.pyplot as plt  # plotting
import seaborn as sns

euros = pd.read_csv("/Users/user/Downloads/filename.csv", index_col=0)

unique_teams = pd.concat([euros['Team1'].str.strip(), euros['Team2'].str.strip()]).unique()

# Print unique teams
print(unique_teams)

unique_countries_team1 = euros['Team1'].str.strip().unique()

euros_count = {}
for team in unique_teams:
    euros_count[team] = ((euros['Team1'].str.strip() == team) | (euros['Team2'].str.strip() == team)).sum()

euros_count_df = pd.DataFrame(list(euros_count.items()), columns=['Team', 'MatchesPlayed'])

euros_count_df = euros_count_df.sort_values(by='MatchesPlayed', ascending=False)

print(euros_count_df)
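The counting loop above scans both team columns once per team. pandas can build the same table in a single pass by stacking the two columns and calling `value_counts()`. The sketch below is a possible drop-in alternative; `matches_played` is our own helper name, and the default column names follow the EURO data. Its counts agree with the loop's unless a team were ever listed on both sides of the same match.

```python
import pandas as pd

def matches_played(matches, home_col="Team1", away_col="Team2"):
    """Count how many matches each team appears in, across both columns."""
    # Stack both team columns into one Series and tidy stray whitespace
    teams = pd.concat([matches[home_col], matches[away_col]]).str.strip()
    # value_counts() already sorts from most to fewest appearances
    counts = teams.value_counts()
    return counts.rename_axis("Team").reset_index(name="MatchesPlayed")

# Usage with a tiny hand-made fixture list
sample = pd.DataFrame({
    "Team1": ["Spain", "Italy ", "Spain"],
    "Team2": ["Italy", "Spain", "England"],
})
print(matches_played(sample))
```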

def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    # Expected score for team1 on the standard Elo logistic curve (0 to 1)
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    # Rating change is proportional to how far the result beat or missed expectation
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change

def update_elo_ratings(euros, elo_ratings, k_factor):
    # Process matches in file order, updating both teams after each game
    for index, match in euros.iterrows():
        team1 = match['Team1'].strip()
        team2 = match['Team2'].strip()

        winner = match['Winner'] if pd.notna(match['Winner']) else ''

        # Results are scored 2/1/0 rather than the conventional Elo 1/0.5/0.
        # The two expected scores sum to 1, so every match adds k_factor
        # points to the combined rating pool.
        if winner.strip() == team1:
            outcome_team1 = 2  # Win for team1
            outcome_team2 = 0  # Loss for team2
        elif winner.strip() == team2:
            outcome_team1 = 0  # Loss for team1
            outcome_team2 = 2  # Win for team2
        else:
            outcome_team1 = 1  # Draw
            outcome_team2 = 1  # Draw

        # Get current Elo ratings
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)

        # Calculate Elo changes and expected outcomes
        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)

        # Update Elo ratings in the dictionary
        elo_ratings[team1] += elo_change1
        elo_ratings[team2] += elo_change2

        # Also record pre-match ratings, post-match ratings and expectations in the DataFrame
        euros.at[index, 'team1_rating'] = team1_rating
        euros.at[index, 'team2_rating'] = team2_rating
        euros.at[index, 'team1_newrating'] = elo_ratings[team1]
        euros.at[index, 'team2_newrating'] = elo_ratings[team2]
        euros.at[index, 'team1_expected'] = expected1
        euros.at[index, 'team2_expected'] = expected2
        euros.at[index, 'outcometeam1'] = outcome_team1
        euros.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings

# Extract unique teams
unique_teams = pd.concat([euros['Team1'], euros['Team2']]).str.strip().unique()

# Initialize Elo ratings dictionary
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10

# Initialize Elo ratings columns in the matches DataFrame
euros['team1_rating'] = euros['Team1'].map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
euros['team2_rating'] = euros['Team2'].map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
euros['team1_newrating'] = None
euros['team2_newrating'] = None
euros['team1_expected'] = None
euros['team2_expected'] = None
euros['outcometeam1'] = None
euros['outcometeam2'] = None

# Update Elo ratings based on euros data
elo_ratings = update_elo_ratings(euros, elo_ratings, k_factor)

# Display updated Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")

# Print the updated DataFrame
print(euros[['id', 'Team1', 'Team2', 'team1_rating', 'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected', 'team2_expected', 'outcometeam1', 'outcometeam2']])
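The update functions above score a win as 2, a draw as 1 and a loss as 0, whereas the textbook Elo formulation uses 1, 0.5 and 0 so that the actual result sits on the same 0-to-1 scale as the expected score. The sketch below (helper names are ours, not the article's) compares the two scales for a draw between a 1100-rated and a 1000-rated side. On the 2/1/0 scale both teams gain points and the combined pool grows by K on every match, which helps explain why most ratings in this article end up above the 1000 starting value; on the 1/0.5/0 scale the two changes cancel out.

```python
def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B (between 0 and 1)."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def rating_changes(rating_a, rating_b, score_a, score_b, k_factor):
    """Rating changes for A and B given each side's match score."""
    change_a = k_factor * (score_a - expected_score(rating_a, rating_b))
    change_b = k_factor * (score_b - expected_score(rating_b, rating_a))
    return change_a, change_b

k = 10
# A draw scored on each scale: 1 point each (2/1/0) versus 0.5 each (1/0.5/0)
for label, draw_score in [("2/1/0 scale", 1), ("1/0.5/0 scale", 0.5)]:
    a, b = rating_changes(1100, 1000, draw_score, draw_score, k)
    print(f"{label}: A {a:+.2f}, B {b:+.2f}, total {a + b:+.2f}")
```

The same `expected_score` gives about 0.64 for a 100-point advantage and about 0.76 for 200 points, in line with the figures quoted in the introduction.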

import matplotlib.pyplot as plt
import seaborn as sns

eloeuro_ratings = {
'Poland' : 1040.863963518912,
'Russia' : 1014.6786830397292,
'Germany' : 1153.1994148096076,
'Netherlands' : 1056.237059655154,
'Republic of Ireland' : 997.574373615579,
'Spain' : 1187.0628815344617,
'France' : 1147.8395590321024,
'Ukraine' : 1024.4774740961034,
'Greece' : 1010.5640314961036,
'Denmark' : 1049.6623364895897,
'Italy' : 1167.939944444686,
'Sweden' : 1032.2901088838305,
'Portugal' : 1148.666416748963,
'Croatia' : 1062.5169964246645,
'Czech Republic' : 1035.7789871638402,
'England' : 1171.8509933980918,
'Albania' : 1005.4667937782659,
'Wales' : 1059.3230832197703,
'Turkey' : 1036.7634203461234,
'Belgium' : 1097.103967817304,
'Austria' : 1038.8798260237281,
'Romania' : 1007.2596394078785,
'Iceland' : 1036.4234580714356,
'Slovakia' : 1030.2934936773247,
'Northern Ireland' : 1001.3180433119314,
'Hungary' : 1035.4438779999807,
'Switzerland' : 1098.4212782771292,
'Scotland' : 994.9174969518759,
'Finland' : 1005.7022570881337,
'Slovenia' : 1014.0318584143133,
'Serbia' : 1007.6978780971217,
'Georgia' : 1014.56487025129,
'North Macedonia' : 985.1855329149744
}

lo_df = pd.DataFrame(list(eloeuro_ratings.items()), columns=['Team', 'Elo Rating'])
output_file = 'euroelo4.csv'
lo_df.to_csv(output_file, index=False)

lo_df = lo_df.sort_values(by='Elo Rating', ascending=False)

# Print DataFrame to ensure correctness
print(lo_df)

plt.figure(figsize=(12, 8))
sns.barplot(x='Elo Rating', y='Team', data=lo_df, palette='viridis')

# Customize the plot
plt.title('Elo Ratings of Euro Teams')
plt.xlabel('Elo Rating')
plt.ylabel('Team')

# Display the plot
plt.tight_layout()
plt.show()

lo_df.to_csv('filename.csv')

euroelo = pd.read_csv("/Users/user/Downloads/filename.csv", index_col=0)

palette_name = "Set2"

plt.figure(figsize=(12, 8))
sns.barplot(x='Team', y='Elo Rating', data=euroelo, palette=palette_name)
plt.xlabel('Team')
plt.ylabel('Elo Rating')
plt.xticks(rotation=45, ha='right')
plt.title('Elo Ratings of Teams')
plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt

euroelo_sorted = euroelo.sort_values(by='Elo Rating', ascending=False).reset_index(drop=True)

plt.figure(figsize=(10, 8))
plt.scatter(euroelo_sorted.index, euroelo_sorted['Elo Rating'], color='blue', alpha=0.5)

for idx, row in euroelo_sorted.iterrows():
    plt.annotate(row['Team'], (idx, row['Elo Rating']), textcoords="offset points", xytext=(0, 10), ha='center')

plt.xlabel('Index')
plt.ylabel('Elo Rating')
plt.title('Scatter Plot of Elo Ratings with Team Labels since 2012 (Sorted)')
plt.grid(True)
plt.tight_layout()
plt.show()

One might suggest including all EURO qualification and UEFA Nations League results to address the small sample size. However, the importance of these matches is significantly different from the European Championship games. If we used the same k-factor for these games as for the tournament matches, it would disproportionately favor stronger teams. This is because top teams often dominate their qualification groups, leading to an inflated ELO rating that does not accurately reflect their tournament performance. Additionally, the variability in team line-ups during international breaks, due to form fluctuations and injuries, makes it difficult to gauge a team’s true quality based solely on qualification and Nations League matches.
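If qualifiers and Nations League games were ever added, the argument above implies they should move ratings less than tournament matches. A minimal sketch of that idea weights K by competition; the competition labels and K values below are illustrative assumptions, not values used anywhere in this article, and the sketch uses the conventional 1/0.5/0 scoring so that each update is zero-sum.

```python
# Hypothetical K-factors by competition: tournament games move ratings most
K_BY_COMPETITION = {
    "EURO finals": 40,
    "EURO qualifier": 15,
    "Nations League": 10,
}
DEFAULT_K = 10  # used for any competition not listed above

def expected_score(rating_a, rating_b):
    """Standard Elo expected score of A against B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

def update_pair(rating_a, rating_b, score_a, competition):
    """Return new ratings for A and B; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    k = K_BY_COMPETITION.get(competition, DEFAULT_K)
    delta = k * (score_a - expected_score(rating_a, rating_b))
    # Zero-sum update: B loses exactly what A gains
    return rating_a + delta, rating_b - delta

# A favourite beating an underdog gains less in a qualifier than at the finals
print(update_pair(1150, 1000, 1, "EURO qualifier"))
print(update_pair(1150, 1000, 1, "EURO finals"))
```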

The Italian football team exemplifies these discrepancies. Despite winning EURO 2020, reaching the quarter-finals in 2016 and the final in 2012, Italy failed to qualify for the last two World Cups. This inconsistency underscores the unique challenges of international football. Although there are more qualification slots for the European Championships than for the World Cup, some teams, like Wales, qualified for the 2022 World Cup but not for EURO 2024. Such discrepancies show that qualification record and tournament performance can diverge sharply: Italy failed to qualify for the 2022 FIFA World Cup only months after winning EURO 2020.

In international football, managers have less time with their players, resulting in fewer opportunities to win tournaments. This high-stakes environment means that even small errors can have significant consequences. Unlike the Champions League, where teams have six games to qualify for the knockouts, international teams only have three group stage games. Additionally, knockout games in international tournaments are single-legged, making each match akin to a decisive Game 7 in American sports. This setup contributes to the greater variance and unpredictability seen in international football.

This unpredictability is evident in the scatterplot of ELO Ratings, in which four distinct tiers of teams emerge. The top tier consists of regular contenders and favourites for the European Championships. The second tier includes Belgium and Switzerland, teams that consistently reach the knockouts but do not perform at the highest levels. The third tier comprises inconsistent teams like Croatia and the Netherlands, who have done well in recent World Cups but not in the EUROs. The bottom tier includes teams like Russia, eliminated in the group stage at EURO 2012, 2016 and 2020, and North Macedonia, whose only appearance came at EURO 2020, while Scotland have consistently underperformed.
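The tiers described above are read off the scatterplot by eye. One simple, reproducible way to draw similar lines is to sort the ratings and cut at the largest gaps between neighbouring teams. The sketch below is an assumption-laden illustration rather than the method used here: `split_into_tiers` is our own helper, the choice of four tiers is taken from the paragraph above, and it runs on a hand-picked subset of the EURO ratings listed earlier, rounded to two decimals.

```python
def split_into_tiers(ratings, n_tiers):
    """Split teams into n_tiers groups by cutting at the largest rating gaps."""
    ordered = sorted(ratings.items(), key=lambda item: item[1], reverse=True)
    # Gap between each team and the next-lower-rated team, tagged with its position
    gaps = [(ordered[i][1] - ordered[i + 1][1], i) for i in range(len(ordered) - 1)]
    # Positions of the n_tiers - 1 biggest gaps, restored to ranking order
    cuts = sorted(i for _, i in sorted(gaps, reverse=True)[:n_tiers - 1])
    tiers, start = [], 0
    for cut in cuts:
        tiers.append([team for team, _ in ordered[start:cut + 1]])
        start = cut + 1
    tiers.append([team for team, _ in ordered[start:]])
    return tiers

# Subset of the EURO ratings, rounded to two decimals
subset = {
    'Spain': 1187.06, 'England': 1171.85, 'Italy': 1167.94,
    'Germany': 1153.20, 'Portugal': 1148.67, 'France': 1147.84,
    'Switzerland': 1098.42, 'Belgium': 1097.10,
    'Croatia': 1062.52, 'Wales': 1059.32, 'Netherlands': 1056.24,
    'Scotland': 994.92, 'North Macedonia': 985.19,
}
for number, tier in enumerate(split_into_tiers(subset, 4), start=1):
    print(number, tier)
```

Largest-gap cutting is crude (a single close pair can merge two tiers), but it makes the boundaries explicit instead of visual.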

The gap in quality between the top teams and the rest explains why weaker teams adopt a risk-averse, defensive approach in international tournaments. Keeping the game tight limits the favourite’s chances and forces it to take more risks as the match goes on, which increases the underdog’s chances of winning. However, once a match reaches penalties, it becomes a lottery in which every miss carries significant consequences.

Quarter-finals:

France v Portugal

Germany v Spain

England v Switzerland

Netherlands v Turkey

Semi-finals:

Portugal v Germany

England v Netherlands

Final:

Germany v England

Winner:

England
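To see how the ratings themselves line up with these fixtures, the Elo expected score can be computed for each quarter-final. The sketch below copies the relevant ratings (rounded) from the EURO dictionary above; treating the expected score as a probability of advancing is a simplification, since the expected score blends wins and draws while a knockout tie cannot end level. Note that on these listed ratings Spain, not Germany, is the higher-rated side in the second quarter-final, so that pick is not explained by the dictionary values alone.

```python
def expected_score(rating_a, rating_b):
    """Elo expected score of A against B, read here as A's chance of advancing."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))

# Ratings copied (rounded) from the EURO dictionary above
ratings = {
    'France': 1147.84, 'Portugal': 1148.67,
    'Germany': 1153.20, 'Spain': 1187.06,
    'England': 1171.85, 'Switzerland': 1098.42,
    'Netherlands': 1056.24, 'Turkey': 1036.76,
}
quarter_finals = [('France', 'Portugal'), ('Germany', 'Spain'),
                  ('England', 'Switzerland'), ('Netherlands', 'Turkey')]

for team_a, team_b in quarter_finals:
    p = expected_score(ratings[team_a], ratings[team_b])
    higher_rated = team_a if p >= 0.5 else team_b
    print(f"{team_a} v {team_b}: P({team_a}) = {p:.2f}, higher-rated: {higher_rated}")
```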

England, despite a shaky start to the EUROs, have reached the quarter-finals, helped by favourable matchups. The model accounts for their strong past performances and suggests that, despite not playing the best football, they could still win the tournament. While the ELO Rating model does not consider margin of victory or quality of play, it highlights that success in international football often comes down to winning rather than how the win is achieved.

Given the results-oriented nature of international football, one could argue that England should be considered the favourites. The model makes them only marginal favourites, and England have never won the EUROs, yet they currently boast the most valuable squad in the world. Plotting the total market value of each national team’s players against its ELO Rating shows that England underachieve relative to their talent: despite having the highest market value of any country, England’s ELO Rating is lower than Spain’s and only marginally above Italy’s. Additionally, the gap between England, France, and Portugal in ELO Ratings is smaller than the gap between these nations in transfer market value. Thus, although England have not won much as a nation, there is no better time for them to win an international tournament.

The inconsistent nature of major international tournaments means that a few select teams consistently perform well, while the rest of the outcomes are largely up to chance. This inconsistency is visualized in the scatterplot, showing a significant drop-off in quality after the top teams, with France and Switzerland marking a notable threshold. Beyond this point, teams become quite similar in terms of real quality based on recent EURO performances. This further highlights the slim margins that define success in international football, emphasising the importance of seizing the moment when conditions are favorable, as they currently are for England.

FIFA World Cup

To extract the ELO ratings for nations that have participated in the World Cup, we’ve focused on data from the past four World Cup tournaments (2010 to 2022). The ELO Rating plot reveals that the gap between the better and worse teams is even smaller than at the EUROs. Argentina is the highest-rated team, and the narrow spread reflects the high quality of the teams that reach the World Cup, given the limited number of slots per confederation and the stronger overall competition. Although Europe continues to be the strongest continent in terms of overall performance and football infrastructure, the Asian and North American teams that qualify for the World Cup are similar in quality to the European teams that miss out.

import pandas as pd
import matplotlib.pyplot as plt  # plotting
import seaborn as sns

wc = pd.read_csv("/Users/user/Downloads/wcresults.csv", index_col=0)

unique_teams = pd.concat([wc['home_team'].str.strip(), wc['away_team'].str.strip()]).unique()

# Print unique teams
print(unique_teams)

unique_countries_team1 = wc['home_team'].str.strip().unique()

wc_count = {}
for team in unique_teams:
    wc_count[team] = ((wc['home_team'].str.strip() == team) | (wc['away_team'].str.strip() == team)).sum()

wc_df = pd.DataFrame(list(wc_count.items()), columns=['Team', 'MatchesPlayed'])

wc_df = wc_df.sort_values(by='MatchesPlayed', ascending=False)

print(wc_df)


def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    # Expected score for team1 on the standard Elo logistic curve (0 to 1)
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change

def update_elo_ratings(wc, elo_ratings, k_factor):
    # Process matches in file order, updating both teams after each game
    for index, match in wc.iterrows():
        team1 = str(match['home_team']).strip()
        team2 = str(match['away_team']).strip()

        winner = match['Winner'] if pd.notna(match['Winner']) else ''

        # Results are scored 2/1/0 rather than the conventional Elo 1/0.5/0,
        # so each match adds k_factor points to the combined rating pool
        if winner.strip() == team1:
            outcome_team1 = 2  # Win for team1
            outcome_team2 = 0  # Loss for team2
        elif winner.strip() == team2:
            outcome_team1 = 0  # Loss for team1
            outcome_team2 = 2  # Win for team2
        else:
            outcome_team1 = 1  # Draw
            outcome_team2 = 1  # Draw

        # Get current Elo ratings
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)

        # Calculate Elo changes and expected outcomes
        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)

        # Update Elo ratings in the dictionary
        elo_ratings[team1] += elo_change1
        elo_ratings[team2] += elo_change2

        # Also update the Elo ratings and expected outcomes in the DataFrame
        wc.at[index, 'team1_rating'] = team1_rating
        wc.at[index, 'team2_rating'] = team2_rating
        wc.at[index, 'team1_newrating'] = elo_ratings[team1]
        wc.at[index, 'team2_newrating'] = elo_ratings[team2]
        wc.at[index, 'team1_expected'] = expected1
        wc.at[index, 'team2_expected'] = expected2
        wc.at[index, 'outcometeam1'] = outcome_team1
        wc.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings

# Extract unique teams
unique_teams = pd.concat([wc['home_team'], wc['away_team']]).astype(str).str.strip().unique()

# Initialize Elo ratings dictionary
elo_ratings = {team: 1000 for team in unique_teams}

# Set K-factor
k_factor = 10

# Initialize Elo ratings columns in the matches DataFrame
wc['team1_rating'] = wc['home_team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
wc['team2_rating'] = wc['away_team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
wc['team1_newrating'] = None
wc['team2_newrating'] = None
wc['team1_expected'] = None
wc['team2_expected'] = None
wc['outcometeam1'] = None
wc['outcometeam2'] = None

# Update Elo ratings based on matches data
elo_ratings = update_elo_ratings(wc, elo_ratings, k_factor)

# Display updated Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")

# Print the updated DataFrame
print(wc[['id', 'home_team', 'away_team', 'team1_rating', 'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected', 'team2_expected', 'outcometeam1', 'outcometeam2']])

import matplotlib.pyplot as plt
import seaborn as sns

elowcratings = {
'South Africa' : 1015.0,
'Uruguay' : 1133.9032589521748,
'Argentina' : 1217.6185616154978,
'England' : 1104.4013061327307,
'South Korea' : 1027.0516093881943,
'Algeria' : 1006.8630560097776,
'Germany' : 1166.4073757486779,
'Serbia' : 1009.2910478340491,
'Italy' : 1010.7005480263952,
'Japan' : 1049.3696295795032,
'Netherlands' : 1199.1056698278394,
'Brazil' : 1165.134607914747,
'Ivory Coast' : 1019.5768726574483,
'New Zealand' : 1015.14387184166,
'Honduras' : 980.9112960804928,
'Spain' : 1111.3911272573928,
'France' : 1190.4928362088986,
'Greece' : 1015.9158692108933,
'Slovenia' : 1014.7122563166799,
'Cameroon' : 991.6149795728185,
'Ghana' : 1027.5175063269614,
'Slovakia' : 1010.5642932165725,
'Chile' : 1041.7723232691685,
'Portugal' : 1089.745701271368,
'Mexico' : 1076.7152410514354,
'Nigeria' : 1013.1781212844772,
'Australia' : 1022.5611874662267,
'United States' : 1052.178537420228,
'Denmark' : 1021.6898967821948,
'Paraguay' : 1034.8643173177925,
'North Korea' : 985.2877436833201,
'Switzerland' : 1073.6890276508711,
'Colombia' : 1073.4812463699514,
'Iran' : 1018.5271224800897,
'Belgium' : 1146.757301834105,
'Russia' : 1042.2640874326462,
'Croatia' : 1153.5371061780447,
'Costa Rica' : 1049.469560251165,
'Bosnia and Herzegovina' : 1005.9808057145501,
'Ecuador' : 1031.9831404322197,
'Egypt' : 986.3700887114461,
'Morocco' : 1059.6067863561543,
'Peru' : 1006.1104899066186,
'Sweden' : 1038.4092994553514,
'Tunisia' : 1023.4155898021047,
'Poland' : 1020.8254828728243,
'Saudi Arabia' : 1013.656603082485,
'Iceland' : 997.2870124947586,
'Senegal' : 1038.4762952462881,
'Panama' : 986.281842183465,
'Qatar' : 987.9481966765793,
'Wales' : 996.6086626943902,
'Canada' : 988.6336029082757
}

elo_df = pd.DataFrame(list(elowcratings.items()), columns=['Team', 'Elo Rating'])
output_file = 'wcelo.csv'
elo_df.to_csv(output_file, index=False)

elo_df = elo_df.sort_values(by='Elo Rating', ascending=False)

# Print DataFrame to ensure correctness
print(elo_df)

plt.figure(figsize=(12, 8))
sns.barplot(x='Elo Rating', y='Team', data=elo_df, palette='viridis')

# Customize the plot
plt.title('Elo Ratings of World Cup Nations')
plt.xlabel('Elo Rating')
plt.ylabel('Team')

# Display the plot
plt.tight_layout()
plt.show()

elo_df.to_csv('filename.csv')

wcelo = pd.read_csv("/Users/samyukth/user/filename.csv", index_col=0)

palette_name = "Set2"

plt.figure(figsize=(12, 8))
sns.barplot(x='Team', y='Elo Rating', data=wcelo, palette=palette_name)
plt.xlabel('Team')
plt.ylabel('Elo Rating')
plt.xticks(rotation=45, ha='right')
plt.title('Elo Ratings of Teams')
plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt
import pandas as pd

# Assuming wcelo is your DataFrame containing the 'Elo Rating' and 'Team' columns
wcelo_sorted = wcelo.sort_values(by='Elo Rating', ascending=False).reset_index(drop=True)

plt.figure(figsize=(10, 8))
plt.scatter(wcelo_sorted.index, wcelo_sorted['Elo Rating'], color='blue', alpha=0.5)

for idx, row in wcelo_sorted.iterrows():
    plt.annotate(row['Team'], (idx, row['Elo Rating']), textcoords="offset points", xytext=(0, 10), ha='center')

plt.xlabel('Index')
plt.ylabel('Elo Rating')
plt.title('Scatter Plot of Elo Ratings with Team Labels (Sorted)')
plt.grid(True)
plt.tight_layout()
plt.show()

Japan, for example, has qualified for all of the past four World Cups and reached the knockout stages in three of them (2010, 2018 and 2022), making it the highest-rated Asian team. However, its ELO rating is comparable to those of Costa Rica and Russia, each of which reached a single quarter-final in the past four World Cups, and Chile, who went out in the Round of 16 in both 2010 and 2014. While Japan has reached the knockout stages more consistently, Costa Rica and Russia achieved notable victories against stronger teams. At the 2014 World Cup, Costa Rica topped a group that included Italy, Uruguay, and England and went on to reach the quarter-finals. At the 2018 World Cup, Russia caused a major upset by knocking Spain out on penalties in the Round of 16 before reaching the quarter-finals.

Japan beat both Germany and Spain in the 2022 group stage, but went out in the Round of 16 to Belgium in 2018 and to Croatia, on penalties, in 2022. This mixed record places them alongside teams with fewer but more significant successes.

The results underscore how tournament outcomes are influenced not just by the overall strength of the teams but by specific match-ups and key moments. For instance, Costa Rica’s and Russia’s quarter-final runs were marked by exceptional performances in high-stakes games, contrasting with Japan’s steadier but less dramatic progress. This demonstrates the variability and unpredictability inherent in World Cup tournaments, where a single game can dramatically alter a team’s rating and perceived quality.

The most important determinant of ELO ratings is consistency, particularly in World Cup performances. For example, the Netherlands’ track record of reaching the final, the semi-finals, and the quarter-finals in three of the past four World Cups (2010, 2014 and 2022) has earned them a higher rating than France. Despite France’s back-to-back World Cup final appearances in 2018 and 2022, their failure to progress beyond the group stage in 2010 significantly affected their rating. Consistency is similarly why Switzerland and Mexico, neither of whom has got past the Round of 16 in the past four editions, are rated higher than one might expect: their regular appearances in the knockout stages reflect how hard it is to reach that level repeatedly. Conversely, Italy’s low ELO rating, below teams like New Zealand, South Africa, and Saudi Arabia, reflects group-stage exits in 2010 and 2014 and failure to qualify in 2018 and 2022.

While the margins are narrow in international football, particularly in the FIFA World Cup, it’s evident that the sport and the quality of national teams are heavily skewed towards Europe. Most of the valuable players in world football come from Europe, especially Western Europe. The developed footballing infrastructure, leagues, and systems in Western Europe place these countries in a prime position to nurture and develop talent. In contrast, other nations of comparable size or population do not produce players of the same calibre. South American players, especially those from Brazil and Argentina, often move to Europe at a young age to develop their skills and compete in the Champions League. This benefits non-European sides by helping to develop their players, but it also creates an imbalance that makes it difficult for these countries to grow into sustainable top footballing nations.

For instance, African teams like Senegal and Morocco have performed well in recent World Cups, but most of their players ply their trade in European leagues, often in the top divisions. This reliance on European training and competition highlights the disparity between the footballing infrastructures of different regions. Countries like Iran or Australia continue to lag behind until they either produce a golden generation of talent or have more players consistently playing in Europe.

A detailed analysis of ELO ratings reveals several key insights. The Netherlands’ high rating is a testament to their ability to perform consistently at the highest level, regardless of individual tournament outcomes. France’s rating, while still strong, is tempered by past inconsistencies, demonstrating how a single poor performance can have lasting effects. Switzerland and Mexico’s ratings underscore the difficulty of consistently reaching the knockout stages, highlighting the importance of stability and reliability in international competitions.

Italy’s low rating, despite their 2006 World Cup title (which falls outside the four tournaments analysed here), illustrates the impact of their recent failures to reach the knockout stages, emphasising how current form and consistency are crucial in ELO calculations. The European dominance in football is further evidenced by the concentration of high-value players in Western Europe, driven by superior infrastructure and competitive leagues. This dominance raises questions about the ability of other regions to develop top-tier talent without similar resources.

In conclusion, ELO ratings provide a comprehensive measure of a team’s performance and consistency in international football. The ratings highlight the importance of regular success in major tournaments and reveal the disparities between different regions’ footballing infrastructures. As the sport continues to evolve, understanding these dynamics is crucial for predicting future trends and identifying potential shifts in the global football landscape.

Augmented World Cup k-factor model

In comparing the old and new Elo ratings for international football teams, the adjustments made in the new model, particularly the incorporation of variable K-factors for different stages of the tournament, reveal nuanced shifts in team evaluations. Argentina stands out with a substantial increase in its Elo rating, rising from 1217.62 to 1378.95. This surge reflects Argentina’s strong performance in knockout stages, where crucial matches are weighted more heavily with higher K-factors. This adjustment highlights Argentina’s ability to deliver under pressure, influencing their overall rating positively.
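To see why knockout results dominate the augmented ratings, compare the rating gain for the same win at different stages. This sketch reuses the article’s 2/1/0 scoring and the stage K-factors returned by `determine_k_factor` in the code that follows (10 up to 40). The mapping of playoff codes 1 to 5 onto named stages is our assumption, the stage labels and `win_gain` helper are our own, and the 4.5 code is left out.

```python
# Assumed mapping of stage to K, mirroring the values in determine_k_factor
K_BY_STAGE = {
    'group stage': 10,
    'round of 16': 20,
    'quarter-final': 25,
    'semi-final': 30,
    'final': 40,
}

def win_gain(winner_rating, loser_rating, k_factor, win_score=2):
    """Rating gain for the winner under the article's 2/1/0 scoring."""
    expected = 1 / (1 + 10 ** ((loser_rating - winner_rating) / 400))
    return k_factor * (win_score - expected)

# Two evenly matched teams: the same win is worth four times as much in a final
for stage, k in K_BY_STAGE.items():
    print(f"{stage:>14}: +{win_gain(1100, 1100, k):.1f}")
```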

import os
import json
import pandas as pd
import pydash
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import numpy as np
import csv
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt # plotting
import seaborn as sns
wc = pd.read_csv("/Users/user/Downloads/wcresults.csv", index_col=0)
unique_teams = pd.concat([wc['home_team'].str.strip(), wc['away_team'].str.strip()]).unique()
# Print unique teams
print(unique_teams)
unique_countries_team1 = wc['home_team'].str.strip().unique()
import pandas as pd
cl = pd.read_csv("/Users/samyukth/Downloads/wcresults1.csv", index_col=0)
# Function to calculate the expected score and Elo change for team 1
def calculate_elo_rating(team1_rating, team2_rating, outcome, k_factor):
    expected1 = 1 / (1 + 10 ** ((team2_rating - team1_rating) / 400))
    elo_change = k_factor * (outcome - expected1)
    return expected1, elo_change
# Function to determine the K-factor from the match's playoff (stage) code
def determine_k_factor(playoff):
    if playoff == 1:
        return 10
    elif playoff == 2:
        return 20
    elif playoff == 3:
        return 25
    elif playoff == 4:
        return 30
    elif playoff == 4.5:
        return 10
    elif playoff == 5:
        return 40
    else:
        return 5  # Default K-factor if playoff value is not in the specified range
# Function to update Elo ratings match by match
def update_elo_ratings(cl, elo_ratings):
    for index, match in cl.iterrows():
        team1 = str(match['home_team']).strip()
        team2 = str(match['away_team']).strip()

        winner = str(match['Winner']).strip() if pd.notna(match['Winner']) else ''

        # Results are scored 2/1/0 rather than the standard Elo 1/0.5/0. Because the two
        # expected scores sum to 1, every match adds exactly K points to the rating pool.
        if winner == team1:
            outcome_team1 = 2  # Win for team1
            outcome_team2 = 0  # Loss for team2
        elif winner == team2:
            outcome_team1 = 0  # Loss for team1
            outcome_team2 = 2  # Win for team2
        else:
            outcome_team1 = 1  # Draw
            outcome_team2 = 1  # Draw
        # Determine K-factor based on playoff value
        k_factor = determine_k_factor(match['playoff'])
        # Get current (pre-match) Elo ratings
        team1_rating = elo_ratings.get(team1, 1000)
        team2_rating = elo_ratings.get(team2, 1000)
        # Calculate Elo changes and expected outcomes, both from the pre-match ratings
        expected1, elo_change1 = calculate_elo_rating(team1_rating, team2_rating, outcome_team1, k_factor)
        expected2, elo_change2 = calculate_elo_rating(team2_rating, team1_rating, outcome_team2, k_factor)
        # Update Elo ratings in the dictionary
        elo_ratings[team1] = team1_rating + elo_change1
        elo_ratings[team2] = team2_rating + elo_change2
        # Also record the ratings, expected scores and outcomes in the DataFrame
        cl.at[index, 'team1_rating'] = team1_rating
        cl.at[index, 'team2_rating'] = team2_rating
        cl.at[index, 'team1_newrating'] = elo_ratings[team1]
        cl.at[index, 'team2_newrating'] = elo_ratings[team2]
        cl.at[index, 'team1_expected'] = expected1
        cl.at[index, 'team2_expected'] = expected2
        cl.at[index, 'outcometeam1'] = outcome_team1
        cl.at[index, 'outcometeam2'] = outcome_team2

    return elo_ratings
# Extract unique teams
unique_teams = pd.concat([cl['home_team'], cl['away_team']]).astype(str).str.strip().unique()
# Initialize Elo ratings dictionary (every team starts at 1000)
elo_ratings = {team: 1000 for team in unique_teams}
# Initialize Elo rating columns in the matches DataFrame
cl['team1_rating'] = cl['home_team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
cl['team2_rating'] = cl['away_team'].astype(str).map(lambda x: elo_ratings.get(x.strip(), 1000)).astype('float64')
cl['team1_newrating'] = None
cl['team2_newrating'] = None
cl['team1_expected'] = None
cl['team2_expected'] = None
cl['outcometeam1'] = None
cl['outcometeam2'] = None
# Update Elo ratings based on matches data
elo_ratings = update_elo_ratings(cl, elo_ratings)
# Display updated Elo ratings
for team, rating in elo_ratings.items():
    print(f"{team}: {rating}")
# Print the updated DataFrame
print(cl[['id', 'home_team', 'away_team', 'team1_rating', 'team2_rating', 'team1_newrating', 'team2_newrating', 'team1_expected', 'team2_expected', 'outcometeam1', 'outcometeam2']])
# Save the updated DataFrame to a CSV file
output_file = 'wc22.csv'
cl.to_csv(output_file, index=False)
# Final ratings under the augmented K-factor model
elowc_ratings = {
'South Africa' : 1015.0,
'Uruguay' : 1156.936503395829,
'Argentina' : 1378.9495326385065,
'England' : 1138.52398618582,
'South Korea' : 1022.0141176914905,
'Algeria' : 1003.7387393268539,
'Germany' : 1291.8247131358141,
'Serbia' : 1010.1422624366481,
'Italy' : 1011.0022285299192,
'Japan' : 1041.6107392326703,
'Netherlands' : 1268.3772233497314,
'Brazil' : 1205.7599660065316,
'Ivory Coast' : 1019.5059272413573,
'New Zealand' : 1015.14387184166,
'Honduras' : 980.9112960804928,
'Spain' : 1197.641544614538,
'France' : 1344.2279940014007,
'Greece' : 1011.0690773750082,
'Slovenia' : 1014.7122563166799,
'Cameroon' : 991.9220733727933,
'Ghana' : 1035.331041764243,
'Slovakia' : 1005.984714591485,
'Chile' : 1034.9708544619973,
'Portugal' : 1090.870164197596,
'Mexico' : 1069.593971814395,
'Nigeria' : 1009.4424543911202,
'Australia' : 1023.6595271052843,
'United States' : 1042.2460888227406,
'Denmark' : 1018.7327918853176,
'Paraguay' : 1042.3753424647714,
'North Korea' : 985.2877436833201,
'Switzerland' : 1061.067831687775,
'Colombia' : 1076.5851785064986,
'Iran' : 1020.0394200211175,
'Belgium' : 1181.4246347017277,
'Russia' : 1054.002337305343,
'Croatia' : 1231.9004813704312,
'Costa Rica' : 1062.6476727528877,
'Bosnia and Herzegovina' : 1006.0728664580525,
'Ecuador' : 1032.7408964670176,
'Egypt' : 986.6334046229742,
'Morocco' : 1100.6599468326554,
'Peru' : 1006.2691919936517,
'Sweden' : 1047.5697211591246,
'Tunisia' : 1024.6585010427302,
'Poland' : 1020.7525330065081,
'Saudi Arabia' : 1014.3772261507378,
'Iceland' : 997.9056644717933,
'Senegal' : 1035.6626302999455,
'Panama' : 986.3364464041181,
'Qatar' : 988.6198653535296,
'Wales' : 996.8262330511899,
'Canada' : 989.7385683841735
}
# Put the final ratings in a table and save them to CSV
elo_df = pd.DataFrame(list(elowc_ratings.items()), columns=['Team', 'Elo Rating'])
output_file = 'wcelo4.csv'
elo_df.to_csv(output_file, index=False)
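One side effect of the 2/1/0 scoring in `update_elo_ratings` is worth checking. The two teams' expected scores always sum to 1, so the two rating changes in a match sum to K * (2 - 1) = K, and every match adds K points to the pool. With standard 1/0.5/0 scoring each update is zero-sum. The minimal sketch below compares the two schemes for a win and a draw; the helper name `pool_change` and the ratings are illustrative assumptions, and K = 40 is the article's largest K-factor.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of team A against team B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


def pool_change(rating_a: float, rating_b: float, score_a: float, score_b: float, k: float) -> float:
    """Sum of both teams' rating changes for a single match."""
    e_a = expected_score(rating_a, rating_b)
    e_b = expected_score(rating_b, rating_a)  # e_a + e_b == 1 (up to rounding)
    return k * (score_a - e_a) + k * (score_b - e_b)


k = 40
schemes = {
    "2/1/0 (article)": ((2, 0), (1, 1)),
    "1/0.5/0 (standard)": ((1, 0), (0.5, 0.5)),
}
for label, (win, draw) in schemes.items():
    # A 1100-rated side beating a 1000-rated side, then a draw between two 1000-rated sides
    print(label, round(pool_change(1100, 1000, *win, k), 6), round(pool_change(1000, 1000, *draw, k), 6))
```

Under the article's scoring each match adds exactly K points (40 here) whether it is a win or a draw, while the standard scoring adds nothing. This likely explains why most teams in the final table sit above the 1000 starting rating. Switching to 1/0.5/0 would keep the pool fixed, but it would also change the ratings listed above.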

Similarly, France rises from 1190.49 to 1344.23, reflecting consistent success in both group and knockout stages. The new model's extra weight on critical matches amplifies their rating and confirms their status as a formidable competitor. Germany, rising from 1166.41 to 1291.82, also benefits significantly from the emphasis on high-stakes knockout games, which rewards its strong runs through tournaments. Conversely, teams like Italy barely move (1010.70 to 1011.00). Italy went out in the group stage in 2010 and 2014 and missed the 2018 and 2022 tournaments, so it has no knockout results for the higher K-factors to reweight. Overall, variable K-factors let the Elo model capture more of the structure of international tournaments, giving a more refined assessment of how teams perform when the stakes are highest.
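The shifts discussed above can be tabulated directly. This sketch only reuses the old and new ratings quoted in the paragraph, typed in by hand; it does not read the article's CSV files.

```python
# Old rating -> new (augmented K-factor) rating, as quoted in the text
ratings = {
    "Argentina": (1217.62, 1378.95),
    "France": (1190.49, 1344.23),
    "Germany": (1166.41, 1291.82),
    "Italy": (1010.70, 1011.00),
}

# Rating change per team, rounded to two decimals
changes = {team: round(new - old, 2) for team, (old, new) in ratings.items()}

# Largest gain first
for team, delta in sorted(changes.items(), key=lambda item: item[1], reverse=True):
    print(f"{team:<10} {delta:+.2f}")
```

Argentina gains about 161 points, France about 154 and Germany about 125, while Italy moves by only 0.3.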

Europe’s Dominance

European teams’ dominance in World Cup tournaments can be attributed to several interconnected factors, including robust footballing infrastructure, historical legacy, and strategic development initiatives.

Firstly, European countries often boast well-established footballing infrastructures that support grassroots development, professional leagues, and extensive youth academies. These infrastructures nurture talent from a young age, providing systematic training and development pathways that cultivate skilled players capable of competing at the highest international levels. This structured approach ensures a consistent supply of talented players who are well-prepared for the rigours of elite football competition.

Moreover, European nations benefit from a rich footballing heritage and a deep-rooted cultural affinity for the sport. Football holds a central place in European culture, with passionate fan bases, strong club rivalries, and a long-standing tradition of excellence. This cultural backdrop fosters a competitive environment that motivates players to excel and achieve success on the global stage.

Strategically, European football associations and clubs invest heavily in youth development programs, coaching education, and sports science research. These investments not only enhance player skills but also improve training methods and drive tactical innovation. The integration of advanced technologies and analytics further refines coaching strategies and player performance assessments, giving European teams a competitive edge in international competitions.

Additionally, the structure of European football leagues, such as the English Premier League, La Liga, Bundesliga, Serie A, and Ligue 1, among others, provides a high level of competition and exposure for players. The intensity and quality of these leagues prepare players mentally and physically for the demands of major tournaments like the World Cup.

Furthermore, the geographic proximity of European nations allows for regular and competitive international fixtures, which facilitate player development and tactical adaptation. Friendly matches and competitive qualifiers provide valuable opportunities for teams to refine their strategies and build cohesion among players.

In summary, European teams’ dominance in World Cups is intricately tied to their well-established footballing infrastructure, cultural affinity for the sport, strategic development initiatives, competitive leagues, and systematic youth development programs. These factors collectively contribute to the consistent success of European nations on the global football stage, showcasing their ability to produce top-tier talent and effectively leverage their footballing resources to achieve international acclaim.

Several key insights emerge from the analysis of Elo ratings across major football tournaments like the Champions League, Premier League, European Championships, and FIFA World Cup. First, the ratings underscore the hierarchical nature of European football, where clubs and national teams are stratified by consistent performance and historical success. Teams like Manchester City, Real Madrid, and Bayern Munich maintain their dominance, reflecting deep squads and tactical prowess. Similarly, in the European Championships, teams like Italy and the Netherlands demonstrate the importance of sustained performance, while others, like France, show how past inconsistency can weigh on current ratings.

Moreover, the analysis highlights the unpredictable nature of football, especially in international competitions where single-game eliminations and strategic matchups play pivotal roles. The slim margins between success and failure underscore the high stakes and variability inherent in tournament football.
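To put "slim margins" in numbers, the Elo expected-score formula used in the code earlier converts a rating gap into an expected score. The sketch below first checks the 100- and 200-point figures quoted at the start of the article, then applies the formula to Argentina and France, the two highest-rated sides in the augmented World Cup table. The helper name `expected_score` is illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Elo expected score of team A against team B."""
    return 1 / (1 + 10 ** ((rating_b - rating_a) / 400))


# Rating gaps quoted at the start of the article: 100 points -> ~64%, 200 points -> ~76%
for gap in (100, 200):
    print(gap, round(expected_score(1000 + gap, 1000), 2))

# The two highest-rated sides in the augmented World Cup ratings
argentina, france = 1378.95, 1344.23
print("Argentina vs France:", round(expected_score(argentina, france), 3))
```

With only about 35 points between them, the formula gives Argentina an expected score of roughly 0.55 against France, close to a coin flip. Under standard scoring an expected score counts a draw as half a win, so it is not the same thing as a win probability.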

Looking forward, Elo ratings provide a valuable tool for understanding the dynamics of football performance, reflecting not just current form but also historical achievements and institutional strengths. As football continues to evolve globally, these ratings offer a lens through which to assess trends, predict outcomes, and gauge the shifting balance of power in the sport.

In essence, while Elo ratings provide quantitative insights, the passion, drama, and unpredictability of football ensure that the sport remains a captivating spectacle where underdogs can defy the odds and giants can stumble, keeping fans enthralled with every match.
