Which Premier League Team had the easiest time in the 2022/23 season? — Part 1

Tom Adams
Dec 19, 2023


Football is a funny game: the highs are high and the lows are low. Amid this unpredictability, have you ever asked: “Why does it seem like a certain team consistently faces opponents when they’re not at their best?”
This article analyses which teams had comparatively smoother or tougher seasons, as gauged by the form of their opponents at the time they met.

To begin the analysis, the first step was to find a data source. It’s no secret that the internet is full of football datasets; this approach uses the free data available through API-Sports, accessed via RapidAPI.

The data was retrieved with the Python script below. It uses two packages, requests and json: the former to call the API and the latter to save the response locally.

import requests
import json


url = "https://api-football-v1.p.rapidapi.com/v3/fixtures"

headers = {
    "X-RapidAPI-Key": "YOUR_KEY",
    "X-RapidAPI-Host": "api-football-v1.p.rapidapi.com"
}

params = {
    "league": "39",    # league code for the Premier League (can be replaced with any desired league)
    "season": "2022",  # season in question; again, can be replaced with any season
}

# call the data
premier_league_results = requests.get(url, headers=headers, params=params)

if premier_league_results.status_code == 200:
    json_response = premier_league_results.json()

    file_path = '/Users/tomadams/Documents/FootballApi/Datasets/premier_league_results.json'
    # save the data locally
    with open(file_path, 'w') as file:
        json.dump(json_response, file, indent=4)
    print(f"Updated JSON data saved to '{file_path}'")
else:
    print("Failed to retrieve data. Status code:", premier_league_results.status_code)

This API call returns data in the following format:

{
"get": "fixtures",
"parameters": {
"league": "39",
"season": "2022"
},
"errors": [],
"results": 380,
"paging": {
"current": 1,
"total": 1
},
"response": [
{
"fixture": {
"id": 867946,
"referee": "A. Taylor",
"timezone": "UTC",
"date": "2022-08-05T19:00:00+00:00",
"timestamp": 1659726000,
"periods": {
"first": 1659726000,
"second": 1659729600
},
"venue": {
"id": 525,
"name": "Selhurst Park",
"city": "London"
},
"status": {
"long": "Match Finished",
"short": "FT",
"elapsed": 90
}
},
"league": {
"id": 39,
"name": "Premier League",
"country": "England",
"logo": "https://media-4.api-sports.io/football/leagues/39.png",
"flag": "https://media-4.api-sports.io/flags/gb.svg",
"season": 2022,
"round": "Regular Season - 1"
},
"teams": {
"home": {
"id": 52,
"name": "Crystal Palace",
"logo": "https://media-4.api-sports.io/football/teams/52.png",
"winner": false
},
"away": {
"id": 42,
"name": "Arsenal",
"logo": "https://media-4.api-sports.io/football/teams/42.png",
"winner": true
}
},
"goals": {
"home": 0,
"away": 2
},
"score": {
"halftime": {
"home": 0,
"away": 1
},
"fulltime": {
"home": 0,
"away": 2
},
"extratime": {
"home": null,
"away": null
},
"penalty": {
"home": null,
"away": null
}
}
},
... // other games
]
}

The eagle-eyed will notice that this API call returns a lot of data for each game. Not all of it is used in the analysis that follows; the key fields carried forward are the date of the game, the team names and the winner of the match.
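
As a minimal sketch of which fields matter, the snippet below pulls the date, team names and winner out of a single fixture object shaped like the response above. The helper name summarise_fixture and the trimmed sample dict are illustrative assumptions, not part of the API.

```python
# Trimmed fixture copied from the sample response above (only the fields used later).
sample_fixture = {
    "fixture": {"date": "2022-08-05T19:00:00+00:00"},
    "teams": {
        "home": {"name": "Crystal Palace", "winner": False},
        "away": {"name": "Arsenal", "winner": True},
    },
}


def summarise_fixture(fixture):
    """Hypothetical helper: keep only the date, team names and winner."""
    home = fixture["teams"]["home"]
    away = fixture["teams"]["away"]
    if home["winner"]:
        winner = home["name"]
    elif away["winner"]:
        winner = away["name"]
    else:
        winner = None  # neither side flagged as winner: treated as a draw here
    return {
        "date": fixture["fixture"]["date"],
        "home": home["name"],
        "away": away["name"],
        "winner": winner,
    }


summary = summarise_fixture(sample_fixture)
print(summary)
```

Everything else in the response (venue, referee, half-time score and so on) can be ignored for this analysis.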

Moving on, let’s begin the script that generates the form for each team. The first step is to get a unique list of all the teams in the league. This is stored in the unique_teams variable, alongside a dict called team_matches that will be used later.

import json
from datetime import datetime

with open('/Users/tomadams/Documents/FootballApi/Datasets/premier_league_results.json', 'r') as f:
    data = json.load(f)['response']

unique_teams = sorted(set([x['teams']['home']['name'] for x in data]))

team_matches = {team: [] for team in unique_teams}

The next step is to generate the rolling form for each team. This approach doesn’t distinguish between home form and away form; it treats them as a single record.

The function calculate_team_form takes a single team and filters the API data down to the games that team played in. Each results variable will therefore hold 38 games: 19 home and 19 away. Because the home and away lists are concatenated, the games then need to be sorted by date.

The for loop then goes through each of the team’s games and works out whether the team won, lost or drew, and whether it played home or away.

The next step assigns the form to each game. I have set form to be stored as a rolling window of the previous 5 games the team has played. The first game of the season therefore has a form of 5 nulls, and it is not until the 6th game that the form is free of null values.
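
The rolling window can be pictured with a short sketch. This is not the author's implementation, which follows further down; it is an alternative illustration using collections.deque, and the helper name rolling_form and the sample results are assumptions.

```python
from collections import deque


def rolling_form(results, window=5):
    """Hypothetical helper: for each game, list the previous `window`
    results, left-padded with None while fewer games have been played."""
    recent = deque([None] * window, maxlen=window)
    forms = []
    for result in results:
        forms.append(list(recent))  # form going into this game
        recent.append(result)       # this result counts from the next game on
    return forms


forms = rolling_form(["W", "W", "D", "L", "W", "W"])
print(forms[0])  # form before the first game: all None
print(forms[5])  # form before the sixth game: no None left
```

Because the deque has a fixed maximum length, appending a new result automatically drops the oldest one.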

Finally, form is converted to a number by summing the form_mapping values and dividing by 5 (the number of games used for form).
This gives a form score between 0 and 3. The best possible form, [W,W,W,W,W], maps to [3,3,3,3,3], which sums to 15 and gives a score of 3 when divided by 5.
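
A few worked values make the scale concrete. The sketch below assumes the same 3/1/0 points mapping and treats missing games as 0; the names POINTS and form_score are illustrative only.

```python
# Points per result, as described above.
POINTS = {"W": 3, "D": 1, "L": 0}


def form_score(form, window=5):
    """Hypothetical helper: total points in the window divided by the
    window size. None (game not yet played) scores 0."""
    return sum(POINTS.get(item, 0) for item in form) / window


best = form_score(["W", "W", "W", "W", "W"])      # 15 / 5
mixed = form_score(["W", "D", "L", "W", "D"])     # (3 + 1 + 0 + 3 + 1) / 5
early = form_score([None, None, None, None, "W"])  # 3 / 5
print(best, mixed, early)
```

Note that early-season scores are low by construction, since games not yet played contribute 0 to the sum while the divisor stays at 5.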

# Mapping dictionary
form_mapping = {
    None: 0,  # games not yet played score 0
    "W": 3,
    "D": 1,
    "L": 0
}

def calculate_team_form(team_name):
    results_home = [x for x in data if x['teams']['home']['name'] == team_name]
    results_away = [x for x in data if x['teams']['away']['name'] == team_name]
    results = results_home + results_away

    # Normalise the date strings so that they sort chronologically
    for fixture in results:
        fixture['fixture']['date'] = datetime.fromisoformat(fixture['fixture']['date'].replace('Z', '+00:00')).isoformat()

    single_team_season = sorted(results, key=lambda x: x['fixture']['date'])

    matches = []

    for idx, match in enumerate(single_team_season):
        match_date = match["fixture"]["date"]
        selected_team = next(team for team in match["teams"].values() if team["name"] == team_name)

        # "winner" is True or False for a decided game; anything else is treated as a draw
        if selected_team["winner"] is True:
            result = "W"
        elif selected_team["winner"] is False:
            result = "L"
        else:
            result = "D"

        if match['teams']['home']['name'] == team_name:
            home_away = 'home'
            other_team_name = match['teams']['away']['name']
        else:
            home_away = 'away'
            other_team_name = match['teams']['home']['name']

        # Form is the results of up to 5 previous games, left-padded with None
        form_length = min(idx, 5)
        form = [matches[i]["result"] for i in range(idx - form_length, idx)]

        while len(form) < 5:
            form.insert(0, None)

        matches.append({
            "date": match_date,
            "name": team_name,
            "home_away": home_away,
            "opponent": other_team_name,
            "result": result,
            "form": form,
            "form_score": sum(form_mapping.get(item, 0) for item in form) / 5
        })

    team_matches[team_name] = matches

[calculate_team_form(x) for x in unique_teams]

with open('/Users/tomadams/Documents/FootballApi/Datasets/team_matches.json', 'w') as json_file:
    json.dump(team_matches, json_file, indent=2)

The function is called for each team via a list comprehension and produces JSON in the following format.

{
"Arsenal": [
{
"date": "2022-08-05T19:00:00+00:00",
"name": "Arsenal",
"home_away": "away",
"opponent": "Crystal Palace",
"result": "W",
"form": [
null,
null,
null,
null,
null
],
"form_score": 0.0
},
{
"date": "2022-08-13T14:00:00+00:00",
"name": "Arsenal",
"home_away": "home",
"opponent": "Leicester",
"result": "W",
"form": [
null,
null,
null,
null,
"W"
],
"form_score": 0.6
},
... // rest of Arsenal's games
],
"Aston Villa": [
{
"date": "2022-08-06T14:00:00+00:00",
"name": "Aston Villa",
"home_away": "away",
"opponent": "Bournemouth",
"result": "L",
"form": [
null,
null,
null,
null,
null
],
"form_score": 0.0
},
{
"date": "2022-08-13T11:30:00+00:00",
"name": "Aston Villa",
"home_away": "home",
"opponent": "Everton",
"result": "W",
"form": [
null,
null,
null,
null,
"L"
],
"form_score": 0.0
},
... // rest of Aston Villa's games
],
... // rest of the teams' games
}

This JSON is a streamlined version of the data returned by the API. The key addition is, of course, the form array.

With the form calculated for every team, the next step is to use this data as a lookup on itself to bring in the opposition team’s form.
The lookup matches on the opponent’s team name and the date the game was played.

def merge_opponent_form(team_name, team_matches):
    # Iterate through each game for the specified team
    for game in team_matches[team_name]:
        opponent_name = game['opponent']

        # Find the opponent's data
        opponent_data = None
        for opponent_game in team_matches[opponent_name]:
            if opponent_game['date'] == game['date']:
                opponent_data = opponent_game
                break

        # Merge the opponent's form into the game object
        if opponent_data:
            game['opponent_form'] = opponent_data['form_score']
        else:
            game['opponent_form'] = None

    return team_matches[team_name]  # Return the updated game data for the specified team

updated_team_matches = {team_name: merge_opponent_form(team_name, team_matches) for team_name in team_matches}

with open('/Users/tomadams/Documents/FootballApi/Datasets/team_season.json', 'w') as json_file:
    json.dump(updated_team_matches, json_file, indent=2)
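
The same name-and-date lookup could also be done with a dictionary index rather than a scan through the opponent's games. The sketch below is one possible alternative, not the approach used in this article; build_form_index, the team names and the form values are all made up for illustration.

```python
def build_form_index(team_matches):
    """Hypothetical helper: map (team name, match date) -> that team's form_score."""
    return {
        (team, game["date"]): game["form_score"]
        for team, games in team_matches.items()
        for game in games
    }


# Made-up sample with two teams meeting once (illustrative values only).
sample_matches = {
    "Team A": [{"date": "2022-08-13T14:00:00+00:00", "opponent": "Team B", "form_score": 0.6}],
    "Team B": [{"date": "2022-08-13T14:00:00+00:00", "opponent": "Team A", "form_score": 0.2}],
}

index = build_form_index(sample_matches)
for team, games in sample_matches.items():
    for game in games:
        # Missing keys give None, matching the fallback in merge_opponent_form
        game["opponent_form"] = index.get((game["opponent"], game["date"]))

print(sample_matches["Team A"][0]["opponent_form"])
```

For a 380-game season either approach is fast enough, so the choice is mostly a matter of taste.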

Finally, with the opposition’s form stored in the same object as the team in question, we can calculate the average opposition form score. Sorting the averages in ascending order shows which team had the easiest season (top of the list) and which had the hardest (bottom).

import json
from collections import defaultdict

with open('/Users/tomadams/Documents/FootballApi/Datasets/team_season.json', 'r') as f:
    match_data = json.load(f)

opponent_forms = defaultdict(list)

# Iterate through each team's match data
for team, matches in match_data.items():
    for match in matches:
        opponent_form = match['opponent_form']
        # Skip games where no opponent form could be found
        if opponent_form is not None:
            opponent_forms[team].append(opponent_form)

# Calculate the average opponent_form for each team
average_opponent_form = {}
for team, forms in opponent_forms.items():
    average_opponent_form[team] = sum(forms) / len(forms)

# Sort the teams by their average opponent form in ascending order
sorted_teams = sorted(average_opponent_form.items(), key=lambda x: x[1])

# Display the average opponent_form for each team in ascending order
for team, avg_form in sorted_teams:
    print(f"Team: {team} - Average opponent_form: {round(avg_form, 3)}")

So there we have it: by this measure, Nottingham Forest had the easiest season, with their opponents’ form averaging 1.132.
Southampton and Tottenham shared the toughest season, each with an average opposition form of 1.447.
