Football Data Analysis Project (Python) using Docker Image

Iqra Ishtiaq
Published in DiveDeepAI
7 min read · May 13, 2022


Docker

Docker is a platform for packaging applications and moving them between environments. Instead of full virtual machines, it uses containers that share the host's kernel, so developers can run software without provisioning servers or resolving conflicts between dependency versions: build an image once, then push it to production. This lets teams ship, test, and deploy code quickly, because the environment that passed testing is the same one that runs in production.

Docker Container

Docker containers are lightweight and fast to start. Isolation lets you run many applications side by side on one host with little risk of conflicts or side effects from other processes. Each container has its own file system and contains everything needed to run the application, so you do not need to rely on what is currently installed on the host. The same image can be reused on a single client machine or deployed across many servers. Docker provides a platform to manage the lifecycle of your containers:

  • Develop your application and its supporting components using containers.
  • The container becomes the unit for distributing and testing your application.
  • When you’re ready, deploy your application into your production environment, as a container or an orchestrated service. This works the same whether your production environment is a local data center, a cloud provider, or a hybrid of the two.

Docker Architecture

Figure-1: Docker Architecture

Docker Installation

To install Docker, download it from https://docs.docker.com/get-docker/

Data Analysis

Data analytics is the practice of examining raw data in order to draw conclusions from it. Many of its techniques and processes have been automated into algorithms that work over raw data with little human intervention. The results serve both general readers, who see them as charts and graphs that tell a story from the numbers, and professionals who need answers quickly.

Football Data Analysis Python Project with Docker

In this project I took an EPL 20/21 dataset from Kaggle and performed several analyses on it:

  • Pie chart for showing the penalties that have been missed and scored.
  • Nations that most players come from (top 10 and overall).
  • Clubs with the maximum number of players in their squad.
  • Clubs with a minimum number of players in their squad.
  • Total players by age group.
  • Total under 20 players in each club.
  • Under 20 players in a specific club.
  • Players above 30 with their name and club for the specific club.
  • Average age of players in each club.
  • Total assists from each club.
  • Top 50 assists by players along with their Name, Club Name, No of Assists and Matches Played.
  • Total number of goals by a club.
  • Players with the most goals (top 50).
  • Goals per match.
  • Goals with and without assists.
  • Players with the most yellow cards.
  • Players with the most red cards.

Football Data Analysis Project: Dataset

The EPL 2020/2021 data comes from Kaggle: https://www.kaggle.com/datasets/charlesadedotun/epl-data-2021-22

Football Data Analysis Project: Implementation Step by Step

The steps for the Football Data Analysis project are as follows:

1. Importing the libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
#Jupyter magic that renders plots inside the notebook; remove it when running as a plain script
%matplotlib inline

2. Loading Dataset

#Load DataSet
epl_df=pd.read_csv('Your file path')
epl_df
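Every later step depends on specific column names, so it can help to check them right after loading. The sketch below is illustrative: the helper `missing_columns` is hypothetical, the two rows are made up rather than taken from the Kaggle file, and the column list is simply the set of names the code in this article refers to.

```python
import io

import pandas as pd

# Column names referenced by the analysis steps in this article.
EXPECTED_COLUMNS = (
    "Name", "Club", "Nationality", "Age", "Matches", "Mins", "Goals",
    "Assists", "Penalty_Goals", "Penalty_Attempted", "Yellow_Cards", "Red_Cards",
)


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    return [col for col in expected if col not in df.columns]


# Made-up two-row sample standing in for the real CSV file.
sample_csv = io.StringIO(
    "Name,Club,Nationality,Age,Matches,Mins,Goals,Assists,"
    "Penalty_Goals,Penalty_Attempted,Yellow_Cards,Red_Cards\n"
    "Player A,Club X,ENG,19,10,800,2,1,0,0,1,0\n"
    "Player B,Club Y,FRA,31,30,2500,12,4,3,4,5,1\n"
)
sample_df = pd.read_csv(sample_csv)

print(missing_columns(sample_df))                       # nothing missing
print(missing_columns(sample_df.drop(columns="Age")))   # reports "Age"
```

Running the same check on `epl_df` before the later steps would surface a renamed or absent column early instead of as a `KeyError` halfway through the notebook.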

3. Create two new columns

#Create 2 new columns
epl_df["MinPerMatch"]=(epl_df["Mins"]/epl_df["Matches"]).astype(int)
epl_df["GoalsPerMatch"]=(epl_df["Goals"]/epl_df["Matches"]).astype(float)
epl_df.head()
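The per-match columns divide by "Matches", which would fail at the integer cast if any row had zero appearances. Whether the Kaggle file contains such rows is not stated here, so the following is only a defensive sketch on made-up data, not a change the original notebook needs.

```python
import pandas as pd

# Made-up rows; the second player has no appearances.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B"],
    "Matches": [10, 0],
    "Mins": [855, 0],
    "Goals": [4, 0],
})

# Keep only positive match counts; zeros become NaN so the division
# yields NaN instead of inf, and NaN is then mapped back to 0.
matches = toy_df["Matches"].where(toy_df["Matches"] > 0)
toy_df["MinPerMatch"] = (toy_df["Mins"] / matches).fillna(0).astype(int)
toy_df["GoalsPerMatch"] = (toy_df["Goals"] / matches).fillna(0.0)

print(toy_df)
```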

4. Total goals scored in the EPL 20/21 season

#Total Goals overall scored in Epl season 20/21
Total_Goals=epl_df["Goals"].sum()
print(Total_Goals)

5. Total penalty goals scored in the EPL 20/21 season

#Penalty Goals
Total_Penalty_Goals=epl_df["Penalty_Goals"].sum()
print(Total_Penalty_Goals)

6. Total penalties attempted in the EPL 20/21 season

#Penalty Attempts
Total_Penalty_Attempts=epl_df["Penalty_Attempted"].sum()
print(Total_Penalty_Attempts)

7. Pie chart for penalties missed and scored

#Pie chart for penalties missed and scored
plt.figure(figsize=(13,6))
plt_not_score=epl_df["Penalty_Attempted"].sum()-Total_Penalty_Goals
data=[plt_not_score,Total_Penalty_Goals]
labels=['Penalties Missed','Penalties Scored']
color = sns.color_palette("Set2")
#autopct is used to display percentage values
plt.pie(data,labels=labels,colors=color,autopct='%.0f%%')
plt.show()

Output:

Figure-2: Pie Chart: Penalties Missed Vs Penalties Scored
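The percentages in the pie chart can also be computed directly, which is handy when you want the numbers in text rather than in a figure. This is a small sketch on made-up rows; only the column names come from the dataset used above.

```python
import pandas as pd

# Made-up penalty records for three players.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Penalty_Goals": [3, 0, 1],
    "Penalty_Attempted": [4, 1, 1],
})

scored = toy_df["Penalty_Goals"].sum()
attempted = toy_df["Penalty_Attempted"].sum()
missed = attempted - scored

# Share of attempted penalties that were converted, as a percentage.
conversion_pct = 100 * scored / attempted

print(f"scored={scored}, missed={missed}, conversion={conversion_pct:.0f}%")
```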

8. Nations with the most players (top 10)

#Top 10 nations by number of players
nationality=epl_df.groupby("Nationality").size().sort_values(ascending=False)
nationality.head(10).plot(kind="bar",figsize=(12,6),color=sns.color_palette("magma"))

Output:

Figure-3: Bar Chart: Top 10 Player Nationalities

9. Nations with the most players (overall)

#Number of players from every nation
nationality.plot(kind="bar",figsize=(12,6),color=sns.color_palette("flare"))

Output:

Figure-4: Bar Chart: Players Nationality
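If a table is more useful than a bar chart, the same counts can be paired with each nation's share of all players. The sketch below uses `value_counts` on made-up rows as one possible alternative to the `groupby(...).size()` call above; the `summary` name and its column labels are illustrative.

```python
import pandas as pd

# Made-up squad list.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D", "Player E"],
    "Nationality": ["ENG", "ENG", "FRA", "ENG", "BRA"],
})

# Absolute counts, sorted from most to least common.
counts = toy_df["Nationality"].value_counts()

# The same counts expressed as a percentage of all players.
share = toy_df["Nationality"].value_counts(normalize=True).mul(100).round(1)

summary = pd.DataFrame({"Players": counts, "SharePct": share})
print(summary.head(10))
```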

10. Clubs with max players in their squad

#Clubs with max players in their squad
epl_df["Club"].value_counts().nlargest(5).plot(kind='bar',color=sns.color_palette("viridis"))

Output:

Figure-5: Bar Chart: Maximum Players In Each Club

11. Clubs with minimum players in their squad

#Clubs with min players in their squad
epl_df["Club"].value_counts().nsmallest(5).plot(kind='bar',color=sns.color_palette("viridis"))

Output:

Figure-6: Bar Chart: Minimum Players In Each Club

12. Total Players with Age Groups

#Players based on age group
under20=epl_df[epl_df["Age"]<=20]
age20_25=epl_df[(epl_df["Age"]>20) & (epl_df["Age"]<=25)]
age25_30=epl_df[(epl_df["Age"]>25) & (epl_df["Age"]<=30)]
above30=epl_df[epl_df["Age"]>30]
x=np.array([under20["Name"].count(),age20_25["Name"].count(),age25_30["Name"].count(),above30["Name"].count()])
mylabels=["<=20",">20 & <=25",">25 & <=30",">30"]
plt.title("Total Players with Age Groups",fontsize=20)
plt.pie(x,labels=mylabels,autopct="%.1f%%")
plt.show()

Output:

Figure-7: Pie Chart: Total Players with Age Groups
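The four boolean filters above can also be expressed as a single binning step with `pd.cut`, which keeps the bin edges in one place. This is an alternative sketch on made-up ages, not the code that produced the figure; the "AgeGroup" column name is an assumption introduced for illustration.

```python
import numpy as np
import pandas as pd

# Made-up ages chosen to sit on and around the bin edges.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D", "Player E", "Player F"],
    "Age": [18, 20, 23, 25, 28, 33],
})

# Right-inclusive bins: (0, 20], (20, 25], (25, 30], (30, inf)
bins = [0, 20, 25, 30, np.inf]
labels = ["<=20", ">20 & <=25", ">25 & <=30", ">30"]
toy_df["AgeGroup"] = pd.cut(toy_df["Age"], bins=bins, labels=labels)

# Count players per group, reported in the same order as the labels.
group_counts = toy_df["AgeGroup"].value_counts().reindex(labels)
print(group_counts)
```

Because `pd.cut` closes each interval on the right by default, a 20-year-old lands in "<=20" and a 25-year-old in ">20 & <=25", matching the comparisons used in the filters.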

13. Total under 20 players in each club

#Total under 20 players in each club
players_under20=epl_df[epl_df["Age"]<20]
players_under20["Club"].value_counts().plot(kind="bar", color=sns.color_palette("cubehelix"))

Output:

Figure-8: Bar Chart: Total Under 20 Players In Each Club

14. Under 20 players in a specific club

#Players under 20 with their club name.
pl_20=players_under20[players_under20["Club"]=="Manchester United"]
pl_20
pl_20[['Name', 'Club']]

15. Above 30 players in a specific club

#Above 30 players in Man.City
pl_30=above30[above30["Club"]=="Manchester City"]
pl_30
pl_30[['Name', 'Club']]

16. Average age of players in each club

#Age of players in each club (the box plot shows the median and quartiles rather than the mean)
plt.figure(figsize=(12,6))
sns.boxplot(x="Club",y="Age",data=epl_df)
plt.xticks(rotation=90)

Output:

Figure-9: BoxPlot Chart: Average Age Of Players In Each Club
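A box plot draws the median and spread of ages rather than the arithmetic mean. If the mean itself is wanted, a `groupby` gives it directly. The sketch below uses made-up clubs and ages; only the "Club" and "Age" column names come from the dataset.

```python
import pandas as pd

# Made-up squads for two clubs.
toy_df = pd.DataFrame({
    "Club": ["Club X", "Club X", "Club Y", "Club Y", "Club Y"],
    "Age": [20, 30, 24, 26, 31],
})

# Mean age per club, rounded to one decimal and sorted youngest first.
avg_age = toy_df.groupby("Club")["Age"].mean().round(1).sort_values()
print(avg_age)
```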

17. Total Assists from each club

#Total assists from each club
Assist_by_clubs=pd.DataFrame(epl_df.groupby("Club",as_index=False)["Assists"].sum())
sns.set_theme(style="whitegrid",color_codes=True)
#Set the figure size before plotting so that it applies to this chart
plt.figure(figsize=(20,8))
ax=sns.barplot(x="Club",y="Assists",data=Assist_by_clubs.sort_values(by="Assists"),palette="Set2")
ax.set_xlabel("Clubs",fontsize=25)
ax.set_ylabel("Assists",fontsize=25)
plt.xticks(rotation=75)
plt.title("Plot of Clubs vs Total Assists", fontsize=25)

Output:

Figure-10: Bar Chart: Total Assists From Each Club

18. Top 50 assists

#Top 50 assists
top_50_assists=epl_df[["Name","Club","Assists","Matches"]].nlargest(n=50,columns="Assists")
top_50_assists

19. Total goals by a club

#Total goals by a club
Goals_by_clubs=pd.DataFrame(epl_df.groupby("Club",as_index=False)["Goals"].sum())
sns.set_theme(style="whitegrid",color_codes=True)
#Set the figure size before plotting so that it applies to this chart
plt.figure(figsize=(20,8))
ax=sns.barplot(x="Club",y="Goals",data=Goals_by_clubs.sort_values(by="Goals"),palette="rocket")
ax.set_xlabel("Clubs",fontsize=30)
ax.set_ylabel("Goals",fontsize=20)
plt.xticks(rotation=75)
plt.title("Plot of Clubs vs Total Goals", fontsize=20)

Output:

Figure-11: Bar Chart: Total Goals By Club

20. Goals per match

#Goals per match for the top 50 scorers
top_50_goals_per_match=epl_df[["Name","GoalsPerMatch","Matches","Goals"]].nlargest(n=50,columns="Goals")
top_50_goals_per_match
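The table above ranks players by total goals and shows goals per match alongside. To rank by the per-match rate instead, one option is to sort on "GoalsPerMatch" after excluding players with very few appearances, since a single goal in a single match would otherwise top the list. The sketch below uses made-up rows, and the `MIN_MATCHES` threshold is an arbitrary assumption rather than a value from the article.

```python
import pandas as pd

# Made-up scoring records; Player A has a perfect rate from one match.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Matches": [1, 30, 20],
    "Goals": [1, 15, 12],
})
toy_df["GoalsPerMatch"] = toy_df["Goals"] / toy_df["Matches"]

# Hypothetical cut-off: ignore players with fewer appearances than this.
MIN_MATCHES = 10
regulars = toy_df[toy_df["Matches"] >= MIN_MATCHES]

# Up to 50 players ordered by scoring rate, highest first.
ranked = regulars.nlargest(n=50, columns="GoalsPerMatch")
print(ranked[["Name", "GoalsPerMatch", "Matches", "Goals"]])
```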

21. Most goals by players

#Most goals by players
top_50_goals=epl_df[["Name","Club","Goals","Matches"]].nlargest(n=50,columns="Goals")
top_50_goals

22. Goals with and without assists

#Goals with and without assists
plt.figure(figsize=(14,7))
assists=epl_df["Assists"].sum()
data=[Total_Goals-assists,assists]
labels=["Goals without Assists","Goals with Assists"]
color=sns.color_palette("Set1")
plt.pie(data,labels=labels,colors=color,autopct="%.0f%%")
plt.show()

Output:

Figure-12: Pie Chart: Goals With And Without Assist

23. Players with most yellow cards

#Players who received the most yellow cards
epl_yellow_card=epl_df.sort_values(by="Yellow_Cards",ascending=False)[:30]
plt.figure(figsize=(20,6))
plt.title("Players with most Yellow Cards")
c=sns.barplot(x=epl_yellow_card["Name"],y=epl_yellow_card["Yellow_Cards"],label="Players Name",color="yellow")
plt.ylabel("Number of Yellow Cards")
c.set_xticklabels(c.get_xticklabels(),rotation=45)
c

Output:

Figure-13: Bar Chart: Players With Most Yellow Cards

24. Players with most red cards

#Players who received the most red cards
epl_red_card=epl_df.sort_values(by="Red_Cards",ascending=False)[:30]
plt.figure(figsize=(20,6))
plt.title("Players with most Red Cards")
red=sns.barplot(x=epl_red_card["Name"],y=epl_red_card["Red_Cards"],label="Players Name",color="red")
plt.ylabel("Number of Red Cards")
#Rotate this chart's own tick labels (not those of the yellow card chart)
red.set_xticklabels(red.get_xticklabels(),rotation=45)
red

Output:

Figure-14: Bar Chart: Players With Most Red Cards
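Red cards are rare, so a fixed top-30 slice may include players who were never sent off. One option is to filter for at least one red card before sorting. The sketch below runs on made-up rows and is a suggestion rather than part of the original notebook; `sent_off` is an illustrative name.

```python
import pandas as pd

# Made-up disciplinary records.
toy_df = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C", "Player D"],
    "Yellow_Cards": [5, 2, 7, 0],
    "Red_Cards": [0, 1, 0, 2],
})

# Keep only players with at least one red card, most cards first.
sent_off = toy_df[toy_df["Red_Cards"] > 0].sort_values(
    by="Red_Cards", ascending=False
)
print(sent_off[["Name", "Red_Cards"]])
```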

Football Data Analysis Project: Docker Image

The Docker image contains all of the code above. Pull the image and run it as shown in the steps below.

1. Visit the image's page on Docker Hub.

2. Open a command prompt and run the Docker image from Docker Hub to start the Football Project. Docker pulls the image automatically if it is not already on your machine.

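docker run -it --rm --name ds -p 8888:8888 <dockerhub-user>/football_analysis_epl_20_21:latest

The first 8888 is the port on your machine; change it if that port is already in use. Replace <dockerhub-user> with the Docker Hub account that hosts the image.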

3. Wait until the image is downloaded, then open the link printed in the command prompt to run the football project, e.g.

Figure-15: Jupyter Notebook Link
