Creating an ML model for predicting English Premiership results (or using Machine Learning to figure out if Arsenal will return to the Champions League in 2020–2021)

Icaro · Published in Analytics Vidhya · 18 min read · Jul 5, 2020

Hi everyone! This is my first article on Medium. I’ll be writing about my journey in learning Machine Learning, so let’s get started!

I grew up in Mexico so there is only one sport that I follow. Actually, it’s more of a religion than a sport. Of course, I’m talking about soccer. As you might have already deduced from the picture above, my favorite team is Arsenal (we run North London!) so, when I was looking for some data to play with, I was hoping to find something about soccer, especially about Arsenal. Thankfully, I found this public Kaggle dataset (thanks Alvin!!) that has the results of every Premiership match from 2000 to March 2020.

What we are going to do is take this dataset and build a neural network that will predict the results of future Premiership matches. Even though Liverpool has already won the championship, Arsenal still has a chance of qualifying for the Champions League, so we need to know if that is likely to happen. Specifically, what we are going to do is:

  • Create a Jupyter Notebook where we will do all our calculations
  • Download Alvin’s dataset
  • Visualize a few dimensions of the dataset
  • Preprocess the data so it can be fed to a neural network
  • Break up the data into training and test sets
  • Create a neural network
  • Train the neural network with our training data
  • Test how good our neural network is with our test data
  • Deploy our model as an API using TensorFlow Serving
  • Ask our API to predict all the results from the remaining Arsenal matches to see if we will make it back to the Champions League

Creating a Jupyter Notebook

There are many environments where we can create our notebook, but we are going to use Google Colab to host ours. Just go to that URL and you will immediately be taken to your first notebook. If it’s your first time you should see something like the picture below:

Google Colab Welcome Screen

Now we can start adding data and code to our notebook.

Downloading our data

We could have created our notebook in Kaggle’s environment but I already used Google Colab for some Coursera ML classes that I took, so I went with what I was familiar with.

There are many ways we could have downloaded Alvin’s dataset. What I did was download it to my Google Drive, and from there I was able to load the dataset into the Colab notebook. So the first thing we need to do is import all the libraries we need and mount our drive so its file system is accessible to our Colab notebook. Your Colab cell should look like this one:

import pandas as pd
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import os
import codecs, json
import tempfile
import requests
import base64
from google.colab import drive

drive.mount("/content/drive")

Once you execute the cell, Google will take you through an authorization cycle that involves giving your Colab notebook access to Google Drive. Once that happens the drive is mounted. We can run an ls command to make sure the drive is mounted:

!ls /content/drive

and your response should be something like the image below.

Cool! Our drive is mounted. Our files are under ‘/content/drive/My Drive/’ so from here I can go look for the dataset we want to use. I called it EPLresults.csv, so we will load this CSV file into a pandas dataframe and display the first few rows from the file. The code below accomplishes these tasks:

file_path = "/content/drive/My Drive/EPLresults.csv"
my_df = pd.read_csv(file_path)
print('The shape of our dataset is ', my_df.shape)
my_df.head()

The result you should get is shown in the image below:

Showing the first few rows from the English Premiership Games Dataset

As we can see, our dataset has 7,386 rows (that is a lot of soccer games!) with 22 columns per row. Some of the columns are:

  • Date the game was played
  • Home Team
  • Away Team
  • Full Time Result (FTR)

and so on. You can see a description of all the columns here. OK, so we have all the columns but we need to check what type they are. Why, you ask? Well, one reason is because Neural Networks do not accept strings as inputs. Let’s take a look at the column types with the command below:

my_df.info()

Your output should look like the image below:

Premiership Games Dataset’s Column Types

OK, so we have 16 int columns and 6 object columns. Remember this fact, as we will need to modify the object columns later so they can be used as inputs for our neural network.

Visualizing our data

It is recommended that we do some visualizations to familiarize ourselves with our dataset. We’ll do a couple for this dataset.

The first one is the frequency of the Full Time Result (FTR). We want to see what type of result is more prevalent: home team wins, away team wins or draw. We will use matplotlib to draw a histogram of the Full Time Result. The code is shown below:

fig, chart = plt.subplots()
data = my_df['FTR'].value_counts()
points = data.index
frequency = data.values
chart.bar(points, frequency)
chart.set_title('Frequency of different results in the English Premiership (2001-2020)')
chart.set_xlabel('Result Type')
chart.set_ylabel('Frequency')

Our resulting histogram is shown below:

We can see that home wins are way more prevalent than away wins, which makes sense: the crowd is always an important factor in a soccer game. Since every team plays the same number of home games per season, the teams with the most home games in the dataset are the ones that have spent the most seasons in the Premiership. We can then reason that those teams have benefited the most from home advantage and are therefore more likely to win the title and/or qualify for the Champions League.

Using code similar to the above, let’s plot the number of home games for each Premiership team.
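A sketch of what that code could look like is below, mirroring the FTR plot above; the figure size and the label rotation are my additions to keep the team names legible:

# Count and plot how many home games each team has played since 2000
fig, chart = plt.subplots(figsize=(12, 6))
data = my_df['HomeTeam'].value_counts()
chart.bar(data.index, data.values)
chart.set_title('Home games per team in the English Premiership (2000-2020)')
chart.set_xlabel('Team')
chart.set_ylabel('Number of home games')
plt.xticks(rotation=90)  # vertical labels so all team names fit
plt.show()

The resulting histogram is shown below: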

It’s kind of hard to read, but we can see six teams with the most home games: Chelsea, Everton, Manchester United, Liverpool, Tottenham & Arsenal. Following our earlier reasoning, these six teams are the most likely to finish at the top of the table and, therefore, the most likely to be in the Champions League.

We can visualize other dimensions but let’s stop here for now.

Preprocessing the data

Now that we have the data, we need to decide which columns we are going to use as inputs to our neural network. Before doing that, we will make a copy of our dataframe and make all our column changes on that copy. Why? We want to keep the original dataframe in case we screw up the column transformations in the new dataframe.

OK, so what columns do we keep and what columns do we get rid of, if any?

We definitely need the teams that play the games and statistics like number of fouls, number of yellow cards, number of red cards and half time score. All those columns look like features that will help us predict scores of future games. What about a couple of other columns?

Let’s start with the referee. A referee has a huge impact on the game, but this dataset contains games from the past 20 seasons and some of those referees have already retired, so I’m not sure the referee column is going to help us with our predictions. We will drop it for this reason.

The code below creates our dataframe copy and drops the referee column. We also print the first few rows to confirm the referee column is gone.

epl_df_objects = my_df.copy()
epl_df_objects.drop('Referee', axis=1, inplace=True)
epl_df_objects.head()

You should see a result like this one:

Our Dataframe columns after dropping the Referee column

Great, we can see we now have 21 columns (all the columns we originally had minus the referee column we just dropped).

Before moving on, let’s check if our dataset has any null values. If so, we will need to fix them. The command below will show us how many null values exist in our dataset:

print(epl_df_objects.isnull().values.sum())

You should get 0 as a result. This is great! Alvin has saved us from a tedious round of fixing the dataset. Alvin’s da man!!!

We can now move on to other columns. How about the date the game was played on? I’m going to say we don’t care too much about the actual day but more about the day of the week. Premiership games are usually played on Saturday, Sunday and Monday. If you play on Monday on one match day, you have a short week to recover for the next match day. We can then hypothesize that teams that play on Monday are at a disadvantage for matches the following week and are more likely to lose or tie that game. However, we can also make the opposite argument: teams that play on Monday had a longer rest period than the teams that played on the weekend, so they might have a better chance of winning the Monday game. In any case, it does seem like the day of the week is a factor, so we’ll include it and convert our date column to a day-of-the-week column. The code below accomplishes that:

epl_df_objects["matchDate"] = pd.to_datetime(epl_df_objects["Date"], infer_datetime_format=True)
epl_df_objects['matchDay'] = epl_df_objects['matchDate'].dt.day_name()
print(epl_df_objects["matchDate"][0])
print(epl_df_objects['matchDay'][149])
epl_df_objects.drop('Date', axis=1, inplace=True)
epl_df_objects.drop('matchDate', axis=1, inplace=True)
epl_df_objects.head()

What did we do? First, we created a new dataframe column, matchDate, which converts the date string column (remember it had an ‘object’ type when we printed all the column types?) to a python datetime object. From there we extract the day of the week into a new dataframe column that we call ‘matchDay’. We then print a matchDay instance, row 149 in this case, just to make sure we have what we need. The 2 drop commands remove the columns we don’t need anymore. Finally, we print a few rows of our dataframe to confirm it is the correct one. You should see output like the one below:

Our dataframe once we converted the date to a day of the week column

Cool for school!! We are in business. Now we need to look at the rest of the object/string columns and decide what we are going to do with them. We definitely want these columns as inputs to our neural network since they include the home team and the away team, so we need to convert them to numbers. One way would be to just map them to integers, ending up with something like Arsenal = 1, Aston Villa = 2 and so on. This is OK but not totally OK. Suppose we have 50 teams in total, Arsenal = 1 and West Ham is the last one and is therefore 50. Does that mean our neural network will give West Ham 50 times more weight than Arsenal? We don’t want that. To fix this we are going to use the get_dummies dataframe command, which will create a dummy column for each team and home/away status so that no team carries more numeric weight than another. We will end up with columns like the following:

  • HomeTeam_Arsenal
  • HomeTeam_Aston Villa
  • AwayTeam_Newcastle

and so on. Such columns have binary values: 0 for every team except the home and away teams actually playing in that game, which get a 1. We have now eliminated the feature weight problem. In Machine Learning-speak, we have converted categorical features (features that have a finite set of values with no ordering between them) into binary/indicator variables. The code below accomplishes this for the HomeTeam, AwayTeam, HTR and matchDay features.

epl_df_objects = pd.get_dummies(epl_df_objects, columns=['HomeTeam'], prefix = ['HomeTeam'])
epl_df_objects = pd.get_dummies(epl_df_objects, columns=['AwayTeam'], prefix = ['AwayTeam'])
epl_df_objects = pd.get_dummies(epl_df_objects, columns=['HTR'], prefix = ['HTR'])
epl_df_objects = pd.get_dummies(epl_df_objects, columns=['matchDay'], prefix = ['matchDay'])
epl_df_objects.head()

We now have many more columns. Below we show just a few of these new columns.

Some of our new binary variables

OK, now we have all the features we are going to input to the Neural Network. How about the label? Glad you asked. The label has letters, so we need to convert those letters to numbers. To accomplish that we use sklearn’s LabelEncoder; we then print the unique values to confirm we have numbers.

We print 3 full time results from different games so we can know which number represents which outcome (home team won, away team won or tie). Finally, we assign all the features to an intermediate variable. The code below accomplishes all these actions.

from sklearn.preprocessing import LabelEncoder

label_encoder = LabelEncoder()
epl_df_objects['FTR'] = label_encoder.fit_transform(epl_df_objects['FTR'])
print('Unique values for our label are: ', epl_df_objects['FTR'].unique())
print('if the home team wins the label is ', epl_df_objects['FTR'][0])
print('if the away team wins the label is ', epl_df_objects['FTR'][2])
print('if there is a tie the label is ', epl_df_objects['FTR'][3])
label = epl_df_objects['FTR']
print('the result for the match in row 149 is ', label[149])
print(epl_df_objects.iloc[:,3:113])
features = epl_df_objects.iloc[:,3:113]

We should get an output like the one below:

Output after converting our label to numbers

Great, we can see we have three unique values: 0 means the away team won, 1 means there was a tie and 2 means the home team won. We can also see that we now have 110 features to input to our neural network; that’s quite a change from the 21 columns we had at the beginning of this exercise.

Phew! And with that we have completed the preprocessing of our data. As you can see, it took us quite some time.

Creating Training and Test Sets

How are we going to break our features into training and test sets? Well, one suggestion is to have 67% of the original data set as the training set and the rest as the test set. We’ll go with that recommendation and we will use sklearn’s train_test_split to do the split.

The code below shows the test split. After the split we print the shapes of the resulting training and test sets to see how many rows we have on each one.

from sklearn.model_selection import train_test_split

y = np.ravel(label)
X = features
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, shuffle=False)
print("The shape of X_train is " + str(X_train.shape))
print("The size of y_train is " + str(y_train.shape))
print("The size of X_test set is " + str(X_test.shape))
print("The size of y_test is " + str(y_test.shape))

Your output should be similar to the one below:

Training and Test Sets Shapes

Next, we perform one last data processing exercise: we transform our label into a one-hot encoded variable using keras’ to_categorical command, as shown in the code snippet below. We then print the shape of the one-hot encoded y sets and print one row of the training labels to confirm the label is one-hot encoded.

# one-hot encoding y_train and y_test
y_train = tf.keras.utils.to_categorical(y_train, num_classes=3)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=3)
print("The size of y_train is " + str(y_train.shape))
print("The size of y_test is " + str(y_test.shape))
print(y_train[0])

Our output should be something like this:

Shape of the label sets and an example of one of our one-hot encoded labels

Now we can move on to our neural network.

Creating the Neural Network

We create our neural network as follows:

  • One input layer taking 110 features, which corresponds to the number of features we want to feed to our network
  • One intermediate layer
  • One output layer with activation softmax and 3 outputs, which correspond to our 3 possible outcomes: home team win, away team win or tie. We use softmax as the activation because we have more than 2 possible outcomes. (If we only had 2 outcomes we would use sigmoid as the activation function.)

The code below creates the neural network, prints the model summary and compiles it:

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(330, input_dim=110, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(3, activation='softmax')])
model.summary()
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

Our output should be similar to the one below:

Model Summary for our Neural Network model

Training the Neural Network with training data

We can now train the neural network with the training dataset. We accomplish this with the code below:

history = model.fit(X_train, y_train, epochs=65)

Below is the output from the last few epochs:

What do we notice? Well, we can see our loss going down and our accuracy going up, so this is good. Let’s plot the loss and accuracy to get a better picture of how they changed as we trained the model. We accomplish this with the code below:

#accuracy history
plt.plot(history.history['accuracy'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()
#loss history
plt.plot(history.history['loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train'], loc='upper left')
plt.show()

The resulting two charts are shown below. We can see our model loss going down and the model accuracy going up as we train the model. Also notice how at the end of both charts the curves become unstable, bouncing up and down. This is an indication that we probably don’t need the last epochs, so for future training sessions we can reduce the epoch count.

Loss and Accuracy for our trained model
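Instead of hand-tuning the epoch count, we could also let Keras stop training on its own with the EarlyStopping callback. A minimal sketch (it monitors the training loss here, since we didn’t pass a validation set to fit):

# Stop training once the loss has not improved for 5 straight epochs
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='loss',              # no validation set was passed, so watch training loss
    patience=5,                  # tolerate 5 epochs without improvement
    restore_best_weights=True)   # roll back to the best weights seen

history = model.fit(X_train, y_train, epochs=65, callbacks=[early_stop])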

Testing our neural network with test data

So are our results good? Yes, but are they too good? One of the dangers we face is that we are overfitting our model to the training data. This means that our model does great with the training dataset, but will not do well with other data.

How do we find out? We will test our model with the test data and see what kind of results we get. The code to test the model with the test set is below:

score = model.evaluate(X_test, y_test, verbose=1)
print("Test Score:", score[0])
print("Test Accuracy:", score[1])

And we get the results below. Just as we suspected, we are overfitting: our training accuracy and loss do not translate well to the test set.

Our model’s loss and accuracy when evaluated with test data

So here’s a fork in the road. We can spend more time changing our model to reduce overfitting, or we can continue with deploying the model as an API and see if Arsenal is going to make it to the Champions League. We’ll continue with the deployment and work on optimizing our model in a future post.
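As a preview of that future post, a common first step against overfitting is inserting Dropout layers between the Dense layers. The sketch below shows the idea; the dropout rates are arbitrary starting points, not tuned values:

# A variant of our model with dropout between the dense layers
model_v2 = tf.keras.models.Sequential([
    tf.keras.layers.Dense(330, input_dim=110, activation='relu'),
    tf.keras.layers.Dropout(0.5),   # randomly zero half the activations during training
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(3, activation='softmax')])
model_v2.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])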

Let’s make a prediction using our model and see what we get. We will input the Arsenal vs Norwich game that was played on July 1, 2020. The code below creates the data to send to the model, receives the prediction and prints it.

Xnew = np.array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]])
print(Xnew.shape)
# make a prediction
ynew = np.argmax(model.predict(Xnew), axis=-1)
# show the inputs and predicted outputs
print("X = %s " % Xnew)
print("Prediction = %s" % ynew[0])

The output is shown below. Our model is predicting the home team is going to win and it just so happens Arsenal won that game. Even a model that has 59% accuracy is right some of the time :-)

Sending a prediction to our model and printing the output

Now we can move on to deploy our model as an API.

Deploying our model as an API using TensorFlow Serving

The first thing we need to do is save our model files with the code shown below.

MODEL_DIR = tempfile.gettempdir()
version = 1
export_path = os.path.join(MODEL_DIR, str(version))
if os.path.isdir(export_path):
    print('\nAlready saved a model, cleaning up\n')
    !rm -r {export_path}

model.save(export_path, save_format="tf")
print('\nexport_path = {}'.format(export_path))
!ls -l {export_path}

Our model is saved to a temp directory as shown below:

Our model files saved in a temp directory

Now we can add the tensorflow-model-server package source and refresh apt:

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update

and install the server:

!apt-get install tensorflow-model-server

You should see output like the one below:

Output after installing the model serving libraries

Cool, now we can run our API server:

os.environ["MODEL_DIR"] = MODEL_DIR

and, in a separate cell (the %%bash magic must start its own cell):

%%bash --bg
nohup tensorflow_model_server \
  --rest_api_port=8501 \
  --model_name=epl_predictions \
  --model_base_path="${MODEL_DIR}" >server.log 2>&1

Once the server is running you should see the following message:

Asking the API to predict the results of the remaining Arsenal games

Cool, our server is now running. We are going to send the 8 remaining Arsenal games to the API server and see what our model predicts.

First, we create an object with all 110 columns for each game. We then convert it to a list and serialize it as a JSON object. The code below accomplishes that:

entry = np.array([
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0]])
print(type(entry))
print(entry.shape)
the_list = entry.tolist()
print(type(the_list))
data = json.dumps({"signature_name": "serving_default", "instances": the_list})
print('Data: {} ... {}'.format(data[:50], data[len(data)-52:]))
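Hand-writing 110-element vectors like these is tedious and error-prone. A helper such as the hypothetical make_entry below (not in the original notebook) could build each row from readable match attributes, assuming the features dataframe from earlier is still in scope:

def make_entry(columns, home, away, htr, day):
    """Build one input row from readable match attributes.

    columns: the feature column names, e.g. features.columns
    home, away: team names as they appear in the dummy columns
    htr: half-time result, 'H', 'D' or 'A'
    day: day of the week, e.g. 'Saturday'
    All other (numeric) columns default to 0.
    """
    row = dict.fromkeys(columns, 0)
    row['HomeTeam_' + home] = 1
    row['AwayTeam_' + away] = 1
    row['HTR_' + htr] = 1
    row['matchDay_' + day] = 1
    return [row[c] for c in columns]

# Hypothetical usage:
# entry = np.array([make_entry(features.columns, 'Arsenal', 'Liverpool', 'D', 'Wednesday')])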

OK, now that we have the request in the right format, we can send it to our API with the following code:

!pip install -q requests

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/epl_predictions:predict', data=data, headers=headers)
response = json.loads(json_response.text)
predictions = response['predictions']
print(json_response)
print(json_response.text)
print(response['predictions'])
my_predictions = np.array(predictions)
print("The predictions are: ",np.argmax(my_predictions,axis=1))

What we do above is send the request with the requests.post command and receive the response in json_response. We then extract the predictions object and print it so we can see the probabilities for each of our 3 possible outcomes (0 if the away team wins, 1 if the teams tie, and 2 if the home team wins). Finally, we extract the array index with the max probability, which tells us whether our prediction is 0, 1 or 2. We got the output below:

Our model’s predictions for all the remaining Arsenal games

Our model is predicting the home team will win all games.

Do we believe that? Kind of. As we saw from our first histogram at the top of this page, a home win is the most common outcome, and we speculated that this is because the home team has the advantage of the crowd. However, during this Covid-19 era there is no crowd anymore. Our model doesn’t know that, though, so that’s a good feature to add in a future iteration of the model.
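If we wanted to encode that, a hypothetical first cut would be a binary column flagging matches played in front of a crowd. The column name and the restart cutoff date below are my assumptions, not part of the original notebook:

# Hypothetical feature: Premiership matches after the June 2020 restart had no fans
match_dates = pd.to_datetime(my_df['Date'], infer_datetime_format=True)
epl_df_objects['hasCrowd'] = (match_dates < '2020-06-17').astype(int)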

For now let’s assume we believe it. If this is the case then we have the following record for Arsenal:

5 wins (it is playing as the home team in 5 of the games we sent) and 3 losses (it is playing as the away team in the other 3 games we sent)
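We can turn those predictions into a projected points total with a quick sketch. The is_home mask below marks which of the 8 fixtures, in the order we sent them, have Arsenal as the home side:

# Convert predicted classes into projected points for Arsenal
# (0 = away win, 1 = tie, 2 = home win, from our label encoding)
predicted = np.argmax(my_predictions, axis=1)
is_home = np.array([1, 0, 1, 0, 1, 1, 0, 1])  # 1 where Arsenal plays at home

points = 0
for pred, home in zip(predicted, is_home):
    if pred == 1:                     # a tie is worth 1 point either way
        points += 1
    elif (pred == 2) == bool(home):   # the predicted winner is Arsenal's side
        points += 3

print('Projected points from these fixtures:', points)  # 15 if all home wins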

Before playing Norwich (the first of the 8 games we sent to our API), Arsenal had 43 points. 43 + 15 (5 games won as the home team) = 58, so according to our model the Gunners will end the season with 58 points. Is that enough for the Champions League? Only the top 4 teams qualify, so Arsenal has to finish in 4th place or better. Is 58 points enough for 4th place? Well, let’s take a look at where things stand after match day 30 (the Premiership has 38 match days):

  • Liverpool has way more than 58 points so there are only 3 spots left.
  • Man City has 63 points so they are also in the Champions League.
  • Leicester City has 54 points, so they only need 5 more points to end up above our team. What are the chances of Leicester City not getting to 59 points? Slim: they only need one win and two ties, which should not be very hard since Leicester has been playing well this season. So we will consider them in. There’s now only one spot left.
  • Chelsea has 51 points so they only need 2 wins and a couple of ties to finish above Arsenal. That’s harder but doable.
  • To boot, Manchester United, Wolves and Sheffield United are also above Arsenal and playing well. My guess is one of these 4 teams (Chelsea, Man U, Wolves and Sheffield United) takes the last spot and we are left with the consolation prize: the Europa League.

Conclusion

We went through quite a lot in this article.

For my fellow Gooners, the TL;DR is that our team will probably be in the Europa League (again!) next season. Disappointing, but give Mikel time; he just got here (at the end of last year) and I think he’s the right guy for Arsenal right now. He just needs time.

For my fellow ML enthusiasts, we went through a full cycle of ML model deployment: downloading a dataset, visualizing a few of its dimensions, preprocessing the data, creating training and test sets, creating a neural network, training it, testing it with test data (where we discovered we are overfitting to our training data), saving the model to a file system, deploying the model as an API using TensorFlow Serving, creating an object to send to the API, receiving all the predictions and interpreting them. Whew, that’s a lot!

You can take a look at this Jupyter notebook on GitHub.

In future posts we will look at:

  • How we can try to reduce the overfitting we observed.
  • Whether different ML models do better than our current neural network.

Thanks for reading!! Please don’t forget to clap below if you feel inclined to do so.
