PyTorch Neural Networks to predict match results in soccer championships — Part II

André Luiz França Batista
Jun 10, 2019 · 8 min read


In Part I we discussed how to collect and prepare our dataset for use in the PyTorch models.

In this second part of the tutorial we will focus on creating and setting up the neural network models. Then we will train and test the models, and after that we will check the outcomes.

As in Part I, we are still using Python as the programming language. For the neural network models (training and testing) we will use the PyTorch platform.

> What is PyTorch?

According to the official website, PyTorch is a Python-based scientific computing package targeted at two sets of audiences:

  • A replacement for NumPy to use the power of GPUs;
  • A deep learning research platform that provides maximum flexibility and speed.

One of the greatest things in PyTorch is the Tensor. Tensors are similar to NumPy’s ndarrays, with the addition that Tensors can also be used on a GPU to accelerate computing.

Example of how to construct a randomly initialized matrix:

import torch
x = torch.rand(5, 3)
print(x)

Out:

tensor([[0.1995, 0.7574, 0.3930],
        [0.5963, 0.7576, 0.7534],
        [0.9572, 0.2387, 0.2217],
        [0.8329, 0.5595, 0.1679],
        [0.0331, 0.8475, 0.8253]])

> Installing PyTorch

Installing PyTorch is easy. Just follow the instructions in the Get Started section of PyTorch’s official website. It runs on Windows, Mac and Linux.

> First steps

Let’s recap what an artificial neural network (ANN) is. According to Wikipedia:

Artificial Neural Networks are computing systems that are inspired by, but not necessarily identical to, the biological neural networks that constitute animal brains. Such systems “learn” to perform tasks by considering examples, generally without being programmed with any task-specific rules.

An ANN is based on a collection of connected units or nodes called artificial neurons, which loosely model the neurons in a biological brain. Each connection, like the synapses in a biological brain, can transmit a signal from one artificial neuron to another. An artificial neuron that receives a signal can process it and then signal additional artificial neurons connected to it.

The image above shows an example of a neural network diagram.

Source: https://www.learnopencv.com/understanding-feedforward-neural-networks/

> Build a feedforward neural network

Now let’s look at how to build a simple feedforward network model. Feedforward models have hidden layers between the input and the output layers. After every hidden layer, an activation function is applied to introduce non-linearity. Below is an example feedforward model.

import torch

class Net(torch.nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Net, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.fc1 = torch.nn.Linear(self.input_size, self.hidden_size)  # input -> hidden
        self.relu = torch.nn.ReLU()  # activation for the hidden layer
        self.fc2 = torch.nn.Linear(self.hidden_size, 1)  # hidden -> single output
        self.sigmoid = torch.nn.Sigmoid()  # squashes the output into (0, 1)

    def forward(self, x):
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        output = self.fc2(relu)
        output = self.sigmoid(output)
        return output

As you can see in this example, we created a model with just one hidden layer, but you can add as many hidden layers as you want; a hypothetical deeper variant is sketched below. When we have multiple hidden layers, the model is also called a deep/multilayer feedforward model or multilayer perceptron (MLP). You can find more information about torch.nn.Module here.
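
If you want to experiment with a deeper structure, here is a minimal sketch of a hypothetical two-hidden-layer variant (the DeepNet name and the shared hidden size are illustrative choices, not values used in this tutorial):

import torch

class DeepNet(torch.nn.Module):
    # hypothetical variant of Net with two hidden layers
    def __init__(self, input_size, hidden_size):
        super(DeepNet, self).__init__()
        self.fc1 = torch.nn.Linear(input_size, hidden_size)
        self.fc2 = torch.nn.Linear(hidden_size, hidden_size)  # extra hidden layer
        self.fc3 = torch.nn.Linear(hidden_size, 1)
        self.relu = torch.nn.ReLU()
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))  # activation after every hidden layer
        return self.sigmoid(self.fc3(x))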

After the hidden layer, we use ReLU as the activation before the information is sent to the output layer. This introduces non-linearity to the linear output of the hidden layer, as mentioned earlier. What ReLU does is convert any negative value to 0 while leaving the other values unchanged. For example, if the input set is [-2, 0, 3, -4, 7], the function returns [0, 0, 3, 0, 7].
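
You can verify this behavior in one line (a quick sanity check, not part of the tutorial’s pipeline):

import torch

relu = torch.nn.ReLU()
print(relu(torch.tensor([-2., 0., 3., -4., 7.])))  # tensor([0., 0., 3., 0., 7.])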

We use Sigmoid as the output activation function. As we mentioned in Part I, the prediction we want to make is a binary classification task, meaning we have two categories to predict from. Sigmoid is a good function to use because it outputs the probability (ranging between 0 and 1) of the target being label 1. The choice of activation function depends on your task.
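
For instance, sigmoid squashes any real-valued output into the (0, 1) range (again, just a quick sanity check):

import torch

sigmoid = torch.nn.Sigmoid()
print(sigmoid(torch.tensor([-3., 0., 3.])))  # tensor([0.0474, 0.5000, 0.9526])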

Now, let’s use the model we’ve built to create a predictor for soccer matches.

> Training the model

A typical training procedure for a neural network is as follows:

  • Define the neural network that has some learnable parameters (or weights).
  • Iterate over a dataset of inputs.
  • Process input through the network.
  • Compute the loss (how far is the output from being correct).
  • Propagate gradients back into the network’s parameters.
  • Update the weights of the network, typically using a simple update rule: weight = weight - learning_rate * gradient (a minimal sketch of this step follows the list).
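
To make that update rule concrete, here is a minimal sketch of what a single SGD step does under the hood. PyTorch’s optimizer handles this for us, so this is for illustration only; the toy parameter and gradient values below are made up.

import torch

# a toy parameter with a pretend gradient from a backward pass
weight = torch.tensor([0.5], requires_grad=True)
weight.grad = torch.tensor([0.2])

learning_rate = 0.9
with torch.no_grad():  # update the weight without tracking this operation
    weight -= learning_rate * weight.grad  # weight = weight - learning_rate * gradient
print(weight)  # tensor([0.3200], requires_grad=True)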

Data

Our data has been ready to go since Part I. We have four dataframes (training_input, training_output, test_input, test_output). To use these sets in PyTorch, we need to convert the dataframes into Tensors. It’s easy to do with torch.FloatTensor().

Input:

#convert to tensors
training_input = torch.FloatTensor(training_input.values)
training_output = torch.FloatTensor(training_output.values)
test_input = torch.FloatTensor(test_input.values)
test_output = torch.FloatTensor(test_output.values)

Setup the Model, Criterion, Optimizer

Let’s define the model with the input dimension equal to the number of features we have, and a hidden dimension of 30. For the loss function (criterion) we use BCELoss() (Binary Cross-Entropy Loss), since our task is to classify binary results. The optimizer we use is SGD (Stochastic Gradient Descent) with learning rate 0.9. Alternatively, you can use the SGD optimizer with the momentum parameter as well. It’s up to you.

Input:

input_size = training_input.size()[1]  # number of features selected
hidden_size = 30  # number of nodes/neurons in the hidden layer
model = Net(input_size, hidden_size)  # create the model
criterion = torch.nn.BCELoss()  # works for binary classification
# pick ONE of the two optimizer definitions below;
# if both lines run, the second overwrites the first
# without momentum parameter
optimizer = torch.optim.SGD(model.parameters(), lr = 0.9)
# with momentum parameter
optimizer = torch.optim.SGD(model.parameters(), lr = 0.9, momentum=0.2)

Training the model

To check how the model improves, we can measure the test loss before training and compare it with the test loss after training.

Input:

model.eval()
y_pred = model(test_input)
before_train = criterion(y_pred.squeeze(), test_output)
print('Test loss before training' , before_train.item())

The model.eval() call here sets the PyTorch module to evaluation mode, which changes the behavior of layers such as dropout and batch normalization. No new weights are learned here, since we never call the optimizer; we just want to see the loss before training. To train, we switch the module back to training mode.
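
As a side note, evaluation mode does not by itself disable gradient tracking; if you also want to skip the gradient bookkeeping during evaluation, you can wrap the forward pass in torch.no_grad(). This is an optional refinement, not something the tutorial requires:

model.eval()
with torch.no_grad():  # no gradients are tracked; we are only evaluating
    y_pred = model(test_input)
    before_train = criterion(y_pred.squeeze(), test_output)
print('Test loss before training', before_train.item())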

Input:

model.train()
epochs = 5000
errors = []
for epoch in range(epochs):
    optimizer.zero_grad()
    # Forward pass
    y_pred = model(training_input)
    # Compute Loss
    loss = criterion(y_pred.squeeze(), training_output)
    errors.append(loss.item())
    print('Epoch {}: train loss: {}'.format(epoch, loss.item()))
    # Backward pass
    loss.backward()
    optimizer.step()

Let’s start training. First we switch the module to training mode with model.train() so that new weights can be learned after every epoch. Then we set the number of training epochs to 5000 (you can change this as you wish). optimizer.zero_grad() sets the gradients to zero before we start backpropagation. This is a necessary step because PyTorch accumulates the gradients from the backward passes of previous epochs. We also keep an array of errors to store the loss at each epoch, so we can plot a chart at the end of the process.
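
To see why zeroing the gradients matters, here is a tiny sketch, independent of our model, showing gradients accumulating across backward passes:

import torch

w = torch.tensor([1.0], requires_grad=True)
(w * 2).backward()
print(w.grad)  # tensor([2.]): the gradient of 2*w with respect to w
(w * 2).backward()
print(w.grad)  # tensor([4.]): accumulated, not replaced
w.grad.zero_()  # this is what optimizer.zero_grad() does for every parameter
print(w.grad)  # tensor([0.])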

After the forward pass and the loss computation, we perform the backward pass by calling loss.backward(), which computes the gradients. Then optimizer.step() updates the weights accordingly. The training process may take a while, depending on how many epochs of training you set.

> Testing the model

We are almost there! Now that the training is done, let’s see how the test loss changed. Again, we switch the module back to evaluation mode and check the test loss, as in the example below.

Input:

model.eval()
y_pred = model(test_input)
after_train = criterion(y_pred.squeeze(), test_output)
print('Test loss after Training' , after_train.item())

You must be wondering whether our model has generated good or bad predictions. Let’s check it out by plotting some charts with the matplotlib library.

import matplotlib.pyplot as plt
import numpy as np

def plotcharts(errors):
    errors = np.array(errors)
    plt.figure(figsize=(12, 5))
    graf02 = plt.subplot(1, 2, 1)  # nrows, ncols, index
    graf02.set_title('Errors')
    plt.plot(errors, '-')
    plt.xlabel('Epochs')
    graf03 = plt.subplot(1, 2, 2)
    graf03.set_title('Tests')
    a = plt.plot(test_output.numpy(), 'yo', label='Real')
    plt.setp(a, markersize=10)
    a = plt.plot(y_pred.detach().numpy(), 'b+', label='Predicted')
    plt.setp(a, markersize=10)
    plt.legend(loc=7)
    plt.show()

plotcharts(errors)

Oh, the charts! One of the greatest ways to visualize data and results! Let’s check them out!

Those charts are very simple and are only meant to facilitate the visualization of the results. We do not intend to use the charts to prove or disprove the model’s efficiency.

The first chart shows the error at each training epoch. Notice how it decreases over the training process. The second chart shows the real values (Win at value 1, Draw-Defeat at value 0) as yellow circles, and the predicted values as blue crosses. The closer a blue cross is to a yellow circle, the more accurate the prediction. If the blue cross is inside the yellow circle, or very close to it, the model predicted the result correctly. On the other hand, if the blue cross is far from the yellow circle, the model made a bad prediction for that match.

In this particular case, our model made 15 good predictions and 5 not so good ones, hitting 75% (15/20) of the match results.
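
If you prefer to compute that number instead of reading it off the chart, a simple way (assuming the usual 0.5 decision threshold on the sigmoid output) is to round the predicted probabilities and compare them with the real labels:

# round the probabilities at the 0.5 threshold and compare with the real labels
predicted_classes = (y_pred.squeeze() > 0.5).float()
accuracy = (predicted_classes == test_output).float().mean().item()
print('Accuracy: {:.0%}'.format(accuracy))  # e.g. 75% for 15 hits out of 20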

> Things you should try

In order to improve your predictions, here is a list of parameters you should try to change:

  • Learning rate: float values between 0 and 1.
  • Momentum rate: float values between 0 and 1.
  • Number of hidden layers: change the structure of your model adding more hidden layers.
  • Number of neurons/nodes in the hidden layer: integer values between 1 and ‘your imagination’.
  • Epochs: integer values between 1 and ‘your level of patience’ to wait the training process to be finished.

Important note:

As we mentioned earlier, this example illustrates a model that predicts whether a team will win a match OR draw-defeat. Run other models in order to predict the other two classes, i.e. whether a team will lose a match OR win-draw, and whether a team will draw a match OR win-defeat. In order to do this correctly, remember to change the convert_output_win() function accordingly.
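
As an illustration only, a counterpart for the “loss OR win-draw” task could look like the hypothetical convert_output_loss() below. Part I defines the actual result encoding, so treat the 'W'/'D'/'L' strings here as an assumption:

def convert_output_loss(result):
    # hypothetical counterpart of convert_output_win() from Part I;
    # assumes results are encoded as 'W' (win), 'D' (draw), 'L' (loss)
    return 1 if result == 'L' else 0  # 1 = loss, 0 = win or draw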

There are a number of different hyperparameter and model selection techniques in popular use, but this is the general idea behind them. In the end, you can select the hyperparameters and the model structure that give you the best performance.

> And that’s it!

In this two-part tutorial we covered a lot of things:

  • Collecting and preparing the data;
  • Building and setting up the neural network model;
  • Training and testing the model;
  • Checking the results.

We hope you have learned how to use a neural network to predict the results of a sports match.

print("See you next time!")


André Luiz França Batista

/* Computer Science professor at Federal Institute of Triangulo Mineiro. Interested in Artificial Intelligence, Data Science, Games and Web development. */