My First Work With PyTorch

With this example, I give an introduction to Deep Learning with PyTorch.

Oscar Rojo
The Startup
Jul 30, 2020


Facebook launched PyTorch 1.0 with integrations for Google Cloud, AWS, and Azure Machine Learning. In this example, I assume that you’re already familiar with Scikit-learn, Pandas, NumPy, and SciPy. These packages are important prerequisites for this tutorial.


What is PyTorch?

It’s a Python-based scientific computing package targeted at two sets of audiences:

  • A replacement for NumPy that lets you use the power of GPUs (see the short sketch below)
  • A deep learning research platform that provides maximum flexibility and speed
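
To make the first point concrete, here is a minimal sketch (not from the original post) showing that tensors are created much like NumPy arrays and can be moved to a GPU when one is available:

import numpy as np
import torch

a = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(a)          # NumPy array -> PyTorch tensor (shares memory)

device = 'cuda' if torch.cuda.is_available() else 'cpu'
t = t.to(device)                 # move the tensor to the GPU when one is available
print(t, t.device)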

First, we need to cover a few basic concepts that may throw you off-balance if you don’t grasp them well enough before going full-force on modeling.

In Deep Learning, we see tensors everywhere. Well, Google’s framework is called TensorFlow for a reason! What is a tensor, anyway?

Tensor

In Numpy, you may have an array that has three dimensions, right? That is, technically speaking, a tensor.

A scalar (a single number) has zero dimensions, a vector has one dimension, a matrix has two dimensions and a tensor has three or more dimensions. That’s it!

But, to keep things simple, it is commonplace to call vectors and matrices tensors as well — so, from now on, everything is either a scalar or a tensor.
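
A quick way to see this in code is to check the number of dimensions of a few tensors (a small illustration, not part of the original example):

import torch

scalar = torch.tensor(3.14)               # 0 dimensions
vector = torch.tensor([1.0, 2.0, 3.0])    # 1 dimension
matrix = torch.tensor([[1.0, 2.0],
                       [3.0, 4.0]])       # 2 dimensions
tensor3d = torch.zeros(2, 3, 4)           # 3 dimensions

print(scalar.ndim, vector.ndim, matrix.ndim, tensor3d.ndim)  # 0 1 2 3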

Imports and Dataset

For this simple example we’ll use only a couple of libraries:

  • Pandas: for data loading and manipulation
  • Scikit-learn: for train-test split
  • Matplotlib: for data visualization
  • PyTorch: for model training

Here are the imports if you just want to copy/paste:
import torch
import torch.nn as nn
import torch.nn.functional as F
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

As for the dataset, we’ll use the Beer Recipes dataset, which can be found at https://www.kaggle.com/jtrofe/beer-recipes.

Prepare the folder and files, and download the dataset from Kaggle:

This is a dataset of 75,000 homebrewed beers covering 176 different styles. Beer records are user-reported and are classified according to one of the 176 different styles. These recipes go into as much or as little detail as the user provided, but there are at least 5 useful columns where data was entered for each: Original Gravity, Final Gravity, ABV, IBU, and Color.

We’ll use the Linux terminal:

Remove directories and files

! rm -r input/
! mkdir input/
! cd input/

Show directory

! ls

Download Dataset

! kaggle datasets download -d jtrofe/beer-recipes

Unzip Dataset

! unzip beer-recipes.zip

Move zip file

!mv beer-recipes.zip input/beer.zip

Move csv file

!mv recipeData.csv input/recipeData.csv
!mv styleData.csv input/styleData.csv

Show folder

! ls input/

Post-ETL

We are going to use a clean dataset.
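
The cleaning step itself isn’t shown here, but roughly speaking a file like MyData.csv can be produced by grouping the raw style names into the five classes used below and keeping a few of the numeric columns mentioned earlier. The snippet below is only a sketch under those assumptions; the exact columns, rules, and raw column names (e.g. Style) are guesses, not the original preprocessing:

raw = pd.read_csv('input/recipeData.csv', encoding='latin1')  # encoding may need adjusting

def to_class(style):
    # Hypothetical grouping of raw style names into five broad classes
    for name in ('IPA', 'PALE', 'PORTER', 'STOUT', 'ALE'):
        if name in str(style).upper():
            return name
    return None

raw['Clase'] = raw['Style'].apply(to_class)
# Keep a handful of numeric columns and drop incomplete rows
clean = raw[['OG', 'FG', 'IBU', 'ABV', 'Clase']].dropna()
clean.to_csv('MyData.csv', index=False)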

Here’s how to import it in Pandas directly:

beer = pd.read_csv('MyData.csv')
beer.head()

What we want to do now is to change, or remap, values in the Clase column to something numeric — let’s say 0, 1, 2, 3, 4. Here’s how to do so:

mappings = {
    'IPA': 0,
    'PALE': 1,
    'ALE': 2,
    'PORTER': 3,
    'STOUT': 4
}
beer['Clase'] = beer['Clase'].apply(lambda x: mappings[x])
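
A quick sanity check after the remap is to confirm that only the values 0–4 remain and to see how many rows fall into each class:

print(beer['Clase'].unique())         # should show only 0, 1, 2, 3, 4
print(beer['Clase'].value_counts())   # number of rows per class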

Executing the code from above results in the following DataFrame:

beer.head()

Which means we’re good to proceed!

Train/Test Split

In this section, we’ll use the Scikit-Learn library to do a train/test split.

Afterward, we’ll convert split data from Numpy arrays to PyTorch tensors.

Let’s see how.

To start out, we need to split the beer dataset into features and target — or X and y. The column Clase will be the target variable and everything else will be a feature (or predictor).

I will also be using a random seed, so you are able to reproduce my results.

Here’s the code:

X = beer.drop('Clase', axis=1).values
y = beer['Clase'].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Random state ensures that the splits that you generate are reproducible.
# Scikit-learn uses random permutations to generate the splits.
# The random state that you provide is used as a seed to the random number generator.
# This ensures that the random numbers are generated in the same order.
X_train = torch.FloatTensor(X_train)
X_test = torch.FloatTensor(X_test)
y_train = torch.LongTensor(y_train)
y_test = torch.LongTensor(y_test)
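
It’s worth confirming the shapes and dtypes at this point: the features should be floats with 4 columns and the targets 64-bit integer class indices (a quick check, not in the original post):

print(X_train.shape, X_test.shape)   # (n_train, 4) and (n_test, 4)
print(y_train.dtype, y_test.dtype)   # torch.int64 for both, as expected for class labels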

If you were now to check the first and last 3 rows of X_train, you’d get this:

X_train

tensor([[ 5.7700, 29.0000, 35.9900,  4.6900],
[33.6000, 62.0000, 46.5400, 5.6600],
[ 9.4700, 35.0000, 71.3500, 6.0300],
...,
[ 6.4600, 22.0000, 26.6400, 4.0100],
[31.4400, 24.0000, 22.6200, 4.6900],
[ 6.5200, 66.0000, 67.7700, 6.6700]])

Same goes for the y_train:

y_train

tensor([1, 4, 0, 0, 1, 0, 0, 3, 0, 3, 0, 3, 1, 1, 0, 0, 0, 0, 4, 1, 4, 0, 1, 0,
1, 1, 1, 2, 4, 1, 1, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 1, 0, 0, 4, 4, 0,
0, 4, 1, 2, 4, 1, 1, 0, 2, 1, 1, 1, 4, 0, 0, 0, 0, 1, 0, 0, 0, 2, 4, 0,
4, 0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 4, 0, 3, 0, 0, 0, 0, 2, 0, 1, 0, 3,
2, 4, 4, 1, 4, 1, 0, 1, 0, 0, 0, 4, 0, 1, 0, 0, 4, 1, 4, 0, 0, 0, 1, 0,
3, 0, 0, 1, 4, 0, 1, 0, 0, 1, 0, 1, 0, 0, 0, 4, 4, 0, 1, 2, 1, 0, 4, 1,
0, 3, 4, 0, 0, 0, 1, 2, 1, 1, 2, 0, 4, 0, 0, 4, 1, 1, 0, 0, 1, 0, 0, 1,
0, 1, 1, 0, 2, 3, 3, 3, 1, 4, 2, 0, 2, 0, 2, 4, 2, 4, 1, 0, 1, 0, 1, 0,
1, 0, 0, 1, 1, 4, 1, 0, 1, 0, 0, 2, 4, 0, 4, 4, 0, 0, 3, 0, 1, 0, 1, 0,
1, 0, 1, 3, 1, 1, 1, 0, 1, 4, 4, 0, 2, 0, 4, 0, 0, 1, 4, 0, 0, 0, 0, 0,
0, 0, 1, 0, 0, 1, 0, 1, 4, 1, 0, 0, 0, 3, 0, 3, 1, 0, 1, 0, 1, 0, 4, 2,
0, 4, 0, 0, 0, 4, 1, 1, 1, 0, 0, 0, 3, 0, 0, 1, 3, 1, 4, 2, 0, 0, 2, 1,
0, 0, 0, 1, 2, 0, 0, 0, 3, 0, 2, 0, 3, 0, 0, 3, 4, 1, 1, 0, 1, 0, 2, 0,
1, 0, 2, 4, 0, 0, 0, 1, 0, 0, 2, 3, 4, 3, 1, 0, 4, 0, 0, 3, 0, 1, 1, 1,
4, 1, 1, 1, 4, 4, 0, 0, 1, 0, 0, 4, 4, 4, 0, 0, 0, 0, 4, 0, 0, 0, 4, 4,
4, 3, 0, 0, 0, 0, 0, 1, 3, 0, 4, 0, 0, 1, 0, 0, 0, 0, 2, 3, 0, 0, 1, 0,
0, 0, 0, 0, 2, 1, 2, 0, 4, 4, 0, 0, 2, 0, 1, 4, 0, 2, 0, 3, 2, 1, 0, 4,
4, 0, 0, 1, 4, 1, 0, 0, 4, 1, 0, 2, 3, 4, 4, 0, 0, 1, 4, 0, 3, 4, 1, 4,
2, 0, 0, 0, 2, 3, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 2, 0, 1, 0, 3,
4, 0, 0, 1, 2, 0, 1, 0, 0, 3, 0, 0, 2, 1, 1, 4, 0, 0, 4, 0, 0, 3, 0, 0,
4, 0, 0, 1, 1, 4, 1, 3, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 1, 4, 0,
1, 1, 0, 2, 3, 2, 4, 4, 1, 4, 2, 1, 4, 1, 4, 0, 3, 1, 2, 1, 0, 1, 0, 0,
0, 3, 1, 0, 1, 4, 0, 0, 0, 1, 1, 1, 0, 1, 1, 4, 1, 3, 0, 0, 0, 3, 0, 0,
1, 1, 0, 0, 1, 0, 1, 0, 0, 4, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 2,
4, 0, 0, 0, 0, 1, 4, 0, 1, 3, 0, 4, 0, 3, 4, 0, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 1, 4, 0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 0, 0, 1, 3, 0, 0, 2, 0, 0,
1, 1, 1, 0, 4, 4, 0, 0, 0, 1, 4, 0, 0, 1, 1, 0, 4, 0, 0, 0, 3, 0, 0, 1,
0, 2, 4, 2, 1, 0, 0, 1, 1, 1, 0, 4, 4, 0, 0, 0, 0, 3, 0, 0, 0, 1, 1, 2,
0, 0, 2, 3, 0, 4, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 2, 4, 0, 0,
1, 1, 3, 0, 0, 0, 0, 0, 0, 1, 3, 0, 0, 1, 2, 4, 0, 0, 0, 1, 1, 1, 0, 0,
0, 2, 0, 1, 3, 1, 0, 3, 0, 3, 0, 0, 0, 4, 1, 4, 0, 0, 1, 0, 0, 0, 0, 1,
0, 0, 0, 2, 3, 0, 0, 1, 4, 1, 0, 2, 2, 1, 1, 1, 0, 4, 0, 4, 1, 0, 2, 1,
0, 4, 1, 1, 1, 0, 3, 4, 0, 0, 1, 2, 0, 1, 1, 2, 4, 1, 0, 3, 0, 4, 2, 2,
1, 2, 0, 2, 0, 1, 4, 0])
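
Looking at the dump above, class 0 clearly appears most often. A one-line way to quantify the class balance, which is useful context for the accuracy we compute later, is torch.bincount:

print(torch.bincount(y_train))   # number of training samples in each of the 5 classes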

We now have everything needed to create a neural network — let’s do so in the next section.

Defining a Neural Network Model

As for the architecture of the model, it will be very simple. Let’s see how the network will be structured:

  • Fully Connected Layer (4 input features (number of features in X), 16 output features (arbitrary))
  • Fully Connected Layer (16 input features (number of output features from the previous layer), 12 output features (arbitrary))
  • Output Layer (12 input features (number of output features from the previous layer), 5 output features (number of distinct classes))

And that’s pretty much it. Besides that, we’ll use ReLU for our activation function. Let’s see how to implement this in code:

class ANN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(in_features=4, out_features=16)
        self.fc2 = nn.Linear(in_features=16, out_features=12)
        self.output = nn.Linear(in_features=12, out_features=5)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.output(x)
        return x

PyTorch uses this object-oriented way of declaring models, and it’s fairly intuitive. In the constructor, you define all the layers and their architecture, and in the forward() method you define the forward pass.

As simple as that.

Let’s now make an instance of the model and verify that its architecture matches the one we specified above:

model = ANN()
model
ANN(
  (fc1): Linear(in_features=4, out_features=16, bias=True)
  (fc2): Linear(in_features=16, out_features=12, bias=True)
  (output): Linear(in_features=12, out_features=5, bias=True)
)
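
Even before training, we can push a few rows through the untrained model to confirm that it returns one raw score (logit) per class:

with torch.no_grad():
    logits = model(X_train[:3])
print(logits.shape)   # torch.Size([3, 5]): one score per class for each of the 3 rows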

Great. Before we can train the model, there are a couple more things we need to declare:

  • Criterion: basically how we measure loss; we’ll use CrossEntropyLoss
  • Optimizer: the optimization algorithm; we’ll use Adam with a learning rate of 0.01

Here’s how to implement it in code:

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
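
Note that CrossEntropyLoss applies log-softmax internally, which is why forward() returns raw logits and why plain integer class labels are enough as targets. A tiny illustration of how it behaves:

dummy_logits = torch.tensor([[4.0, 0.1, 0.1, 0.1, 0.1]])    # one sample, 5 class scores
print(criterion(dummy_logits, torch.tensor([0])).item())    # true class scored highest -> small loss
print(criterion(dummy_logits, torch.tensor([3])).item())    # true class scored low -> large loss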

And now the part we’ve been waiting for — model training!

Model Training

This part will also be extremely simple. We’ll train the model for 100 epochs, keeping track of time and loss. Every 10 epochs we’ll print the current status to the console, indicating which epoch we are on and what the current loss is.

Here’s the code:

%%time

epochs = 100
loss_arr = []

for i in range(epochs):
    y_hat = model.forward(X_train)
    loss = criterion(y_hat, y_train)
    loss_arr.append(loss.item())   # store the plain number so the losses can be plotted later

    if i % 10 == 0:
        print(f'Epoch: {i} Loss: {loss}')

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
Epoch: 0 Loss: 4.111096382141113
Epoch: 10 Loss: 1.1924655437469482
Epoch: 20 Loss: 0.9314764142036438
Epoch: 30 Loss: 0.8135778903961182
Epoch: 40 Loss: 0.7460428476333618
Epoch: 50 Loss: 0.7165687084197998
Epoch: 60 Loss: 0.7003863453865051
Epoch: 70 Loss: 0.6872875094413757
Epoch: 80 Loss: 0.6770954728126526
Epoch: 90 Loss: 0.6677737236022949
CPU times: user 574 ms, sys: 37.6 ms, total: 612 ms
Wall time: 319 ms

That was fast — please don’t get used to that feeling. If plain numbers mean absolutely nothing to you, here’s a visualization of our loss (epoch number on the x-axis and loss on the y-axis):

plt.title('Loss vs. Epoch')
plt.ylabel("Loss")
plt.xlabel("Epoch")
plt.plot(loss_arr)

So, we’ve trained the model, but what now? We need to evaluate it on the previously unseen data somehow. Stay here for a minute more and you’ll find out how.

Model Evaluation

In the evaluation process, we want to somehow keep track of predictions made by the model. We’ll need to iterate over the X_test and make a prediction, and then later compare it to the actual value.

We will use torch.no_grad() here because we’re just evaluating — there’s no need to update weights and biases.

Anyway, here’s the code:

preds = []
with torch.no_grad():
    for val in X_test:
        y_hat = model.forward(val)
        preds.append(y_hat.argmax().item())
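
Looping row by row works, but the same predictions can also be computed in a single batched call, which is shorter and faster:

with torch.no_grad():
    preds_batch = model(X_test).argmax(dim=1)   # predicted class for every test row at once
# preds_batch.tolist() gives the same values as the preds list built above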

The predictions are now stored in the preds array. We can now make a Pandas DataFrame with the following 3 attributes:

  • Y: actual value
  • YHat: predicted value
  • Correct: flag, 1 indicating Y and YHat match, 0 otherwise

Here’s the code:

df = pd.DataFrame({'Y': y_test, 'YHat': preds})
df['Correct'] = [1 if corr == pred else 0 for corr, pred in zip(df['Y'], df['YHat'])]

The first 10 rows of the df will look like this:

df.head(n=10)

That’s all great, but how to actually calculate accuracy?

Well, it’s simple: we only need to sum up the Correct column and divide it by the length of df:

df['Correct'].sum() / len(df)

0.64

The accuracy of our model is 64%.
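
The same number can be computed directly from the DataFrame, or cross-checked with Scikit-learn’s accuracy_score:

from sklearn.metrics import accuracy_score

print((df['Y'] == df['YHat']).mean())   # accuracy straight from the DataFrame
print(accuracy_score(y_test, preds))    # same value via Scikit-learn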

Conclusion

And there you have it. It was that easy.

I hope it helps you in your own learning.

Never give up!

See you on LinkedIn!
