An Introduction to PyTorch by working on the Moons dataset using Neural Networks.
“By far the greatest danger of Artificial Intelligence is that people conclude too early that they understand it.”
― Eliezer Yudkowsky
This post is the first in a series on learning how to use the deep learning library PyTorch. It is an attempt to help others who are just getting started with artificial intelligence and have trouble with other tutorials that start off at a high level. The approach in this series is to first understand the code behind the built-in functions of PyTorch. This way the theory behind them can be understood better, resulting in better overall knowledge. With every tutorial, more of the custom code will be replaced by the built-in functions, utilizing the true power of PyTorch.
Here’s the code that will be used for this tutorial:
How does PyTorch work?
PyTorch is still fairly new; its 1.0 release was announced only recently. The framework has a number of components, and one of the most significant is GPU utilisation: PyTorch offers NumPy-like array operations that can also run on a GPU, which speeds up the calculations and is very useful for neural networks. The important part to mention here is that moving a computation to the GPU takes only about one line of code, whereas TensorFlow and others require a bit more work.
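As a rough illustration of the GPU point above, here is a minimal sketch. The variable names are made up for this example, and the code falls back to the CPU when no CUDA device is available, so it runs on any machine.

```python
import torch

# Pick the GPU when one is available, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create two small matrices and move them to the chosen device.
a = torch.randn(3, 3).to(device)
b = torch.randn(3, 3).to(device)

# The multiplication runs on whichever device holds the tensors.
c = a.mm(b)
print(c.device, c.shape)
```

The `.to(device)` call is the "one line" in question; everything else is the same code you would write for the CPU.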
Required packages
- PyTorch
- Pandas
- Numpy
- Matplotlib
- Sklearn
- Seaborn
Getting Acquainted with PyTorch
PyTorch uses Matrix-like structures called Tensors. These tensors can be seen as a generalization of matrices and look like n-dimensional arrays (ndarrays in numpy). Click here if you want a more in-depth explanation about tensors.
To create a tensor, we first have to import the PyTorch module, called torch.
import torch

x0 = torch.tensor(10) # 0-dimensional tensor (single value)
x1 = torch.tensor([10,2,5,2,10]) # 1-dimensional tensor (vector)
x2 = torch.tensor([[10,2,4],[5,2,10]]) # 2-dimensional tensor (matrix)
x3 = torch.tensor([[[10,2],[5,2]], [[3,1],[15,8]]]) # 3-dimensional tensor (stack of matrices)
We create 4 variables, x0, x1, x2 and x3, all having different shapes, which we can check by calling .shape, just as we do for a normal numpy array.
x0.shape, x1.shape, x2.shape, x3.shape
>>> (torch.Size([]), torch.Size([5]), torch.Size([2, 3]), torch.Size([2, 2, 2]))
Torch tensors also have types such as:
- torch.LongTensor
- torch.FloatTensor
- torch.DoubleTensor
Watch out: Tensor types need to match when doing calculations with them. If you get errors about a type mismatch, you might need to set the dtype of your tensor.
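x1 = torch.tensor([[10,2,4],[5,2,10]]) # integers, so a LongTensor (torch.int64)
x2 = torch.tensor([[0.1, 10.67],[0.15,0.22]]) # floats, so a FloatTensor (torch.float32)
x3 = torch.tensor([[0.1, 10.67],[0.15,0.22]], dtype=torch.double) # explicitly a DoubleTensor (torch.float64)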
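To make the type warning concrete, the sketch below inspects the dtype of each tensor and then tries to multiply a float tensor with a double tensor. The variable names are chosen for this example only. In the PyTorch versions I am aware of, the mixed-type multiplication raises a RuntimeError, but the code is written so that it also runs if it does not.

```python
import torch

x_long = torch.tensor([[10, 2, 4], [5, 2, 10]])
x_float = torch.tensor([[0.1, 10.67], [0.15, 0.22]])
x_double = torch.tensor([[0.1, 10.67], [0.15, 0.22]], dtype=torch.double)

# Each tensor reports its element type through .dtype.
print(x_long.dtype, x_float.dtype, x_double.dtype)
# torch.int64 torch.float32 torch.float64

# Mixing float32 and float64 in a matrix multiplication is the kind of
# mismatch the warning refers to.
try:
    x_float.mm(x_double)
except RuntimeError as err:
    print("Type mismatch:", err)

# Casting one operand makes the types agree.
product = x_float.double().mm(x_double)
print(product.dtype)  # torch.float64
```

Casting with `.double()`, `.float()` or `.long()` returns a new tensor and leaves the original untouched.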
Convert numpy to torch tensor
A powerful transformation in PyTorch is the conversion from numpy array to a torch tensor and vice versa.
import numpy as np

x_numpy = np.random.randn(10,2)
x_torch = torch.tensor(x_numpy)
type(x_numpy), type(x_torch), type(x_torch.numpy())
>>> (numpy.ndarray, torch.Tensor, numpy.ndarray)
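One detail worth knowing about this conversion: torch.tensor copies the data, while torch.from_numpy creates a tensor that shares memory with the original array. The following small sketch, with names invented for the example, shows the difference.

```python
import numpy as np
import torch

x_numpy = np.zeros((2, 2))

copied = torch.tensor(x_numpy)      # independent copy of the data
shared = torch.from_numpy(x_numpy)  # shares memory with x_numpy

# Change the numpy array after both tensors were created.
x_numpy[0, 0] = 1.0

print(copied[0, 0].item(), shared[0, 0].item())  # 0.0 1.0
```

For this tutorial the copying behaviour of torch.tensor is what we want, since the training data should not change behind our back.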
Now that you have seen some capabilities of PyTorch, let’s start with the Moons dataset and explore other PyTorch features.
The Moons dataset
The moons dataset is a simple built-in dataset from scikit-learn. We use a neural network (which we will create ourselves) to tackle this problem.
import matplotlib.pyplot as plt
from sklearn import datasets
# plt.style.use('seaborn')
# %matplotlib inline
We import all the necessary libraries. Note: if you are using Jupyter Notebook, you can uncomment the last two lines so the graphs can be generated in the notebook.
X,y = datasets.make_moons(n_samples=200, shuffle=True, noise=0.2, random_state=1234)
y = np.reshape(y, (len(y),1))
We create two variables, X and y, which store the data points and the corresponding labels. We want y to be an array of shape (200,1), so that each data point is matched with its label and the labels have the same shape as the output of the network, hence the reshape.
Now that we have our data, it’s time to make our own Neural Network!
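A quick way to check the result of the reshape is to print the shapes and the number of points per class. This is only a sanity check using the same call as above, not part of the original code.

```python
import numpy as np
from sklearn import datasets

X, y = datasets.make_moons(n_samples=200, shuffle=True, noise=0.2, random_state=1234)
y = np.reshape(y, (len(y), 1))

# Two features per data point, one label per data point.
print(X.shape, y.shape)  # (200, 2) (200, 1)

# The two moons are balanced: half of the points belong to each class.
classes, counts = np.unique(y, return_counts=True)
print(classes, counts)  # [0 1] [100 100]
```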
Neural network
PyTorch has its own built-in neural network class, but for the purpose of demonstrating and learning, we will build our own. More information about the PyTorch neural net can be found here.
input_size = 2
hidden_size = 3 # randomly chosen
output_size = 1 # we want it to return a number that can be used to calculate the difference from the actual number

class NeuralNetwork():
    def __init__(self, input_size, hidden_size, output_size):
        # weights
        self.W1 = torch.randn(input_size, hidden_size, requires_grad=True)
        self.W2 = torch.randn(hidden_size, hidden_size, requires_grad=True)
        self.W3 = torch.randn(hidden_size, output_size, requires_grad=True)

        # Add bias
        self.b1 = torch.randn(hidden_size, requires_grad=True)
        self.b2 = torch.randn(hidden_size, requires_grad=True)
        self.b3 = torch.randn(output_size, requires_grad=True)

    def forward(self, inputs):
        z1 = inputs.mm(self.W1).add(self.b1)
        a1 = 1 / (1 + torch.exp(-z1))
        z2 = a1.mm(self.W2).add(self.b2)
        a2 = 1 / (1 + torch.exp(-z2))
        z3 = a2.mm(self.W3).add(self.b3)
        output = 1 / (1 + torch.exp(-z3))
        return output
There’s a lot going on here, and this code can be understood better with a visualisation. Let’s start with the __init__ :
This image is a visualisation of the neural network we are building now. The input size (2) matches the two features (x1, x2) in the image. We have specified the hidden size to be 3, which can be seen as the three vertical blocks or neurons (three per hidden layer) in the image. In the code we have specified W1, W2 and W3, which are matrices, hence the capital W. This means that W1, for example, consists of [w1,1 | w1,2 | w1,3 | w2,1 | w2,2 | w2,3], one weight for every connection between an input feature and a neuron in the first hidden layer. A bias is also added for every neuron.
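To connect the picture to the code, the sketch below recreates the weights and biases with the same sizes and prints their shapes. The dictionary and its keys are only used for this illustration.

```python
import torch

input_size, hidden_size, output_size = 2, 3, 1

# Same shapes as in the NeuralNetwork class above.
params = {
    "W1": torch.randn(input_size, hidden_size),
    "W2": torch.randn(hidden_size, hidden_size),
    "W3": torch.randn(hidden_size, output_size),
    "b1": torch.randn(hidden_size),
    "b2": torch.randn(hidden_size),
    "b3": torch.randn(output_size),
}

for name, tensor in params.items():
    print(name, tuple(tensor.shape))

# Total number of trainable values in the network.
total = sum(tensor.numel() for tensor in params.values())
print(total)  # 25
```

W1 has 2 x 3 = 6 entries, which are exactly the six weights w1,1 to w2,3 listed above.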
z1 = inputs.mm(self.W1).add(self.b1)
a1 = 1 / (1 + torch.exp(-z1))
The forward pass of a layer is calculated by multiplying the inputs X by W1 and adding the bias. In PyTorch we use the function “.mm”, short for matrix multiplication (since X and W1 are matrices), and we add the bias to the result. Then we apply an activation function, in this case the sigmoid, to introduce nonlinearity into the model. This is done so the model can learn more complex relationships in the data. The output is then used as input for the next layer and so on.
The code we have so far only does half the job: it enables us to feed the input through the network, but the network also needs to adapt to get better results, which is accomplished by backpropagation.
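The hand-written sigmoid can be checked against PyTorch’s built-in torch.sigmoid. The sketch below runs one layer on random data; the batch of 4 points and the seed are arbitrary choices for the example.

```python
import torch

torch.manual_seed(0)

inputs = torch.randn(4, 2)  # 4 data points with 2 features each
W1 = torch.randn(2, 3)
b1 = torch.randn(3)

# Linear step: (4, 2) x (2, 3) gives (4, 3); the bias is added to every row.
z1 = inputs.mm(W1).add(b1)

# Sigmoid written out by hand, as in the forward function.
a1_manual = 1 / (1 + torch.exp(-z1))

# The same activation using the built-in function.
a1_builtin = torch.sigmoid(z1)

print(z1.shape)
print(torch.allclose(a1_manual, a1_builtin, atol=1e-6))
```

Every value of a1 lies between 0 and 1, which is what makes the sigmoid suitable for the final output as well.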
epochs = 10000
learning_rate = 0.005

model = NeuralNetwork(input_size, hidden_size, output_size)
inputs = torch.tensor(X, dtype=torch.float)
labels = torch.tensor(y, dtype=torch.float)

# store all the loss values
losses = []
We create the variable “model”, passing it the input size of 2, the hidden size of 3 and the output size of 1. The inputs and labels (X and y) are both converted to float tensors, so that their type matches the type of the weights. We create a variable “losses”, so that we can later show how the loss developed during training.
Then, we create a loop for the backpropagation:
for epoch in range(epochs):
    # forward function
    output = model.forward(inputs)

    # BinaryCrossEntropy formula
    loss = -((labels * torch.log(output)) + (1 - labels) * torch.log(1 - output)).sum()

    # Log the loss so we can plot it later
    losses.append(loss.item())

    # calculate the gradients of the weights wrt the loss
    loss.backward()

    # adjust the weights based on the previously calculated gradients
    model.W1.data -= learning_rate * model.W1.grad
    model.W2.data -= learning_rate * model.W2.grad
    model.W3.data -= learning_rate * model.W3.grad
    model.b1.data -= learning_rate * model.b1.grad
    model.b2.data -= learning_rate * model.b2.grad
    model.b3.data -= learning_rate * model.b3.grad

    # clear the gradients so they won't accumulate
    model.W1.grad.zero_()
    model.W2.grad.zero_()
    model.W3.grad.zero_()
    model.b1.grad.zero_()
    model.b2.grad.zero_()
    model.b3.grad.zero_()

print("Final loss: ", losses[-1])
plt.plot(losses)
First, we pass the inputs through the model, which gives us an output. PyTorch has a built-in function to calculate the binary cross-entropy, but to understand it better, we built it ourselves. Backpropagation is handled by PyTorch: calling .backward() on the loss automatically calculates all the gradients. Now that we have the direction in which the weights should be changed in order to predict the outcome of a given input more accurately, we have to actually update them.
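As a check on the hand-written loss, the sketch below compares it with torch.nn.functional.binary_cross_entropy on a few made-up predictions. Note that reduction="sum" is needed to match the .sum() in our formula, since the built-in function averages by default.

```python
import torch
import torch.nn.functional as F

# Made-up predictions and labels, only for this comparison.
output = torch.tensor([[0.9], [0.2], [0.7]])
labels = torch.tensor([[1.0], [0.0], [1.0]])

# The formula used in the training loop.
manual = -((labels * torch.log(output)) + (1 - labels) * torch.log(1 - output)).sum()

# PyTorch's built-in binary cross-entropy, summed over all data points.
builtin = F.binary_cross_entropy(output, labels, reduction="sum")

print(manual.item(), builtin.item())
print(torch.allclose(manual, builtin))
```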
model.W1.data -= learning_rate * model.W1.grad
Here we say that every weight in W1 should be decreased by (learning rate * gradient). After doing that, we need to set the calculated gradients to zero; otherwise every call to .backward() adds the new gradients to the old ones, which messes up our updates.
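The accumulation of gradients is easy to see on a single weight. In this small sketch (a toy function w², not the network from this post) the gradient of w² at w = 2 is 4, and calling .backward() twice without resetting gives 8.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)

# First backward pass: d(w^2)/dw = 2w = 4.
(w ** 2).backward()
first = w.grad.item()

# Second backward pass without resetting: the new gradient is added on top.
(w ** 2).backward()
accumulated = w.grad.item()

# After zeroing, a backward pass gives the plain gradient again.
w.grad.zero_()
(w ** 2).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)  # 4.0 8.0 4.0

# An alternative to updating through .data: do the update step inside
# torch.no_grad(), so that the update itself is not tracked.
with torch.no_grad():
    w -= 0.1 * w.grad
```

The torch.no_grad() block at the end is a commonly used alternative to the .data update shown above; both leave the update out of the gradient calculation.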
Finally, we can plot the loss to see how we performed.
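Besides the loss curve, a simple accuracy score can show how well the network performs. The helper below is a hypothetical addition, not part of the original code: it treats every output of 0.5 or higher as class 1 and compares that with the labels. It is demonstrated on made-up values here, but it could be called with the model output and the labels from the training loop.

```python
import torch

def accuracy(output, labels, threshold=0.5):
    """Fraction of predictions that match the labels (hypothetical helper)."""
    predictions = (output >= threshold).float()
    return (predictions == labels).float().mean().item()

# Made-up outputs and labels: the third prediction is wrong.
output = torch.tensor([[0.9], [0.2], [0.4], [0.7]])
labels = torch.tensor([[1.0], [0.0], [1.0], [1.0]])

acc = accuracy(output, labels)
print(acc)  # 0.75
```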
Improvements
To test how well you understand this implementation, I suggest adding more layers or changing the hidden size. Using a different activation function is also a possibility.
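One possible way to experiment with these suggestions is sketched below. The functions init_layers and forward are hypothetical helpers written for this example: they build a network from a list of layer sizes and let you swap the activation function of the hidden layers, while keeping the sigmoid on the output so the loss from this post still applies.

```python
import torch

def init_layers(sizes):
    """Create a (weight, bias) pair for every pair of consecutive layer sizes."""
    return [
        (torch.randn(n_in, n_out, requires_grad=True),
         torch.randn(n_out, requires_grad=True))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]

def forward(inputs, layers, hidden_activation=torch.tanh):
    """Feed the inputs through all layers; the last layer uses a sigmoid."""
    a = inputs
    for W, b in layers[:-1]:
        a = hidden_activation(a.mm(W).add(b))
    W, b = layers[-1]
    return torch.sigmoid(a.mm(W).add(b))

# 2 inputs, three hidden layers of 5 neurons, 1 output.
layers = init_layers([2, 5, 5, 5, 1])
out = forward(torch.randn(8, 2), layers)
print(out.shape)  # torch.Size([8, 1])
```

Training such a network works the same way as before, except that the update and zeroing steps loop over the list of layers instead of naming every weight.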
Conclusions
This article showed how to get started with PyTorch by using its low-level built-in functions. The goal was to better understand how a neural network can be implemented. In the next tutorials, we will replace more of the custom code with the built-in functions of PyTorch.
In the next post of the series, we will use PyTorch and neural nets on the Titanic dataset.