# PyTorch tips and tricks: from tensors to Neural Networks

## A deep dive into PyTorch and how to build Neural Networks from scratch.

Deep Learning is one of the *coolest* topics of the moment and has considerably increased the world's interest in AI. Nowadays, deep learning is used everywhere, creating new business opportunities and improving the technology landscape.

For these reasons, several deep learning frameworks have been developed in the last 10 years, and PyTorch has become one of the most popular of them, despite being one of the youngest.

In this post I will simply explain how PyTorch works and how to build a neural network from scratch.

## PyTorch: Getting started

PyTorch is a completely open-source deep learning framework that provides maximum flexibility and speed when implementing and building deep neural network architectures. Its popularity has grown in the last few years thanks to its tight integration with Python: PyTorch's approach to building Machine Learning models is partly similar to that of both Numpy and Scikit-Learn. This helps developers and data scientists learn to build neural networks with PyTorch faster than with other similar frameworks. Another important feature that distinguishes PyTorch is that its computation graph is dynamic. Since there are no separate build and run phases, your neural network can be redefined on the fly during the training phase, which makes your models easy to debug.
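To see what "dynamic" means in practice, here is a minimal illustrative sketch (the module and its names are mine, not from an official example): the graph is rebuilt on every forward pass, so ordinary Python control flow can change the network's behavior from one call to the next.

```python
import torch
from torch import nn

class DynamicNet(nn.Module):
    # toy module for illustration only
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)

    def forward(self, x):
        # plain Python control flow: the graph is traced anew at each call,
        # so the layer may be applied a different number of times per input
        for _ in range(torch.randint(1, 4, (1,)).item()):
            x = torch.relu(self.linear(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 4))  # each call may build a different graph
```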

Now we will dive into two fundamental concepts that you have to know in order to build your own neural network using PyTorch: **Tensors** and **Gradients**.

## Tensors 🧱

Tensors are the central data units in PyTorch. They are array-like data structures very similar to Numpy arrays in terms of functions and properties. The most important difference between them is that PyTorch tensors can run on GPUs in order to speed up computation.
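For example, moving a tensor to the GPU is a one-liner (a small sketch; it falls back to the CPU when no GPU is available):

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
x = torch.rand(3, 3)
x_gpu = x.to(device)  # operations on x_gpu now run on the GPU, if present
```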

You can declare a tensor using the **Tensor** constructor:

```python
import torch

tensor_uninitialized = torch.Tensor(3, 3)
tensor_uninitialized
"""
tensor([[1.7676e-35, 0.0000e+00, 3.9236e-44],
        [0.0000e+00,        nan, 0.0000e+00],
        [1.3733e-14, 1.2102e+25, 1.6992e-07]])
"""
```

The above code creates an uninitialized tensor of shape 3x3.

We can also create tensors filled with zeros, ones or random values.

```python
tensor_zeros = torch.zeros(3, 3)
tensor_zeros
"""
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
"""

tensor_ones = torch.ones(3, 3)
tensor_ones
"""
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
"""

tensor_rand = torch.rand(3, 3)
tensor_rand
"""
tensor([[0.6398, 0.3471, 0.6329],
        [0.4517, 0.2253, 0.8022],
        [0.9537, 0.1698, 0.5718]])
"""
```

Just like with Numpy arrays, PyTorch allows us to perform mathematical operations between tensors:

```python
x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])

tensor_add = torch.add(x, x)
"""
tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]])
"""

tensor_mul = torch.mul(x, x)
"""
tensor([[ 1.,  4.,  9.],
        [16., 25., 36.]])
"""
```

Other common Numpy operations, like indexing and slicing, can also be performed on PyTorch tensors.
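For instance, here is a quick sketch using the tensor `x` defined above:

```python
x[0]      # first row: tensor([1., 2., 3.])
x[:, 1]   # second column: tensor([2., 5.])
x[1, 2]   # single element: tensor(6.)
x[0, :2]  # first two elements of the first row: tensor([1., 2.])
```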

## Gradient 📉

Suppose we have a function of 2 parameters **a** and **b**: the gradient is the vector of the partial derivatives of that function with respect to each parameter. A derivative tells you how much a given quantity changes when you slightly vary some other quantity.

In neural networks, the gradient is the vector of partial derivatives of the loss function with respect to the weights of the model. We want to find the weights that minimize the loss function, and the gradient tells us in which direction to move them. If you want to dive deeper into how gradient descent works, I suggest you read my colleague’s post.

PyTorch uses the **Autograd** package inside the *torch* library to track operations on tensors.

By default, a tensor has no associated gradients.

```python
tensor = torch.Tensor([[1, 2, 3],
                       [4, 5, 6]])
tensor.requires_grad
"""False"""
```

You can enable history tracking on a tensor by calling the **requires_grad_()** method.

```python
tensor.requires_grad_()
"""
tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
"""

tensor.requires_grad
"""True"""
```

However, there are no gradients yet.

```python
print(tensor.grad)
"""None"""
```

Now, let’s create a new tensor equal to the mean of the elements of the previous tensor, so that we can compute the gradients of this new tensor with respect to the original one.

```python
mean_tensor = tensor.mean()
mean_tensor
"""tensor(3.5000, grad_fn=<MeanBackward0>)"""
```

As we can see, there are no gradients yet.

```python
print(tensor.grad)
"""None"""
```

To calculate the gradients, we need to explicitly perform a backward propagation calling the **backward()** function.

`mean_tensor.backward()`

Now the tensor has gradients.

```python
print(tensor.grad)
"""
tensor([[0.1667, 0.1667, 0.1667],
        [0.1667, 0.1667, 0.1667]])
"""
```

Each entry equals 1/6 ≈ 0.1667: the mean of six elements gives each element a weight of 1/6, so slightly increasing any element increases the mean by 1/6 of that amount.
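These gradients are exactly what gradient descent uses to update parameters. As a minimal illustrative sketch (not part of the original example, and `lr` is a hypothetical learning rate), a single update step could look like this:

```python
lr = 0.1  # hypothetical learning rate, chosen only for illustration

with torch.no_grad():           # update without tracking the operation
    tensor -= lr * tensor.grad  # step in the direction opposite to the gradient
    tensor.grad.zero_()         # reset gradients before the next backward pass
```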

## Neural Networks with PyTorch

We can define our Neural Network as a Python class which extends the *torch.nn.Module* class. In this class we have to define 2 fundamental methods:

- *__init__()* is the constructor of the class. Here, we have to define the layers that will compose our network.
- *forward()* is where we define the structure of the network and how the layers are connected. This function takes an input that represents the features the model will be trained on.

I will show you how to build a simple **Convolutional Neural Network** that you can use for classification problems, and I will train it on the MNIST dataset.

First, we have to import **torch** and all the modules we need.

```python
import torch
from torch import nn
import torch.nn.functional as F
import numpy as np
```

Now, we can create our model.

```python
class My_CNN(nn.Module):
    def __init__(self):
        super(My_CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)
        self.avg_pool = nn.AvgPool2d(28)
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.avg_pool(x)        # global average pooling over the 28x28 maps
        x = x.view(-1, 64)          # flatten to (batch_size, 64)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = F.softmax(x, dim=1)     # probabilities along the class dimension
        return x
```

Our CNN is composed of 2 *convolutional* layers, followed by a *global average pooling* layer. At the end, we have 2 *fully-connected* layers and a *softmax* to get the final output probabilities.
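A quick way to check that the layers fit together is to push a dummy batch through the model (a small sanity-check sketch I am adding, not part of the original walkthrough):

```python
model = My_CNN()
dummy = torch.randn(1, 1, 28, 28)  # one fake 28x28 grayscale image
print(model(dummy).shape)          # torch.Size([1, 10]): one probability per digit
```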

We can retrieve the MNIST dataset directly from PyTorch and split it into a training set and a validation set using PyTorch utilities.

```python
from torchvision.datasets import MNIST
from torchvision import transforms
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# ToTensor() converts the PIL images into tensors the model can consume
mnist = MNIST("data", download=True, train=True,
              transform=transforms.ToTensor())

# create training and validation split
split = int(0.8 * len(mnist))
index_list = list(range(len(mnist)))
train_idx, valid_idx = index_list[:split], index_list[split:]

# create sampler objects using SubsetRandomSampler
train = SubsetRandomSampler(train_idx)
valid = SubsetRandomSampler(valid_idx)
```

We can create iterator objects using **DataLoader**, which provides the ability to batch, shuffle and load the data in parallel using multiprocessing workers.

```python
train_loader = DataLoader(mnist, batch_size=256, sampler=train)
valid_loader = DataLoader(mnist, batch_size=256, sampler=valid)
```
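To verify that the loaders behave as expected, you can pull one batch and inspect its shape (a quick check I am adding for illustration; the shapes assume the ToTensor transform above):

```python
data, target = next(iter(train_loader))
print(data.shape)    # torch.Size([256, 1, 28, 28]): a batch of 256 grayscale images
print(target.shape)  # torch.Size([256]): one label per image
```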

Now we have all the elements to start training our model.

**Adam** will be used as the optimizer and *cross entropy* as the loss function.

```python
model = My_CNN()

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()
```

All PyTorch training loops go through each epoch and each data point in the training *DataLoader* object.

```python
epochs = 10

for epoch in range(epochs):
    train_loss, valid_loss = [], []

    # training phase
    for data, target in train_loader:
        # forward propagation
        outputs = model(data)

        # loss calculation
        loss = loss_function(outputs, target)

        # backward propagation
        optimizer.zero_grad()
        loss.backward()

        # weights optimization
        optimizer.step()

        train_loss.append(loss.item())

    # validation phase
    for data, target in valid_loader:
        outputs = model(data)
        loss = loss_function(outputs, target)
        valid_loss.append(loss.item())

    print('Epoch: {}, training loss: {}, validation loss: {}'
          .format(epoch, np.mean(train_loss), np.mean(valid_loss)))
```

Once the model is trained, we can get the predictions on the validation data.

In the validation phase, we have to loop over the data in the validation set, as we did in the training phase. The difference is that we do not need to do a backward propagation of the gradients.

```python
with torch.no_grad():
    correct = 0
    total = 0
    for data, target in valid_loader:
        outputs = model(data)
        # the class with the highest score is the prediction
        _, predicted = torch.max(outputs.data, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

print('Validation set Accuracy: {} %'.format(100 * correct / total))
```

That’s it! Now you are ready to build your own Neural Network. You can try to achieve better performance by increasing the model's complexity, for example by adding more layers to the network.
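As an illustrative starting point (a hypothetical variant of mine, not a tuned architecture), a deeper model could add a third convolutional layer like this:

```python
class My_Deeper_CNN(nn.Module):
    # hypothetical variant of My_CNN with one extra convolutional layer
    def __init__(self):
        super(My_Deeper_CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)  # new layer
        self.avg_pool = nn.AvgPool2d(28)
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = self.avg_pool(x)
        x = x.view(-1, 64)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.softmax(x, dim=1)
```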