PyTorch tips and tricks: from tensors to Neural Networks

A deep dive into PyTorch and how to build Neural Networks from scratch.

Giorgio Pilotti
May 4 · 6 min read

Deep Learning is one of the hottest topics of the moment and has considerably increased worldwide interest in AI. Today it is used across many industries, where it creates new business opportunities and drives technological progress.

For these reasons, several deep learning frameworks have been developed in the last 10 years, and PyTorch has become one of the most popular, despite being one of the youngest.

In this post I will explain, in simple terms, how PyTorch works and how to build a neural network from scratch.

PyTorch: Getting started

PyTorch is an open source deep learning framework that offers flexibility and speed when implementing and building deep neural network architectures. Its popularity has grown in the last few years thanks to its close ties to Python: PyTorch’s approach to building Machine Learning models is partly similar to that of both NumPy and Scikit-Learn, which helps developers and data scientists learn to build neural networks with PyTorch faster than with similar frameworks. Another distinguishing feature is that the computation graph is dynamic. It is built as the code runs, with no separate build and run phases, so your neural network can be redefined during the training phase and your models are easy to debug.
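As a minimal sketch of what a dynamic graph allows, the snippet below lets an ordinary Python loop decide at run time how many operations are recorded. The function name repeated_double and the values are made up for illustration; requires_grad and backward() are explained later in this post.

```python
import torch

def repeated_double(x, n_steps):
    # An ordinary Python loop: the graph is built while this code runs,
    # so n_steps may differ on every call.
    for _ in range(n_steps):
        x = x * 2
    return x

w = torch.tensor(1.0, requires_grad=True)

y = repeated_double(w, 3)   # y = 8 * w
y.backward()
grad_three_steps = w.grad.item()

w.grad.zero_()              # gradients accumulate, so reset before reuse
y = repeated_double(w, 5)   # y = 32 * w
y.backward()
grad_five_steps = w.grad.item()

print(grad_three_steps, grad_five_steps)
```

The same function produces two different graphs, and the gradient follows whichever one was actually executed.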

Now we will dive into the 2 fundamental concepts you need to know in order to build your own neural network using PyTorch: Tensors and Gradients.

Tensors are the central data units in PyTorch. They are array-like data structures, very similar to NumPy arrays in terms of functions and properties. The most important difference is that PyTorch tensors can run on GPUs in order to speed up computation.
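The relationship with NumPy and GPUs can be sketched as follows. This is only an illustration: the variable names are arbitrary, and the code falls back to the CPU when no GPU is available.

```python
import numpy as np
import torch

array = np.array([[1.0, 2.0], [3.0, 4.0]])

# NumPy array -> tensor (on the CPU the two share the same memory)
tensor = torch.from_numpy(array)

# Pick the GPU when one is available, otherwise stay on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tensor_on_device = tensor.to(device)

# Tensor -> NumPy array (the tensor must be on the CPU first)
round_trip = tensor_on_device.cpu().numpy()

print(tensor_on_device.device, round_trip.shape)
```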

You can declare a tensor using the Tensor object:

import torch

tensor_uninitialized = torch.Tensor(3, 3)
"""
tensor([[1.7676e-35, 0.0000e+00, 3.9236e-44],
        [0.0000e+00,        nan, 0.0000e+00],
        [1.3733e-14, 1.2102e+25, 1.6992e-07]])
"""

The above code creates an uninitialized tensor of shape 3x3.

We can also create tensors filled with zeros, ones or random values.

tensor_zeros = torch.zeros(3, 3)
"""
tensor([[0., 0., 0.],
        [0., 0., 0.],
        [0., 0., 0.]])
"""

tensor_ones = torch.ones(3, 3)
"""
tensor([[1., 1., 1.],
        [1., 1., 1.],
        [1., 1., 1.]])
"""

tensor_rand = torch.rand(3, 3)
"""
tensor([[0.6398, 0.3471, 0.6329],
        [0.4517, 0.2253, 0.8022],
        [0.9537, 0.1698, 0.5718]])
"""

Just like with NumPy arrays, PyTorch allows us to perform mathematical operations between tensors:

x = torch.Tensor([[1, 2, 3],
                  [4, 5, 6]])

tensor_add = torch.add(x, x)
"""
tensor([[ 2.,  4.,  6.],
        [ 8., 10., 12.]])
"""

tensor_mul = torch.mul(x, x)
"""
tensor([[ 1.,  4.,  9.],
        [16., 25., 36.]])
"""

Other common NumPy array operations, like indexing and slicing, also work on PyTorch tensors.
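A few indexing and slicing patterns, sketched on the same 2x3 values used above (the variable names are only illustrative):

```python
import torch

x = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])

first_row = x[0]                 # tensor([1., 2., 3.])
last_column = x[:, -1]           # tensor([3., 6.])
sub_block = x[:, :2]             # first two columns of every row
single_value = x[1, 2].item()    # Python float 6.0
mask_selected = x[x > 3]         # boolean mask -> tensor([4., 5., 6.])

print(first_row, last_column, sub_block.shape, single_value, mask_selected)
```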

Suppose we have 2 parameters, a and b: the gradient is the partial derivative of one parameter with respect to the other. A derivative tells you how much a given quantity changes when you slightly vary another quantity.

In neural networks, the gradient is the vector of partial derivatives of the loss function with respect to the weights of the model. We want to find the weights that minimize the loss function, which is where its gradient approaches zero. If you want a deeper dive into how gradient descent works, I suggest you read my colleague’s post.
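To make the idea concrete, here is a small sketch in plain Python. The toy loss (w - 3)^2 and the helper name numerical_derivative are invented for illustration; real networks get their derivatives from autograd rather than from finite differences.

```python
def loss(w):
    # Toy loss with its minimum at w = 3
    return (w - 3.0) ** 2

def numerical_derivative(f, w, eps=1e-6):
    # Central finite difference: how much f changes for a tiny change in w
    return (f(w + eps) - f(w - eps)) / (2 * eps)

w = 0.0
learning_rate = 0.1
for _ in range(100):
    # Move against the derivative to reduce the loss
    w -= learning_rate * numerical_derivative(loss, w)

print(w)
```

After the loop, w sits very close to 3, the point where the derivative of the loss is zero.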

PyTorch uses the autograd package inside the torch library to track operations on tensors.

By default, a tensor has no gradients associated with it.

tensor = torch.Tensor([[1, 2, 3],
                       [4, 5, 6]])

You can enable history tracking on a tensor by calling its requires_grad_() method.

tensor.requires_grad_()
"""
tensor([[1., 2., 3.],
        [4., 5., 6.]], requires_grad=True)
"""

However, there are no gradients yet: tensor.grad is still None.


Now, let’s create a new tensor equal to the mean of the elements of the previous tensor, so that we can compute the gradients of the new tensor with respect to the original one.

mean_tensor = tensor.mean()
"""tensor(3.5000, grad_fn=<MeanBackward0>)"""

There are still no gradients: tensor.grad remains None until we run a backward pass.


To calculate the gradients, we need to explicitly perform a backward propagation by calling the backward() method on the new tensor:

mean_tensor.backward()

Now the tensor has gradients.

print(tensor.grad)
"""
tensor([[0.1667, 0.1667, 0.1667],
        [0.1667, 0.1667, 0.1667]])
"""
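The value 0.1667 is 1/6: the mean of six elements changes by one sixth of any change in a single element. The short sketch below checks this and also shows that gradients accumulate across backward() calls unless they are reset. The variable names first_grad and accumulated_grad are only illustrative.

```python
import torch

tensor = torch.tensor([[1., 2., 3.],
                       [4., 5., 6.]], requires_grad=True)

tensor.mean().backward()
first_grad = tensor.grad.clone()        # every entry is 1/6

# A second backward pass adds to the stored gradients
tensor.mean().backward()
accumulated_grad = tensor.grad.clone()  # every entry is now 2/6

# Reset before the next computation
tensor.grad.zero_()

print(first_grad, accumulated_grad, tensor.grad)
```

This accumulation is the reason training loops usually clear the gradients before each backward pass.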

Neural Networks with PyTorch

We can define our Neural Network as a Python class which extends the torch.nn.Module class. In this class we have to define 2 fundamental methods:

  • __init__() is the constructor of the class. Here, we have to define the layers that will compose our network.
  • forward() is where we define the structure of the network and how the layers are connected. This function takes an input that represents the features the model will be trained on.
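Before the full model, here is a minimal sketch of these two methods on a tiny fully-connected network. The class name TinyNet and the layer sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Layers are created once, in the constructor
        self.hidden = nn.Linear(4, 8)
        self.output = nn.Linear(8, 2)

    def forward(self, x):
        # How the layers are connected
        x = F.relu(self.hidden(x))
        return self.output(x)

net = TinyNet()
batch = torch.rand(5, 4)      # 5 samples with 4 features each
scores = net(batch)           # calling the module runs forward()
n_params = sum(p.numel() for p in net.parameters())

print(scores.shape, n_params)
```

Note that forward() is not called directly: calling the module object itself runs it, and nn.Module registers the layers defined in the constructor so that parameters() can find them.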

I will show you how to build a simple Convolutional Neural Network that you can use in classification problems. I will train it on the MNIST dataset.

First, we have to import torch and all the modules we need.

import torch
from torch import nn
import torch.nn.functional as F
import numpy as np

Now, we can create our model.

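class My_CNN(nn.Module):
    def __init__(self):
        super(My_CNN, self).__init__()
        # 3x3 convolutions; padding=1 keeps the 28x28 spatial size
        self.conv1 = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=1)
        self.conv2 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)
        # averaging over the whole 28x28 map gives one value per channel
        self.avg_pool = nn.AvgPool2d(28)
        self.fc1 = nn.Linear(64, 64)
        self.fc2 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.avg_pool(x)
        # flatten (batch, 64, 1, 1) into (batch, 64)
        x = x.view(-1, 64)
        x = F.relu(self.fc1(x))
        # raw class scores (logits), one per digit
        x = self.fc2(x)
        return x

Our CNN is composed of 2 convolutional layers, followed by a global average pooling layer. At the end, we have 2 fully-connected layers. The network returns raw class scores (logits) rather than probabilities, because nn.CrossEntropyLoss, the loss used for training, applies the softmax internally. If you need the final output probabilities, apply F.softmax(outputs, dim=1) to the model outputs.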
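A quick way to check such an architecture is to push a dummy batch through the layers and record the shapes. The sketch below rebuilds the convolution and pooling layers with the same settings as in the model above; the dummy batch of zeros is only an assumption for illustration.

```python
import torch
from torch import nn

# Same layer settings as in the model above
conv1 = nn.Conv2d(1, 64, kernel_size=(3, 3), padding=1)
conv2 = nn.Conv2d(64, 64, kernel_size=(3, 3), padding=1)
avg_pool = nn.AvgPool2d(28)

x = torch.zeros(2, 1, 28, 28)   # dummy batch: 2 grayscale 28x28 images
shapes = [tuple(x.shape)]
for layer in (conv1, conv2, avg_pool):
    x = layer(x)
    shapes.append(tuple(x.shape))

x = x.view(-1, 64)              # flatten before the fully-connected layers
shapes.append(tuple(x.shape))

print(shapes)
```

The padding keeps the feature maps at 28x28, so pooling with a 28x28 window reduces each of the 64 channels to a single value, which is why the first fully-connected layer expects 64 inputs.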

We can retrieve the MNIST dataset directly from torchvision and split it into a training set and a validation set using PyTorch utilities.

from torchvision import transforms
from torchvision.datasets import MNIST
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# ToTensor converts the PIL images into tensors the model can consume
mnist = MNIST("data", download=True, train=True,
              transform=transforms.ToTensor())

## create training and validation split
split = int(0.8 * len(mnist))
index_list = list(range(len(mnist)))
train_idx, valid_idx = index_list[:split], index_list[split:]

## create sampler objects using SubsetRandomSampler
train = SubsetRandomSampler(train_idx)
valid = SubsetRandomSampler(valid_idx)

We can create iterator objects using DataLoader which provides the ability to batch, shuffle and load the data in parallel using multiprocessing workers.

train_loader = DataLoader(mnist, batch_size=256, sampler=train)
valid_loader = DataLoader(mnist, batch_size=256, sampler=valid)
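To see what a DataLoader yields without downloading anything, the sketch below uses a synthetic stand-in for MNIST. The random images and labels, the dataset size of 100 and the batch size of 32 are all assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Synthetic stand-in for MNIST: 100 random "images" with random labels
images = torch.rand(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# Draw (in random order) only from the first 80 samples
sampler = SubsetRandomSampler(list(range(80)))
loader = DataLoader(dataset, batch_size=32, sampler=sampler)

batch_shapes = [(tuple(data.shape), tuple(target.shape))
                for data, target in loader]

print(len(batch_shapes), batch_shapes[-1])
```

Each iteration returns a (data, target) pair, and the last batch is smaller when the number of samples is not a multiple of the batch size.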

Now we have all the elements to start training our model.

Adam will be used as the optimizer and cross entropy as the loss function.

model = My_CNN()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_function = nn.CrossEntropyLoss()

A PyTorch training loop typically goes through each epoch and, within it, each batch of the training DataLoader object.

epochs = 10

for epoch in range(epochs):
    train_loss, valid_loss = [], []

    for data, target in train_loader:
        # reset the gradients accumulated in the previous step
        optimizer.zero_grad()
        # forward propagation
        outputs = model(data)
        # loss calculation
        loss = loss_function(outputs, target)
        # backward propagation
        loss.backward()
        # weights optimization
        optimizer.step()
        train_loss.append(loss.item())

    # validation: no gradients are needed here
    with torch.no_grad():
        for data, target in valid_loader:
            outputs = model(data)
            loss = loss_function(outputs, target)
            valid_loss.append(loss.item())

    print('Epoch: {}, training loss: {}, validation loss: {}'
          .format(epoch, np.mean(train_loss), np.mean(valid_loss)))
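The same sequence of steps can be tried in a few seconds on a toy problem. This sketch deliberately swaps in a different setup from the one above: a single linear layer, SGD and mean squared error, fitted to synthetic data following y = 2x + 1. All of these choices are assumptions made for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Synthetic data: y = 2x + 1
inputs = torch.linspace(-1, 1, 64).unsqueeze(1)
targets = 2 * inputs + 1

model = nn.Linear(1, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = nn.MSELoss()

losses = []
for step in range(200):
    optimizer.zero_grad()                          # clear old gradients
    loss = loss_function(model(inputs), targets)   # forward pass + loss
    loss.backward()                                # backward propagation
    optimizer.step()                               # weights optimization
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The loss shrinks as the learned weight and bias approach 2 and 1, and the four calls inside the loop are the same ones any PyTorch training loop relies on.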

Once the model is trained, we can get the predictions on the validation data.

In the validation phase, we loop over the data in the validation set as we did in the training phase. The difference is that we do not need to backpropagate the gradients.

with torch.no_grad():
    correct = 0
    total = 0
    for data, target in valid_loader:
        outputs = model(data)
        # the index of the highest score is the predicted class
        _, predicted = torch.max(outputs, 1)
        total += target.size(0)
        correct += (predicted == target).sum().item()

print('Validation set Accuracy: {} %'.format(100 * correct / total))
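The accuracy computation can be followed by hand on a tiny example. The scores and labels below are hypothetical values for 4 samples and 3 classes, not real model outputs.

```python
import torch

# Hypothetical model outputs for 4 samples and 3 classes
outputs = torch.tensor([[0.1, 0.7, 0.2],
                        [0.8, 0.1, 0.1],
                        [0.2, 0.2, 0.6],
                        [0.3, 0.4, 0.3]])
target = torch.tensor([1, 0, 2, 0])

# torch.max along dim 1 returns (max values, indices of the max)
_, predicted = torch.max(outputs, 1)

correct = (predicted == target).sum().item()
accuracy = 100 * correct / target.size(0)

print(predicted.tolist(), accuracy)   # [1, 0, 2, 1] 75.0
```

Three of the four predictions match their labels, so the accuracy is 75 %.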

That’s it! Now you are ready to build your own Neural Network. You can try to achieve better performance by increasing the model’s complexity, for example by adding more layers to the network.

I hope you liked this post. If you want to read other interesting content written by my colleagues at Quantyca, follow us on Medium and LinkedIn.


Quantyca — Data at Core
