A Beginner's Guide to PyTorch

Ryan D'Cunha
5 min read · May 15, 2024


“Theory will only take you so far”. Not sure if Oppenheimer ever intended the quote to apply to Deep Learning, but it fits perfectly!

While the theory of Neural Networks is mathematically stunning, there comes a time to apply that theoretical knowledge and actually build a NN. Thankfully, the process of defining a model architecture has been simplified by the tech giants Google and Meta with TensorFlow and PyTorch respectively. It’s a great learning exercise to build a NN from scratch, but for practical Deep Learning, these open-source libraries are elite.

While this is a guide to PyTorch, I’ll first touch on the differences between TensorFlow and PyTorch. TensorFlow is well suited to large-scale model deployment and visualization thanks to its built-in tooling, and it traditionally uses static computation graphs. PyTorch, on the other hand, is for Python enthusiasts like myself, as it feels more like idiomatic Python. It is easier to build small projects and research models with, though historically harder to scale, deploy, and visualize. PyTorch uses dynamic computation graphs and can be coupled with CUDA for GPU training.
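
To make the dynamic-graph point concrete, here is a minimal sketch (my own example) showing that the graph is built as the code runs, so ordinary Python control flow works inside the computation:

import torch

x = torch.tensor(3.0, requires_grad=True)

# The graph is built on the fly, so a plain Python if-statement is fine
if x > 2:
    y = x ** 2
else:
    y = x ** 3

y.backward()   # autograd traces whichever branch actually executed
print(x.grad)  # tensor(6.) since dy/dx = 2x for the branch taken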

If you have heard of Keras, it is a high-level NN API in Python that runs on top of TensorFlow (you may have seen tf.keras in NN code).

On to the basics of PyTorch!

What is a Tensor?

The fundamental data structure in all of Deep Learning is the tensor. Think of it as a multidimensional matrix (a true mathematical definition can be found here). A 0-D tensor is a scalar, a 1-D tensor is a vector, a 2-D tensor is a matrix, and so on. A PyTorch tensor is similar to a NumPy ndarray.
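
A quick way to see these dimensions in code (a minimal sketch):

import torch

scalar = torch.tensor(7)                  # 0-D tensor
vector = torch.tensor([1, 2, 3])          # 1-D tensor
matrix = torch.tensor([[1, 2], [3, 4]])   # 2-D tensor

print(scalar.ndim, vector.ndim, matrix.ndim)  # 0 1 2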

Visualization of a Tensor (Source: https://medium.com/@jayeshjain_246/what-are-tensors-495cf37c18e6)

Initializing a Tensor

A PyTorch tensor has 3 key attributes: shape (the size of the tensor), dtype (the type of data it holds), and device (where the tensor is stored). An example of initializing a tensor and checking these attributes is shown:

import torch

x = torch.tensor([1, 3, 5])
print(x.shape) #torch.Size([3])
print(x.dtype) #torch.int64
print(x.device) #cpu

We can create a tensor from a multidimensional array (from_numpy), from the size of an existing tensor, or with any given size (randomly initialized, zeros, or ones, just like in NumPy). Most NumPy-style operations have torch equivalents (like arange).

import numpy as np

# tensor from NumPy array
array = np.array([[1, 3, 5], [2, 4, 6]])
new_tensor = torch.from_numpy(array) #tensor([[1,3,5], [2,4,6]])
# random numbers based on the size of the previous tensor
new_tensor2 = torch.rand_like(new_tensor, dtype=torch.float)
# random numbers of a given size
new_tensor3 = torch.rand((2,2))
# tensor of a given shape initialized with zeros (same idea for ones)
new_tensor4 = torch.zeros((2,2))

When using random initialization, it is possible to control the randomness (that is, to get the same random outcome every time) by setting a seed. This is essential for reproducibility when building a NN! Not sure why everyone uses 42, but it is the general convention.

torch.manual_seed(42)
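
For example, re-seeding before each call reproduces the same “random” values (a minimal sketch):

torch.manual_seed(42)
a = torch.rand(2, 2)

torch.manual_seed(42)
b = torch.rand(2, 2)

print(torch.equal(a, b))  # True - identical "random" tensors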

Accessing an element of a tensor as a plain Python scalar requires .item() (but make sure you are only accessing a single element or PyTorch will throw an error):

scalar_tensor = torch.tensor(22)
print(scalar_tensor.item()) # prints 22

Basic Tensor Operations

Tensors can easily be indexed, transposed, added/subtracted, and multiplied/divided. Because these operations mirror their NumPy ndarray counterparts, I won’t walk through each one. Be sure to use the built-in torch functions instead of iterating through elements and performing operations yourself (for example, for matrix multiplication), as the torch functions are already optimized. And beware: one of the most common PyTorch errors is a shape mismatch when trying to add or multiply tensors.
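
Still, here is a minimal sketch of those last two points — use the built-in operations and keep an eye on shapes:

import torch

a = torch.rand(2, 3)
b = torch.rand(3, 4)

c = torch.matmul(a, b)   # built-in matrix multiply, result shape (2, 4)
print(c.shape)           # torch.Size([2, 4])

# a + b                  # would raise a RuntimeError: shapes (2, 3) and (3, 4) don't line up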

I mentioned CUDA earlier (a way to speed up training on a GPU). Just a quick side note on how one would move tensors to the GPU:

# Assume tensor1 and tensor2 are torch tensors
device = "cuda" if torch.cuda.is_available() else "cpu"

# Move tensor to GPU (if available)
tensor1_on_gpu = tensor1.to(device)
tensor2_on_gpu = tensor2.to(device)

# Perform operation on tensors if desired

PyTorch Neural Network

Now that we have covered the basics of PyTorch, let’s explore some more functionality by building a basic, generic PyTorch NN. The layers come from the torch.nn module, and our model will inherit from nn.Module.

Note: This assumes a basic understanding of Neural Networks

The first important PyTorch class is DataLoader, which iterates over the dataset in mini-batches instead of one sample at a time and can shuffle the data each epoch (optional, but you should for training).

import torch
from torch.utils.data import DataLoader
from torch import nn
import torch.optim as optim

# Assume training_data and test_data already loaded in
new_train = DataLoader(training_data, batch_size=64, shuffle=True)
new_test = DataLoader(test_data, batch_size=64, shuffle=False) # no need to shuffle evaluation data

Now we define a Neural Network class for our model architecture using inheritance with nn.Module:

class SimpleNN(nn.Module):
    def __init__(self):
        super(SimpleNN, self).__init__()
        self.layer1 = nn.Linear(16, 32)
        self.layer2 = nn.Linear(32, 1)

    def forward(self, x):
        x = torch.relu(self.layer1(x))
        x = torch.sigmoid(self.layer2(x))
        return x

This 2-layer NN consists of an input layer with 16 features feeding 32 neurons in the hidden layer and 1 neuron in the output layer (since this simple example could be a binary classifier). The hidden layer uses a ReLU activation function while the output uses a sigmoid (to scale the output between 0 and 1). Now we can initialize the model and define the loss and optimizer:

model = SimpleNN()
criterion = nn.BCELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

We are using Binary Cross Entropy Loss and the Adam optimizer for our network (two of the more common choices for a binary classifier). The learning rate is set to 0.01, but of course that is a hyperparameter you would tune. Finally, to train the model (for an arbitrary 100 epochs):

num_epochs = 100

for epoch in range(num_epochs):
    for inputs, labels in new_train:  # iterate over mini-batches from the DataLoader
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

outputs holds the result of the forward pass, loss holds the loss value for the batch, and the final three calls clear the old gradients, backpropagate, and update the weights.
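
To see how the trained model performs, one could run the test DataLoader created earlier through the model in evaluation mode. This is a minimal sketch; the 0.5 threshold and accuracy metric are my own assumptions for a binary classifier:

model.eval()           # switch off training-specific behavior (e.g. dropout)
correct, total = 0, 0

with torch.no_grad():  # no gradients needed for evaluation
    for inputs, labels in new_test:
        outputs = model(inputs)
        preds = (outputs > 0.5).float()  # sigmoid output -> class 0 or 1
        # assumes labels are floats shaped like the model output, e.g. (batch, 1)
        correct += (preds == labels).sum().item()
        total += labels.numel()

print(f"Test accuracy: {correct / total:.3f}")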

That was a super basic PyTorch NN. A more advanced one would include data preprocessing, more complex layers, and regularization for better results. With PyTorch, any kind of Deep Learning model can be implemented (and PyTorch even lets you create custom functions to adapt different models). Hopefully you enjoyed this guide, and good luck using PyTorch!

