Deep learning fundamentals using PyTorch

Avinash
12 min read · Mar 4, 2024


Let’s delve into the fundamentals of deep learning using PyTorch, starting with the foundational building block: the perceptron.

A perceptron is a type of artificial neuron that forms the basic unit of a neural network. It takes multiple input values, applies weights to them, sums them up, and then passes the result through an activation function to produce an output. Here’s a step-by-step breakdown of how a perceptron works:

Inputs: A perceptron receives multiple input values, denoted as x₁, x₂, …, xₙ. Each input is associated with a weight, which determines its importance in the computation.

Weights: Each input value is multiplied by a corresponding weight w₁, w₂, …, wₙ. These weights represent the strength of the connection between the inputs and the perceptron.

Summation: The weighted inputs are summed together with an additional bias term b, which allows the perceptron to learn a threshold for activation. The sum can be expressed as:

z = ∑ᵢ₌₁ⁿ (xᵢ × wᵢ) + b

Activation Function: The sum z is then passed through an activation function, which introduces non-linearity into the output of the perceptron. Common activation functions include the sigmoid, tanh, and ReLU (Rectified Linear Unit) functions.

Output: The output y of the perceptron is the result of applying the activation function to the sum z. It represents the perceptron’s decision or prediction based on the input data.
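For example, with inputs x₁ = 0.5 and x₂ = 0.2, weights w₁ = 0.4 and w₂ = −0.3, and bias b = 0.1 (values chosen purely for illustration), the weighted sum is z = 0.5 × 0.4 + 0.2 × (−0.3) + 0.1 = 0.24, and a sigmoid activation yields y = σ(0.24) ≈ 0.56.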

In PyTorch, you can implement a perceptron using the torch.nn module. Here's a simple example of how to create a perceptron with two input values and a sigmoid activation function:

import torch
import torch.nn as nn

class Perceptron(nn.Module):
    def __init__(self):
        super(Perceptron, self).__init__()
        self.linear = nn.Linear(2, 1)

    def forward(self, x):
        return torch.sigmoid(self.linear(x))

# Create an instance of the perceptron
perceptron = Perceptron()

# Example input
x = torch.tensor([[0.5, 0.2]])

# Get the output from the perceptron
output = perceptron(x)
print("Output:", output)

In this example:

  • We define a Perceptron class that inherits from nn.Module, the base class for all neural network modules in PyTorch.
  • In the constructor (__init__), we create a linear layer (nn.Linear) with two input features and one output feature. This defines the weights and biases of the perceptron.
  • In the forward method, we apply the sigmoid activation function to the output of the linear layer.
  • We create an instance of the perceptron and pass an example input tensor x through it to get the output; the sketch below verifies this output by hand.
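You can read the randomly initialized weight and bias out of the linear layer and reproduce the forward pass manually. This is a minimal sketch reusing the perceptron and x defined above:

# Inspect the randomly initialized parameters of the linear layer
W = perceptron.linear.weight  # shape (1, 2)
b = perceptron.linear.bias    # shape (1,)
print("Weight:", W)
print("Bias:", b)

# Reproduce the forward pass manually: z = x · Wᵀ + b, then apply the sigmoid
z = x @ W.t() + b
print("Manual output:", torch.sigmoid(z))  # matches perceptron(x)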

Understanding the perceptron is crucial as it forms the foundation for more complex neural network architectures used in deep learning. From here, you can explore more advanced concepts such as multi-layer perceptrons (MLPs), convolutional neural networks (CNNs), and recurrent neural networks (RNNs) to build powerful deep learning models for various tasks.

Now, what is an MLP?

Multi-Layer Perceptron (MLP) is an artificial neural network comprising multiple layers of interconnected nodes. Each layer is responsible for learning and identifying progressively intricate features within the input data.

# Load MNIST using sklearn.datasets.fetch_openml

import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import fetch_openml

# Load data from https://www.openml.org/d/554
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, parser="auto")

# Split into train and test
X_train, X_test = X[:60000], X[60000:]
y_train, y_test = y[:60000], y[60000:]
print(f"Training set size: {len(X_train)}")
print(f"Test set size: {len(X_test)}")

# Convert to numpy arrays and scale for the model
X_train = np.array(X_train) / 255
X_test = np.array(X_test) / 255
y_train = np.array(y_train, dtype=np.int8)
y_test = np.array(y_test, dtype=np.int8)

# Show the first 3 images
plt.figure(figsize=(20, 4))
for index, (image, label) in enumerate(zip(X_train[0:3], y_train[0:3])):
    plt.subplot(1, 3, index + 1)
    plt.imshow(np.reshape(image, (28, 28)), cmap=plt.cm.gray)
    plt.title("Label: %s\n" % label, fontsize=20)
# Train an MLP classifier using sklearn.neural_network.MLPClassifier

from sklearn.neural_network import MLPClassifier

# Create an MLPClassifier object
mlp = MLPClassifier(
    hidden_layer_sizes=(50,),
    max_iter=10,
    alpha=1e-4,
    solver="sgd",
    verbose=10,
    random_state=1,
    learning_rate_init=0.1,
)


# Train the MLPClassifier
mlp.fit(X_train, y_train)
# Show the accuracy on the training and test sets

print(f"Training set score: {mlp.score(X_train, y_train)}")
print(f"Test set score: {mlp.score(X_test, y_test)}")
# Show the images, predictions, and original labels for 10 images

# Get the predictions for the test dataset
predictions = mlp.predict(X_test)

# Show the predictions in a grid
plt.figure(figsize=(8, 4))

for index, (image, prediction, label) in enumerate(
    zip(X_test[0:10], predictions[0:10], y_test[0:10])
):
    plt.subplot(2, 5, index + 1)
    plt.imshow(np.reshape(image, (28, 28)), cmap=plt.cm.gray)

    # Green if correct, red if incorrect
    fontcolor = "g" if prediction == label else "r"
    plt.title(
        "Prediction: %i\n Label: %i" % (prediction, label), fontsize=10, color=fontcolor
    )

    plt.axis("off")  # hide axes

What is PyTorch?

PyTorch stands as a dynamic and potent framework for constructing and training machine learning models. It streamlines the workflow with its core components such as tensors and neural networks, while also providing robust mechanisms for defining objectives and enhancing models through loss functions and optimizers. Harnessing PyTorch empowers individuals to proficiently handle extensive datasets and innovate in the realm of AI applications.

PyTorch Tensors:

import torch

# Create a 3-dimensional tensor
images = torch.rand((4, 28, 28))

# Get the first image
image = images[0]

# A small 2x2 integer matrix to demonstrate matrix powers
a = torch.tensor([[1, 1], [1, 0]])

print(a)
# tensor([[1, 1],
#         [1, 0]])

print(torch.matrix_power(a, 2))
# tensor([[2, 1],
#         [1, 1]])

print(torch.matrix_power(a, 3))
# tensor([[3, 2],
#         [2, 1]])

print(torch.matrix_power(a, 4))
# tensor([[5, 3],
#         [3, 2]])

Let’s see in more detail how to create tensors:

#import modules
import torch
import math
x = torch.empty(3, 4)
print(type(x))
print(x)

Output:

<class 'torch.Tensor'>
tensor([[0., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]])

The type of the object returned is torch.Tensor, which is an alias for torch.FloatTensor; by default, PyTorch tensors are populated with 32-bit floating point numbers.
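You can confirm the default data type directly with the tensor’s dtype attribute (a quick check on the x created above):

print(x.dtype)  # torch.float32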

A brief note about tensors and their number of dimensions, and terminology:

  • You will sometimes see a 1-dimensional tensor called a vector.
  • Likewise, a 2-dimensional tensor is often referred to as a matrix.
  • Anything with more than two dimensions is generally just called a tensor, as the quick check below illustrates.
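Here is a small illustrative sketch using dim() and shape:

vector = torch.tensor([1., 2., 3.])          # 1-dimensional: a vector
matrix = torch.tensor([[1., 2.], [3., 4.]])  # 2-dimensional: a matrix
cube = torch.zeros(2, 3, 4)                  # 3-dimensional: just a tensor

print(vector.dim(), vector.shape)  # 1 torch.Size([3])
print(matrix.dim(), matrix.shape)  # 2 torch.Size([2, 2])
print(cube.dim(), cube.shape)      # 3 torch.Size([2, 3, 4])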

More often than not, you’ll want to initialize your tensor with some value. Common cases are all zeros, all ones, or random values, and the torch module provides factory methods for all of these:

zeros = torch.zeros(2, 3)
print(zeros)

ones = torch.ones(2, 3)
print(ones)

torch.manual_seed(1729)
random = torch.rand(2, 3)
print(random)

Random Tensors and Seeding

Speaking of the random tensor, did you notice the call to torch.manual_seed() immediately preceding it? Initializing tensors, such as a model’s learning weights, with random values is common but there are times - especially in research settings - where you’ll want some assurance of the reproducibility of your results. Manually setting your random number generator’s seed is the way to do this. Let’s look more closely:

torch.manual_seed(1729)
random1 = torch.rand(2, 3)
print(random1)

random2 = torch.rand(2, 3)
print(random2)

torch.manual_seed(1729)
random3 = torch.rand(2, 3)
print(random3)

random4 = torch.rand(2, 3)
print(random4)

tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162],
        [0.9927, 0.4128, 0.5938]])
tensor([[0.3126, 0.3791, 0.3087],
        [0.0736, 0.4216, 0.0691]])
tensor([[0.2332, 0.4047, 0.2162],
        [0.9927, 0.4128, 0.5938]])

Often, when you’re performing operations on two or more tensors, they will need to be of the same shape — that is, having the same number of dimensions and the same number of cells in each dimension. For that, we have the torch.*_like() methods:

x = torch.empty(2, 2, 3)
print(x.shape)
print(x)

empty_like_x = torch.empty_like(x)
print(empty_like_x.shape)
print(empty_like_x)

zeros_like_x = torch.zeros_like(x)
print(zeros_like_x.shape)
print(zeros_like_x)

ones_like_x = torch.ones_like(x)
print(ones_like_x.shape)
print(ones_like_x)

rand_like_x = torch.rand_like(x)
print(rand_like_x.shape)
print(rand_like_x)

torch.Size([2, 2, 3])
tensor([[[0.0000e+00, 0.0000e+00, 9.4877e-20],
         [0.0000e+00, 9.4877e-20, 0.0000e+00]],

        [[0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00]]])
torch.Size([2, 2, 3])
tensor([[[0.0000e+00, 0.0000e+00, 8.8442e-24],
         [0.0000e+00, 8.8442e-24, 0.0000e+00]],

        [[0.0000e+00, 0.0000e+00, 0.0000e+00],
         [0.0000e+00, 0.0000e+00, 0.0000e+00]]])
torch.Size([2, 2, 3])
tensor([[[0., 0., 0.],
         [0., 0., 0.]],

        [[0., 0., 0.],
         [0., 0., 0.]]])
torch.Size([2, 2, 3])
tensor([[[1., 1., 1.],
         [1., 1., 1.]],

        [[1., 1., 1.],
         [1., 1., 1.]]])
torch.Size([2, 2, 3])
tensor([[[0.6128, 0.1519, 0.0453],
         [0.5035, 0.9978, 0.3884]],

        [[0.6929, 0.1703, 0.1384],
         [0.4759, 0.7481, 0.0361]]])

The last way to create a tensor that we will cover is to specify its data directly from a Python collection:

some_constants = torch.tensor([[3.1415926, 2.71828], [1.61803, 0.0072897]])
print(some_constants)

some_integers = torch.tensor((2, 3, 5, 7, 11, 13, 17, 19))
print(some_integers)

more_integers = torch.tensor(((2, 4, 6), [3, 6, 9]))
print(more_integers)

tensor([[3.1416, 2.7183],
        [1.6180, 0.0073]])
tensor([ 2,  3,  5,  7, 11, 13, 17, 19])
tensor([[2, 4, 6],
        [3, 6, 9]])

torch.tensor() creates a copy of the data.
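A small sketch of what “copy” means in practice: after calling torch.tensor() on a NumPy array, changing the array does not affect the tensor, whereas torch.from_numpy() shares memory with the array:

import numpy as np

source = np.array([1.0, 2.0, 3.0])
copied = torch.tensor(source)      # copies the data
shared = torch.from_numpy(source)  # shares memory with the source array

source[0] = 99.0
print(copied)  # still [1., 2., 3.] -- the copy is unaffected
print(shared)  # now starts with 99. -- the shared tensor sees the change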

Tensor Data Types

a = torch.ones((2, 3), dtype=torch.int16)
print(a)

b = torch.rand((2, 3), dtype=torch.float64) * 20.
print(b)

c = b.to(torch.int32)
print(c)

tensor([[1, 1, 1],
        [1, 1, 1]], dtype=torch.int16)
tensor([[ 0.9956,  1.4148,  5.8364],
        [11.2406, 11.2083, 11.6692]], dtype=torch.float64)
tensor([[ 0,  1,  5],
        [11, 11, 11]], dtype=torch.int32)

Available data types include:

  • torch.bool
  • torch.int8
  • torch.uint8
  • torch.int16
  • torch.int32
  • torch.int64
  • torch.half
  • torch.float
  • torch.double
  • torch.bfloat16

Math & Logic with PyTorch Tensors

ones = torch.zeros(2, 2) + 1
twos = torch.ones(2, 2) * 2
threes = (torch.ones(2, 2) * 7 - 1) / 2
fours = twos ** 2
sqrt2s = twos ** 0.5

print(ones)
print(twos)
print(threes)
print(fours)
print(sqrt2s)

tensor([[1., 1.],
        [1., 1.]])
tensor([[2., 2.],
        [2., 2.]])
tensor([[3., 3.],
        [3., 3.]])
tensor([[4., 4.],
        [4., 4.]])
tensor([[1.4142, 1.4142],
        [1.4142, 1.4142]])

For a deeper treatment of tensors, see the official PyTorch tensor documentation.

Now, let’s look at PyTorch’s nn module.

Let’s create a simple MLP:

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, input_size):
        super(MLP, self).__init__()
        self.hidden_layer = nn.Linear(input_size, 64)
        self.output_layer = nn.Linear(64, 2)
        self.activation = nn.ReLU()

    def forward(self, x):
        x = self.activation(self.hidden_layer(x))
        return self.output_layer(x)

model = MLP(input_size=10)
print(model)

nn.Linear:
Let’s understand nn.Linear in detail.

>>> m = nn.Linear(20, 30)
>>> input = torch.randn(128, 20)
>>> output = m(input)
>>> print(output.size())
torch.Size([128, 30])

Define a Linear Layer:

  • nn.Linear(20, 30) creates a linear transformation layer (also known as a fully connected or dense layer) with an input size of 20 and an output size of 30. This means that the layer will perform a linear transformation of input tensors with 20 features to output tensors with 30 features.

Generate Input Tensor:

torch.randn(128, 20) generates a random input tensor of shape (128, 20). Here, 128 represents the batch size (number of samples) and 20 represents the number of input features.

Pass Input Through Linear Layer:

output = m(input) passes the input tensor through the linear layer m, applying the linear transformation to each sample in the batch. The output tensor will have a shape determined by the batch size and the output size of the linear layer, which in this case is (128, 30).

Print Output Size:

print(output.size()) prints the size of the output tensor, which is (128, 30). This indicates that the output tensor has a batch size of 128 and each sample in the batch has 30 features after passing through the linear layer.

In summary, the code snippet demonstrates how to create and use a linear layer in PyTorch to perform a linear transformation on input tensors. The linear layer’s parameters are automatically initialized during its creation, and it can be used to efficiently transform input data in neural network architectures.

The output size (128, 30) corresponds to the shape of the output tensor resulting from passing the input tensor through the linear layer. Let's break it down:

  • Input Tensor Shape: The input tensor has a shape of (128, 20). This means it has a batch size of 128 (128 samples) and each sample has 20 features.
  • Linear Layer: The linear layer defined as nn.Linear(20, 30) specifies that it will transform input tensors with 20 features into output tensors with 30 features.
  • Output Tensor Shape: When we pass the input tensor through the linear layer (output = m(input)), each sample in the batch undergoes a linear transformation, resulting in an output tensor with the same batch size but with each sample having 30 features instead of 20.

So, the output tensor shape becomes (128, 30), indicating that we have a batch size of 128 and each sample in the batch now has 30 features after passing through the linear layer.
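Under the hood, the layer simply computes x · Wᵀ + b, where W has shape (30, 20) and b has shape (30,). A short sketch (reusing m, input, and output from the snippet above) confirms that a manual matrix multiplication with the layer’s weight and bias reproduces the same result:

# The layer's parameters are created and initialized automatically
print(m.weight.shape)  # torch.Size([30, 20])
print(m.bias.shape)    # torch.Size([30])

# Manually apply the same transformation: x · Wᵀ + b
manual = input @ m.weight.t() + m.bias
print(torch.allclose(manual, output))  # True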

Now let’s move on to the next section: PyTorch loss functions.

Loss functions assess a model’s effectiveness by measuring the discrepancy between its predictions and the actual outcomes.

Cross-Entropy Loss: Primarily used in classification tasks, like discerning between different categories such as identifying whether an image portrays a cat or a dog. It evaluates the concordance between the model’s predictions and the true labels.

Mean Squared Error: This method computes the average of the squared differences between predicted values, such as forecasted prices, and the observed values. Commonly utilized in regression tasks, it is adept at predicting continuous values rather than discrete categories.

import torch
import torch.nn as nn

# Cross-entropy loss: compares raw class scores (logits) with an integer class label
loss_function = nn.CrossEntropyLoss()

predicted_scores = torch.tensor([[2.0, 5.0, 1.0]])  # illustrative scores for 3 classes, batch of 1
target_tensor = torch.tensor([1])                   # the index of the correct class
loss_value = loss_function(predicted_scores, target_tensor)
print(loss_value.item())

# Mean squared error: compares continuous predictions with continuous targets
loss_function = nn.MSELoss()

predicted_tensor = torch.tensor([320000.0])
actual_tensor = torch.tensor([300000.0])

loss_value = loss_function(predicted_tensor, actual_tensor)
print(loss_value.item())

PyTorch optimizers play a crucial role in enhancing how a neural network learns from data by tweaking the model’s parameters. By utilizing optimizers like stochastic gradient descent (SGD) with momentum or Adam, we can efficiently kickstart the learning process!

Gradients: These indicate the direction and magnitude in which a function increases the most. Adjusting the model’s parameters in the opposite direction of the gradient of the loss function helps minimize the loss.

Learning Rate: Think of this as a step size during training that determines how big the adjustments to the neural network’s settings should be. If it’s too large, you might overshoot the optimal settings; if it’s too small, it’ll take a longer time to reach the optimal solution.

Momentum: Momentum is a method that aids in accelerating the optimizer in the correct direction while also reducing oscillations, helping to smooth out the learning process.

import torch.optim as optim

# Stochastic gradient descent with momentum
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# Or, alternatively, the Adam optimizer
optimizer = optim.Adam(model.parameters(), lr=0.01)
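To see how these pieces fit together, here is a minimal, self-contained sketch of a single optimization step on a toy linear model (the data and model here are made up purely for illustration):

import torch
import torch.nn as nn
import torch.optim as optim

# Toy data: we want the model to learn y = 2x
x = torch.tensor([[1.0], [2.0], [3.0]])
y = torch.tensor([[2.0], [4.0], [6.0]])

toy_model = nn.Linear(1, 1)
toy_optimizer = optim.SGD(toy_model.parameters(), lr=0.01, momentum=0.9)
mse = nn.MSELoss()

toy_optimizer.zero_grad()    # clear any previously accumulated gradients
loss = mse(toy_model(x), y)  # forward pass and loss computation
loss.backward()              # compute gradients of the loss w.r.t. the parameters
toy_optimizer.step()         # move the parameters a small step against the gradient
print("Loss for this step:", loss.item())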

Datasets and DataLoaders

Dataset:

A dataset in PyTorch is like a container that holds your data in a structured manner. It’s basically a collection of data that you want to use to train or test your machine learning model. Each piece of data in the dataset is usually paired with its corresponding label (if it’s a supervised learning task), so the model can learn from it.

For example, if you’re working on an image classification task, your dataset would contain images of different objects along with their labels (e.g., “cat,” “dog,” “car”).

Data Loader:

A data loader in PyTorch is like a conveyor belt that helps you efficiently load and process your data during training or testing. It takes the dataset and handles tasks like batching, shuffling, and parallelization, making it easier to feed the data into your model.

  • Batching: Instead of feeding your entire dataset into the model at once, which can be slow and inefficient, the data loader divides it into smaller batches. Each batch contains a subset of your data, and the model processes one batch at a time. This speeds up training and allows the model to learn from multiple examples simultaneously.
  • Shuffling: Shuffling means randomly rearranging the order of your data. This helps prevent the model from memorizing patterns in the data and improves its ability to generalize to unseen examples. The data loader automatically shuffles your dataset or batches before feeding them into the model.
  • Parallelization: Data loaders can take advantage of multiple processors or GPUs to load and process your data in parallel. This further speeds up training, especially for large datasets, by distributing the workload across multiple computing units.

In summary, datasets hold your data, while data loaders handle the logistics of loading and processing it efficiently, making it easier for your model to learn from the data and improve its performance.

import torch
from torch.utils.data import Dataset, DataLoader

# Step 1: Define a custom dataset
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Step 2: Create some sample data
data = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]])
labels = torch.tensor([0, 1, 0, 1])

# Step 3: Instantiate the custom dataset
dataset = CustomDataset(data, labels)

# Step 4: Create a data loader
batch_size = 2
shuffle = True
num_workers = 2
loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)

# Step 5: Iterate through the data loader
for batch_data, batch_labels in loader:
    print("Batch Data:", batch_data)
    print("Batch Labels:", batch_labels)
    print("Batch Size:", len(batch_data))

Let’s see how to use a dataset and data loader in a full training setup:

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

# Step 1: Define a simple model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.fc = nn.Linear(2, 1)

    def forward(self, x):
        return self.fc(x)

# Step 2: Define your custom dataset
class CustomDataset(Dataset):
    def __init__(self, data, labels):
        self.data = data
        self.labels = labels

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.labels[idx]

# Step 3: Create sample data and labels
data = torch.tensor([[1, 2], [3, 4], [5, 6], [7, 8]], dtype=torch.float32)
labels = torch.tensor([0, 1, 0, 1], dtype=torch.float32)

# Step 4: Instantiate the custom dataset
dataset = CustomDataset(data, labels)

# Step 5: Create a data loader
batch_size = 2
shuffle = True
num_workers = 2
loader = DataLoader(dataset, batch_size=batch_size, shuffle=shuffle, num_workers=num_workers)

# Step 6: Instantiate the model, loss function, and optimizer
model = SimpleModel()
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Step 7: Training loop
num_epochs = 10
for epoch in range(num_epochs):
    running_loss = 0.0
    for batch_data, batch_labels in loader:
        # Zero the gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(batch_data)
        loss = criterion(outputs.squeeze(), batch_labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Update running loss
        running_loss += loss.item()

    # Print average loss for the epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {running_loss / len(loader)}")

Let’s look at one more example of a training loop.

for epoch in range(10):
    epoch_loss = 0.0
    for number_pairs, sums in dataloader:  # Iterate over the batches
        predictions = model(number_pairs)  # Compute the model output
        loss = loss_function(predictions, sums)  # Compute the loss
        loss.backward()  # Perform backpropagation
        optimizer.step()  # Update the parameters
        optimizer.zero_grad()  # Zero the gradients

        epoch_loss += loss.item()  # Accumulate the loss over all batches

    # Print the loss for this epoch
    print("Epoch {}: Sum of Batch Losses = {:.5f}".format(epoch, epoch_loss))

If you’re interested in exploring more advanced frameworks and libraries, consider delving into the capabilities of Hugging Face for deep learning.

Thanks for reading 😊
