Creating a PyTorch Neural Network with ChatGPT

Feb 4, 2023

Welcome to this guide on how to create a PyTorch neural network using the state-of-the-art language model, ChatGPT.

Deep learning has become an integral part of many fields, ranging from computer vision to natural language processing. PyTorch, one of the most popular deep learning frameworks, provides a seamless way to build and train neural networks with its intuitive interface and dynamic computational graph. However, for those who have not used PyTorch before, and those with little Python programming experience, it can be challenging to get started. Fortunately, with the release of ChatGPT, programmers and non-programmers alike have access to a powerful tool that can make the process of using PyTorch much easier.

The objective of this post is simple: I want to demonstrate how to use ChatGPT to create a neural network in PyTorch. By asking ChatGPT the right questions, we can create our very own neural network in a fraction of the time it would take to learn PyTorch from scratch. We will also use Google Colab to build and run our neural network. This tutorial is intended for anyone who would like to see how to create a deep learning project with ChatGPT, regardless of whether you are an expert programmer or have never written a line of code in your life.

What is ChatGPT?

ChatGPT is a state-of-the-art language model developed by OpenAI. It is a transformer-based neural network trained on a massive amount of text data, allowing it to generate human-like responses to natural language input. With its ability to understand and generate text, ChatGPT has a wide range of applications, from language translation to question-answering and beyond. While ChatGPT can be incorporated into diverse projects through an API provided by OpenAI, its chatbot interface is equally useful, and that is what we will be using for this tutorial.

As of the writing of this post, ChatGPT is free to the public, although a Plus version was just announced. In order to use it, all you need is an account on the OpenAI ChatGPT website.

What is PyTorch?

PyTorch is an open-source machine learning library for Python, widely used for its ease of use and flexibility in building and training deep learning models. It provides a dynamic computational graph, which allows for faster prototyping and more intuitive model building compared to traditional static computational graphs. PyTorch also integrates seamlessly with the most popular Python libraries, making it a common choice among researchers and practitioners alike. Its active community makes PyTorch a go-to option for many deep learning projects, and it is used in a variety of fields, including computer vision, natural language processing, and reinforcement learning.

Defining the Scope of the Network

Neural networks can be designed for a large number of applications, ranging from image classification (hotdog vs. not-hotdog) to large language models like ChatGPT. For this tutorial, we will be performing multi-class classification on the popular Fashion MNIST dataset. Our goal will be to classify different pieces of clothing into their respective categories. There are a total of 10 different labels that range from t-shirts to bags.

An example of the Fashion MNIST dataset (source: https://github.com/zalandoresearch/fashion-mnist)

I picked this dataset because it is readily available through torchvision, PyTorch's companion vision library, along with many other popular datasets.

Getting Started: Opening Google Colab

In this tutorial we will be using Google Colab, a free collaborative Jupyter interface hosted by Google (for more about Jupyter Notebooks, see here). This will allow us to have an interactive environment to build and execute our Python code. Google Colab also comes pre-installed with many deep learning packages, so we have to do very little upfront work and can go straight into coding. If you want the final Jupyter notebook generated in this tutorial, you can find it here.

Importing the Fashion MNIST dataset

To get started, we will need to import the Fashion MNIST dataset into a format that PyTorch can use. This can be done through what are known as DataLoaders. We will ask ChatGPT to do this for us. In the ChatGPT text box, type:

“write code to load the fashion MNIST dataset from pytorch into a dataloader”

import torch
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt

# Define a transform to preprocess the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the Fashion MNIST dataset
fashion_mnist_dataset = torchvision.datasets.FashionMNIST(
    root='./data',
    train=True,
    download=True,
    transform=transform
)

# Split the dataset into training and validation sets
train_dataset, val_dataset = torch.utils.data.random_split(fashion_mnist_dataset, [50000, 10000])

# Define a dataloader to load the data in batches
batch_size = 32
train_dataloader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=batch_size,
    shuffle=True
)

val_dataloader = torch.utils.data.DataLoader(
    val_dataset,
    batch_size=batch_size,
    shuffle=False
)

The instantaneous response gives us exactly what we need, and more! ChatGPT imports all of the necessary libraries. It also creates a transform that converts the images to tensors and normalizes their pixel intensities into a fixed range (this helps the neural network during training). It then loads the dataset, applies the transform, and saves the transformed data into a variable called fashion_mnist_dataset. Finally, it splits the dataset into training and validation sets and creates two DataLoaders, one for the training set (train_dataloader) and one for the validation set (val_dataloader). It even took the liberty of specifying the batch size for us, setting it to 32.
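To make the preprocessing concrete, here is a minimal sketch in plain Python of the arithmetic behind the generated code. It assumes that ToTensor scales pixel values to the range [0, 1] and that the DataLoader keeps the last partial batch (the default); the helper names normalize and num_batches are hypothetical and only serve this illustration.

```python
import math

def normalize(pixel, mean=0.5, std=0.5):
    """Same per-pixel arithmetic as Normalize((0.5,), (0.5,)): (x - mean) / std."""
    return (pixel - mean) / std

def num_batches(num_samples, batch_size):
    """Batches per epoch when the last, smaller batch is kept."""
    return math.ceil(num_samples / batch_size)

# Pixels in [0, 1] end up in [-1, 1]
print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0

# Batches per epoch for the 50,000 / 10,000 split with batch_size = 32
print(num_batches(50000, 32))  # 1563
print(num_batches(10000, 32))  # 313
```

Note that neither split divides evenly by 32, so the final batch of each DataLoader holds only 16 images.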

We can take the output of ChatGPT, paste it into a Google Colab cell, and run it. Our code executed without issues!

Placing the code generated by ChatGPT into a Google Colab

Visualizing the Fashion MNIST Dataset

We can now ask ChatGPT to help us visualize the Fashion MNIST dataset. I used the following instruction to ChatGPT in order to get the code for this:

generate code to visualize examples from the training set of fashion mnist dataset from above in a grid

Note that I specified "from above" in my command so that ChatGPT reuses the information from the code it generated previously.

Output from ChatGPT for visualizing the Fashion MNIST dataset

import matplotlib.pyplot as plt
import numpy as np

# Get a batch of data from the training set
images, labels = next(iter(train_dataloader))

# Plot the images in a grid
num_images = images.shape[0]
rows = np.ceil(np.sqrt(num_images))
cols = np.ceil(num_images / rows)

fig, axes = plt.subplots(rows, cols, figsize=(5,5))
axes = axes.ravel()

for i in range(num_images):
    ax = axes[i]
    ax.imshow(images[i][0], cmap='gray')
    ax.axis('off')

plt.show()

Putting this code on Colab, however, spits out an error:

We get an error when we copy the visualization code on Colab

It seems that line 12 requires the variables rows and cols to be int instead of float. This is an easy fix, but let us see if ChatGPT can troubleshoot this code and fix it for us. I asked ChatGPT the following:

Line 12 from the code above has a bug, could you fix it

import matplotlib.pyplot as plt
import numpy as np

# Get a batch of data from the training set
images, labels = next(iter(train_dataloader))

# Plot the images in a grid
num_images = images.shape[0]
rows = int(np.ceil(np.sqrt(num_images)))
cols = int(np.ceil(num_images / rows))

fig, axes = plt.subplots(rows, cols, figsize=(5,5))
axes = axes.ravel()

for i in range(num_images):
    ax = axes[i]
    ax.imshow(images[i][0], cmap='gray')
    ax.axis('off')

plt.show()

Interestingly, ChatGPT's explanation of the fix is incorrect: it points to the images[i][0] line, which the original code already had (see above). It seems that line 12 in Google Colab is not the same as line 12 as ChatGPT counts it. However, if we look at the actual problem lines, which before were:

rows = np.ceil(np.sqrt(num_images))
cols = np.ceil(num_images / rows)

it now is written as:

rows = int(np.ceil(np.sqrt(num_images)))
cols = int(np.ceil(num_images / rows))

successfully converting the variables rows and cols to integers, and fixing the problem we had before. ChatGPT fixed our code even though its response was referring to a different part of the script that did not require fixing!
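The grid arithmetic can be checked on its own, without plotting anything. The sketch below uses Python's math module instead of NumPy (math.ceil already returns an int, whereas np.ceil returns a float, which is what triggered the error); grid_shape is a hypothetical helper name.

```python
import math

def grid_shape(num_images):
    """Rows and columns of a near-square grid that fits num_images."""
    rows = int(math.ceil(math.sqrt(num_images)))
    cols = int(math.ceil(num_images / rows))
    return rows, cols

for n in (16, 20, 32):
    rows, cols = grid_shape(n)
    # The last value is the number of grid cells left without an image
    print(n, (rows, cols), rows * cols - n)
```

For our batch size of 32 this gives a 6 x 6 grid, so four of the subplots stay empty.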

Using this on Colab, we now get a successful grid-like representation of our loaded Fashion MNIST dataset:

Creating the Neural Network

Now that we have our Fashion MNIST dataset loaded, we can use ChatGPT to create a neural network for classifying the different images into their labels.

At this stage, we can be as generic or as specific as we want when asking ChatGPT to write the neural network architecture for us. I will ask ChatGPT to build a 2-dimensional convolutional neural network for classifying the different images in our dataset. This is the instruction I used:

write a 2-dimensional convolutional neural network for classifying images from the fashion MNIST dataset defined above

import torch
import torch.nn as nn
import torch.optim as optim

# Define the neural network
class ConvNet(nn.Module):
    def __init__(self):
        super(ConvNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=5, stride=1, padding=2)
        self.bn1 = nn.BatchNorm2d(32)
        self.relu1 = nn.ReLU()
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=5, stride=1, padding=2)
        self.bn2 = nn.BatchNorm2d(64)
        self.relu2 = nn.ReLU()
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        
        self.fc1 = nn.Linear(7 * 7 * 64, 128)
        self.fc2 = nn.Linear(128, 10)
        
    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)
        x = self.maxpool1(x)
        
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu2(x)
        x = self.maxpool2(x)
        
        x = x.view(-1, 7 * 7 * 64)
        x = self.fc1(x)
        x = self.fc2(x)
        
        return x

model = ConvNet()

The code above generates a 2-dimensional convolutional neural network (ConvNet) implemented in PyTorch. It imports the necessary PyTorch libraries: torch, torch.nn, and torch.optim. The __init__ method sets up the structure of the neural network by defining the layers and their properties, including two convolutional layers, two batch normalization layers, two ReLU activation layers, two max pooling layers, and two fully-connected (FC) layers. The forward method implements the forward pass of the neural network. It applies the convolutional, normalization, activation, and pooling layers in sequence to the input image, followed by the FC layers to obtain the output class scores.
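One detail worth unpacking is why the first fully-connected layer expects 7 * 7 * 64 inputs. The following sketch traces the spatial size of a 28 x 28 image through the layers using the standard output-size formula for convolution and pooling, floor((size + 2 * padding - kernel) / stride) + 1; conv_out is a hypothetical helper, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                              # Fashion MNIST images are 28 x 28
size = conv_out(size, kernel=5, stride=1, padding=2)   # conv1    -> 28
size = conv_out(size, kernel=2, stride=2)              # maxpool1 -> 14
size = conv_out(size, kernel=5, stride=1, padding=2)   # conv2    -> 14
size = conv_out(size, kernel=2, stride=2)              # maxpool2 -> 7

flat_features = size * size * 64                       # 64 channels after conv2
print(size, flat_features)  # 7 3136
```

The padding of 2 keeps each 5 x 5 convolution size-preserving, so only the two pooling layers shrink the image, from 28 to 14 to 7.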

ChatGPT actually generated the training loop for us in the above output (I did not include a screenshot of the full output), but we will re-generate it in the following section.

Creating the Training Loop

Create a training loop for the above Fashion MNIST dataset and the above ConvNet

Note how I keep referring to the above code so that ChatGPT can use the chat history to keep variable names consistent in the generated code.

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(train_loader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, running_loss/len(train_loader)))

print('Training finished!')

This code first defines a cross-entropy loss function (criterion = nn.CrossEntropyLoss()) and an Adam optimizer with a learning rate of 0.001 (optimizer = optim.Adam(model.parameters(), lr=0.001)). In the training loop, it iterates over each mini-batch in the training set (remember we had the batch size set to 32), resets the gradients from the previous step (optimizer.zero_grad()), computes the output of the model on the mini-batch (outputs = model(inputs)), computes the loss between the output and the ground-truth labels (loss = criterion(outputs, labels)), performs backpropagation to compute the gradients (loss.backward()), and updates the model parameters using the optimizer (optimizer.step()). The loop repeats for a specified number of epochs. ChatGPT has also added helpful printing lines that tell us how the loss has changed across epochs, as well as a line that tells us when the training has finished. After the training loop, the model will have learned to classify the Fashion MNIST images.
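To get a feel for the numbers the training loop prints, here is a rough sketch of the cross-entropy loss for a single sample, written in plain Python rather than PyTorch. It computes -log(softmax(logits)[label]), which is the quantity nn.CrossEntropyLoss averages over a batch; the function name cross_entropy and the example logits are made up for illustration.

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one sample: -log(softmax(logits)[label])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[label]

# An untrained model that scores all 10 classes equally
uniform = cross_entropy([0.0] * 10, label=3)

# A model that strongly favors class 3
logits = [0.0, 0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
confident_right = cross_entropy(logits, label=3)
confident_wrong = cross_entropy(logits, label=6)

print(round(uniform, 4))        # ln(10), about 2.3026
print(confident_right < uniform < confident_wrong)  # True
```

A loss near 2.3 therefore means the network is doing no better than guessing among the 10 classes, and the printed loss should fall well below that as training progresses.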

When putting this code on Colab, we notice an error. ChatGPT called the training set DataLoader train_loader, but when we had first asked it to generate the DataLoader, it called it train_dataloader. We could probably ask ChatGPT to fix these mistakes directly, but for brevity, I will go ahead and fix them myself:

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model
num_epochs = 10
for epoch in range(num_epochs):
    running_loss = 0.0
    for i, data in enumerate(train_dataloader):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        
    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, running_loss/len(train_dataloader)))

print('Training finished!')

We now have our updated training loop, and it works perfectly in Colab:

Creating the inference loop and testing the performance of the model on the validation set

With a trained model, the last thing we need to do is to run it on the validation set and see how it performs on data that the model has not been trained on. This is traditionally referred to as “inference” in the deep learning community. Once again, we can ask ChatGPT to help us do that using the following instruction:

create an inference loop to test the accuracy on the validation set, include visualizations

I wanted to visualize some of the predictions, so I made that explicit in the instruction I gave ChatGPT.

import matplotlib.pyplot as plt

# Evaluation/Inference Loop
model.eval() # Set model to evaluation mode

correct = 0
total = 0
plotted = False  # Only the first batch is visualized

# Disable gradients to speed up inference
with torch.no_grad():
  for images, labels in val_dataloader:

    outputs = model(images)
    _, predicted = torch.max(outputs, 1)

    total += labels.size(0)
    correct += (predicted == labels).sum().item()

    # Visualize up to 20 images from the first batch
    if not plotted:
      plotted = True
      fig = plt.figure(figsize=(15, 15))
      for i in range(min(20, images.size(0))):
        ax = fig.add_subplot(5, 5, i+1, xticks=[], yticks=[])
        ax.imshow(images[i].reshape(28, 28), cmap='gray')
        ax.set_title("True: " + str(labels[i].item()) + "\nPred: " + str(predicted[i].item()),
                    fontsize=10)

print('Validation accuracy: {:.2f}%'.format(100 * correct / total))
plt.show()

ChatGPT once again provides everything we requested. In the inference loop, the model is set to evaluation mode using model.eval(). The gradients are disabled using torch.no_grad() to speed up the inference process. Then, the images and labels from the validation set are loaded and passed through the model to obtain the outputs. The predicted class is obtained by taking the index of the maximum value along dimension 1 (the class dimension) of the outputs tensor, and the accuracy is calculated by counting the number of correct predictions. Finally, a few images from the validation set are visualized along with their true and predicted labels.
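The prediction and accuracy steps can be mirrored in a few lines of plain Python. This is only a sketch of the logic, with made-up scores for a 3-class problem; argmax and accuracy are hypothetical helpers standing in for torch.max(outputs, 1) and the correct/total bookkeeping.

```python
def argmax(scores):
    """Index of the largest score, i.e. the predicted class."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(batch_scores, labels):
    """Predicted classes and percentage of predictions matching the labels."""
    predicted = [argmax(scores) for scores in batch_scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return predicted, 100.0 * correct / len(labels)

# One row of class scores per image (made-up values)
batch_scores = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 3.0],
    [0.9, 0.8, 0.1],
]
labels = [1, 0, 2, 1]

predicted, acc = accuracy(batch_scores, labels)
print(predicted, acc)  # [1, 0, 2, 0] 75.0
```

The raw scores never need to be turned into probabilities for this step, since softmax does not change which class has the largest value.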

Example output from the inference loop. Most examples are labeled correctly except for image number 3 which has a true label of 3 and a predicted label of 6.

Using this code on Colab, we can visualize which labels are the true labels and which labels are predicted for many different examples. We also get a validation accuracy of 91%, which is phenomenal for a neural network that was generated almost exclusively by another neural network!
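The plot titles show numeric labels, which are easier to read once mapped to class names. The list below follows the label order documented in the Fashion MNIST repository linked earlier; the constant and helper names are my own and not part of the notebook.

```python
# Label index -> class name, in the order given by the Fashion MNIST README
FASHION_MNIST_CLASSES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]

def label_name(index):
    """Human-readable name for a numeric Fashion MNIST label."""
    return FASHION_MNIST_CLASSES[index]

# The misclassified example mentioned above: true label 3, predicted label 6
print(label_name(3), "->", label_name(6))  # Dress -> Shirt
```

Seen this way, the one mistake in the example output is a dress predicted as a shirt, which is an understandable confusion at 28 x 28 resolution.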

Conclusion

I hope this tutorial was informative and helpful. ChatGPT is here to stay, and the earlier we are able to incorporate it into our workflows, the more we can benefit from it. As mentioned earlier, if you want the final Jupyter notebook generated in this tutorial, you can find it here. Feel free to reach out with any questions or comments, and subscribe for more quality content!