MNIST Handwritten Digit Recognition With PyTorch

Ankit Batra
Nov 23, 2019


In this article, I will be discussing how to create an MLP (multi-layer perceptron) to classify images from the MNIST dataset. The MNIST dataset is very easy to train on and is often used as the “Hello World” of machine learning models. I will use it to introduce the basics of PyTorch and machine learning by walking through the creation of your first MLP.

Why Pytorch?

PyTorch is a Python machine learning package based on Torch, an open-source machine learning library. It is used for applications such as computer vision and natural language processing. There are other machine learning libraries available, one of the most popular being TensorFlow; however, I will be using PyTorch. Compared to TensorFlow, PyTorch has several advantages as a deep learning library. For one, PyTorch is generally more “pythonic” and easier to use because it executes operations eagerly, as the code runs. This differs from graph-mode TensorFlow, where you first define the execution graph, with the input and output shapes, activation functions, and the order of each layer, and then run it in a separate session. In PyTorch, you define the network as a class that subclasses nn.Module and simply feed input data through it; the code runs like any other Python class. This makes it easier to read, execute, and debug.
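To make the eager-execution point concrete, here is a minimal sketch (nothing beyond the standard torch package is assumed): every line runs immediately, so you can print or inspect any intermediate tensor without building a graph or opening a session.

import torch

x = torch.randn(2, 3)   # a random tensor, created immediately
y = x * 2 + 1           # the result exists right away
print(y.shape)          # torch.Size([2, 3]) - no graph or session needed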

The Structure

We need to design a model that can accurately distinguish each of the digits in the dataset. Using deep learning, we can take a data-driven approach and train an algorithm that examines these images and discovers patterns that distinguish one digit from another. Our algorithm will need to learn what makes a hand-drawn 1 look like a 1 and how images of 1’s differ from, say, 7’s.

The process to classify images on our MNIST dataset will be broken down into the following steps:

  • Load and visualize the data
  • Define the neural network
  • Train the model
  • Evaluate the performance of our trained model on MNIST!

Loading the Dataset

MNIST contains 70,000 images of handwritten digits: 60,000 for training and 10,000 for testing. The images are grayscale, 28x28 pixels, and centred, which reduces preprocessing and lets us get started quickly. MNIST is one of the most common datasets used for image classification, and PyTorch lets us download it directly through the torchvision API. We also have to decide how many samples to load per batch (batch_size) and what percentage of the training set to hold out for validation (valid_size).

import torch
import numpy as np
from torchvision import datasets, transforms

# define the number of subprocesses to use for data loading, the batch size,
# and the fraction of the training set to hold out for validation
num_workers = 0
batch_size = 20
valid_size = 0.2

# convert the images to float tensors
transform = transforms.ToTensor()

# choose the training and test datasets
train_data = datasets.MNIST(root='data', train=True,
                            download=True, transform=transform)
test_data = datasets.MNIST(root='data', train=False,
                           download=True, transform=transform)
print(train_data)

Next, we shuffle the training indices and split them into a training set and a validation set; the validation indices will be used to evaluate the model during training.

num_train = len(train_data)
indices = list(range(num_train))
np.random.shuffle(indices)
split = int(np.floor(valid_size * num_train))
train_idx, valid_idx = indices[split:], indices[:split]

We then create loaders for training, validation, and testing. Each loader takes the data defined above, along with our batch size and number of workers, and lets us iterate through the data one batch at a time. The training and validation loaders use a SubsetRandomSampler with the index lists from the previous step so that each draws batches from the correct split, while the test loader simply iterates over the test set.

# define samplers that draw batches from the training and validation indices
train_sampler = torch.utils.data.sampler.SubsetRandomSampler(train_idx)
valid_sampler = torch.utils.data.sampler.SubsetRandomSampler(valid_idx)

# prepare the data loaders
train_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=train_sampler, num_workers=num_workers)
valid_loader = torch.utils.data.DataLoader(train_data, batch_size=batch_size,
                                           sampler=valid_sampler, num_workers=num_workers)
test_loader = torch.utils.data.DataLoader(test_data, batch_size=batch_size,
                                          num_workers=num_workers)

We then obtain one batch of training images by wrapping the loader in an iterator.

dataiter = iter(train_loader)
images, labels = next(dataiter)  # dataiter.next() in older PyTorch versions
images = images.numpy()

The Network Architecture

We will now define the MLP model, which takes a flattened 784-dimensional tensor of pixel values for each image and produces a tensor of length 10 (our number of classes) containing the class scores for that image. This particular example uses two hidden layers and dropout to avoid overfitting. Each hidden layer has 512 nodes, so we set hidden_1 and hidden_2 to 512. We also add a dropout layer with a probability of 0.5, which is the probability of any given node being zeroed out during training.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        hidden_1 = 512
        hidden_2 = 512
        # fully connected layers: 784 -> 512 -> 512 -> 10
        self.fc1 = nn.Linear(28 * 28, hidden_1)
        self.fc2 = nn.Linear(hidden_1, hidden_2)
        self.fc3 = nn.Linear(hidden_2, 10)
        # dropout layer (p=0.5) to prevent overfitting
        self.dropout = nn.Dropout(0.5)

Next, we define the feed-forward behaviour of the network: how an input x is passed through the layers and transformed. The input x is a grayscale image, which we flatten into a vector using the view function. The flattened vector is then passed through the fully connected layers defined above, applying a ReLU activation after each hidden layer to introduce non-linearity (negative values are set to zero). The final output is a tensor of class scores.

    # (this method continues the Net class defined above)
    def forward(self, x):
        # flatten image input
        x = x.view(-1, 28 * 28)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc1(x))
        # add dropout layer
        x = self.dropout(x)
        # add hidden layer, with relu activation function
        x = F.relu(self.fc2(x))
        # add dropout layer
        x = self.dropout(x)
        # add output layer
        x = self.fc3(x)
        return x

# initialize the NN
model = Net()
print(model)
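As a quick sanity check (the dummy batch below is purely illustrative and not part of the tutorial's data pipeline), we can pass a made-up batch of 28x28 images through the model and confirm that it returns one row of 10 class scores per image:

# hypothetical dummy batch of 4 single-channel 28x28 images
dummy = torch.randn(4, 1, 28, 28)
out = model(dummy)
print(out.shape)   # torch.Size([4, 10])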

Pre-Training

After loading the data and defining a model, we have to define our loss function and optimizer. For MNIST, we will use cross-entropy loss. PyTorch's cross-entropy loss combines a log-softmax with the negative log-likelihood loss, which means the model only needs to produce raw class scores; the loss function converts them into probabilities internally. The optimizer used in this model is stochastic gradient descent with a learning rate of 0.01. Generally, a larger learning rate allows the model to learn faster at the cost of a less accurate final set of weights, while a smaller learning rate may allow the model to find a more optimal (or even globally optimal) set of weights but may take significantly longer to train. We also set the number of epochs to 10, though this number can be changed; an epoch is one full pass of the model over the entire training dataset. Lastly, before we begin training, we initialize our tracker for the minimum validation loss and set the initial minimum to infinity.

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
n_epochs = 10
# track the minimum validation loss seen so far
valid_loss_min = np.inf
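As a quick check of the claim above (the scores and labels here are made up purely for illustration), the following sketch shows that CrossEntropyLoss applied to raw class scores matches LogSoftmax followed by NLLLoss:

# made-up class scores and targets, only to verify the equivalence
scores = torch.randn(4, 10)
labels = torch.tensor([3, 0, 7, 1])
ce = nn.CrossEntropyLoss()(scores, labels)
nll = nn.NLLLoss()(nn.LogSoftmax(dim=1)(scores), labels)
print(torch.allclose(ce, nll))   # True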

Note: If we were training on a GPU, we would also need to move the model's parameters to the GPU, e.g. with model.cuda() or model.to(device), before constructing the optimizer. It is important to transfer the parameters to the appropriate device first, because otherwise the optimizer would be tracking the old CPU copies rather than the parameters actually being used.
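For completeness, here is a minimal sketch of what that device handling could look like (it is not part of the code in this article, which runs on the CPU):

# move the model to the GPU if one is available, before creating the optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# inside the training, validation, and test loops, each batch must follow:
# data, target = data.to(device), target.to(device)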

Training and Validating the Model

To train the model, we loop over the epochs and keep track of the training loss and validation loss as we go. Inside this epoch loop there is a batch loop, where the train loader supplies the training data and the true labels for each batch. First, we clear any accumulated gradients and call our model to perform a forward pass: the model takes the data from the train loader and returns predicted class scores as its output. The loss function compares the true labels with these scores, giving us the cross-entropy loss for the batch. We then perform a backward pass to compute the gradients of the loss and update the parameters. After the batch loop, we run a validation pass in the same way (without updating any weights) and print the epoch along with its training and validation loss. Finally, whenever the validation loss reaches a new minimum, we save the model's weights, using the tracker we initialized earlier.

for epoch in range(n_epochs):
    # keep track of training and validation loss for this epoch
    train_loss = 0.0
    valid_loss = 0.0

    # train the model
    model.train()
    for data, target in train_loader:
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        train_loss += loss.item() * data.size(0)

    # validate the model
    model.eval()
    for data, target in valid_loader:
        output = model(data)
        loss = criterion(output, target)
        valid_loss += loss.item() * data.size(0)

    # calculate average losses
    train_loss = train_loss / len(train_loader.sampler)
    valid_loss = valid_loss / len(valid_loader.sampler)

    print('Epoch: {} \tTraining Loss: {:.6f} \tValidation Loss: {:.6f}'.format(
        epoch + 1, train_loss, valid_loss))

    # save the model whenever the validation loss decreases
    if valid_loss <= valid_loss_min:
        torch.save(model.state_dict(), 'model.pt')
        valid_loss_min = valid_loss

Testing the Trained Network

Finally, we test our model on previously unused test data and evaluate its performance. Testing on unseen data is a good way to check that the model generalizes well.

model.eval()

# initialize the test loss and the per-class counters
test_loss = 0.0
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

for data, target in test_loader:
    output = model(data)
    loss = criterion(output, target)
    test_loss += loss.item() * data.size(0)
    # convert class scores to the predicted class
    _, pred = torch.max(output, 1)
    # compare predictions to the true labels
    correct = np.squeeze(pred.eq(target.data.view_as(pred)))
    for i in range(len(target)):
        label = target.data[i]
        class_correct[label] += correct[i].item()
        class_total[label] += 1

# calculate and print avg test loss
test_loss = test_loss / len(test_loader.sampler)
print('Test Loss: {:.6f}\n'.format(test_loss))
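To see how the model performs on each digit, we can also print the per-class and overall accuracy from the class_correct and class_total counters filled in above (a short follow-up sketch in the same style as the rest of the code):

for i in range(10):
    if class_total[i] > 0:
        print('Test Accuracy of %d: %2d%% (%2d/%2d)' % (
            i, 100 * class_correct[i] / class_total[i],
            class_correct[i], class_total[i]))
print('\nTest Accuracy (Overall): %2d%% (%2d/%2d)' % (
    100 * np.sum(class_correct) / np.sum(class_total),
    np.sum(class_correct), np.sum(class_total)))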

Visualizing Our Results

Now that we have a batch of test images along with their true labels and the labels predicted by our model, we can visualize the results using matplotlib. Each image's title shows the predicted label with the true label in parentheses; the text is green for correctly classified examples and red for incorrect predictions.

import matplotlib.pyplot as plt

# obtain one batch of test images and compute the model's predictions
dataiter = iter(test_loader)
images, labels = next(dataiter)
output = model(images)
_, preds = torch.max(output, 1)
images = images.numpy()

fig = plt.figure(figsize=(25, 4))
for i in np.arange(20):
    ax = fig.add_subplot(2, 10, i + 1, xticks=[], yticks=[])
    ax.imshow(np.squeeze(images[i]), cmap='gray')
    ax.set_title("{} ({})".format(str(preds[i].item()), str(labels[i].item())),
                 color=("green" if preds[i] == labels[i] else "red"))
plt.show()

Conclusion

In this article, I have shown how to create an MLP (multi-layer perceptron) to classify images from the MNIST dataset. The test accuracy over the whole test set is around 97% when trained for 20 epochs. For a deeper understanding of PyTorch, or to modify this model, a good resource is the PyTorch Documentation.
