Chapter 5 — Logistic Regression

Build a Logistic Regression Model in PyTorch

Published in Analytics Vidhya, Jul 15, 2020 (8 min read)

In previous blogs we used linear regression to create models. But the output of linear regression is unbounded: it can take any real value, so it can be used only when the response variable is continuous. This is where logistic regression comes into the picture.

Logistic regression is used to describe data and to explain the relationship between one categorical dependent variable and one or more nominal, ordinal, interval or ratio independent variables.

A categorical variable represents data that can be divided into groups or levels. Examples are marital status and gender, where the answer might be yes/no, or m/f in the case of gender.

Types of Logistic Regression:

Binary Logistic Regression: deals with categorical variables with two possible outcomes.
Multinomial Logistic Regression: deals with categorical variables with three or more nominal categories.
Ordinal Logistic Regression: deals with categorical variables with three or more ordinal categories. Ordinal means the categories have a natural order, for example a rating from 1 to 5.

A logistic regression model is similar to a linear regression model: it has weights and biases, and the output is obtained using simple matrix operations.

For this notebook we will use the MNIST dataset, which consists of 28px by 28px gray-scale images of handwritten digits (0 to 9), along with a label for each image indicating which digit it represents. It contains 60,000 images that are used to train the model.

We import torchvision, which provides popular datasets, model architectures and common image transformations for computer vision. It also contains utilities to download and import popular datasets.

When the statement is executed for the first time, it downloads the data to the data/ directory next to the notebook and creates a PyTorch Dataset. On subsequent executions, the download is skipped because the data is already present.

The dataset has 60,000 images, which can be used to train the model. There is also an additional test set of 10,000 images, which can be created by passing train=False to the MNIST class.

Each element of the dataset is a pair consisting of a 28px by 28px image and a label. The image is an object of the class PIL.Image.Image, which is part of the Python Imaging Library, a free and open-source library that adds image processing capability to the Python interpreter. The library supports many file formats and provides powerful image processing and graphics capabilities.

Matplotlib is a library for plotting graphs in Python. We can view the image within Jupyter using Matplotlib. In addition, we add %matplotlib inline, which tells Jupyter that we want to plot the graph within the notebook; without this line, the plot is shown in a popup. Statements that start with % are called magic commands and are used to configure the behavior of Jupyter.

We can look at some of the images by plotting them with Matplotlib. The cmap argument selects a color map, which maps numbers to colors. Matplotlib provides many built-in color maps. In this case we set cmap to 'gray', so pixel values are rendered as shades of gray, from black for the lowest value to white for the highest.

Note: PyTorch models cannot work with PIL images directly, so we need to convert the images to tensors. We can do this with the torchvision.transforms package, which contains predefined functions for this purpose. We use the ToTensor transform to convert images to PyTorch tensors.

Here we convert the images to tensors. From the max and min values it is clear that the values lie between 0 and 1, where 0 is black, 1 is white and values in between are different shades of gray.

Training And Validation Dataset

There is no separate validation set, so we manually split the 60,000 images into training and validation datasets. We do this using the random_split function from torch.utils.data.

It’s important to choose a random sample for creating a validation set, because training data is often ordered by the target labels i.e. images of 0s, followed by images of 1s, followed by images of 2s and so on. If we were to pick a 20% validation set simply by selecting the last 20% of the images, the validation set would only consist of images of 8s and 9s, whereas the training set would contain no images of 8s and 9s. This would make it impossible to train a good model using the training set, which also performs well on the validation set (and on real world data).
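The effect can be illustrated on a small synthetic dataset. The sketch below is an assumption-laden toy, not the MNIST split itself: it builds 1,000 random "images" whose labels are deliberately sorted, then compares the last 20% with a split made by random_split.

```python
import torch
from torch.utils.data import TensorDataset, random_split

torch.manual_seed(0)

# Hypothetical label-ordered dataset: 100 samples of each digit, sorted 0..9.
labels = torch.arange(10).repeat_interleave(100)
images = torch.rand(1000, 1, 28, 28)
dataset = TensorDataset(images, labels)

# Naive validation set: the last 20% of a label-ordered dataset.
tail_labels = labels[800:]
print(tail_labels.unique())   # tensor([8, 9])

# Random validation set of the same size.
train_ds, val_ds = random_split(dataset, [800, 200])
val_labels = torch.stack([val_ds[i][1] for i in range(len(val_ds))])
print(val_labels.unique())    # a mix of digits rather than only 8s and 9s
```

For the real data, the same call would take the two sizes that add up to 60,000, for example random_split(dataset, [50000, 10000]); the exact sizes are a choice, not something fixed by the library.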

We can now create data loaders to help us load the data in batches. We’ll use a batch size of 128.

We set shuffle=True for the training dataloader, so that the batches generated in each epoch are different, and this randomization helps generalize & speed up the training process. On the other hand, since the validation dataloader is used only for evaluating the model, there is no need to shuffle the images.
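A minimal sketch of the two data loaders, using random tensors as hypothetical stand-ins for the training and validation splits so that it runs on its own:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for the training and validation datasets.
train_ds = TensorDataset(torch.rand(1000, 1, 28, 28), torch.randint(0, 10, (1000,)))
val_ds = TensorDataset(torch.rand(200, 1, 28, 28), torch.randint(0, 10, (200,)))

batch_size = 128

# Shuffle only the training data; validation order does not matter.
train_loader = DataLoader(train_ds, batch_size, shuffle=True)
val_loader = DataLoader(val_ds, batch_size)

images, labels = next(iter(train_loader))
print(images.shape)        # torch.Size([128, 1, 28, 28])
print(len(train_loader))   # 8 batches: 7 full ones plus a final batch of 104
```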

Evaluation Metric and Loss Function

We need a way to evaluate how well our model is performing. A natural way to do this would be to find the percentage of labels that were predicted correctly i.e. the accuracy of the predictions.

The == operator performs an element-wise comparison of two tensors with the same shape and returns a tensor of the same shape, containing False for unequal elements and True for equal elements (older PyTorch versions returned 0s and 1s). Passing the result to torch.sum returns the number of labels that were predicted correctly. Finally, we divide by the total number of images to get the accuracy.
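One way to write such an accuracy function is sketched below. The tiny outputs and labels tensors are made-up values for illustration; taking the index of the highest score as the prediction is the usual convention, assumed here.

```python
import torch


def accuracy(outputs, labels):
    """Fraction of rows in `outputs` whose highest-scoring class equals the label."""
    preds = torch.argmax(outputs, dim=1)           # predicted class per row
    correct = torch.sum(preds == labels).item()    # number of matches
    return correct / len(labels)


# Made-up model outputs for 3 samples and 2 classes.
outputs = torch.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])
labels = torch.tensor([1, 0, 0])

print(torch.argmax(outputs, dim=1) == labels)   # True, True, False
print(accuracy(outputs, labels))                # 0.6666666666666666
```

Accuracy is easy to interpret, but it is not differentiable, so it works as an evaluation metric rather than as a quantity to optimize with gradient descent.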

Model

Now that we have prepared our data loaders, we can define our model.

A logistic regression model is almost identical to a linear regression model, i.e. there are weight and bias matrices, and the output is obtained using simple matrix operations (pred = x @ w.t() + b).
We can use nn.Linear to create the model instead of defining and initializing the matrices manually.
Since nn.Linear expects each training example to be a vector, each 1x28x28 image tensor needs to be flattened into a vector of size 784 (28 * 28) before being passed into the model.
The output for each image is a vector of size 10, with each element signifying the score for a particular target label (i.e. 0 to 9), which softmax turns into a probability. The predicted label for an image is simply the one with the highest probability.
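The points above can be sketched as a small nn.Module. The class name MnistModel and the random input batch are assumptions made for illustration; applying softmax to turn the raw outputs into probabilities is shown explicitly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

input_size = 28 * 28   # each image flattened to a vector of 784 values
num_classes = 10       # digits 0 to 9


class MnistModel(nn.Module):
    """Logistic regression: a single linear layer over flattened images."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, xb):
        xb = xb.reshape(-1, input_size)   # (batch, 1, 28, 28) -> (batch, 784)
        return self.linear(xb)            # raw scores, one per class


model = MnistModel()
images = torch.rand(4, 1, 28, 28)         # hypothetical batch of 4 images
outputs = model(images)
probs = F.softmax(outputs, dim=1)         # scores -> probabilities
preds = torch.argmax(probs, dim=1)        # most probable label per image

print(outputs.shape)       # torch.Size([4, 10])
print(probs.sum(dim=1))    # each row sums to 1 (up to rounding)
```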

Training the model

We define an evaluate function, which performs the validation phase, and a fit function, which performs the entire training process.

The fit function records the validation loss and metric from each epoch and returns a history of the training process. This is useful for debugging & visualizing the training process. Before we train the model, let’s see how the model performs on the validation set with the initial set of randomly initialized weights & biases.
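A minimal sketch of what evaluate and fit could look like follows. The text does not show the loss function or optimizer, so cross-entropy loss and SGD are assumptions here, as are the function signatures, the nn.Sequential stand-in model and the random data.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, val_loader):
    """Validation phase: average loss and accuracy over the whole loader."""
    model.eval()
    total_loss, total_correct, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in val_loader:
            out = model(images)
            # Assumption: cross-entropy as the loss function.
            total_loss += F.cross_entropy(out, labels, reduction="sum").item()
            total_correct += (torch.argmax(out, dim=1) == labels).sum().item()
            total += len(labels)
    return {"val_loss": total_loss / total, "val_acc": total_correct / total}


def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    """Train for `epochs` epochs and return the validation result of each one."""
    optimizer = opt_func(model.parameters(), lr=lr)   # assumption: SGD by default
    history = []
    for epoch in range(epochs):
        model.train()
        for images, labels in train_loader:
            loss = F.cross_entropy(model(images), labels)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result["val_loss"], result["val_acc"]))
        history.append(result)
    return history


# Hypothetical stand-ins for the model and data, so the sketch runs on its own.
torch.manual_seed(0)
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
train_ds = TensorDataset(torch.rand(512, 1, 28, 28), torch.randint(0, 10, (512,)))
val_ds = TensorDataset(torch.rand(128, 1, 28, 28), torch.randint(0, 10, (128,)))
train_loader = DataLoader(train_ds, batch_size=128, shuffle=True)
val_loader = DataLoader(val_ds, batch_size=128)

initial_result = evaluate(model, val_loader)   # performance before any training
history = fit(2, 0.001, model, train_loader, val_loader)
```

Because the stand-in data is random noise, the numbers printed here carry no meaning; the point is the structure of the loop and the shape of the returned history.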

Configurations like batch size, learning rate etc. need to be picked in advance while training machine learning models, and are called hyper-parameters. Picking the right hyper-parameters is critical for training an accurate model within a reasonable amount of time, and is an active area of research and experimentation.

The initial accuracy is around 10%, which is what one might expect from a randomly initialized model (since it has a 1 in 10 chance of getting a label right by guessing randomly). Also note that we are using the .format method with the message string to print the values with only four digits after the decimal point.
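The formatting trick can be seen in isolation with made-up numbers. The {:.4f} placeholder rounds to four decimal places rather than simply cutting off the remaining digits.

```python
# Made-up validation results, for illustration only.
result = {"val_loss": 2.3012345, "val_acc": 0.10249}

message = "val_loss: {:.4f}, val_acc: {:.4f}".format(
    result["val_loss"], result["val_acc"])
print(message)   # val_loss: 2.3012, val_acc: 0.1025
```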

Now let’s train the model. We should pass the number of epochs.

After the first 5 epochs, our model reached an accuracy of 82% on the validation set. Let’s continue training to see if we can improve its accuracy.

Testing with individual images

Let’s test out our model with some images from the predefined test dataset of 10000 images. We begin by recreating the test dataset with the ToTensor transform.

img.unsqueeze simply adds another dimension at the beginning of the 1x28x28 tensor, making it a 1x1x28x28 tensor, which the model views as a batch containing a single image.

Define a helper function predict_image, which returns the predicted label for a single image tensor.
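A sketch of such a helper, under the assumption that the model accepts a batch of 1x28x28 images and returns one score per class. The untrained nn.Sequential model and the random image are stand-ins so the example runs on its own.

```python
import torch
import torch.nn as nn


def predict_image(img, model):
    """Return the predicted label (an int) for a single 1x28x28 image tensor."""
    xb = img.unsqueeze(0)              # 1x28x28 -> 1x1x28x28, a batch of one
    with torch.no_grad():
        yb = model(xb)                 # scores with shape (1, 10)
    return torch.argmax(yb, dim=1)[0].item()


# Untrained stand-in model and a random stand-in image.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
img = torch.rand(1, 28, 28)

print(img.unsqueeze(0).shape)        # torch.Size([1, 1, 28, 28])
print(predict_image(img, model))     # a digit from 0 to 9 (meaningless here, the model is untrained)
```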

Let’s take a look at the overall loss and accuracy of the model on the test set.

In conclusion, we can see that the model probably won’t cross the accuracy threshold of 86% even after training for a very long time. One possible reason for this is that the learning rate might be too high. It’s possible that the model’s parameters are “bouncing” around the optimal set of parameters that have the lowest loss. You can try reducing the learning rate and training for a few more epochs to see if it helps.
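The "bouncing" behavior is easiest to see on a toy problem. The sketch below is not the MNIST model: it runs plain gradient descent on the one-parameter function f(w) = w**2, chosen here purely for illustration, with a learning rate that is too high and one that is small enough.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimize f(w) = w**2 (gradient 2*w) and return every value of w visited."""
    path = [w]
    for _ in range(steps):
        w = w - lr * 2 * w
        path.append(w)
    return path


high = gradient_descent(lr=1.0)   # too high: jumps over the minimum at w = 0
low = gradient_descent(lr=0.1)    # lower: moves steadily toward w = 0

print(high[:5])        # [1.0, -1.0, 1.0, -1.0, 1.0]
print(abs(low[-1]))    # roughly 0.0115
```

With the high learning rate the parameter never gets closer to the minimum; it only flips from one side to the other. A real model has many parameters and a far less regular loss surface, so this is only an intuition for why lowering the learning rate can help.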

Thanks for reading and see you on the next one!

Leave a comment if you need the link to the notebook for this blog.