Logistic Regression With PyTorch

Rina Buoy
5 min read · May 6, 2019


From the course notes of Nando de Freitas (Machine Learning, 2014–2015)

This article looks into logistic regression (LR), one of the most popular ML algorithms. LR is a special case of an artificial neural network in which there is no hidden layer of neurons. It can be applied to binary and multi-class classification problems. LR is readily available in most machine learning packages (TensorFlow, Scikit-learn, ML.NET, etc.), so it is unlikely that you will ever need to code it from scratch. However, understanding exactly how it works will help you whenever you need to work with LR through those packages.

Understanding Logistic Regression

What LR tries to do is identify the ‘best’ class to which an input/feature vector belongs. To achieve that, LR computes the class probabilities, and the input is assigned to the class with the highest probability. Mathematically, we can express this as follows:

Binary case:

φ is a feature/input vector and w is a weight vector. σ is the logistic sigmoid function (which is where the name logistic regression comes from). The probability of class 2 given the feature φ then follows from the sum rule of probability.
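In Bishop's notation, these two class probabilities can be written as:

p(C_1 \mid \varphi) = \sigma(w^{T}\varphi), \qquad \sigma(a) = \frac{1}{1 + \exp(-a)}

p(C_2 \mid \varphi) = 1 - p(C_1 \mid \varphi)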

Multi-class case:

We replace the sigmoid function with the softmax transformation.

k denotes the k-th class.

The activation a_k is given by a linear combination of the input features.
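In Bishop's notation, the activation and the resulting softmax class probability are:

a_k = w_k^{T}\varphi, \qquad p(C_k \mid \varphi) = \frac{\exp(a_k)}{\sum_{j} \exp(a_j)}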

Likelihood, maximum likelihood and loss

For a given input vector, the likelihood function is given by:

Class probability (Udacity — Intro to PyTorch)

For n input vectors, the joint likelihood is the product of the per-sample class probabilities.
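With a 1-of-K target coding t_nk (1 if sample n belongs to class k and 0 otherwise) and y_nk = p(C_k | φ_n), this product reads, in Bishop's notation:

p(T \mid w_1, \ldots, w_K) = \prod_{n=1}^{N} \prod_{k=1}^{K} y_{nk}^{\,t_{nk}}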

Since the individual probabilities can be very small, the joint likelihood can run into numerical precision issues because it multiplies many small numbers together. To avoid this, we take the negative logarithm of the joint likelihood instead. This is called the negative log-likelihood (NLL). We can see that the NLL is basically the cross-entropy.
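Written out, the cross-entropy form of the NLL is:

E(w_1, \ldots, w_K) = -\ln p(T \mid w_1, \ldots, w_K) = -\sum_{n=1}^{N} \sum_{k=1}^{K} t_{nk} \ln y_{nk}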

The goal of LR is to find the weight matrix that maximizes the likelihood or, equivalently, minimizes the NLL.

There is no closed-form analytical solution for the weights that minimize the NLL, so we need to rely on an iterative optimization algorithm to locate the optimal set of weights.
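The simplest such algorithm is gradient descent, which repeatedly moves the weights in the direction of the negative gradient of the NLL with a learning rate η:

w^{(\tau+1)} = w^{(\tau)} - \eta \nabla E(w^{(\tau)})

This is essentially the update that the SGD optimizer applies per mini-batch in the PyTorch code below.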

Now that we are done with the theoretical part, let's see how we can implement LR in PyTorch.

LR in PyTorch

In fact, we could implement LR from scratch with just numpy. That would mean coding the LR model, batch management, gradient computation and weight updates ourselves. However, we'll show that this can be done much more easily in PyTorch. PyTorch is a relatively new deep learning library built on a powerful autograd engine, with many handy support modules that facilitate setting up deep learning models, loading and processing data, and training.
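To make the comparison concrete, here is a minimal from-scratch sketch of the binary case, trained with plain gradient descent on the NLL using numpy; the function name train_lr and the hyperparameter values are purely illustrative:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_lr(X, y, lr=0.1, epochs=100):
    # X: (n, d) feature matrix, y: (n,) binary labels in {0, 1}
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)        # predicted probability of class 1
        grad_w = X.T @ (p - y) / n    # gradient of the mean NLL w.r.t. w
        grad_b = np.mean(p - y)       # gradient w.r.t. the bias
        w -= lr * grad_w              # gradient-descent update
        b -= lr * grad_b
    return w, b

PyTorch spares us the manual gradient derivation and batching: autograd computes the gradients and DataLoader handles the batches.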

In this section, we'll demonstrate an LR implementation in PyTorch using the well-known MNIST dataset, which consists of greyscale images of handwritten digits. Each image is 28x28 pixels and there are 10 output classes (0 to 9). So in this classification problem, we have 784x10 weights to be inferred from the training data, which is quite a lot. In practice, we might opt for a CNN instead for the sake of speed and accuracy.

We start by importing the required libraries.

import torch
import numpy as np
import matplotlib.pyplot as plt
from torch import nn
from torch import optim
from torchvision import datasets, transforms

We want to convert the input pixels into tensors and standardize them, so we create a transform object.

# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

We use datasets to download the MNIST dataset and DataLoader to manage training batches.

trainset = datasets.MNIST('~/.pytorch/MNIST_data/', download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=64, shuffle=True)

We set a batch size of 64 and randomize the order in which samples are drawn each epoch by setting shuffle=True.

Next, we set up a logistic regression model which takes an input vector of size 784 and produces an output vector of size 10. We take advantage of the nn.Sequential module in PyTorch to do so.

# Build a logistic regression model
model = nn.Sequential(nn.Linear(784, 10),
                      nn.LogSoftmax(dim=1))

Then, we create the loss and optimizer objects. In this case, we use the negative log-likelihood loss function and the stochastic gradient descent optimizer, both of which are available in PyTorch.

# Define the loss
criterion = nn.NLLLoss()

# Optimizers require the parameters to optimize and a learning rate
optimizer = optim.SGD(model.parameters(), lr=0.01)

Now, we are ready to set up a batch-training loop. Basically, for each epoch and for each training batch, we compute the gradients of the weights, and the optimizer uses the computed gradients to update the weights. The process repeats until the maximum number of epochs is reached.

epochs = 10
for e in range(epochs):
    running_loss = 0
    for images, labels in trainloader:
        # Flatten MNIST images into a 784-long vector
        images = images.view(images.shape[0], -1)
        optimizer.zero_grad()  # empty the gradients, otherwise they are accumulated
        output = model(images)
        loss = criterion(output, labels)
        loss.backward()   # autograd computes the gradients
        optimizer.step()  # update the weights
        running_loss += loss.item()
    else:
        print(f"Training epoch {e} : loss: {running_loss/len(trainloader)}")

Here are the training results:

With the model trained, let's check out a prediction on one of the training images. The code below calculates the class probabilities for one of the training images. Then, we plot the grey-scale image along with a bar chart of the class probabilities.

images, labels = next(iter(trainloader))
img = images[0].view(1, 784)

# We don't need gradients here, so turn them off
with torch.no_grad():
    logps = model(img)

# The network outputs log-probabilities; take the exponential to get probabilities
ps = torch.exp(logps)

fig1, (ax1, ax2) = plt.subplots(1, 2)
ax1.imshow(images[0].numpy().squeeze(), cmap='Greys_r')
ax2.bar(np.arange(10), ps.numpy().squeeze(), align='center', alpha=0.5)
plt.show()

Here is what we get:

Another image:

Wrapping Up

There are many sophisticated machine learning algorithms, and LR is not a fancy algorithm compared to those. However, its theoretical fundamentals are fascinating and touch many core concepts of machine learning. I personally learnt a lot from writing this article. Having a solid understanding of how LR works will be helpful should we ever need to work with LR through different machine learning packages.

Most of the equations in this article are from Bishop's Pattern Recognition and Machine Learning.


Rina Buoy

An applied NLP researcher at Techo Startup Center (TSC)