Image classification (American Sign language) using PyTorch

Sachin Som
9 min read · Jul 1, 2020


This is part of a course project conducted by jovian.ml with freeCodeCamp. In this project, I use the Sign Language MNIST dataset to classify sign language images with three different models: logistic regression, a feed-forward neural network, and a convolutional neural network.

Sign Language MNIST Dataset:

The American Sign Language letter database of hand gestures represents a multi-class problem with 24 classes of letters (excluding J and Z, which require motion).

The dataset format is patterned to match the classic MNIST closely. Each training and test case represents a label (0–25) as a one-to-one map for each alphabetic letter A–Z (with no cases for 9=J or 25=Z because of their gesture motions). The training data (27,455 cases) and test data (7,172 cases) are approximately half the size of the standard MNIST but otherwise similar, with a header row of label, pixel1, pixel2, … pixel784; each row represents a single 28x28 pixel image with grayscale values between 0 and 255. The original hand gesture image data represented multiple users repeating the gestures against different backgrounds. The Sign Language MNIST data was created by greatly extending a small set (1,704) of color images that were not cropped around the hand region of interest.

Preparing The Data:

First, I will import some libraries that I will use throughout this project:
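Here is a minimal set of imports that covers everything used below (the original notebook may include a few more, e.g. matplotlib for plotting):

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import TensorDataset, DataLoader, random_split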

Now we will load our CSV files. For that, we define two dataframes, one for the training data and one for the test data:
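A sketch of this step, assuming the Kaggle CSVs sit in the working directory (the paths are placeholders; point them at your copy of the dataset):

# Paths are placeholders; adjust to your copy of the Kaggle dataset
train_df = pd.read_csv('sign_mnist_train.csv')
test_df = pd.read_csv('sign_mnist_test.csv')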

The next step is to convert the dataframes into NumPy arrays:
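Something along these lines works, splitting the label column from the 784 pixel columns (the variable names here are mine):

# First column is the label; the remaining 784 columns are pixel values
train_labels = train_df['label'].values
train_images = train_df.drop('label', axis=1).values.astype(np.float32)
test_labels = test_df['label'].values
test_images = test_df.drop('label', axis=1).values.astype(np.float32)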

The next step is to convert all of the NumPy arrays into PyTorch tensors:
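A sketch of the conversion: each flat 784-pixel row is reshaped into a 1x28x28 image tensor, and images and labels are bundled into TensorDatasets (the names train_ds_full and test_ds match the variables used later in the post):

# Reshape flat rows into (channels, height, width) image tensors
train_x = torch.from_numpy(train_images).reshape(-1, 1, 28, 28)
train_y = torch.from_numpy(train_labels).long()
test_x = torch.from_numpy(test_images).reshape(-1, 1, 28, 28)
test_y = torch.from_numpy(test_labels).long()

train_ds_full = TensorDataset(train_x, train_y)
test_ds = TensorDataset(test_x, test_y)

print(train_x[0].shape)   # torch.Size([1, 28, 28])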

We can see that each image has been converted into a 3-dimensional tensor of shape (1, 28, 28). The first dimension is the number of channels; the second and third are the height and width of the image, in this case 28px by 28px.

Now we will define the hyperparameters and constants for our model. Note that although only 24 letters actually appear in the data (J and Z are excluded), the labels run from 0 to 25, so we keep num_classes = 26 to preserve the original label indices.

# Hyperparameters
batch_size = 64
learning_rate = 0.001

# Other constants
in_channels = 1
input_size = in_channels * 28 * 28
num_classes = 26

Training and validation datasets

Now we are going to use three datasets:

  1. Training set — used to train the model (compute the loss and adjust the weights of the model using gradient descent).
  2. Validation set — used to evaluate the model while training, adjust hyperparameters (learning rate etc.) and pick the best version of the model.
  3. Test set — used to compare different models, or different types of modeling approaches, and report the final accuracy of the model.
val_size = 7455
train_size = len(train_ds_full) - val_size

train_ds, val_ds = random_split(train_ds_full, [train_size, val_size])
len(train_ds), len(val_ds), len(test_ds)

Out:

(20000, 7455, 7172)

Now we will load the training, validation, and test datasets in batches:

train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=4, pin_memory=True)
val_dl = DataLoader(val_ds, batch_size*2, num_workers=4, pin_memory=True)
test_dl = DataLoader(test_ds, batch_size*2, num_workers=4, pin_memory=True)

for img, label in train_dl:
    print(img.size())
    break

torch.Size([64, 1, 28, 28])

Models for image classification

We are going to create three different models for this project:

  1. Logistic Regression
  2. Deep Neural Network
  3. Convolutional Neural Network

Logistic regression

class ASLModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(input_size, num_classes)

    def forward(self, xb):
        xb = xb.reshape(-1, in_channels*28*28)
        out = self.linear(xb)
        return out

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc.detach()}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))

model = ASLModel()

Training The Model

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history

result0 = evaluate(model, val_dl)
result0

Out:

{'val_loss': 163.75135803222656, 'val_acc': 0.040651481598615646}

The initial accuracy is around 4%, which is roughly what one might expect from a randomly initialized model (with 26 classes, random guessing has about a 1-in-26, i.e. roughly 3.8%, chance of getting a label right). Also note that we are using the .format method in epoch_end to print only the first four digits after the decimal point.

We are now ready to train the model. Let's train in rounds of 10 epochs and look at the results (the run below is the fourth such round, with a much-reduced learning rate; the earlier rounds are omitted for brevity):

history4 = fit(10, 0.000001, model, train_dl, val_dl)

Epoch [0], val_loss: 10.3801, val_acc: 0.9426
Epoch [1], val_loss: 10.3712, val_acc: 0.9425
Epoch [2], val_loss: 10.3667, val_acc: 0.9421
Epoch [3], val_loss: 10.3638, val_acc: 0.9422
Epoch [4], val_loss: 10.3586, val_acc: 0.9425
Epoch [5], val_loss: 10.3527, val_acc: 0.9424
Epoch [6], val_loss: 10.3484, val_acc: 0.9421
Epoch [7], val_loss: 10.3414, val_acc: 0.9424
Epoch [8], val_loss: 10.3342, val_acc: 0.9425
Epoch [9], val_loss: 10.3324, val_acc: 0.9428

So after about 40 epochs in total, we went from 4% accuracy to roughly 94%, which is quite remarkable for a plain linear model.

Deep Neural Network

A deep neural network (DNN) is an artificial neural network (ANN) with multiple layers between the input and output layers. The DNN finds the correct mathematical manipulation to turn the input into the output, whether it be a linear relationship or a non-linear relationship.

Defining the model

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ASLModel2(nn.Module):
    """Feedforward neural network with 3 hidden layers"""
    def __init__(self, in_size, out_size):
        super().__init__()
        # hidden layer 1
        self.linear1 = nn.Linear(in_size, 512)
        # hidden layer 2
        self.linear2 = nn.Linear(512, 256)
        # hidden layer 3
        self.linear3 = nn.Linear(256, 128)
        # output layer
        self.linear4 = nn.Linear(128, out_size)

    def forward(self, xb):
        # Flatten the image tensors
        out = xb.view(xb.size(0), -1)
        # Get intermediate outputs using hidden layer 1, then apply activation
        out = F.relu(self.linear1(out))
        # Get intermediate outputs using hidden layer 2, then apply activation
        out = F.relu(self.linear2(out))
        # Get intermediate outputs using hidden layer 3, then apply activation
        out = F.relu(self.linear3(out))
        # Get predictions using output layer
        out = self.linear4(out)
        return out

    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))

Using a GPU

To train on the GPU, we need a couple of utility functions for moving our data and model to the right device, so let's define them:

torch.cuda.is_available()

Out:

True

def get_default_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

device = get_default_device()
device

Out:

device(type='cuda')

def to_device(data, device):
    """Move tensor(s) to chosen device"""
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    """Wrap a dataloader to move data to a device"""
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        """Yield a batch of data after moving it to device"""
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        """Number of batches"""
        return len(self.dl)

train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
test_dl = DeviceDataLoader(test_dl, device)
print(train_dl.device)
print(test_dl.device)
print(val_dl.device)

cuda
cuda
cuda

Training the Model

def evaluate(model, val_loader):
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history

input_size, num_classes

Out:

(784, 26)

model = ASLModel2(input_size, out_size=num_classes)
# Load the model onto the GPU
model = to_device(model, device)
model

Out:

ASLModel2(
  (linear1): Linear(in_features=784, out_features=512, bias=True)
  (linear2): Linear(in_features=512, out_features=256, bias=True)
  (linear3): Linear(in_features=256, out_features=128, bias=True)
  (linear4): Linear(in_features=128, out_features=26, bias=True)
)
history = [evaluate(model, val_dl)]
history

Out:

[{'val_loss': 14.12060546875, 'val_acc': 0.041877392679452896}]

So initially this model has an accuracy of only about 4%, essentially random guessing. To improve it, we will train for a number of epochs:


history += fit(10, .001, model, train_dl, val_dl)

Epoch [0], val_loss: 1.9782, val_acc: 0.3867
Epoch [1], val_loss: 1.3152, val_acc: 0.5732
Epoch [2], val_loss: 1.0640, val_acc: 0.6374
Epoch [3], val_loss: 0.8769, val_acc: 0.6941
Epoch [4], val_loss: 0.6305, val_acc: 0.7931
Epoch [5], val_loss: 0.5267, val_acc: 0.8190
Epoch [6], val_loss: 0.3588, val_acc: 0.8940
Epoch [7], val_loss: 0.1764, val_acc: 0.9652
Epoch [8], val_loss: 0.2343, val_acc: 0.9314
Epoch [9], val_loss: 0.1089, val_acc: 0.9845

Testing on the test dataloader

result = evaluate(model, test_dl)
result

Out:

{'val_loss': 0.8095934987068176, 'val_acc': 0.7504112124443054}

So with the DNN we got about 75% accuracy on the test set, noticeably lower than its validation accuracy, which suggests the model does not generalize perfectly to unseen data.

Convolutional Neural Network

In Deep Learning, a convolutional neural network (CNN, or ConvNet) is a class of deep neural networks, most commonly applied to analyzing visual imagery. They are also known as shift invariant or space invariant artificial neural networks (SIANN), based on their shared-weights architecture and translation invariance characteristics. They have applications in image and video recognition, recommender systems, image classification, medical image analysis, natural language processing, and financial time series.

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ASLBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)                   # Generate predictions
        loss = F.cross_entropy(out, labels)  # Calculate loss
        acc = accuracy(out, labels)          # Calculate accuracy
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()     # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], train_loss: {:.4f}, val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['train_loss'], result['val_loss'], result['val_acc']))


class ASLCNNModel(ASLBase):
    def __init__(self, in_channels, num_classes):
        super().__init__()
        self.network = nn.Sequential(
            nn.Conv2d(in_channels, 28, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(28, 28, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output size: 28 x 14 x 14

            nn.Conv2d(28, 56, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.Conv2d(56, 56, kernel_size=3, stride=1, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),  # output size: 56 x 7 x 7

            nn.Flatten(),
            nn.Linear(56*7*7, 512),
            nn.ReLU(),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes))

    def forward(self, xb):
        return self.network(xb)

model = ASLCNNModel(in_channels, num_classes)

Now we again wrap our data loaders and move the new model onto the GPU (this assumes the DataLoaders were re-created beforehand; wrapping an already-wrapped DeviceDataLoader would be redundant):

train_dl = DeviceDataLoader(train_dl, device)
val_dl = DeviceDataLoader(val_dl, device)
test_dl = DeviceDataLoader(test_dl, device)
to_device(model, device);

Train the model:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        # Training Phase
        model.train()
        train_losses = []
        for batch in train_loader:
            loss = model.training_step(batch)
            train_losses.append(loss)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        # Validation phase
        result = evaluate(model, val_loader)
        result['train_loss'] = torch.stack(train_losses).mean().item()
        model.epoch_end(epoch, result)
        history.append(result)
    return history

Evaluating our model:

evaluate(model, val_dl)

Out:

{'val_loss': 3.2676336765289307, 'val_acc': 0.04229172319173813}

Now, to improve this accuracy, we train for 10 epochs. The variables num_epochs and opt_func were not defined earlier in the post, so here is one plausible setup (Adam is my assumption; the original optimizer choice isn't shown):

num_epochs = 10
opt_func = torch.optim.Adam  # assumed; the original notebook's choice isn't shown
history = fit(num_epochs, 0.001, model, train_dl, val_dl, opt_func)

Epoch [0], train_loss: 0.8871, val_loss: 0.0532, val_acc: 0.9804
Epoch [1], train_loss: 0.0219, val_loss: 0.0454, val_acc: 0.9852
Epoch [2], train_loss: 0.0154, val_loss: 0.0004, val_acc: 1.0000
Epoch [3], train_loss: 0.0001, val_loss: 0.0001, val_acc: 1.0000
Epoch [4], train_loss: 0.0000, val_loss: 0.0001, val_acc: 1.0000
Epoch [5], train_loss: 0.0000, val_loss: 0.0001, val_acc: 1.0000
Epoch [6], train_loss: 0.0000, val_loss: 0.0001, val_acc: 1.0000
Epoch [7], train_loss: 0.0000, val_loss: 0.0001, val_acc: 1.0000
Epoch [8], train_loss: 0.0000, val_loss: 0.0000, val_acc: 1.0000
Epoch [9], train_loss: 0.0000, val_loss: 0.0000, val_acc: 1.0000

Testing with test data

result = evaluate(model, test_dl)
result

Out:

{'val_loss': 0.36439788341522217, 'val_acc': 0.9439418911933899}

We are now coming to the end of the post. Before wrapping up, we will predict some of our test images and compare the results with their original labels. For this, we define a function:

def predict_image(img, model):
    # Convert to a batch of 1
    xb = to_device(img.unsqueeze(0), device)
    # Get predictions from model
    yb = model(xb)
    # Pick index with highest probability
    _, preds = torch.max(yb, dim=1)
    # Retrieve the class label
    return preds[0].item()
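As a quick sanity check, something like the following (a hypothetical usage; any test index works) compares a single prediction against its ground-truth label:

# Compare one prediction with the true label (index 0 chosen arbitrarily)
img, label = test_ds[0]
print('Label:', label.item(), ', Predicted:', predict_image(img, model))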

Thank you!

Entire notebook link:- https://jovian.ml/sachinsom507/final-project-sign-language-prediction
