Image Classification with K Nearest Neighbours

Paarth Bir
Published in The Startup · Jul 26, 2019

K-Nearest Neighbours (k-NN) is a supervised machine learning algorithm, i.e. it learns from a labelled training set: it takes in the training data X along with its labels y and learns to map the input X to its desired output y.

The k-NN algorithm is arguably the simplest of the machine learning algorithms. The model consists of nothing but the training data: it simply memorises the entire training set and, for prediction, outputs the majority class among the ‘k’ nearest neighbours, computed according to some distance metric.

A taste of k-NN (From scikit-learn documentation)

In a little more detail, the algorithm works as follows:

Once the model has stored the training set, prediction works by taking a test image, calculating its distance to every image in the training set and obtaining the ‘k’ training images closest to it. The model then outputs a class based on some voting procedure over the labels of these ‘k’ neighbours, generally a majority vote.

The distance metric used may vary. One option is the L1 distance, which is the sum of the absolute differences between the corresponding pixels of the two images.

L1 distance metric
An example of how the L1 distance works.
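
Written out, the L1 (Manhattan) distance between two images I1 and I2, compared pixel by pixel, is:

d1(I1, I2) = Σ_p | I1(p) − I2(p) |

where the sum runs over all pixels p. For example, for the tiny flattened images [10, 20, 30] and [12, 25, 30], the L1 distance is |10−12| + |20−25| + |30−30| = 7.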

An alternative distance metric may be the L2 distance or more commonly called the Euclidean distance.

L2 distance metric
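
Written as a formula, with the sum again running over all pixels p:

d2(I1, I2) = sqrt( Σ_p ( I1(p) − I2(p) )² )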

In other words we would be computing the pixel-wise difference as before, but this time we square all of them, add them up and finally take the square root.

It is interesting to note that, because of the squared differences, the L2 distance penalises large pixel differences much more heavily than the L1 distance does: it prefers many small disagreements over one big one.
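
As a quick sanity check (a minimal sketch, separate from the classifier built below), the effect is easy to see with NumPy on two toy pixel-difference patterns with the same total disagreement:

import numpy as np

# Two difference patterns that both sum to 40 in absolute value:
spread_out = np.array([10., 10., 10., 10.])  # many small pixel differences
one_big    = np.array([40.,  0.,  0.,  0.])  # one large pixel difference

print(np.abs(spread_out).sum(), np.abs(one_big).sum())                  # 40.0 40.0 (same L1)
print(np.sqrt((spread_out**2).sum()), np.sqrt((one_big**2).sum()))      # 20.0 40.0 (L2 punishes the big one)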

Now, we move onto the practical considerations: Hyperparameters in k-NN and how they affect the performance.

As k-NN is a very simple algorithm, it doesn’t have many hyperparameters to tweak, just two: the distance metric and the value of ‘k’. So what we can do is run our model for various values of ‘k’ and keep the one with the best validation accuracy, which is then evaluated once on the test set.

Code

We’ll be using a subset of the Kaggle Cats and Dogs dataset, arranged in a train directory with two sub-directories: class 0 (Cats) and class 1 (Dogs), each containing roughly 2,000 images.
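
The exact folder names are not critical (the loader below simply walks whatever class sub-directories it finds), but the code assumes a layout along these lines; the Cat/Dog names are illustrative, taken from the test path used later in Code Block 7:

data/
├── train/
│   ├── Cat/   # class 0, ~2000 images
│   └── Dog/   # class 1, ~2100 images
└── test/
    └── Dog/   # a held-out image is loaded from here in Code Block 7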

Code Block 1: Importing useful libraries

import warnings

import numpy as np
import pandas as pd
import cv2 as cv
import matplotlib.pyplot as plt
from pathlib import Path

from keras.preprocessing import image
from skimage.io import imread
from sklearn.model_selection import GridSearchCV, train_test_split

warnings.filterwarnings('ignore')
pd.set_option('display.max_columns', 30*30 + 1)
print("Files imported successfully")

Code Block 2: Preparing X, y

def load_image_files(container_path, dimension=(50, 50)):
    """Load all images under container_path, convert them to grayscale,
    resize them to `dimension` and return them as a single numpy array."""
    image_dir = Path(container_path)
    folders = [directory for directory in image_dir.iterdir() if directory.is_dir()]

    train_img = []
    for i, direc in enumerate(folders):
        for file in direc.iterdir():
            img = imread(file)
            img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
            img_pred = cv.resize(img, dimension, interpolation=cv.INTER_AREA)
            img_pred = image.img_to_array(img_pred)
            img_pred = img_pred / 255
            train_img.append(img_pred)

    X = np.array(train_img)
    return X

X = load_image_files("data/train")

y0 = np.zeros(2000)   # 2000 is the number of Cat images in X
y1 = np.ones(2134)    # 2134 is the number of Dog images in X
y = np.concatenate((y0, y1), axis=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
X_val, X_test, y_val, y_test = train_test_split(X_test, y_test, random_state=42, test_size=0.5)

print("X_train: " + str(X_train.shape))
print("X_test: " + str(X_test.shape))
print("X_val: " + str(X_val.shape))
print("y_train: " + str(y_train.shape))
print("y_test: " + str(y_test.shape))
print("y_val: " + str(y_val.shape))

num_training = X_train.shape[0]
mask = list(range(num_training))
X_train = X_train[mask]
y_train = y_train[mask]

num_test = X_test.shape[0]
mask = list(range(num_test))
X_test = X_test[mask]
y_test = y_test[mask]

num_val = X_val.shape[0]
mask = list(range(num_val))
X_val = X_val[mask]
y_val = y_val[mask]

# Reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))

print("X_train: "+str(X_train.shape))
print("X_test: "+str(X_test.shape))
print("X_val: "+str(X_val.shape))
print("y_train: "+str(y_train.shape))
print("y_test: "+str(y_test.shape))
print("y_val: "+str(y_val.shape))

Output:

X_train: (3307, 50, 50, 1)
X_test: (414, 50, 50, 1)
X_val: (413, 50, 50, 1)
y_train: (3307,)
y_test: (414,)
y_val: (413,)

X_train: (3307, 2500)
X_test: (414, 2500)
X_val: (413, 2500)
y_train: (3307,)
y_test: (414,)
y_val: (413,)

Explanation:

We iterate over all images in the data/train directory, convert each image to grayscale and resize it to a fixed size of (50, 50). Using the Keras pre-processing library, each image is converted to an array and normalised to the range [0, 1]. The resulting arrays are stacked into a single numpy array X.

To generate ‘y’, we form a numpy array of zeros and ones (the two categories: Cats and Dogs) according to the number of samples in each class.

Using train_test_split from scikit-learn, ‘X’ and ‘y’ are distributed into (X_train, y_train), (X_val, y_val) and (X_test, y_test).
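
Hard-coding the class counts (2000 and 2134) is brittle. As a sketch, assuming the same data/train layout with one image-filled sub-directory per class, the labels could instead be derived from the directory structure, in the same folder order that load_image_files walks:

from pathlib import Path
import numpy as np

train_dir = Path("data/train")
# Same (unsorted) folder order that load_image_files uses above
folders = [d for d in train_dir.iterdir() if d.is_dir()]

# Label each class folder with its index: 0 for the first folder, 1 for the second, ...
y = np.concatenate([
    np.full(len(list(folder.iterdir())), label, dtype=float)
    for label, folder in enumerate(folders)
])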

Code Block 3: k-NN class definition

class KNearestNeighbor(object):
    """ a kNN classifier with L2 distance """

    def __init__(self):
        pass

    def train(self, X, y):
        """
        Train the classifier. For k-nearest neighbors this is just
        memorizing the training data.

        Inputs:
        - X: A numpy array of shape (num_train, D) containing the training data
          consisting of num_train samples each of dimension D.
        - y: A numpy array of shape (num_train,) containing the training labels,
          where y[i] is the label for X[i].
        """
        self.X_train = X
        self.y_train = y

    def predict(self, X, k=1):
        """
        Predict labels for test data using this classifier.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data consisting
          of num_test samples each of dimension D.
        - k: The number of nearest neighbors that vote for the predicted labels.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        dists = self.compute_distances_no_loops(X)
        return self.predict_labels(dists, k=k)

    def compute_distances_no_loops(self, X):
        """
        Compute the distance between each test point in X and each training point
        in self.X_train using no explicit loops.

        Inputs:
        - X: A numpy array of shape (num_test, D) containing test data.

        Returns:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          is the L2 distance between the ith test point and the jth training point.
        """
        # ||x - y||^2 = ||x||^2 + ||y||^2 - 2 x.y, computed for all pairs at once
        dists = np.sqrt((X ** 2).sum(axis=1, keepdims=True)
                        + (self.X_train ** 2).sum(axis=1)
                        - 2 * X.dot(self.X_train.T))
        return dists

    def predict_labels(self, dists, k=1):
        """
        Given a matrix of distances between test points and training points,
        predict a label for each test point.

        Inputs:
        - dists: A numpy array of shape (num_test, num_train) where dists[i, j]
          gives the distance between the ith test point and the jth training point.

        Returns:
        - y: A numpy array of shape (num_test,) containing predicted labels for the
          test data, where y[i] is the predicted label for the test point X[i].
        """
        num_test = dists.shape[0]
        y_pred = np.zeros(num_test)
        for i in range(num_test):
            # Labels of the k nearest neighbors to the ith test point
            closest_y = self.y_train[np.argsort(dists[i])][0:k]
            closest_y = closest_y.astype(int)
            # Majority vote, with ties broken in favour of the smaller label
            y_pred[i] = np.bincount(closest_y).argmax()
        return y_pred

Explanation:

The class KNearestNeighbor is defined with methods to train, predict and compute distances. In the train method, the model simply stores X_train and y_train. For computing distances, the L2 metric is used; compute_distances_no_loops vectorises the computation by expanding ||x − y||² = ||x||² + ||y||² − 2·x·y, so that all test-to-train distances come from a single matrix multiplication plus two squared-norm terms.
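
If you want to convince yourself that the vectorised expression is correct, here is a quick illustrative check on random data, comparing it against a naive double loop:

import numpy as np

rng = np.random.default_rng(0)
A = rng.random((5, 2500))   # 5 fake "test" rows
B = rng.random((7, 2500))   # 7 fake "train" rows

# Vectorised: ||a||^2 + ||b||^2 - 2 a.b for every (test, train) pair
fast = np.sqrt((A**2).sum(axis=1, keepdims=True) + (B**2).sum(axis=1) - 2 * A.dot(B.T))

# Naive double loop for comparison
slow = np.array([[np.sqrt(((a - b)**2).sum()) for b in B] for a in A])

print(np.allclose(fast, slow))  # True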

Code Block 4: Testing our implementation

print("Val Accuracy for k=1")
classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
dists = classifier.compute_distances_no_loops(X_val)
y_val_pred = classifier.predict_labels(dists, k=1)
num_correct = np.sum(y_val_pred == y_val)
accuracy = float(num_correct) / num_val
print('Got %d / %d correct => accuracy: %f' % (num_correct, num_val, accuracy))

Output:

Val Accuracy for k=1
Got 213 / 413 correct => accuracy: 0.515738

Explanation:

A validation accuracy of 51.57% is obtained using a single nearest neighbour (k=1).

Code Block 5: Finding the best ‘k’

print("Using SKLEARN")
lix = []
liy = []
index=0
acc=0
from sklearn.neighbors import KNeighborsClassifier
for k in range(1, 100):
neigh = KNeighborsClassifier(n_neighbors=k)
neigh.fit(X_train, y_train)
liy.append(neigh.score(X_val, y_val))
if liy[k-1]>acc:
acc=liy[k-1]
index=k-1
lix.append(k)

plt.plot(lix, liy)
plt.show()
print("max acc at k="+str(index+1)+" acc of "+str(acc))

Output:

Using SKLEARN

A plot of Validation accuracy for various values of k.

max acc at k=43 acc of 0.6101694915254238

Explanation:

Using KNeighborsClassifier from the sklearn library, a loop is run over values of k from 1 to 99 and the value that gives the maximum validation accuracy is selected. The best value is obtained as k=43, with a validation accuracy of about 61.02%.
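
As an aside, GridSearchCV is imported in Code Block 1 but never actually used. A rough alternative sketch of the same search would let scikit-learn cross-validate over k on the training set (note this uses cross-validation folds rather than the fixed validation split above, so the selected k may differ slightly):

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

param_grid = {'n_neighbors': list(range(1, 100))}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)   # the k with the best mean cross-validated accuracy
print(search.best_score_)    # that mean cross-validated accuracy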

Code Block 6: Finding Test Accuracy for best k

neigh = KNeighborsClassifier(n_neighbors=43)
neigh.fit(X_train, y_train)
print("Test Accuracy: "+str(neigh.score(X_test, y_test)))

print("Using our own k-NN")
classifier = KNearestNeighbor()
classifier.train(X_train, y_train)
dists = classifier.compute_distances_no_loops(X_test)
y_test_pred = classifier.predict_labels(dists, k=43)
num_correct = np.sum(y_test_pred == y_test)
accuracy = float(num_correct) / num_test
print('With k = 43 Got %d / %d correct => accuracy: %f' % (num_correct, num_test, accuracy))

Output:

Test Accuracy: 0.5917874396135265
Using our own k-NN
With k = 43 Got 245 / 414 correct => accuracy: 0.591787

Explanation:

The accuracy of the algorithm is determined for k = 43, using both the scikit-learn kNN and our own kNN implementation. The same test accuracy of 59.17% is observed in both cases.

Code Block 7: Predicting an Image

print("Predicting custom image")
img = cv.imread("data/test/Dog/12.jpg")
img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
img_pred = cv.resize(img, (50, 50), interpolation=cv.INTER_AREA)
img_pred = image.img_to_array(img_pred)
img_pred = img_pred/255
img_pred = np.reshape(img_pred, (1, img_pred.shape[0]*img_pred.shape[1]))

classifier2 = KNearestNeighbor()
classifier2.train(X_train, y_train)
# Test your implementation:
dists2 = classifier2.compute_distances_no_loops(img_pred)
labels = ["Cat", "Dog"]
y_test_pred = classifier2.predict_labels(dists2, k=43)
print(labels[int(y_test_pred[0])])

The image used for prediction.

Output:

Predicting custom image
Dog

Explanation:

Finally, the kNN classifier is used to classify a single new input image into one of the two categories.

Conclusion:

The k-NN algorithm gives a test accuracy of 59.17% on the Cats and Dogs dataset, only a bit better than random guessing (50%) and far from human performance (~95%). The k-Nearest Neighbour is hence “near” only in its etymology.

A large part of the code is influenced by the assignment related to k-NN for the course CS231n offered by Stanford University. I find it an extremely informative course on Computer Vision with Deep Learning.
