Histogram of Oriented Gradients (HOG) for Multiclass Image Classification and Image Recommendation

Anirban Malick · The Startup · Jul 15, 2020

Introduction:

The magic of machine learning is that the more we understand the underlying concepts and where an idea originated, the easier it becomes to apply. In this article we look at using Histogram of Oriented Gradients (HOG) for image classification and image recommendation. The data can be found here and the Jupyter Notebook containing the solution can be found here.

The Dataset:

Source: Kaggle Fashion Image Classification Dataset (Small)

The dataset has Master Category, Sub Category, Gender, Season and Usage type labelled for each image. The idea is to use the dataset both for image classification and for a recommendation engine. Let’s look at the distribution first!

Unique values for each column. For each of the gender, masterCategory, subCategory, usage and season columns, KNN classifiers were used for image classification, followed by K Nearest Neighbours for image recommendation.

The objective of this design was to come up with a solution that classifies all the categories for the different classes (the classes and their distributions are shown in the charts below). Following that, a recommendation engine was built to return the top n matching images for a test image selected by the user.

Number of records for different classes (only the top 10 are shown) under each column

The classification and recommendation are built on a local feature extraction and description method called Histogram of Oriented Gradients (HOG). With feature detectors (e.g. SIFT, Shi-Tomasi, ORB, FAST) we can localize features and match them across multiple images. But in order to use this information for training a model, we need the extracted features in a fixed-length 1D vector form (like [x1, x2, …, xn]). The idea of HOG (“Histogram of Oriented Gradients for Human Detection”, Dalal & Triggs, 2005) was built on the same intuition. Let’s see below how HOG works and how we can compute and configure it in Python.

Note: HOG was originally proposed by Dalal & Triggs (2005), who tuned its parameters for best performance in human detection. However, the parameters are not generic and may vary across problems, depending on the object and image type.
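To see why a fixed-length descriptor matters, here is a minimal sketch contrasting a keypoint descriptor (ORB) with HOG; 'sample.jpg' is a hypothetical path:

import cv2 as cv
from skimage.feature import hog

img = cv.imread('sample.jpg', cv.IMREAD_GRAYSCALE)

# ORB returns a variable number of keypoints per image, so the descriptor
# matrix shape differs from image to image
orb = cv.ORB_create()
keypoints, descriptors = orb.detectAndCompute(img, None)
print(descriptors.shape)   # (n_keypoints, 32), n_keypoints varies

# HOG yields one fixed-length 1D vector for a fixed image size
fd = hog(img, orientations=8, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
print(fd.shape)            # same length for every image of the same shape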

Steps for Computing HOG:

HOG is a technique that transforms an image into histograms of gradients and then concatenates those histograms into a 1D feature vector that can be used for training a model.

Before we compute let’s import first!

import os
import numpy as np
import pandas as pd
import cv2 as cv
from pathlib import Path
import warnings
from skimage.feature import hog
from tqdm import tqdm
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler, StandardScaler
warnings.filterwarnings("ignore")
pd.options.display.max_columns = None

and then read the images!

all_images = []
def load_image(ids, path=image_folder):
    # image_folder points to the directory containing the .jpg files
    img = cv.imread(path + ids + '.jpg', cv.IMREAD_GRAYSCALE)  # load in grayscale
    return img, ids

# 20k samples were taken for modeling
for ids in tqdm(list(styles.id)[:20000]):
    img, ids = load_image(str(ids))
    if img is not None:
        all_images.append([img, int(ids)])
len(all_images)

Now let’s consider the below image,

Let us assume the red coloured box represents an 8x8 matrix in grayscale, with a number in each cell. Before stepping into any feature engineering for images, it is always recommended to do the following:

1. Resize: Resize all the images to a single shape in order to avoid computation issues later. In this case, all the images were already of shape (60x80). If you do need to resize, refer below:

def resize_image(img, ids):
    # resize to width 60, height 80 using bilinear interpolation
    return cv.resize(img, (60, 80), interpolation=cv.INTER_LINEAR)

all_images_resized = [[resize_image(x, y), y] for x, y in all_images]
len(all_images_resized)

2. Normalize: To remove brightness, contrast or other illumination effects.

3. Filtering: To take a few neighbouring pixels into account instead of trusting a single pixel value as the true value. Gaussian filtering gives the highest weight to the central pixel, with the weights of neighbouring pixels decreasing with their distance from the centre, based on the size of the window. A short sketch of steps 2 and 3 follows below.
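Here is a minimal sketch of steps 2 and 3, assuming a grayscale uint8 image img (the exact normalization used may differ; cv.normalize is one common choice):

import cv2 as cv

# 2. Normalize: stretch intensities to the full 0-255 range to reduce
#    global brightness/contrast differences between images
normalized = cv.normalize(img, None, 0, 255, cv.NORM_MINMAX)

# 3. Gaussian filtering: 5x5 kernel; sigma=0 lets OpenCV derive it from
#    the kernel size; the centre pixel gets the highest weight and the
#    weights decay with distance from the centre
blurred = cv.GaussianBlur(normalized, (5, 5), 0)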

And finally with the filtered image the computation goes as below:

The whole image is divided into a number of blocks (b). A typical block is shown above in the red box. Furthermore, a block can be considered a collection of cells (c), as shown in the black box. For the above illustration, the size of b is 8x8 pixels and the size of c is 4x4 pixels.

Next, for each cell the gradient magnitude and direction are computed at each pixel (the gradient can be taken as a Sobel derivative, or simply the difference between two consecutive pixel values in x and y). Thereafter, a histogram of n bins is formed by binning the gradient magnitudes with respect to gradient direction. Finally, the histogram is normalized according to some rule, yielding an n-dimensional vector.

So, for one cell we end up with an n-dimensional vector. The operation is then repeated by shifting the block right and down with 50% overlap until the whole image is covered.

Finally, all these histograms are concatenated to form a 1D vector known as the HOG feature descriptor.
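To make the gradient step concrete, below is a minimal sketch (assuming a filtered grayscale image blur); note that skimage’s hog performs all of this internally:

import numpy as np
import cv2 as cv

gx = cv.Sobel(blur, cv.CV_32F, 1, 0, ksize=1)    # horizontal derivative
gy = cv.Sobel(blur, cv.CV_32F, 0, 1, ksize=1)    # vertical derivative
magnitude, direction = cv.cartToPolar(gx, gy, angleInDegrees=True)

# an 8-bin histogram for one 8x8 cell, weighted by gradient magnitude
cell_mag = magnitude[:8, :8].ravel()
cell_dir = direction[:8, :8].ravel() % 180       # unsigned gradients
hist, _ = np.histogram(cell_dir, bins=8, range=(0, 180), weights=cell_mag)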

HOG can be computed with the following piece of code.

## HOG Descriptor: returns a 1D vector per image
ppcr = 8
ppcc = 8
hog_images = []
hog_features = []
# train_images holds the grayscale images prepared above
for image in tqdm(train_images):
    blur = cv.GaussianBlur(image, (5, 5), 0)  # Gaussian filtering
    fd, hog_image = hog(blur, orientations=8, pixels_per_cell=(ppcr, ppcc),
                        cells_per_block=(2, 2), block_norm='L2', visualize=True)
    hog_images.append(hog_image)
    hog_features.append(fd)
hog_features = np.array(hog_features)
hog_features.shape

Parameters:

For this problem,

— block size of 16x16 has been considered and

— cell size was 8x8

which makes pixels_per_cell = (8, 8) and cells_per_block = (2, 2)

— orientations (=8) is the number of histogram bins for each cell. The 64 gradient magnitudes of a cell (one per pixel of the 8x8 cell) are placed into these 8 bins and accumulated, so each bin represents the total magnitude for that orientation. When a gradient direction falls between two consecutive bins, its magnitude is usually split between them by interpolation.

— block_norm = 'L2'. Other options are L1 normalization and L2-Hys (hysteresis). L2-Hys helps reduce noise in some cases: it applies the L2 norm, then clips the maximum values to 0.2 and renormalizes with the L2 norm.

#normalization by 'L2-Hys'
out = block / np.sqrt(np.sum(block ** 2) + eps ** 2)
out = np.minimum(out, 0.2)
out = out / np.sqrt(np.sum(out ** 2) + eps ** 2)

With the actual image shape of 60x80 and a block size of 16x16, a total of 6x9 = 54 blocks are created (given the 50% overlap at each step in x and y), and each block contains 4 cells with an 8-bin histogram each. Hence, the length of the feature vector is 54x4x8 = 1728.
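This arithmetic can be sanity-checked against the shape of the computed features:

# descriptor length for a 60x80 image with 8x8 cells, 2x2 cells per
# block, 50% block overlap and 8 orientation bins
cells_x, cells_y = 60 // 8, 80 // 8              # 7 x 10 cells
blocks_x, blocks_y = cells_x - 1, cells_y - 1    # 6 x 9 block positions
feature_len = blocks_x * blocks_y * 2 * 2 * 8    # 54 blocks x 4 cells x 8 bins
print(feature_len)                               # 1728 == hog_features.shape[1]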

Below are some of the visual representations of the HOG images:

The idea of using gradient direction in modeling comes from the fact that the human visual cortex works in a similar way: it responds when we see an object edge at a particular orientation, or when we change our viewing angle to see an object better.

As the classification is multiclass and the within-class distribution is not uniform, it is recommended to use stratified sampling.

X_train, X_test, y_train, y_test = train_test_split(
    hog_features, df_labels['class'], test_size=0.2, stratify=df_labels['class'])
print('Training data and target sizes: \n{}, {}'.format(X_train.shape, y_train.shape))
print('Test data and target sizes: \n{}, {}'.format(X_test.shape, y_test.shape))

Training data and target sizes:
(15998, 1728), (15998,)
Test data and target sizes:
(4000, 1728), (4000,)

And finally the data is ready to be fit by a classifier. For this problem SVM, Random Forest and KNN were tried, and KNN outperformed the other classifiers with each of the nearest-neighbour search algorithms (ball_tree, kd_tree and brute force). Brute-force search was ultimately used, as it was much faster here than ball_tree and kd_tree.

Snippet for KNN Classifier

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X_train)
classifier = KNeighborsClassifier(n_neighbors=3, algorithm='brute')
classifier.fit(X_scaled, y_train)
# score on the held-out test set, scaled with the training statistics
test_accuracy = classifier.score(scaler.transform(X_test), y_test)
print(test_accuracy)
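For reference, a minimal sketch of how the SVM and Random Forest baselines might be run on the same scaled features (the settings below are illustrative, not the exact ones from the notebook):

from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier

# illustrative baselines; hyperparameters are assumptions
for name, clf in [('LinearSVC', LinearSVC()),
                  ('RandomForest', RandomForestClassifier(n_estimators=100))]:
    clf.fit(X_scaled, y_train)
    print(name, clf.score(scaler.transform(X_test), y_test))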

The code only classifies among the following five categories for the masterCategory column: [‘Apparel’, ‘Accessories’, ‘Footwear’, ‘Personal Care’, ‘Free Items’]. All records belonging to other categories are labelled ‘Others’.

By changing the column names in the Jupyter Notebook, the classification can be done for any of the label columns.
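For illustration, a hypothetical sketch of how df_labels and mapper might be prepared for the masterCategory target (the notebook’s actual preprocessing may differ):

# keep the top categories, bucket the rest as 'Others', assign class codes
categories = ['Apparel', 'Accessories', 'Footwear', 'Personal Care', 'Free Items']
df_labels = styles[['id', 'masterCategory']].iloc[:20000].copy()
df_labels['masterCategory'] = df_labels['masterCategory'].where(
    df_labels['masterCategory'].isin(categories), 'Others')
df_labels['class'] = df_labels['masterCategory'].astype('category').cat.codes
mapper = df_labels[['class', 'masterCategory']].drop_duplicates()
# in practice df_labels must stay aligned with the images actually loaded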

Below are some accuracy figures to help evaluate the model performance.

And the classification report for reference!

y_pred = classifier.predict(scaler.transform(X_test))  # test-set predictions
list_of_categories = categories + ['Others']
print("Classification Report: \n Target: %s \n Labels: %s \n Classifier: %s:\n%s\n"
      % (target, list_of_categories, classifier,
         metrics.classification_report(y_test, y_pred)))
df_report = pd.DataFrame(metrics.confusion_matrix(y_test, y_pred),
                         columns=list_of_categories)
df_report.index = list_of_categories
df_report

The classification report is shown for masterCategory only.

Finally, let’s run inference on a test image:

# test image with id 1570
test_data_location = root + '/test/'
img = cv.imread(test_data_location + '1570.jpg', cv.IMREAD_GRAYSCALE)  # load in grayscale
image = cv.resize(img, (60, 80), interpolation=cv.INTER_LINEAR)
ppcr = 8
ppcc = 8
hog_images_test = []
hog_features_test = []
blur = cv.GaussianBlur(image, (5, 5), 0)
fd_test, hog_img = hog(blur, orientations=8, pixels_per_cell=(ppcr, ppcc),
                       cells_per_block=(2, 2), block_norm='L2', visualize=True)
hog_images_test.append(hog_img)
hog_features_test.append(fd_test)
hog_features_test = np.array(hog_features_test)
y_pred_user = classifier.predict(scaler.transform(hog_features_test))
print(y_pred_user)
print("Predicted masterCategory: ", mapper[mapper['class'] == int(y_pred_user)]['masterCategory'])

And some recommendations!

scaler_global = MinMaxScaler()
final_features_scaled = scaler_global.fit_transform(hog_features)

neighbors = NearestNeighbors(n_neighbors=20, algorithm='brute')
neighbors.fit(final_features_scaled)
distance, potential = neighbors.kneighbors(scaler_global.transform(hog_features_test))
print("Potential Neighbors Found!")
neighbor_ids = list(potential[0])  # row indices of the 20 nearest images
recommendation_list = list(df_labels.iloc[neighbor_ids]['id'])
recommendation_list
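The recommended images can be inspected visually, assuming the files are stored as '<id>.jpg' under image_folder as in the loading step:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 5, figsize=(15, 4))
for ax, rec_id in zip(axes, recommendation_list[:5]):
    rec_img = cv.imread(image_folder + str(rec_id) + '.jpg')
    ax.imshow(cv.cvtColor(rec_img, cv.COLOR_BGR2RGB))
    ax.set_title(str(rec_id))
    ax.axis('off')
plt.show()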

A simple web service has been built with Flask using the classification values and the recommended images (the development of the UI is kept out of scope for this article). The web service looks as below:

Conclusion:

The above explanation shows the intuition behind HOG and how we can use it to describe the features of an image. The HOG features were then computed and fed to a KNN classifier, and later used to find the K nearest neighbours for recommendation. Both cases achieve a high level of accuracy without any deep learning methods. There were a few cases where an image was mislabelled, or contained multiple objects but carried a single class label, which affected the model. The next step would be to identify the root causes of misclassification and build a better classification and recommendation engine.

Thanks for reading my post! Feel free to connect in case of any clarification. 🍻 🍺 😃 😃
