Bag of Visual Words (BoVW)

Jul 12, 2019

BoVW is a commonly used technique in image classification. The idea is similar to the bag-of-words model in NLP, except that here image features play the role of words.

Step by Step Bag Of Visual Words

1. Extract local features from several images using SIFT.

2. Quantize the feature space with a clustering algorithm such as k-means. The cluster centers returned by the algorithm are our visual words.

3. Extract local features from each image and compare them with the visual words to build one histogram per image, for both the train and the test dataset.

4. Predict the class of each test image by comparing its histogram with the histograms of the train images. We will use 1-NN for this prediction.

5. Calculate the accuracy.

Implementation of BoVWs in Python

1. Load the train and test images into dictionaries.
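A minimal sketch of the loading step. The folder layout (one subfolder per class) and the names `load_images_from_folder` and `read_grayscale` are assumptions made for illustration. OpenCV is only imported when the default reader is actually used.

```python
import os


def read_grayscale(path):
    """Default reader (assumes OpenCV is installed); returns None if unreadable."""
    import cv2  # imported lazily so a different reader can be plugged in
    return cv2.imread(path, cv2.IMREAD_GRAYSCALE)


def load_images_from_folder(folder, read_image=read_grayscale):
    """Return {class_name: [image, ...]} for a folder with one subfolder per class."""
    images = {}
    for class_name in sorted(os.listdir(folder)):
        class_dir = os.path.join(folder, class_name)
        if not os.path.isdir(class_dir):
            continue  # ignore stray files next to the class folders
        images[class_name] = []
        for filename in sorted(os.listdir(class_dir)):
            img = read_image(os.path.join(class_dir, filename))
            if img is not None:  # skip files that could not be read as images
                images[class_name].append(img)
    return images
```

Calling the function once on the train folder and once on the test folder gives the two dictionaries used in the following steps.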

2. Extract local features from the images using SIFT. The extraction function returns a two-element array. The first element is a flat, unordered list of all local features from all images; this is our visual dictionary. The second element is a dictionary of SIFT vectors that holds the same descriptors, but grouped class by class.

Note: To create the visual dictionary, we use only the train dataset.
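The extraction step could look roughly like this. It is a sketch, not the original code: the name `sift_features` and the optional `detector` parameter are assumptions, and the default relies on `cv2.SIFT_create()`, which is available in OpenCV 4.4 and later.

```python
def sift_features(images, detector=None):
    """Return (descriptor_list, sift_vectors) for a {class_name: [image, ...]} dict.

    descriptor_list: flat list of every descriptor from every image.
    sift_vectors:    {class_name: [descriptors_of_image_1, ...]}.
    """
    if detector is None:
        import cv2  # assumption: OpenCV >= 4.4, where SIFT is in the main module
        detector = cv2.SIFT_create()

    descriptor_list = []
    sift_vectors = {}
    for class_name, class_images in images.items():
        per_image = []
        for img in class_images:
            _keypoints, descriptors = detector.detectAndCompute(img, None)
            if descriptors is None:  # no keypoints were found in this image
                continue
            descriptor_list.extend(descriptors)  # one entry per descriptor
            per_image.append(descriptors)
        sift_vectors[class_name] = per_image
    return descriptor_list, sift_vectors
```

Only the `descriptor_list` computed from the train images goes into clustering; for the test images we keep just the per-class dictionary.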

3. Send the visual dictionary to the k-means clustering algorithm to find the visual words, which are the cluster centers.
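To show what the clustering step does, here is a small pure-Python version of Lloyd's k-means. It is only an illustration: the name `kmeans_centers` and the `iterations` and `seed` parameters are assumptions, and for real SIFT data a library implementation such as scikit-learn's `KMeans` would be much faster.

```python
import math
import random


def kmeans_centers(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm; returns k center points (the visual words)."""
    rng = random.Random(seed)
    # Start from k distinct input points chosen at random.
    centers = [list(p) for p in rng.sample(list(points), k)]
    for _ in range(iterations):
        # Assignment step: put each point into the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        new_centers = [
            [sum(dim) / len(cluster) for dim in zip(*cluster)] if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:  # nothing moved, so we have converged
            break
        centers = new_centers
    return centers
```

The choice of `k` fixes the vocabulary size, and therefore the length of every histogram built in the next step.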

4. Create histograms for both test and train images.
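A sketch of the histogram step, assuming the per-class descriptor dictionary and the list of visual words from the previous steps. The function names are hypothetical. Each descriptor votes for its nearest visual word by Euclidean distance.

```python
import math


def build_histogram(descriptors, visual_words):
    """Count, for one image, how many descriptors fall closest to each visual word."""
    histogram = [0] * len(visual_words)
    for d in descriptors:
        nearest = min(range(len(visual_words)),
                      key=lambda i: math.dist(d, visual_words[i]))
        histogram[nearest] += 1
    return histogram


def histograms_by_class(sift_vectors, visual_words):
    """Turn {class_name: [descriptors_per_image, ...]} into {class_name: [histogram, ...]}."""
    return {
        class_name: [build_histogram(descs, visual_words) for descs in per_image]
        for class_name, per_image in sift_vectors.items()
    }
```

The same visual words, learned from the train set only, are used for both the train and the test histograms.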

5. Predict the classes of the test images with a k-NN function, where “k” is 1 in this case.
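With k = 1, prediction reduces to finding the single closest train histogram. A minimal sketch, assuming Euclidean distance and the hypothetical name `predict_1nn`:

```python
import math


def predict_1nn(test_histogram, train_histograms):
    """Return the class of the train histogram closest to test_histogram.

    train_histograms: {class_name: [histogram, ...]}
    """
    best_label, best_distance = None, math.inf
    for class_name, histograms in train_histograms.items():
        for h in histograms:
            distance = math.dist(test_histogram, h)
            if distance < best_distance:
                best_label, best_distance = class_name, distance
    return best_label
```

Other distance measures between histograms could be swapped in here; Euclidean distance is just the simplest choice.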

6. Calculate the accuracy.
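Accuracy is the fraction of test images whose predicted class matches the true class. A short sketch with hypothetical helper names, including a per-class breakdown:

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def per_class_accuracy(true_labels, predicted_labels):
    """Accuracy computed separately for each true class."""
    totals = Counter(true_labels)
    hits = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: hits[label] / totals[label] for label in totals}
```

The per-class numbers are useful when some classes are much harder to separate than others, which the overall accuracy alone can hide.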

Note: See also the k-NN blog post if you are not familiar with k-NN.

Written by Aybüke Yalçıner