Feature Matching

Manav Kapadnis
K. R. I. S. S
7 min read · Jun 1, 2020

A feature descriptor is a representation of an image or an image patch that simplifies the image by extracting useful information and discarding extraneous information. In the HOG (Histogram of Oriented Gradients) descriptor, the distribution (histogram) of gradient directions is used as the feature. Gradients of an image are useful because their magnitude is large around edges and corners (regions of abrupt intensity change). Local shape is often well described by the distribution of intensity gradients or edge directions, even without precise information about the location of the edges themselves.

Preprocessing

To calculate a HOG descriptor, we first need the horizontal and vertical gradients; after all, we want to calculate a histogram of gradients. This is easily achieved by filtering the image with the 1-D kernels [−1, 0, 1] (horizontal) and [−1, 0, 1]ᵀ (vertical).

Now we find the magnitude and direction of the gradient using the formulas:

G = √(Gₓ² + Gᵧ²)

θ = tan⁻¹(Gᵧ / Gₓ)
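As a quick NumPy sketch of these steps (the toy image is made up for illustration; the [−1, 0, 1] kernels are the ones used in the original HOG paper):

```python
import numpy as np

# Toy grayscale image (values in [0, 255]); a hypothetical stand-in for a real photo.
img = np.array([[10, 10, 10, 10],
                [10, 50, 50, 10],
                [10, 50, 50, 10],
                [10, 10, 10, 10]], dtype=np.float64)

# Horizontal and vertical gradients via the 1-D kernels [-1, 0, 1] and its transpose
# (pixels outside the image are edge-replicated).
padded = np.pad(img, 1, mode="edge")
gx = padded[1:-1, 2:] - padded[1:-1, :-2]   # [-1, 0, 1] along x
gy = padded[2:, 1:-1] - padded[:-2, 1:-1]   # [-1, 0, 1] transposed, along y

# Magnitude and orientation per pixel, matching the formulas above.
magnitude = np.sqrt(gx**2 + gy**2)
angle = np.degrees(np.arctan2(gy, gx)) % 180   # unsigned gradients: fold into [0, 180)
```

Folding angles into [0, 180) reflects HOG's use of "unsigned" gradients, where a dark-to-light edge and a light-to-dark edge count as the same orientation.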

Calculate Histogram of Gradients in 8×8 cells

The image is divided into 8×8 cells, and a histogram of gradients is calculated for each cell. Calculating a histogram over a patch makes the representation more robust to noise. In a photo of a pedestrian scaled to 64×128, 8×8 cells are big enough to capture interesting features (e.g. the face or the top of the head). The histogram is essentially a vector (or array) of 9 bins corresponding to the angles 0, 20, 40, 60 … 160.

The cell histograms are then normalized over 16×16 blocks (2×2 cells), giving a 4 × 9 = 36×1 vector per block. To calculate the final feature vector for the entire image patch, these 36×1 vectors are concatenated into one giant vector.
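The length of that giant vector can be checked with a few lines of arithmetic (the 2×2-cell blocks and 8-pixel block stride used here are the standard Dalal–Triggs defaults, assumed since the text does not state them):

```python
# Descriptor length for the standard 64×128 pedestrian window
# (block size and stride are the Dalal–Triggs defaults, assumed here).
win_w, win_h = 64, 128
cell = 8                  # 8×8-pixel cells
block = 2                 # 2×2 cells per block
bins = 9                  # 9 orientation bins

cells_x, cells_y = win_w // cell, win_h // cell       # 8 × 16 cells
blocks_x = cells_x - block + 1                        # 7 block positions horizontally
blocks_y = cells_y - block + 1                        # 15 block positions vertically
per_block = block * block * bins                      # 4 histograms × 9 bins = 36
descriptor_len = blocks_x * blocks_y * per_block      # 7 × 15 × 36 = 3780
```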

HOG in OpenCV

import matplotlib.pyplot as plt
from skimage import data, exposure
from skimage.feature import hog

image = data.astronaut()

fd, hog_image = hog(image, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True, multichannel=True)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), sharex=True, sharey=True)
ax1.axis('off')
ax1.imshow(image, cmap=plt.cm.gray)
ax1.set_title('Input image')

# Rescale histogram for better display
hog_image_rescaled = exposure.rescale_intensity(hog_image, in_range=(0, 10))
ax2.axis('off')
ax2.imshow(hog_image_rescaled, cmap=plt.cm.gray)
ax2.set_title('Histogram of Oriented Gradients')
plt.show()

The Local Binary Pattern Operator (LBP)

The LBP operator looks at the 3×3 neighbourhood of a pixel and generates a binary 1 for each neighbour whose value is larger than the centre pixel, and a binary 0 for each neighbour whose value is smaller. The eight neighbours of the centre can then be represented as an 8-bit number, such as an unsigned 8-bit integer, making the descriptor very compact.

The LBP operator was first introduced as a complementary measure of contrast: the contrast C is calculated as the average of the pixels above the threshold minus the average of the pixels below the threshold.
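A minimal NumPy sketch of both quantities on a single 3×3 neighbourhood (neighbours equal to the centre are treated as 1 here, a common convention the text leaves unspecified; the patch values are made up):

```python
import numpy as np

def lbp_3x3(patch):
    """LBP code and contrast C for one 3×3 neighbourhood (patch is a 3×3 array)."""
    center = patch[1, 1]
    # Eight neighbours read clockwise from the top-left; any fixed order works
    # as long as it is used consistently.
    neighbours = np.array([patch[0, 0], patch[0, 1], patch[0, 2],
                           patch[1, 2], patch[2, 2], patch[2, 1],
                           patch[2, 0], patch[1, 0]])
    bits = (neighbours >= center).astype(np.uint8)
    code = int(bits @ (1 << np.arange(8)))   # pack the 8 bits into one byte
    # Contrast: mean of neighbours at/above the centre minus mean of those below.
    above, below = neighbours[bits == 1], neighbours[bits == 0]
    contrast = (above.mean() if above.size else 0.0) - (below.mean() if below.size else 0.0)
    return code, contrast

patch = np.array([[6, 5, 2],
                  [7, 6, 1],
                  [9, 8, 7]], dtype=float)
code, C = lbp_3x3(patch)   # code fits in an unsigned 8-bit integer
```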

Laws’ Measures of Texture Energy

Laws proposed a method for classifying each pixel in an image based on measures of local “texture energy”. The texture-energy features represent the amount of variation within a sliding window applied to several filtered versions of the given image. The filters are specified as separable 1-D arrays for convolution with the image being processed.

L3 = [ 1 2 1 ] E3 = [ −1 0 1 ] S3 = [ −1 2 −1 ].

The operators L3, E3, and S3 perform centre-weighted averaging, symmetric first differencing (edge detection), and second differencing (spot detection), respectively. Nine 3×3 masks may be generated by multiplying the transposes of the three operators (represented as vectors) with their direct versions; for example, L3ᵀ E3 gives one of the 3×3 Sobel masks. Operators of length five pixels may be generated by convolving the L3, E3, and S3 operators in various combinations.

L5 = L3 ∗ L3 = [ 1 4 6 4 1 ] (local average)

E5 = L3 ∗ E3 = [ −1 −2 0 2 1 ] (edges)

S5 = −E3 ∗ E3 = [ −1 0 2 0 −1 ] (spots)

R5 = S3 ∗ S3 = [ 1 −4 6 −4 1 ] (ripples)

W5 = −E3 ∗ S3 = [ −1 2 0 −2 1 ] (waves)

In the analysis of texture in 2-D images, the 1-D convolution operators are used in pairs (one applied vertically, one horizontally, e.g. L5ᵀ E5) to achieve various 2-D convolution operators, each of which may be represented as a 5×5 array or matrix. Following the application of the selected filters, texture energy measures are derived from each filtered image by computing the sum of the absolute values in a sliding window.

All of the filters listed, except L5, have zero mean; hence the texture energy measures derived from the filtered images represent measures of local deviation or variation. The result of the L5 filter may be used for normalization with respect to luminance and contrast. Feature vectors composed of the values of various Laws operators at each pixel may be used for classifying the image into texture categories on a pixel-by-pixel basis. The results may be used for texture segmentation and recognition.
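A NumPy sketch of the pipeline, using the L5ᵀE5 pair and a 15×15 energy window (both the filter pair and the window size are illustrative choices, not prescribed by the text; the input is a random toy image):

```python
import numpy as np

# Laws' 1-D operators from the text above.
L5 = np.array([1, 4, 6, 4, 1], dtype=float)
E5 = np.array([-1, -2, 0, 2, 1], dtype=float)

def sep_conv(img, v, h):
    """Separable 2-D convolution: filter v down the columns, h along the rows."""
    tmp = np.apply_along_axis(lambda row: np.convolve(row, h, mode="same"), 1, img)
    return np.apply_along_axis(lambda col: np.convolve(col, v, mode="same"), 0, tmp)

rng = np.random.default_rng(0)
img = rng.random((32, 32))                    # toy texture image

# 2-D L5ᵀE5 filter: local averaging vertically, edge detection horizontally.
filtered = sep_conv(img, L5, E5)

# Texture energy: sum of |filtered| over a sliding window (15×15 is a common choice).
windows = np.lib.stride_tricks.sliding_window_view(np.abs(filtered), (15, 15))
energy = windows.sum(axis=(2, 3))             # one energy value per window position
```

Because E5 sums to zero, the 2-D mask is zero-mean, so the energy map measures local variation rather than local brightness.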

K-NN matching

The KNN algorithm is probably the simplest machine-learning algorithm: given a training data set and a new input instance, find the K instances in the training set closest to that instance. The input instance is assigned to the class to which the majority of those K instances belong.

In other words, if most of the K samples most similar to a sample in feature space belong to a certain category, the sample is also assigned to that category: the K nearest points vote on the class of the data to be classified.
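A minimal NumPy sketch of this voting rule (the toy 2-D points and labels are made up for illustration):

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Classify x by majority vote among its k nearest training samples (Euclidean)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

train_X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = np.array(["a", "a", "a", "b", "b", "b"])
label = knn_classify(train_X, train_y, np.array([0.2, 0.3]), k=3)   # → "a"
```

In feature matching, the same nearest-neighbour search is applied between descriptor sets of two images rather than between labelled training points.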

Bag-of-Visual-Words

Bag-of-features represents a data item (document, texture, image) as a histogram over features.

An object yields many local features; taken together, these constitute its “bag of features”.

There are four main steps of BoW:

1. Extract features

2. Learn “visual vocabulary”

3. Quantize features using visual vocabulary

4. Represent images by frequencies of “visual words”
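Steps 3 and 4 can be sketched in NumPy with a tiny, pre-learned vocabulary (the 2-D “descriptors” and the three visual words are toy stand-ins; in practice the descriptors would be e.g. 128-D SIFT and the vocabulary would come from k-means in step 2):

```python
import numpy as np

# Hypothetical 2-D descriptors and a tiny pre-learned vocabulary of 3 visual words.
vocab = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 1.1], [1.1, 0.9], [2.1, -0.1]])

# Step 3: quantize each descriptor to its nearest visual word.
d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
words = d2.argmin(axis=1)                     # visual-word index per descriptor

# Step 4: represent the image as a histogram of visual-word frequencies.
hist = np.bincount(words, minlength=len(vocab)).astype(float)
hist /= hist.sum()                            # normalize to frequencies
```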

Visual Vocabulary

TF-IDF weighting

• Instead of computing a regular histogram distance, we’ll weight each word by its inverse document frequency

• inverse document frequency (IDF) of word j = log( number of documents / number of documents in which word j appears )
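A small NumPy sketch of the weighting (the count matrix is made up; the log(N/nⱼ) form of IDF is the standard one):

```python
import numpy as np

# Visual-word counts per image: rows = images ("documents"), cols = visual words.
counts = np.array([[2, 0, 1],
                   [0, 3, 1],
                   [1, 1, 0]], dtype=float)

N = counts.shape[0]                   # number of documents
n_j = (counts > 0).sum(axis=0)        # number of documents containing word j
idf = np.log(N / n_j)                 # standard IDF; assumes every word occurs somewhere

tf = counts / counts.sum(axis=1, keepdims=True)   # term frequencies per image
tfidf = tf * idf                      # weighted histograms used for matching
```

Rare words get large weights and ubiquitous words get weights near zero, which is exactly why TF-IDF improves over a plain histogram distance.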

VLAD : Vector of Locally Aggregated Descriptors

  • Learning: k-means

output: k centroids: c₁, …, cᵢ, …, cₖ

  • VLAD computation: for each centroid cᵢ, sum the residuals of the descriptors assigned to it, vᵢ = Σ (x − cᵢ), and concatenate v₁, …, vₖ
  • L2-normalized
  • Typical parameter: k = 64 (for 128-D SIFT descriptors, D = k × d = 8192)
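A NumPy sketch of the computation (the descriptors and centroids are random toy data; in practice the centroids come from the k-means learning step above):

```python
import numpy as np

def vlad(descriptors, centroids):
    """VLAD: for each centroid, sum the residuals (x - c_i) of the descriptors
    assigned to it, concatenate, and L2-normalize. Output dimension D = k * d."""
    k, d = centroids.shape
    v = np.zeros((k, d))
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assign = d2.argmin(axis=1)            # nearest centroid per descriptor
    for i in range(k):
        v[i] = (descriptors[assign == i] - centroids[i]).sum(axis=0)
    v = v.ravel()
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

rng = np.random.default_rng(0)
X = rng.random((100, 8))      # toy local descriptors (d = 8)
C = rng.random((4, 8))        # k = 4 centroids, assumed pre-learned
v = vlad(X, C)                # length k*d = 32, unit L2 norm
```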

Indexing algorithm: searching with quantization

  • Search/indexing = a distance-approximation problem
  • The distance between a query vector x and a database vector y is estimated by d(x, y) ≈ d(x, q(y)), where q(y) is the quantized version of y
  • The choice of the quantizer is critical: it needs many centroids; regular k-means and approximate k-means cannot be used, since we typically want k = 2⁶⁴ for 64-bit codes

Product quantization for nearest neighbor search

  • The vector is split into m subvectors
  • The subvectors are quantized separately, each with its own quantizer

Compute the squared-distance approximation in the compressed domain. To compute the distance between a query and many codes:

  • for each subvector and all possible centroids, compute the elementary squared distances and store them in look-up tables
  • for each database code, sum the elementary squared distances

Each 8 × 8 = 64-bit code then requires only m = 8 additions per distance! IVFADC combines this with an inverted file to avoid exhaustive search.
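A NumPy sketch of this asymmetric distance computation with product quantization (here m = 4 subquantizers of 256 centroids each, giving 32-bit codes instead of the 64-bit setup above, and the codebooks are random stand-ins for trained ones):

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 4                  # vector dimension, number of subvectors
ds = d // m                   # dimension of each subvector
ks = 256                      # centroids per subquantizer (8 bits each)

codebooks = rng.random((m, ks, ds))     # one codebook per subvector (assumed trained)
database = rng.random((1000, d))

# Encode: each database vector becomes m centroid indices (here 4 bytes).
def encode(x):
    code = np.empty(m, dtype=np.uint8)
    for j in range(m):
        sub = x[j * ds:(j + 1) * ds]
        code[j] = ((codebooks[j] - sub) ** 2).sum(axis=1).argmin()
    return code

codes = np.array([encode(y) for y in database])

# Query: build m look-up tables of squared distances to all centroids, then each
# approximate distance is just m table look-ups and additions.
q = rng.random(d)
tables = np.array([((codebooks[j] - q[j * ds:(j + 1) * ds]) ** 2).sum(axis=1)
                   for j in range(m)])                 # shape (m, ks)
approx_d2 = tables[np.arange(m), codes].sum(axis=1)    # one distance per database code
```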

VLAD vectors undergo two approximations:

► mean square error from PCA projection

► mean square error from quantization



A final-year undergrad of the Dept. of Electrical Engineering at IIT Kharagpur. I am deeply passionate about developing NLP and ML applications.