HOG Feature Descriptor

Dahi Nemutlu
7 min read · Sep 9, 2022


In computer vision, many algorithms extract spatial features from image gradients in order to identify objects. HOG, or Histogram of Oriented Gradients, is one of them. A histogram is an approximate representation of the distribution of numerical data and looks like a bar graph. Each bar represents the group of data that falls in a certain range of values; these ranges are called bins. "Oriented" refers to the direction of an image gradient. HOG therefore produces histograms of the gradient directions in an image.

The HOG algorithm is applied in the following steps:

Image gradients
  1. Calculate the magnitude and direction of the gradient at each pixel of the input image. The above figure shows the gradients of an 8×8 cell in the image. The change in intensity is represented by a vector at each pixel: the direction of the vector indicates the direction in which pixel intensity changes, and its magnitude tells us how strong the change is.
  2. Divide the image into cells of the same size (the blue windows in the animation below). The cell size is an adjustable parameter; it should be chosen so that the features of interest fit within a cell.
  3. Group the gradient directions of all pixels in each cell into a specified number of orientation bins. Sum the gradient magnitudes in each bin; these sums are the heights of the bins. The number of bins is usually set to 9, so that each bin is 20 degrees wide.
  4. Group the cells into blocks of the same size (the red sliding window in the animation below). The distance the block window moves over the image at each step is called the stride; it is usually set to half the block size. The number of cells per block and the stride are free parameters set by the user.
  5. Normalize each cell histogram with respect to the other cells in its block. The normalized histograms from all blocks are then concatenated into a single feature vector, which is called the HOG descriptor.
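
Steps 1 to 3 can be sketched with NumPy alone. This is a simplified illustration, not OpenCV's implementation: the function name `cell_histograms` is hypothetical, gradients come from `np.gradient`, and each pixel votes for exactly one bin (bins with edges at multiples of 20 degrees) instead of sharing its vote between neighbouring bins.

```python
import numpy as np

def cell_histograms(image, cell_size=8, nbins=9):
    """Per-cell histograms of unsigned gradient orientation, weighted by magnitude."""
    image = np.asarray(image, dtype=np.float64)

    # Step 1: gradient at every pixel (np.gradient returns d/drow first, then d/dcol).
    gy, gx = np.gradient(image)
    magnitude = np.hypot(gx, gy)
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned, 0 to 180 degrees

    # Step 2: split the image into whole cells (a ragged border is ignored).
    rows, cols = image.shape[0] // cell_size, image.shape[1] // cell_size

    # Step 3: hard-assign each pixel to one bin and sum the magnitudes per bin.
    bin_width = 180.0 / nbins
    bin_index = np.minimum((orientation // bin_width).astype(int), nbins - 1)
    hist = np.zeros((rows, cols, nbins))
    for r in range(rows):
        for c in range(cols):
            window = (slice(r * cell_size, (r + 1) * cell_size),
                      slice(c * cell_size, (c + 1) * cell_size))
            hist[r, c] = np.bincount(bin_index[window].ravel(),
                                     weights=magnitude[window].ravel(),
                                     minlength=nbins)
    return hist

# Synthetic test image: intensity grows by 1 per pixel from left to right.
ramp = np.tile(np.arange(16.0), (16, 1))
hist = cell_histograms(ramp)
print(hist.shape)   # (cell rows, cell columns, bins)
print(hist[0, 0])   # all the gradient energy falls into the first bin
```

On the ramp image every pixel has a horizontal gradient of magnitude 1, so each 8×8 cell puts a total of 64 into the first bin and nothing elsewhere.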

We will use OpenCV’s HOGDescriptor class to create the HOG descriptor. Its parameters are set up through the HOGDescriptor() constructor. The parameters and their default values are given below:

cv2.HOGDescriptor(win_size=(64, 128),
block_size=(16, 16),
block_stride=(8, 8),
cell_size=(8, 8),
nbins=9,
win_sigma=DEFAULT_WIN_SIGMA,
threshold_L2hys=0.2,
gamma_correction=True,
nlevels=DEFAULT_NLEVELS)
  • win_size: Size of detection window in pixels (width, height). Defines the region of interest. Must be an integer multiple of cell size.
  • block_size: Block size in pixels (width, height). Defines how many cells are in each block. Must be an integer multiple of the cell size and must be smaller than the detection window. The smaller the block, the finer the detail you will get.
  • block_stride: Block stride in pixels (horizontal, vertical). It must be an integer multiple of the cell size. The block_stride defines the distance between adjacent blocks, for example, 8 pixels horizontally and 8 pixels vertically. A longer block_stride makes the algorithm run faster (because fewer blocks are evaluated), but the algorithm may not perform as well.
  • cell_size: Cell size in pixels (width, height). Determines the size of your cell. The smaller the cell, the finer the detail you will get.
  • nbins: Number of bins for the histograms. Determines the number of angular bins used to make the histograms. With more bins you capture more gradient directions. HOG uses unsigned gradients, so the angular bins will have values between 0 and 180 degrees.
  • win_sigma: Gaussian smoothing window parameter. The performance of the HOG algorithm can be improved by smoothing the pixels near the edges of the blocks by applying a Gaussian spatial window to each pixel before computing the histograms.
  • threshold_L2hys: Shrinkage threshold for the L2-Hys (Lowe-style clipped L2 norm) normalization method. L2-Hys is used to normalize the blocks and consists of an L2 normalization, followed by clipping and a renormalization. The clipping limits the maximum value in each block's descriptor vector to the given threshold (0.2 by default).
  • gamma_correction: Flag to specify whether the gamma correction preprocessing is required or not. Performing gamma correction slightly increases the performance of the HOG algorithm.
  • nlevels: Maximum number of detection window increases.
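
To make the threshold_L2hys description concrete, here is a minimal NumPy sketch of L2-Hys normalization for one block. The function name `l2_hys` and the small `eps` constant (added to avoid division by zero) are assumptions for illustration; OpenCV's internal code may differ in such details.

```python
import numpy as np

def l2_hys(block_histograms, threshold=0.2, eps=1e-7):
    """L2-normalize, clip at `threshold`, then L2-normalize again."""
    v = np.asarray(block_histograms, dtype=np.float64).ravel()
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # first L2 normalization
    v = np.minimum(v, threshold)                 # clip dominant entries
    v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)   # renormalize after clipping
    return v

# Hypothetical block of 4 cells x 9 bins in which one bin dominates strongly.
block = np.ones(36)
block[0] = 50.0
normalized = l2_hys(block)
plain_l2 = block / np.linalg.norm(block)
```

Clipping reduces the influence of a single very strong gradient: in this example the dominant entry takes a smaller share of the final vector than it would under a plain L2 normalization. Because of the final renormalization, entries of the result can again exceed the threshold.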

Computing the descriptor for our image produces the following output:

HOG Descriptor: [0.25606513 0.01537703 0.04601376 ... 0.08963854 0.02995563 0.08873854]
HOG Descriptor has shape: (34596,)

The resulting HOG descriptor (feature vector) contains the normalized histograms from all cells of all blocks in the detection window, concatenated into one long vector. Therefore, the length of the HOG feature vector is the total number of blocks in the detection window, multiplied by the number of cells per block, multiplied by the number of orientation bins.
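
The length calculation can be written out as a short function. The helper name `hog_descriptor_length` is hypothetical. The 256×256 window in the second call is an inference: the article does not state the window size, but that configuration is one that yields the 34596 elements reported above (31 × 31 blocks × 4 cells × 9 bins).

```python
def hog_descriptor_length(win_size, block_size, block_stride, cell_size, nbins):
    """Number of elements in a HOG feature vector; sizes are (width, height) in pixels."""
    # Block positions that fit in the window along each axis.
    blocks_x = (win_size[0] - block_size[0]) // block_stride[0] + 1
    blocks_y = (win_size[1] - block_size[1]) // block_stride[1] + 1
    # Cells inside one block.
    cells_per_block = (block_size[0] // cell_size[0]) * (block_size[1] // cell_size[1])
    return blocks_x * blocks_y * cells_per_block * nbins

default_length = hog_descriptor_length((64, 128), (16, 16), (8, 8), (8, 8), 9)
square_length = hog_descriptor_length((256, 256), (16, 16), (8, 8), (8, 8), 9)
print(default_length)  # 7 * 15 blocks * 4 cells * 9 bins = 3780
print(square_length)   # 31 * 31 blocks * 4 cells * 9 bins = 34596
```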

Visualizing The HOG Descriptor

Unlike scikit-image, OpenCV offers no easy way to visualize the HOG descriptor, so we will first manipulate our feature vector before we plot it. To begin, we will reshape the HOG descriptor in order to simplify our calculations. Then we will calculate the average histogram of each cell, and transform the histogram bins into vectors. Finally, we will plot the corresponding vectors for each cell on an image.
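
The reshape-and-average step might look like the sketch below. It rests on two assumptions that should be checked against your OpenCV version: that the flat descriptor is ordered as (block column, block row, cell column within block, cell row within block, bin), and that the block stride equals one cell. The function name `average_cell_histograms` is hypothetical, and the demo uses a synthetic descriptor rather than real HOG output.

```python
import numpy as np

def average_cell_histograms(descriptor, blocks_xy, cells_per_block_xy, nbins):
    """Average the copies of each cell's histogram across all blocks that contain it."""
    nbx, nby = blocks_xy
    ncx, ncy = cells_per_block_xy
    # Assumed memory layout: block x, block y, cell x, cell y, bin.
    d = np.asarray(descriptor, dtype=np.float64).reshape(nbx, nby, ncx, ncy, nbins)
    d = d.transpose(1, 0, 3, 2, 4)  # -> block row, block col, cell row, cell col, bin

    rows, cols = nby + ncy - 1, nbx + ncx - 1  # cells in the window (stride = 1 cell)
    total = np.zeros((rows, cols, nbins))
    count = np.zeros((rows, cols, 1))
    for i in range(ncy):        # cell row offset inside the block
        for j in range(ncx):    # cell column offset inside the block
            total[i:i + nby, j:j + nbx] += d[:, :, i, j]
            count[i:i + nby, j:j + nbx] += 1
    return total / count

# Synthetic check: pack known per-cell histograms into the assumed layout,
# then confirm that averaging recovers them.
blocks_xy, cells_per_block_xy, nbins = (3, 2), (2, 2), 9
rng = np.random.default_rng(0)
true_cells = rng.random((3, 4, nbins))  # 3 cell rows, 4 cell columns
packed = np.empty(blocks_xy + cells_per_block_xy + (nbins,))
for bx in range(3):
    for by in range(2):
        for cx in range(2):
            for cy in range(2):
                packed[bx, by, cx, cy] = true_cells[by + cy, bx + cx]
recovered = average_cell_histograms(packed.ravel(), blocks_xy, cells_per_block_xy, nbins)
print(np.allclose(recovered, true_cells))
```

In a real descriptor each copy of a cell's histogram has been normalized within a different block, so the copies differ slightly and the result is a genuine average rather than an exact recovery.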

Let’s take a closer look at the various cells in both the original and the HOG image to better understand the HOG image and histogram.

zoom_hog_cell(16, 2)

We selected a cell in the image that encloses a horizontal edge. Edges are areas in an image where pixel intensity changes abruptly. When we examine the gradients, we see that the changes are predominantly vertical. For this reason, as we can see in the histogram, the 90 degree bin is quite dominant compared to the other bins.

zoom_hog_cell(27, 19)

When we look at the pixel gradients in the cell containing a vertical edge, we see that the pixel intensity changes are predominantly horizontal, close to the 180 degree direction. Therefore, we expect the 170–180 degree bin in the histogram to dominate the other bins. But when we look at the histogram, we see that the 0–10 degree bin is dominant as well. This is because the HOG algorithm uses unsigned gradients, so gradients at 0 and 180 degrees are treated as having the same orientation.
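
A small NumPy sketch illustrates why 0 and 180 degrees coincide. It builds two synthetic vertical edges with opposite polarity (dark to bright and bright to dark) and compares their signed and unsigned gradient angles. The helper name `gradient_angles` is hypothetical, and `np.gradient` stands in for whatever derivative filter a real implementation uses.

```python
import numpy as np

def gradient_angles(image):
    """Return (signed, unsigned) gradient angles in degrees for every pixel."""
    gy, gx = np.gradient(np.asarray(image, dtype=np.float64))
    signed = np.degrees(np.arctan2(gy, gx))   # range -180 to 180
    unsigned = signed % 180.0                 # opposite directions are merged
    return signed, unsigned

# Two vertical edges that differ only in polarity.
dark_to_bright = np.tile([0.0, 0.0, 255.0, 255.0], (4, 1))
bright_to_dark = dark_to_bright[:, ::-1]

signed_a, unsigned_a = gradient_angles(dark_to_bright)
signed_b, unsigned_b = gradient_angles(bright_to_dark)

# Look at a pixel on the edge itself (row 1, column 1).
print("signed:  ", signed_a[1, 1], signed_b[1, 1])
print("unsigned:", unsigned_a[1, 1], unsigned_b[1, 1])
```

The signed angles of the two edges point in opposite directions, while the unsigned angles describe the same orientation, which is why both ends of the 0 to 180 degree range can collect votes from one vertical edge.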

zoom_hog_cell_2(22, 14)

Finally, let’s examine a cell containing a diagonal edge. Since pixel intensity changes diagonally across a diagonal edge, the gradients will also point in the diagonal direction. We expect to see the dominant gradient strength in the 40–50 degree bin, as the diagonal edge in the image is close to 45 degrees. Indeed, we observe this in the histogram. But we also see two other, less dominant bins adjoining the 40–50 degree bin. This is because, when the histograms are generated, gradients at angles near the boundaries of a bin contribute proportionally to the adjacent bins. For example, a gradient with an angle of 40 degrees is right in the middle between the 30 degree and 50 degree bins. Therefore, the gradient’s magnitude is divided equally between the 30 degree and 50 degree bins. This is the reason we see less dominant bins adjoining the most dominant bin.
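
The proportional voting described above can be sketched as a linear interpolation between the two nearest bin centres. The sketch assumes 9 bins of 20 degrees with centres at 10, 30, 50, ..., 170 degrees, which matches the 30 and 50 degree bins in the example; the function name `soft_vote` is hypothetical.

```python
import math

def soft_vote(angle_deg, magnitude, nbins=9):
    """Split one gradient's magnitude between the two nearest bin centres."""
    bin_width = 180.0 / nbins
    # Position measured in bins, relative to the first bin centre (10 degrees).
    position = angle_deg / bin_width - 0.5
    lower = math.floor(position)
    fraction = position - lower        # 0 at the lower centre, 1 at the upper centre
    histogram = [0.0] * nbins
    # Modulo wraps around, since 0 and 180 degrees are the same orientation.
    histogram[lower % nbins] += magnitude * (1.0 - fraction)
    histogram[(lower + 1) % nbins] += magnitude * fraction
    return histogram

halfway = soft_vote(40.0, 1.0)    # midway between the 30 and 50 degree centres
on_centre = soft_vote(50.0, 1.0)  # exactly on the 50 degree centre
print(halfway)
print(on_centre)
```

A gradient at 40 degrees gives half of its magnitude to each neighbour, while a gradient exactly on a bin centre votes for that bin alone.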

In this article, we went into the details of the HOG algorithm, which is widely used in object identification, and we obtained the HOG feature vector from an image using the Python programming language and the OpenCV library. We tried to understand and interpret it better by examining the HOG image and histograms at a closer scale. If you want to take a look at an application where I clustered images using the HOG feature descriptor, you can reach it using the link below. You can also access my GitHub repo, which contains the full code, via the link below. I hope you enjoyed reading and applying it. See you in my next post!
