A journey from pixels to perception: Gradients in Computer Vision

Ali Nazem
Vizion Lab
5 min read · Nov 7, 2023
Illustration credit Ali Nazem

Imagine you’re holding a black-and-white sketch of a sunny beach scene. The outlines of the umbrellas, the distinct horizon where the sea meets the sky, and the silhouettes of people walking along the shore are all visible because of the edges: the lines where the tone changes sharply. That’s pretty close to what gradients capture in the world of computer vision. But why are these gradients so crucial, and how do computers use them to interpret images? I’ll try to scratch the surface of the fascinating world of gradients and their role in helping cold, indifferent computers understand (that is, perceive) visuals.

The power of change: Gradients in computer vision

In the realm of computer vision, gradients are the subtle yet powerful changes in light and dark, helping to define shapes and textures in an image. Just like how I can tell a ball from a box by looking at the shadows and highlights, my computer uses gradients to distinguish between different objects and features in the image.

The gradient of an image at each pixel points in the direction of the greatest increase in intensity, and its magnitude corresponds to the rate of change in that direction. For a grayscale (2D) image, gradients are typically calculated along both the x (horizontal) and y (vertical) directions. The gradients themselves are also images; they are matrices, after all.
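To put the same idea into symbols, here is a small sketch using the standard definitions (nothing here is specific to this article; I(x, y) stands for the image intensity at a pixel):

```latex
% Gradient of the image intensity I(x, y): the vector of partial derivatives
\nabla I = (G_x, G_y) = \left( \frac{\partial I}{\partial x}, \frac{\partial I}{\partial y} \right)

% Magnitude: how quickly the intensity changes at that pixel
\lVert \nabla I \rVert = \sqrt{G_x^2 + G_y^2}

% Direction: the orientation of the steepest increase in intensity
\theta = \operatorname{atan2}(G_y, G_x)
```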

See Figure 1 for a straightforward schematic I borrowed from the OpenCV documentation, where the notion of gradient is established along the horizontal axis. Note how the gradient direction is perpendicular to the “edge”. The edge is nothing but a sudden, noticeable change in pixel intensity across neighboring pixels. The intensity value of point C (green) is unmistakably different from that of point B. For the sake of conversation, allow me to posit that the change in intensity from C to B is the greatest in magnitude. The juice? The larger the difference between nearby pixel intensities, the more likely it is for the computer to “perceive” an edge (a toy example follows Figure 1). And objects are nothing but enclosed edges. Voilà!

Figure 1: Notion of gradients — Illustration credit OpenCV Documentation
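To make that “sudden noticeable change” concrete, here is a toy example (the intensity values below are made up): a single row of pixels with a jump in the middle produces one large difference exactly where the edge sits.

```python
import numpy as np

# Hypothetical row of pixel intensities with a step edge in the middle.
row = np.array([10, 10, 12, 11, 200, 205, 204, 203], dtype=float)

# Differences between neighboring pixels: a crude 1D "gradient".
diff = np.diff(row)
print(diff)                     # [  0.   2.  -1. 189.   5.  -1.  -1.]
print(np.argmax(np.abs(diff)))  # 3 -> the edge lies between pixels 3 and 4
```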

A tiny matrix, big role: Understanding kernels

To detect gradients, computer vision algorithms use something called a kernel, a tiny grid that’s a bit like a miniature magnifying glass. The smallest practical kernel is a 3x3 matrix, which captures the concept of directionality: a central pixel with an equal number of neighbors on each side. As it glides over an image, the kernel examines each pixel and its immediate neighbors to understand how much the brightness changes around that tiny area. The application of a kernel to an image is called convolution. Each element of the kernel matrix has a weight, and these weights determine how the convolution affects the image. See the figure below, which illustrates the notion of convolution (a small code sketch follows it).

Figure 2: Convolution operation on a 7x7 image matrix with a 3x3 kernel — Illustration credit Madhushree Basavarajaiah
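For the curious, here is a minimal, unoptimized sketch of that sliding-and-weighting step. It uses no padding, so the output is slightly smaller than the input; real libraries such as OpenCV handle borders for you, and the function name here is just for illustration.

```python
import numpy as np

def convolve2d(image, kernel):
    """Naive 'valid' convolution of a 2D image with a small kernel (sketch)."""
    kh, kw = kernel.shape
    flipped = kernel[::-1, ::-1]          # flipping is what makes it a convolution
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the neighborhood currently under the kernel.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

# A 7x7 toy "image" and a 3x3 averaging kernel, echoing the setup in Figure 2.
image = np.arange(49, dtype=float).reshape(7, 7)
kernel = np.ones((3, 3)) / 9.0
print(convolve2d(image, kernel).shape)  # (5, 5) -- no padding in this sketch
```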

For example, a kernel with higher weights at the center tends to preserve the central pixel’s value, while a kernel with uniform weights averages the values. Take the Sobel kernels: two 3x3 matrices that, when applied to an image, approximate the gradients along the horizontal and vertical axes. They are designed to respond maximally to edges running vertically and horizontally relative to the pixel grid, one kernel for each of the two perpendicular orientations. The matrices take the following form (they are also written out right after Figure 3):

Figure 3: Horizontal and vertical gradient kernels of Sobel — Illustration credit Ali Nazem
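For reference, the two Sobel kernels in Figure 3 are conventionally written as below (the sign convention, and which one is labeled “horizontal”, varies slightly between sources):

```latex
G_x =
\begin{bmatrix}
-1 & 0 & +1 \\
-2 & 0 & +2 \\
-1 & 0 & +1
\end{bmatrix}
\qquad
G_y =
\begin{bmatrix}
-1 & -2 & -1 \\
 0 &  0 &  0 \\
+1 & +2 & +1
\end{bmatrix}
```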

As I reviewed earlier, the process of applying these kernels to produce gradient approximations is known as convolution. For each pixel, there are now two values: Gx and Gy (see Figure 4). Guess what? We can now calculate the magnitude and direction of the gradient (a small code sketch follows Figure 4). The magnitude conveys how strong the edge is at that pixel; the direction points where the intensity is increasing most rapidly.

High magnitude → A strong edge presence

Low magnitude → A potentially flat region

Direction → How the edge is oriented, e.g., vertical, horizontal, or diagonal

Figure 4: Convolution operation to calculate bi-directional gradients of Sobel — Illustration credit Ross P. Holder
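Concretely, once the two gradient images Gx and Gy exist, the magnitude and direction are a couple of NumPy one-liners. The tiny gx and gy arrays below are placeholder values just to keep the snippet runnable; in practice they come from convolving the image with the Sobel kernels.

```python
import numpy as np

# Placeholder gradient values for a 2x2 patch (made up for illustration).
gx = np.array([[0.0, 120.0], [5.0, 130.0]])
gy = np.array([[0.0,   0.0], [3.0,  90.0]])

magnitude = np.sqrt(gx**2 + gy**2)          # edge strength at each pixel
direction = np.degrees(np.arctan2(gy, gx))  # edge orientation in degrees

print(magnitude.round(1))  # large values where an edge is present
print(direction.round(1))  # 0 degrees = intensity increasing to the right
```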

To see an application of this whole effort, I applied the Sobel gradient kernels to an interesting image I found while exploring Adobe Stock. Well, Python did it for me! Figure 5 presents the original image, its grayscale version, and both gradient images (a rough sketch of the code follows the figure).

Figure 5: Example outputs of the Sobel gradients — Original image from Adobe Stock
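Here is roughly how one might reproduce Figure 5 with OpenCV and Matplotlib. This is a sketch, not the exact script used for the article, and “beach.jpg” is a stand-in filename rather than the actual Adobe Stock image.

```python
import cv2
import matplotlib.pyplot as plt

bgr = cv2.imread("beach.jpg")                    # placeholder filename
gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)

gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient

titles = ["Original", "Grayscale", "Sobel Gx", "Sobel Gy"]
images = [cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB), gray, abs(gx), abs(gy)]
for i, (title, img) in enumerate(zip(titles, images), start=1):
    plt.subplot(1, 4, i)
    plt.imshow(img, cmap=None if img.ndim == 3 else "gray")
    plt.title(title)
    plt.axis("off")
plt.show()
```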

Real-world applications of gradients

The edges detected by the Sobel convolution (that is, by gradients) can be incredibly useful. Here are some ways this tool is used in the real world:

Autonomous Vehicles: By detecting the edges of lanes and road signs, the Sobel operator helps self-driving cars ‘see’ where they’re going.

Medical Imaging: It helps to delineate the contours of organs or abnormalities in scans, aiding doctors in diagnosis and treatment planning.

Quality Control: In factories, it can spot defects in products by noticing where edges don’t match up with where they should be.

Security: Enhancing the edges in fingerprint scans leads to better recognition and matching in security systems.

Digital Photography: It can sharpen your images by emphasizing the edges, making your vacation photos look crisp and clear.

Conclusion: The unseen heroes of computer vision

Gradients are among the unseen heroes of computer vision. They work behind the scenes, allowing computers to interpret visual information almost as we do. With kernels and operators like Sobel, gradients help turn a cascade of pixels into meaningful insights that power technology, healthcare, security, and more. The next time I snap a photo or use a navigation system, I’ll remember the role of gradients; they might just be one of the reasons my technology works so seamlessly!
