Visual Saliency for Object Detection

Saurabh Kadam
Published in Analytics Vidhya
5 min read · Apr 16, 2020

This article covers the theory and basic ideas behind visual saliency, as background for visual object tracking.

In simple terms, visual saliency is the quality that makes one item stand out from the objects around it. In other words, it describes how strongly something grabs our attention compared with everything else in view.

How does visual saliency work in the human brain?

Imagine we are looking at a picture of a kid playing with a ball. There are two objects in the image: the human and the ball.

Visual attention is a process that directs a tiny fraction of the information arriving at the primary visual cortex to high-level centers involved in visual working memory and pattern recognition.

Visual attention works like a spotlight. The spotlight theory goes back to William James. The spotlight has a focal point, in which things are seen clearly. The area surrounding the focal point is called the fringe; it is visible but not as sharply focused as the center. The area outside the fringe is called the margin.

Another theory is the zoom-lens model. It describes attention as a camera-lens-like mechanism in which we can adjust the focal distance for the image.

This is how we pick out the important points in an image. Sometimes color (wavelength) plays an important part in visual saliency. A dark-colored object, for example, can catch our attention more easily. Our neurons respond to visual stimuli according to each person's visual ability, so a color-blind person can have trouble identifying some images.

For example

Color Pop-out

The image on the left has a single color pop-out. The brain picks out that part almost without effort, because those pixels differ in wavelength from the rest.

This image works the same way. Here, one object stands out from the others through a change in shape or appearance.

In this image, the object stands out through a combination of both color and orientation.

This one shows that movement captures our eyes, in particular the one dot rotating rapidly in the middle. Speed is another criterion in visual attention.

Let's talk about visual saliency in terms of machines.

Saliency is a term from psychology for picking the important information out of images. Our brain scans an image, looks at the surroundings of each region, grabs the important part, and then flags it back to us.

Image Saliency map for Porsche model

The brain is able to focus on important details through this spotlight mechanism. It does so biologically, but how is this different for machine learning?

When we load an image into a machine learning algorithm, it takes in all of the data at once, and the machine is overloaded with information. Information overload is an important concern for a computer vision (CV) model. In the example above there are a lot of objects in the image, so how can a machine pick out the important parts?

To learn more about this, we need to look into the frequency spectrum.

Enter Fourier Transformation

Cat Analogy

Jokes apart, let's go back to basics. If you are a physics student, you know about signal processing. Say we have data in the time domain; we can convert that data into the frequency domain.

Fourier transform

With the Fourier transform and its inverse, we can move easily between the time domain and the frequency domain. For images, the Fourier transform is one of the most beautiful tools available: it exposes structure that is hard to see in the pixels themselves. We are no longer talking about coordinates; instead, we are talking in terms of a spectrum. Enough with theory, let's see this with a more programmatic approach.
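As a rough illustration of moving between the two domains, here is a minimal sketch using NumPy's FFT routines. The 5 Hz sine wave and the 100 Hz sampling rate are assumptions made up for this example, not values from the article.

```python
import numpy as np

fs = 100                                # sampling rate in Hz (assumed for this example)
t = np.arange(fs) / fs                  # one second of sample times
signal = np.sin(2 * np.pi * 5 * t)      # a 5 Hz sine wave in the time domain

# Time domain -> frequency domain
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(signal.size, d=1 / fs)

# The strongest frequency bin should sit at the sine's frequency
peak_hz = freqs[np.argmax(np.abs(spectrum))]

# Frequency domain -> time domain
recovered = np.fft.irfft(spectrum, n=signal.size)

print("peak frequency (Hz):", peak_hz)
print("round trip matches:", np.allclose(recovered, signal))
```

The same idea carries over to images, where the two time axes become the two spatial axes of the pixel grid.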

Code for Fourier Transform and Inverse
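The embedded code is not reproduced here, so the following is only a sketch of what a 2D Fourier transform and its inverse could look like with NumPy. The synthetic 64x64 image with a white square is an assumption standing in for a real photo.

```python
import numpy as np

# Synthetic grayscale image (assumption: a stand-in for a real photo)
image = np.zeros((64, 64))
image[24:40, 24:40] = 1.0               # a white square on a black background

# Forward 2D Fourier transform: pixels -> complex spectrum
spectrum = np.fft.fft2(image)

# Log-scaled magnitude, which is what is usually displayed as "the spectrum"
magnitude = np.log1p(np.abs(spectrum))

# Inverse transform: rebuild the image from the spectrum only
reconstructed = np.fft.ifft2(spectrum).real

print("reconstruction matches:", np.allclose(reconstructed, image))
```

To run this on a real picture, load it as a 2D grayscale array with whatever image library you prefer and pass it in place of `image`.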

Let's look at the results.

Output for above code.

We can convert an image into its spectrum. The centered spectrum is what we get when the zero-frequency component is moved to the center, which gives us that star-like structure. The last image is reconstructed by the computer from the spectrum alone, using the inverse transform; we don't use the original image to build it.
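The centering step can be sketched with `np.fft.fftshift`. The constant 8x8 image below is an assumption chosen so that all of the energy sits in the zero-frequency component, which makes the shift easy to see.

```python
import numpy as np

# Constant image (assumption): all of its energy is at zero frequency
image = np.ones((8, 8))
spectrum = np.fft.fft2(image)

# Move the zero-frequency component from the corner to the center
centered = np.fft.fftshift(spectrum)

def peak_index(a):
    """Return the (row, col) of the largest magnitude as plain ints."""
    idx = np.unravel_index(np.argmax(np.abs(a)), a.shape)
    return tuple(int(i) for i in idx)

dc_before = peak_index(spectrum)        # corner of the array
dc_after = peak_index(centered)         # middle of the array

# ifftshift undoes the centering before an inverse transform
restored = np.fft.ifftshift(centered)

print(dc_before, dc_after)
```

Remember to undo the shift with `np.fft.ifftshift` before calling the inverse transform on a centered spectrum.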

Saliency Map

This is an important concept in object detection. It is how the machine picks specific regions out of a scene. Here we use the spectral residual model from "Saliency Detection: A Spectral Residual Approach" by Xiaodi Hou and Liqing Zhang.

In this paper, the authors state that we can remove redundancies from an image so the machine can search for the important parts of it.

The paper proposes the idea of a saliency map, which we can construct in the spatial domain with the inverse Fourier transform.

Residual Filter

Refer to the paper for the notation. There is some math involved in this part.
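For reference, the spectral residual can be written roughly as follows. This is a restatement of the paper's definitions, with the amplitude and phase written explicitly as a magnitude and an argument. Here I(x) is the input image, F is the Fourier transform, h_n(f) is an n x n averaging filter (the paper uses n = 3), and * denotes convolution.

```latex
\begin{aligned}
A(f) &= \left| \mathcal{F}[I(x)] \right|            && \text{amplitude spectrum} \\
P(f) &= \arg\left( \mathcal{F}[I(x)] \right)        && \text{phase spectrum} \\
L(f) &= \log A(f)                                   && \text{log amplitude spectrum} \\
R(f) &= L(f) - h_n(f) * L(f)                        && \text{spectral residual}
\end{aligned}
```

In words: the locally averaged log spectrum captures what is common and predictable, and subtracting it leaves the part of the spectrum that is unusual for this image.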

What remains is the residual of the image, which is the part that can be identified as salient. The authors then smooth the result with a Gaussian filter.

Final derived equation for the saliency map

Saliency map

g(x) is a Gaussian filter, F^{-1} is the inverse Fourier transform, R(f) is the spectral residual, and P(f) is the phase spectrum of the Fourier-transformed image.
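For readers without the figure, the equation can be written as below. The paper prints the exponent as R(f) + P(f); the imaginary unit i and the magnitude bars are written out here because that is how the formula is normally evaluated in practice, so treat this form as an interpretation rather than a verbatim copy.

```latex
S(x) = g(x) * \left| \mathcal{F}^{-1}\!\left[ \exp\big( R(f) + i\,P(f) \big) \right] \right|^{2}
```

The * is a convolution, so the Gaussian g(x) simply smooths the squared reconstruction into a cleaner map.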

Let's look at the saliency map programmatically.

Saliencymap.py
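The original script is not reproduced here, so what follows is only a minimal NumPy sketch of the spectral residual idea, not the author's code. The helper names (`box_filter`, `gaussian_blur`, `spectral_residual_saliency`), the synthetic test image, and the `sigma=2.0` default are all assumptions made for illustration.

```python
import numpy as np

def box_filter(a, n=3):
    """Local mean over an n x n window (edge-padded); stands in for h_n."""
    pad = n // 2
    padded = np.pad(a, pad, mode="edge")
    out = np.zeros(a.shape, dtype=float)
    for dy in range(n):
        for dx in range(n):
            out += padded[dy:dy + a.shape[0], dx:dx + a.shape[1]]
    return out / (n * n)

def gaussian_blur(a, sigma=2.0):
    """Separable Gaussian smoothing built from 1D convolutions."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-(x ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    smooth = lambda v: np.convolve(v, kernel, mode="same")
    a = np.apply_along_axis(smooth, 0, a)       # blur down the columns
    return np.apply_along_axis(smooth, 1, a)    # blur along the rows

def spectral_residual_saliency(image, n=3, sigma=2.0):
    """Return a saliency map scaled so that its maximum is 1."""
    spectrum = np.fft.fft2(image)
    log_amplitude = np.log(np.abs(spectrum) + 1e-8)            # L(f)
    phase = np.angle(spectrum)                                 # P(f)
    residual = log_amplitude - box_filter(log_amplitude, n)    # R(f)
    # Back to the spatial domain, keeping the original phase
    saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    saliency = gaussian_blur(saliency, sigma)                  # g(x) * ...
    return saliency / saliency.max()

# Synthetic test image (assumption): faint noise with one bright square
rng = np.random.default_rng(0)
image = 0.1 * rng.random((64, 64))
image[28:36, 28:36] += 1.0

saliency_map = spectral_residual_saliency(image)
print(saliency_map.shape, float(saliency_map.min()), float(saliency_map.max()))
```

For real photos, it is common to convert to grayscale and downscale the image before computing the map. If you would rather not write this by hand, the OpenCV contrib package also ships a spectral residual saliency implementation.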

The above code produces a saliency map like the one below.

Saliency Map

This is one example of a saliency map. Saliency maps play an important role in object detection. In the next article, I will show how they help in object detection.

Reference: Xiaodi Hou and Liqing Zhang, "Saliency Detection: A Spectral Residual Approach"
