Computer Vision — Understanding GrabCut Algorithm without the Maths

Published in

Analytics Vidhya

3 min readMar 16, 2020

If you are a beginner in the field of computer vision like me, this might be suitable for you. Let’s jump right into it! Oh, and without the code too…

Object segmentation has been a tough task for computers and even for engineers to make it work on computers. Until recently(well, maybe not so recently), an algorithm was introduced to make this task feel easy. Before we go on and explain the algorithm, let’s explain at a very high level what object segmentation is.

Object segmentation is basically the process of (you’ve guessed it) segmenting an image into portions of smaller size images. More specifically, after object segmentation, the computer is able to tell how many objects there are in the image and is able to tell whether an object in an image corresponds to the foreground or the background. Enter the GrabCut algorithm. Here, we will use GrabCut algorithm to segment the image into foregrounds and backgrounds.

Here’s how it works:

You start off by drawing a rectangle on an image and this rectangle should include the subjects of the image, like a person or a dog. Subsequently, the area lying outside of the rectangle you’ve just drawn is automatically defined as the background. The data contained in the background is used as a reference to distinguish background areas from foreground areas within the defined rectangle.

To make it simple, this algorithm defines the area in the rectangle as a color distribution model using the Gaussian Mixture Model(GMM), where each pixel will be given a label to tell whether it is a foreground, background, or unknown. If you understand a little about image processing, every pixel is connected to one another by a gradient and so this model will encourage neighbouring pixels of similar color distribution to have the exact same label.

Then, we will create a graph initialized with weights, which will then be solved using the Min-Cut algorithm, to yield 2 groups of vertices. We then create a graph that will take N vertices, (N=number of pixels) and link neighbouring vertices with edges whose weights are defined by the vertices(pixels) color similarity. After doing so, we add 2 vertices(labels for foreground and background) to the graph, each of which will be linked to the N pixels according to the probability of the pixel to match a color distribution of the background or foreground.

In simpler terms, the GrabCut method puts 2 labels, one for background and one for foreground, and link all the pixels to itself based on the similarity of color distribution of each pixel to the labels.

Getting to this point would require a huge amount of research in the algorithm, so to make it simple to understand, the method will run iterations of min-cut on the graph of the image to fully define all pixels with a label since some of the pixels will initially receive an unknown label. And finally, you get a segmented image like this:

Computer Vision — Understanding GrabCut Algorithm without the Maths

Written by Ray