Watershed Segmentation — detect individual objects when heavily clustered

Samuel Sung
Published in CodeX
7 min read · Sep 3, 2021

Imagine that you are at the top of Beaver Stadium, home to Penn State, one of the most prestigious college football programs in the nation. You want to figure out how many people attended the game without manually clicking a tally counter. Seriously, that would be too much.

Thanks to ongoing progress in computer vision, detecting multiple people in an image is no longer a hard problem. Numerous pre-trained models are publicly available, so anyone can integrate one on their device without training a model from scratch and still achieve decent performance. With modern technology, it would be fairly easy to get an approximate number of attendees.

But what if your friends ask you to figure it out on the spot, and they want the answer before the break is over? Fortunately (but not really), you have a laptop that freezes every time its GPU is put under heavy load. The only method you can come up with is what you learned in your Image Processing 101 course.

Would it be possible to estimate the number of attendees by using image processing only?

In today’s article, I will introduce watershed segmentation, which can segment objects without the help of a CNN. Then I will demonstrate a case study in digital pathology by highlighting individual nuclei with the watershed algorithm. Finally, I will conclude with a remark on the original question: can we estimate the number of attendees using only image processing?

Watershed Segmentation

Logic

Any grayscale image can be viewed as a topographic surface, where high pixel values are hills and low pixel values are valleys. If we start filling every valley with water, the water level rises and, eventually, water from different valleys starts to merge. To prevent merging, we build a barrier (a watershed line) wherever water from two valleys is about to meet. We keep filling until the water reaches the peaks of the hills, and the resulting barriers delineate each individual object.

Watershed Algorithm Layout. Source: Data Hacker by Strahinja Zivkovic

When objects are well separated, the intensity difference between each object and the gap around it is large, so the objects are easy to pick out. When objects are heavily clustered, however, the change in intensity between neighbors is subtle, which makes it harder to delineate them accurately.
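To make the flooding idea concrete, here is a minimal sketch on a one-dimensional height profile instead of a full image. It is only an illustration, not the code from the appendix: the function name `flood_1d` and the sample profile are made up, and it assumes every valley is a strict local minimum (no flat valley floors).

```python
import heapq


def flood_1d(heights):
    """Flood a 1D height profile from its valleys.

    Returns one label per position: 1, 2, ... for the basin that reached it
    first, and 0 for a barrier where two different basins meet.
    Assumes valleys are strict local minima (no flat valley floors).
    """
    n = len(heights)
    labels = [None] * n      # None = still dry
    queued = [False] * n
    heap = []                # (height, position), lowest first

    def neighbours(i):
        return [j for j in (i - 1, i + 1) if 0 <= j < n]

    def queue_dry_neighbours(i):
        for j in neighbours(i):
            if labels[j] is None and not queued[j]:
                queued[j] = True
                heapq.heappush(heap, (heights[j], j))

    # Each strict local minimum is a valley and starts its own basin.
    next_label = 1
    for i in range(n):
        if all(heights[i] < heights[j] for j in neighbours(i)):
            labels[i] = next_label
            next_label += 1
    for i in range(n):
        if labels[i]:
            queue_dry_neighbours(i)

    # Raise the water level: always wet the lowest dry position next.
    while heap:
        _, i = heapq.heappop(heap)
        touching = {labels[j] for j in neighbours(i) if labels[j]}
        if len(touching) == 1:
            labels[i] = touching.pop()
            queue_dry_neighbours(i)
        else:
            labels[i] = 0    # water from two basins meets here: build a barrier
    return labels


profile = [3, 1, 2, 5, 2, 0, 4]
print(flood_1d(profile))  # [1, 1, 1, 0, 2, 2, 2]
```

The hill of height 5 ends up as the barrier between the two basins. A real image works the same way, just with neighbors in two dimensions.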

Case Study

I will demonstrate watershed segmentation on a patch of digitally scanned biopsy tissue. The main goal is to segment each individual nucleus shown in the patch. (The code is attached in the appendix.)

A biopsy tissue.

First, we need to convert the RGB image into a binary image in order to use the watershed algorithm. I found that extracting the red channel was the best option, since it showed the highest contrast between the nuclei and the stroma and background. Otsu's method is a good thresholding choice here because the nuclei and everything else form two clearly different intensity peaks, and Otsu's method finds the threshold that best separates the two peaks.

TIP: Normally, we want the objects to be 1 and the background to be 0. Sometimes the binary image has to be inverted to get the correct labels.
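As a rough illustration of this step, the sketch below implements Otsu's method by hand on the red channel of a tiny made-up patch. It is not the appendix code. The function names and pixel values are hypothetical, and it assumes 8-bit values with nuclei darker than the background in the red channel, which is why the result is inverted as the tip suggests. In practice you would probably call a library routine such as OpenCV's `cv2.threshold` with the `cv2.THRESH_OTSU` flag.

```python
def otsu_threshold(values):
    """Return the 8-bit threshold t that maximizes between-class variance.

    Pixels <= t form one class and pixels > t form the other.
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so there is nothing to separate
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (the constant factor
        # 1 / total**2 does not change which t wins).
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize_red_channel(rgb_rows):
    """Threshold the red channel; dark pixels (assumed nuclei) become 1."""
    red = [[px[0] for px in row] for row in rgb_rows]
    t = otsu_threshold([v for row in red for v in row])
    # Inversion step from the tip: objects are darker than the background here.
    return [[1 if v <= t else 0 for v in row] for row in red]


# Hypothetical 2x4 patch: dark "nuclei" pixels on a bright background.
patch = [
    [(30, 20, 90), (25, 18, 80), (210, 190, 200), (205, 200, 210)],
    [(200, 180, 190), (22, 15, 85), (20, 10, 70), (198, 185, 195)],
]
mask = binarize_red_channel(patch)
print(mask)  # [[1, 1, 0, 0], [0, 1, 1, 0]]
```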

Morphological operations are commonly used to preprocess images. Closing fills tiny holes inside objects, and opening removes small specks of noise. In this case, opening is used to remove small noise in the background. As you can see in the figure below, the noise vanishes effectively.

image preprocessing
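For readers who have not met opening before, here is a small sketch of what it does: an erosion followed by a dilation. The 3x3 square neighborhood, the border handling (pixels outside the image are simply ignored), and the toy mask are my own assumptions for illustration. OpenCV offers the same operation through `cv2.morphologyEx` with `cv2.MORPH_OPEN`.

```python
def _window(mask, r, c):
    """Values in the 3x3 neighborhood of (r, c), clipped at the image border."""
    rows, cols = len(mask), len(mask[0])
    return [
        mask[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    ]


def erode(mask):
    """A pixel stays 1 only if every pixel in its neighborhood is 1."""
    return [
        [1 if all(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def dilate(mask):
    """A pixel becomes 1 if any pixel in its neighborhood is 1."""
    return [
        [1 if any(_window(mask, r, c)) else 0 for c in range(len(mask[0]))]
        for r in range(len(mask))
    ]


def opening(mask, iterations=1):
    """Erode `iterations` times, then dilate the same number of times."""
    for _ in range(iterations):
        mask = erode(mask)
    for _ in range(iterations):
        mask = dilate(mask)
    return mask


# Toy mask: one 3x3 object plus a single noisy pixel in the top-left corner.
noisy = [
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = opening(noisy)
```

In this toy case the isolated pixel disappears while the 3x3 object comes back at its original size. Objects smaller than the neighborhood would be removed too, which is the trade-off to watch for.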

In order to utilize watershed segmentation, there are three required pieces of information: 1) sure foreground, 2) sure background, and 3) unsure region.

Sure Background

From the Otsu threshold, we have a binary image where nuclei are labeled as 1 and the background as 0. However, the threshold can be imperfect, so we need to be somewhat conservative when choosing the sure background. To do that, we dilate the nuclei mask so that it covers more area. Whatever remains outside the dilated mask is then certain to be ‘the background’.

Tip: When dilating, make sure you find the right number of iterations, so that the mask does not over-extend and the objects do not lose their true shapes.

Dilating multiple iterations on the binary image.

Sure Foreground

We can find the sure foreground with erosion or with a distance map (recommended). Erosion works, but small objects tend to get washed away, which loses essential information about them. Since the whole purpose of this segmentation is to get every individual object, we need a better option.

A distance map is the alternative. For those of you who are new to the concept, here is a brief explanation. Every object pixel (1) in the binary map is reassigned a value equal to its distance to the closest background pixel (0).

Example of the distance map.
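Here is a minimal sketch of how such a map can be computed. To keep it short it uses a breadth-first search, which gives the city-block distance (steps up, down, left, and right) rather than the Euclidean distance that library routines such as OpenCV's `cv2.distanceTransform` can compute, so treat it as an approximation of the idea. The function name and toy mask are made up, and the sketch assumes the mask contains at least one background pixel.

```python
from collections import deque


def distance_map(mask):
    """City-block distance from every object pixel (1) to the nearest 0.

    Assumes the mask contains at least one background pixel.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[0 if mask[r][c] == 0 else None for c in range(cols)]
            for r in range(rows)]
    # Start the search from every background pixel at once.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if mask[r][c] == 0)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


# Toy mask: a single 3x3 object surrounded by background.
blob = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
for row in distance_map(blob):
    print(row)
# [0, 0, 0, 0, 0]
# [0, 1, 1, 1, 0]
# [0, 1, 2, 1, 0]
# [0, 1, 1, 1, 0]
# [0, 0, 0, 0, 0]
```

The center of the object gets the largest value, which is exactly why the distance map is good at pointing to the middle of each nucleus.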

A distance map is useful for locating targeted subregions. For this problem, our subregions are the nuclei. Thus, as shown below, the distance map highlights where the nuclei are.

Result of distance map when applied to the image.

To further separate objects and solidify the sure foreground, we threshold the distance map: any pixel whose distance is less than half of max(dist) is converted to background.

Unsure Region

Finding the unsure region is straightforward. You just subtract the sure foreground from the sure background image (the dilated mask).

And here are the three key pieces of information we need before performing watershed segmentation.

Left) Sure background. Middle) Sure foreground. Right) Unsure region.
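The last two steps can be sketched in a few lines. This is an illustration under assumptions rather than the appendix code: the function names and the one-row toy inputs are hypothetical, `sure_bg` stands for the dilated mask from the sure background step, and the `ratio=0.5` default simply mirrors the "half of max(dist)" rule above.

```python
def sure_foreground(dist, ratio=0.5):
    """Keep only pixels whose distance is at least ratio * max(dist)."""
    peak = max(max(row) for row in dist)
    cutoff = ratio * peak
    return [[1 if d > 0 and d >= cutoff else 0 for d in row] for row in dist]


def unsure_region(sure_bg, sure_fg):
    """Pixels inside the dilated mask that are not sure foreground."""
    return [
        [1 if bg and not fg else 0 for bg, fg in zip(bg_row, fg_row)]
        for bg_row, fg_row in zip(sure_bg, sure_fg)
    ]


# One-row toy example (hypothetical values).
dist = [[0, 0, 1, 2, 3, 2, 1, 0, 0]]      # distance map of one object
sure_bg = [[0, 1, 1, 1, 1, 1, 1, 1, 0]]   # the object mask after dilation

sure_fg = sure_foreground(dist)
unknown = unsure_region(sure_bg, sure_fg)
print(sure_fg)  # [[0, 0, 0, 1, 1, 1, 0, 0, 0]]
print(unknown)  # [[0, 1, 1, 0, 0, 0, 1, 1, 0]]
```

The unsure region is the ring between the confident core of the object and the confident background. Those are the pixels the watershed step has to decide on.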

Markers

Once we have all three pieces, we can find each individual object in the sure foreground using the connected components algorithm. The algorithm scans an image and groups its pixels into components based on pixel connectivity. All pixels within the same connected component share similar pixel intensity values and are in some way connected with each other. Once all groups have been determined, each pixel is given a unique label according to the component it was assigned to.

Next, label the unsure region with zero. One tip: since 0 now means "unknown", give the sure background a label other than 0 first.

Left) Each color marker is assigned to different objects. Right) An unsure region is added to the markers.
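A rough sketch of the marker construction is below. The function names and the toy inputs are made up, and 4-connectivity is my own choice for simplicity. One common way to follow the tip, used here, is to shift every component label up by one so that the sure background becomes 1, and only then write 0 into the unsure region. OpenCV provides the labeling step as `cv2.connectedComponents`.

```python
from collections import deque


def connected_components(mask):
    """Label 4-connected groups of 1s with 1, 2, ...; background stays 0."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return count, labels


def build_markers(sure_fg, unknown):
    """Objects get labels 2, 3, ...; sure background is 1; unknown is 0."""
    count, labels = connected_components(sure_fg)
    # Shift by one so that 0 is free to mean "unknown".
    markers = [[label + 1 for label in row] for row in labels]
    for r, row in enumerate(unknown):
        for c, is_unknown in enumerate(row):
            if is_unknown:
                markers[r][c] = 0
    return count, markers


# One-row toy example with two objects (hypothetical values).
sure_fg = [[0, 0, 1, 0, 0, 0, 1, 0, 0]]
unknown = [[0, 1, 0, 1, 1, 1, 0, 1, 0]]
count, markers = build_markers(sure_fg, unknown)
print(count)    # 2
print(markers)  # [[1, 0, 2, 0, 0, 0, 3, 0, 1]]
```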

Results

Finally, the markers are ready to be filled up with water! Here are the watershed segmentation results on a couple of different patches.

We can observe that the segmentation of nuclei is pretty decent, considering that only image processing was used. However, the segmentation is not perfect. There are regions that are over-segmented and regions that are under-segmented. For the most part, though, the segmentation accurately locates nuclei and separates them into individual objects.

Limitation

The downside of watershed segmentation is its limited scope of usage. The outcome of the segmentation is pretty much determined at the moment the RGB image is converted to a binary image. If the objects are clearly differentiated from the background, the segmentation will work great. However, when objects are poorly defined or the image contains too much noise, the segmentation will perform poorly.

In other words, watershed segmentation is recommended for simple images that have little color variation and a sharp contrast between objects and background.

Back to the question

Now that we have seen watershed segmentation applied to digital pathology, let's go back to the original question that still remains unanswered: can watershed segmentation be used to count the number of attendees?

To be quite honest, the algorithm is not robust enough for that. In the demo, I emphasized that watershed segmentation requires a binary image that clearly separates the objects from the background. Inside a stadium, however, there is just too much color variation to construct a clean object-versus-background image. So my answer to the question is: not yet.

Perhaps if the game were a whiteout game, where the home fans all wear white, performance could improve. But then I am adding yet another condition, which is usually not the right direction when you want something to work in a real-life application.

Hopefully, with an advanced camera, we can automatically get a binary image of a crowd and apply watershed segmentation instantaneously to estimate the number of people in the stadium.

Thanks everyone for reading this article. Cheers!
