Paper Summary — BASNet: Boundary-Aware Salient Object Detection

Aditya Chinchure · Published in Technonerds · Jun 1, 2021

BASNet is a method for salient object detection and segmentation based on the simple yet effective U-Net architecture, with a focus on improving edge quality in segmentation masks. After being showcased at CVPR 2019, the method gained a lot of traction, notably because of its inference speed of over 70 frames per second on a single GPU. An update to the paper highlights some commercial applications of this method.

A summary of the method

The BASNet method has two distinct yet similarly structured modules. Both follow an encoder-decoder design but serve different purposes. The Predict Module, in simple terms, localizes and segments the salient object in the image, producing a rough saliency map that may have inaccurate edges. The Residual Refinement Module then encodes this coarse map at multiple levels and learns to produce a residual (mathematically speaking, the difference between the final segmentation map and the rough one), which is added to the coarse map to produce the refined result.
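To make the two-stage idea concrete, here is a minimal PyTorch-style sketch of the refinement step. The module name and layer sizes are my own placeholders rather than the authors' exact architecture (the real refinement module is a deeper encoder-decoder); the point is only that the second stage learns a residual correction on top of the coarse map.

```python
import torch.nn as nn

class ResidualRefinement(nn.Module):
    """Toy stand-in for the refinement stage: learn a correction, not a new map."""

    def __init__(self, channels=64):
        super().__init__()
        # Small conv stack in place of BASNet's full encoder-decoder refiner.
        self.body = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 1, 3, padding=1),
        )

    def forward(self, coarse_map):
        residual = self.body(coarse_map)   # the learned correction term
        return coarse_map + residual       # refined map = coarse map + residual

# Usage (hypothetical names):
# coarse = predict_module(image)           # stage 1: rough saliency map
# refined = ResidualRefinement()(coarse)   # stage 2: mostly fixes the boundaries
```

Because the first stage already gets the overall shape right, the residual the second stage has to learn is small and concentrated around the object boundary, which is exactly where the coarse map tends to be wrong.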

The other major contribution in this paper is the introduction of a hybrid loss, combining Intersection-over-Union (IoU) and Structural Similarity (SSIM) based loss functions with the popular Binary Cross-Entropy (BCE) loss used to train the segmentation model. These loss values are calculated at each level of the prediction decoder as well as on the final result that the Residual Refinement Module produces, thereby providing multi-level supervision to the model.
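As a rough illustration, here is my own simplified version of such a hybrid loss, not the authors' code. I assume `pred` and `gt` are (batch, 1, H, W) tensors with values in [0, 1], and I use a global SSIM term for brevity, whereas the paper computes SSIM over local patches.

```python
import torch.nn.functional as F

def iou_loss(pred, gt, eps=1e-7):
    # 1 - IoU, computed per image and averaged over the batch.
    inter = (pred * gt).sum(dim=(1, 2, 3))
    union = (pred + gt - pred * gt).sum(dim=(1, 2, 3))
    return (1 - (inter + eps) / (union + eps)).mean()

def ssim_loss(pred, gt, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified global SSIM (the paper uses a windowed, patch-based SSIM).
    mu_p, mu_g = pred.mean(dim=(1, 2, 3)), gt.mean(dim=(1, 2, 3))
    var_p, var_g = pred.var(dim=(1, 2, 3)), gt.var(dim=(1, 2, 3))
    cov = ((pred - mu_p[:, None, None, None]) *
           (gt - mu_g[:, None, None, None])).mean(dim=(1, 2, 3))
    ssim = ((2 * mu_p * mu_g + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_g ** 2 + c1) * (var_p + var_g + c2))
    return (1 - ssim).mean()

def hybrid_loss(pred, gt):
    # BCE handles pixel-level accuracy, IoU the region, SSIM the structure/edges.
    return F.binary_cross_entropy(pred, gt) + ssim_loss(pred, gt) + iou_loss(pred, gt)
```

During training, this loss would be applied to every side output of the prediction decoder and to the refined map, which is what gives the multi-level supervision described above.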

What I found most interesting

I discovered BASNet through Cyril Diagne’s tweet, which demonstrated this awesome application where you can “cut” an object from the real world and paste it into Photoshop. Following this, I ended up making my own version of this app for iOS, where I could paste into Google Slides through their API.

That said, from the ML standpoint, I like this model because of its simplicity, and because it shows that you do not need billions of parameters to do simple tasks. The refinement module is especially interesting because I can see it being applied in different contexts, and even modified to work in other segmentation tasks, including multimodal ones.

Why should you be excited (and why am I excited)?

If you are building any application that incorporates semantic segmentation, BASNet might be one of the best options, with its easy-to-deploy API and freely available code. Moreover, it has been extensively evaluated, both quantitatively and qualitatively, in the updated paper.

I am quite confident that newer approaches involving transformers and other methods are going to beat BASNet at semantic segmentation. At the same time, I am interested in seeing techniques from BASNet, like the hybrid loss, being incorporated into newer models.

I am writing a series of summaries of papers that I have been reading, mostly involving multimodal computer vision and NLP tasks. These summaries are in layman’s terms, and not detailed. You can find all the papers I have summarized here.

I am a student researcher at The University of British Columbia working on Vision and NLP tasks. If you are interested in these topics as well, let’s get in touch!
