LISA: An End-to-End Deep Framework for Instance Shadow Detection

Understanding an unconventional approach for detecting object-shadow pairs

Devshree Patel
VisionWizard
6 min read · Aug 13, 2020


Photo by Jakob Owens on Unsplash

“When you light a candle, you also cast a shadow” — Ursula K. Le Guin, A Wizard of Earthsea.

Shadows are light-deficient regions in a scene due to light occlusion, but they carry the shape of the light-occluding objects, as they are projections of these objects onto the physical world. [1]

Instance shadow detection aims to find shadow instances together with their corresponding object instances.

Table Of Contents

1. Introduction

2. How is LISA different from previous frameworks?

3. Proposed technical contributions

4. Methodology and Architecture

5. Comparative results and Evaluation

6. Applications

7. Conclusion and Limitations

8. References

1. Introduction

  • Computer vision and image processing research has mainly focused on shadow detection and shadow removal.
  • In [1], the authors propose an end-to-end deep learning framework, Light-guided Instance Shadow-object Association (LISA), which leverages the representational power of deep neural networks for instance shadow detection.
  • Instance shadow detection refers to the discovery of shadow instances in an image, along with the corresponding object that casts each shadow.
  • Detecting shadow-object pairs is useful in a variety of applications, such as privacy protection and photo editing.
  • The association between a shadow and its object gives a clue about the light direction in the scene, enabling applications such as relighting.
  • To facilitate instance shadow detection, the authors prepare a dataset, Shadow-Object Association (SOBA), consisting of 3623 shadow-object pairs, where each image is annotated with shadow instance masks, object instance masks, and shadow-object association masks (a toy sketch of such a sample follows Figure 1).
  • For evaluating the framework, a new metric called Shadow-Object Average Precision (SOAP) is introduced.
Figure 1. (a) Input Image (b) Output Image (c,d,e) Masks used for training (Source: [1])
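To make the annotation format concrete, here is a toy Python/NumPy sketch of how one SOBA-style sample could be organized in memory: an RGB image plus per-instance shadow, object, and association masks, matching panels (c), (d), and (e) of Figure 1. The field names and sizes are illustrative assumptions, not the official dataset format.

```python
# Toy sketch of a SOBA-style annotated sample; field names and sizes are
# illustrative assumptions, NOT the official dataset format.
import numpy as np

H, W = 480, 640  # assumed image size, for illustration only

sample = {
    "image": np.zeros((H, W, 3), dtype=np.uint8),          # the input photo
    "shadow_masks": np.zeros((2, H, W), dtype=bool),        # one mask per shadow instance
    "object_masks": np.zeros((2, H, W), dtype=bool),        # one mask per object instance
    "association_masks": np.zeros((2, H, W), dtype=bool),   # one mask per shadow-object pair
}

# The i-th shadow, object, and association masks describe the same pair, so an
# association mask is simply the union of its shadow and object masks.
sample["association_masks"][0] = sample["shadow_masks"][0] | sample["object_masks"][0]
```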

2. How is LISA different from previous frameworks?

  • Earlier works relied on image processing techniques, such as physical illumination models and color models, for shadow detection.
  • In later works, convolutional neural networks (CNNs) were introduced to learn high-level features for shadow detection.
  • Recently, shadow detection and shadow removal tasks were performed using end-to-end networks.
  • Instance shadow detection differs from general shadow detection: it finds each shadow together with its associated object, instead of producing a single mask covering all shadows.

3. Proposed technical contributions

  • To approach the problem of instance shadow detection, the authors of [1] present three technical contributions.
  • First, the authors create a novel dataset, SOBA, consisting of 1000 images and 3623 shadow-object association pairs, where each input image is accompanied by three instance masks.
  • Second, the authors develop an end-to-end framework, LISA, that predicts masks and boxes for object and shadow instances.
  • The framework also predicts a box for each shadow-object association, along with the corresponding light direction.
  • The predicted shadow and object instances are then paired and matched with the predicted shadow-object associations and light directions to produce the final output.
  • Third, the authors formulate an evaluation metric, SOAP, for quantitatively comparing the proposed model with existing baseline models.
Figure 2: Example predictions from the LISA framework (Source: [2])

4. Methodology and Architecture

  • Unlike earlier shadow detection methods, LISA predicts individual shadow instances rather than a single mask covering all shadows in the image.
  • LISA takes a single image as input and outputs a box and mask for each shadow/object instance, a box for each shadow-object pair, and the light direction for each pair.
  • LISA makes use of light direction as it is usually consistent with the object-shadow associations.
  • The architecture of LISA uses a feature pyramid network (FPN) as the convolutional backbone for extracting semantic features from the input image.
  • The architecture then splits into two branches: the top branch predicts the mask and box for each shadow/object instance, and the bottom branch predicts the box for each shadow-object pair together with the light direction (a minimal sketch of this layout follows Figure 3).
Figure 3: The schematic illustration of the LISA framework. (Source: [1])
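As a rough illustration of this two-branch layout, below is a highly simplified PyTorch sketch. It is not the authors' implementation (their code is linked at the end of the post): plain convolutions stand in for the FPN backbone, and the RPN, RoIAlign, and box/mask heads described in the next bullets are omitted or reduced to placeholders.

```python
# Highly simplified two-branch sketch of LISA's layout; NOT the authors' code.
import torch
from torch import nn

class LisaSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Stand-in for the FPN backbone that extracts shared semantic features.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 256, 3, stride=2, padding=1), nn.ReLU(),
        )
        # Top branch: features that would feed box + mask heads for shadow/object instances.
        self.instance_branch = nn.Conv2d(256, 256, 3, padding=1)
        # Bottom branch: features that would feed the shadow-object pair box head,
        # plus a small regressor standing in for the light-direction prediction.
        self.association_branch = nn.Conv2d(256, 256, 3, padding=1)
        self.light_direction = nn.Linear(256, 2)  # (dx, dy), one direction per image here

    def forward(self, images):
        feats = self.backbone(images)
        inst_feats = self.instance_branch(feats)      # -> box/mask heads (omitted)
        assoc_feats = self.association_branch(feats)  # -> pair box head (omitted)
        pooled = assoc_feats.mean(dim=(2, 3))         # crude stand-in for RoIAlign pooling
        return inst_feats, assoc_feats, self.light_direction(pooled)

# Shape check on a dummy image.
_ = LisaSketch()(torch.randn(1, 3, 256, 256))
```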
  • In the top branch, a region proposal network (RPN) finds regions with a high probability of containing shadow or object instances.
  • Next, RoIAlign extracts features for each proposal, and box and mask heads use these features to predict the shadow and object instances.
  • Training is done by minimizing the loss between the prediction results and the supervision signals from the training data.
  • The bottom branch similarly uses an RPN to generate region proposals for shadow-object pairs and RoIAlign to extract features for each proposal.
  • After discovering the shadow-object associations, the mask for each association is obtained by combining the corresponding shadow and object masks; the predicted instances are then paired and matched with the predicted associations (see the sketch after Figure 4).
Figure 4: The pair-and-match procedure for pairing the predicted shadow and object instances and efficiently matching them with the predicted shadow-object associations. (Source: [1])
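The matching idea can be illustrated with a few lines of plain Python: form candidate shadow-object pairs, take the union of their boxes, and keep the pair whose union box best overlaps a predicted association box. This is only a sketch of the idea; the exact procedure and thresholds in [1] differ.

```python
# Illustrative pair-and-match sketch; the exact procedure in [1] differs.
# Boxes are [x1, y1, x2, y2] in pixel coordinates.

def box_iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-6)

def union_box(a, b):
    return [min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3])]

def pair_and_match(shadow_boxes, object_boxes, assoc_boxes, iou_thresh=0.5):
    matches = []
    for assoc in assoc_boxes:
        best = None
        for s in shadow_boxes:
            for o in object_boxes:
                iou = box_iou(union_box(s, o), assoc)
                if iou >= iou_thresh and (best is None or iou > best[0]):
                    best = (iou, s, o)
        if best is not None:
            matches.append({"shadow": best[1], "object": best[2], "association": assoc})
    return matches

# Toy example: one shadow, one object, and an association box covering both.
print(pair_and_match([[50, 80, 90, 120]], [[40, 30, 80, 80]], [[40, 30, 90, 120]]))
```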

5. Comparative results and evaluation

  • Existing metrics for evaluating shadow detection and shadow removal models look at individual instances separately.
  • In instance shadow detection, however, multiple types of instances are involved: objects, shadows, and their associations.
  • Thus, the authors propose a new metric built on traditional average precision (AP) and intersection over union (IoU).
  • SOAP (Shadow-Object Average Precision) counts a prediction as correct only when three IoU conditions are satisfied, between the predicted and ground-truth shadow instance, object instance, and shadow-object association (a simplified sketch follows the figures below).
  • To evaluate the model, the authors compare it with several baseline frameworks.
  • The proposed model compares favorably with the baselines, both in overall detection performance and in finding the light direction.
Figure 5: Visual comparison of instance shadow detection results produced by the full pipeline and two other baseline frameworks. [(a) Input image, (b) Baseline 1, (c) Baseline 2, (d) Proposed model] (Source: [1])
Figures 6 and 7: Evaluation results of the proposed model (Source: [1])
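To make the metric concrete, here is a simplified sketch of the SOAP true-positive rule: a predicted sample counts as correct only if the IoU with ground truth reaches the threshold for the shadow instance, the object instance, and the shadow-object association simultaneously; average precision is then computed over these true positives as usual. Mask IoU is used below purely for illustration, and details may differ from [1].

```python
# Simplified sketch of the SOAP true-positive condition; details may differ from [1].
import numpy as np

def mask_iou(pred, gt):
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return np.logical_and(pred, gt).sum() / union if union else 0.0

def soap_true_positive(pred, gt, tau=0.5):
    """All three IoUs (shadow, object, association) must reach the threshold tau."""
    return (mask_iou(pred["shadow"], gt["shadow"]) >= tau and
            mask_iou(pred["object"], gt["object"]) >= tau and
            mask_iou(pred["association"], gt["association"]) >= tau)
```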

6. Applications

  • In [1], the authors present several applications of their proposed model, LISA.
  • The application scenarios include light direction estimation, photo editing, and privacy protection.
  • Instance shadow detection helps estimate the direction of light in an image. The estimated light direction is visualized as a virtual red post, which is consistent with the real shadows cast by the objects (a rough sketch of one way to derive such a direction follows Figure 8).
Figure 8: Examples of light direction estimation (Source: [1])
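As a rough sketch of one way such a direction could be derived from a detected pair (a simplification for illustration, not necessarily the estimation used in [1]): take the vector from the object-mask centroid to the shadow-mask centroid in image coordinates.

```python
# Rough sketch: 2D direction from an object mask to its shadow mask.
# This is a simplification for illustration, not necessarily the method in [1].
import numpy as np

def pair_direction(object_mask, shadow_mask):
    oy, ox = np.argwhere(object_mask).mean(axis=0)   # object centroid (row, col)
    sy, sx = np.argwhere(shadow_mask).mean(axis=0)   # shadow centroid (row, col)
    v = np.array([sx - ox, sy - oy])                 # object -> shadow, in (x, y)
    return v / (np.linalg.norm(v) + 1e-6)            # unit vector

obj = np.zeros((100, 100), bool); obj[20:40, 45:55] = True   # toy object
shd = np.zeros((100, 100), bool); shd[40:60, 60:80] = True   # toy shadow
print(pair_direction(obj, shd))
```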
  • Another useful application of instance shadow detection is photo editing: LISA removes not only the object instances but also their associated shadows (a minimal removal sketch follows Figure 9).
Figure 9: Examples in photo editing task (Source: [1])
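A minimal sketch of this shadow-aware removal, assuming the masks come from an instance shadow detector: union the object and shadow masks into one removal mask and fill the region with a standard inpainting routine (OpenCV here as a stand-in for whatever editing pipeline is actually used).

```python
# Minimal shadow-aware removal sketch; OpenCV inpainting is a stand-in for a
# real editing pipeline, and the masks are assumed to come from the detector.
import cv2
import numpy as np

def remove_with_shadow(image_bgr, object_mask, shadow_mask):
    removal = ((object_mask | shadow_mask).astype(np.uint8)) * 255  # 8-bit removal mask
    return cv2.inpaint(image_bgr, removal, inpaintRadius=5, flags=cv2.INPAINT_TELEA)
```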
  • Since instance shadow detection works on shadow-object pairs, cropping and editing images becomes more convincing than naive cut and paste, because the light direction is maintained.
Figure 10: Examples in photo editing task (Source: [1])

7. Conclusion and Limitations

  • In [1], the authors create a dataset (SOBA), propose a new end-to-end framework (LISA), and formulate a metric (SOAP) to advance the task of instance shadow detection.
  • Apart from their advantages, the proposed technical contributions also have some limitations.
  • One limitation is that [1] does not consider the overlapping area between shadows of different objects.
  • Shadow-object detection as a whole can also be improved, as many open problems remain to be addressed.

Code is accessible from the link: [github]

8. References

[1] Wang, Tianyu, et al. “Instance shadow detection.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020.
