The #paperoftheweek 4 was: Deep Neural Networks with Box Convolutions

The authors of the paper propose bringing box filters back to computer vision in the form of a box convolution layer. The layer computes sums over boxes of arbitrary size in constant time thanks to precomputed integral images. It is differentiable with respect to the height, width and offset of each box, so all of its parameters can be learned with backpropagation. The advantage of such a layer is that it can achieve arbitrarily large receptive fields while remaining computationally efficient and keeping the number of trainable parameters small, which makes it less likely to overfit.
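To make the constant-time claim concrete, here is a minimal sketch of the integral-image trick in plain Python. It is not the authors' implementation: the function names `integral_image` and `box_sum` are made up for illustration, and the sketch assumes integer box coordinates, so it does not cover how the layer makes box size and offset differentiable.

```python
def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[r][c] for all r < y, c < x.

    The extra leading row and column of zeros keeps box_sum free of
    special cases at the image border.
    """
    h, w = len(img), len(img[0])
    ii = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0.0  # running sum of the current row up to column x
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top:bottom][left:right] using four lookups.

    The cost is the same whether the box covers 4 pixels or the whole image.
    """
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


# Toy 3x4 image with values 0..11 (illustrative data only).
img = [[float(4 * r + c) for c in range(4)] for r in range(3)]
ii = integral_image(img)

print(box_sum(ii, 0, 0, 3, 4))  # whole image: 66.0
print(box_sum(ii, 1, 1, 3, 3))  # 2x2 box (5 + 6 + 9 + 10): 30.0
```

The integral image is built once per input in a single pass. After that, every box sum costs four reads regardless of the box size, which is why growing the receptive field does not add computation.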

In an experiment with ENet, a semantic segmentation model, the authors show that replacing its blocks with blocks based on box convolutions makes the model not only smaller and faster but also more accurate.

Box convolutions could potentially improve any neural architecture that would benefit from larger receptive fields of its units. The authors initially implemented them in Torch 7 (Lua); a PyTorch implementation of box convolution layers is a work in progress.


“Box filters computed using integral images have been part of the computer vision toolset for a long time. Here, we show that a convolutional layer that computes box filter responses in a sliding manner can be used within deep architectures, whereas the dimensions and the offsets of the sliding boxes in such a layer can be learned as a part of an end-to-end loss minimization. Crucially, the training process can make the size of the boxes in such a layer arbitrarily large without incurring extra computational cost and without the need to increase the number of learnable parameters. Due to its ability to integrate information over large boxes, the new layer facilitates long-range propagation of information and leads to the efficient increase of the receptive fields of network units. By incorporating the new layer into existing architectures for semantic segmentation, we are able to achieve both the increase in segmentation accuracy as well as the decrease in the computational cost and the number of learnable parameters.”

For more details and a good read, check out the paper:

The article was written by Evgeniy Mamchenko, Deep Learning Engineer at Brighter AI Technologies.