Point Transformer (paper review)
Review of the paper by Hengshuang Zhao¹, Li Jiang², Jiaya Jia², et al., ¹University of Oxford, ²The Chinese University of Hong Kong, 2020.
Originally published in Deep Learning Reviews on January 3, 2021.
The authors develop a neural attention layer for 3D point data and use it to build a Point Transformer network that achieves new state-of-the-art results (in some cases by a significant margin) on a number of standard 3D benchmarks.
What can we learn from this paper?
That self-attention works very well for 3D data. A point cloud is essentially a sparse, unordered set of points, so it is invariant to permutation and lacks the strong, regular local structure that makes convolutional approaches effective for 2D images.
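To make the permutation argument concrete, here is a minimal sketch in plain Python. It is not the Point Transformer layer from the paper: it assumes ordinary dot-product self-attention with identity projections (no learned weights, no position encoding), and the helper names `self_attention`, `mean_pool`, and `close` are made up for illustration. It checks that shuffling the input points only shuffles the per-point outputs, and that a mean over the points is unchanged.

```python
import math
import random


def self_attention(points):
    """Dot-product self-attention with identity projections (illustrative only)."""
    out = []
    for q in points:
        # Similarity of this point to every point in the set.
        scores = [sum(a * b for a, b in zip(q, k)) for k in points]
        # Numerically stable softmax over the scores.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        # Weighted average of all points, one output vector per input point.
        out.append([
            sum(w * v[d] for w, v in zip(weights, points)) / total
            for d in range(len(q))
        ])
    return out


def mean_pool(features):
    """Average the per-point features into a single vector."""
    n = len(features)
    return [sum(f[d] for f in features) / n for d in range(len(features[0]))]


def close(a, b, tol=1e-9):
    """Element-wise comparison with a small floating-point tolerance."""
    return all(abs(x - y) <= tol for x, y in zip(a, b))


random.seed(0)
# A toy "point cloud": six random 3D points (assumed data, not from the paper).
cloud = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(6)]

# The same points listed in a different order.
perm = list(range(len(cloud)))
random.shuffle(perm)
shuffled = [cloud[i] for i in perm]

out = self_attention(cloud)
out_shuffled = self_attention(shuffled)

# Equivariance: output j of the shuffled cloud matches output perm[j] of the original.
equivariant = all(close(out[i], out_shuffled[j]) for j, i in enumerate(perm))
# Invariance: pooling over the set removes the dependence on order entirely.
invariant = close(mean_pool(out), mean_pool(out_shuffled))

print(equivariant, invariant)
```

Both checks are expected to hold up to floating-point tolerance, since neither the attention weights nor the mean depend on the order in which the points are listed.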
Prerequisites (to better understand the paper, what should one be familiar with?)
- Neural attention
- 3D point clouds
Discussion
It is well known that for dense 2-dimensional data (such as images), modern state-of-the-art approaches to standard ML tasks (object detection, instance segmentation, semantic segmentation, etc.) are mostly based on convolutional networks that…