Point Transformer (paper review)

Stan Kriventsov
Deep Learning Reviews
5 min read · Jan 4, 2021

Review of the paper by Hengshuang Zhao¹, Li Jiang², Jiaya Jia², et al., ¹University of Oxford, ²The Chinese University of Hong Kong, 2020.

Originally published in Deep Learning Reviews on January 3, 2021.

The authors develop a neural attention layer for working with 3D point data and implement it in a Point Transformer network that shows new state-of-the-art results (in some cases by a significant margin) on a number of standard 3D benchmarks.

What can we learn from this paper?

That self-attention works very well for 3D data. A point cloud is essentially a sparse, unordered set of points, so a model that processes it should not depend on the order in which the points are listed (permutation invariance). A point cloud also lacks the strong, regular local structure that makes convolutional approaches effective for 2D images.
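To make the permutation argument concrete, here is a minimal sketch in plain Python. It uses generic scaled dot-product self-attention, not the attention layer proposed in the paper, and every name in it (`self_attention`, `mean_pool`, `random_matrix`, the random weights, the toy 5-point cloud) is an assumption made for illustration. It shows that shuffling the input points only shuffles the output rows in the same way, and that an order-independent pooling step on top gives the same global descriptor either way.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(points, w_q, w_k, w_v):
    """Generic scaled dot-product self-attention over a set of points.

    points: n rows of d features; w_q, w_k, w_v: d x d weight matrices.
    The row index is never used as a feature, so row order carries no information.
    """
    q, k, v = matmul(points, w_q), matmul(points, w_k), matmul(points, w_v)
    k_t = [list(col) for col in zip(*k)]          # transpose of k
    scale = math.sqrt(len(w_q[0]))
    scores = matmul(q, k_t)                       # n x n pairwise scores
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def mean_pool(features):
    """Average over points: an order-independent global descriptor."""
    n = len(features)
    return [sum(col) / n for col in zip(*features)]


rng = random.Random(0)


def random_matrix(rows, cols):
    """Hypothetical helper: a rows x cols matrix of values in [-1, 1]."""
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


d = 3
cloud = random_matrix(5, d)                       # toy cloud: 5 points in 3D
w_q, w_k, w_v = random_matrix(d, d), random_matrix(d, d), random_matrix(d, d)

perm = [3, 0, 4, 1, 2]                            # an arbitrary reordering
shuffled = [cloud[i] for i in perm]

out = self_attention(cloud, w_q, w_k, w_v)
out_shuffled = self_attention(shuffled, w_q, w_k, w_v)

# Row j of the shuffled output should match row perm[j] of the original output.
rows_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for j in range(len(perm))
    for a, b in zip(out_shuffled[j], out[perm[j]])
)
# After pooling over points, the order should not matter at all.
pooled_match = all(
    math.isclose(a, b, abs_tol=1e-9)
    for a, b in zip(mean_pool(out), mean_pool(out_shuffled))
)
print(rows_match, pooled_match)
```

The comparisons use a small tolerance because reordering the points changes the order of floating-point summation. A convolution over a fixed grid has no comparable property, since its weights are tied to specific neighbor positions.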

Prerequisites (to better understand the paper, what should one be familiar with?)

  • Neural attention
  • 3D point clouds

Discussion

It is well known that for dense 2-dimensional data (such as images), modern state-of-the-art approaches to standard ML tasks (object detection, instance segmentation, semantic segmentation, etc.) are mostly based on convolutional networks that…

Stan Kriventsov
Deep Learning Reviews

Software/ML Engineer at Google. Founder of Deep Learning Reviews: https://www.dl.reviews. Former pro chess and poker player.