Understanding Deep Associative Embedding in Convolutional Neural Networks

An elegant method to group predictions without labeling

Shuchen Du
CodeX


Photo by Gian D. on Unsplash

In some computer vision tasks, a network first predicts all the results for an image at once, and those results then have to be split into several individual results.

A common task of this kind is multi-person pose estimation, in which the key points of all the people in the image are predicted first and then split into individual poses as the final prediction (Fig. 1).

Fig. 1: Multi-person pose estimation [Newell et al.]

Another task of this kind is object detection by detecting and grouping paired key points to form the bounding box, a relatively new approach compared to the R-CNN line of work [Ren et al.]. In this approach [Law et al.], the bounding box of each object is represented by two key points (upper-left and bottom-right). During inference, the key points for all objects in the image are predicted first and then split into individual key-point pairs to get the final prediction (Fig. 2).
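To make the "predict everything, then split" step concrete, here is a minimal sketch of the grouping stage for corner-based detection. It is an illustration under stated assumptions, not the implementation from [Law et al.]: the function `pair_corners`, the scalar `tag` attached to each key point, and the `max_tag_gap` threshold are all hypothetical names chosen for this example. The idea is simply that two corners whose tags are close are treated as belonging to the same object.

```python
from typing import List, Tuple

Corner = Tuple[float, float, float]      # (x, y, tag); the tag is a hypothetical scalar
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def pair_corners(top_lefts: List[Corner],
                 bottom_rights: List[Corner],
                 max_tag_gap: float = 0.5) -> List[Box]:
    """Greedily pair each upper-left corner with the unused
    bottom-right corner whose tag is closest (assumed rule)."""
    boxes: List[Box] = []
    used = set()
    for x1, y1, t1 in top_lefts:
        best, best_gap = None, max_tag_gap
        for j, (x2, y2, t2) in enumerate(bottom_rights):
            # Skip corners already paired or geometrically impossible.
            if j in used or x2 <= x1 or y2 <= y1:
                continue
            gap = abs(t1 - t2)
            if gap < best_gap:
                best, best_gap = j, gap
        if best is not None:
            used.add(best)
            x2, y2, _ = bottom_rights[best]
            boxes.append((x1, y1, x2, y2))
    return boxes


# Two objects; all corners are predicted together, in no particular order.
top_lefts = [(10, 10, 0.1), (50, 40, 2.0)]
bottom_rights = [(90, 80, 2.1), (30, 30, 0.0)]
boxes = pair_corners(top_lefts, bottom_rights)
print(boxes)
```

With these made-up inputs, the first upper-left corner is matched to the second bottom-right corner because their tags are nearly equal, even though both bottom-right corners would form a geometrically valid box with it.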
