Understanding Deep Associative Embedding in Convolutional Neural Networks

An elegant method to group predictions without labeling

Published in

CodeX

4 min read · Apr 21, 2020

In some computer vision and deep learning tasks, a network first predicts results for everything in the image at once, and these predictions must then be split into separate groups, one per instance.

A common example is multi-person pose estimation, in which the key points of all the people in the image are predicted first and then grouped into individual poses to form the final prediction (Fig. 1).

Fig. 1: Multi-person pose estimation [Newell et al.]

Another example is object detection by detecting and grouping paired key points, a relatively new way to form bounding boxes compared with the R-CNN line of work [Ren et al.]. In this approach [Law et al.], the bounding box of each object is represented by two key points (the upper-left and bottom-right corners). During inference, the key points of all objects in the image are predicted first and then split into individual key-point pairs to obtain the final prediction (Fig. 2).
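To make the grouping step concrete, here is a minimal sketch in Python. It assumes, as the title of this post suggests, that each detected corner comes with a scalar embedding "tag" and that corners of the same object have similar tags. The function name `pair_corners`, the `(x, y, tag)` layout, and the `max_tag_dist` threshold are illustrative choices of mine, not the exact procedure of [Law et al.], which also takes detection scores into account.

```python
from typing import List, Tuple

# (x, y, tag): pixel position plus a hypothetical scalar embedding value.
Corner = Tuple[float, float, float]
Box = Tuple[float, float, float, float]


def pair_corners(top_lefts: List[Corner],
                 bottom_rights: List[Corner],
                 max_tag_dist: float = 0.5) -> List[Box]:
    """Greedily pair upper-left and bottom-right corners by tag distance."""
    candidates = []
    for i, (x1, y1, t1) in enumerate(top_lefts):
        for j, (x2, y2, t2) in enumerate(bottom_rights):
            # Image coordinates: y grows downward, so a valid upper-left
            # corner has both a smaller x and a smaller y.
            if x1 < x2 and y1 < y2:
                dist = abs(t1 - t2)
                if dist <= max_tag_dist:
                    candidates.append((dist, i, j))

    # Closest tags first; each corner is used in at most one box.
    candidates.sort()
    used_tl, used_br = set(), set()
    boxes = []
    for _, i, j in candidates:
        if i in used_tl or j in used_br:
            continue
        used_tl.add(i)
        used_br.add(j)
        x1, y1, _ = top_lefts[i]
        x2, y2, _ = bottom_rights[j]
        boxes.append((x1, y1, x2, y2))
    return boxes


# Two objects: tags near 0.0 belong to one, tags near 2.0 to the other.
top_lefts = [(10, 10, 0.1), (50, 40, 2.0)]
bottom_rights = [(90, 80, 2.1), (30, 30, 0.0)]
boxes = pair_corners(top_lefts, bottom_rights)
```

Note that the corner at (10, 10) could geometrically form a box with (90, 80), but their tags are far apart, so the pair is rejected. Only the tag values decide which corners belong together; the positions are used just to rule out impossible boxes.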

Written by Shuchen Du