Human pose estimation is a hard problem → since there is not a lot of data. (here → the authors generate realistic synthetic data). (the resulting images → used for training a deep neural network → and give much better results).
CNNs → need a lot of large-scale data → especially the case for human pose estimation. (for optical flow → synthetic data was used to improve performance). The problem with synthetic data is domain shift. (here the authors use motion capture data).
Wow → pretty interesting method → go from 2D to 3D → very cool approach. (and this method actually works!).
3D pose classification → is hard since there is not a lot of data in the wild → augmentation is a good way to overcome this problem. (also, this method of augmentation does not seem to be completely new).
We can see how the synthesis engine operates → transformation of the 2D image → probability map of the pose → very interesting. (so it starts from the 2D joints → very cool! → and there is some form of transformation happening after the 2D query image). (however, the generated images have some artifacts → need a method to smooth the images → the authors create a novel method of doing this as well).
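To make the smoothing idea concrete, here is a minimal sketch of a generic linear cross-fade between two overlapping rows of pixel intensities. This is not the authors' blending method, which these notes do not spell out; the function name `crossfade_rows` and the toy pixel values are made up for illustration.

```python
def crossfade_rows(left, right, overlap):
    """Blend two rows of pixel intensities that overlap by `overlap` pixels.

    The last `overlap` values of `left` and the first `overlap` values of
    `right` cover the same pixels. The weight of the right patch ramps up
    linearly across that region, so there is no hard seam.
    """
    if overlap < 1 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter row length")

    blended = []
    for i in range(overlap):
        # Weight of the right patch grows from near 0 to near 1.
        w = (i + 1) / (overlap + 1)
        a = left[len(left) - overlap + i]
        b = right[i]
        blended.append((1 - w) * a + w * b)

    # Untouched part of left + blended overlap + untouched part of right.
    return left[:len(left) - overlap] + blended + right[overlap:]


# Hypothetical toy rows: a dark patch stitched to a bright patch.
row = crossfade_rows([10, 10, 10, 10], [50, 50, 50, 50], overlap=3)
print(row)  # [10, 20.0, 30.0, 40.0, 50]
```

Without the blend, the stitched row would jump straight from 10 to 50 at the seam, which is the kind of artifact the note refers to.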
Human pose estimation → classification problem → DeepPose and AlexNet are used → and training this kind of classifier needs a lot of training data.
And it seems like each image can have multiple possible classes.
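A rough sketch of what "pose estimation as classification, with several plausible classes per image" could look like. The notes do not say how the pose classes are defined, so this assumes each class is represented by a centroid pose and that class membership is a softmax over negative squared distances. The names `class_probabilities`, `centroids`, and `temperature`, and the toy poses, are all hypothetical.

```python
import math


def squared_distance(pose_a, pose_b):
    """Sum of squared coordinate differences between two flattened poses."""
    return sum((a - b) ** 2 for a, b in zip(pose_a, pose_b))


def class_probabilities(pose, centroids, temperature=1.0):
    """Softmax over negative squared distances to each class centroid."""
    scores = [-squared_distance(pose, c) / temperature for c in centroids]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy poses: two joints in 2D, flattened to (x1, y1, x2, y2).
centroids = [
    [0.0, 0.0, 0.0, 1.0],   # class 0: "arm up"
    [0.0, 0.0, 1.0, 0.0],   # class 1: "arm sideways"
    [0.0, 0.0, 0.0, -1.0],  # class 2: "arm down"
]
pose = [0.0, 0.0, 0.5, 0.5]  # halfway between class 0 and class 1

probs = class_probabilities(pose, centroids)
# A hard label keeps only one class, even when two are equally plausible.
hard_label = max(range(len(probs)), key=probs.__getitem__)
print(probs, hard_label)
```

Here classes 0 and 1 get equal probability, so a single hard label throws away information; a soft target over classes keeps it.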
Datasets in this space are also limited → there are few datasets that have correct 3D labels, while others might not have labels at all. (traditional data augmentation is used for 2D images).
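As an example of traditional 2D augmentation, here is a minimal sketch of a horizontal flip applied to 2D joint annotations. The notes do not list which augmentations are used, so the flip is an assumed example; the joint names and the helper `flip_pose_horizontally` are hypothetical.

```python
def flip_pose_horizontally(joints, image_width, swap_pairs):
    """Mirror named 2D joints about the vertical image axis.

    joints: dict mapping joint name to (x, y) pixel coordinates.
    swap_pairs: list of (left_name, right_name) pairs that exchange labels,
        since the left wrist becomes the right wrist in a mirrored image.
    """
    # With pixel columns 0..width-1, column x maps to width-1-x.
    mirrored = {
        name: (image_width - 1 - x, y) for name, (x, y) in joints.items()
    }

    flipped = dict(mirrored)
    for left, right in swap_pairs:
        flipped[left], flipped[right] = mirrored[right], mirrored[left]
    return flipped


# Hypothetical annotation for a 100-pixel-wide image.
joints = {"head": (50, 10), "left_wrist": (20, 60), "right_wrist": (80, 55)}
pairs = [("left_wrist", "right_wrist")]

flipped = flip_pose_horizontally(joints, image_width=100, swap_pairs=pairs)
print(flipped)
```

The image itself would be mirrored with the same rule. Forgetting the left/right label swap is a common bug that silently corrupts the labels.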
Quite competitive results → this means that the synthetic data generation method actually produces reasonable images. (models trained on indoor images → do not do well on pose classification on outdoor images).
Fine-tuning VGG-16 → increases performance.
Synthetic 3D image generation → a reasonable approach for increasing performance.