3D body recognition — Let's recognize!

Episode 3: Data preprocessing

We did something very important in the last episode: we found a suitable way to convert 3D bodies into images that we can feed to a neural network (NN).

The plan looks like this:

We completed step 1 in the previous episode, so now we will focus on steps 2 and 3.

Let's start with model creation.

You can see the RWMP layer in the architecture. According to the DeepPano paper, its purpose is to make recognition invariant to rotations of the 3D body around its principal axis. Technically, RWMP is just row-wise max pooling.
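To see why row-wise max pooling gives this invariance, note that rotating a body around its principal axis corresponds to a cyclic horizontal shift of its panoramic image. A minimal NumPy sketch (the array shapes are illustrative, not taken from the paper):

```python
import numpy as np

# A panoramic image: rows correspond to heights along the principal axis,
# columns to angles around it.
pano = np.random.rand(8, 16)

# Rotating the 3D body around its principal axis shifts the panorama
# horizontally (with wrap-around).
rotated = np.roll(pano, shift=5, axis=1)

# Row-wise max pooling: take the maximum over each row (the angular axis).
rwmp = pano.max(axis=1)
rwmp_rotated = rotated.max(axis=1)

# The pooled vectors are identical, so the features are rotation-invariant.
print(np.allclose(rwmp, rwmp_rotated))  # True
```

In Keras this could be expressed, for example, as a Lambda layer that takes the maximum over the width axis of the feature maps.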

After the model is ready and compiled, we read the data, shuffle it, and create an ImageDataGenerator with rescaling only. Note that the data was split in advance into train, validation, and test sets in a 70:15:15 ratio. Since our images are synthetic and represent 3D bodies, there is no room for data augmentation, because:

  • Color augmentation is not suitable because the images are grayscale.
  • Horizontal flipping would be negated by the RWMP.
  • Vertical flipping would mean putting the body upside down.
  • ZCA whitening can’t be used because of the synthetic nature of the images.
  • Random rotations would cut off valuable corner information, and it is unclear what they would mean in terms of 3D body transformations.

So I could not think of any data augmentation that could be applied here.
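For completeness, the 70:15:15 split mentioned above can be sketched like this (the file names and seed are hypothetical, not from the original code):

```python
import random

def split_dataset(paths, seed=42):
    """Shuffle items and split them 70:15:15 into train/validation/test."""
    rng = random.Random(seed)
    paths = list(paths)
    rng.shuffle(paths)
    n_train = int(0.70 * len(paths))
    n_val = int(0.15 * len(paths))
    train = paths[:n_train]
    val = paths[n_train:n_train + n_val]
    test = paths[n_train + n_val:]
    return train, val, test

train, val, test = split_dataset([f"img_{i:04d}.png" for i in range(100)])
print(len(train), len(val), len(test))  # 70 15 15
```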

It’s time to train the model.

Let’s look at the results.

Training categorical accuracy and validation categorical accuracy

As you can see, the model achieved 92% accuracy on validation and 95% on training, so there is no overfitting. The model also achieves an overall accuracy of 0.895 on the test dataset.

Classification report:

Confusion matrix for the test dataset
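The overall accuracy and the confusion matrix can be computed directly from the true and predicted labels. A minimal sketch with made-up toy labels (in practice they would come from running the model on the test set):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def overall_accuracy(cm):
    """Fraction of correctly classified samples: trace over total."""
    return np.trace(cm) / cm.sum()

# Toy labels for illustration only.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]
cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print(overall_accuracy(cm))  # 0.875
```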

We can also play around with the model.

table_0224.png from snippet above

The results seem quite promising. Everything looks fine except that some tables were wrongly recognized as dressers. I am not sure why this happens; investigating it could be one of the improvement steps for the future.

Let’s list possible improvements.

  • Take material, texture, and geometrical size into account. This would lead to a non-sequential model.
  • Improve the balancing of the dataset, or at least use class_weights. A generative model (a VAE, for example) could be used for better balancing.
  • Add more classes.
  • Create a metamodel based on the panoramic view and a different representation, for example voxels. This could be computationally expensive.
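On the class_weights idea above, inverse-frequency weighting is a common choice (the same scheme as scikit-learn's "balanced" mode); this sketch is one option, not the only one:

```python
import numpy as np

def inverse_frequency_weights(labels):
    """Weight each class by total / (n_classes * count):
    rarer classes receive proportionally larger weights."""
    classes, counts = np.unique(labels, return_counts=True)
    weights = counts.sum() / (len(classes) * counts)
    return dict(zip(classes.tolist(), weights.tolist()))

# Toy labels: class 0 is three times as frequent as class 2.
print(inverse_frequency_weights([0, 0, 0, 1, 1, 2]))
# weights ≈ {0: 0.67, 1: 1.0, 2: 2.0}
```

The resulting dict can be passed as the class_weight argument to Keras's fit.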

That’s it for now. The publication is finished, and I hope you have enjoyed it. Please find the entire code here.
