Implementing a virtual try-on network using deep generative models

Aman Shenoy · Published in cverse-ai · Dec 26, 2019 · 4 min read

For the past five months, I had the opportunity to intern at Couture.ai as a Data Science Intern, working on mostly research-oriented projects with the Data Science team. This post marks the end of my internship and summarizes my experience and work during these five months. It has been a brilliant learning experience: I got to learn about very recent technologies and how they can be applied in a commercially feasible, practical way while working within reasonable constraints. This taught me not only the technologies themselves but also how to optimize, so as to get the best results quickly and easily.

My projects mainly revolved around potential use cases for generative networks in fashion. Broadly speaking, the end goal of my project here was to implement a virtual try-on network — one that takes in-shop clothing and a person image as input and produces an image of that person wearing the clothes. The implementation focuses on tops, with complete apparel transfer left as potential future work.

The Segmentation Algorithm

For this, we first needed a segmentation algorithm. Even though open-source state-of-the-art models could have been used, we stuck with robust image-processing techniques for segmentation. The idea is to localize the face, estimate the model's skin color from the face region, and use that to divide the image into hair, clothes, skin, and background.
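As a rough illustration of the skin-color idea (not the exact pipeline used at Couture.ai), the sketch below assumes a face bounding box is already available from an earlier face-localization step, takes the mean color inside it as the reference skin tone, and thresholds the whole image on distance to that tone. The function name, the tolerance value, and the plain RGB distance are all simplifying assumptions; a production version would work in a more perceptually uniform color space.

```python
import numpy as np

def segment_by_skin_tone(image, face_box, tolerance=40.0):
    """Split an RGB image into a rough skin / non-skin mask.

    image:    H x W x 3 uint8 array
    face_box: (top, left, bottom, right) of a detected face,
              assumed to come from a prior face-localization step
    """
    top, left, bottom, right = face_box
    face = image[top:bottom, left:right].reshape(-1, 3).astype(np.float64)
    # The mean face color serves as the reference skin tone.
    skin_tone = face.mean(axis=0)
    # Pixels within `tolerance` (Euclidean distance in RGB) count as skin.
    dist = np.linalg.norm(image.astype(np.float64) - skin_tone, axis=-1)
    return dist < tolerance

# Toy example: a 4x4 image whose top-left 2x2 block is a skin-like patch.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[:2, :2] = (200, 150, 120)  # skin-like pixels
mask = segment_by_skin_tone(img, (0, 0, 2, 2))
```

In practice the same distance test, combined with the face location and simple connected-component reasoning, is what lets hair, skin, clothes, and background be separated without a learned model.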

Geometric Matching Module

Once we have the clothing segment, we can geometrically compare it to the in-shop clothing. The goal is to learn a transform on the in-shop clothing that makes it as geometrically similar as possible to the clothing worn by the model. The image below illustrates this: it is a grid of six images, where the top left is the in-shop clothing, the top right is the clothing segment of the model, and the top middle is the result of applying the transform (bottom left) to the in-shop clothing.

The above examples are generated during training, so the in-shop clothing and the model's clothing are the same; this also makes qualitative assessment easier. The network architecture that learns this transform is briefly described below.

We call the learning of this transformation the Geometric Matching Module, as it warps the in-shop clothing to match the current clothing geometrically. Some results after training are shown below.
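The actual module learns a thin-plate-spline warp end to end; as a much simpler stand-in for the "fit a transform that aligns two shapes" idea, the sketch below estimates an affine transform from point correspondences by least squares. The helper name and the toy correspondences are illustrative assumptions, not the paper's method.

```python
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares affine transform mapping src_pts onto dst_pts.

    src_pts, dst_pts: N x 2 arrays of matching (x, y) coordinates.
    Returns a 2 x 3 matrix A such that dst ~= A @ [x, y, 1].
    """
    n = src_pts.shape[0]
    homo = np.hstack([src_pts, np.ones((n, 1))])        # N x 3 homogeneous coords
    sol, *_ = np.linalg.lstsq(homo, dst_pts, rcond=None)  # 3 x 2 solution
    return sol.T                                         # 2 x 3 affine matrix

# Example: destination points are the source scaled by 2 and shifted by (1, -1).
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
dst = src * 2 + np.array([1.0, -1.0])
A = fit_affine(src, dst)
```

An affine warp can only scale, rotate, shear, and translate; the thin-plate spline used in the real module additionally allows the smooth local deformations needed to follow folds and body pose, which is why it is learned rather than fit in closed form.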

Try-on Module

The instinctive way to impose the new clothing is to simply paste it over the image, but this causes problems: the pasted clothing overlaps with the hair and hands, and the previous clothing remains visible, making the result look very unrealistic. The solution was the try-on module, an encoder-decoder network that smooths out the composited image.
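The real try-on module is a convolutional encoder-decoder over images; the tiny dense version below only illustrates the bottleneck idea — compress the input to a small code, then decode back to the original size. Class and dimension names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

class TinyAutoencoder:
    """Minimal dense encoder-decoder: compress, then reconstruct.

    Stands in for the convolutional encoder-decoder of the try-on
    module; only the compress-then-expand structure is shared.
    """
    def __init__(self, in_dim, hidden_dim):
        self.w_enc = rng.standard_normal((in_dim, hidden_dim)) * 0.1
        self.w_dec = rng.standard_normal((hidden_dim, in_dim)) * 0.1

    def forward(self, x):
        code = relu(x @ self.w_enc)  # encoder: project to the bottleneck
        return code @ self.w_dec     # decoder: expand back to input size

model = TinyAutoencoder(in_dim=16, hidden_dim=4)
x = rng.standard_normal((2, 16))   # a batch of 2 flattened "images"
out = model.forward(x)             # reconstruction, same shape as input
```

Forcing the image through a narrow bottleneck is what lets the network discard the pasted-on artifacts (overlapping hair, remnants of the old clothing) and regenerate a coherent image.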

This produces a smoothed image that looks far more realistic than simply pasting the clothing over the model. This article has avoided any in-depth description of the work; for a thorough description of the model and training strategy, it's advisable to read the paper here.

Conclusions

The project described above was one of the many things I worked on in my internship at Couture. Every one of my projects led to immense learning and taught me the importance of being a quick learner. Almost all of the concepts used throughout the internship were new to me and I had to get an in-depth understanding relatively quickly. Overcoming this was not only enjoyable but also instilled confidence in me.

I am thankful to all my mentors across these projects for guiding me and giving me a direction to think in. I am especially thankful for the creative freedom I was given to make changes I was confident in, and to everyone who was part of my experience here over the last five months.
