Deep Learning in Fashion Industry

Published in

Analytics Vidhya

Apr 21, 2021

In recent years, the fashion industry has emerged as one of the most important industries in the global economy. Trends change constantly, and the clothing industry has proved itself to be one of the most creative realms. People around the world, irrespective of their financial status, are willing to spend money to stay on trend. With the advent of the internet and handheld devices, customers can easily shop on the go. While people keep up with fashion trends, machine learning is changing the fashion industry itself. Brands, big and small, use machine learning to grow by attracting customers and staying ahead of trends. Systems track every sale and every emerging trend daily, which gives companies a vast amount of knowledge about what users are interested in. At the same time, a large amount of cross-platform fashion data accumulates from sources such as influencers' social media posts and content shared by brands. Because people care about which styles are on trend, how they look in them, and how those styles can boost their confidence and express their personality, it makes sense for brands to use this accumulated data to provide apt recommendations. One of the major forces driving the fashion industry to adopt machine learning is that customers like personalization.

In this article, we will look at the deep learning and computer vision approaches that help brands stay on top. We will also see how aesthetics helps in gaining customers' attention. Deep learning models can help predict which styles suit a particular customer and how well a new design will be received by consumers.

Fashion research can be classified into three levels: low-level fashion recognition, middle-level fashion understanding, and high-level fashion applications. The following table depicts this classification.

Let us now look at each level of this classification in depth.

Low-level fashion recognition: This level covers computing and processing fashion images at the pixel level. It can be further divided into two tasks:

1. Clothing/human parsing

Graphical models: These focus on constrained parsing problems and handle low-level inconsistencies within a small scope. Examples include superpixel labeling, an integrated system for clothing co-parsing, weakly supervised fashion parsing, and an MRF-based color and category inference module.
Non-parametric models: These do not require much prior knowledge but rely on segmentation and pose estimation. Examples include nearest-neighbor style retrieval, a deep quasi-parametric human parsing framework, and semi-supervised learning.
Parselets representation: Parselets are used as basic building blocks to overcome the inconsistencies between pose estimates and clothing. This approach includes the Deformable Mixture Parsing Model and simultaneous human parsing and pose estimation.
CNN models: Convolutional neural networks work well for clothing/human parsing, but plain CNNs tend to ignore both the micro context between neighboring pixels and the macro context between semantic parts. Methods in this category include the Contextualized CNN architecture, Active Template Regression, and self-supervised structure-sensitive learning.
Adversarial models: These use adversarial networks to reduce semantic inconsistency and local inconsistency in the parsing results. An example is the Macro-Micro Adversarial Network (MMAN) for human parsing.

2. Landmark detection: Early methods treated landmark detection as a regression problem, but it was later found that this regression is highly non-linear and difficult to optimize. Later work therefore used more structured approaches, such as a knowledge-guided fashion network built on top of a neural network and a global-local embedding module, for more accurate landmark prediction. Models used here include the three-step deep fashion alignment framework, the Deep Landmark Network, the knowledge-guided fashion network, and the global-local embedding module.

Middle-level fashion understanding: Clothing attributes are a compact representation of information about what people wear. They go well beyond basic color and pattern and include attributes such as collar type, material, and length. These attributes can be used for recommendation and analysis. Beyond individual attributes, however, fashion style comes from how individuals assemble each outfit, which reflects their individual character and aesthetic.

1. Clothing attribute prediction

Single-task learning: This method focuses on a particular domain in fashion while learning attributes. Approaches include a CRF-based approach, a random forest approach, and augmented deep CNNs.
Multi-task learning: This method performs multiple tasks simultaneously, such as learning clothing attributes and detecting landmarks. A few models used here are spatial-aware concept representations and end-to-end deep CNNs.
Transfer learning: This method bridges the gap between images from different domains while learning clothing attributes. Examples include a transfer learning model and a deep model built on Faster R-CNN.

2. Fashion style prediction

Supervised learning: Fashion style can be predicted in several ways: with an SVM trained on a set of handcrafted features, by training a deep feature extraction network to learn discriminative style features, or by constructing a classification network over a semantic space of clothing styles.
Unsupervised learning: By training polylingual topic models on outfit data, the model learns the similarities between various elements and styles in fashion.
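To make the first supervised option concrete, here is a minimal sketch of a linear SVM trained on handcrafted features with stochastic subgradient descent on the hinge loss. The three features (brightness, share of dark colors, pattern density), the toy data, and the formal/casual labels are all hypothetical; the surveyed methods use their own feature sets and typically a library SVM solver.

```python
import random


def train_linear_svm(features, labels, lam=0.01, lr=0.1, epochs=200, seed=0):
    """Fit a binary linear SVM (labels in {-1, +1}) with stochastic subgradient
    descent on the regularized hinge loss lam/2 * ||w||^2 + max(0, 1 - y * (w.x + b))."""
    rng = random.Random(seed)
    w = [0.0] * len(features[0])
    b = 0.0
    data = list(zip(features, labels))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            hinge_active = margin < 1
            # Subgradient step: shrink w (regularization) and, if the hinge is active,
            # move the boundary so that x lands on the correct side with a margin.
            w = [wi - lr * (lam * wi - (y * xi if hinge_active else 0.0)) for wi, xi in zip(w, x)]
            if hinge_active:
                b += lr * y
    return w, b


def predict_style(w, b, x):
    """Return +1 (here: 'formal') or -1 (here: 'casual')."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical handcrafted features per garment image:
# [brightness, share of dark colors, pattern density]
X = [[0.2, 0.9, 0.1], [0.3, 0.8, 0.2], [0.25, 0.85, 0.15],   # formal
     [0.8, 0.2, 0.7], [0.7, 0.3, 0.9], [0.9, 0.1, 0.6]]      # casual
Y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X, Y)
print(predict_style(w, b, [0.22, 0.88, 0.12]))
```

A linear kernel keeps the sketch short; with richer handcrafted features, a kernel SVM from a standard library would be the more usual choice.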

High-level fashion applications: These applications build on low-level fashion recognition and middle-level fashion understanding. They are used for fashion recommendation, fashion retrieval, fashion image synthesis, fashion compatibility, and fashion data mining.

1. Fashion retrieval

Cross-scenario retrieval model: One approach, exemplified by Where to Buy It (WTBI), uses deep learning to match a real-world (street) photo of an item with an online shop item, or vice versa; that is, retrieval can be unidirectional or bidirectional. A dual attribute-aware ranking network (DARN) and a deep bi-directional cross-triplet embedding algorithm are also used for cross-scenario retrieval.
Interactive retrieval model: In this method, users provide an image as the search input and get back products from around the world that are similar to it.
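Both retrieval settings above boil down to comparing embeddings of a query image and of catalog items. The sketch below assumes embeddings have already been produced by some network (the vectors and item names are made up) and shows two pieces: a triplet loss of the kind used to train such embeddings, as suggested by the name of the cross-triplet embedding algorithm, and a nearest-neighbor lookup by cosine similarity. It is an illustration, not the DARN or WTBI implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge triplet loss on squared Euclidean distances: the anchor (e.g. a street photo)
    should be closer to its matching shop item than to a non-matching one by `margin`."""
    d_pos = sum((a - p) ** 2 for a, p in zip(anchor, positive))
    d_neg = sum((a - n) ** 2 for a, n in zip(anchor, negative))
    return max(0.0, d_pos - d_neg + margin)


def retrieve(query, catalog, k=2):
    """Return the ids of the k catalog items most similar to the query embedding."""
    return sorted(catalog, key=lambda item_id: cosine(query, catalog[item_id]), reverse=True)[:k]


# Hypothetical embeddings, assumed to come from a network trained with a triplet-style loss
shop_catalog = {
    "red_dress": [0.9, 0.1, 0.0],
    "blue_jeans": [0.0, 0.2, 0.9],
    "red_skirt": [0.8, 0.3, 0.1],
}
street_photo = [0.85, 0.15, 0.05]

print(retrieve(street_photo, shop_catalog))
print(triplet_loss(street_photo, shop_catalog["red_dress"], shop_catalog["blue_jeans"]))
```

Running the same lookup with a shop item as the query and street photos as the catalog gives the reverse (bidirectional) direction.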

2. Fashion recommendation

Complementary recommendation model: Given a piece of clothing, this method recommends a complementary piece to go with it.

An example of complementary clothing matching for a given bottom

Personalized recommendation model: Here, a user receives personalized recommendations based on the items they have interacted with, their opinions and feedback, and evolving trends. A few of the techniques used are tensor decomposition and tensor factorization.
Scenario-oriented recommendation model: In this method, users are recommended clothes purely based on the given occasion.

An example of how scenario-based recommendation works

Generative model: Here, users are not only recommended existing items; the model also generates new fashion items that improve on an existing piece of clothing, based on the user's preferences. A few of the models used here are the neural co-supervision learning framework and a deep image generation neural network.

An example of outfit recommendation with item improvement
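The personalized recommendation model above mentions tensor decomposition and factorization. A minimal sketch of the idea, assuming a CP (CANDECOMP/PARAFAC)-style decomposition of a user × top × bottom tensor: each user, top, and bottom gets a latent factor vector, and a triple's score is the sum over factors of their elementwise product. The factor values and item names are made up, learning them from interaction data is omitted, and this is not necessarily the exact model used in the surveyed papers.

```python
def cp_score(user_vec, top_vec, bottom_vec):
    """CP-style score for a (user, top, bottom) triple: sum_f u_f * t_f * b_f."""
    return sum(u * t * b for u, t, b in zip(user_vec, top_vec, bottom_vec))


def recommend_bottoms(user_vec, top_vec, bottoms):
    """Rank candidate bottoms for this user wearing this top, best first."""
    return sorted(bottoms, key=lambda name: cp_score(user_vec, top_vec, bottoms[name]), reverse=True)


# Hypothetical 2-factor latent vectors (factor 0 ~ "formal", factor 1 ~ "casual")
alice = [1.0, 0.2]   # a user who leans formal
bob = [0.1, 1.0]     # a user who leans casual
white_shirt = [0.8, 0.6]
bottoms = {"trousers": [0.9, 0.1], "denim_shorts": [0.1, 0.9]}

print(recommend_bottoms(alice, white_shirt, bottoms))
print(recommend_bottoms(bob, white_shirt, bottoms))
```

The same shirt gets a different best match per user, which is the point of modeling the user as a third tensor mode instead of scoring only top-bottom pairs.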

3. Fashion compatibility

Pairwise compatibility learning: This approach takes one fashion item as input and searches other categories for items that are compatible with it.
Outfit compatibility modeling: This approach searches across all categories to find the best-suited items for a compatible outfit; that is, it tries to measure the compatibility of the outfit as a whole.
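The difference between the two compatibility settings can be sketched with item embeddings. This minimal version assumes a pairwise compatibility score equal to the sigmoid of the dot product of two embeddings, and (as a simplifying assumption) scores a whole outfit as the mean over all item pairs; published outfit-compatibility models usually learn a richer set-level function. All embeddings and item names are hypothetical.

```python
import math
from itertools import combinations


def pair_compat(a, b):
    """Pairwise compatibility in (0, 1): sigmoid of the dot product of two item embeddings."""
    return 1.0 / (1.0 + math.exp(-sum(x * y for x, y in zip(a, b))))


def best_match(query_category, query_vec, items):
    """Pairwise setting: among items from other categories, return the most compatible one."""
    candidates = [(name, pair_compat(query_vec, vec))
                  for name, (category, vec) in items.items() if category != query_category]
    return max(candidates, key=lambda c: c[1])[0]


def outfit_compat(vectors):
    """Outfit setting: score a whole outfit as the mean compatibility over all item pairs."""
    pairs = list(combinations(vectors, 2))
    return sum(pair_compat(a, b) for a, b in pairs) / len(pairs)


# Hypothetical 2-D embeddings (e.g. learned by a compatibility model)
items = {
    "white_sneakers": ("shoes", [1.0, -0.5]),
    "leather_boots": ("shoes", [-0.5, 1.0]),
    "denim_jacket": ("top", [1.0, -0.2]),
    "wool_coat": ("top", [-0.4, 1.2]),
    "chinos": ("bottom", [0.6, 0.6]),
}
jeans = [1.2, -0.3]

print(best_match("bottom", jeans, items))
casual = outfit_compat([jeans, items["denim_jacket"][1], items["white_sneakers"][1]])
mixed = outfit_compat([jeans, items["wool_coat"][1], items["leather_boots"][1]])
print(casual > mixed)
```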

4. Fashion image synthesis

Pose-guided generative model: New images are generated that keep the clothing from the input image while rendering the person in an arbitrary pose. Recent methods propose novel architectures or loss functions to improve the results.

Text-guided generative model: Outfit images are generated from a textual description of how the outfit should look. This approach uses feature-wise linear modulation (FiLM) to condition the visual features on the natural-language description.

An example of text-guided fashion outfit generation

Virtual try-on model: As the name suggests, this method transfers a clothing item from one image onto the image of a person, so users can virtually see how the outfit looks on them. The VITON and VTNFP models are used for this purpose.

Fashion design model: This method takes a design image and a clothing material image as input from the user and generates an image of the complete end product.

Generating an outfit image from an outfit design and a material image
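Feature-wise linear modulation (FiLM), mentioned for the text-guided generative model, scales and shifts each channel c of a visual feature map F using coefficients predicted from a conditioning input z: FiLM(F_c) = γ_c(z) · F_c + β_c(z). The sketch below shows only that operation; the text embeddings and the linear weights that produce γ and β are hypothetical stand-ins for a trained text encoder and generator.

```python
def linear(weights, bias, z):
    """Affine map: one output per row of `weights`."""
    return [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(weights, bias)]


def film(feature_map, z, w_gamma, b_gamma, w_beta, b_beta):
    """Apply FiLM: channel c becomes gamma_c(z) * F_c + beta_c(z)."""
    gamma = linear(w_gamma, b_gamma, z)
    beta = linear(w_beta, b_beta, z)
    return [[g * v + bt for v in channel]
            for channel, g, bt in zip(feature_map, gamma, beta)]


# Hypothetical: 2 feature channels with 3 spatial positions each
features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
# Hypothetical embeddings of two different text descriptions
z_red = [1.0, 0.0]
z_blue = [0.0, 1.0]
# Hypothetical FiLM generator weights (would normally be learned)
W_GAMMA, B_GAMMA = [[2.0, 0.5], [1.0, 1.0]], [0.0, 0.0]
W_BETA, B_BETA = [[0.1, -0.1], [0.0, 0.0]], [0.0, 0.0]

print(film(features, z_red, W_GAMMA, B_GAMMA, W_BETA, B_BETA))
print(film(features, z_blue, W_GAMMA, B_GAMMA, W_BETA, B_BETA))
```

The same visual features are modulated differently depending on the description, which is how the text steers the generated image.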

Let us now look at the three pillars of machine learning in the fashion industry: recommendation, aesthetics, and personalization.

Recommendation systems have become one of the most widely used kinds of information filtering systems. In the fashion industry, the simplest methodologies are clothing retrieval and recommendation. These systems rely entirely on the preferences and ratings given by users. Note that such recommendation systems only suggest new trends and clothes to the user; they do not take into account the knowledge of fashion experts, who have a more in-depth understanding of specific styles and the right color contrasts, or the sophisticated knowledge gained from a dressing course. While a basic recommendation system ignores these factors, an automatic fashion composition system based on deep learning aims to take them into account. The deep learning approach to the outfit composition problem is an end-to-end system that encodes visual features with a deep convolutional network, takes a fashion outfit as input, and predicts its level of user engagement. In this method, outfits are composed by scoring candidate outfits based on their appearance and other metadata. To make the system context-aware, a multimodal deep learning framework is used; it leverages both the images and their metadata, and it was found to outperform the single-modality approach. The following architecture depicts the scoring model for an outfit that will be composed and recommended to the user.

A few other methodologies used in fashion recommendation systems are:

Implicit Feedback based
Based on Weak Appearance Feature
Semantic Attribute Region Guided Approach
Complementary Recommendations Using Adversarial Feature Transformer
Neural Compatibility Modeling
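The outfit composition approach described earlier in this section scores candidate outfits from their appearance and metadata. A minimal sketch of that scoring step, under heavy simplifying assumptions: per-item visual features are taken as already extracted (standing in for the CNN), visual and metadata features are fused by concatenation, the outfit is mean-pooled as an unordered set, and a single logistic layer with hypothetical weights stands in for the trained engagement predictor.

```python
import math


def item_embedding(visual, metadata):
    """Multimodal item representation: concatenate visual and metadata features."""
    return visual + metadata


def outfit_score(items, weights, bias):
    """Predicted engagement in (0, 1) for an outfit given as a list of (visual, metadata) pairs."""
    embeddings = [item_embedding(v, m) for v, m in items]
    dim = len(embeddings[0])
    # Mean pooling treats the outfit as an unordered set of items.
    pooled = [sum(e[d] for e in embeddings) / len(embeddings) for d in range(dim)]
    logit = sum(w * p for w, p in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def compose(candidates, weights, bias):
    """Pick the candidate outfit with the highest predicted engagement."""
    return max(candidates, key=lambda name: outfit_score(candidates[name], weights, bias))


# Hypothetical inputs: 2-D visual features (assumed to come from a CNN) + 1-D metadata
candidates = {
    "outfit_a": [([0.9, 0.1], [1.0]), ([0.8, 0.2], [0.0])],
    "outfit_b": [([0.2, 0.9], [0.0]), ([0.1, 0.7], [1.0])],
}
WEIGHTS, BIAS = [1.0, -1.0, 0.5], 0.0  # hypothetical; normally learned from engagement data

print(compose(candidates, WEIGHTS, BIAS))
```

Mean pooling is chosen so the score does not depend on the order in which items are listed; a real system would learn the fusion and pooling end to end.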

Aesthetics and fashion go hand in hand: the visual appearance of a piece of clothing plays a vital role in the fashion industry. Besides describing clothing with so-called aesthetic words, such as party wear, casual, and formal, there are also visual features that describe clothing. One approach bridges the gap between the two with a novel three-level framework consisting of visual features, an image-scale space, and an aesthetic word space. The image-scale space, borrowed from the art field, is a 2-dimensional space that acts as an intermediate layer capturing the meaning of the aesthetic words. This approach is based on a theory proposed by Kobayashi, in which 180 keywords are grouped into 16 aesthetic categories and each word has coordinate values in the image-scale space. Euclidean distance in this space is then used to determine which aesthetic words best match an image.

Examples of mapping outfit images to aesthetic words
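The three-level mapping can be sketched in a few lines: project visual features into a 2-D image-scale space, then return the aesthetic words nearest to that point by Euclidean distance. The word coordinates below are illustrative values only, not Kobayashi's published coordinates, and the linear projection is a hypothetical stand-in for the learned mapping from visual features.

```python
import math

# Hypothetical coordinates in a 2-D image-scale space.
# Illustrative values only; NOT Kobayashi's published coordinates.
AESTHETIC_WORDS = {
    "elegant": (-0.6, 0.4),
    "casual": (0.5, 0.6),
    "formal": (-0.4, -0.7),
    "sporty": (0.8, -0.2),
}

# Hypothetical linear projection from 3 visual features to image-scale coordinates
PROJECTION = [[1.0, 0.0, -0.5],
              [0.0, 1.0, -0.5]]


def to_image_scale(visual_features, projection=PROJECTION):
    """Level 1 -> level 2: map visual features to a point in image-scale space."""
    return tuple(sum(p * f for p, f in zip(row, visual_features)) for row in projection)


def nearest_aesthetic_words(point, words=AESTHETIC_WORDS, k=1):
    """Level 2 -> level 3: the k aesthetic words closest to `point` in Euclidean distance."""
    return sorted(words, key=lambda word: math.dist(point, words[word]))[:k]


point = to_image_scale([0.6, 0.7, 0.2])
print(point, nearest_aesthetic_words(point, k=2))
```

Returning the k nearest words rather than one reflects that a single outfit can plausibly be described by several neighboring aesthetic words.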

A few other methodologies used in fashion aesthetics are:

Brain-inspired deep network
Minimalistic approach
Neuroaesthetics

Personalization is a key factor in the growth of the fashion industry because it promotes individuality and uniqueness: every customer wants a one-in-a-million experience. Personalized outfit generation has therefore gained momentum among fashion brands. Here, the user's preferences for individual items and outfits are taken into account, and a new outfit is suggested to the user. One approach uses a transformer encoder-decoder architecture, built on the idea that each item in an outfit interacts with every other item in the same outfit with a different weight. As the user clicks on items, the system builds an outfit by adding one item at a time. If the user clicks more than one similar item, their interaction weights are compared, and the item with the higher weight is kept while the other is discarded. The process continues until the outfit is complete.

Architecture for personalized outfit generation
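The click-driven outfit building described above can be sketched without a full transformer. In this minimal version, the interaction weight of an item is an attention-style score: the scaled dot product between its embedding and the embeddings already in the outfit, with a softmax over the two competing items in the same category. The embeddings, item names, and the "same category means similar item" rule are assumptions for illustration, not the published architecture.

```python
import math


def interaction_logit(candidate, outfit_items):
    """Scaled dot-product between a candidate and the items already in the outfit, averaged."""
    if not outfit_items:
        return 0.0
    scale = math.sqrt(len(candidate))
    total = sum(sum(c * o for c, o in zip(candidate, item)) for item in outfit_items)
    return total / (len(outfit_items) * scale)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_outfit(clicks, required_categories):
    """Add clicked items one at a time; for a second item in an already-filled category,
    keep whichever of the two has the higher interaction weight with the rest of the outfit."""
    outfit = {}  # category -> (name, embedding)
    for name, category, embedding in clicks:
        if category not in outfit:
            outfit[category] = (name, embedding)
        else:
            others = [emb for cat, (_, emb) in outfit.items() if cat != category]
            _, current = outfit[category]
            w_current, w_new = softmax([interaction_logit(current, others),
                                        interaction_logit(embedding, others)])
            if w_new > w_current:
                outfit[category] = (name, embedding)
        if all(cat in outfit for cat in required_categories):
            break  # the outfit is complete
    return {cat: name for cat, (name, _) in outfit.items()}


# Hypothetical click stream: (item name, category, 2-D style embedding)
clicks = [
    ("white_sneakers", "shoes", [1.0, 0.0]),
    ("denim_jacket", "top", [0.9, 0.1]),
    ("silk_blouse", "top", [0.0, 1.0]),
    ("jeans", "bottom", [0.8, 0.2]),
]
print(build_outfit(clicks, ["top", "bottom", "shoes"]))
```

With casual sneakers already chosen, the denim jacket wins over the silk blouse; starting from a dressier shoe embedding would flip that choice.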

A few other methodologies used in personalization are:

Generative Adversarial Training
Personalization in Unstructured Data
Item-to-Set Metric Learning Approach

Conclusion: As more and more researchers invest in fashion, and as the industry's demand for AI, computer vision, and deep learning keeps growing, the scope for more advanced and sophisticated approaches increases. These approaches enable fashion brands to provide excellent customer service while staying on top of the industry. In unprecedented times like the present, the virtual shopping experience has gained utmost importance. AI and deep learning can also help fashion manufacturers improve their manufacturing and inventory management processes. Fashion is an industry in which a vast amount of data accumulates every minute of every day, and research in this field will benefit both customers and the research community.

References

[1] Gong, Wei, and Laila Khalid. “Aesthetics, Personalization and Recommendation: A Survey on Deep Learning in Fashion.” arXiv:2101.08301 (2021).

[2] Gu, Xiaoling, et al. “Fashion Analysis and Understanding with Artificial Intelligence.” Information Processing & Management 57 (2020): 102276.

[3] All of the images in this article have been taken from the papers listed above.

Written by Raghava Urs