3 Sources of knowledge for machine learning solutions in the Fashion Industry

AMARO
Published Dec 21, 2018 · 6 min read

Written by Meigarom Lopes — Data Scientist at AMARO

Introduction

The fashion industry is a very alluring industry in which to work with machine learning. The speed of fashion trends and customers' desire to wear the latest outfits require products with short life cycles, high fashion content, competitive prices and purchasing convenience. All these factors turn common machine learning problems into really challenging ones, which demand a lot of research and study from data scientists. Sources of knowledge for these problems are not so obvious to find: most machine learning use cases are developed for traditional markets such as energy, financial services, healthcare, manufacturing, media and retail. That being said, in this post I will show three sources of knowledge that you can rely on to help you solve challenging data science projects in the fashion industry.

First source: Academia, scientific papers about the fashion industry

The first source is scientific papers from academia. There is plenty of knowledge and hard work in them. Researchers usually spend years on the same problem. They typically have well-defined problems where they can test and validate hypotheses, try different techniques, implement hybrid approaches, fine-tune models, analyse the causes of error and propose improvements to current state-of-the-art machine learning models. All these tasks, applied repeatedly over months, result in high-quality, reliable and stable new techniques.

One of the hardest problems in the fashion business is sales forecasting. It is one of the trickiest challenges due to the characteristics of fashion sales, which are fast and unique. They are fast because they follow fashion trends, which are very changeable; they are unique because customers don't usually wear the same outfit for very long, and a fashion look will hardly be repeated in the future, even if it was a complete success in sales.

The article “A hybrid sales forecasting system based on clustering and decision trees” by Sebastien Thomassey and Antonio Fiordaliso (http://bit.ly/2rRcHRX) proposes a forecasting system based on clustering and classification techniques. In summary, the paper suggests first clustering fashion products by sales performance, then deriving descriptive criteria from each cluster in order to characterize it by fashion attributes, and finally building a classifier that assigns new products to one of those clusters. The main assumption is that products from the same cluster should have similar sales behavior, since they are correlated in fashion terms.
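To make the idea concrete, here is a minimal sketch of that hybrid approach, using synthetic data, k-means clustering and a decision tree. This is an illustration of the general technique, not the authors' exact method; all variable names and data are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Synthetic data: 200 past products, each with a 12-week sales curve
# and 3 encoded fashion attributes (e.g. color, category, price band).
sales_curves = rng.random((200, 12))
attributes = rng.integers(0, 4, size=(200, 3))

# Step 1: cluster products by their sales behavior.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(sales_curves)

# Step 2: learn to predict the sales cluster from fashion attributes alone.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(attributes, cluster_labels)

# Step 3: forecast a brand-new product that has no sales history:
# assign it to a cluster and use that cluster's mean curve as the forecast.
new_product = np.array([[1, 2, 0]])
predicted_cluster = clf.predict(new_product)[0]
forecast = sales_curves[cluster_labels == predicted_cluster].mean(axis=0)
print(forecast.shape)  # one forecast value per week: (12,)
```

The key design point is that the classifier never sees sales data, only attributes, which is what allows the system to forecast products before launch.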

Scientific papers can also help you comprehend the characteristics and constraints of a challenge. For example, in the paper entitled “Sales Forecasting in Apparel and Fashion Industry: A Review” by Sebastien Thomassey (http://bit.ly/2rOCyKp), the author presents an overview of the fashion context, the apparel supply chain, and the requirements for sales forecasting, such as time horizon, life cycle, aggregation by product typology, seasonality, exogenous variables and so on. Even if you have expertise in this field, the paper is really worth reading; it might cover something that you haven't considered in your sales forecasting project.

The next source of knowledge is closer to our daily routine: I am pretty sure that data scientists keep themselves informed by reading blog posts.

Second source: Blog posts, work done by peers.

Blogs are a great way to share knowledge among professionals in the field. Many data scientists share discoveries, experiences and good practices in blog posts, which are great learning sources. Books and scientific papers sometimes demand more concentration, patience and hard work to be understood. Blog posts, on the other hand, follow simpler patterns: they are usually more informal, use simple language, and discuss either technical issues and how to solve them, or management and career topics.

However, blog posts set in a fashion context are rare to find and consume. For this reason, we should look at posts written about common challenges in the market with different eyes. We need to read them, understand the problem they propose to solve, the methods and arguments that the author discusses, the possible issues to deal with and the final solution, and then bring all that knowledge into the fashion context, adapting it to the constraints and characteristics of our project.

To solve the sales forecasting project in the fashion context, we can extract relevant knowledge from analogous challenges. There are many Kaggle competitions that challenge data scientists and enthusiasts to forecast future sales or events. For example, the Two Sigma competition (http://bit.ly/2LvfZ6B) challenges competitors to predict stock price performance based on the content of news, the Grupo Bimbo Inventory Demand competition (http://bit.ly/2LsWsnr) asks Kagglers to develop a model that accurately forecasts inventory demand based on historical data, and the Rossmann Store Sales competition (http://bit.ly/2LuYvav) challenges them to predict 6 weeks of daily sales for 1,115 stores located across Germany.

All the work done in these competitions (the exploratory analysis, the modelling, the error analysis, the conclusions and the final results) is saved in a code repository called a Kernel on the Kaggle platform. The best part is that it is publicly available: you can check the full work done by brilliant data scientists and learn a lot from them. Once you have gone through the code, you can adapt all that knowledge to the fashion context.

The last benefit you can get from blog posts is a new perspective on the same solution. Sometimes you are trying to solve a problem and you get stuck: no matter what you do to improve the solution, the results remain the same. In this case, looking from a different perspective can leverage your results. For example, in this blog post (http://bit.ly/2LsK8mU), Mario Filho, a Kaggle Grandmaster, provides a different perspective on sales prediction. In a few words, he proposes building a single model to predict multiple time series of sales (you can think of each time series as the sales of one product), instead of training one model per time series. It's a different point of view that can move your results to a different level.
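The "single global model" idea can be sketched as follows: stack every product's history into one table, add the product identifier and lagged sales as features, and train one regressor over all of it. The data and feature choices below are illustrative assumptions, not the code from the post.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Build one flat training table from many series: each row is
# (product id, last 3 weeks of sales) -> next week's sales.
rows, targets = [], []
for product_id in range(50):        # 50 products
    series = rng.random(30)         # 30 weeks of synthetic sales each
    for t in range(3, 30):
        rows.append([product_id, series[t - 1], series[t - 2], series[t - 3]])
        targets.append(series[t])

X, y = np.array(rows), np.array(targets)

# One model learns across all series at once, instead of 50 separate models.
model = RandomForestRegressor(n_estimators=50, random_state=0)
model.fit(X, y)

# Forecast next week for product 7, given its last three observed weeks.
next_week = model.predict([[7, 0.4, 0.6, 0.5]])
print(next_week.shape)  # (1,)
```

Besides simplifying maintenance, a global model can share signal across products, which is especially useful for new items with short sales histories.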

The last source of knowledge is the most important one.

Third source: Trust in your own Data Science skills

YOU are the most important source of knowledge. All the background gathered along your journey, built from college studies, graduate programs, specialization courses, online training, personal projects and independent study, plays an important role here.

Data scientists should do their homework before turning to external knowledge. Here are the three things that I consider important to do before looking for help:

Study the theory

Make sure that you understand the theory if you don't already. From the scope of the project, you can infer which type of machine learning problem you are handling: supervised, unsupervised or reinforcement learning. Once you have decided that it is a supervised problem, for example, you can classify it as a classification, regression or time series problem. Finally, based on your judgment, you can prepare yourself accordingly.

Study the domain

Try to understand at least the basics of the business domain. Be curious, and spend time with your stakeholders to learn as much as you can. Don't be afraid to ask questions; make sure that you understand the problem, the requirements, the expected results and the key definitions. You can always google the business and get benchmarks on what similar companies are doing to solve the same problem that you are about to start.

Start fast, keep it simple

Modelling is just one part of the solution; you have more modules to implement and connect before the prediction pipeline can run. In the first version of the model, implement an end-to-end pipeline: get the data, clean and prepare it, extract features, train the model and test it, and lastly, understand the current results and define improvements for the next iteration.

It is really important to deploy your first model to production, get some results, and gain experience and feedback from the model and the process itself, in order to improve it in the next iterations.
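A bare-bones sketch of that end-to-end loop (load, clean, extract features, train, evaluate) might look like the following. Everything here is synthetic and illustrative; in a real project each step would be its own module feeding the deployment pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

def get_data(n=300, seed=1):
    # Synthetic stand-in for the real data source: sales driven by price.
    rng = np.random.default_rng(seed)
    price = rng.uniform(10, 100, n)
    sales = 500 - 3 * price + rng.normal(0, 10, n)
    sales[::50] = np.nan          # a few missing values to clean later
    return price, sales

def clean(price, sales):
    # Drop rows with missing targets.
    mask = ~np.isnan(sales)
    return price[mask], sales[mask]

def extract_features(price):
    # Simple feature matrix: raw price plus log-price.
    return np.column_stack([price, np.log(price)])

# The whole pipeline, end to end, in one pass.
price, sales = get_data()
price, sales = clean(price, sales)
X = extract_features(price)
X_train, X_test, y_train, y_test = train_test_split(X, sales, random_state=0)

model = LinearRegression().fit(X_train, y_train)
mae = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE: {mae:.1f}")  # a baseline error to beat in the next iteration
```

The point is not the quality of this first model but that every stage exists and is connected, so each later iteration only has to improve one stage at a time.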

Conclusion

The fashion industry has been using data to leverage its results, improve its supply chain and deliver ever more personalized products right on time for the fashion trend. These conditions shape the constraints of the environment in which data science work happens. Building, training, testing and deploying machine learning models here is really challenging, and one of the reasons is that not much research or project work is shared in this field. However, exactly this characteristic makes the role of the data scientist awesome: it demands a professional who is hard-working and keeps studying, able to extract knowledge from very theoretical scientific journals and apply it to the fashion context, and capable of adapting results from projects done in other fields. In the end, it makes you a professional able to solve any type of complex problem.
