Applying deep tech to enhance lifestyle retail

Clémence Lévecque
Published in Scalia · 6 min read · Oct 10, 2017

One of the biggest hurdles for retailers switching from offline to online is the amount of content required. When running a physical store, you only need two attributes to populate a point of sale (POS) system: the product name and the product price. That’s because customers can ‘touch and feel’ products, but online, such proximity with the assortment vanishes. This is why product content is so important in modern retail: it becomes your primary selling argument.

Unfortunately, product data management happens to be very time-consuming. It usually requires combining multiple operations: data collection, consolidation, standardization, enrichment, and quality control.

At Scalia we want to drastically simplify the product data preparation process by owning all of these steps. Luckily, applying the latest technologies can help enormously with some of them. Here are three deep techs we have been using so far, with examples and tricks for applying them properly to the lifestyle sector.

Part 1: Teach your kid (a.k.a. your neural network) correctly

The first deep tech we use is neural networks. They can be compared to kids learning to talk: the same word has to be repeated many times, with images in support, before the kid retains it. Neural networks, as the name indicates, are loosely modelled on our brains: data is fed to them so they can be trained. The result they return is checked, and if it is wrong, the weights on the connections between neurons are adjusted, just as you would correct a kid who makes a mistake.
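This correction loop can be sketched with the simplest possible neural network, a single perceptron. This is an illustration only, not the networks actually used for product pictures, but the feedback mechanism is the same: a wrong answer nudges the weights toward the right one.

```python
# Minimal sketch of the "correct the kid" idea: a single perceptron.
# When a prediction is wrong, the connection weights are nudged toward
# the right answer. Illustrative only -- real image classifiers are far
# deeper, but the feedback loop is the same.

def predict(weights, bias, inputs):
    activation = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation >= 0 else 0

def train(samples, labels, lr=0.1, epochs=20):
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, label in zip(samples, labels):
            error = label - predict(weights, bias, inputs)  # 0 if correct
            if error:  # wrong answer: adjust weights and bias
                bias += lr * error
                weights = [w + lr * error * x
                           for w, x in zip(weights, inputs)]
    return weights, bias

# Learn a logical AND from four labelled examples.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]
weights, bias = train(samples, labels)
```

After a handful of epochs the weights settle and the perceptron answers all four examples correctly.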

However, if you make an error yourself by calling something by a different name, the kid is sure to be wrong.

The same principle applies to neural networks: if they are trained on datasets that include errors, they will surely not return the expected result.

So the trick to getting a good accuracy rate from your neural network is to train it on an optimal dataset. In our case, we work with e-commerce products, so we can reasonably assume that most pictures are taken against a single-color background. We can also assume there are two kinds of pictures: either a model wears the clothing item, or the item is photographed on a plain blank wall.

Conclusion: you have to build a clean dataset. But how?

  1. Choose the classification you want to perform. Let’s say you want to classify gender: you fill one folder with pictures of women and a second folder with pictures of men.
  2. Split the data inside these two folders into two further sets: one for training and one for validation. An 80–20 distribution is the most common.
  3. Check that the pictures are in the right folders, and that’s it: you can train your neural network.
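Step 2 above can be sketched in a few lines of Python. The folder and file names here are hypothetical, and the function works on a list of filenames rather than touching the disk, so you can adapt it to however your pictures are stored:

```python
import random

def split_dataset(filenames, train_fraction=0.8, seed=42):
    """Shuffle the picture filenames and split them 80/20 into
    training and validation sets (step 2 of the procedure above)."""
    files = list(filenames)
    random.Random(seed).shuffle(files)  # fixed seed: reproducible split
    cut = int(len(files) * train_fraction)
    return files[:cut], files[cut:]

# Hypothetical folder contents for the "women" class.
women = [f"women/img_{i:04d}.jpg" for i in range(100)]
train, val = split_dataset(women)
```

From there, a `shutil.copy` per filename into `train/women` and `validation/women` folders gives you the directory layout that most training tutorials expect.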

This procedure has given us quite effective results, with accuracy rates between 85% and 92% on labels such as gender, model position, and clothing position (top, bottom, or full), for instance.

Tip: splitting your data this way is especially useful when you have very little of it (a few thousand pictures per folder counts as little, since typical machine learning datasets contain several million examples). You can find a very well constructed tutorial on the Keras blog.

Part 2: Find tricks when Part 1 falls short: computer vision to the rescue

Even though most of the previous results were satisfying, the method showed its limits when we tried to label at a finer level. For instance, we wanted to automatically detect sleeve length. Neural networks work by recognizing patterns in the picture, such as the shape of an arm or of legs. But for more specific details, either you widen the dataset considerably or it simply doesn’t work.

As we couldn’t gather millions of images of sleeves, we turned to OpenCV. It is a very famous library written in C++ but also available in Python, with a lot of tools for computer vision. More details on it can be found here.

Let’s stay with the goal of detecting sleeve length. The trick was to observe that if an item is short-sleeved, the model shows more exposed skin than when wearing a long-sleeved shirt. Hence, we focused on determining the amount of skin in a picture.

There are two ways to proceed: either a face appears in the picture, in which case it is quicker to sample the skin color from the face and then detect all skin in the picture, or there is no face and the process takes a bit longer, since the algorithm has to find the skin tone by itself.

Once the skin is detected, the output is a binary picture where all white pixels are skin pixels. All that is left to do is to count those pixels.
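As a minimal sketch of the "no face" case, here is a classic rule-based RGB skin heuristic implemented with NumPy alone. The thresholds are a well-known textbook heuristic, not the values actually used in production, and a real pipeline would rather use OpenCV's color conversion and `inRange` on a sampled tone:

```python
import numpy as np

def skin_mask(image):
    """Return a binary mask (255 = skin) from an RGB image, using a
    classic rule-based skin heuristic. The thresholds below are a
    common textbook rule, not production-tuned values."""
    image = image.astype(np.int16)  # avoid uint8 wrap-around on subtraction
    r, g, b = image[..., 0], image[..., 1], image[..., 2]
    mask = (
        (r > 95) & (g > 40) & (b > 20)
        & (r > g) & (r > b)
        & ((r - np.minimum(g, b)) > 15)
        & (np.abs(r - g) > 15)
    )
    return (mask * 255).astype(np.uint8)

# A tiny 1x2 "image": one skin-tone pixel, one white background pixel.
img = np.array([[[200, 150, 130], [255, 255, 255]]], dtype=np.uint8)
mask = skin_mask(img)
skin_pixels = int(np.count_nonzero(mask))  # -> 1
```

The mask is exactly the binary picture described above: counting its non-zero entries gives the skin pixel count.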

But even with a pixel count, how can we tell whether the item is long-sleeved or short-sleeved? What if the picture is a close-up, which means there are many more white pixels than in a wide shot?

Tip: rather than simply counting pixels, a ratio is more accurate. By dividing the number of white pixels by the total size of the picture, the result becomes more homogeneous and usable. After a few tries, it is possible to determine thresholds: above a certain ratio the item has no sleeves, and as the ratio decreases it passes through short sleeves down to long sleeves.
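The thresholding step might look like the sketch below. The two cut-off values are hypothetical; in practice they would be tuned by trial and error on real product pictures, as described above.

```python
def sleeve_label(skin_pixels, total_pixels,
                 no_sleeve_cut=0.35, short_sleeve_cut=0.15):
    """Map the skin ratio to a sleeve length. Both cut-offs are
    hypothetical illustration values, not production-tuned ones."""
    ratio = skin_pixels / total_pixels
    if ratio > no_sleeve_cut:
        return "sleeveless"
    if ratio > short_sleeve_cut:
        return "short sleeves"
    return "long sleeves"

sleeve_label(40000, 100000)  # lots of skin -> "sleeveless"
sleeve_label(5000, 100000)   # little skin -> "long sleeves"
```

Because the ratio is normalized by the picture size, a close-up and a wide shot of the same item land near the same value.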

However, it has to be noted that skin-toned clothes remain quite a problem, since they mislead the results.

Part 3: Words have always been very useful, so why wouldn’t we use them?

Finally, some features are very hard to detect from a picture alone, such as the item’s composition: even with a professional eye and the item in hand, it is almost impossible to say that it is 20% cotton, 30% polyamide, and so on.

But most of the time, a product description accompanies the product itself, and it is full of interesting data. Here again, there are several ways to retrieve the information.
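For a well-structured attribute like composition, even a plain regular expression can pull the data out of a description. This is a simple sketch; material names and phrasing vary across retailers, so a production version needs more rules:

```python
import re

def parse_composition(description):
    """Pull '<percent>% <material>' pairs out of a free-text product
    description and return them as a dict. A regex sketch only --
    real descriptions need more robust handling."""
    pairs = re.findall(r"(\d{1,3})\s*%\s*([a-zA-Z]+)", description)
    return {material.lower(): int(percent) for percent, material in pairs}

text = "Soft knit sweatshirt. Composition: 70% cotton, 30% polyamide."
parse_composition(text)  # -> {'cotton': 70, 'polyamide': 30}
```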

If the product description takes the form of a paragraph with complete sentences, then it is possible to use the third deep tech mentioned earlier: Natural Language Processing (NLP). Just as in Part 1, where we sorted pictures into folders to train the neural networks, here a large quantity of sentences is required to train the algorithm to recognize sentence construction. NLP is a broad term that covers many different things. In this precise case, what interests us is Named Entity Recognition (NER). It is used to look for named entities in a text and to classify them into categories, usually names of persons or places. It can then be extended to classify these named entities into categories relevant to the actual work, such as colors, composition, or size.

To work, it requires a large number of pre-labeled sentences: the algorithm is told where the named entity sits in each sentence and which category it belongs to, so that it can be trained and then recognize entities by itself.

The problem is that many product descriptions are not real sentences but rather bullet-point lists, since few people would read ten lines just to learn the composition of a sweatshirt. And this NER solution is completely ineffective on bullet-point lists.

Tip: one solution worth considering is to use a dictionary. In fact, the names of colors or of clothing materials are quite limited. So what can be done is to build several dictionaries, such as a “colors dictionary” and a “materials dictionary”, and then to scan the text. If any of the stored words are found, they are extracted and returned as interesting data.
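The dictionary scan can be sketched as below. The mini-dictionaries are hypothetical; real ones would contain far more entries:

```python
# Hypothetical mini-dictionaries; real ones would be far larger.
COLORS = {"black", "white", "navy", "red", "beige"}
MATERIALS = {"cotton", "polyamide", "wool", "linen", "polyester"}

def scan_bullet_points(lines):
    """Scan bullet-point product descriptions against the dictionaries
    and collect whatever known words are found."""
    found = {"colors": [], "materials": []}
    for line in lines:
        for word in line.lower().replace(",", " ").split():
            word = word.strip(".:;%")  # drop trailing punctuation
            if word in COLORS:
                found["colors"].append(word)
            elif word in MATERIALS:
                found["materials"].append(word)
    return found

bullets = ["- Color: navy", "- 70% cotton, 30% polyamide"]
scan_bullet_points(bullets)
# -> {'colors': ['navy'], 'materials': ['cotton', 'polyamide']}
```

Unlike the NER approach, this needs no training data at all, which is exactly why it works on bullet-point lists.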

At Scalia, we believe that deep tech is only a means to an end. When it happens to be relevant we use it; when it doesn’t, we work around it. More importantly, given that we are still at an early stage of AI, it’s important to keep in mind that none of these techniques are fully reliable. For these reasons, we cross-check the outputs of the various techniques before enriching our customers’ data.
