Add computer vision features to e-commerce app

How to plan computer vision features and choose the right provider

Published in

Vize — custom image recognition blog

4 min read · Jun 27, 2017

Earlier, I wrote a post about the difference between general and custom computer vision platforms. Today I would like to focus on a real-world use case. Let's dive into planning image recognition features.

Imaginary use-case

We run a small business selling colourful socks. We want to add a “Socks Matching Engine” feature to our app. Customers will upload a picture of two different socks and the app will tell them how cool the combination is. This is sometimes called visual search. This way we are going to gain the respect of Hollywood’s fashion police and increase sales!

Plan the image recognition task

Before we start, let’s think about what we want to build.

Vision Model I — “Fashion Advisor”

Machine learning is all about having labeled data. First, we start collecting pairs of images from our customers. Once we have enough pairs, we can label each pair to say how well the two socks go together. With this labeled data we can train a machine learning model to serve our customers. This is the clean machine learning way to build the engine, but it does not give us much business data. We do not know how many pictures of, say, red dotted socks our customers are taking. The goal is to sell more socks, and this approach does not serve that goal.

Vision Model II— “Parameter Extractor”

What we want instead is a model designed to extract information from images. We will focus on several parameters:

Pattern (dotted, striped, winter, summer)
Color
Sock type (ankle length, quarter length, crew length)

Each customer’s image is evaluated and labeled. This tells us each customer’s favourite colours, patterns and types of socks. That’s great because we can now customise the next newsletter to fit each customer’s style.
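
To make this concrete, here is a minimal sketch of how the extracted labels could be turned into per-customer favourites. The record layout, the customer names and the function name favourite_parameters are all assumptions made for illustration; the labels would really come from whichever vision models we end up using.

```python
from collections import Counter, defaultdict

# Hypothetical labels produced by the parameter extractor for each upload.
uploads = [
    {"customer": "anna", "pattern": "striped", "color": "blue", "sock_type": "crew length"},
    {"customer": "anna", "pattern": "dotted", "color": "blue", "sock_type": "crew length"},
    {"customer": "anna", "pattern": "striped", "color": "orange", "sock_type": "ankle length"},
    {"customer": "ben", "pattern": "dotted", "color": "red", "sock_type": "ankle length"},
]

PARAMETERS = ("pattern", "color", "sock_type")


def favourite_parameters(uploads):
    """Return the most frequent value of each parameter for every customer."""
    counts = defaultdict(lambda: {name: Counter() for name in PARAMETERS})
    for upload in uploads:
        for name in PARAMETERS:
            counts[upload["customer"]][name][upload[name]] += 1
    return {
        customer: {name: counter.most_common(1)[0][0] for name, counter in params.items()}
        for customer, params in counts.items()
    }


if __name__ == "__main__":
    for customer, favourites in favourite_parameters(uploads).items():
        print(customer, favourites)
```

The result is a small dictionary per customer that a newsletter script could read directly.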

At this point, matching can be as simple as adding a few rules saying that blue and orange socks go together, striped goes with dotted and so on. This is, of course, a hack that does not bring much value to the fashion field, but it will work at the beginning.
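
Such a rule set could be sketched roughly as follows. The two rules are the examples from the paragraph above; the 50/50 weighting, the dictionary layout and the name match_score are assumptions for illustration, not a recommended scoring scheme.

```python
# Hypothetical rule tables. Each rule is an unordered pair, so
# (blue, orange) and (orange, blue) are treated the same.
COLOR_RULES = {frozenset({"blue", "orange"})}
PATTERN_RULES = {frozenset({"striped", "dotted"})}


def match_score(sock_a, sock_b):
    """Return a rough 0-100 score based on hand-written rules."""
    score = 0
    if frozenset({sock_a["color"], sock_b["color"]}) in COLOR_RULES:
        score += 50  # assumed weight for a colour match
    if frozenset({sock_a["pattern"], sock_b["pattern"]}) in PATTERN_RULES:
        score += 50  # assumed weight for a pattern match
    return score


if __name__ == "__main__":
    left = {"color": "blue", "pattern": "striped"}
    right = {"color": "orange", "pattern": "dotted"}
    print(match_score(left, right))
```

New rules are just new entries in the two sets, so the table can grow without touching the scoring function.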

We can also align the categories with our e-shop categories and recommend socks similar to the ones customers already have. Once we have collected enough images, we are ready to build the “Fashion Advisor” model. We will also keep the data extraction models to help us understand our customers and make clever suggestions.

Finding the right providers

Now we know what functions we are looking for:

Extract colour
Extract pattern
Extract sock type
Custom fashion advisor model in the future

The most important thing is model accuracy. No solution can provide 100% accuracy because your customers’ images are going to vary so much. Reaching 80–90% accuracy is great!
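
One simple way to compare providers is to label a small test set by hand and count how often each provider agrees with it. The sketch below assumes we already have the providers’ answers as plain lists; the function name and the sample labels are made up for illustration.

```python
def accuracy(predicted, expected):
    """Return the share of predictions that equal the hand-made labels."""
    if len(predicted) != len(expected):
        raise ValueError("predicted and expected must have the same length")
    if not expected:
        raise ValueError("need at least one labeled example")
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


if __name__ == "__main__":
    # Hypothetical hand-made labels and one provider's answers.
    expected = ["dotted", "striped", "dotted", "winter", "summer"]
    predicted = ["dotted", "striped", "striped", "winter", "summer"]
    print(f"accuracy: {accuracy(predicted, expected):.0%}")
```

Running the same test set through every candidate gives numbers we can compare directly.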

Extract color

It is easy to identify a colour using a general model. Google Vision or Amazon Rekognition should work in this case. We can test the colour extractors with the drag-and-drop demos they provide and find the best solution. We then group colours into 15 categories.
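
Grouping could look something like the sketch below, which assumes the provider returns a dominant colour as an RGB triple. The palette here is a shortened, hypothetical one (a real one would list all 15 categories), and plain Euclidean distance in RGB is a crude measure that a perceptual colour space would improve on.

```python
# Hypothetical, shortened palette of named colour categories.
PALETTE = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "red": (255, 0, 0),
    "green": (0, 128, 0),
    "blue": (0, 0, 255),
    "yellow": (255, 255, 0),
    "orange": (255, 165, 0),
}


def nearest_category(rgb):
    """Return the palette name closest to rgb by squared RGB distance."""
    def distance(name):
        return sum((a - b) ** 2 for a, b in zip(rgb, PALETTE[name]))
    return min(PALETTE, key=distance)


if __name__ == "__main__":
    print(nearest_category((250, 160, 10)))
```

Storing the category name instead of the raw RGB value keeps the later matching rules simple.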

Extract pattern

This might be the hardest parameter to recognize. I recommend training a custom model for patterns because we want every image to have a pattern label. General models can detect strong patterns but do not return a pattern for every image. With 20 pattern categories, we need only about 400 images of socks (roughly 20 per category) to train a custom model. More information about custom vision datasets can be found in this post.

Type of sock

General vision models can also detect the type of sock. I tested a few socks and mostly got the result “outdoor shoe”, which is not very accurate. I would rather spend one more hour getting images from my e-shop database and sorting them into classes than have blank spaces in my image recognition engine. Using a custom model also leads to higher classification accuracy.
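
Sorting the e-shop images into classes can be mostly automated if the product database already stores a sock type. Here is a minimal sketch, assuming the database export is a list of (image path, sock type) pairs; the paths, the function names and the min_images threshold are all hypothetical.

```python
from collections import defaultdict


def group_by_class(records):
    """Map each class label to the list of image paths that carry it."""
    classes = defaultdict(list)
    for path, label in records:
        classes[label].append(path)
    return dict(classes)


def underfilled_classes(classes, min_images):
    """Return the labels that have fewer than min_images examples."""
    return sorted(label for label, paths in classes.items() if len(paths) < min_images)


if __name__ == "__main__":
    # Hypothetical export from the e-shop database.
    records = [
        ("img/001.jpg", "ankle length"),
        ("img/002.jpg", "crew length"),
        ("img/003.jpg", "ankle length"),
        ("img/004.jpg", "quarter length"),
    ]
    classes = group_by_class(records)
    print({label: len(paths) for label, paths in classes.items()})
    print("need more images:", underfilled_classes(classes, min_images=2))
```

The second function is a quick check for classes that still need more images before training.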

With three parameters extracted from each image and saved in the database, we are now able to create the matching logic.

Fashion advisor model provider

We need a model that takes two images and returns a label describing how well they fit each other (in %). We will not find an API provider for this. However, at this stage we have proven the idea of the “Socks Matching Engine”, so we don’t mind spending some money on a custom visual solution.

Summary

Most computer vision tasks are more complex than what one provider can deliver. They often need some insight into the business, so it is necessary to take the time to think about our goals and the path to them. Even with easy-to-implement solutions like Google Vision, we will need a backend programmer to make a good “Socks Matching Engine” feature work. Experimenting with different service providers is the way to reach the best result for your customers.