Recommendation engine for e-commerce — Similarity Algorithm
Recommendation engines based on Computer Vision have a greater impact on e-commerce business metrics and improve the customer experience.
With the success of supervised learning, CNNs, high computing power and open-source libraries, the field of Computer Vision (CV) has reached a level where computers can imitate many human tasks.
In this article I will explain how we at Brillio built a next-generation recommendation engine for the e-commerce industry.
These visual similarity recommendation engines work the same way humans act while shopping. This enables a much better clothes-discovery experience and improves business metrics.
I presented a paper on this at the Indian Institute of Ahmedabad in April 2019. You can read the finer details in the paper.
Motivation — The good old days of shopping
During our childhood, festivals were exciting events for many reasons, but the excitement of getting new clothes was always supreme. I used to go shopping with my parents in our tier-2 city. In those days it had traditional shops where many salespeople would show clothes to customers. We always had an idea of what clothes (jeans, shirts, etc.) we wanted to buy, but we never decided in advance exactly which shirt. In fact, we always went to buy something new, and for that we had to explore the market.
Offline shopping — a world without algorithms
The shopkeeper starts by asking us what type of shirts we want, which gives him some idea of our taste (in the online world this is a major problem — the cold start). Then he starts showing different types of shirts from his inventory. This is generally called the ‘Exploration’ phase, during which he keeps the variation high. After seeing enough items, we get an idea of his inventory and select a few clothes we like. At this point we ask the shopkeeper to show clothes ‘on the lines of the chosen ones’, meaning clothes that are ‘similar’ to them in a broad sense. Since the shopkeeper knows his stock, he starts showing items selectively. At this point he has reduced the variance (variety).
And then comes the time when our selection boils down to a few items. At this stage we almost always ask the shopkeeper to show all variations of these few items, meaning ‘similar’ ones at a granular level. It is human behaviour to see all options before making a decision.
Shopping in the online world — the role of algorithms
In the online world, the shopkeeper’s work is handled by algorithms designed to serve these various ‘phases’. The shopkeeper’s job is to retrieve clothes from the inventory based on the user’s choices and ultimately convert that into a sale. The same goes for algorithms: they have to retrieve ‘relevant’ clothes from a huge inventory so that the user finds items he or she likes and buys them.
Components of the Visual Similarity algorithm
Building an algorithm for the above use case involves many steps, and arriving at the final set of steps took a lot of R&D; you can read about our various experiments in the paper. Below I describe the most efficient model.
- Approach: My core approach has been to learn embeddings that capture the notion of similarity. I achieved this using CNNs that learn visual features. It is not a plain CNN model with Softmax loss, which captures only coarse-grained features.
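The ranking objective behind such embeddings is typically a triplet (margin) loss. Below is a minimal pure-Python sketch of that idea; the Euclidean distance and the margin of 0.2 are illustrative assumptions, not necessarily the exact loss used in the paper:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Ranking loss: push the positive closer to the anchor than the
    negative, by at least `margin`. Zero loss once the gap is satisfied."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

# Toy 2-d embeddings: the positive is already much nearer the anchor
# than the negative, so the margin is satisfied and the loss is zero.
a, p, n = [0.0, 0.0], [0.1, 0.0], [1.0, 1.0]
print(triplet_loss(a, p, n))  # 0.0
```

Minimizing this loss over many triplets pulls similar clothes together in embedding space while pushing dissimilar ones apart, which is exactly the geometry the retrieval step needs.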
- Notion of Similarity — Data Preparation I: The biggest challenge in building this model was that the concept of similarity is subjective: different people may or may not agree that two clothes look similar. In such a scenario the dataset reflects the preferences of its creator and does not generalize. To handle this, we followed a ranking paradigm in which the objective is to rank clothes by similarity. To see why, consider three clothes (A, B, C) and compare A with B and A with C. It is now more likely that two people will agree that A looks more similar to B than to C, because the subjectivity has been reduced by comparing against a common anchor A.
- Data Preparation II: Training is based on the triplet (A, B, C) approach with a ranking loss. The input is three images (anchor, positive, negative). To form a triplet, an anchor image A is selected, and a positive image is randomly chosen from a set of 200 positive images. This set is formed as follows: each BMS identifies the 500 nearest neighbours of the anchor image A, and the top 200 from the union of all BMS results are taken as the set of positive images. The union of all BMS images ranked between 500 and 1000 is taken as the sample set, from which a negative image is randomly selected. A BMS is a Basic Similarity Model — a simple model that ranks clothes on the basis of colour, pattern, etc. These models need not be highly accurate; we use them to programmatically generate the training data and then verify it manually.
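The sampling procedure above can be sketched as follows. Here `bms_rankings` is a hypothetical pre-computed structure (one ranked neighbour list per BMS, most similar first), and since the text does not specify how the union is re-ranked to pick the top 200 positives, a simple deterministic ordering stands in for it:

```python
import random

def build_triplet(anchor_id, bms_rankings, rng=None):
    """Form one (anchor, positive, negative) triplet from BMS rankings.

    bms_rankings: list of per-BMS neighbour id lists for this anchor,
    each ranked most-similar first (hypothetical pre-computed input).
    """
    rng = rng or random.Random(0)

    # Positive pool: top 200 of the union of each BMS's 500 nearest neighbours.
    near = set()
    for ranking in bms_rankings:
        near.update(ranking[:500])
    positive_pool = sorted(near)[:200]  # placeholder for "top 200 of the union"

    # Negative sample set: union of images each BMS ranks between 500 and 1000,
    # excluding anything already in the near set.
    far = set()
    for ranking in bms_rankings:
        far.update(ranking[500:1000])
    far -= near

    positive = rng.choice(positive_pool)
    negative = rng.choice(sorted(far))
    return anchor_id, positive, negative
```

Triplets generated this way are then hand-verified before training, as described above.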
- CNN architecture: The diagram below shows the architecture of the similarity algorithm, which learns both coarse- and fine-grained features. It has three branches: one with a deep architecture (focused on coarse-grained features) and two with shallow architectures that focus on fine-grained features.
- Embeddings: Training the model this way gives us feature vectors, which we call embeddings. These embeddings capture visual similarity. We extract embeddings for all images from the trained model and store them in a database.
- Finding similar clothes: To find clothes similar to a query image, we used the open-source Python library Annoy for approximate nearest-neighbour search. It works differently from KD-trees, which are slow for large, high-dimensional feature sets.
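For intuition, retrieval reduces to a nearest-neighbour lookup over the stored embeddings. Below is a brute-force sketch with cosine similarity and made-up product embeddings; Annoy replaces this O(n) linear scan with an approximate index built from random-projection trees:

```python
import math

def cosine_sim(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nearest(query, db, k=3):
    """Exact top-k scan over stored embeddings, most similar first."""
    ranked = sorted(db.items(), key=lambda kv: cosine_sim(query, kv[1]), reverse=True)
    return [item_id for item_id, _ in ranked[:k]]

# Hypothetical 3-d embeddings keyed by product id.
db = {
    "shirt_1": [1.0, 0.1, 0.0],
    "shirt_2": [0.9, 0.2, 0.1],
    "jeans_1": [0.0, 1.0, 0.9],
}
print(nearest([1.0, 0.0, 0.0], db, k=2))  # ['shirt_1', 'shirt_2']
```

With millions of items this exact scan becomes the bottleneck, which is why an approximate index like Annoy is used in production despite a small loss in recall.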
The above algorithm gave us the best results. Deploying it in production involved many components and trade-offs between accuracy and speed; that deserves a post of its own.
For any questions or suggestions, please leave a comment.
Model Training Tricks: Check out my post on NN training tricks. You will be amazed by the results when you implement any of them.