Moving Beyond Meta for Better Product Embeddings

Bhaskar Arun
Tata 1mg Technology
8 min read · Oct 19, 2020


Recommender systems are essentially information-filtering systems which suggest, from the complete catalogue, relevant items a user might find useful. The majority of recommendation problems are solved using either collaborative filtering or content-based filtering approaches. The choice of solution depends a lot on the underlying use case the suggestions are needed for. For instance, on the product page of an e-commerce platform, the “Products Similar to” or “Similar Products” widget is a use case where content-based filtering solutions have traditionally been better positioned, since they can utilise item meta-information like text descriptions and categorical tags to make relevant suggestions.

In this article we’ll explore a novel hybrid solution for the above-mentioned recommendation use case, Similar Products. We go beyond item meta-information and use temporal user-item sequences to gain insight into user intent, which lets the proposed solution serve better recommendations.

Similar Products List

Similar product list on the 1mg product page

On the product page we wish to serve a list of alternative items, similar to the product the user is viewing, to aid exploration. We have a huge inventory of approximately 300k products, from which we aim to serve the top 10 products the user will find most useful as alternatives for each product. To quantify similarity between products (using cosine similarity) and identify these top 10, we need product embeddings.

Cosine similarity is the normalized dot product of two vectors:

cos(A, B) = (A · B) / (‖A‖ ‖B‖)

It is efficient to evaluate, even for sparse vectors.
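As a minimal sketch of this step (the function and variable names are illustrative, and at our scale of ~300k products an approximate nearest-neighbour index would replace the full pairwise similarity matrix), the top-10 list for each product can be computed like this:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def top_k_similar(embeddings: np.ndarray, k: int = 10) -> np.ndarray:
    """Return indices of the k most similar products for every product."""
    sims = cosine_similarity(embeddings)   # pairwise normalized dot products
    np.fill_diagonal(sims, -np.inf)        # exclude the product itself
    return np.argsort(-sims, axis=1)[:, :k]

# Example with random vectors standing in for real product embeddings
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))          # 1000 products, 64-dim embeddings
neighbours = top_k_similar(emb, k=10)      # shape (1000, 10)
```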

Solutions Explored

Until very recently, we focused only on content-based product embeddings:

TF-IDF (explicit embeddings): Explicit embeddings crafted using TF-IDF on the description and other meta-info like product tags and categories, plus heuristics (iteratively filtered over time).
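A rough sketch of this approach (the sample documents and tag formats below are illustrative, not our actual catalogue data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One "document" per product: description text concatenated with meta tags
docs = [
    "multivitamin capsule daily immunity wellness brand:gnc category:vitamins",
    "vitamin d3 tablet bone health deficiency brand:sanofi category:vitamins",
    "aloe vera face wash oily skin brand:himalaya category:personal_care",
]

vectorizer = TfidfVectorizer(ngram_range=(1, 2))
tfidf = vectorizer.fit_transform(docs)   # sparse matrix: (n_products, vocab_size)

# Explicit-embedding similarity between products
sims = cosine_similarity(tfidf)
print(sims.round(2))
```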

BERT score (BERT token embeddings + token IDF weights): Embeddings formulated using pre-trained BERT models applied over the description text and meta-info, combining pre-trained language-model (BERT) token embeddings and IDF weights to compute similarity.

BERT score to compute similarity
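As a simplified sketch of the idea (greedy token matching weighted by IDF, following the BERTScore recipe; the choice of bert-base-uncased and the helper names are illustrative):

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def token_embeddings(text):
    """Unit-normalised contextual token embeddings for a (truncated) description."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]        # (num_tokens, 768)
    hidden = hidden / hidden.norm(dim=1, keepdim=True)
    return hidden.numpy(), tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

def bert_score(desc_a, desc_b, idf):
    """IDF-weighted greedy matching of tokens in desc_a against tokens in desc_b."""
    emb_a, tokens_a = token_embeddings(desc_a)
    emb_b, _ = token_embeddings(desc_b)
    sims = emb_a @ emb_b.T                                    # cosine (unit norms)
    weights = np.array([idf.get(t, 1.0) for t in tokens_a])   # IDF weight per token
    return float((weights * sims.max(axis=1)).sum() / weights.sum())
```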

Although both these methods are pretty easy to implement, they have a few limitations:

  • TF-IDF produced embeddings of length 30K, which made them computationally expensive to work with.
  • TF-IDF and BERT needed further heuristics to arrive at the final list of similar products.
  • BERT could only be used for products with a description.
  • Descriptions often exceeded the input-length limit of the pre-trained BERT model.
  • Meta-info is inconsistent and sparse.
  • Not the most intuitive embedding technique, as the BERT language model was pre-trained on tasks not relevant to our use case.

Another very important piece of information, which is beyond the scope of the meta-info at hand, is the implicit user perception of the product. For example, a user looking at multivitamins with the intent of living a healthier lifestyle usually looks at lifestyle brands such as GNC, HealthVit and 1mg, whereas a user who is prescribed a multivitamin due to a deficiency looks at pharmaceutical brands such as Sanofi. This can only be encoded by observing user-item interactions.

In light of the limitations mentioned above, we needed to look beyond run-of-the-mill content-based techniques. This led us to two new embedding approaches, which we cover in the rest of this article: first Prod2Vec, and then a hybrid of Prod2Vec and content-based embeddings called MetaProd2Vec+.

Moving Beyond Meta: Prod2Vec

We extend the well-known Word2Vec architecture to learn these embeddings, and we call this method Prod2Vec. We treat products as word tokens and use user sequences (analogous to sentences) to learn product embeddings. The most publicised and revered feature of Word2Vec is its ability to encode the semantics of words, which is exactly what we need for similar products (similarity can be viewed as a measure of semantic coherence). We learn these product embeddings without taking meta-info into consideration.

Implementation

Graphical representation of step-wise implementation.

As mentioned in steps a, b and c in the figure above, we fabricate user journeys rather than using actual user journeys. To combat non-homogeneity, we use a Markov chain, which relies only on the present state to generate the next state.

We calculate transition probabilities between products, using unique user count.

P(j | i) = M(i, j) / Σ_k M(i, k)

Here M(i, j) is the number of unique users who navigated from product i to product j, and P(j | i) = 0 if no user has navigated from product i to product j.

For each product i we generate N sequences of length L. The first product in each sequence is product i, and each subsequent product is sampled, Markov-chain style, from the transition probability distribution of the current product. These sequences form the corpus fed to the Word2Vec (skip-gram) architecture (step d).
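A minimal sketch of this generation step (the data layout and function names are illustrative):

```python
import numpy as np
from collections import defaultdict

def build_transition_probs(transitions):
    """transitions: dict mapping (i, j) -> unique users who navigated from i to j."""
    totals = defaultdict(float)
    for (i, _), m in transitions.items():
        totals[i] += m
    probs = defaultdict(dict)
    for (i, j), m in transitions.items():
        probs[i][j] = m / totals[i]        # P(j | i) = M(i, j) / sum_k M(i, k)
    return probs

def sample_sequences(probs, start, n_seq=20, length=10, seed=0):
    """Generate n_seq Markov-chain journeys of the given length, starting at `start`."""
    rng = np.random.default_rng(seed)
    sequences = []
    for _ in range(n_seq):
        seq, current = [start], start
        while len(seq) < length and current in probs:
            candidates = list(probs[current].keys())
            p = list(probs[current].values())
            current = candidates[rng.choice(len(candidates), p=p)]
            seq.append(current)
        sequences.append(seq)
    return sequences
```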

We learn embeddings by predicting context tokens from a target token.

We treat the products as word tokens and feed them into the model to learn embeddings for all the products.
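Concretely, treating each synthetic journey as a “sentence” of product-ID tokens, the embeddings can be learned with gensim’s skip-gram Word2Vec (a minimal sketch; the hyperparameters are illustrative, not our production settings):

```python
from gensim.models import Word2Vec

# corpus: synthetic user journeys, each a list of product-ID strings
corpus = [
    ["P101", "P244", "P311", "P101", "P512"],
    ["P244", "P512", "P311", "P978", "P101"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=128,   # embedding dimension
    window=5,          # context window within a journey
    sg=1,              # skip-gram: predict context products from the target product
    negative=10,       # negative sampling
    min_count=1,
    epochs=20,
)

vector = model.wv["P101"]                           # learned embedding for product P101
similar = model.wv.most_similar("P101", topn=10)    # top-10 similar products
```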

Limitations

Although Prod2Vec initially impressed at capturing a product’s context in terms of recall, it struggled with long-tail coverage: recommendations for new or rarely visited products tend to be popular products, since Prod2Vec optimises recall.

Including product meta-information in the generation of the product embeddings can do much better at getting relevant suggestions for Similar Products.

MetaProd2Vec+

An improvement on this method is proposed by MetaProd2Vec. It encodes the product’s meta-info along with the sequence context to obtain the product’s embedding, which helps with the problem mentioned above: with meta-info powering the embeddings, long-tail products with few co-occurrences are handled much better.

Our hybrid method leverages past user interactions with items and their attributes to compute low-dimensional embeddings of items. Specifically, item metadata is injected into the model as side information to regularise the item embeddings.

This architecture improves on the model in the MetaProd2Vec paper by using weighted meta-info instead of an un-weighted concatenation of meta-info.
  • SI 0 is the one-hot encoding from the sequence graph, leading to the item embedding.
  • SI 1…SI n are the sparse meta-info features, leading to the meta-feature embeddings.

Apart from model architecture, the implementation is identical to Prod2Vec.
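The exact production model is not reproduced here, but a rough PyTorch sketch of the weighted-meta-info idea (learnable weights over the item embedding and each meta-feature embedding, feeding a skip-gram-style objective; every name and shape below is illustrative) could look like this:

```python
import torch
import torch.nn as nn

class MetaProd2VecPlus(nn.Module):
    """Item embedding regularised by weighted meta-feature embeddings (sketch)."""
    def __init__(self, n_items, meta_vocab_sizes, dim=128):
        super().__init__()
        self.item_emb = nn.Embedding(n_items, dim)              # SI 0: the item itself
        self.meta_embs = nn.ModuleList(
            [nn.Embedding(v, dim) for v in meta_vocab_sizes])   # SI 1 ... SI n
        # One learnable weight per input: the item plus each meta feature
        self.weights = nn.Parameter(torch.ones(1 + len(meta_vocab_sizes)))
        self.out_emb = nn.Embedding(n_items, dim)               # output (context) side

    def encode(self, item_ids, meta_ids):
        # meta_ids: (batch, n_meta_features) categorical indices
        parts = [self.item_emb(item_ids)]
        parts += [emb(meta_ids[:, k]) for k, emb in enumerate(self.meta_embs)]
        w = torch.softmax(self.weights, dim=0)                  # normalised weights
        return sum(w[k] * p for k, p in enumerate(parts))       # weighted sum, not concat

    def forward(self, item_ids, meta_ids, context_ids):
        target = self.encode(item_ids, meta_ids)                # (batch, dim)
        context = self.out_emb(context_ids)                     # (batch, dim)
        return (target * context).sum(dim=1)                    # skip-gram logits
```

Training pairs each target product (with its meta-info) against context products from the synthetic journeys, with negative sampling as in Prod2Vec; the normalised weights are what the exploratory analysis below inspects.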

Exploratory, Comparative and Performance Analysis

We perform an intensive EDA on our MetaProd2Vec+ embeddings, analysing the most important meta-information features with the help of the weights assigned to each categorical feature.

We also compare our Prod2Vec and MetaProd2Vec+ embeddings before releasing them into production, using both metrics to evaluate performance and anecdotal evidence on long-tail cases.

Finally, we show the performance gains observed in production as we progressed in our journey towards better embeddings, and hence better performance.

Exploratory Analysis

This model architecture has enabled us to analyse the importance of different meta-info features in the final embedding. For the majority of products, the item embedding itself holds the highest weightage, although Therapeutic Use and Use both hold high importance as well (Use and Therapeutic Use are, for the user, essentially defined by the ingredients of the product, hence their high significance). This allows us to better understand the significance of these features and how they are perceived by users.

Use and Therapeutic Use play a major role in defining the product embedding

We projected our product embeddings onto a Cartesian plane and looked at the clustering of products belonging to the ‘Personal Care’ category. They are quite well segregated on the basis of their use, which confirms the earlier observation regarding the high significance of Uses in defining the embeddings.
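A hedged sketch of how such a plot can be produced (assuming the embeddings and a Use tag per product are already at hand; the names are illustrative), using t-SNE for the 2-D projection:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_use_clusters(embeddings, uses):
    """Project product embeddings to 2-D and colour each point by its Use tag."""
    points = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    for use in sorted(set(uses)):
        mask = [u == use for u in uses]
        plt.scatter(points[mask, 0], points[mask, 1], s=5, label=use)
    plt.legend(markerscale=3)
    plt.title("Personal Care products, coloured by Use")
    plt.show()
```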

Comparative Analysis

Before pushing a model to production, we compare a few metrics between various implementations.

We compare our MetaProd2Vec+ recommendations with the TF-IDF, BERT and Prod2Vec recommendations on two metrics: usage_relevance and explorability.

  • usage_relevance: We have several proprietary tags, such as Uses and Categories, which are used to label our products; we use the coherence of the Uses of an item with the Uses of its recommended similar products to compute a usage-relevance ranking, using the mathematical formulation of NDCG.
  • explorability: How many distinct products we are able to recommend to the user, since we don’t want to limit the explorability of our inventory. Too high an explorability might be counter-productive, as it might come at the cost of relevance.

So we need a model which performs well on both metrics.
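To make the usage_relevance metric concrete, here is a minimal NDCG sketch; the relevance grading (coherence of Uses between the seed item and each recommendation) is simplified to a 0/1 tag overlap here and is purely illustrative:

```python
import numpy as np

def ndcg_at_k(relevances, k):
    """NDCG@k for a ranked list of graded relevances."""
    rel = np.asarray(relevances, dtype=float)[:k]
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    idcg = float((np.sort(rel)[::-1] * discounts).sum())
    return dcg / idcg if idcg > 0 else 0.0

def usage_relevance(seed_uses, recommended_uses, k=5):
    """Relevance is 1 if a recommendation shares any Use tag with the seed item."""
    rels = [1.0 if seed_uses & set(uses) else 0.0 for uses in recommended_uses]
    return ndcg_at_k(rels, k)

# Seed item tagged {"immunity"} and the Use tags of its 5 recommended products
print(usage_relevance({"immunity"}, [{"immunity"}, {"skin"}, {"immunity"}, {"bone"}, {"immunity"}]))
```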

Legend: bert = BERT-score similar list; tfidf = TF-IDF similar list; p2v = Prod2Vec list; mp2v+ = MetaProd2Vec+ list.

p2v and mp2v+ outperform tfidf and bert in terms of inventory exploration. mp2v+ comes out a clear winner, with median values of 0.86 for NDCG@5, 0.83 for NDCG@30 and 0.93 for NDCG@50.

Anecdotal evidence shows better recommendations for long-tail products using MetaProd2Vec+.

Performance Analysis: A/B and final release CTR metrics

The predictions, obtained by evaluating similarity between the product vectors, were shown on the Similar Products widget on each item page.

  • TF-IDF was the first algorithm used to vectorise products, giving a 6.5% CTR.
  • Prod2Vec embeddings improved CTR by 15.4% over that achieved by the TF-IDF embeddings.
  • MetaProd2Vec+ learns from both the sequence of user-product interactions and the metadata. These embeddings achieved a net improvement in CTR of 35% over that of the TF-IDF embeddings.
Performance in production (CTR is the click-through rate) for each method.

In Conclusion

On any e-commerce platform with a massive inventory, more likely than not the meta-info and product content/description are going to be sparse and inconsistent. MetaProd2Vec+ is a state-of-the-art technique for encoding your inventory. It is capable of weighing the significance of different tags and features in the meta-info for the embeddings, which in turn helps you understand the user perception of items that drives user-item interactions. An interesting direction yet to be explored is the use of these embeddings in sequential models to generate real-time, personalised, dynamic recommendations, similar to the use of Word2Vec embeddings in language models, with products and word tokens acting analogously to one another.

References

Tianyi Zhang et al. (2019). BERTScore: Evaluating Text Generation with BERT.

Flavian Vasile et al. (2016). Meta-Prod2Vec: Product Embeddings Using Side-Information for Recommendation.

Jizhe Wang et al. (2018). Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba.
