How to Recommend Similar Products to Any User?

Dinesh Boggarapu
4 min read · Feb 24, 2019


Let’s start with a basic understanding of the problem.

Say you are going to buy a new shirt. If the salesperson shows you only one shirt, you would still expect them to show you similar shirts that you may like. This gives the customer multiple choices for the product they want to buy, and the store can use it to provide a good customer experience.

Let’s dive deeper into the problem.

Observe any e-commerce website: when we search for a product, it shows multiple search results, and below each result it displays separate blocks of similar products and products recommended for you. Usually these recommendations are made using two well-known techniques: Content-based Filtering and Collaborative Filtering.

Collaborative Filtering uses Matrix Factorization techniques. Here I will go through only Content-based recommendation, which is very easy to understand and interpret.

Content Based Filtering

Content-based Filtering utilizes a series of discrete characteristics of an item in order to recommend additional items with similar properties (Wikipedia definition).

Let’s continue with the same shirt example.

Say you are searching for a black hoodie on an e-commerce website. It displays similar items, which can be the same black hoodie with half sleeves, the same hoodie in a color other than black, or a black hoodie from a different brand.

We are going to use the Title, Brand and Size of the shirt for our problem.

This problem can be solved using both NLP and a distance measure.

From NLP, we will use a very basic technique called Bag-of-Words.

Bag-of-Words

The Bag-of-Words model is a simplifying representation used in Natural Language Processing and Information Retrieval. This model is used not only for text; it is also used for Computer Vision problems.

Example:

Here we will use the example from Wikipedia. Let’s assume we have two text sentences:

  1. John likes to watch movies. Mary likes movies too.
  2. John also likes to watch football games.

The Bag-of-Words model converts the two sentences into two lists of tokens:

“John”, “likes”, “to”, “watch”, “movies”, “Mary”, “likes”, “movies”, “too”

“John”, “also”, “likes”, “to”, “watch”, “football”, “games”

Each list is then converted into a bag of words, which can be written as a JSON object mapping each word to its count:

BoW1 = {"John": 1, "likes": 2, "to": 1, "watch": 1, "movies": 2, "Mary": 1, "too": 1}

BoW2 = {"John": 1, "also": 1, "likes": 1, "to": 1, "watch": 1, "football": 1, "games": 1}

It then takes the unique words from the whole corpus and forms an n-dimensional vector for each sentence, where n is the number of unique words in the corpus. Here the corpus has 10 unique words (John, likes, to, watch, movies, Mary, too, also, football, games), so at last each sentence looks like a 10-dimensional count vector:

(1, 2, 1, 1, 2, 1, 1, 0, 0, 0)

(1, 1, 1, 1, 0, 0, 0, 1, 1, 1)
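As a quick check, here is a minimal sketch of the same example using scikit-learn’s CountVectorizer (note that it lowercases the text and orders the vocabulary alphabetically, so the columns come out in a different order than listed above):

```python
# Minimal Bag-of-Words sketch using scikit-learn's CountVectorizer.
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "John likes to watch movies. Mary likes movies too.",
    "John also likes to watch football games.",
]

vectorizer = CountVectorizer()
bow = vectorizer.fit_transform(sentences)  # sparse matrix of word counts

print(sorted(vectorizer.vocabulary_))
# ['also', 'football', 'games', 'john', 'likes', 'mary', 'movies', 'to', 'too', 'watch']
print(bow.toarray())
# [[0 0 0 1 2 1 2 1 1 1]
#  [1 1 1 1 1 0 0 1 0 1]]
```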

We can employ the same approach for our data. In this case, our data is the Title of each product, the Brand of the product, etc.

We can build a separate Bag-of-Words model for the Title and the Brand, and later combine the two (see the sketch after the library links below).

If we want to use this technique, we don’t need to build it from scratch; there are predefined libraries.

If you are doing it in Python, refer to: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.CountVectorizer.html#sklearn.feature_extraction.text.CountVectorizer

If you are doing it in R, refer to: https://rpubs.com/MajstorMaestro/256588
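For our shirt example, a rough sketch in Python could look like this (the product titles and brands below are made up purely for illustration):

```python
# Sketch: build separate Bag-of-Words models for Title and Brand,
# then stack them into a single feature matrix.
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer

titles = ["black hoodie half sleeves", "black hoodie full sleeves", "grey hoodie full sleeves"]
brands = ["brand_a", "brand_b", "brand_a"]

title_vectorizer = CountVectorizer()
brand_vectorizer = CountVectorizer()

title_bow = title_vectorizer.fit_transform(titles)
brand_bow = brand_vectorizer.fit_transform(brands)

# Combine both Bag-of-Words matrices side by side.
features = hstack([title_bow, brand_bow])
print(features.shape)  # (3, 8): 6 unique title words + 2 unique brands
```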

We are done with preparing the data. But how are we going to solve the problem?

Using the distance-between-points formula!

Distance between two points:

We are going to solve the problem using the distance formula.

Let’s first see the formula.

The Euclidean distance between points p and q is the length of the line segment connecting them. In Cartesian coordinates, if p = (p1, p2,…, pn) and q = (q1, q2,…, qn) are two points in Euclidean n-space, then the distance (d) from p to q, or from q to p is given by the Pythagorean formula:

d(p, q) = d(q, p) = √((q₁ − p₁)² + (q₂ − p₂)² + … + (qₙ − pₙ)²)

Using this formula, we can find the distance between any two n-dimensional vectors, for example the two Bag-of-Words vectors we built from the Wikipedia sentences (see the sketch below).
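For instance, here is a quick sketch in NumPy applying the formula to the two count vectors above:

```python
# Euclidean distance between the two Bag-of-Words vectors built earlier
# (columns: John, likes, to, watch, movies, Mary, too, also, football, games).
import numpy as np

sentence_1 = np.array([1, 2, 1, 1, 2, 1, 1, 0, 0, 0])
sentence_2 = np.array([1, 1, 1, 1, 0, 0, 0, 1, 1, 1])

distance = np.sqrt(np.sum((sentence_1 - sentence_2) ** 2))
print(distance)  # sqrt(10) ≈ 3.16
```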

Now, take one more sentence: “John likes to watch basketball games”.

If you ask anyone which of the two earlier sentences is most similar to this new one, they can easily say it is “John also likes to watch football games”. This is because the distance between these two texts is the smallest.

Example of calculating the distance
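Here is a minimal sketch of that calculation with scikit-learn, assuming all three sentences share one Bag-of-Words vocabulary:

```python
# Vectorize all three sentences together, then compare the new sentence
# against the first two using Euclidean distance.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

sentences = [
    "John likes to watch movies. Mary likes movies too.",  # sentence 1
    "John also likes to watch football games.",            # sentence 2
    "John likes to watch basketball games",                # new sentence
]

bow = CountVectorizer().fit_transform(sentences).toarray()

# Distance from the new sentence (row 2) to sentences 1 and 2 (rows 0 and 1).
print(euclidean_distances(bow[2:3], bow[:2]))
# [[3.         1.73205081]]  -> the new sentence is closer to sentence 2
```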

From the distances above (3 vs. √3 ≈ 1.73), we can conclude that sentence 2 is much more similar to the new sentence than sentence 1.

In the same fashion, we can recommend products based on the distance measure: the smaller the distance, the more similar the products.

This is how content-based filtering works. We can employ more complex NLP techniques like TF-IDF, Word2vec and GloVe. We can also weight the features based on our business requirements: instead of recommending similar products based only on the title, we can recommend them using the Title, the Brand of the product and the product price (a rough sketch follows below).
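As a rough sketch of that idea (the products, prices and weights below are made up just for illustration), we can scale each feature block before stacking, so that, for example, the title has more influence on the distance than the brand or the price:

```python
# Sketch: weight Title, Brand and Price feature blocks before computing distances.
import numpy as np
from scipy.sparse import hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

titles = ["black hoodie half sleeves", "black hoodie full sleeves", "grey crew neck t-shirt"]
brands = ["brand_a", "brand_b", "brand_a"]
prices = np.array([[29.0], [31.0], [12.0]])  # hypothetical prices

title_bow = CountVectorizer().fit_transform(titles)
brand_bow = CountVectorizer().fit_transform(brands)

# Arbitrary weights chosen for illustration: title matters most, price least.
w_title, w_brand, w_price = 2.0, 1.0, 0.1
features = hstack([w_title * title_bow, w_brand * brand_bow, w_price * prices]).tocsr()

# Distances from the first product to all products (smaller = more similar).
print(euclidean_distances(features[0], features)[0])
```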

Thanks for your valuable time.
