A Model-Based Approach to Build a Recommendation Engine

Team AV
Published in Analytics Vidhya
7 min read · Jul 25, 2019

This is a beginner’s guide to IBM’s Recommendation Engine challenge, written by IBM’s Rajesh K Jeyapaul.

Recommendation engines are ubiquitous right now. Businesses around the world are creating entire strategies around these systems to improve and enhance the customer experience. It’s proven to be a winning move for organizations.

Knowing how a recommendation engine works and building one is a must-have skill for a data scientist.

Machine learning is an optimization problem, and so is building a recommendation engine. A recommendation engine can take a memory-based (non-model) route, such as user-based and item-based similarity, but the collaboration between users and items can also be captured with a model-based approach.

This might be a well-known fact, since there is a lot of content available on the topic. In this article, I would like to introduce the basic optimization that happens in model-based Collaborative Filtering, especially the Matrix Factorization method, and help you understand how a particular dataset can fit into this approach.

To start with, we will understand why machine learning is an optimization problem.

Machine Learning — an optimization problem

Given the dataset below, can you find the coefficient values B0 and B1 for the given equation?

y=f(x)
Weight = B0 +B1*Height

You are allowed to make assumptions and are encouraged to start the journey with a random value. Take 5 seconds and proceed further.

Did you get the values for B0 and B1?

Well, I hear someone saying B0 = 0 and B1 = 5. Appreciate your start! As I mentioned, you are allowed to make assumptions (in this case, B0 = 0) and to start the journey with any random value (in this case, B1 = 5). Shall we try fitting this into the given equation?

Let’s take height=180

Weight = 0 + 5 * 180 = 900

Well, the corresponding weight is 86, not 900. So the next step is to optimize the B1 value. We could try 4 or 2 or any other random value. Let’s take B1 = 0.5 and fit it into the equation again.

Weight = 0 + 0.5 *180 = 90

Aha, now we are close to the actual weight of 86, and the error has come down to 4 (90 − 86).

This is the optimization journey in machine learning. Finding the optimal coefficient values, called weights, for the given equation (in this case, a linear equation) results in better model performance.

Wait, your journey is not over yet. Can you try fitting your coefficient value B1 = 0.5 to the dataset below?

Weight = 0 + 0.5 * 3800 = 1900, which is nowhere near the actual weight of 200. What happened?

Well, this is the behavior you get with a weird dataset like this one. It shows that data preprocessing plays a key role in making the model more effective. In this case, we can treat the above data point as an outlier and discard it.
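As a small illustration, such a point can be screened out before fitting. Here is a minimal sketch using a median-based (MAD) outlier test; the data values are made up to mirror the example in the text:

```python
import numpy as np

# Hypothetical height/weight sample; 3800 is the weird data point from the text
heights = np.array([150.0, 160.0, 170.0, 180.0, 190.0, 3800.0])
weights = np.array([70.0, 75.0, 80.0, 86.0, 92.0, 200.0])

# Robust outlier test: modified z-score based on the median absolute deviation
med = np.median(heights)
mad = np.median(np.abs(heights - med))
modified_z = 0.6745 * np.abs(heights - med) / mad
keep = modified_z < 3.5                     # a common cutoff for this statistic

clean_heights, clean_weights = heights[keep], weights[keep]
print(clean_heights)                        # the 3800 row is discarded
```

A median-based test is used here instead of a plain mean/standard-deviation z-score because a single extreme value like 3800 distorts the mean and inflates the standard deviation, letting the outlier hide itself.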

In general, real-world data behaves this way, and the resulting optimization problem is considered non-convex in nature.

You can further proceed to fine-tune the B1=0.5 value.

This journey of iteratively moving toward the optimal value is called “Stochastic Gradient Descent” (SGD).
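The guess-and-check loop above can be automated. The following is a minimal SGD sketch for the Weight = B0 + B1*Height model, with B0 fixed at 0 and a made-up toy dataset; the learning rate and epoch count are illustrative choices, not tuned values:

```python
import numpy as np

# Toy height/weight data (outlier removed); start at B1 = 5, as in the text
heights = np.array([150.0, 160.0, 170.0, 180.0, 190.0])
weights = np.array([70.0, 75.0, 80.0, 86.0, 92.0])

b1 = 5.0            # random starting point
lr = 1e-5           # learning rate (step size)
for epoch in range(100):
    for x, y in zip(heights, weights):   # one point at a time -> "stochastic"
        pred = b1 * x                    # B0 assumed to be 0
        grad = 2 * (pred - y) * x        # derivative of (pred - y)^2 w.r.t. B1
        b1 -= lr * grad                  # step downhill against the gradient

print(round(b1, 3))  # settles near the least-squares slope (roughly 0.47)
```

Each update nudges B1 in the direction that shrinks the squared error for one data point, which is exactly the manual “try 5, then 0.5, then fine-tune” process from the example, done systematically.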

Having understood machine learning as an optimization problem, we will turn our focus to a similar optimization in recommendation systems.

Role of Collaborative Filter in Recommendation Systems

Collaborative Filtering deals with the past behavior of the user-item relationship: explicit feedback such as star ratings, comments, and thumbs up / down preferences, and implicit feedback such as purchase history, mouse movements, etc.

If the data is complete and has a visible pattern, then it is easy for us to predict and recommend. For example, take the following scenario:

If I ask you to predict the rating given for Movie 4 (M4), it is easy:

But take the following scenario:

This could be relatively easy as well, since it has a pattern to it:

Hence, we could consider the rating as:

But we know that our data is messy and behaves in the following manner:

It is nothing but a sparse matrix, and that is why we come back to optimization again: the non-convex nature of this data has to be handled to solve the problem.
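To make the sparsity concrete, here is a toy user-item matrix where NaN marks the ratings a user never gave; the numbers are invented for illustration:

```python
import numpy as np

# A toy user-item rating matrix; NaN marks movies a user never rated
ratings = np.array([
    [5.0,    3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0,    1.0, np.nan, 5.0],
    [np.nan, 1.0,    5.0, 4.0],
])

observed = ~np.isnan(ratings)            # mask of the known entries
sparsity = 1 - observed.mean()           # fraction of missing ratings
print(f"{sparsity:.0%} of the matrix is empty")
```

Real catalogs have thousands of items per user, so the empty fraction is usually far higher than in this toy example, which is what makes direct pattern-matching impractical.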

By the way, this sparsity is the result of our own behavior when making online purchases. Next time, to avoid this issue, try giving a rating for every product you buy (just kidding!).

Well, the answer to this issue is “Matrix Factorization”. Collaborative Filtering offers both Matrix Factorization and Vector Decomposition approaches to this problem.

Let's discuss the Matrix Factorization part. What is Matrix Factorization?

We will go in the reverse direction to understand this: first, how to factorize a complete Rank Matrix, and then how optimization lets us reach the target Rank.

Take the following Rank Matrix, User-Item Rating:

To factorize this, we need to create two feature-based “factors” common to users and items. Can you guess what features could be considered for a movie Rank Matrix? Probably Comedy and Action?

Here we create the user-based ranking for those two features:

Here we get the movie-based ranking for those two features. Let's see the complete picture of Matrix Factorization.

No surprise to see that the Rank Matrix has been rebuilt from those two feature-based matrices through a cross product.

User 1 (U1) rating for Movie 1 (M1) = 1*3 + 0*1 = 3
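The cross product above can be checked in a few lines. The factor values below are hypothetical, chosen so that U1’s comedy-only profile times M1’s feature column reproduces the rating of 3:

```python
import numpy as np

# Hypothetical user factors: each row = one user's affinity for [Comedy, Action]
U = np.array([
    [1.0, 0.0],   # U1 likes comedy only
    [0.0, 1.0],   # U2 likes action only
    [1.0, 1.0],   # U3 likes both
])

# Hypothetical item factors: each column = one movie's [Comedy, Action] weights
M = np.array([
    [3.0, 0.0, 2.0],
    [1.0, 4.0, 2.0],
])

# Rebuilding the Rank Matrix is just the matrix product of the two factors
R = U @ M
print(R[0, 0])   # U1's rating for M1: 1*3 + 0*1 = 3.0
```

Every cell of R is a dot product of one user's feature row with one movie's feature column, which is exactly the U1/M1 calculation in the text.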

Now, how to derive those feature-based matrices? We have to go back to our ML optimization technique:

  • Start somewhere by taking a random value
  • Take assumptions

Let's see the scenario below, where we have started with a random value:

Here we have taken a random value, tried to reach the target, and will further optimize it based on the error.

In this case, the User 1 (U1) rating for Movie 1 (M1) is 3, which is our target, but the current predicted value is 1.92.

Optimization helps us reduce the error as we move toward the optimal value. In this way, a sparse matrix can be used to build an optimal Rank matrix.

Also, during factorization, we can keep one factor matrix fixed, say the user factors, and update only the item factors. The process is then reversed, giving us an alternating optimization approach called Alternating Least Squares (ALS).
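Here is a minimal ALS sketch, assuming the common convention that 0 marks an unrated entry and using a small made-up matrix; each half-step holds one factor fixed and solves an ordinary regularized least-squares problem for the other:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy rating matrix with 0 marking "not rated" (a common ALS convention)
R = np.array([
    [5.0, 3.0, 0.0, 1.0],
    [4.0, 0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0, 5.0],
    [0.0, 1.0, 5.0, 4.0],
])
mask = R > 0
k, lam = 2, 0.1                       # latent features, regularization strength
U = rng.random((R.shape[0], k))       # random starting user factors
V = rng.random((R.shape[1], k))       # random starting item factors

for _ in range(20):
    # Hold item factors fixed; solve a ridge regression for each user row ...
    for i in range(R.shape[0]):
        Vi = V[mask[i]]
        U[i] = np.linalg.solve(Vi.T @ Vi + lam * np.eye(k), Vi.T @ R[i, mask[i]])
    # ... then hold user factors fixed and solve for each item column
    for j in range(R.shape[1]):
        Uj = U[mask[:, j]]
        V[j] = np.linalg.solve(Uj.T @ Uj + lam * np.eye(k), Uj.T @ R[mask[:, j], j])

pred = U @ V.T
err = np.abs(pred - R)[mask].mean()   # average error on the observed cells only
print(round(err, 3))
```

The appeal of ALS is that each half-step is a closed-form linear solve, so the alternation converges quickly without tuning a learning rate the way SGD requires.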

The journey does not stop here. Given a dataset like the one below, where there are no rankings, what would be the starting point? How do we build a Rank matrix?

Can we create a Rank based on the frequency of items purchased (quantity)? Or based on popularity? Or start with a dummy matrix?

One approach to such a problem is to create a Rank matrix and then proceed with either model-based CF or memory-based CF to predict the top “n” items for each user.

The details of creating a Rank for the items purchased by each user are provided in the link below. This will help you create a baseline model and enable you to submit your code for the ongoing practice problem.


This is the Official Handle of Analytics Vidhya. In case of queries, reach out to us at editor@analyticsvidhya.com