Week 4 — MOOC Recommendation

Muhammet Ali Şentürk
AIN311 Fall 2022 Projects
5 min read · Dec 18, 2022

What is up ladies and gentlemen, how is it going? Welcome back to our blog :)

This week, we have come a long way in this project. It was a very dense week: we considered several different approaches to solving the problem. And at the end of the article, we have a surprise for you :) Without further ado, let’s start.

By the way, if you want to remember what we have done so far, check these two links (Week 2 — MOOC Recommendation, Week 3 — MOOC Recommendation).

— Course Recommendation Using kNN —

Last week, we did some preprocessing, such as filtering out undesired users and building the utility matrix. Lastly, we got our program to recommend some courses. But none of that was machine learning so far. With this week’s progress, we have finally entered the field. The first approach is, as you may guess from the title, using the kNN algorithm.

Everyone who is interested in machine learning knows kNN, right? It is simple to understand and to implement: it basically memorizes the training data and makes a prediction for each test sample by considering the distances between that sample and the training samples. Therefore it is fast on the training set but slow on the test set. Our approach is to turn the utility matrix into a real matrix. Although we call it that, it is not a real matrix yet; at this stage, it is a pivot table.

A little part of the utility matrix
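For reference, here is a minimal sketch of how such a pivot table can be built with pandas. The column names and toy ratings are our assumptions for illustration, not our actual dataset:

```python
import pandas as pd

# Toy ratings; the column names (user_id, course_id, rating) are
# assumptions for illustration only.
ratings = pd.DataFrame({
    "user_id":   [1, 1, 2, 3],
    "course_id": ["nodejs", "react", "nodejs", "aws"],
    "rating":    [5.0, 4.0, 4.5, 5.0],
})

# One row per course, one column per user; unrated pairs become 0.
utility = ratings.pivot_table(
    index="course_id",
    columns="user_id",
    values="rating",
    fill_value=0,
)
print(utility)
```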

After we convert it into a real matrix, it is ready to be fitted with kNN. While deploying kNN, we set the metric argument to “cosine”, so the model calculates the similarity between rating vectors. A sketch of this step is below; then let’s look at some recommendations for a randomly chosen course.
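Here is a minimal sketch of the fitting step, assuming a small course-by-user matrix; the toy values and names are made up:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.neighbors import NearestNeighbors

# Toy course-by-user rating matrix (3 courses x 4 users); in the real
# project this comes from the utility matrix above.
course_names = ["nodejs", "react", "aws"]
ratings = np.array([
    [5.0, 0.0, 4.5, 0.0],
    [4.0, 0.0, 0.0, 3.5],
    [0.0, 5.0, 4.0, 0.0],
])
course_matrix = csr_matrix(ratings)

# metric="cosine" makes kNN compare rating vectors by their angle,
# which ignores differences in overall rating magnitude.
knn = NearestNeighbors(metric="cosine", algorithm="brute")
knn.fit(course_matrix)

# Nearest neighbors of the first course; a smaller distance means a
# more similar course.
distances, indices = knn.kneighbors(course_matrix[0], n_neighbors=3)
for dist, idx in zip(distances.flatten(), indices.flatten()):
    print(course_names[idx], round(float(dist), 3))
```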

Course recommendations with kNN

The sample user has taken the Node.js course. So what do we expect to be recommended? A web development course is highly likely, since Node.js is mainly used for back-end development. Our algorithm recommended exactly that type of course, and one taught by the same instructor, which is another detail we need to evaluate. Looks good.

— Matrix Factorization —

Another idea is matrix factorization. According to Wikipedia: ”Matrix factorization is a class of collaborative filtering algorithms used in recommender systems. Matrix factorization algorithms work by decomposing the user-item interaction matrix into the product of two lower dimensionality rectangular matrices”. Matrix factorization methods usually perform well because they discover the hidden relations between users and courses. We used a singular value decomposition (SVD) model for matrix factorization. Since there are 12 significant variables in total for both users and courses, we set the number of components to 12. After fitting the model, we calculated the correlation coefficient for every course pair. This step aims to find the highest coefficient scores, which correspond to the top recommendations. A sketch of the procedure is below.
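Here is a rough sketch of this procedure with random stand-in data; the matrix shape and variable names are our assumptions:

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

# Random stand-in for a course-by-user rating matrix
# (50 courses x 200 users); real data replaces this.
rng = np.random.default_rng(42)
rating_matrix = rng.integers(0, 6, size=(50, 200)).astype(float)

# Compress each course into 12 latent features.
svd = TruncatedSVD(n_components=12, random_state=42)
latent = svd.fit_transform(rating_matrix)  # shape: (50, 12)

# Correlation coefficient between every pair of courses, computed in
# the latent space; np.corrcoef treats each row as one course.
corr = np.corrcoef(latent)  # shape: (50, 50)

# Top 5 recommendations for course 0: highest correlations,
# skipping the course itself (which correlates 1.0 with itself).
top = np.argsort(corr[0])[::-1][1:6]
print("Most similar courses to course 0:", top)
```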

Course recommendation with SVD

Again, we use the same course to compare with kNN. When we look at the top recommendation, there is a noticeable difference: it seems like this algorithm found the hidden relationship between Node.js and AWS. We say hidden because the kNN algorithm was not able to make such a recommendation. That is the difference. Maybe it could have, but as we said, we are talking about the top recommendation, meaning the most related course is recommended first. SVD finds the AWS course more related than the React course.

— 🥳🎉Surprise🎉 🥳—

As we promised, our surprise is ready :) So, are you ready? We can hear you saying yes. Then, here it is!

scikit-surprise
  • The name SurPRISE (roughly :)) stands for Simple Python RecommendatIon System Engine.

But… “This was not an expected result 😕” Cheer up, this is a new approach we can consider for solving the problem.

The last approach is using the scikit-surprise module. This toolkit allows us to try different collaborative filtering methods and compare them with each other. It also has a built-in cross-validation method to select the best options. We decided to give it a try since it makes building recommendation systems practical. Here are the RMSE and MAE values for the SVD and kNN algorithms, tested over a set consisting of more than two hundred thousand ratings. A sketch of the comparison code is below.
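A minimal sketch of how such a comparison can be set up with scikit-surprise; the DataFrame columns and rating scale are assumptions about our data:

```python
import pandas as pd
from surprise import Dataset, KNNBasic, Reader, SVD
from surprise.model_selection import cross_validate

# Toy ratings; in the real experiment this DataFrame holds 200k+ rows.
df = pd.DataFrame({
    "user_id":   [1, 1, 2, 2, 3, 3, 4, 4],
    "course_id": ["nodejs", "react", "nodejs", "aws",
                  "react", "aws", "nodejs", "react"],
    "rating":    [5.0, 4.0, 4.5, 5.0, 3.5, 4.0, 4.5, 3.0],
})
reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(df[["user_id", "course_id", "rating"]], reader)

# Built-in cross-validation reports RMSE and MAE per fold for each
# algorithm; item-based kNN mirrors our earlier course-to-course setup.
for algo in (SVD(), KNNBasic(sim_options={"user_based": False})):
    cross_validate(algo, data, measures=["RMSE", "MAE"], cv=3, verbose=True)
```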

surprise for kNN
surprise for SVD

This module is a surprise for us as well. We are still trying to understand it and how to utilize it better in the upcoming weeks. Also, from now on, we will start building some different applications that will give our project its uniqueness.

So yeah, that is all for this week. Until we meet again, stay awesome, take care, and bye.


References

https://en.wikipedia.org/wiki/Matrix_factorization_(recommender_system)

https://surpriselib.com/
