Recommender System using Collaborative Filtering in PySpark

An introduction to Collaborative Filtering and its implementation in PySpark using the Alternating Least Squares (ALS) algorithm

Angel Das
Geek Culture


Photo by Glenn Carstens-Peters on Unsplash

Introduction

“A recommender system makes a prediction based on a user’s historical behavior.” Have you ever noticed how Netflix recommends action thrillers once you have binge-watched the Fast & Furious saga, or how Spotify suggests romantic music once you start listening to Ed Sheeran? This comes down to a recommender system’s ability to predict a set of items you are likely to want based on your past preferences.

Two common approaches are content-based and collaborative filtering-based recommender systems. A content-based algorithm uses the characteristics of an item to make a recommendation, rather than relying on your interactions or on feedback from other users. Say you watched Harry Potter and the Goblet of Fire, released in 2005; a content-based recommender keyed on release year would then suggest another movie released in 2005. Content-based algorithms come with specific challenges, e.g.,

“You wanted to purchase a watch from Amazon and ended up buying a Fast Track watch. Would you be interested in…
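The content-based idea described above (recommending items that share a characteristic with something you already watched) can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the `Movie` class, the `CATALOG` list, and the `recommend_same_year` function are hypothetical names invented for this example, and release year is used as the single item feature simply because it matches the Harry Potter example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Movie:
    title: str
    year: int


# Hypothetical toy catalog, used only for illustration.
CATALOG = [
    Movie("Harry Potter and the Goblet of Fire", 2005),
    Movie("Batman Begins", 2005),
    Movie("King Kong", 2005),
    Movie("Inception", 2010),
]


def recommend_same_year(watched_title, catalog):
    """Return titles that share the release year of the watched movie.

    Only item attributes are used; no other user's behavior is consulted,
    which is what makes this a content-based recommendation.
    """
    # Look up the watched movie; raises StopIteration if it is not in the catalog.
    watched = next(m for m in catalog if m.title == watched_title)
    return [
        m.title
        for m in catalog
        if m.year == watched.year and m.title != watched.title
    ]


recommendations = recommend_same_year(
    "Harry Potter and the Goblet of Fire", CATALOG
)
print(recommendations)
```

A realistic content-based system would compare many item features (genre, cast, description text) rather than one, but the structure is the same: similarity is computed between items, not between users.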


Angel Das

Data Science Consultant at IQVIA ANZ || Former Data Science Analyst at Novartis AU, Decision Scientist with Mu Sigma || Ex Teaching Associate Monash University