What is Item-Based Filtering? An Applied Example In Python
Creating a movie recommender
Hi. In this story, we will try to understand what item-based filtering is and we will see an applied example in Python.
You can access the Kaggle notebook that I created for this story from here.
Item-based filtering is a collaborative filtering technique, and you will sometimes see it described as “memory-based”. The basic idea is to recommend the items whose “liked” structure is similar to that of a given item X.
We will create a matrix like below.
In this matrix, rows represent users, columns represent items, and the intersection cells hold the liked counts. For example, User1 didn’t like Item4 and Item1 is liked 10 times. In this technique, though, we will focus on items more than on users.
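To make the layout concrete, here is a minimal sketch of such a matrix. The users, items, and values are invented for illustration (they are not the ones from the figure), and I assume a simple 1/0 "liked" flag in each cell.

```python
import pandas as pd

# Hypothetical toy data: 1 means the user liked the item, 0 means they did not.
toy_matrix = pd.DataFrame(
    {
        "Item1": [1, 0, 1, 1],
        "Item2": [1, 0, 1, 0],
        "Item3": [0, 1, 0, 1],
    },
    index=["User1", "User2", "User3", "User4"],
)

# Each column is one item's "liked" pattern across all users.
# Item-based filtering compares these columns with each other.
item1_pattern = toy_matrix["Item1"]

# Summing a column gives how many users liked that item.
liked_counts = toy_matrix.sum(axis=0)
print(liked_counts)
```

Reading the matrix column by column, rather than row by row, is what makes the technique item-based.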
We will use this dataset. I’ve already imported it into my workspace.
I’m going to import pandas and read the data. After that, I’ll merge the two datasets. As mentioned above, this filtering technique creates recommendations from the items’ liked structure, so we need the item (movie) names and the items’ ratings in one table. That is why I merged these datasets.
import pandas as pd

movie = pd.read_csv('movie_lens_dataset/movie.csv')
rating = pd.read_csv('movie_lens_dataset/rating.csv')
df = movie.merge(rating, how="left", on="movieId")
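If the left merge is not familiar, this small sketch shows what it does. The two tiny tables are made up and only mimic the columns used in this story (movieId, title, userId, rating).

```python
import pandas as pd

# Hypothetical miniature versions of movie.csv and rating.csv.
toy_movie = pd.DataFrame(
    {"movieId": [1, 2, 3], "title": ["Movie A", "Movie B", "Movie C"]}
)
toy_rating = pd.DataFrame(
    {"userId": [10, 11, 10], "movieId": [1, 1, 2], "rating": [4.0, 5.0, 3.0]}
)

# A left join keeps every movie, even one that nobody has rated.
# Movies with several ratings appear once per rating.
toy_df = toy_movie.merge(toy_rating, how="left", on="movieId")
print(toy_df)
```

"Movie C" has no ratings here, so it still appears in the result, with missing values in the userId and rating columns.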
Creating User-Item Matrix
This is the main point of item-based filtering: we need to create the matrix that the filtering technique will use.
First, I’m going to keep only the movies that have more than 1000 ratings (called comments in the code).
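# number of ratings per movie title
comment_counts = df["title"].value_counts()
# titles with 1000 or fewer ratings
rare_movies = comment_counts[comment_counts <= 1000].index
# keep only the rows of movies with more than 1000 ratings
common_movies = df[~df["title"].isin(rare_movies)]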
I don’t want the filtering technique to be affected by movies that have only a few ratings.
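Here is the same filtering idea on a toy table, as a rough sketch. The data is invented, and I use a threshold of 2 instead of 1000 so that the effect is visible on a few rows.

```python
import pandas as pd

# Hypothetical ratings: Movie A has 3, Movie B has 2, Movie C has 1.
toy_df = pd.DataFrame(
    {
        "userId": [1, 2, 3, 1, 2, 3],
        "title": ["Movie A", "Movie A", "Movie A", "Movie B", "Movie B", "Movie C"],
        "rating": [4.0, 5.0, 3.0, 2.0, 4.0, 5.0],
    }
)

min_ratings = 2  # assumed threshold for this toy example; the story uses 1000

counts = toy_df["title"].value_counts()
rare = counts[counts <= min_ratings].index   # titles with too few ratings
common = toy_df[~toy_df["title"].isin(rare)]  # rows of the remaining titles
print(common)
```

Only Movie A survives, because it is the only title with more than 2 ratings.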
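Now I can create the user-item matrix. It’s just a pivot table: it holds user ids in rows, movie titles in columns, and each user’s rating for each movie in the intersection cells. By default pivot_table averages the values that fall into the same cell, and a cell is NaN when the user has not rated the movie.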
user_movie_df = common_movies.pivot_table(index=["userId"], columns=["title"], values="rating")
user_movie_df.shape
>>> (138493, 3159)
user_movie_df.head(10)
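To see what the pivot table looks like without loading the whole dataset, here is a small sketch on invented ratings. Only the column names (userId, title, rating) come from the story.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per (user, movie) rating.
toy_ratings = pd.DataFrame(
    {
        "userId": [1, 1, 2, 2, 3],
        "title": ["Movie A", "Movie B", "Movie A", "Movie C", "Movie B"],
        "rating": [4.0, 3.0, 5.0, 2.0, 1.0],
    }
)

# Users become rows, titles become columns, ratings fill the cells.
toy_user_movie = toy_ratings.pivot_table(
    index="userId", columns="title", values="rating"
)
print(toy_user_movie)
```

User 1 never rated Movie C, so that cell is NaN. In the real matrix most cells are NaN, because each user rates only a small part of the movies.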
With that, the hardest part is done. The rest is just a few lines of code.
I’m going to choose a movie and select its rating values from our pivot table.
movie_name = "Matrix, The (1999)"
# getting the ratings of the chosen movie
movie_name = user_movie_df[movie_name]
movie_name
As we said above, this filtering technique compares the selected item’s (movie’s) liked (rating) structure with that of the other items (movies). Therefore, I need to calculate the correlations between movie_name and the other movies.
The movie_name variable holds the selected movie’s liked structure, so I can calculate the correlations by using this variable.
movie_name = "Matrix, The (1999)"
movie_name = user_movie_df[movie_name]
user_movie_df.corrwith(movie_name).sort_values(ascending=False).head(10)
Yes, we completed item-based filtering. We see the recommended movies for “Matrix, The (1999)” above.
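The correlation step can also be wrapped in a small function. The sketch below is my own illustration on invented ratings, and recommend_similar is a hypothetical name, not one of the functions in the Kaggle notebook. It also drops the selected movie itself, which otherwise comes first with a correlation of 1.

```python
import pandas as pd

# Hypothetical user-item matrix: rows are users, columns are movies.
toy_user_movie = pd.DataFrame(
    {
        "Movie A": [5.0, 4.0, 1.0, 2.0],
        "Movie B": [4.0, 5.0, 2.0, 1.0],
        "Movie C": [1.0, 2.0, 5.0, 4.0],
    },
    index=pd.Index([1, 2, 3, 4], name="userId"),
)


def recommend_similar(user_movie, title, top_n=10):
    """Hypothetical helper: rank the other movies by correlation with `title`."""
    target = user_movie[title]
    # Pearson correlation of every column with the selected movie's ratings.
    scores = user_movie.corrwith(target)
    # Remove the selected movie itself, then keep the highest correlations.
    return scores.drop(title).sort_values(ascending=False).head(top_n)


result = recommend_similar(toy_user_movie, "Movie A", top_n=2)
print(result)
```

In this toy data, Movie B is rated in a similar way to Movie A, so it comes first, while Movie C is rated in the opposite way and gets a negative correlation.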
If you go to the Kaggle notebook, you can see some helpful functions for recommending movies.
I hope you enjoyed this. To read about other recommendation techniques, you can visit my profile.