Embracing the Future: Building a Simple Recommender System with Python

Mubariz Khan
3 min read · Mar 29, 2024

In the era of information overload, recommender systems have emerged as essential tools for filtering out the noise, providing users with personalized content, products, or services. Whether it’s suggesting new movies on Netflix or products on Amazon, recommender systems enhance user experience and engagement. This tutorial will guide you through creating a basic recommender system using Python. By the end, you’ll have a foundational understanding of how these systems work and a springboard into more complex algorithms and applications.

Prerequisites:

  • Basic understanding of Python programming
  • Familiarity with pandas and NumPy libraries
  • An environment to run Python code (Jupyter Notebook recommended)

Setting Up Your Environment:

First, ensure you have Python installed on your system. Then, install the necessary libraries using pip:

pip install numpy pandas scikit-learn

The Dataset:

For this tutorial, we’ll use a simplified movie rating dataset. It contains users, movies, and their ratings. You can create this dataset yourself or use an existing one like the MovieLens dataset for more complexity.
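
If you go the MovieLens route, the ratings file uses different column names from the ones in this tutorial. Here is a minimal sketch, assuming a ratings CSV with `userId`, `movieId`, `rating`, and `timestamp` columns (the layout of MovieLens' `ratings.csv`). The sample rows are made up for illustration, and `ml_df` is just an illustrative name; in practice you would pass the path of your downloaded file to `pd.read_csv`:

```python
import io
import pandas as pd

# Made-up rows in the MovieLens ratings.csv layout (userId, movieId, rating, timestamp).
# In practice, replace the StringIO object with the path to your downloaded file.
sample_csv = io.StringIO(
    "userId,movieId,rating,timestamp\n"
    "1,10,4.0,964982703\n"
    "1,20,3.5,964981247\n"
    "2,10,5.0,964982224\n"
)

ratings = pd.read_csv(sample_csv)

# Rename to the column names used in this tutorial and drop the unused timestamp
ml_df = (
    ratings
    .rename(columns={"userId": "User", "movieId": "Movie", "rating": "Rating"})
    .drop(columns=["timestamp"])
)
print(ml_df)
```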

Step 1: Data Preparation

Load your dataset using pandas:

import pandas as pd

# Example dataset structure
data = {
    'User': ['Alice', 'Bob', 'Cindy', 'Dan', 'Alice', 'Bob', 'Cindy'],
    'Movie': ['Matrix', 'Matrix', 'Matrix', 'Matrix', 'Inception', 'Inception', 'Inception'],
    'Rating': [5, 6, 7, 8, 9, 10, 6]
}

df = pd.DataFrame(data)
print(df)

Step 2: Understanding the Data

Before diving into recommendations, let’s understand our data better:

# Print unique users and movies
print("Unique Users:", df['User'].unique())
print("Unique Movies:", df['Movie'].unique())

# Calculate average movie rating
average_ratings = df.groupby('Movie')['Rating'].mean().reset_index()
print(average_ratings)
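
An average on its own can mislead when a movie has only a handful of ratings. As a small sketch, the same groupby can report the number of ratings next to the mean. The `movie_stats` name and the `avg_rating` / `num_ratings` column names are illustrative choices, not part of the tutorial's code:

```python
import pandas as pd

# Same toy dataset as in Step 1
df = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Cindy', 'Dan', 'Alice', 'Bob', 'Cindy'],
    'Movie': ['Matrix', 'Matrix', 'Matrix', 'Matrix', 'Inception', 'Inception', 'Inception'],
    'Rating': [5, 6, 7, 8, 9, 10, 6],
})

# Named aggregation: one output column per statistic
movie_stats = (
    df.groupby('Movie')['Rating']
      .agg(avg_rating='mean', num_ratings='count')
      .reset_index()
)
print(movie_stats)
```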

Step 3: Building a Simple Recommender

We’ll start with a very basic recommender that suggests movies based on their average ratings:

def recommend_movies(user, num_recommendations=3):
    # Exclude movies the user has already rated
    user_seen_movies = df[df['User'] == user]['Movie']
    unseen_movies = average_ratings[~average_ratings['Movie'].isin(user_seen_movies)]

    # Recommend the top N unseen movies by average rating
    recommendations = unseen_movies.sort_values(by='Rating', ascending=False).head(num_recommendations)
    return recommendations

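# Alice has rated every movie in this tiny dataset, so her result would be empty.
# Dan has not rated Inception yet, so we ask for his recommendations instead.
print(recommend_movies('Dan'))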

Step 4: Enhancing the Recommender with Collaborative Filtering

While our simple recommender works, it doesn’t personalize: every user gets the same ranking, minus the movies they have already rated. Let’s introduce a basic form of user-based collaborative filtering, which recommends movies that similar users rated highly:

from sklearn.metrics.pairwise import cosine_similarity

# Create a user-item matrix
user_item_matrix = df.pivot_table(index='User', columns='Movie', values='Rating')

# Compute cosine similarity between users (unrated movies are treated as 0)
user_similarity = pd.DataFrame(
    cosine_similarity(user_item_matrix.fillna(0)),
    index=user_item_matrix.index,
    columns=user_item_matrix.index,
)

def collaborative_recommend(user, num_recommendations=3):
    # Find similar users, most similar first, excluding the user themselves
    similar_users = user_similarity[user].drop(user).sort_values(ascending=False)

    # Movies the target user has already rated
    rated_movies = user_item_matrix.loc[user].dropna().index

    # Sum similarity-weighted ratings from the other users
    recommendations = pd.Series(dtype=float)
    for similar_user, score in similar_users.items():  # .iteritems() was removed in pandas 2.0
        similar_user_ratings = user_item_matrix.loc[similar_user].dropna()
        for movie, rating in similar_user_ratings.items():
            if movie not in rated_movies:  # user hasn't rated the movie
                if movie not in recommendations:
                    recommendations[movie] = rating * score
                else:
                    recommendations[movie] += rating * score

    return recommendations.sort_values(ascending=False).head(num_recommendations)

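# Alice has rated both movies, so there is nothing left to recommend to her.
# Dan has not rated Inception, so he gets a similarity-weighted score for it.
print(collaborative_recommend('Dan'))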
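
To make the similarity scores less of a black box: cosine similarity is the dot product of two users' rating vectors divided by the product of their lengths. The sketch below computes it by hand with NumPy for Alice and Dan from the toy dataset and checks the result against scikit-learn. The helper name `cosine` is only for illustration, and missing ratings are filled with 0, as in the similarity code above:

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same toy dataset as in Step 1
df = pd.DataFrame({
    'User': ['Alice', 'Bob', 'Cindy', 'Dan', 'Alice', 'Bob', 'Cindy'],
    'Movie': ['Matrix', 'Matrix', 'Matrix', 'Matrix', 'Inception', 'Inception', 'Inception'],
    'Rating': [5, 6, 7, 8, 9, 10, 6],
})

# Rows are users, columns are movies; unrated movies become 0
matrix = df.pivot_table(index='User', columns='Movie', values='Rating').fillna(0)

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

alice = matrix.loc['Alice'].to_numpy()
dan = matrix.loc['Dan'].to_numpy()

by_hand = cosine(alice, dan)
by_sklearn = cosine_similarity(matrix.loc[['Alice', 'Dan']].to_numpy())[0, 1]

print(f"by hand:      {by_hand:.4f}")
print(f"scikit-learn: {by_sklearn:.4f}")
```

Note that filling Dan's missing Inception rating with 0 changes his vector and therefore his score, which is one reason to experiment with other similarity measures.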

Conclusion:

Congratulations! You’ve just built your first recommender system. While simple, this tutorial lays the groundwork for exploring more sophisticated algorithms like Matrix Factorization and Deep Learning-based recommenders. The field of recommender systems is vast and continually evolving, offering endless opportunities for learning and innovation.

Next Steps:

  • Explore different similarity metrics (e.g., Pearson correlation).
  • Implement a Matrix Factorization technique.
  • Dive into more advanced models like neural networks for recommendation.
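
As a starting point for the first item, here is a hedged sketch of Pearson-based user similarity using pandas' built-in `DataFrame.corr`, which skips pairs where either rating is missing. The slightly larger toy dataset below is made up for illustration, because the tutorial's two-movie dataset has too few overlapping ratings for a correlation to mean much:

```python
import pandas as pd

# Made-up ratings (not from the tutorial) with enough overlap to correlate users
data = {
    'User': ['Alice'] * 4 + ['Bob'] * 4 + ['Cindy'] * 4,
    'Movie': ['Matrix', 'Inception', 'Up', 'Alien'] * 3,
    'Rating': [5, 9, 7, 3,
               6, 10, 8, 4,
               9, 4, 6, 10],
}
df = pd.DataFrame(data)
user_item = df.pivot_table(index='User', columns='Movie', values='Rating')

# DataFrame.corr correlates columns, so transpose to put users in the columns.
# min_periods=2 requires at least two co-rated movies per user pair.
pearson_similarity = user_item.T.corr(method='pearson', min_periods=2)
print(pearson_similarity.round(3))
```

Unlike cosine similarity on zero-filled vectors, Pearson correlation centers each user's ratings, so a user who rates everything one point higher (Bob versus Alice here) still counts as a perfect match.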

Remember, the key to mastering recommender systems — or any technology — is continuous experimentation and learning. Happy coding!

Mubariz Khan

Perpetual Student, An AI/ML engineer/ Data Scientist, Grad Student @ Illinois Tech.