If You Can’t Measure It, You Can’t Improve It!
How to Build a Scalable Recommender System for E-commerce with LightFM in Python
Online shopping has changed quite a lot in the past few years. Online stores like Amazon treat their customers on a more personal level: they understand your interest in certain products based on your online shopping activity (viewing items, adding items to your cart, and eventually making a purchase). For instance, you go on Amazon, search for an item, and click on some of the search results. The next time you visit Amazon, there is a specific section recommending similar products based on what you searched for last time. The story does not end here: as you interact more with the online store, you receive more personalized recommendations, including “Customers who bought this item also bought”, which shows you a list of items that are frequently bought together. Some stores also send promotional emails offering products to the customers who are most likely to purchase them.
Recommender Systems are one of the most widely used applications of Machine Learning. Since the goal here is to focus on how to build a Recommender System using the LightFM package and to provide clear metrics to measure model performance, I will only briefly mention the different types of Recommender Systems. For more details about Recommender Systems, I suggest watching this short video by Siraj Raval and checking this article by Chhavi Aluja. There are three types of Recommender Systems:
- Content-Based
- Collaborative Filtering (Item-Based, User-Based, and Model-Based)
- Hybrid Methods (adding Content-Based to Collaborative Filtering)
My initial goal was to build a Hybrid model, since it incorporates content-based information into collaborative filtering and is able to tackle the cold-start problem of pure collaborative filtering recommender systems. However, there are not many good publicly available datasets that contain both metadata for items or users and the rating interactions. So I decided to first work on a collaborative filtering model and understand the different aspects of Recommender Systems; in my next article, I am going to build a Hybrid model.
Approach:
Why LightFM?
While researching Recommender Systems, I came across many great projects; however, one thing that was often missing was a clear metric to evaluate the performance of the model. I believe that if you cannot evaluate the performance of your model with clear metrics, it is hard to convince your readers that the model (Recommender System) works well enough. Therefore, I chose LightFM because it provides clear metrics like the AUC score and Precision@K that help evaluate the performance of the trained model. This is very useful when working towards better models and higher accuracy.
Depending on the use case or the type of problem we are solving, choosing between Precision@K and the AUC score can be tricky. We are going to use the AUC score in this case, since it measures the quality of the overall ranking and can be interpreted as the probability that a randomly chosen positive item is ranked higher than a randomly chosen negative item.
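As a quick illustration of how both metrics are obtained from lightfm.evaluation, here is a minimal, self-contained sketch on toy data; the real train and test matrices are built later in this article, and the hyperparameters here are only illustrative:

import numpy as np
from scipy.sparse import coo_matrix
from lightfm import LightFM
from lightfm.evaluation import auc_score, precision_at_k

# Toy interaction matrices (100 users x 50 items) just to show the metric calls;
# replace them with the real train/test matrices built later in the article.
rng = np.random.RandomState(42)
def toy_interactions():
    rows = rng.randint(0, 100, size=500)
    cols = rng.randint(0, 50, size=500)
    return coo_matrix((np.ones(500), (rows, cols)), shape=(100, 50))
train_toy, test_toy = toy_interactions(), toy_interactions()

model = LightFM(loss='warp')
model.fit(train_toy, epochs=10, num_threads=2)

# Precision@K: the fraction of the top-K recommended items that are known positives.
print("Precision@10:", precision_at_k(model, test_toy, k=10).mean())
# AUC: the probability that a random positive item outranks a random negative one.
print("AUC:", auc_score(model, test_toy).mean())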
“LightFM is a Python implementation of a number of popular recommendation algorithms for both implicit and explicit feedback, including efficient implementation of BPR and WARP ranking losses. It’s easy to use, fast (via multithreaded model estimation), and produces high quality results.” — LightFM documentation
Data
For this project, we are going to implement a pure collaborative filtering model based on the Matrix Factorization approach, using the Book-Crossing dataset. We will take advantage of the excellent data cleaning and preprocessing done for this dataset by Chhavi Aluja in her post on towardsdatascience.com. Let’s have a look at the data:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
from sklearn import preprocessing
from lightfm import LightFM
from scipy.sparse import csr_matrix
from scipy.sparse import coo_matrix
from sklearn.metrics import roc_auc_score
import time
from lightfm.evaluation import auc_score
import pickle
import re
import seaborn as sns

books = pd.read_csv('BX-Books.csv', sep=';', error_bad_lines=False, encoding="latin-1")
books.columns = ['ISBN', 'bookTitle', 'bookAuthor', 'yearOfPublication', 'publisher', 'imageUrlS', 'imageUrlM', 'imageUrlL']
users = pd.read_csv('BX-Users.csv', sep=';', error_bad_lines=False, encoding="latin-1")
users.columns = ['userID', 'Location', 'Age']
ratings = pd.read_csv('BX-Book-Ratings.csv', sep=';', error_bad_lines=False, encoding="latin-1")
ratings.columns = ['userID', 'ISBN', 'bookRating']
books.drop(['imageUrlS', 'imageUrlM', 'imageUrlL'],axis=1,inplace=True)
books.loc[books.ISBN == '0789466953','yearOfPublication'] = 2000
books.loc[books.ISBN == '0789466953','bookAuthor'] = "James Buckley"
books.loc[books.ISBN == '0789466953','publisher'] = "DK Publishing Inc"
books.loc[books.ISBN == '0789466953','bookTitle'] = "DK Readers: Creating the X-Men, How Comic Books Come to Life (Level 4: Proficient Readers)"
books.loc[books.ISBN == '078946697X','yearOfPublication'] = 2000
books.loc[books.ISBN == '078946697X','bookAuthor'] = "Michael Teitelbaum"
books.loc[books.ISBN == '078946697X','publisher'] = "DK Publishing Inc"
books.loc[books.ISBN == '078946697X','bookTitle'] = "DK Readers: Creating the X-Men, How It All Began (Level 4: Proficient Readers)"
books.loc[books.ISBN == '2070426769','yearOfPublication'] = 2003
books.loc[books.ISBN == '2070426769','bookAuthor'] = "Jean-Marie Gustave Le Clézio"
books.loc[books.ISBN == '2070426769','publisher'] = "Gallimard"
books.loc[books.ISBN == '2070426769','bookTitle'] = "Peuple du ciel, suivi de 'Les Bergers"
books.yearOfPublication=pd.to_numeric(books.yearOfPublication, errors='coerce')
books.loc[(books.yearOfPublication > 2006) | (books.yearOfPublication == 0),'yearOfPublication'] = np.nan
books.yearOfPublication.fillna(round(books.yearOfPublication.mean()), inplace=True)
books.loc[(books.ISBN == '193169656X'),'publisher'] = 'other'
books.loc[(books.ISBN == '1931696993'),'publisher'] = 'other'
users.loc[(users.Age > 90) | (users.Age < 5), 'Age'] = np.nan
users.Age = users.Age.fillna(users.Age.mean())
users.Age = users.Age.astype(np.int32)
The above block of code is regular data cleaning to make sure that the data is in the right format before we use it as input to our model. Moreover, we need to make sure that all the rows in the ratings dataframe refer to users and books present in the users and books dataframes. Next, the ratings must only include valid rating scores (1–10), so we get rid of all rows that have zero as the rating value.
ratings_new = ratings[ratings.ISBN.isin(books.ISBN)]
ratings_new = ratings_new[ratings_new.userID.isin(users.userID)]
ratings_explicit = ratings_new[ratings_new.bookRating != 0]
Here is the distribution of the rating values:
One last thing we can do to complete our data preprocessing is to set a threshold for the number of books each user has rated and for the number of users who have rated each book. In other words, we require a minimum rating count for both users and books. Here we keep only users who have rated at least 20 books and books that have been rated by at least 20 users.
counts1 = ratings_explicit['userID'].value_counts()
ratings_explicit = ratings_explicit[ratings_explicit['userID'].isin(counts1[counts1 >= 20].index)]
# Count ratings per book (ISBN), so that only books rated by at least 20 users remain.
counts2 = ratings_explicit['ISBN'].value_counts()
ratings_explicit = ratings_explicit[ratings_explicit['ISBN'].isin(counts2[counts2 >= 20].index)]

ratings_explicit.shape
(217729, 3)
Training our model:
At this step, we are going to train our model, but this problem is a bit different from a standard supervised setup. In a collaborative filtering model, we look for latent features that characterize each item (book) based on the user-item interactions, and for the affinity of each user to each of those latent features. This is done through Matrix Factorization. First, we need to split our data (ratings_explicit) into training and testing sets, and this is where things get tricky: a pure collaborative filtering approach cannot handle the cold-start problem, so the train-test split must be done such that every user and book appearing in the testing set also has instances in the training set:
def informed_train_test(rating_df, train_ratio):
    split_cut = int(np.round(rating_df.shape[0] * train_ratio))
    train_df = rating_df.iloc[0:split_cut]
    test_df = rating_df.iloc[split_cut::]
    # Keep only test rows whose user and book also appear in the training set.
    test_df = test_df[(test_df['userID'].isin(train_df['userID'])) & (test_df['ISBN'].isin(train_df['ISBN']))]

    # --- Encode user and item IDs:
    id_cols = ['userID', 'ISBN']
    trans_cat_train = dict()
    trans_cat_test = dict()
    for k in id_cols:
        cate_enc = preprocessing.LabelEncoder()
        trans_cat_train[k] = cate_enc.fit_transform(train_df[k].values)
        trans_cat_test[k] = cate_enc.transform(test_df[k].values)

    # --- Encode ratings:
    cate_enc = preprocessing.LabelEncoder()
    ratings = dict()
    ratings['train'] = cate_enc.fit_transform(train_df.bookRating)
    ratings['test'] = cate_enc.transform(test_df.bookRating)

    n_users = len(np.unique(trans_cat_train['userID']))
    n_items = len(np.unique(trans_cat_train['ISBN']))

    # Build sparse user-item interaction matrices.
    train = coo_matrix((ratings['train'], (trans_cat_train['userID'],
                                           trans_cat_train['ISBN'])),
                       shape=(n_users, n_items))
    test = coo_matrix((ratings['test'], (trans_cat_test['userID'],
                                         trans_cat_test['ISBN'])),
                      shape=(n_users, n_items))
    return train, test, train_df
The function informed_train_test() returns the COO interaction matrices for the training and testing sets, plus the raw training dataframe for later evaluation of the model. Let’s have a look at how to fit our model and evaluate its performance:
train, test, raw_train_df = informed_train_test(ratings_explicit, 0.8)

start_time = time.time()
model = LightFM(no_components=110, learning_rate=0.027, loss='warp')
model.fit(train, epochs=12, num_threads=4)

# with open('saved_model','wb') as f:
#     saved_model = {'model': model}
#     pickle.dump(saved_model, f)

auc_train = auc_score(model, train).mean()
auc_test = auc_score(model, test).mean()

print("--- Run time: {} mins ---".format((time.time() - start_time)/60))
print("Train AUC Score: {}".format(auc_train))
print("Test AUC Score: {}".format(auc_test))

--- Run time: 4.7663776795069377 mins ---
Train AUC Score: 0.9801499843597412
Test AUC Score: 0.853681743144989
As expected, the AUC score for the training set is close to 1, and we get an AUC score of 0.853 on the testing set, which is not bad at all. The parameters for training the LightFM model were tuned using random search; grid search was too expensive to run, so I used forest_minimize() from the scikit-optimize package to tune the parameters. More details about the tuning function are on the GitHub page for this article.
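For reference, a tuning loop along these lines can be sketched as follows (a hedged sketch, not the exact function from the repo: the search ranges and the choice of tuned parameters are illustrative, and it reuses the train and test matrices built above):

from skopt import forest_minimize

def objective(params):
    # Unpack the hyperparameters being searched.
    no_components, learning_rate, epochs = params
    model = LightFM(no_components=int(no_components),
                    learning_rate=learning_rate,
                    loss='warp',
                    random_state=42)
    model.fit(train, epochs=int(epochs), num_threads=4)
    # forest_minimize minimizes its objective, so return the negative test AUC.
    return -auc_score(model, test).mean()

search_space = [(50, 150),      # no_components
                (0.001, 0.1),   # learning_rate
                (5, 30)]        # epochs

result = forest_minimize(objective, search_space, n_calls=30, random_state=42)
print("Best parameters:", result.x)
print("Best test AUC:", -result.fun)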
I mentioned earlier that pure collaborative filtering is expected to do poorly when recommending items to new users who have not had any interactions with the items (the cold-start problem). In the examples on the LightFM documentation page, a pure collaborative approach also fails to achieve satisfactory results when recommending movies to new customers on the MovieLens dataset. However, I was curious to test this myself on the Book-Crossing dataset, and surprisingly the results show that this dataset handles the cold-start setting with a relatively good AUC score. To train the model in this case, the only difference is that we split the dataset into training and testing sets randomly, which means that whether a user-item interaction in the test set has corresponding users or books in the training set is left completely to chance:
import scipy.sparse as sp

def _shuffle(uids, iids, data, random_state):
    shuffle_indices = np.arange(len(uids))
    random_state.shuffle(shuffle_indices)
    return (uids[shuffle_indices],
            iids[shuffle_indices],
            data[shuffle_indices])

def random_train_test_split(interactions_df,
                            test_percentage=0.25,
                            random_state=None):
    """
    Randomly split interactions between training and testing.

    This function takes an interaction set and splits it into
    two disjoint sets, a training set and a test set. Note that
    no effort is made to make sure that all items and users with
    interactions in the test set also have interactions in the
    training set; this may lead to a partial cold-start problem
    in the test set.

    Parameters
    ----------
    interactions_df: a dataframe containing the user-item interactions
        The interactions to split.
    test_percentage: float, optional
        The fraction of interactions to place in the test set.
    random_state: np.random.RandomState, optional
        The random state used for the shuffle.

    Returns
    -------
    (train, test): (scipy.sparse.coo_matrix, scipy.sparse.coo_matrix)
        A tuple of (train data, test data).
    """
    interactions = csr_matrix(interactions_df.values)
    if random_state is None:
        random_state = np.random.RandomState()

    interactions = interactions.tocoo()
    shape = interactions.shape
    uids, iids, data = (interactions.row,
                        interactions.col,
                        interactions.data)

    uids, iids, data = _shuffle(uids, iids, data, random_state)

    cutoff = int((1.0 - test_percentage) * len(uids))
    train_idx = slice(None, cutoff)
    test_idx = slice(cutoff, None)

    train = coo_matrix((data[train_idx],
                        (uids[train_idx],
                         iids[train_idx])),
                       shape=shape,
                       dtype=interactions.dtype)
    test = coo_matrix((data[test_idx],
                       (uids[test_idx],
                        iids[test_idx])),
                      shape=shape,
                      dtype=interactions.dtype)

    return train, test
Now let’s have a look at how the AUC score differs with the random train-test split:
train, test = random_train_test_split(ratings_matrix)

start_time = time.time()
model = LightFM(no_components=115, learning_rate=0.027, loss='warp')
model.fit(train, epochs=12, num_threads=4)

# with open('saved_model','wb') as f:
#     saved_model = {'model': model}
#     pickle.dump(saved_model, f)

auc_train = auc_score(model, train).mean()
auc_test = auc_score(model, test).mean()

print("--- Run time: {} mins ---".format((time.time() - start_time)/60))
print("Train AUC Score: {}".format(auc_train))
print("Test AUC Score: {}".format(auc_test))

--- Run time: 8.281255984306336 mins ---
Train AUC Score: 0.9871253967285156
Test AUC Score: 0.6499683856964111
With a random split, some users and books in the test set never appear in the training set (a partial cold-start), and for those cases a pure collaborative filtering model is expected to do no better than a coin flip (an AUC score around 0.5). Overall, though, we get an AUC of about 0.65, noticeably better than random, when recommending items to new users or new items to current users. I leave further analysis of this behavior to the readers of this article.
Application:
Let’s assume that the books in our dataset are the items we are selling and that the users in the dataset are our intended customers. One thing we can change about the rating values, to make the case closer to an e-commerce store, is to narrow the range of values from 1–10 down to 7–10. Customer interactions can be summarized as: A- viewing items, B- clicking on items, C- adding items to the shopping cart, and D- making a transaction to purchase items. Therefore, by reducing the rating values to the top four ratings (7, 8, 9, 10), we get a closer simulation of these item-customer interactions.
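A minimal sketch of that filter, assuming the ratings_explicit dataframe from earlier (the notebook may implement this differently):

# Keep only the strongest signals (ratings of 7-10) to mimic implicit
# view / click / add-to-cart / purchase style interactions.
ratings_top = ratings_explicit[ratings_explicit['bookRating'] >= 7]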
There are three main scenarios in which this Recommender System can be used for an e-commerce application. I am not going to include the actual code for the functions that produce the results in the next sections here in the article, but they are in the main Jupyter notebook in the GitHub repo.
- The most common scenario is a typical recommendation to a specific customer based on his/her interactions (viewing and clicking on items):
user_dikt, item_dikt = user_item_dikts(user_item_matrix, books)

similar_recommendation(model, user_item_matrix, 254, user_dikt, item_dikt, threshold=7)

Items that were liked (selected) by the User:
1- The Devil You Know
2- Harlequin Valentine
3- Shout!: The Beatles in Their Generation
4- Sandman: The Dream Hunters
5- Dream Country (Sandman, Book 3)
6- Assata: An Autobiography (Lawrence Hill & Co.)
7- The Golden Compass (His Dark Materials, Book 1)
8- The Fellowship of the Ring (The Lord of the Rings, Part 1)
9- The Hobbit: or There and Back Again
10- Harry Potter and the Sorcerer's Stone (Book 1)
11- Something Wicked This Way Comes
12- Martian Chronicles
13- Animal Farm
14- 1984
15- The Dark Half
16- Harry Potter and the Goblet of Fire (Book 4)
17- Harry Potter and the Prisoner of Azkaban (Book 3)
18- Harry Potter and the Prisoner of Azkaban (Book 3)
19- Harry Potter and the Chamber of Secrets (Book 2)
20- Harry Potter and the Chamber of Secrets (Book 2)
21- The Bonesetter's Daughter
22- The Wolves in the Walls
23- Stardust
24- Martian Chronicles
25- American Gods: A Novel
Recommended Items:
1- The Lovely Bones: A Novel
2- Harry Potter and the Order of the Phoenix (Book 5)
3- The Catcher in the Rye
4- The Da Vinci Code
5- Harry Potter and the Sorcerer's Stone (Harry Potter (Paperback))
6- Red Dragon
7- Interview with the Vampire
8- Divine Secrets of the Ya-Ya Sisterhood: A Novel
9- Sphere
10- The Pelican Brief
11- Little Altars Everywhere: A Novel
12- To Kill a Mockingbird
13- Coraline
14- The Queen of the Damned (Vampire Chronicles (Paperback))
15- The Hours: A Novel
We recommended 15 items (books) to the user with ID 254, based on similar user-item interactions.
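The repo’s similar_recommendation() is not reproduced here, but its core idea can be sketched roughly as follows (a simplified, hypothetical version: score every book for the user with model.predict(), mask out the books the user has already rated, and return the top N; the item_labels mapping from matrix index to book title is assumed):

def recommend_for_user(model, interactions, user_index, item_labels, n_items=15):
    # Score every item for this user; predict() works on matrix indices, not raw IDs.
    n_total_items = interactions.shape[1]
    scores = model.predict(user_index, np.arange(n_total_items, dtype=np.int32))
    # Mask out items the user has already interacted with.
    known_items = interactions.tocsr()[user_index].indices
    scores[known_items] = -np.inf
    top_items = np.argsort(-scores)[:n_items]
    return [item_labels[i] for i in top_items]

Called with the trained model, the interaction matrix, and the matrix index of a user, this returns the same kind of top-N list shown above.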
- The second most common scenario is when you plan to inform the customer about up-selling (selling additional items that complement the items purchased) and cross-selling (selling items in other, independent categories the customer might be interested in) options. Explicit examples are “Frequently bought together” and “Customers who viewed this item also viewed …”. For this task, I had to increase the rating-count threshold for users and books from 20 to 200 to get a smaller ratings dataframe: finding similar items requires creating item embeddings for all the items in the dataset, and this can take a huge portion of memory (RAM). I tried to run it several times with the lower threshold but kept getting memory errors, so if you have enough RAM on your machine, you can try it with lower thresholds:
item_embedings = item_emdedding_distance_matrix(model,user_item_matrix)
also_bought_recommendation(item_embedings, 'B0000T6KHI', item_dikt)

Item of interest: Three Fates (ISBN: B0000T6KHI)
Items that are frequently bought together:
1- Surrender to Love (Avon Historical Romance)
2- Landower Legacy
3- Ranch Wife
4- Sara's Song
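Under the hood, suggestions like these come from the model’s learned item embeddings. Here is a hedged sketch of how such a similarity lookup can be built with cosine similarity over model.item_embeddings (the repo’s item_emdedding_distance_matrix() may differ in its details; the item_labels index-to-title mapping is assumed):

from sklearn.metrics.pairwise import cosine_similarity

def similar_items(model, item_index, item_labels, n_items=4):
    # Each row of model.item_embeddings (shape: n_items x no_components) is the
    # latent representation of one item learned during training.
    target = model.item_embeddings[item_index].reshape(1, -1)
    # Compare the target item against every other item, one row at a time,
    # instead of materializing the full n_items x n_items distance matrix.
    similarities = cosine_similarity(target, model.item_embeddings)[0]
    # Skip the first hit: an item is always most similar to itself.
    top_items = np.argsort(-similarities)[1:n_items + 1]
    return [item_labels[i] for i in top_items]

Building the full pairwise distance matrix up front is presumably the memory-hungry step mentioned above; scoring one item at a time as in this sketch keeps the footprint much smaller.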
- The third application of this Recommender System, which can help improve the customer experience and increase sales, is when you run a promotional campaign and want to recommend a product to the specific users who are most likely to buy that item:
users_for_item(model, user_item_matrix, '0195153448', 10)

[98391, 5499, 136735, 156214, 96473, 83443, 67775, 28666, 115929, 42323]
We recommended 10 users (IDs) who are most likely to be interested in the item (book) with ISBN 0195153448. The next step might be sending promotional emails to these users to see if they are interested in that item.
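A hedged sketch of how such a targeting function can work: score the chosen item for every user with model.predict() and take the top N (in the repo’s users_for_item() the ISBN is first mapped to its matrix index, and the resulting row indices are mapped back to real user IDs; those mappings are assumed here):

def top_users_for_item(model, interactions, item_index, user_ids, n_users=10):
    # Score a single item against every user in the interaction matrix.
    total_users = interactions.shape[0]
    scores = model.predict(np.arange(total_users, dtype=np.int32),
                           np.repeat(np.int32(item_index), total_users))
    top_rows = np.argsort(-scores)[:n_users]
    # Translate matrix row indices back to the real user IDs.
    return [user_ids[i] for i in top_rows]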
Final Thoughts:
It is important to note that, in general, the Collaborative Filtering approach needs enough data (user-item interactions) to produce good results. Your questions and comments are highly appreciated.
Here is the link to the GitHub repo for this project:
References:
https://github.com/aayushmnit/cookbook/blob/master/recsys.py
https://towardsdatascience.com/my-journey-to-building-book-recommendation-system-5ec959c41847