Building Recommendations for Articles With IBM
Implementing several recommendation system algorithms to recommend articles on the IBM Watson Studio platform.
Recommendation systems are among the most popular applications of data science, and they drive much of the revenue at companies such as Netflix, Amazon and Google.
Recommendation systems are algorithms that predict the rating or preference a user would give to a particular item. The item can be a product, a movie, an article and so on.
There are various types of algorithms used in building such Recommendation Systems. Some of the popular ones are as follows:
- Rank Based
- Collaborative Filtering
- Matrix Factorization
- Content Based
- Knowledge Based
In this article, the first three algorithms are discussed along with their implementations, using data from the IBM Watson Studio platform.
Exploratory Data Analysis
First, let’s analyze the data. It comes from the IBM Watson Studio platform and records each user along with the articles that user has interacted with. The email addresses are encrypted to ensure privacy.
Let’s see how many articles a user typically reads by plotting a histogram. Most users read up to 10 articles, and the mean is around 9 articles per user.
There are 1051 articles on the IBM platform, of which 741 have been interacted with at least once. There are 5148 users and 45993 user-article interactions in total.
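Counts like these can be computed with a few pandas calls. The snippet below is only a sketch on toy data: the column names `email` and `article_id` are assumptions about the dataset's layout, and every row is counted as one interaction.

```python
import pandas as pd

# Toy interaction log; the column names are assumed, not taken from the dataset.
interactions = pd.DataFrame({
    "email":      ["a", "a", "a", "b", "b", "c"],
    "article_id": [10, 20, 10, 10, 30, 20],
})

# Number of interactions per user (this is what the histogram would be built from).
per_user = interactions.groupby("email")["article_id"].count()
mean_per_user = per_user.mean()

n_users = interactions["email"].nunique()
n_articles_with_interactions = interactions["article_id"].nunique()
n_interactions = len(interactions)

print(mean_per_user, n_users, n_articles_with_interactions, n_interactions)
```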
Finally, a function is used to convert the encrypted email addresses to user ids for easier processing.
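One possible way to write such a mapping is sketched below. The function name `map_emails_to_ids` and the column names are hypothetical, and numbering users from 1 in order of first appearance is just one reasonable convention.

```python
import pandas as pd

def map_emails_to_ids(emails):
    """Give each distinct encrypted email an integer id, in order of first appearance.

    Hypothetical helper: ids start at 1, and a missing email would map to 0.
    """
    codes, _ = pd.factorize(pd.Series(emails))
    return (codes + 1).tolist()

# Toy data; the hashes below are made up.
interactions = pd.DataFrame({
    "article_id": [10, 20, 10, 30],
    "email": ["a1f", "9c2", "a1f", "77b"],
})
interactions["user_id"] = map_emails_to_ids(interactions["email"])
interactions = interactions.drop(columns="email")
```

After this step the encrypted string is no longer needed, and every later function can work with small integers.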
Rank Based Recommendations
Rank-based recommendation systems use information about the items themselves, in this case articles, to recommend the most popular items to a user. This is helpful primarily when we don’t have any prior knowledge about the user’s preferences.
So, whenever a new user arrives and wants to see what they can read, the recommendation system recommends the top articles, that is, those with the highest ratings or the most interactions.
In our case, we don’t have ratings for whether a user liked an article or not. We only know that a user has interacted with an article. In these cases, the popularity of an article can really only be based on how often an article was interacted with.
For this, a function groups the interactions by article id and sorts the articles by the number of interactions each has received. We can then choose how many articles to recommend to a user.
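A minimal sketch of this idea, assuming the interactions live in a DataFrame with an `article_id` column (the function name and column name are illustrative, not necessarily those used in the notebook):

```python
import pandas as pd

def get_top_article_ids(interactions, n):
    """Return the ids of the n articles with the most interactions."""
    # value_counts() groups by article id and sorts by count, highest first.
    counts = interactions["article_id"].value_counts()
    return counts.head(n).index.tolist()

# Toy data: article 10 has three interactions, 20 has two, 30 has one.
interactions = pd.DataFrame({"article_id": [10, 20, 10, 30, 10, 20]})
top_two = get_top_article_ids(interactions, n=2)
```

Every user gets the same list here, which is exactly why this approach suits new users with no history.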
User-User Based Collaborative Filtering
There are different types of collaborative filtering algorithms, mainly user-based and item-based. In this project, I will use user-based collaborative filtering to recommend articles to users.
User-Based Collaborative filtering is based on the idea that people with similar characteristics share similar taste. In our case, it means that if two users have interacted with similar articles in the past, we can recommend the articles seen by one user to the other as it makes sense that they will like similar articles in the future too.
For this, we will first create a user-item matrix where the rows and columns will be the user and article ids respectively. If a user has interacted with an article, we will place ‘1’ there and ‘0’ otherwise.
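The matrix can be built with a pivot. This is a sketch under the assumption that the interactions are stored in a DataFrame with `user_id` and `article_id` columns; the function name is illustrative.

```python
import pandas as pd

def create_user_item_matrix(interactions):
    """Rows are user ids, columns are article ids, cells are 1 (interacted) or 0."""
    return (
        interactions
        .drop_duplicates(["user_id", "article_id"])  # repeat views still count as 1
        .assign(seen=1)
        .pivot(index="user_id", columns="article_id", values="seen")
        .fillna(0)
        .astype(int)
    )

# Toy data: user 1 read article 10 twice, which still yields a single 1.
interactions = pd.DataFrame({
    "user_id":    [1, 1, 1, 2, 3],
    "article_id": [10, 20, 10, 10, 30],
})
user_item = create_user_item_matrix(interactions)
```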
Now we will use this matrix to find the users most similar to a given user id, based on the articles they have interacted with. The articles seen by those similar users, but not by the given user, are then recommended. To make the results more consistent, users and articles with more interactions are chosen before those with fewer interactions.
The recommendation function makes use of other functions defined in the Jupyter notebook. Below is a short description of each:
- get_top_sorted_users: returns the users that have interacted with the same articles, sorted first by similarity and then by the number of interactions each user has.
- get_user_articles: returns the articles that a user has interacted with.
- Finally, the function loops over the similar users and recommends their articles, with the articles that have the most interactions coming first.
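The steps above can be sketched as follows. The two helper names match the descriptions above, but their signatures and bodies are guesses rather than the notebook's code, and `user_user_recs` is a hypothetical name. Similarity is taken to be the number of shared articles (a dot product of two rows), and interaction counts are read from the binary matrix, so repeat views are not counted.

```python
import pandas as pd

def get_user_articles(user_id, user_item):
    """Article ids the user has interacted with."""
    row = user_item.loc[user_id]
    return row[row == 1].index.tolist()

def get_top_sorted_users(user_id, user_item):
    """Other users, most similar first; ties broken by number of interactions."""
    similarity = user_item.dot(user_item.loc[user_id])  # shared-article counts
    neighbors = pd.DataFrame({
        "similarity": similarity,
        "num_interactions": user_item.sum(axis=1),
    }).drop(index=user_id)
    neighbors = neighbors.sort_values(
        ["similarity", "num_interactions"], ascending=False
    )
    return neighbors.index.tolist()

def user_user_recs(user_id, user_item, m=10):
    """Up to m unseen articles, taken from the most similar users first."""
    seen = set(get_user_articles(user_id, user_item))
    popularity = user_item.sum(axis=0)  # users per article
    recs = []
    for neighbor in get_top_sorted_users(user_id, user_item):
        candidates = [
            a for a in get_user_articles(neighbor, user_item)
            if a not in seen and a not in recs
        ]
        # Within one neighbor, prefer the more popular articles.
        candidates.sort(key=lambda a: popularity.loc[a], reverse=True)
        recs.extend(candidates)
        if len(recs) >= m:
            break
    return recs[:m]

# Toy user-item matrix: rows are users 1-4, columns are articles 10-40.
user_item = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 1]],
    index=[1, 2, 3, 4],
    columns=[10, 20, 30, 40],
)
recs_for_user_1 = user_user_recs(1, user_item, m=2)
```

For user 1, user 2 shares two articles and user 3 shares one, so article 30 (from user 2) is recommended before article 40 (from user 3).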
Matrix Factorization
Matrix factorization is a class of collaborative filtering algorithms. It works by decomposing the user-item matrix into the product of lower-dimensional rectangular matrices. The algorithm used here is Singular Value Decomposition (SVD). To understand how SVD calculates these matrices, see this video.
To use SVD to make recommendations, let us first explore the concept of latent features. Latent features are those features that aren’t actually observed in the data but can be inferred based on the relationships that occur. For instance, an article might be about Artificial Intelligence and a user may have interacted with a lot of Artificial Intelligence related articles. Thus, Artificial Intelligence is the latent feature here.
SVD decomposes the user-item matrix into three separate matrices. The first, U, shows how the users relate to the latent features. For instance, how does user 1 feel about Artificial Intelligence articles? V transpose shows how the articles relate to the latent features. The last matrix, Σ, is a diagonal matrix with one diagonal entry per latent feature; each entry is the weight associated with that latent feature.
To calculate these matrices in Python, we will use the built-in SVD function available in NumPy:
import numpy as np
u, s, vt = np.linalg.svd(user_item_matrix)
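To make the shapes concrete, here is a small sketch on a made-up 3-user by 4-article matrix. Note that `np.linalg.svd` returns the singular values as a 1-D array rather than as the diagonal matrix Σ, so Σ has to be rebuilt before multiplying the three factors back together.

```python
import numpy as np

# Toy user-item matrix (3 users, 4 articles); the values are made up.
user_item_matrix = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

u, s, vt = np.linalg.svd(user_item_matrix)
print(u.shape, s.shape, vt.shape)  # users x users, 1-D singular values, articles x articles

# Rebuild the rectangular Sigma with the singular values on its diagonal.
sigma = np.zeros(user_item_matrix.shape)
sigma[:len(s), :len(s)] = np.diag(s)

# With all latent features kept, the product recovers the original matrix.
reconstructed = u @ sigma @ vt
```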
Normally, we don’t use all the latent features in these matrices. Instead, we look for the optimal number of features needed to re-create the user-item matrix, as shown in the graph below:
To get a recommendation, we just calculate the dot product using these matrices for the given user.
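As a rough sketch of that last step, the code below keeps only the first k latent features, scores every article for one user with a dot product, and hides the articles the user has already seen. The function names, the toy matrix, and the choice to rank unseen articles by predicted score are all assumptions made for illustration.

```python
import numpy as np

def truncated_svd(matrix, k):
    """Keep only the k largest latent features of the decomposition."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]

def reconstruction_error(matrix, k):
    """Sum of squared differences between the matrix and its rank-k rebuild."""
    u_k, s_k, vt_k = truncated_svd(matrix, k)
    return float(np.sum((matrix - (u_k * s_k) @ vt_k) ** 2))

def recommend_for_user(matrix, user_row, k, n):
    """Column indices of up to n unseen articles with the highest predicted score."""
    u_k, s_k, vt_k = truncated_svd(matrix, k)
    # Dot product of the user's latent-feature weights with every article.
    scores = (u_k[user_row] * s_k) @ vt_k
    scores = np.where(matrix[user_row] == 1, -np.inf, scores)  # hide seen articles
    order = np.argsort(scores)[::-1]
    return [int(i) for i in order[:n] if np.isfinite(scores[i])]

# Toy user-item matrix (3 users, 4 articles); the values are made up.
user_item_matrix = np.array([
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
], dtype=float)

errors = {k: reconstruction_error(user_item_matrix, k) for k in (1, 2, 3)}
recs = recommend_for_user(user_item_matrix, user_row=0, k=2, n=1)
```

The error can only shrink as k grows, and it reaches zero once k equals the rank of the matrix, so choosing k is a trade-off between a compact model and a faithful reconstruction.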
Conclusion
In this article, we discussed three popular methods for building recommendation systems:
- Rank-based recommendation systems recommend the top items when no prior information about a user is available.
- Collaborative filtering recommends items drawn from users who share similar characteristics.
- Matrix factorization decomposes the user-item matrix into three matrices for making recommendations. In practice, a variation of SVD known as Funk-SVD is often used.
To see the full code and additional analysis regarding the project, view the code available on my GitHub here