Recommendation System for E-commerce customers — Part I (Content-based Recommendation)

Raymond Kwok · Published in Analytics Vidhya · Jul 10, 2019 · 7 min read

This blog covers the content-based approach to recommendation, explained with code.

Part I: Content-based

Part II: Collaborative

Part III: Hybrid

Introduction

Recommendation systems are widely used in industry to understand customer behavior and recommend products. In this article and the next two in the series, we will explore the different types of recommendation systems: content-based, collaborative filtering, and hybrid. Let's first understand the basic differences between the three.

Content-based, Collaborative and Hybrid

Given that you have some items to recommend, if you know both sides very well, or in other words if you have enough data about your customers and the items, the content-based approach should give you pretty safe recommendations. However, they may be bound to what you already know your customers like.

On the other hand, if you don't know so much about the two sides, the collaborative filtering approach may give you some luck by training a customer embedding and an item embedding. In this case their characteristics are determined by an algorithm rather than by your knowledge.

Unlike the content-based approach, which recommends items similar to what the customer has liked in the past, the collaborative filtering approach recommends items liked by customers whose choices are similar to yours. This approach is powerful, but it requires interaction history between customers and items; without it, you run into the so-called 'cold-start' problem for a particular customer or item, since you cannot make an educated prediction about someone you know nothing about. In that case, a knowledge-based approach can do the job: learn something about the newcomers by sending out questionnaires, with their consent, to collect their personal preferences. There is no magic, only how much you know.

Therefore, depending on your level of knowledge about your customers, you can design a strategy that makes the most of your data. There are differences in implementation among these approaches, of course, and one can even combine the findings from them to build a hybrid recommender. While one could come up with as many implementations as creativity allows, I will show one of mine, together with a brief discussion of each approach (content-based, collaborative, and hybrid). Before walking through my code, I will devote a paragraph to the data I am going to train with.

Data

The dataset used in this blog is downloadable from Kaggle. It contains events in which a customer views, adds to cart, or purchases an item, together with the category hierarchy for the items. The ~235,000 items are categorized into ~1,300 categories, and these categories are further grouped into some ~400 parent categories. In this work, I will recommend categories to customers instead of items, since the number of categories can be handled with limited resources and data, and it is enough to demonstrate the techniques.
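As a rough sketch of how such event data could be turned into implicit ratings, consider the following; the file names, column names, and the 1/2/3 event weights here are my own assumptions for illustration, not a prescription from the dataset.

# Hypothetical preprocessing sketch; file and column names are assumptions.
import pandas as pd

events = pd.read_csv('events.csv')  # one row per view / add-to-cart / purchase
# weight the event types so that stronger signals count as higher ratings
weights = {'view': 1, 'addtocart': 2, 'transaction': 3}
events['rating'] = events['event'].map(weights)
# keep the strongest interaction per customer-item pair as an implicit rating
ratings = (events.groupby(['visitorid', 'itemid'])['rating']
                 .max()
                 .reset_index())
# items would then be rolled up to their ~1,300 categories via the hierarchy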

Content-based approach

This approach requires in-depth knowledge about the recommended objects, and the customers' preferences or ratings over them. The objects' features need to be defined and measured beforehand. For an article, they could be the writer's name, the theme, the language, and so on. Combined, these give good, distinguishable descriptions of each object. If a customer consistently gives high ratings to one type of article, then the content-based approach will be able to deliver very similar suggestions.

We are about to dive deeper into the customer-category data. Are you ready?

(upper) customers’ ratings over 4 categories — shoes/vegetables/chocolates/tables. (lower) relations between the categories to the parent category they belong to.

Firstly, we tabulate customers' ratings and the descriptions of the objects into two matrices (tables). This is illustrated by the partial tables above.

Machine learning processes numbers, so the upper table will be transformed into the following matrix.

Converting stars into numbers. Customers in different rows, and categories in different columns.
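For concreteness, such a matrix could look like this in code; the numbers below are made up for the four categories in the figure, not taken from the dataset.

import numpy as np
# rows: customers, columns: shoes / vegetables / chocolates / tables
vis_cat_mat = np.array([[5, 0, 3, 1],
                        [0, 4, 4, 0],
                        [2, 5, 0, 3]])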

While the lower table is already numeric and could, in principle, be used for calculations directly, in my case each category corresponds to only one or a few parents. This results in a sparse matrix in which the parents do not generalize the categories' descriptions nicely. Therefore, I transform the lower table into the following denser embedding matrix.

Converting object matrix into an embedding. Categories in different rows, and parent-categories in different columns.

The differences between the original human-readable object matrix and this embedded matrix are that (1) the number of columns is greatly reduced, meaning you need less memory to work with it, and (2) the meaning of the embedding's columns becomes abstract and thus less interpretable. Difference (2) is certainly not intentional, but it is a direct consequence of difference (1). To understand this, we should take a quick look at how the embedding is produced from the original matrix.

Note: you may skip the following section if you would rather use the original object matrix directly, without making any embedding.

Mathematics of embedding

A process to split a giant object matrix into two much slimmer embeddings.

Making the embedding is a mathematical process that splits the original giant object matrix (with C x P parameters) into two slimmer embeddings with bias terms (with C x E + E x P + C x 1 + 1 x P parameters). The parameters on the right-hand side are variables, adjusted to minimize the discrepancy between the left and the right. In other words, we are finding the best set of parameters for the right-hand side so that it resembles the original matrix as closely as possible. Since the two sides are arranged to resemble each other, the category embedding is thought to capture the essence of the categories in reduced dimensions; we are compressing the description of a category from its relations to hundreds of parents into a few embedding features. This explains difference (1) in the last paragraph, and since the embedded features are a compression of human-understandable features, they cannot be interpreted directly in words without very dedicated analysis, which is how difference (2) follows. I will discuss the code that implements this optimization process in the article on collaborative filtering.
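For readers who want a preview, here is a minimal numpy sketch of this optimization, using plain gradient descent on the squared reconstruction error; the embedding size, learning rate, and iteration count are arbitrary choices for illustration, not the article's actual settings.

# Minimal factorization sketch: M (C x P) ~ cat_emb @ par_emb + biases
import numpy as np

rng = np.random.default_rng(0)
C, P, E = 1300, 400, 16                       # sizes roughly matching the data
M = rng.random((C, P))                        # stand-in for the real matrix

cat_emb = rng.normal(scale=0.1, size=(C, E))  # C x E category embedding
par_emb = rng.normal(scale=0.1, size=(E, P))  # E x P parent embedding
cat_bias = np.zeros((C, 1))                   # C x 1 category bias
par_bias = np.zeros((1, P))                   # 1 x P parent bias

lr = 0.1
for _ in range(200):
    pred = cat_emb @ par_emb + cat_bias + par_bias
    err = pred - M                            # discrepancy to minimize
    # gradient steps on the squared reconstruction error
    cat_emb -= lr * (err @ par_emb.T) / P
    par_emb -= lr * (cat_emb.T @ err) / C
    cat_bias -= lr * err.mean(axis=1, keepdims=True)
    par_bias -= lr * err.mean(axis=0, keepdims=True)

# row-normalize to obtain the cat_par_mat used in the steps below
cat_par_mat = cat_emb / np.linalg.norm(cat_emb, axis=1, keepdims=True)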

Back to content-based

With the following two matrices, we can start making recommendations. I will explain the algorithm step by step.

Data for recommendations. This is just a partial view of the corresponding matrices, which are expected to be very large.

Step 1: For each customer, multiply his/her rating of a category by that category's embedded features, so that a highly rated category is magnified more.

Magnifying categories’ features by ratings. You will get one lower table for each customer.
# Code for step 1
import numpy as np
# vis_cat_mat: visitors' ratings of categories, 2D array (customers x categories)
# cat_par_mat: normalized embedding from category-parent relations,
#              2D array (categories x embedded features)
step1 = np.stack([vis[:, None] * cat_par_mat for vis in vis_cat_mat])
# result: 3D array. axis 0: customer, axis 1: category, axis 2: embedded feature

Step 2: For each customer, sum (or average) the embedded feature values over different categories. This will give you a customer’s summarized, preferred embedded features.

sum over categories for each customer. You will get one lower table for each customer.
step2 = step1.sum(axis=1)
# result: 2D array. axis 0: customer, axis 1: embedded feature

Step 3: For each customer, normalize the summarized feature values by dividing each by the square root of the sum of squared values (the L2 norm). This is a preparatory step for the next.

Dividing each value by the L2 norm (the square root of the sum of squared values). You will get one such table for each customer.
step3 = step2 / np.linalg.norm(step2, axis=1, keepdims=True)
# result: 2D array. axis 0: customer, axis 1: embedded feature

Step 4: For each customer, dot the summarized embedded features with each category's embedded features. Since both have been normalized beforehand, this dot product gives us the cosine similarity between the customer's preference and the category. The cosine similarity ranges from -1 to 1, with 1 meaning the most similar and -1 the most dissimilar. Therefore, if a category has a value close to 1, meaning it is very similar to the customer's preference, then recommending it is likely to please the customer.

Dot the preference and category together. If they are similar, it yields a value close to 1. You will get one such table for each customer.
step4 = np.stack([(v[None, :] * cat_par_mat).sum(axis=1) for v in step3])
# equivalent to: step4 = step3 @ cat_par_mat.T
# result: 2D array. axis 0: customer, axis 1: category (similarity score)

Step 5: Sort the resulting similarities and take the top k categories to make k recommendations.
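A minimal sketch of this final step, where k = 5 is an arbitrary choice:

# Code for step 5
k = 5
# sort each customer's similarities in descending order and keep the first k
top_k = np.argsort(-step4, axis=1)[:, :k]
# top_k[i] holds the indices of the k categories recommended to customer i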

Summary

This article discussed the algorithm behind a content-based recommender. It requires an object matrix that gives good descriptions of all the objects; once a customer's preferences or ratings for some of the objects are available, the algorithm 'summarizes' those preferences to find other, similar objects to recommend.

The collaborative and hybrid approaches are discussed in separate articles. To show how these approaches differ on my (quite limited) data, I plotted the percentage of customers who visited a recommended category, without actually being shown the recommendations. The x-axis is the number of recommendations made; the y-axis is the percentage of customers. With fewer recommendations the percentage is lower, which sounds bad; however, depending on the business case, a recommender is not only there to make precise predictions, but also to broaden the customer's horizons with reasonable suggestions they might never have heard of. This plot will be discussed in the hybrid article.
