Recommendation System for E-commerce customers — Part II (Collaborative filtering)

Raymond Kwok
Published in Analytics Vidhya · Jul 11, 2019 · 7 min read

The second article in the series covers the theory and code for implementing the collaborative approach to building an e-commerce recommender system.

Part I: Content-based

Part II: Collaborative

Part III: Hybrid

Problems with Content-based approach

In the last post we discussed the content-based approach for recommending items. To recap, it makes use of existing data about the customers and the items to rank recommendations in each customer's favor.

But when it is not possible to collect adequate information about the customers and the items for the content-based approach, perhaps because the customer base and catalogue grow too quickly to keep up with, collaborative filtering can step in. Let's understand how it works.

Collaborative filtering approach

Unlike the content-based approach, collaborative filtering exploits the interaction history between customers and items to recommend one to the other. It can therefore work without a deep understanding of each customer and each item, but we still need some event history to tell how likely a customer is to like an item. Let us start working with the data.

Note: As explained in the first article, categories, instead of items, are recommended to customers.

Interaction History: In each row, a customer either views/adds-to-cart/purchases an item of a category

From the events database it is easy to extract a table like the one above, listing who interacted with what. Assuming that a customer likes any category they have interacted with (a weak assumption; a stronger one would be that a customer likes a category only if they purchased from it), a customer-category interaction matrix like the following can be built.

Customer-Category interaction matrix in its raw form

If we normalize each row, then the dot product of any two rows (customers) tells us how alike they are, and if two customers are very similar, we can recommend a category that one of them likes but the other has not yet seen. For example, Brenda and Terry have similar interests and Rosanna is close to them, so maybe we could suggest some ‘chocolates’ to Rosanna. This may not sound convincing yet, but what if there were more columns giving a more comprehensive description of each customer?
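As a minimal sketch of this step, assuming a simple events table with customer and category columns (the names and values below are illustrative only, not taken from my actual database), the interaction matrix and the customer-to-customer similarities could be computed roughly like this:

import numpy as np
import pandas as pd

# Hypothetical events table; customers and categories here are illustrative.
events = pd.DataFrame({
    'customer': ['Aaron', 'Aaron', 'Aaron', 'Brenda', 'Brenda', 'Terry', 'Terry', 'Rosanna'],
    'category': ['fruits', 'vegetables', 'ingredients', 'chocolates', 'snacks', 'chocolates', 'snacks', 'snacks'],
})

# Raw customer-category interaction matrix: 1 if the customer has interacted
# with the category at least once, otherwise 0.
interactions = pd.crosstab(events['customer'], events['category']).clip(upper=1)

# Normalize each row to unit length, so the dot product of any two rows
# is the cosine similarity between those two customers.
normalized = interactions.div(np.sqrt((interactions ** 2).sum(axis=1)), axis=0)
customer_similarity = normalized.dot(normalized.T)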

Remember that we are describing customers in only three aspects here; as we learn them better, our recommendations could make more sense. We can already state the essential idea of this implementation of collaborative filtering: suggest what a similar customer likes. However, we will take the approach further by developing a customer embedding and a category embedding.

Embeddings

The original customer-category interaction matrix describes a customer by the categories they like. However, a customer may only ever get to know a very small portion of all categories, which limits how fully their preferences are expressed in our recommendation system. One way to get around this is to generalize the categories, that is, to find the most representative features that categories have in common, and see how customers respond to those features. These are the embedded features I will refer to in the discussion below.

You may wonder how we could generalize categories into embedded features. The idea is to make use of all customers’ interaction history. For example (setting aside my very small customer table above for now), suppose Aaron likes fruits, vegetables, and ingredients, but has never tried noodles according to our records. If, across our full customer base, 85% of those who like the former three also like noodles, then maybe we could combine the four into one embedded feature, give those 85% of customers a value of 1 for that feature, and give Aaron something like 0.6. What is being combined here is not the fact that they are all food, but customers’ behavior: similar customers like similar categories.

Of course, one could always try to interpret an embedded feature as something a human can understand. However, there is no guarantee that a clear explanation exists, and as customers’ behavior shifts, what an embedded feature captures will also change. Interpreting their meaning is therefore not a concern of this discussion.

We cannot examine all customer-category interactions ourselves, so we rely on mathematical algorithms to transform the interaction matrix into embeddings. They usually come as a pair: a customer embedding that relates customers to the embedded features, and a category embedding that relates categories to the same embedded features.

There are multiple algorithms for this, such as SVD, WALS, and other dimensionality reduction techniques. Here I will talk about using a neural network, but first a quick sketch of the SVD route for comparison.
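The sketch below uses scikit-learn's TruncatedSVD, assuming `interactions` is the customer-category matrix built above; the embedding size of 2 is only sensible for that tiny example.

from sklearn.decomposition import TruncatedSVD

# Number of embedded features to keep.
emb_size = 2
svd = TruncatedSVD(n_components=emb_size, random_state=0)

# Customer embedding: one row per customer, one column per embedded feature.
customer_embedding = svd.fit_transform(interactions)

# Category embedding: one row per category, one column per embedded feature.
category_embedding = svd.components_.T

# Multiplying the two back together approximates the original interaction matrix.
approximation = customer_embedding @ category_embedding.T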

Training embeddings with Neural Network

Before diving into the code, let us first go over the concept behind it. Our original interaction matrix is the giant one on the left-hand side of the formula above. It has C customers and P categories, totaling C x P entries, most of them zeros. The NN is an optimization process that factorizes the interaction matrix into two major components, the two embeddings, together with some bias terms.

In total, the embeddings and the biases have far fewer parameters than the interaction matrix. How many fewer depends on the value of E, the embedding size, which tells us how many embedded features we think are enough to capture or generalize the most important properties of customers’ behavior.
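To put purely illustrative numbers on this: with C = 10,000 customers, P = 500 categories and E = 20 embedded features, the interaction matrix has C x P = 5,000,000 entries, while the two embeddings and two bias vectors together have only C x E + P x E + C + P = 200,000 + 10,000 + 10,000 + 500 = 220,500 parameters.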

What the NN does is keep tuning the parameters on the right-hand side of the formula so that the discrepancy between the two sides is as small as possible; the smaller it is, the better the right-hand side captures the left. Now let us look at the code.

from keras.layers import Input, Embedding, Dot, Reshape, Dense, Add, Activation
from keras.models import Model

# input1_dim: total number of customers
# input2_dim: total number of categories
# emb_size:   embedding size - how many embedded features to generalize customers' behavior

# One integer id per sample for the customer and for the category
input1 = Input(name='customer', shape=[1])
input2 = Input(name='category', shape=[1])

# Embeddings: map each customer / category id to emb_size embedded features
input1_emb = Embedding(name='customer_emb',
                       input_dim=input1_dim,
                       output_dim=emb_size)(input1)
input2_emb = Embedding(name='category_emb',
                       input_dim=input2_dim,
                       output_dim=emb_size)(input2)

# Biases: one scalar per customer and one per category
input1_bias = Embedding(name='customer_bias',
                        input_dim=input1_dim,
                        output_dim=1)(input1)
input2_bias = Embedding(name='category_bias',
                        input_dim=input2_dim,
                        output_dim=1)(input2)

# Cosine similarity between the two embedded vectors, plus the two bias terms
merged = Dot(name='dot', normalize=True, axes=2)([input1_emb, input2_emb])
merged = Add(name='add')([merged, input1_bias, input2_bias])
merged = Reshape(target_shape=[1])(merged)

# Squash the result into (0, 1): how likely the customer likes the category
output = Activation('sigmoid')(merged)

model = Model(inputs=[input1, input2], outputs=output)

The code builds a model equivalent to the following formula.

prediction(c, p) = sigmoid( cosine(CE[c], PE[p]) + CB[c] + PB[p] )

CB: Customer Bias vector. PB: Category Bias vector. CE: Customer Embedding matrix. PE: Category Embedding matrix. c: customer. p: category.

To train the model, input1 and input2 accept lists of customers and their corresponding categories. Because these are existing interactions, the optimization algorithm tunes the model parameters to output 1, or close to 1, for them. Interactions that do not exist should also be represented so that the model can be taught to output 0, or close to 0, for them. The embedding terms are what we need, while the bias terms capture the systematic differences among customers and among categories, which may be exploited in other ways.
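As a rough sketch of how that training could be set up (assuming `customer_ids` and `category_ids` are integer-encoded arrays of the observed interactions, and that negative pairs are drawn by simple random sampling, ignoring the small chance of hitting a real interaction), one possibility is:

import numpy as np

n = len(customer_ids)

# Randomly sampled (customer, category) pairs serve as negative examples.
neg_customers = np.random.randint(0, input1_dim, size=n)
neg_categories = np.random.randint(0, input2_dim, size=n)

x_customer = np.concatenate([customer_ids, neg_customers])
x_category = np.concatenate([category_ids, neg_categories])
y = np.concatenate([np.ones(n), np.zeros(n)])  # 1 = observed, 0 = sampled negative

model.compile(optimizer='adam', loss='binary_crossentropy')
model.fit([x_customer, x_category], y, epochs=10, batch_size=64)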

The trained embedding parameters can be extracted from the model and normalized along the axis of embedded features, and the result can then be used to predict how likely a customer is to welcome each category.
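A short sketch of that extraction and normalization, using the layer names defined in the model above:

import numpy as np

# Pull the trained weight matrices out of the two embedding layers.
customer_embedding = model.get_layer('customer_emb').get_weights()[0]
category_embedding = model.get_layer('category_emb').get_weights()[0]

# L2-normalize each row along the axis of embedded features so that a dot
# product between a customer row and a category row is a cosine similarity.
customer_embedding /= np.linalg.norm(customer_embedding, axis=1, keepdims=True)
category_embedding /= np.linalg.norm(category_embedding, axis=1, keepdims=True)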

# customer_embedding: 2d array. 0th axis: customers, 1st: embedded features, normalized along the 1st axis
# category_embedding: 2d array. 0th axis: categories, 1st: embedded features, normalized along the 1st axis
# Matrix-multiply the two embeddings so each entry is the cosine similarity
# between a customer and a category.
customer_category = np.matmul(customer_embedding,
                              category_embedding.T)
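From there, a simple way to turn the similarity matrix into recommendations is to take, for each customer, the top few categories by score (filtering out categories the customer has already interacted with is omitted here for brevity):

# Indices of the top-5 highest-scoring categories for every customer.
top_n = 5
recommended = np.argsort(-customer_category, axis=1)[:, :top_n]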

Summary

We have discussed an implementation of collaborative filtering, and how to construct a model that accepts raw history data and gives us the embeddings.

The final, hybrid approach, which makes use of the findings from both the content-based and collaborative filtering approaches, will be presented in the next article. To show the difference between these approaches on my data (which is quite limited), I plotted the percentage of customers who visited a recommended category WITHOUT actually being notified of the recommendations.

The x-axis is the number of recommendations made and the y-axis is the percentage of customers. With fewer recommendations the percentage is lower, which may sound bad; however, depending on the business case, a recommender is not only there to make precise predictions but also to broaden the customer’s horizons with reasonable suggestions they may not have heard of. This plot will be discussed in the hybrid article.

Thank you for reading the article and for following the series so far. Please leave your comments and suggestions below.
