Recommender Systems: Optimized!

Good day everyone and welcome to another one of my quests!

The Recommender Systems quest continues with vigor. I’ve shown you the way past the Product Clustering Isles, through the endless waves of Named Entity Recognition, the Information Enhancement, and the Validation Framework and how to build it. At this point, I am going to show you how I improved the performance of my model.

The brief explanation is the following: taking a closer look into eCommerce purchase patterns, I discovered multiple “places” where my crew and I could sail to improve our results. In other words, we had multiple points we needed to take care of.

There was only one way to take care of that: by splitting all of our data into shop clusters and then using a scoring formula that represents a user’s interest as a score, rather than a “yes” or “no” answer.

What is more, I decided to calculate the decay of interest, remove the bias that exists in interactions, and add these results to the mix.

The Shop Clustering Approach

How many shops can you count in one single journey? I’ll be honest with you, I’ve got no idea, seeing as there are so many different kinds of shops out there!

Although Product Clustering significantly reduced the size of the matrix, that was not enough.

I had to somehow reduce the size of our interaction matrix, in order to make it more manageable. So, I did the only thing that made sense: I ran a clustering algorithm on shops. This helped the crew and myself group the most similar ones into categories (or clusters, if you will).

This clustering method can have a great impact on the product recommendations the engine produces. It proved to be a great way to separate the clusters of shops that buyers purchase products from, handle each with a different modus operandi, and optimize each cluster separately.

Every shop, along with the products it sells, was represented as a vector. Then, the crew and I decided to create a sparse matrix: we combined all the shop vectors and looked for the ones that were most similar to each other. Those were paired up (there’s a rough sketch right after the list below).

After that, we started building the distribution of shop clusters.

And we used three “compasses”, three different ways to measure the similarities without getting lost in the process:

  1. The Euclidean distance between shop vectors was used to measure how similar two shops are.
  2. The percentage of products each couple of shops had in common was used to decide which shops could “learn” from each other.
  3. The product interactions each shop could provide to others were used, so that customer-product interaction data could be shared across shops.
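
Here’s a minimal sketch of the shop-clustering idea. The article doesn’t name the exact algorithm, so this uses scikit-learn’s agglomerative clustering (Ward linkage, which works on Euclidean distance, in the spirit of the first “compass”) on a toy shop-by-product matrix; every name and number in it is hypothetical.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import AgglomerativeClustering

# Hypothetical toy data: 6 shops x 5 products (how strongly each shop is
# associated with each product, e.g. how often it sells it).
shop_product = sparse.csr_matrix(np.array([
    [5, 0, 1, 0, 0],
    [4, 1, 0, 0, 0],
    [0, 3, 4, 0, 1],
    [0, 2, 5, 0, 0],
    [0, 0, 0, 6, 2],
    [0, 0, 1, 5, 3],
]))

# Group the most similar shop vectors together; Ward linkage relies on
# Euclidean distance between the vectors.
clustering = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = clustering.fit_predict(shop_product.toarray())

print(labels)  # e.g. [0 0 1 1 2 2]: shops selling similar products share a cluster
```

Each resulting cluster can then be treated with its own modus operandi and optimized separately, as described above.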

More Formulas: The Interest Score Approach

Much like a crew is represented by the color of the ship’s sails, each and every customer is represented by their product interactions in a 2D sparse matrix. We call that matrix “R”.

Its rows represent the customers. Its columns, on the other hand, represent the products as vectors.
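
To make that concrete, here’s a rough sketch of how a matrix like “R” can be assembled with SciPy. The ids and sizes are made up, and the cell values are placeholders for the interest scores described in the next section.

```python
from scipy import sparse

# Hypothetical (customer_id, product_id, value) triples, one per interaction.
# The values here are placeholders; the "interest score" defined below is what
# actually goes into each cell.
interactions = [
    (0, 2, 3.5),
    (1, 2, 5.0),
    (1, 4, 4.0),
]
n_customers, n_products = 2, 5  # toy sizes

rows = [c for c, _, _ in interactions]
cols = [p for _, p, _ in interactions]
vals = [v for _, _, v in interactions]

# R: customers on the rows, products on the columns.
R = sparse.csr_matrix((vals, (rows, cols)), shape=(n_customers, n_products))
print(R.toarray())
```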

Views, add-to-cart events, and purchases are all key signals of customer interest, but treating them as equally important would be inaccurate. How is that possible, you ask?

Well, all of us have viewed a product by accidentally clicking on it. On the other hand, I don’t think anyone buys a product by mistake. And maybe, just maybe, you can view a product that is of no actual interest to you.

Therefore, we came up with a life-saver: a formula to differentiate the interactions and define the score.

The formula is pretty simple: every view scores one point (3.5 points is the maximum), every “add-to-cart” scores 4 points (4.5 maximum), and every purchase scores 5 points.
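
Since the exact expression isn’t written out here, the sketch below is only one way to read those rules. The caps, the way repeated interactions stack, and the idea of keeping the strongest signal are my assumptions, and the function name is made up.

```python
def interest_score(n_views: int, n_carts: int, purchased: bool) -> float:
    # Each view adds 1 point, capped at 3.5.
    view_score = min(1.0 * n_views, 3.5)
    # An add-to-cart is worth 4 points; repeats nudge it up to a 4.5 cap.
    cart_score = min(4.0 + 0.5 * max(n_carts - 1, 0), 4.5) if n_carts else 0.0
    # A purchase is the strongest possible signal: 5 points.
    purchase_score = 5.0 if purchased else 0.0
    # Keep the strongest signal as the customer's interest in this product.
    return max(view_score, cart_score, purchase_score)

print(interest_score(n_views=2, n_carts=0, purchased=False))  # 2.0
print(interest_score(n_views=5, n_carts=1, purchased=False))  # 4.0
print(interest_score(n_views=1, n_carts=2, purchased=True))   # 5.0
```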

Using these techniques, we got our scoring formula ready to receive the data and to measure and balance every product interaction. This produced the so-called “interest score”.

And then we were ready to fill each customer-product cell with the calculated value.

Interest Decay: What Happens Now?

There are many types of product views, but some depend solely on specific time-frames. This means that they’re only relevant at a very specific point in time and, therefore, can’t be characterized as a recurring need.

A fine example of that would be the time someone spends searching for a wedding gift. They’ll google “champagne glasses” today, but this search will be irrelevant after they attend that wedding.

We needed our model to be able to understand the fact that personal interest changes and the dominant interest differs from time to time. In order for our machine to “learn” that new path, we had to transform the customer-product interest score.

We almost crashed the ship in our attempt to teach the machine. It wasn’t easy. But after several weeks of working on this and only this, we came up with the “Exponential Decay” formula.

This formula pretty much allowed us to monitor how a customer interacts with various products and to measure how their interest changes through time.

The “Exponential Decay Formula” was applied to transform every interest score in the matrix we built.

Of course, the rate of decay was (and still is) different for every shop cluster. Customers have different buying behaviors for different shops. For example, a pirate will purchase rum way more often than… Well, all of you who are reading this article.

Or maybe you can think of how often you buy things from a supermarket, as opposed to the frequency with which you buy shoes. The frequency is completely different.
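
The exact parameterization isn’t given above, but a standard exponential decay over time looks like the sketch below. The per-cluster rates and the day-based time unit are assumptions on my part.

```python
import numpy as np

# Hypothetical decay rates: fast-moving clusters forget old interactions quickly,
# slow-moving ones keep them relevant for longer.
decay_rate_per_cluster = {"groceries": 0.10, "shoes": 0.01}

def decayed_interest(score: float, days_ago: float, shop_cluster: str) -> float:
    lam = decay_rate_per_cluster[shop_cluster]
    return score * np.exp(-lam * days_ago)

# The same 5-point purchase, 30 days later:
print(round(decayed_interest(5.0, days_ago=30, shop_cluster="groceries"), 2))  # 0.25
print(round(decayed_interest(5.0, days_ago=30, shop_cluster="shoes"), 2))      # 3.7
```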

Biased Customers: How To Tackle The Issue

Up until now, I believe that everything I’ve described sounds a little easy, perhaps too easy to have taken us that long to master.

There’s something you’ll need to keep in mind, though: Every person has different needs. Needs that we had to take into account while building our model, in order to keep it from crashing and burning.

What we needed to do, essentially, was to keep all the valuable pieces of information while eliminating the differences in how each person grades.

So, check out the thought process:

We’ve got Product “A”, which appeals to both James and Nick. James and Nick give the product a three out of five score.

However, James grades the products he’s interacted with at an average of 2.8. Nick, on the other hand, grades at an average of 4.

So, even though both have given Product “A” the same grade, the grade itself means something different for either one of them.

James is a sophisticated user with higher standards (and he’s a little tight-fisted and strict when it comes to reviews), so he tends to give products lower grades. That means a 3 out of 5 is actually considered a good review coming from James.

Nick is the exact opposite: he’s generous and kind and doesn’t mind handing out good reviews. So when he gives a product a three out of five, it means the product performed poorly in his eyes.

We had to remove the bias from our interaction terms. This is, essentially, a matrix normalization technique: it lets us strip out each customer’s personal bias, run the calculations unbiased, and add the extracted bias back at the end of the matrix factorization process.
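
To make the normalization concrete, here’s a tiny sketch using the James/Nick numbers from the example above. Subtracting each user’s average score is one common way to do it; it’s an assumption that this is the exact flavor used here.

```python
# The same 3/5 score means different things once each user's average is removed.
james_avg, nick_avg = 2.8, 4.0
score_for_product_a = 3.0

james_debiased = score_for_product_a - james_avg  # above James' personal norm
nick_debiased = score_for_product_a - nick_avg    # below Nick's personal norm

print(round(james_debiased, 1), round(nick_debiased, 1))  # 0.2 -1.0
```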

So, here’s what we’ve got:

We created the “R” matrix, using customer-product interactions with the decay scores.

Then, we removed the bias from the “R” matrix.

Next, we decomposed the debiased “R” matrix into two smaller matrices, “P” and “Q”.

Then, we calculated the dot product of the “P” and “Q” matrices and added back the bias we removed before decomposition.
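
Putting those steps together, here’s a minimal end-to-end sketch: remove each customer’s bias, decompose, reconstruct with the dot product, and add the bias back. A plain truncated SVD stands in for whatever factorization the production model actually uses, and all the numbers are illustrative.

```python
import numpy as np

# Toy "R": 3 customers x 4 products, already holding decayed interest scores
# (zeros mean "no interaction").
R = np.array([
    [3.0, 0.0, 5.0, 0.0],
    [4.0, 4.5, 0.0, 3.5],
    [0.0, 2.0, 3.0, 0.0],
])
observed = R > 0

# 1. Remove each customer's bias (their mean score over observed interactions).
user_bias = (R * observed).sum(axis=1) / observed.sum(axis=1)
R_debiased = np.where(observed, R - user_bias[:, None], 0.0)

# 2. Decompose the debiased matrix into "P" and "Q" (truncated SVD as a
#    stand-in for the actual factorization; rank 2 is arbitrary).
U, s, Vt = np.linalg.svd(R_debiased, full_matrices=False)
k = 2
P = U[:, :k] * s[:k]   # customer factors
Q = Vt[:k, :]          # product factors

# 3. Reconstruct via the dot product of P and Q, then add the bias back.
R_hat = P @ Q + user_bias[:, None]
print(np.round(R_hat, 2))
```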

Pretty simple, eh? Nah, it most definitely wasn’t.

Takeaway

In the next article, I am going to present my crew’s final finding: how the final model we implemented works, and how it compares to the previous models we built.

Read on and stay tuned for more adventures!
