A recommendation system doesn’t have to be a complex black box to work well. It is possible to build an effective and explainable tool by combining just a couple of basic statistical principles with business knowledge.
📋 Article Contents 📋
📍 1. An Overview on Recommendation Systems
- Why Recommendations?
- Deep Learning for Collaborative Filtering
- Graph-Based Recommendation
- The Need for a Business-Oriented System
📍 2. A Flexible and Explainable Approach to Recommendation
- The Scenario
- Customer Fingerprint
- Product Fingerprint
- Customers Clusters
- Top Products List
📍 3. Final Thoughts and Improvements
1. An Overview on Recommendation Systems
“35% of Amazon’s revenue is generated by its recommendation engine”
It’s very likely that you have heard this claim, and it’s one of the things that helped create so much hype around recommendation systems: the idea of a “magic” tool that could drive as much as 35% of revenue is so exciting!
It is easy to understand why many decision makers, excited by this figure, wish to take advantage of such a powerful instrument in their companies. It totally makes sense, but when the time comes to take action they are often held back by several concerns.
To understand their doubts, consider that many different recommendation algorithms exist and that the choice of the most suitable one depends mainly on four elements:
- 💡 Clarity of the results for non-technical people
- 🔮 Effectiveness in finding valuable rules
- ⚙️ Computational effort
- 💼 Possibility to add business-specific logic to the algorithm
Deep Learning for Collaborative Filtering
A first kind of R.S. (described by Matteo in this article) exploits the power of neural networks to construct an embedding vector for each product (i.e. a vector of values describing it). These methods are usually effective and, after an initial intensive training phase, require little effort to use. The catch is that, for a business user, they are black boxes, and the learned embeddings can be meaningless to humans.
Graph-Based Recommendation
Another kind of R.S. (presented in this other article by Matteo) takes advantage of a graph database to easily calculate relations between elements. These methods update in real time and let analysts build rules based on their own domain knowledge. The pain point here can be the inability to find latent patterns and hidden rules.
The Need for a Business-Oriented System
Both approaches can be really effective at producing good suggestions for customers, but neither of them addresses all the concerns of many decision makers:
- The lack of control over a black-box tool and the difficulty in understanding why it is performing well or badly
- The impossibility of adding their business experience into the algorithm to improve the results
- The need to find hidden rules, or statistically confirm their feelings and intuitions.
Luckily, an interesting solution is offered by a third kind of R.S., which balances statistics and business knowledge. This approach requires daily pre-processing of the data, which makes it the most demanding of the three in terms of computational effort. On the other hand, it allows the definition of complex business-based heuristics and creates an embedding vector with a clear, pre-defined business meaning.
2. A Flexible and Explainable Approach to Recommendation
The following article by the IBM Research Division, published in the early 2000s, proposes a flexible approach to recommendation, which can be customized with both statistical and business choices.
Personalization of Supermarket Product Recommendations
We describe a personalized recommender system designed to suggest new products to supermarket shoppers. The recommender…
As with any classical recommendation system, the final goal of the algorithm is to provide a list of “suggested products” to users, in order to increase their purchases and satisfaction.
Generally speaking, the algorithm can be split into four logical steps:
- 👤 Creation of a Customer Fingerprint
- 📦 Creation of a Product Fingerprint
- 👪 Creation of Customers Clusters
- 🧾 Creation of the Top Products List to recommend for each Customer
In a nutshell, the product list for each customer contains the best-selling products in her cluster whose fingerprints are most similar to hers.
The Scenario
The use case discussed in the article refers to a pilot project implemented by a supermarket retailer in the UK.
The goal is to create a list of suggested products for each customer enrolled in the project. The list should increase customer spending while excluding “sensitive items” (e.g. tobacco, health products…) and products the customer has already purchased.
The method described below is easy to generalize to other retail contexts, since its only prerequisite is a hierarchical product taxonomy. In this case, products are divided across G=99 classes and each class is subdivided into fewer than 100 subclasses, generating a total of S=2302 product subclasses. (E.g. Petfoods [Class] → Canned Cat Food [Subclass] → Friskies Liver 250g [Product]).
Customer Fingerprint 👤
The first step of the algorithm builds a fingerprint for each customer, based on her spending habits.
The absolute spending Cₘₛ of customer “m” across all products contained in the s-th subclass is simply obtained by aggregating raw transactions over a fixed previous period (e.g. the last three months).
This value should be normalized, in order to find a standard measure of the customer’s interest in each subclass relative to other subclasses. With C*ₘ = ∑ₛCₘₛ being her total spending over the period, the fractional spending is:
C′ₘₛ = Cₘₛ / C*ₘ
This value still needs further processing, since commonly purchased subclasses (such as water or fresh vegetables) will tend to dominate the fractional spending. The solution consists in taking the ratio of the individual customer’s fractional spending in a subclass to the mean value for this subclass taken over all other customers:
C′′ₘₛ = C′ₘₛ / mean(C′ₙₛ), with the mean taken over the other customers n
So, finally, each customer “m” gets a vector C′′ₘ of S entries, where the s-th element measures the strength of her interest in the s-th product subclass. That’s the customer fingerprint.
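To make the two normalization steps concrete, here is a minimal sketch in plain Python. The function name `customer_fingerprints`, the toy transactions and the choice of returning 0 when no other customer buys a subclass are my own assumptions for illustration, not something taken from the paper. It also assumes every customer has a positive total spending.

```python
from collections import defaultdict

def customer_fingerprints(transactions, subclasses):
    """Build one normalized spending vector per customer.

    transactions: iterable of (customer, subclass, amount) rows.
    """
    # Absolute spending C_ms: total amount per customer and subclass.
    spend = defaultdict(lambda: defaultdict(float))
    for customer, subclass, amount in transactions:
        spend[customer][subclass] += amount

    # Fractional spending C'_ms = C_ms / C*_m.
    fractional = {}
    for customer, by_subclass in spend.items():
        total = sum(by_subclass.values())
        fractional[customer] = {s: by_subclass.get(s, 0.0) / total for s in subclasses}

    # Normalized spending C''_ms = C'_ms / mean of C'_ns over the other customers n.
    fingerprints = {}
    for customer in fractional:
        others = [c for c in fractional if c != customer]
        vector = []
        for s in subclasses:
            mean_others = sum(fractional[c][s] for c in others) / len(others) if others else 0.0
            # Assumption: if nobody else buys this subclass the ratio is undefined, so use 0.
            vector.append(fractional[customer][s] / mean_others if mean_others > 0 else 0.0)
        fingerprints[customer] = vector
    return fingerprints

# Toy data (hypothetical): three customers, two subclasses.
transactions = [
    ("alice", "water", 10.0), ("alice", "cat_food", 10.0),
    ("bob", "water", 18.0), ("bob", "cat_food", 2.0),
    ("carol", "water", 20.0),
]
subclasses = ["water", "cat_food"]
fp = customer_fingerprints(transactions, subclasses)
print(fp["alice"])
```

In this toy data everybody buys water, so Alice’s water entry stays small, while her cat food entry becomes large because the other customers spend very little on it. That is exactly the effect the second normalization is meant to produce.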
Product Fingerprint 📦
The second step of the algorithm constructs the product fingerprints. The main difference compared with the embeddings found by a neural network is that these vectors are business-meaningful by design.
As for the customers, the result will be a vector Pⁱ for each product “i”, where each entry Pₛⁱ reflects the “affinity” between product “i” and subclass “s”. Customer and product fingerprints have the same dimension so that they can be compared directly using standard similarity measures (such as cosine similarity, the one chosen by the authors).
Now you are probably thinking “Ok, this makes sense, but how to choose the right value for each Pₛⁱ ?”. The solution proposed by the paper is the following:
The last step of the algorithm will clarify the reason behind these values, but first the term ‘strong association’ deserves an explanation.
The paper exploits the Association Rules method to measure relations between product classes or subclasses. More precisely, only simple associations are computed (containing a single item in both the body and the head of the rule), so that a subclass S₁ is said to be ‘strongly associated’ with the subclass S₂ if the rule S₁ ⇒ S₂ is ‘pretty relevant’.
Again, a clarification is necessary: there is no sharp distinction between ‘good’ and ‘bad’ associations. Reading the paper you will find the combination of support, lift and confidence chosen by the authors, but the right thresholds depend on the case.
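As a rough illustration of how simple associations can feed the product fingerprint, the sketch below computes support, confidence and lift for single-item rules and then fills a product vector. Everything specific here is an assumption of mine: the function names, the thresholds in `strong_associations`, the weights `own_weight` and `assoc_weight`, and the toy baskets. The values actually chosen by the authors are in the paper.

```python
from itertools import permutations

def simple_rules(baskets):
    """Support, confidence and lift for every single-item rule a => b."""
    n = len(baskets)
    sets = [set(b) for b in baskets]
    items = sorted(set().union(*sets))
    count = {i: sum(1 for s in sets if i in s) for i in items}
    rules = {}
    for a, b in permutations(items, 2):
        both = sum(1 for s in sets if a in s and b in s)
        if both == 0:
            continue
        support = both / n                # share of baskets containing a and b
        confidence = both / count[a]      # share of baskets with a that also contain b
        lift = confidence / (count[b] / n)
        rules[(a, b)] = (support, confidence, lift)
    return rules

def strong_associations(rules, min_support=0.3, min_confidence=0.6, min_lift=1.2):
    # Hypothetical thresholds: tune them to your own data.
    return {pair for pair, (s, c, l) in rules.items()
            if s >= min_support and c >= min_confidence and l >= min_lift}

def product_fingerprint(product_subclass, subclasses, strong,
                        own_weight=1.0, assoc_weight=0.5):
    """Affinity of one product to every subclass (illustrative weights)."""
    vector = []
    for s in subclasses:
        if s == product_subclass:
            vector.append(own_weight)
        elif (product_subclass, s) in strong:
            vector.append(assoc_weight)
        else:
            vector.append(0.0)
    return vector

# Toy baskets (hypothetical), already expressed at subclass level.
baskets = [
    ["cat_food", "cat_litter"],
    ["cat_food", "cat_litter", "water"],
    ["cat_food", "water"],
    ["water", "bread"],
    ["bread"],
]
rules = simple_rules(baskets)
strong = strong_associations(rules)
subclasses = ["bread", "cat_food", "cat_litter", "water"]
fingerprint = product_fingerprint("cat_food", subclasses, strong)
print(fingerprint)
```

With these toy baskets, a product in the cat food subclass gets a non-zero entry for cat litter as well. This is what will later let a cat food buyer look “similar” to products in a subclass she has never bought.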
Customers Clusters 👪
Thanks to fingerprints extracted at the first step of the algorithm, it is now possible to group customers into clusters.
The authors stress that these groups are much more useful for recommendation than the previously used clusters (based on purely demographic information derived from questionnaires).
Indeed, thanks to the meaning of the customer fingerprints, the result should always be groups of customers with similar spending habits, no matter which clustering algorithm is chosen.
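Since the choice of clustering algorithm is left open, here is one possible sketch using a bare-bones k-means in plain Python. K-means is my pick for the example, not necessarily what the authors used, and both the deterministic seeding with the first k vectors and the toy fingerprints are assumptions made to keep the example small and reproducible.

```python
import math

def kmeans(vectors, k, iterations=20):
    """Plain k-means on lists of floats; returns one cluster label per vector."""
    # Assumption: seed the centroids with the first k vectors (deterministic, not robust).
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda j: math.dist(v, centroids[j]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy customer fingerprints (hypothetical) over two subclasses: [water, cat_food].
fingerprints = {
    "alice": [0.5, 10.0],
    "bob": [1.2, 0.4],
    "carol": [1.4, 0.0],
    "dave": [0.6, 8.0],
}
names = list(fingerprints)
labels = kmeans([fingerprints[n] for n in names], k=2)
clusters = dict(zip(names, labels))
print(clusters)
```

Here the two cat food enthusiasts end up together and the two water buyers form the other group. In a real project you would probably reach for a library implementation with a smarter initialization.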
Top Products List 🧾
The last step of the algorithm is the one that actually outputs the products list, and can be summarized as follows:
Consider a customer Alice, assigned to “Cluster 1”. First of all, a list of candidate products for Alice is built from the most popular products among the other members of Cluster 1.
Then, after excluding products already bought by Alice, each product in the list gets a score for its affinity to Alice’s interests (i.e. the similarity between the two fingerprints). Because of the choice of Pₛⁱ values (using association rules), Alice might turn out to be similar to products in a category she has never considered before, and this is exactly one of the keys of recommendation.
Finally, only top-ranked products are suggested to Alice, and if needed we can break the ties using some heuristics (such as ‘always favour the product which ensures the greatest margin’).
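The three steps above can be sketched as follows. This is only an illustration under my own assumptions: the function `recommend`, its parameters `pool`, `top_n` and `excluded`, and all the toy customers, products, fingerprints and margins are hypothetical.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equally sized vectors (0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def recommend(customer, clusters, purchases, customer_fp, product_fp, margin,
              excluded=(), pool=10, top_n=3):
    # 1. Candidates: most popular products among the other members of the cluster.
    peers = [c for c, label in clusters.items()
             if label == clusters[customer] and c != customer]
    popularity = Counter(p for c in peers for p in purchases[c])
    candidates = [p for p, _ in popularity.most_common(pool)]
    # 2. Drop products already bought and excluded ("sensitive") items.
    already = set(purchases[customer])
    candidates = [p for p in candidates if p not in already and p not in excluded]
    # 3. Rank by fingerprint similarity; ties are broken by the greatest margin.
    ranked = sorted(
        candidates,
        key=lambda p: (cosine(customer_fp[customer], product_fp[p]), margin[p]),
        reverse=True,
    )
    return ranked[:top_n]

# Toy data (hypothetical). Fingerprints are over [cat_food, cat_litter, cat_toys, water].
clusters = {"alice": 0, "dave": 0, "erin": 0, "bob": 1}
purchases = {
    "alice": {"friskies_liver_250g"},
    "dave": {"friskies_liver_250g", "cat_litter_5l", "cat_toy", "cigarettes"},
    "erin": {"cat_litter_5l", "cat_toy", "still_water_6x1l"},
    "bob": {"still_water_6x1l"},
}
customer_fp = {"alice": [10.0, 0.0, 0.0, 0.5]}
product_fp = {
    "cat_litter_5l": [0.5, 1.0, 0.0, 0.0],
    "cat_toy": [0.5, 0.0, 1.0, 0.0],
    "still_water_6x1l": [0.0, 0.0, 0.0, 1.0],
}
margin = {"cat_litter_5l": 0.30, "cat_toy": 0.45, "still_water_6x1l": 0.10}

suggestions = recommend("alice", clusters, purchases, customer_fp, product_fp,
                        margin, excluded={"cigarettes"}, top_n=2)
print(suggestions)
```

In this example the cat litter and the cat toy have exactly the same similarity to Alice, so the margin heuristic decides their order. Both come from subclasses Alice has never bought, and they score well only because their fingerprints carry an association with cat food.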
3. Final Thoughts and Improvements
As you may have noticed, this was just a brief presentation of the huge work proposed by the IBM Research Division, but I hope it lets you grasp the idea.
At Quantyca, we started from this baseline to propose custom algorithms, with a much more refined customer fingerprint and the use of business heuristics to break ties and to expand the product fingerprint.
I believe this can be a really effective solution, both because of its explainability and because it can be tuned as you like.
I hope you’ve found it interesting, so let me know what you think and feel free to get in touch on LinkedIn! 😄