Recommender System for a Product in Its Early Stage

How to build a recommender system for an MVP or the first version of a product in practice

Ena Zunic-Cejvanovic

Published in

Ministry of Programming — Technology

6 min read · May 8, 2020

Recommender systems help users discover relevant items. They are everywhere in our daily online journeys:

e-commerce (e.g. Amazon);
online music and video platforms (e.g. YouTube, Netflix);
friend and content suggestions on social media (e.g. Facebook).

There are three major types of recommender systems: content-based, collaborative filtering and hybrid methods. They usually assume that some data is already available. Content-based methods recommend items whose content is similar to what the user already likes. Collaborative filtering methods use past interactions recorded between users and items, such as clicks, views, purchases, likes and ratings, to produce new recommendations. Hybrid methods combine the previous two.

A typical issue for collaborative filtering systems is the cold start problem: for new users or items with little or no information, recommendations can’t be accurate. So, what if you are building your product from scratch? What if, during development, you need to make item suggestions to your users quickly for the MVP or version 1? You don’t have much data and you don’t have any historical user actions. Well, don’t panic, there is always a way!

Gift Suggestions

My team’s task was to build Gift Suggestions, a recommender system that helps a user pick the most suitable gift for her/his client.

Based on our available data sources, we decided to use two string matching algorithms combined with score thresholds and score calculations. The data sources for the client’s interests and restrictions and for the product’s matchings and restrictions are:

User’s manual input of her/his clients’ information, and
Chat’N’Swipe survey.

We use this data as input to the matching algorithm that produces the gift suggestions. The matching algorithm combines two string matching algorithms and follows a content-based recommender system approach. Because time was short, we wrote the algorithm in JavaScript so that we could integrate it with the Node.js back end. The Node Package Manager (npm) registry contains suitable libraries that implement the desired matching algorithms. We produced the training, validation and test data with a data generator for products and their attributes.

The decision flow for what the recommender system should include.

Chat’N’Swipe

Chat’N’Swipe is a survey that helps a user gather a client’s personal information. It tells us about the client’s dietary restrictions or allergies, food and beverage preferences, and interests. Some swipe cards depend on others.

For example, if a client doesn’t like coffee (swipe left), the other coffee-related swipe cards are skipped.
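
One way to model such relationships is to let each card name the card it depends on. The sketch below is only an illustration: the card ids, the `dependsOn` field and the `nextCards` helper are hypothetical and are not taken from the app’s actual data model.

```javascript
// Hypothetical swipe cards: a card with `dependsOn` is shown only
// if the client swiped right (liked) on the card it depends on.
const cards = [
  { id: "coffee", question: "Likes coffee?" },
  { id: "espresso", question: "Likes espresso?", dependsOn: "coffee" },
  { id: "latte", question: "Likes latte?", dependsOn: "coffee" },
  { id: "tea", question: "Likes tea?" },
];

// `answers` maps a card id to true (swipe right) or false (swipe left).
// Returns the unanswered cards that are still eligible to be shown.
function nextCards(cards, answers) {
  return cards.filter((card) => {
    if (card.id in answers) return false; // already answered
    if (!card.dependsOn) return true; // independent card
    return answers[card.dependsOn] === true; // parent card must be liked
  });
}

// The client swiped left on coffee, so coffee-related cards are skipped.
console.log(nextCards(cards, { coffee: false }).map((c) => c.id)); // [ 'tea' ]
```

With a right swipe on coffee instead, the espresso and latte cards would follow before tea.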

Matching algorithms

Two string matching algorithms are used:

Fuzzy String Matching, and
String similarity algorithm.

Fuzzywuzzy string matching

Fuzzy string matching, as implemented by Fuzzywuzzy, is the process of finding strings that approximately match a pattern. Its applications include spell-checking, DNA analysis, spam detection and plagiarism detection. This matching algorithm is based on the Levenshtein distance.

The Levenshtein distance between two words is the smallest number of single-character edits (i.e. insertions, deletions, or substitutions) required to change one word into the other.
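
To make the definition concrete, here is a minimal sketch of the Levenshtein distance using the classic dynamic programming approach. The function name `levenshtein` is ours, and the libraries on npm may implement the distance differently (for example with different edit costs).

```javascript
// Levenshtein distance via dynamic programming, keeping one row at a time.
function levenshtein(a, b) {
  // Row 0: turning "" into the first j characters of b takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i]; // turning i characters of a into "" takes i deletions
    for (let j = 1; j <= b.length; j++) {
      const substitution = prev[j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1);
      const deletion = prev[j] + 1;
      const insertion = curr[j - 1] + 1;
      curr[j] = Math.min(substitution, deletion, insertion);
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("kitten", "sitting")); // 3
console.log(levenshtein("coffee", "cofee")); // 1
```

A small distance relative to the string length is what lets the matcher tolerate typos such as "cofee".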

There are different Fuzzywuzzy scoring methods:

fuzz.ratio compares the two input strings as a whole and returns a similarity score from 0 to 100 based on the edit distance;
fuzz.partial_ratio takes the shorter string and matches it against the substrings of the longer string, keeping the best score;
fuzz.token_sort_ratio splits both strings into tokens, sorts them and compares the results, so it accounts for similar strings whose words are out of order.

For example:

For the strings “Catherine M Gitau” and “Catherine Gitau”, ratio() and partial_ratio() would give similarity scores of 91 and 100, respectively.

But if we switch the order of the words in the second string, ratio() for “Catherine M Gitau” and “Gitau Catherine” gives a score of 55 and partial_ratio() gives 60. This is where fuzz.token_sort_ratio is useful: it gives a score of 94.
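
The idea behind token sorting can be sketched in a few lines. This is a simplified stand-in, not the library’s code: `lcsLength`, `ratio` and `tokenSortRatio` are our own helper names, and real implementations differ in preprocessing and matching details, so their plain ratio scores may not match this sketch exactly.

```javascript
// Length of the longest common subsequence of two strings.
function lcsLength(a, b) {
  let prev = new Array(b.length + 1).fill(0);
  for (let i = 1; i <= a.length; i++) {
    const curr = [0];
    for (let j = 1; j <= b.length; j++) {
      curr[j] = a[i - 1] === b[j - 1]
        ? prev[j - 1] + 1
        : Math.max(prev[j], curr[j - 1]);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Simplified ratio: 100 * 2 * (matched characters) / (total characters).
function ratio(a, b) {
  const total = a.length + b.length;
  if (total === 0) return 100;
  return Math.round((100 * 2 * lcsLength(a, b)) / total);
}

// Lowercase, split into tokens, sort them, and compare the re-joined strings.
function tokenSortRatio(a, b) {
  const normalize = (s) =>
    s.toLowerCase().split(/\s+/).filter(Boolean).sort().join(" ");
  return ratio(normalize(a), normalize(b));
}

const full = "Catherine M Gitau";
const reordered = "Gitau Catherine";
// After sorting, the inputs become "catherine gitau m" and "catherine gitau",
// so word order no longer lowers the score.
console.log(tokenSortRatio(full, reordered)); // 94
console.log(ratio(full, reordered) < tokenSortRatio(full, reordered)); // true
```

Sorting the tokens first is a cheap normalization step, which makes it a good fit for free-text input where users type names or interests in any order.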

String similarity algorithm

The string similarity algorithm finds the degree of similarity between two strings, based on Dice’s coefficient.

For example, to calculate the similarity between night and nacht, we first find the set of character bigrams in each word: {ni, ig, gh, ht} and {na, ac, ch, ht}.

Example: the similarity between night and nacht.

Each set has four elements, and the intersection of these two sets has only one element: ht.

The number of character bigrams found in both strings is 1.

The number of bigrams in each word is 4.

Dice’s coefficient is twice the number of shared bigrams divided by the total number of bigrams in both strings. Inserting these numbers into the formula, we get the string similarity matching score: s = (2 · 1) / (4 + 4) = 0.25.
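
The same calculation can be written as a short function. This is a minimal sketch of Dice’s coefficient on character bigrams; the names `bigrams` and `diceSimilarity` are ours, and npm libraries may preprocess the input differently (for example by removing whitespace).

```javascript
// Collect the character bigrams of a string together with their counts.
function bigrams(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const gram = s.slice(i, i + 2);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Dice's coefficient: 2 * shared bigrams / (bigrams in a + bigrams in b).
function diceSimilarity(a, b) {
  const total = Math.max(a.length - 1, 0) + Math.max(b.length - 1, 0);
  if (total === 0) return a === b ? 1 : 0; // no bigrams to compare
  const gramsA = bigrams(a);
  let shared = 0;
  for (const [gram, countB] of bigrams(b)) {
    shared += Math.min(countB, gramsA.get(gram) || 0);
  }
  return (2 * shared) / total;
}

console.log(diceSimilarity("night", "nacht")); // 0.25
```

Identical strings score 1 and strings with no bigram in common score 0, which makes the result easy to compare against a threshold.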

Correlations of the input parameters

The input parameters are the product’s matchings and restrictions and the client’s interests and dietary restrictions or allergies.

First, we check whether the client’s restriction strings match the product’s restriction strings or the product’s matching strings. Two strings match when the algorithm gives them a score above a certain threshold. If there is such a match, we exclude the product from the gift suggestions. The threshold is the smallest score that is enough for the algorithm to recognize two strings as the same, even if they contain some typos. We found the threshold scores by testing the algorithm on several data sets.

On the other hand, if the client’s interest strings match the product’s matching strings or the product’s restriction strings, these scores are summed and included in the final matching score. Since the matching results depend on the product’s matching and restriction attributes, there is a guide for writing proper product tags.
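
The two steps, exclusion first and scoring second, can be sketched as follows. This is an illustration under stated assumptions rather than the production code: it uses a single bigram-based matcher, the threshold value 0.8 is made up (the article does not state the tuned thresholds), and the object shapes and function names are hypothetical.

```javascript
// Dice's coefficient on character bigrams, used here as the string matcher.
function similarity(a, b) {
  const grams = (s) => {
    const t = s.toLowerCase();
    const out = [];
    for (let i = 0; i < t.length - 1; i++) out.push(t.slice(i, i + 2));
    return out;
  };
  const ga = grams(a);
  const gb = grams(b);
  if (ga.length + gb.length === 0) return 0;
  const pool = [...gb];
  let shared = 0;
  for (const g of ga) {
    const idx = pool.indexOf(g);
    if (idx !== -1) {
      shared++;
      pool.splice(idx, 1); // each bigram of b can be matched only once
    }
  }
  return (2 * shared) / (ga.length + gb.length);
}

const THRESHOLD = 0.8; // assumed value; the real thresholds were found by testing

// Scores of all string pairs (one from each list) that reach the threshold.
function matchedScores(listA, listB) {
  const scores = [];
  for (const a of listA) {
    for (const b of listB) {
      const s = similarity(a, b);
      if (s >= THRESHOLD) scores.push(s);
    }
  }
  return scores;
}

// Returns null when the product must be excluded, otherwise its matching score.
function scoreProduct(client, product) {
  const productStrings = [...product.restrictions, ...product.matchings];
  if (matchedScores(client.restrictions, productStrings).length > 0) return null;
  return matchedScores(client.interests, productStrings)
    .reduce((sum, s) => sum + s, 0);
}

const client = { interests: ["cooking", "art"], restrictions: ["peanut"] };
const apron = { matchings: ["cooking"], restrictions: [] };
const pralines = { matchings: ["chocolate"], restrictions: ["peanuts"] };

console.log(scoreProduct(client, apron)); // 1
console.log(scoreProduct(client, pralines)); // null
```

Note that "peanut" and "peanuts" still match even though the strings differ, which is exactly the tolerance the threshold is meant to provide.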

Product tags

Product tags for a coffee product.

“[M]” corresponds to the matching product attributes, while “[R]” corresponds to the product’s restriction attributes.
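
As an illustration of how such tags could be turned into input for the matching algorithm, here is a small parsing sketch. The exact tag format is an assumption on our part (a "[M]" or "[R]" prefix followed by free text), and the example tags and the `parseProductTags` helper are hypothetical.

```javascript
// Assumed tag format: "[M] <text>" for matching attributes and
// "[R] <text>" for restriction attributes. Other tags are ignored.
function parseProductTags(tags) {
  const product = { matchings: [], restrictions: [] };
  for (const tag of tags) {
    const m = /^\[(M|R)\]\s*(.+)$/.exec(tag.trim());
    if (!m) continue; // not a matching or restriction tag
    const value = m[2].trim().toLowerCase();
    (m[1] === "M" ? product.matchings : product.restrictions).push(value);
  }
  return product;
}

console.log(parseProductTags(["[M] coffee", "[M] Espresso", "[R] caffeine"]));
// { matchings: [ 'coffee', 'espresso' ], restrictions: [ 'caffeine' ] }
```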

Gift suggestions — examples in the app

Products with the largest matching scores are the first suggestions. If there are no matching products for the user’s client, we show the most popular items.
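
The ranking and the fallback can be sketched as below. The `suggestGifts` function, the shape of its inputs and the default limit of three suggestions are assumptions made for this example.

```javascript
// `scored` holds { product, score } pairs for products that were not excluded.
// `popular` is a list of products already ordered by popularity.
function suggestGifts(scored, popular, limit = 3) {
  const matches = scored
    .filter((entry) => entry.score > 0)
    .sort((x, y) => y.score - x.score) // largest matching score first
    .map((entry) => entry.product);
  // Fall back to the most popular items when nothing matched.
  return (matches.length > 0 ? matches : popular).slice(0, limit);
}

const scored = [
  { product: "apron", score: 1.0 },
  { product: "paint set", score: 1.9 },
  { product: "mug", score: 0 },
];
console.log(suggestGifts(scored, ["candle", "notebook"])); // [ 'paint set', 'apron' ]
console.log(suggestGifts([], ["candle", "notebook"])); // [ 'candle', 'notebook' ]
```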

Gift suggestions (second and third photos) for John Appleseed, who likes cooking and art but has tree nut and peanut allergies (first photo).

Well, the algorithm works! Simplification, smart use of resources and good presentation often go a long way.

This article focuses on a recommender system for a product in its early stage. Please keep in mind that this recommender system might not be sufficient for later versions of a product. Adding collaborative filtering might be necessary.

Thanks a lot to Emin Laletovic, who reviewed this article.

References:

faker: Generate massive amounts of fake contextual data. www.npmjs.com

Fuzzy String Matching in Python: Introduction to Fuzzywuzzy in Python. towardsdatascience.com

Sørensen-Dice coefficient. en.wikipedia.org
