Recommender systems

With the rapid growth of the World Wide Web and the sheer number and diversity of products and services on offer, buyers need guidance through their experience, and companies need to sell more and more items. It is therefore no surprise that recommender systems are among the most successful and popular applications of data science in business, since they help companies match their customers' preferences and maximize their average revenue per user (ARPU).
Recommender systems (RS) are algorithms that leverage historical data to recommend or suggest a particular product, service, person or entity by inferring the strength of the correlation between them. RS are used across industries to make companies more customer-centric and to better serve their clients, by:
- Boosting product sales using next-best-offer techniques for up-selling and cross-selling, and Market Basket Analysis for hypermarkets and e-commerce websites (Walmart, Carrefour, Amazon, eBay, …)
- Guessing customers' preferences: news, music, videos, movies, … (Spotify, YouTube, Netflix, …)
- Assessing people's affinity and suitability: friendship, dating, job recruiting, … (Facebook, Match, LinkedIn)
What is the input data of a recommender system:

Types of recommender systems:

I. Memory-based RS:
Memory-based RS calculate the similarity between users or between items using neighborhood techniques (similarity measures). The most common ones are the Euclidean distance, the cosine similarity and the Pearson correlation coefficient.
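
As a rough illustration, the three measures can be written in a few lines of plain Python. The function names and the two rating vectors are made up for this sketch. Pearson correlation is computed here as the cosine similarity of the mean-centered vectors, which is one standard way to express it.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rating vectors (0 = identical)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def pearson_correlation(a, b):
    """Cosine similarity of the mean-centered vectors, in [-1, 1]."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return cosine_similarity([x - mean_a for x in a], [y - mean_b for y in b])

# Hypothetical ratings given by two users to the same four items.
alice = [5, 3, 4, 4]
bob = [3, 1, 2, 2]

print(euclidean_distance(alice, bob))
print(cosine_similarity(alice, bob))
print(pearson_correlation(alice, bob))
```

In this toy data Bob rates every item two points lower than Alice. The Euclidean distance sees them as far apart, while the Pearson correlation, which ignores each user's average rating level, sees them as having essentially the same taste.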

Memory-based RS algorithms are:
· Collaborative filtering: the basic form of RS, which uses only “users' ratings of items”
o Item-based CF: bases recommendations on the relationships between items, to identify those that are bought together. We can summarize the algorithm in 3 steps:
- Transform the user-item ratings into a matrix
- Calculate the cosine similarity for each pair of items to populate the item-to-item similarity matrix
- Predict the active user's ratings for the items they have not rated yet, based on the most similar items they did rate, and recommend the top-scoring ones
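
The three item-based steps can be sketched as follows. This is a minimal illustration, not a production implementation: the users, items and ratings are invented, a 0 stands for "not rated" (and is treated as a plain 0 when computing the cosine, which is a simplification), and the prediction is assumed to be a similarity-weighted average of the user's own ratings.

```python
import math

# Step 1: user-item rating matrix (rows = users, columns = items, 0 = not rated).
# Names and ratings are made up for illustration.
items = ["laptop", "mouse", "keyboard", "blender"]
ratings = {
    "ana":   [5, 4, 0, 1],
    "ben":   [4, 5, 4, 0],
    "chloe": [0, 1, 0, 5],
    "dev":   [5, 4, 5, 1],
}

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Step 2: item-to-item similarity matrix, built from the columns of the rating matrix.
columns = [[row[j] for row in ratings.values()] for j in range(len(items))]
item_sim = [[cosine(ci, cj) for cj in columns] for ci in columns]

def recommend(user, top_n=2):
    """Step 3: score each unrated item by the similarity-weighted average
    of the ratings the user gave to the other items."""
    row = ratings[user]
    scores = {}
    for j, name in enumerate(items):
        if row[j] != 0:
            continue  # the user already rated this item
        rated = [(item_sim[j][k], row[k]) for k in range(len(items)) if row[k] != 0]
        weight = sum(s for s, _ in rated)
        if weight > 0:
            scores[name] = sum(s * r for s, r in rated) / weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("ana"))
```

Here "ana" has not rated the keyboard. Because the keyboard column is close to the laptop and mouse columns, both of which she rated highly, the predicted rating for the keyboard is high and it is recommended.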

o User-based filtering: bases recommendations on the relationships between users (people who have the same preferences will buy the same items). We can summarize the algorithm in 3 macro steps:
- Build, for each user, a matrix of their interactions with items (bought, viewed, rated)
- Calculate similarity scores between users using the Euclidean distance
- Provide recommendations based on the scores that similar users gave to the items not yet rated by the active user.
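
A minimal sketch of the user-based steps, under a few assumptions that are not in the text above: the data is invented, the Euclidean distance is computed only over the items two users have both rated, and it is converted to a similarity with 1 / (1 + distance) so that identical users score 1.

```python
import math

# Step 1: what each user rated (hypothetical data; a missing key means "not rated").
ratings = {
    "ana":   {"laptop": 5, "mouse": 4, "blender": 1},
    "ben":   {"laptop": 4, "mouse": 5, "keyboard": 4},
    "chloe": {"mouse": 1, "blender": 5, "toaster": 4},
    "dev":   {"laptop": 5, "mouse": 4, "keyboard": 5, "blender": 1},
}

def similarity(u, v):
    """Step 2: turn the Euclidean distance over co-rated items into a score in (0, 1]."""
    shared = ratings[u].keys() & ratings[v].keys()
    if not shared:
        return 0.0
    dist = math.sqrt(sum((ratings[u][i] - ratings[v][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)

def recommend(user, top_n=2):
    """Step 3: predict unrated items as the similarity-weighted average
    of the ratings given by the other users."""
    totals, weights = {}, {}
    for other in ratings:
        if other == user:
            continue
        sim = similarity(user, other)
        if sim == 0.0:
            continue  # nothing in common with this user
        for item, r in ratings[other].items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * r
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: totals[item] / weights[item] for item in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend("ana"))
```

In this toy data "dev" agrees with "ana" on every shared item, so his rating of the keyboard dominates her prediction, while the toaster is scored only from "chloe", whose taste is far from hers.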
· Content-based RS: in addition to “users' ratings of items”, it also uses “item properties” to build a recommender system that recommends to customer X items similar to those X previously rated highly, in 3 macro steps:
- Build the item profile as a vector whose elements are features that can be binary or float (examples: movie features [action, romance, horror, fantasy, drama], series features [actor1, actor2, actor3, actor4, ..], home appliance features [color, electric/manual, usage, size])
- Infer the user profile by adding the vectors of the items purchased by the user and computing the average vector, which is the user profile
- Calculate the cosine similarity between user profile A and item profile b: cos(θ) = (A · b) / (||A|| ||b||). The smaller θ is, the more likely user A will like item b.
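
The content-based steps can be sketched like this. The movie titles and their binary genre vectors are placeholders invented for the example; only the feature names come from the list above.

```python
import math

# Step 1: item profiles over binary genre features (hypothetical movies).
features = ["action", "romance", "horror", "fantasy", "drama"]
item_profiles = {
    "Movie A": [1, 0, 0, 1, 0],
    "Movie B": [1, 0, 1, 0, 0],
    "Movie C": [0, 1, 0, 0, 1],
    "Movie D": [1, 0, 0, 1, 1],
}

def user_profile(purchased):
    """Step 2: average the profile vectors of the items the user purchased."""
    vectors = [item_profiles[name] for name in purchased]
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Step 3: cos(theta) = (A . b) / (||A|| * ||b||); 0.0 if a vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend(purchased):
    """Rank the items the user has not purchased by similarity to the user profile."""
    profile = user_profile(purchased)
    scores = {name: cosine(profile, vec)
              for name, vec in item_profiles.items() if name not in purchased}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(user_profile(["Movie A", "Movie B"]))
print(recommend(["Movie A", "Movie B"]))
```

A user who bought the two action movies gets a profile weighted toward action, so "Movie D" (action, fantasy, drama) ranks above "Movie C" (romance, drama), which shares no feature with the profile.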

· Context-aware filtering: in addition to “users' ratings of items” and “item properties”, it takes context information into account, such as time, location, weather, persona, social media and so on, to provide better recommendations. In the image below, we add weather as context information.
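
One simple way to use context, shown here only as a sketch, is contextual pre-filtering: keep the ratings that were given in the current context and rank items on that subset. The technique name, the data and the function name are assumptions of this example, and the ranking by mean rating is deliberately naive. In practice the filtered ratings would feed one of the recommenders described above.

```python
from collections import defaultdict

# Hypothetical ratings tagged with the weather in which they were given:
# (user, item, rating, weather)
ratings = [
    ("ana", "ice cream", 5, "sunny"),
    ("ana", "hot chocolate", 2, "sunny"),
    ("ben", "ice cream", 4, "sunny"),
    ("ben", "hot chocolate", 5, "rainy"),
    ("chloe", "ice cream", 2, "rainy"),
    ("chloe", "hot chocolate", 4, "rainy"),
]

def recommend_in_context(weather, top_n=1):
    """Keep only the ratings given in this context, then rank items by mean rating."""
    by_item = defaultdict(list)
    for _user, item, rating, ctx in ratings:
        if ctx == weather:
            by_item[item].append(rating)
    means = {item: sum(rs) / len(rs) for item, rs in by_item.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

print(recommend_in_context("sunny"))
print(recommend_in_context("rainy"))
```

The same catalogue produces a different top item depending on the weather, which is the effect the context information is meant to capture.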

· Hybrid filtering: combines collaborative filtering with content-based or context-based methods to build a more robust RS, for example with the weighted method, which is a linear combination of the weighted scores of several RS.
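
The weighted method can be sketched as a plain linear combination of scores. The scores and the 0.7 / 0.3 weights below are invented, and the sketch assumes the scores of each recommender were already scaled to the same range.

```python
def weighted_hybrid(score_sets, weights):
    """Linear combination: final(item) = sum over recommenders of weight * score(item).
    An item missing from one recommender simply contributes 0 for that recommender."""
    if len(score_sets) != len(weights):
        raise ValueError("one weight per recommender is required")
    combined = {}
    for scores, w in zip(score_sets, weights):
        for item, s in scores.items():
            combined[item] = combined.get(item, 0.0) + w * s
    return sorted(combined.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical scores in [0, 1] from two recommenders for the same user.
collaborative = {"keyboard": 0.9, "toaster": 0.4, "monitor": 0.6}
content_based = {"keyboard": 0.5, "toaster": 0.8, "monitor": 0.7}

ranking = weighted_hybrid([collaborative, content_based], weights=[0.7, 0.3])
print(ranking)
```

The weights express how much each recommender is trusted. They are usually tuned on held-out data rather than fixed by hand as they are here.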

Memory-based shortcomings:
Memory-based recommender systems have performed very well over the years. However, they have 2 main shortcomings:
- Computationally expensive: the user-item or item-item matrix is loaded in memory for the similarity calculations
- Cold-start problem: they fail to produce recommendations for first-time users and new items.
II. Model-based recommender system:
Model-based techniques use a training stage to learn parameters or patterns with an optimization algorithm such as gradient descent, without having to use the whole dataset each time. Instead of calculating the whole similarity matrix to estimate user preferences, we approximate the nearest-neighbor search with a machine learning model, such as a neural network used as a function approximator. This offers the benefits of both speed and scalability.
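
To make the idea of "learning parameters with gradient descent" concrete, here is a small sketch that uses matrix factorization. This model is chosen for the illustration and is not named above. Each user and each item gets a short vector of learned factors, and a rating is predicted as their dot product. The data, function names and hyperparameters (k, lr, reg, epochs) are all assumptions of the sketch.

```python
import random

def train_mf(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02, epochs=2000, seed=0):
    """Learn user factors P and item factors Q by stochastic gradient descent
    on the squared prediction error, with L2 regularization."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(k))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on both factor vectors for this one rating.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating = dot product of the user and item factor vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))

# Hypothetical observed ratings as (user index, item index, rating).
observed = [(0, 0, 5), (0, 1, 4), (1, 0, 4), (1, 1, 5), (1, 2, 4),
            (2, 1, 1), (2, 3, 5), (3, 0, 5), (3, 2, 5), (3, 3, 1)]

P, Q = train_mf(observed, n_users=4, n_items=4)
print(round(predict(P, Q, 0, 2), 2))  # prediction for a (user, item) pair never observed
```

Once training is done, only the two small factor matrices are needed to score any user-item pair, which is where the speed and scalability benefits come from.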
Conclusion:
Thanks to the good results and benefits of recommender systems, large companies invest heavily in research to keep improving the accuracy of their recommendation engines, which makes this a very promising field.
