Real-World Challenges And Their Solutions — Part 12
A Practical Approach To Recommendation Systems
Table of Contents:
- Introduction and Recommendation Framework
- Evaluating Recommendation Systems
- Content-Based Recommendations
- Neighborhood Based Collaborative Filtering
- User and Item Based Collaborative Filtering
- KNN Recommendations
- Matrix Factorisation
- Deep Learning — Introduction
- Restricted Boltzmann Machines
- AutoRecs
- Amazon DSSTNE and SageMaker
- Real-World Challenges and Solutions
Not only for recommendation systems but for real-world systems in general, we could go on and on about challenges. Some of the most common and important ones are listed below, along with their solutions.
We will discuss:
- Cold Start Problem
- Stoplists
- Filter Bubbles
- Transparency and Trust
- Outliers
- Gaming the system
- Implicit data — Explicit problems
- International laws and Markets
- Dealing with time
- Value aware recommendations
1. COLD START PROBLEM
The cold start problem arises when new users or new items come in. We cannot make predictions for new users because we have no data about them yet. Similarly, for new items, we cannot predict whom to recommend them to.
- New user (what to recommend?)
- New items (Whom to recommend?)
NEW USER COLD START SOLUTIONS
i. Use implicit data
ii. Use browser cookies to identify users’ sessions
iii. Geo-IP
- Recommending the same items to users from the same geography may seem crude, but it is better than recommending nothing.
- Moreover, think about people in the north buying more sweatshirts than people near the equator, or think about seasonal products.
iv. Recommend top sellers / do promotions
v. Interview the user
- Was used by Netflix in its early days
- Pinterest nailed this art
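A minimal sketch of the Geo-IP and top-seller fallbacks above, assuming a hypothetical sales log of `(item_id, region)` tuples, a region already resolved from the user's IP, and a hypothetical `personalized` callable standing in for whatever model serves users who do have history:

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Tuple

def top_sellers(purchases: List[Tuple[str, str]], n: int,
                region: Optional[str] = None) -> List[str]:
    """Return the n most-purchased item ids, optionally limited to one region."""
    counts = Counter(item for item, r in purchases if region is None or r == region)
    return [item for item, _ in counts.most_common(n)]

def recommend(user_id: str,
              history: Dict[str, List[str]],
              purchases: List[Tuple[str, str]],
              personalized: Callable[[str, int], List[str]],
              n: int = 3,
              region: Optional[str] = None) -> List[str]:
    """Use the (hypothetical) personalized model when the user has history; otherwise fall back."""
    if history.get(user_id):
        return personalized(user_id, n)
    # New user: start with top sellers in their Geo-IP region...
    recs = top_sellers(purchases, n, region)
    # ...then pad with global top sellers if the region has too little data.
    for item in top_sellers(purchases, len(purchases)):
        if len(recs) >= n:
            break
        if item not in recs:
            recs.append(item)
    return recs
```

For a brand-new user in the "north" region, this returns regional best sellers such as sweatshirts first, and only borrows from the global list when the regional data runs out.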
NEW ITEM COLD START SOLUTIONS
i. Just don’t worry about it
- The item may still show up in search results, or people may search for it because of its promotions
- In practice, this is perfectly reasonable. But some researchers have proposed other solutions, listed below
ii. Content-based Recommendations
- Using it alone is not a good idea
- Augment user behavior data with it; that solves the cold start problem right there
iii. Map content attributes to the hidden (latent) features found by matrix factorization / deep learning, and use user behavior data as support. It is a complicated idea (e.g., learnAROMA)
iv. Random Exploration
- Use some recommendation slots to show new items at random, to gather information about how users respond to them
- Not really a good approach
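A minimal sketch of option ii, augmenting behavior data with content: a new item is scored by the cosine similarity of its attributes (binary vectors stored as sets, e.g. genres) to the items a user liked, and a hypothetical blend weight shifts toward the collaborative-filtering score as the item collects ratings. The `k` constant and the assumption that CF scores lie in [0, 1] are illustrative choices, not values from the original:

```python
import math
from typing import Dict, List, Set

def content_similarity(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity of two binary attribute vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))

def content_score(user_liked: List[str], item: str, attrs: Dict[str, Set[str]]) -> float:
    """Average content similarity between a candidate item and the items the user liked."""
    if not user_liked:
        return 0.0
    total = sum(content_similarity(attrs[item], attrs[liked]) for liked in user_liked)
    return total / len(user_liked)

def hybrid_score(cf_score: float, num_ratings: int, content: float, k: int = 20) -> float:
    """Blend a CF score (assumed in [0, 1]) with a content score.

    With zero ratings the item is scored purely on content; the CF share
    grows as num_ratings increases (alpha = num_ratings / (num_ratings + k)).
    """
    alpha = num_ratings / (num_ratings + k)
    return alpha * cf_score + (1 - alpha) * content
```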
2. STOPLISTS
“It is easy to offend people without even intending to”
Stoplists are used to keep unwanted or inappropriate items from being recommended to users. Some topics are simply too touchy for us to deal with.
In 2006, Walmart’s website paired ‘Martin Luther King’ with lots of different movies including movies like ‘Planet of the Apes’
- Many didn’t like it
- Walmart had to apologize and donate a significant amount of money
- Walmart scrapped its entire recommendation system (people probably lost their jobs!)
Based on certain keywords (in the title, description or category), our recommendation system shouldn't even know such items exist! (Keep them out of the training process.)
Note: You don’t want to end up on the front page of a newspaper like Walmart!
Stoplists commonly include keywords related to:
- Adult content
- Vulgarity
- Legally prohibited topics
- Terrorism / Political extremism
- Bereavement / Medical
- Drug use
- Religion
Some believe that Google’s results were “played” in the case of Donald Trump (the “idiot” image-search example).
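The keyword-based exclusion described above can be sketched as a filter that runs before training, so stoplisted items never reach the model. The `STOPLIST` contents and the item dictionary fields are hypothetical, and the whole-word match is deliberately simple: a production stoplist would need curated terms, stemming ("drug" vs. "drugs") and phrase matching:

```python
import re
from typing import Dict, List, Set, Tuple

# Hypothetical, tiny stoplist for illustration; a real one is curated and much longer.
STOPLIST: Set[str] = {"adult", "drug", "terrorism", "bereavement"}

def is_stoplisted(item: Dict[str, str], stoplist: Set[str] = STOPLIST) -> bool:
    """True if any whole word in the title, description or category is on the stoplist."""
    text = " ".join(item.get(field, "") for field in ("title", "description", "category"))
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return not words.isdisjoint(stoplist)

def filter_catalog(items: List[Dict[str, str]], stoplist: Set[str] = STOPLIST) -> List[Dict[str, str]]:
    """Keep only items that are safe to expose to the recommender."""
    return [item for item in items if not is_stoplisted(item, stoplist)]

def filter_ratings(ratings: List[Tuple[str, str, float]], allowed_ids: Set[str]) -> List[Tuple[str, str, float]]:
    """Drop (user, item_id, rating) rows for stoplisted items before training."""
    return [r for r in ratings if r[1] in allowed_ids]
```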
3. FILTER BUBBLES
- The problem that arises when you only show people things that appeal to their ‘existing interests’!
For example, I once watched so many videos from one YouTube channel that YouTube started recommending me videos only from that channel.
- It is called a filter bubble because the content presented is filtered in a way that keeps users ‘within a bubble of pre-existing interests’
SOLUTION
Adding ‘extra’ diversity to recommendations can help users get out of the bubble
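One common way to add that extra diversity (named plainly here as a swapped-in technique, not something the original prescribes) is greedy Maximal Marginal Relevance (MMR) re-ranking: each pick trades off relevance against similarity to what has already been picked. The `similarity` function and the `lam` value are assumptions; in the YouTube example, similarity could simply be "same channel":

```python
from typing import Callable, Dict, List

def mmr_rerank(candidates: Dict[str, float],
               similarity: Callable[[str, str], float],
               n: int,
               lam: float = 0.7) -> List[str]:
    """Greedy MMR re-ranking.

    candidates: item id -> relevance score from the base recommender.
    lam: 1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected: List[str] = []
    remaining = dict(candidates)
    while remaining and len(selected) < n:
        def mmr(item: str) -> float:
            # Penalize items that are too similar to anything already selected.
            max_sim = max((similarity(item, s) for s in selected), default=0.0)
            return lam * remaining[item] - (1 - lam) * max_sim
        best = max(remaining, key=mmr)
        selected.append(best)
        del remaining[best]
    return selected
```

With three highly relevant videos from channel A and one less relevant video from channel B, `lam=0.7` surfaces the channel B video in the second slot, while `lam=1.0` reproduces the bubble.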
4. TRANSPARENCY AND TRUST
Transparency and Trust are inter-related.
- Make sure your users are familiar with at least some of your top N recommendations.
- Be transparent to earn trust: show users why an item was recommended.
- But with matrix factorization (latent features) or deep learning, it is hard to explain to the user why an item was recommended
- Transparency is a good thing, but it requires more work
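For neighborhood methods, an explanation falls out of the model itself. A minimal sketch, assuming item-based collaborative filtering where a recommendation's score is a similarity-weighted sum over the user's rated items: the "because you liked X" item is the one contributing the largest similarity × rating term. The similarity dictionary and the `min_rating` threshold are hypothetical inputs:

```python
from typing import Dict, Optional, Tuple

def explain(recommended: str,
            user_ratings: Dict[str, float],
            item_similarity: Dict[Tuple[str, str], float],
            min_rating: float = 4.0) -> Optional[str]:
    """Name the liked item that contributed most to an item-based CF recommendation."""
    def sim(a: str, b: str) -> float:
        # Similarity is symmetric, so accept the pair in either order.
        return item_similarity.get((a, b), item_similarity.get((b, a), 0.0))

    liked = {item: r for item, r in user_ratings.items() if r >= min_rating}
    if not liked:
        return None
    best = max(liked, key=lambda item: sim(recommended, item) * liked[item])
    if sim(recommended, best) <= 0:
        return None
    return f"Because you liked {best}"
```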
5. OUTLIERS
This is not restricted to recommendation systems; it is a general problem. Your results are only as good as the data you feed into the model, so filter out outliers that might skew your results in unnatural ways.
- What if a bot tries to “play” with ratings impacting recommendations significantly?
- It is not always malicious behavior: a web crawler or your own internal tools might be polluting your data
- People who review items for a living may also affect our recommendations
- Institutional buyers, whose bulk purchases do not reflect individual tastes
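Bots, crawlers, professional reviewers and institutional buyers often show up as users with an unnaturally large number of ratings. A minimal sketch of one way to filter them, assuming ratings are `(user_id, item_id, rating)` tuples and using the interquartile-range rule on per-user rating counts; the multiplier `k=1.5` is a conventional choice, not a value from the original:

```python
import statistics
from collections import Counter
from typing import List, Tuple

Rating = Tuple[str, str, float]

def remove_outlier_users(ratings: List[Rating], k: float = 1.5) -> List[Rating]:
    """Drop all ratings from users whose rating count exceeds Q3 + k * IQR."""
    counts = Counter(user for user, _, _ in ratings)
    values = list(counts.values())
    if len(values) < 2:
        # Too few users to estimate quartiles; keep everything.
        return list(ratings)
    q1, _, q3 = statistics.quantiles(values, n=4)
    upper = q3 + k * (q3 - q1)
    return [r for r in ratings if counts[r[0]] <= upper]
```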
6. GAMING THE SYSTEM
- If recommendations drive more purchases, the makers of items will look for ways to game your system into recommending their items more, or into not recommending others’ items.
- Hackers may also do it just for amusement
- E.g., Google bombs
SOLUTIONS
i. Only consider people who spent real money on an item. Voting with their wallets is a strong indication of interest
ii. If you have no purchase data, you can still take precautions
- For example, only use star ratings from people who actually purchased/consumed the items.
iii. If you allow people to rate items they haven’t actually seen/used, you are open to attack!
iv. Using click data should be your last choice, as it is trivially easy to fake.
“Always be wary of click-data”
Even if not from bots, click-data has its own set of problems.
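Solutions i and ii can be sketched as a simple join: keep a rating only if the same user actually bought the item. The tuple layouts below are assumptions for illustration:

```python
from typing import Iterable, List, Tuple

def verified_ratings(ratings: Iterable[Tuple[str, str, float]],
                     purchases: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, float]]:
    """Keep (user, item, rating) rows only where (user, item) appears in the purchase log."""
    bought = set(purchases)
    return [(u, i, r) for (u, i, r) in ratings if (u, i) in bought]
```

Ratings from accounts that never bought the item, the typical shape of a rating-stuffing attack, simply never reach training.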
7. IMPLICIT DATA — EXPLICIT PROBLEMS
Data produced as a by-product of users consuming a service, such as clickstream data or clicks on images, is called implicit data.
- Implicit data is fraught with problems
- Be extremely skeptical about building a recommendation system that depends solely on click data. Experience shows that click data:
- Is susceptible to gaming
- Has human behavior problems
Frank Kane, one of Amazon’s pioneers in the field, put it clearly: “If you ever build a system that recommends products based on the product images that people click on when they see them in an online ad, I promise you that what you build will end up as a pornography detection system. There won’t be anything you can do about it; even if you try to filter out sexual content, you will be led to things with sexual anatomy!” This is experienced advice, and he mentioned that he has seen this happen more than once!
NEVER BUILD A RECOMMENDATION SYSTEM PURELY BASED ON IMAGE CLICKS
IMPLICIT DATA IN GENERAL
- Tends to be of low quality
- Unless backed by purchases/actual consumption, it will mess up recommendations (e.g., YouTube relies heavily on implicit data)
CLICKSTREAM DATA
- Unreliable signal of interest
- What users click on and what users buy can be very different!
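One way to act on this is to weight implicit events by how strongly they signal real interest. A minimal sketch with hypothetical event types and weights: each (user, item) pair is scored by the strongest signal observed, so repeated clicks never add up to outweigh a purchase:

```python
from typing import Dict, Iterable, Tuple

# Hypothetical weights: a purchase is much stronger evidence of interest than a click.
EVENT_WEIGHTS: Dict[str, float] = {"click": 0.1, "add_to_cart": 0.5, "purchase": 1.0}

def implicit_scores(events: Iterable[Tuple[str, str, str]],
                    weights: Dict[str, float] = EVENT_WEIGHTS) -> Dict[Tuple[str, str], float]:
    """Map (user, item) to the weight of the strongest event observed for that pair.

    Taking the maximum instead of a sum means click-spamming cannot
    inflate a pair's score beyond the click weight.
    """
    scores: Dict[Tuple[str, str], float] = {}
    for user, item, event in events:
        key = (user, item)
        scores[key] = max(scores.get(key, 0.0), weights.get(event, 0.0))
    return scores
```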
8. INTERNATIONAL LAWS AND MARKETS
- Take cultural or geographical differences into account while recommending items
- Use filters based on licensing or availability
- Take privacy laws into account as well
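The licensing/availability filter can be sketched as a final pass over the ranked list. The `availability` map (item id to the set of regions where it may be shown) is a hypothetical input; items missing from it are dropped, so the filter fails closed rather than risk showing unlicensed content:

```python
from typing import Dict, List, Set

def filter_by_region(recs: List[str], region: str,
                     availability: Dict[str, Set[str]]) -> List[str]:
    """Keep only recommended items licensed/available in the user's region, preserving order."""
    return [item for item in recs if region in availability.get(item, set())]
```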
9. DEALING WITH TIME
“Temporal effects of value”
The effect of time is an under-represented topic in recommendation systems. For example, recommending Christmas items just before Christmas, or summer wear in summer, can be very profitable.
- It is an open research topic, and a good one to publish papers on
- Taking ‘Rating Recency’ into account is neither easy nor general-purpose, yet Netflix confirmed that these kinds of temporal dynamics matter. For example, the ratings someone gave yesterday are a better indication of their current taste than the ratings they gave a year ago.
- Weigh ratings by their age using some form of exponential decay; this improves the quality of recommendations. Alternatively, use rating recency as a training feature in its own right, alongside the rating itself.
- Time and place (collected via Geo-IP) also affect recommendations. Historical rating data has a bias towards the past.
For example, a Netflix recommender that doesn’t take time into account will recommend old shows or movies instead of the hot new ones its users want to see.
- If items are time-sensitive, use rating recency
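The exponential-decay weighting above can be sketched as follows, assuming ratings are stored as `(rating, day_rated)` pairs and using a hypothetical half-life of 180 days: a rating's weight is 0.5 ** (age / half_life), so it loses half its influence every half-life:

```python
from typing import List, Optional, Tuple

def decay_weight(age_days: float, half_life_days: float = 180.0) -> float:
    """Exponential decay: a rating keeps half its weight after every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def time_weighted_mean(ratings: List[Tuple[float, float]], now_day: float,
                       half_life_days: float = 180.0) -> Optional[float]:
    """Decay-weighted mean of (rating, day_rated) pairs; None if there are no ratings."""
    weights = [decay_weight(now_day - day, half_life_days) for _, day in ratings]
    total = sum(weights)
    if total == 0:
        return None
    return sum(w * r for w, (r, _) in zip(weights, ratings)) / total
```

For a 5-star rating given today and a 1-star rating given 180 days ago, the plain mean is 3.0 while the decayed mean is (5 + 0.5) / 1.5 ≈ 3.67, leaning towards the user's recent taste.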
10. VALUE AWARE RECOMMENDATIONS
Recommendation systems exist to increase profits. You may be asked to optimize a recommendation system for pure profit instead of relevance. This leads us to our new problem: what should we concentrate on more, profit or relevance?
SOLUTION:
- Use profitability as a tie-breaker (“Position Bias”)
- Optimizing too much for profit can backfire!
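Using profit only as a tie-breaker can be sketched as a two-level sort: predicted relevance first, rounded so near-equal predictions count as ties, and profit margin second. The candidate tuple layout and the rounding `precision` are assumptions; the point is that margin can reorder items with similar predictions but never lifts a clearly less relevant item above a more relevant one:

```python
from typing import List, Tuple

def rank_with_profit_tiebreak(candidates: List[Tuple[str, float, float]],
                              n: int, precision: int = 1) -> List[str]:
    """Rank (item, predicted_rating, profit_margin) by rounded rating, then by margin."""
    ranked = sorted(candidates,
                    key=lambda c: (round(c[1], precision), c[2]),
                    reverse=True)
    return [item for item, _, _ in ranked[:n]]
```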