Real-World Challenges And Their Solutions — Part 12

Published in Fnplus Club, Jul 29, 2019

A Practical Approach To Recommendation Systems

1. COLD START PROBLEM

The cold start problem arises when new users or new items come in. We cannot predict for new users because we have no data about them yet. Similarly, for new items, we cannot predict whom to recommend them to.

New user (what to recommend?)
New items (Whom to recommend?)

NEW USER COLD START SOLUTIONS

i. Use implicit data

ii. Use browser cookies to identify users’ sessions

iii. Geo-IP

Recommending similar items to users in similar geographies may seem like an absurd idea, but it is better than recommending nothing.
Moreover, think about people buying more sweatshirts in the north than near the equator, or think about seasonal products.

iv. Recommend top sellers / do promotions

v. Interview the user

Was used by Netflix in its early days
Pinterest nailed this art
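
The "recommend top sellers" fallback (iv) is simple enough to sketch. The function below is only an illustration under assumed inputs: `purchases` is a hypothetical log of (user, item) pairs and `personalized` is a hypothetical lookup of already-computed recommendations for known users. Neither comes from a specific library.

```python
from collections import Counter

def recommend(user_id, purchases, personalized, n=3):
    """Return personalized picks for known users, top sellers otherwise.

    purchases: list of (user_id, item_id) pairs (hypothetical log format).
    personalized: dict of user_id -> ranked item list (hypothetical).
    """
    if user_id in personalized:
        return personalized[user_id][:n]
    # Cold start: nothing is known about this user, so fall back
    # to the n best-selling items across all users.
    counts = Counter(item for _, item in purchases)
    return [item for item, _ in counts.most_common(n)]

purchases = [("u1", "book"), ("u2", "book"), ("u1", "lamp"),
             ("u3", "book"), ("u2", "lamp"), ("u3", "mug")]
personalized = {"u1": ["mug", "pen"]}

print(recommend("u1", purchases, personalized))        # ['mug', 'pen']
print(recommend("new_user", purchases, personalized))  # ['book', 'lamp', 'mug']
```

In a real system the top-seller list could also be segmented by Geo-IP region or season, in line with item iii above.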

NEW ITEM COLD START SOLUTIONS

i. Just don’t worry about it

The item may show up in search results, or someone may search for it because of its promotions.
In practice, this is perfectly reasonable. But some researchers have proposed other solutions, listed below.

ii. Content-based Recommendations

Using it alone is not a good idea
Augment user behavior data with it, which addresses the cold start problem right there

iii. Map content attributes to the hidden features found by matrix factorization / deep learning, using user behavior data as support. It is a complicated idea (e.g. learnAROMA)

iv. Random Exploration

Reserve a few random recommendation slots for new items to gather behavior data on them
Not really a good approach
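
To make the content-based idea (ii) concrete, here is a minimal sketch that decides whom to show a brand-new item to, using only its attributes. Jaccard similarity over attribute sets is an assumed choice for illustration, and the data layout (`item_attrs`, `user_likes`) and the `threshold` value are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two attribute collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def users_for_new_item(new_attrs, item_attrs, user_likes, threshold=0.5):
    """Find users who liked an existing item similar to the new one.

    item_attrs: dict of item_id -> set of attributes (hypothetical catalog).
    user_likes: dict of user_id -> list of liked item_ids (behavior data).
    """
    similar = {item for item, attrs in item_attrs.items()
               if jaccard(new_attrs, attrs) >= threshold}
    return sorted(user for user, liked in user_likes.items()
                  if similar & set(liked))

item_attrs = {"alien": {"sci-fi", "horror"},
              "notebook": {"romance", "drama"}}
user_likes = {"ann": ["alien"], "bob": ["notebook"]}

# The new item has no ratings yet, only attributes.
print(users_for_new_item({"sci-fi", "horror", "thriller"},
                         item_attrs, user_likes))  # ['ann']
```

Here the behavior data (who liked what) is augmented with content attributes, so the new item can reach users before anyone has rated it.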

2. STOPLISTS

“It is easier to offend people without intending to even do so”

Stoplists keep unwanted or offensive items from being recommended to users. Some topics are too touchy for us to deal with.

In 2006, Walmart’s website paired ‘Martin Luther King’ with lots of different movies, including ‘Planet of the Apes’.

Many didn’t like it
Walmart had to apologize and donate a significant amount of money
Walmart scrapped its entire recommendation system (people may well have lost their jobs!)

Based on some keywords (in the title, description or category), our recommendation system shouldn’t even know these items exist! (Keep them out of the training process.)

Note: You don’t want to end up on the front page of a newspaper like Walmart!

Some keywords used in stoplists relate to:

Adult content
Vulgarity
Legally prohibited topics
Terrorism / Political extremism
Bereavement / Medical
Drug use
Religion
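
A stoplist filter can be applied to the catalog before any training happens. The sketch below is one possible shape for it, not a standard implementation: the item fields (`title`, `description`, `category`) follow the text above, while the keyword set and the whole-word matching rule are assumptions chosen for illustration.

```python
import re

# Hypothetical stoplist; a real one would be far longer and curated.
STOPLIST = {"funeral", "narcotics"}

def is_blocked(item, stoplist=STOPLIST):
    """True if any stoplist keyword appears as a whole word in the item text."""
    text = " ".join([item.get("title", ""),
                     item.get("description", ""),
                     item.get("category", "")]).lower()
    words = set(re.findall(r"[a-z0-9']+", text))
    return not words.isdisjoint(stoplist)

def filter_catalog(items, stoplist=STOPLIST):
    """Drop blocked items so the model never sees them during training."""
    return [item for item in items if not is_blocked(item, stoplist)]

catalog = [
    {"title": "Desk Lamp", "description": "Warm light", "category": "Home"},
    {"title": "Funeral Planning Guide", "description": "", "category": "Books"},
]
print([item["title"] for item in filter_catalog(catalog)])  # ['Desk Lamp']
```

Whole-word matching misses plurals and misspellings, so a production stoplist would likely need stemming or manual review on top of this.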

Some believe that Google’s system was “played” in the case of Donald Trump (the “idiot” search example)

3. FILTER BUBBLES

Filter bubbles are the problems that arise when you only show people things that appeal to their ‘existing interests’!
For example, I once watched so many videos from one YouTube channel that YouTube started recommending me videos only from that channel.
It is called a filter bubble because the content presented is filtered in a way that keeps users ‘within a bubble of pre-existing interests’

SOLUTION

‘Extra’ diversity can help them get out of the bubble
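
One simple way to inject that extra diversity is to cap how many items from the same source (for example, the same channel) can appear in the top-N list. This is only a sketch of one assumed strategy; the `max_per_group` cap and the (video, channel) tuple layout are hypothetical.

```python
def diversify(ranked, key, n=5, max_per_group=2):
    """Walk a ranked list and keep at most max_per_group items per group."""
    picked, counts = [], {}
    for item in ranked:
        group = key(item)
        if counts.get(group, 0) < max_per_group:
            picked.append(item)
            counts[group] = counts.get(group, 0) + 1
        if len(picked) == n:
            break
    return picked

# (video_id, channel) pairs, already sorted by predicted relevance.
ranked = [("v1", "A"), ("v2", "A"), ("v3", "A"),
          ("v4", "B"), ("v5", "A"), ("v6", "C")]

top = diversify(ranked, key=lambda video: video[1], n=4, max_per_group=2)
print([video_id for video_id, _ in top])  # ['v1', 'v2', 'v4', 'v6']
```

Without the cap, the top four would be three videos from channel A plus one from B. With it, channels B and C get a slot each.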

4. TRANSPARENCY AND TRUST

Transparency and Trust are inter-related.

Make sure your users are familiar with at least some of your top-N recommendations.
Allow transparency to gain more trust: “Why you recommended an item” must be made visible to users.
But when dealing with matrix factorization (latent features) or deep learning, it is hard to communicate to the user why an item was recommended
Transparency is a good thing but leads to more work

5. OUTLIERS

This is a general problem, not one restricted to recommendation systems. Results are only as good as the data you feed into the model. Filter out outliers that might skew your results in unnatural ways.

What if a bot tries to “play” with ratings, impacting recommendations significantly?
It is not always malicious behavior. It might be a web crawler or your own internal tools polluting your data
People who review items for a living may also affect our recommendations
Institutional buyers are another source of outliers
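
A rough way to screen out bots, crawlers and other unusually heavy raters is to compare each user's rating count with what is typical. The rule below (drop anyone above `max_ratio` times the median count) is an assumed heuristic for illustration, and the (user, item, stars) tuple layout is hypothetical.

```python
from collections import Counter
from statistics import median

def drop_heavy_raters(ratings, max_ratio=10):
    """Remove all ratings from users whose rating count is an outlier.

    ratings: list of (user_id, item_id, stars) tuples (hypothetical format).
    A user is dropped if they rated more than max_ratio times the median.
    """
    counts = Counter(user for user, _, _ in ratings)
    cutoff = max_ratio * median(counts.values())
    return [r for r in ratings if counts[r[0]] <= cutoff]

ratings = ([("u1", f"i{k}", 4) for k in range(2)] +
           [("u2", f"i{k}", 3) for k in range(3)] +
           [("u3", f"i{k}", 5) for k in range(2)] +
           [("bot", f"i{k}", 1) for k in range(50)])

clean = drop_heavy_raters(ratings)
print(len(ratings), len(clean))  # 57 7
```

The median is used rather than the mean because a single bot with thousands of ratings would drag the mean (and the standard deviation) up with it.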

6. GAMING THE SYSTEM

If recommendations drive more purchases, makers of items will look for ways to game your system into recommending their items more or not recommending others’ items.
Hackers may do it purely for amusement
E.g. Google bombs

SOLUTIONS

i. Only consider people who spent real money on an item. Voting with wallets is a strong indication of interest

ii. If you have no purchase data, you can still take precautions
For example, use star reviews only from people who actually purchased/consumed the items.

iii. If you allow people to rate items they haven’t actually seen/used, you are open to attack!

iv. Using click data should be your last choice, as it is trivially easy to fake.

“Always be wary of click-data”

Even if not from bots, click-data has its own set of problems.
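
The "voting with wallets" precaution can be sketched as a filter that keeps a rating only when the same user is known to have bought the item. The tuple layouts and the function name are hypothetical; this only illustrates the idea.

```python
def verified_ratings(ratings, purchases):
    """Keep only ratings where the rater actually purchased the item.

    ratings: list of (user_id, item_id, stars) tuples (hypothetical format).
    purchases: iterable of (user_id, item_id) pairs (hypothetical format).
    """
    bought = set(purchases)
    return [(user, item, stars) for user, item, stars in ratings
            if (user, item) in bought]

purchases = [("ann", "kettle"), ("bob", "kettle")]
ratings = [("ann", "kettle", 5),
           ("bob", "kettle", 4),
           ("shill", "kettle", 1)]   # never bought the kettle

print(verified_ratings(ratings, purchases))
# [('ann', 'kettle', 5), ('bob', 'kettle', 4)]
```

Faking a rating now costs the attacker a real purchase, which makes large-scale gaming far more expensive than faking clicks.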

7. IMPLICIT DATA — EXPLICIT PROBLEMS

Data such as click-stream data or clicks on images, produced as a by-product of a user’s consumption of services, is termed implicit data.

Implicit data is fraught with problems
Be extremely skeptical about building a recommendation system that depends solely on click data. Experience shows that click data is susceptible to gaming and has human behavior problems.

Frank Kane, one of Amazon’s pioneers in the field, put it clearly: “If you ever build a system that recommends products based on product images that people click on when they see them in an online ad, I promise you that what you build will end up as a pornography detection system. There won’t be anything you can do about it; even if you try to filter sexual content, you will be led to things with sexual anatomy!” This is a piece of experienced advice, and he mentioned that he has seen this happen more than once.

NEVER BUILD A RECOMMENDATION SYSTEM PURELY BASED ON IMAGE CLICKS

IMPLICIT DATA IN GENERAL

Tends to be of low quality
Unless backed by purchase/actual consumption, it will mess up recommendations (e.g. YouTube relies heavily on implicit data)

CLICKSTREAM DATA

Unreliable signal of interest
What users click and what users buy are often completely different!

8. INTERNATIONAL LAWS AND MARKETS

Take cultural or geographical differences into account while recommending items
Use filters based on licensing or availability
Take privacy laws into account as well
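
A licensing or availability filter is usually applied as a last step, after ranking. The sketch below assumes a hypothetical `licensed_regions` lookup from item to the set of region codes where it may be shown. Items with no entry are dropped, which is a deliberately conservative assumption.

```python
def available_in(region, recommendations, licensed_regions):
    """Keep only items licensed for the user's region, preserving rank order.

    licensed_regions: dict of item_id -> set of region codes (hypothetical).
    """
    return [item for item in recommendations
            if region in licensed_regions.get(item, set())]

licensed_regions = {"show_a": {"US", "IN"}, "show_b": {"US"}}
recommendations = ["show_a", "show_b", "show_c"]

print(available_in("IN", recommendations, licensed_regions))  # ['show_a']
print(available_in("US", recommendations, licensed_regions))  # ['show_a', 'show_b']
```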

9. DEALING WITH TIME

“Temporal effects of value”

The effect of time is an under-represented topic in recommendation systems. For example, recommending Christmas items just before Christmas, or summer wear in summer, draws huge profits.

This is an open research topic and a good one to publish papers on
It is not easy to take ‘Recency of Rating’ into account in a general way, but Netflix confirmed that these kinds of temporal dynamics are important. For example, tastes collected yesterday are a better indication of a person’s taste than tastes collected the previous year.
Weight ratings by their age using some sort of exponential decay. This improves the quality of recommendations. Alternatively, use rating recency as a training feature of its own, in addition to the rating itself.
Time and place (collected via Geo-IP) also affect recommendations. Historical rating data has a bias towards the past
For example, a service like Netflix that doesn’t take time into account will recommend old shows or movies instead of the hot new ones its users want to see
If items are time-sensitive, use Rating Recency
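
The exponential-decay weighting mentioned above can be sketched as follows. The half-life form of the decay and the 180-day default are assumptions chosen for illustration, and the (stars, age_days) pair layout is hypothetical.

```python
def decay_weight(age_days, half_life_days=180.0):
    """Weight that halves every half_life_days; a brand-new rating weighs 1.0."""
    return 0.5 ** (age_days / half_life_days)

def weighted_mean_rating(ratings, half_life_days=180.0):
    """Age-weighted average of (stars, age_days) pairs; None if there are none."""
    if not ratings:
        return None
    weights = [decay_weight(age, half_life_days) for _, age in ratings]
    weighted_sum = sum(stars * w for (stars, _), w in zip(ratings, weights))
    return weighted_sum / sum(weights)

# A 5-star rating from today and a 1-star rating from about a year ago.
ratings = [(5, 0), (1, 360)]

print(sum(stars for stars, _ in ratings) / len(ratings))  # plain mean: 3.0
print(weighted_mean_rating(ratings))                      # about 4.2
```

The old 1-star rating is two half-lives old, so it counts a quarter as much as the fresh 5-star rating and the estimate leans towards the user's recent taste.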

10. VALUE AWARE RECOMMENDATIONS

Recommendation Systems exist for increasing profits. You may be asked to optimize Recommendation System for pure profit instead of relevance. This leads us to our new problem — What to concentrate more on? Profits or relevance?

SOLUTION:

Use profitability as a tie-breaker (“Position Bias”)
Optimizing too much for profit can backfire!
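
Using profit purely as a tie-breaker could look like the sketch below: items are ordered by relevance first, and profit only decides between items whose relevance is practically equal. The `tolerance` bucket size and the (item, relevance, profit) tuple layout are hypothetical choices for illustration.

```python
def rank_with_profit_tiebreak(candidates, tolerance=0.01):
    """Sort by relevance; break near-ties (same tolerance bucket) by profit.

    candidates: list of (item_id, relevance, profit) tuples (hypothetical).
    """
    return sorted(candidates,
                  key=lambda c: (-round(c[1] / tolerance), -c[2]))

candidates = [("a", 0.90, 1.0),    # relevant, low margin
              ("b", 0.904, 3.0),   # practically as relevant, higher margin
              ("c", 0.50, 9.0)]    # most profitable but much less relevant

print([item for item, _, _ in rank_with_profit_tiebreak(candidates)])
# ['b', 'a', 'c']
```

Item c stays last despite its high profit, which keeps relevance in charge and limits the risk of the backfire described above.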