(Source: Joshua Hook)

Recommendation systems, like most real-world systems, face a long list of practical challenges. Some of the most common and important challenges are listed below, along with their solutions.

We will be discussing:

  • Cold Start Problem
  • Stoplists
  • Filter Bubbles
  • Transparency and Trust
  • Outliers
  • Gaming the system
  • Implicit data — Explicit problems
  • International laws and Markets
  • Dealing with time
  • Value aware recommendations

1. COLD START PROBLEM

The cold start problem arises when new users or new items enter the system. We cannot make predictions for new users because we have no data on them yet. Similarly, for new items, we cannot tell whom to recommend them to.

(Source: Yuspify)
  • New user (what to recommend?)
  • New items (Whom to recommend?)

NEW USER COLD START SOLUTIONS

i. Use implicit data

ii. Use browser cookies to identify users’ sessions

iii. Geo-IP

  • Recommending items that are popular in the user’s region may seem like an absurd idea, but it is better than recommending nothing.
  • Moreover, think about people buying more sweatshirts in the north
    than near the equator, or think about seasonal products

iv. Recommend top sellers / do promotions

v. Interview the user

  • Was used by Netflix in its early days
  • Pinterest nailed this art
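
As a rough illustration of options iii and iv, here is a minimal sketch of a fallback for a user with no history: recommend the top sellers in the user’s region and pad with global top sellers if the region has too little data. The purchase log, the region labels, and the function names are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical purchase log: (user_id, region, item_id)
PURCHASES = [
    ("u1", "north", "sweatshirt"),
    ("u2", "north", "sweatshirt"),
    ("u3", "north", "scarf"),
    ("u4", "south", "sunscreen"),
    ("u5", "south", "sunscreen"),
    ("u6", "south", "sandals"),
    ("u7", "south", "sweatshirt"),
]

def top_sellers(purchases, region=None, n=2):
    """Return the n most purchased items, optionally restricted to one region."""
    counts = Counter(
        item for _, r, item in purchases if region is None or r == region
    )
    # Sort by count (descending), then by name so that ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:n]]

def cold_start_recommendations(region, purchases, n=2):
    """Fallback for a user with no history: regional top sellers, then global."""
    recs = top_sellers(purchases, region=region, n=n)
    # Pad with global top sellers when the region has too little data.
    for item in top_sellers(purchases, n=len(purchases)):
        if len(recs) >= n:
            break
        if item not in recs:
            recs.append(item)
    return recs

print(cold_start_recommendations("north", PURCHASES))  # ['sweatshirt', 'scarf']
print(cold_start_recommendations("east", PURCHASES))   # ['sweatshirt', 'sunscreen']
```

In a real system the region would come from a Geo-IP lookup, which is left out here.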

NEW ITEM COLD START SOLUTIONS

i. Just don’t worry about it

  • The item may show up in search results, or someone may search for it because of its promotions
  • In practice, this is perfectly reasonable. But some researchers have proposed other
    solutions, listed below

ii. Content-based Recommendations

  • Using it alone is not a good idea
  • Augment user behavior data with this — Solves the cold start problem right there
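
To make the content-based idea concrete, here is a small sketch that places a brand-new item next to existing items purely by attribute overlap (Jaccard similarity). The catalog, the attribute sets, and the function names are made up for illustration; a real system would then borrow the behavior data of the nearest existing items.

```python
def jaccard(a, b):
    """Jaccard similarity between two attribute collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical catalog: item name -> content attributes
CATALOG = {
    "space_saga": {"sci-fi", "adventure"},
    "robot_wars": {"sci-fi", "action"},
    "love_story": {"romance", "drama"},
}

def similar_items(new_item_attrs, catalog, n=2):
    """Return up to n existing items that share attributes with the new item."""
    scored = [(jaccard(new_item_attrs, attrs), name) for name, attrs in catalog.items()]
    # Highest similarity first; break ties by name for a stable order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [name for score, name in scored[:n] if score > 0]

# A new sci-fi item with no ratings yet is matched to the existing sci-fi titles.
print(similar_items({"sci-fi", "adventure", "action"}, CATALOG))
```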

iii. Map content attributes to the hidden features found by matrix factorization/deep learning, using user behavior data as support. It is a complicated idea (e.g. learnAROMA)

iv. Random Exploration

  • Attempt to use random slots for recommendations to gather more information about the user
  • Not really a good approach
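
For completeness, random exploration could look roughly like the sketch below: keep the head of the ranked list and give the last slot(s) to randomly chosen new items. The function name and parameters are hypothetical, and this is only one of many ways to reserve exploration slots.

```python
import random

def with_exploration(ranked, new_items, n=5, explore_slots=1, rng=None):
    """Return a top-n list whose last slots are filled with random new items."""
    rng = rng or random.Random()
    already_ranked = set(ranked)
    # Only explore items that the ranker does not surface on its own.
    candidates = [item for item in new_items if item not in already_ranked]
    k = min(explore_slots, len(candidates), n)
    explored = rng.sample(candidates, k)
    return ranked[: n - k] + explored

ranked = ["a", "b", "c", "d", "e", "f"]
print(with_exploration(ranked, ["x", "y"], n=5, explore_slots=1))
```

The cost is visible right away: every exploration slot replaces a recommendation the model actually believed in, which is one reason this approach is not popular.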

2. STOPLISTS

“It is easier to offend people without intending to even do so”

Stoplists keep unwanted or offensive items from being recommended to users. Some topics are simply too touchy for a recommender to deal with.

(Source: Juggler decisions)

In 2006, Walmart’s website paired ‘Martin Luther King’ with lots of different movies, including ‘Planet of the Apes’.

  • Many didn’t like it
  • Walmart had to apologize and donate a significant amount of money
  • Walmart scrapped its entire recommendation system (people may well have lost their jobs!)

Based on some keywords (in the title, description, or category), our recommendation system shouldn’t even know that certain items exist! (Keep them out of the training process.)

Note: You don’t want to end up on the front page of a newspaper like Walmart!

Keywords used in stoplists are typically related to:

  • Adult content
  • Vulgarity
  • Legally prohibited topics
  • Terrorism / Political extremism
  • Bereavement / Medical
  • Drug use
  • Religion
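
A minimal sketch of a keyword stoplist applied before training might look like this. The keywords, the item fields (title, description, category), and the function names are assumptions for the example; a production stoplist would be much longer and curated by hand.

```python
import re

# Hypothetical stoplist; real ones are curated and far longer.
STOPLIST = {"terrorism", "funeral", "drug"}

def is_blocked(item, stoplist=STOPLIST):
    """True if any stoplist keyword appears as a whole word in the item's text."""
    text = " ".join(
        [item.get("title", ""), item.get("description", ""), item.get("category", "")]
    ).lower()
    words = set(re.findall(r"[a-z0-9]+", text))
    return not words.isdisjoint(stoplist)

def filter_catalog(items, stoplist=STOPLIST):
    """Drop blocked items so the training process never sees them."""
    return [item for item in items if not is_blocked(item, stoplist)]

catalog = [
    {"title": "Drug Wars", "description": "", "category": "movies"},
    {"title": "Drugstore Guide", "description": "where to shop", "category": "books"},
    {"title": "Garden Tools", "description": "for a funeral home", "category": "home"},
]
print([item["title"] for item in filter_catalog(catalog)])  # ['Drugstore Guide']
```

Matching whole words keeps "Drugstore Guide" in the catalog but also misses variants such as "drugs", so the keyword list itself has to cover the forms you care about.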

Some believe that Google’s recommendation system was “played” in the case of Donald Trump (the “idiot” search example).

3. FILTER BUBBLES

(Source: Immediatetrust.co.uk)
  • Problems arise when you only show people things that appeal to their ‘existing interests’!
    For example, I once watched so many videos from one YouTube channel that YouTube started recommending me videos only from that channel.
  • It is called a filter bubble because the content presented is filtered in a way that keeps users ‘within a bubble of pre-existing interests’

SOLUTION

Extra diversity in the recommendations can help users get out of the bubble.
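
One simple way to add diversity, sketched here under the assumption that every item carries a group label such as a channel or category, is to cap how many items from the same group may appear in the final list. The function name and the cap are illustrative and not a standard algorithm.

```python
from collections import defaultdict

def diversify(ranked, max_per_group=2, n=5):
    """ranked: list of (item, group) pairs in relevance order."""
    counts = defaultdict(int)
    result = []
    for item, group in ranked:
        if len(result) == n:
            break
        # Skip items from a group that has already used up its quota.
        if counts[group] < max_per_group:
            result.append(item)
            counts[group] += 1
    return result

ranked = [
    ("v1", "channel_a"), ("v2", "channel_a"), ("v3", "channel_a"),
    ("v4", "channel_b"), ("v5", "channel_a"), ("v6", "channel_c"),
]
print(diversify(ranked, max_per_group=2, n=4))  # ['v1', 'v2', 'v4', 'v6']
```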

4. TRANSPARENCY AND TRUST

Transparency and Trust are inter-related.

  • Make sure your users are familiar with at least some of your top-N recommendations.
  • Allow transparency to gain more trust: “why an item was recommended” should be visible to users.
  • But with matrix factorization (latent features) or deep learning, it is hard to communicate to the user why an item was recommended
  • Transparency is a good thing, but it leads to more work

5. OUTLIERS

This is not restricted to recommendation systems; it is a general problem. Results are only as good as the data you feed into the model. Filter out outliers that might skew your results in unnatural ways.

  • What if a bot tries to “play” with ratings, impacting recommendations significantly?
  • The behavior is not always malicious; it might be a web crawler or your own internal tools polluting your data
  • People who review items for a living may also affect our recommendations
  • Institutional buyers may do the same
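
A very simple outlier filter, sketched below, drops every user whose number of ratings is far above the typical user. The threshold (ten times the median) is an arbitrary assumption for the example; in practice you would pick it by looking at your own data.

```python
from collections import Counter
from statistics import median

def drop_heavy_raters(ratings, factor=10):
    """ratings: list of (user, item, stars). Drop users with extreme activity."""
    counts = Counter(user for user, _, _ in ratings)
    if not counts:
        return []
    # A user is an outlier if they rated more than `factor` times the median count.
    limit = factor * median(counts.values())
    keep = {user for user, count in counts.items() if count <= limit}
    return [rating for rating in ratings if rating[0] in keep]

# Four ordinary users with 2 ratings each, plus a bot with 50 ratings.
ratings = [(f"u{i}", f"item{j}", 4) for i in range(4) for j in range(2)]
ratings += [("bot", f"item{j}", 1) for j in range(50)]
print(len(drop_heavy_raters(ratings)))  # 8
```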

6. GAMING THE SYSTEM

  • If recommendations drive more purchases, makers of items will look for ways to game your system into recommending their items more or not recommending others’ items.
  • Hackers may do it just for amusement
  • E.g. Google bombs

SOLUTIONS

i. Only consider people who spent real money on an item. Voting with wallets is a strong indication of interest

ii. If you cannot rely on purchase data alone, you can still take precautions.
For example, only use star reviews from people who purchased/consumed the items.

iii. If you allow people to rate items they actually haven’t seen/used — you are open to attack!

iv. Click data should be your last choice, as it is trivially easy to fake.

“Always be wary of click-data”

Even if not from bots, click-data has its own set of problems.
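
The “verified purchase” precaution can be sketched as a filter that only lets a rating count if the same user actually bought the item. The data layout and the function name are assumptions for the example.

```python
from collections import defaultdict

def verified_average_ratings(ratings, purchases):
    """Average stars per item, counting only ratings backed by a purchase.

    ratings:   list of (user, item, stars)
    purchases: iterable of (user, item)
    """
    purchased = set(purchases)
    stars_by_item = defaultdict(list)
    for user, item, stars in ratings:
        if (user, item) in purchased:
            stars_by_item[item].append(stars)
    return {item: sum(stars) / len(stars) for item, stars in stars_by_item.items()}

ratings = [("u1", "book", 5), ("u2", "book", 4), ("bot1", "book", 1), ("bot2", "book", 1)]
purchases = [("u1", "book"), ("u2", "book")]
print(verified_average_ratings(ratings, purchases))  # {'book': 4.5}
```

Without the filter the two fake one-star ratings would pull the average down to 2.75.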

7. IMPLICIT DATA — EXPLICIT PROBLEMS

Data such as click-stream data or clicks on images, produced as a by-product of users consuming a service, is termed implicit data.

  • Implicit data is fraught with problems
  • Be extremely skeptical about building a recommendation system that depends solely on click data. Experience shows that click data is susceptible to gaming and distorted by human behavior problems.

Frank Kane, one of Amazon’s pioneers in the field, put it clearly: “If you ever build a system that recommends products based on product images that people click on when they see them in the online ad; I promise you that what you build will end up as pornography detection system. There won’t be anything you can do about it even if you try to filter sexual content, you will be led to things with sexual anatomy!” This is a piece of experienced advice, and he mentioned that he has seen this happen more than once!

NEVER BUILD A RECOMMENDATION SYSTEM PURELY BASED ON IMAGE CLICKS

IMPLICIT DATA IN GENERAL

  • Tends to be of low quality
  • Unless backed by purchase/actual consumption data, it will mess up recommendations (e.g. YouTube uses ONLY implicit data)

CLICKSTREAM DATA

  • Unreliable signal of interest
  • What users click and what users buy can be completely different!

8. INTERNATIONAL LAWS AND MARKETS

(Source: Forbes)
  • Take cultural or geographical differences into account while recommending items
  • Use filters based on licensing or availability
  • Take privacy laws into account as well
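
A filter based on licensing or availability can be as simple as the sketch below, which removes items that are not available in the user’s country after ranking. The availability table, country codes, and function name are invented for the example.

```python
# Hypothetical availability table: item -> set of country codes
AVAILABILITY = {
    "show_a": {"US", "CA"},
    "show_b": {"US", "DE", "IN"},
    "show_c": {"IN"},
}

def available_in(recommendations, country, availability=AVAILABILITY):
    """Keep only the recommended items that may be shown in the given country."""
    return [item for item in recommendations if country in availability.get(item, set())]

print(available_in(["show_a", "show_b", "show_c"], "IN"))  # ['show_b', 'show_c']
```

Items missing from the table are treated as unavailable, which is the safer default when licensing is involved.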

9. DEALING WITH TIME

“Temporal effects of value”

The effect of time is an under-represented topic in recommendation systems. For example, recommending Christmas items just before Christmas or summer wear in summer can be very profitable.

  • Open topic in the research area. This is a good topic to publish papers on
  • ‘Recency of rating’ is not easy to take into account in a general way, but Netflix confirmed that these kinds of temporal dynamics are important. For example, tastes recorded yesterday are a better indication of a person’s current taste than tastes recorded the previous year.
  • Weight ratings by their age using some sort of exponential decay. This can improve the quality of recommendations. Alternatively, use rating recency as a training feature of its own, in addition to the rating itself.
  • Time/place (collected via Geo-IP) also affects recommendations. Historical rating data is biased towards the past.
    For example, a service like Netflix that doesn’t take time into account will recommend old shows or movies instead of the hot new ones its users want to see
  • If items are time-sensitive, use Rating Recency
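
The exponential decay idea can be sketched as follows: each rating gets a weight that halves every `half_life_days`, and the weights are used when averaging. The half-life of 180 days and the function names are assumptions for the example, not values from Netflix or any other source.

```python
def decay_weight(age_days, half_life_days=180.0):
    """Weight of a rating that is age_days old; it halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def time_weighted_mean(ratings, half_life_days=180.0):
    """ratings: list of (stars, age_days). Returns the decay-weighted mean rating."""
    weights = [decay_weight(age, half_life_days) for _, age in ratings]
    total = sum(weights)
    if total == 0:
        return None
    return sum(stars * w for (stars, _), w in zip(ratings, weights)) / total

# A fresh 5-star rating counts twice as much as a 1-star rating from 180 days ago.
print(time_weighted_mean([(5, 0), (1, 180)]))  # about 3.67, versus a plain mean of 3.0
```

A shorter half-life makes the system react faster to changing tastes but throws away more of the older data.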

10. VALUE AWARE RECOMMENDATIONS

Recommendation systems exist to increase profits. You may be asked to optimize a recommendation system for pure profit instead of relevance. This leads us to a new problem: what should we concentrate on more, profits or relevance?

SOLUTION:

  • Use profitability as a tie-breaker (“Position Bias”)
  • Optimizing too much for profit can backfire!
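
Using profitability only as a tie-breaker could be sketched like this: items whose relevance scores fall into the same bucket are treated as tied, and the more profitable one goes first. The bucket width (`tolerance`) and the tuple layout are assumptions for the example.

```python
def rank_with_profit_tiebreak(candidates, tolerance=0.05):
    """candidates: list of (item, relevance, margin). Relevance decides first."""
    def sort_key(candidate):
        item, relevance, margin = candidate
        # Scores in the same bucket of width `tolerance` count as a tie.
        bucket = round(relevance / tolerance)
        return (-bucket, -margin, item)
    return [candidate[0] for candidate in sorted(candidates, key=sort_key)]

candidates = [("a", 0.91, 1.0), ("b", 0.90, 5.0), ("c", 0.70, 9.0)]
print(rank_with_profit_tiebreak(candidates))  # ['b', 'a', 'c']
```

Item c has the highest margin but stays last because its relevance is clearly lower. Bucketing is crude: two scores just either side of a bucket edge are not treated as tied, so this is a sketch rather than a finished design.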
