How recommender systems make their suggestions
A short introduction to information filtering on the web
Information saturation isn’t new. In the early ’90s, “users [were] being inundated by a huge stream of incoming documents due to [the] increasing use of electronic mail”. So four researchers at the Xerox Palo Alto Research Center (PARC) decided to tackle this problem.
The group, led by David Goldberg, set up a revolutionary mail and repository system called Tapestry. It allowed you to search based on the document contents and reactions recorded from other users. For example, you could ask it to “give me all the docs containing the words ‘racing bike’ that the user ‘William’ has considered ‘excellent’”.
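To make that example query concrete, here is a minimal Python sketch of the same idea: filter on the content of a document and on another user's annotation at once. This is not Tapestry's actual query language. The `Document` class, the `tapestry_style_query` function and the sample data are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    text: str
    # Maps a user name to the annotation that user left on this document.
    annotations: dict = field(default_factory=dict)


def tapestry_style_query(docs, phrase, annotator, annotation):
    """Return documents that contain `phrase` and that `annotator` marked with `annotation`."""
    phrase = phrase.lower()
    return [
        d for d in docs
        if phrase in d.text.lower() and d.annotations.get(annotator) == annotation
    ]


# Hypothetical sample repository.
docs = [
    Document("a", "Choosing a racing bike for hill climbs", {"William": "excellent"}),
    Document("b", "Racing bike maintenance basics", {"William": "poor"}),
    Document("c", "Quarterly budget review", {"William": "excellent"}),
]

matches = tapestry_style_query(docs, "racing bike", "William", "excellent")
print([d.doc_id for d in matches])
```

Only document "a" passes both tests: "b" mentions racing bikes but William rated it poorly, and "c" has William's approval but is about something else.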
The experiment was predicated on the belief that “information filtering can be more effective when humans are involved in the filtering process”.
Their research paper, Using collaborative filtering to weave an information tapestry, was published in 1992. Unlike filtering systems at the time, Tapestry’s output wasn’t just computed by examining an item when it arrived, but required continuous querying of human annotations as well.
Nowadays, filtering systems are better known as recommender systems, but collaborative filtering remains one of the driving forces behind many of them. (See Amazon, Pinterest and Spotify.)
Collaborative filtering is based on the idea that people who agree in their evaluation of certain items in the past are likely to agree again in the future.
Tapestry was a manual collaborative filtering system — you had to pick your own expert whose annotations you wanted to throw into the mix.
Presently, most collaborative filtering algorithms draw upon a neighborhood approach. In this technique, a number of peers are selected based on their similarity to you. The recommendation is then made by calculating a weighted average of the ratings of these ‘nearest neighbors’.
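As a rough illustration of the neighborhood approach, the Python sketch below predicts a rating as the similarity-weighted average of the ratings given by the most similar users. The choices here are assumptions, not something prescribed by any particular system: similarity is cosine similarity over the items two users have both rated, `k` is the number of neighbors, and the names and ratings are made up.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "you":   {"A": 5, "B": 3, "C": 4},
    "ann":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "bob":   {"A": 1, "B": 5, "C": 2, "D": 1},
    "carol": {"A": 4, "B": 3, "C": 5, "D": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict(user, item, ratings, k=2):
    """Weighted average of the item's ratings from the k most similar users."""
    neighbors = [
        (cosine_similarity(ratings[user], other), other[item])
        for name, other in ratings.items()
        if name != user and item in other
    ]
    # Keep only the k nearest neighbors, most similar first.
    neighbors.sort(key=lambda pair: pair[0], reverse=True)
    top = neighbors[:k]
    total_similarity = sum(sim for sim, _ in top)
    if total_similarity == 0:
        return None  # No usable neighbors, so no prediction.
    return sum(sim * rating for sim, rating in top) / total_similarity


predicted = predict("you", "D", ratings)
print(round(predicted, 2))
```

In this toy data, ann and carol rate much like you do, so their high ratings for item D dominate the prediction, while bob's low rating is left out because he is not among your two nearest neighbors.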
By showing you new stuff you’re likely to appreciate, collaborative filtering is expected to increase diversity in what you consume. However, some recommender systems using this approach might trap you in your niche neighborhood and do the opposite.
One of the challenges underlying this is the new item problem. Collaborative filters recommend based on past annotations, so they cannot come up with a sensible prediction for items with limited historical data. This can create a rich-get-richer effect for items that are already popular, and the resulting bias can prevent matches between you and an item of great value and interest.
The new item problem does not limit content-based filtering, which is the other major approach to recommendations. This is because content-based filtering relies on each item’s set of descriptors or terms rather than on its annotations.
In the case of Tapestry, which also used content-based filtering, each item’s set was made up of the words that occurred in it. Thanks to developments in natural language processing, we can now generate and add keywords, entities and high-level concepts to the set as well.
Content-based filtering recommends based on a comparison between each item’s set of terms and your user profile. Your profile is represented with the same terms and is built up by analyzing the content of the items you have seen.
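Here is a minimal Python sketch of that comparison. It assumes each item is described by a plain list of terms, builds the user profile by counting the terms of the items the user has seen, and scores unseen items by cosine similarity. The item names, the terms and the choice of cosine similarity are illustrative assumptions. Real systems typically weight terms, for example with TF-IDF.

```python
import math
from collections import Counter

# Hypothetical items, each described by a set of terms.
items = {
    "road-bike-review": ["cycling", "racing", "bike", "review"],
    "tour-de-france-recap": ["cycling", "racing", "france"],
    "sourdough-basics": ["baking", "bread", "recipe"],
}


def build_profile(seen_ids, items):
    """Represent the user with the same terms, counted over the items they have seen."""
    profile = Counter()
    for item_id in seen_ids:
        profile.update(items[item_id])
    return profile


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(seen_ids, items):
    """Score every unseen item against the profile, best match first."""
    profile = build_profile(seen_ids, items)
    scored = [
        (cosine(profile, Counter(terms)), item_id)
        for item_id, terms in items.items()
        if item_id not in seen_ids
    ]
    return sorted(scored, reverse=True)


ranking = recommend(["road-bike-review"], items)
for score, item_id in ranking:
    print(f"{item_id}: {score:.2f}")
```

Note that nothing here depends on what other users thought of an item, which is why a brand-new item can be recommended as soon as its terms are known.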
Involvement of humans
As Tapestry revealed back in the ’90s, a system that combines content-based and collaborative filtering tries to take advantage of both the representation of the content and the similarities among users. This common hybrid approach either combines the two outputs or uses them independently, for instance in different web modules.
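One simple way to combine the two outputs is a weighted blend, sketched below in Python. This is only one of several possible hybrid designs, and the details are assumptions: both scores are taken to be on the same 0 to 1 scale, `collab_weight` is a hypothetical tuning parameter, and a missing collaborative score (as with a new item that has no annotations yet) falls back to the content-based score alone.

```python
def hybrid_score(collab_score, content_score, collab_weight=0.5):
    """Blend a collaborative score and a content-based score into one value.

    Both scores are assumed to be on the same 0-1 scale. `collab_score` may be
    None when the collaborative filter has no data for the item.
    """
    if collab_score is None:
        # New item: no annotations yet, so rely on the content alone.
        return content_score
    return collab_weight * collab_score + (1 - collab_weight) * content_score


print(hybrid_score(0.6, 0.8))   # both filters have an opinion
print(hybrid_score(None, 0.8))  # only the content-based filter does
```

The weight controls how much the system trusts the crowd versus the content; choosing it well would normally take experimentation on real data.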
Fortunately, research into recommender systems remains active. There have been great developments in the so-called knowledge-based approach, where items are matched to users based on their need or task at hand. In addition, experiments to promote diversity in media are taking off.
25 years ago, Goldberg wrote that information filtering can be more effective when humans are involved in the algorithmic process. We believe that to be true now more than ever. Let’s get humans genuinely involved. Let’s work on transparency of our recommender systems, enabling extensive audience agency and ever-improving suggestions that are both meaningful and challenging.
Bibblio is a content recommendation platform that helps content businesses and publishers deliver more relevant and engaging discovery experiences to their users.