Highlights from PRS2016 workshop
Personalized recommendations and search are the primary ways Netflix members find great content to watch. We’ve written much about how we build them and some of the open challenges. Recently we organized a full-day workshop at Netflix on Personalization, Recommendation, and Search (PRS2016), bringing together researchers and practitioners in these three domains. It was a forum to exchange information, challenges and practices, as well as strengthen bridges between these communities. Seven invited speakers from the industry and academia covered a broad range of topics, highlighted below. We look forward to hearing more and continuing the fascinating conversations at PRS2017!
Semantic Entity Search, Maarten de Rijke, University of Amsterdam
Entities, such as people, products, organizations, are the ingredients around which most conversations are built. A very large fraction of the queries submitted to search engines revolve around entities. No wonder that the information retrieval community continues to devote a lot of attention to entity search. In this talk Maarten discussed recent advances in entity retrieval. Most of the talk was focused on unsupervised semantic matching methods for entities that are able to learn from raw textual evidence associated with entities alone. Maarten then pointed out challenges (and partial solutions) to learn such representations in a dynamic setting and to learn to improve such representations using interaction data.
Combining matrix factorization and LDA topic modeling for rating prediction and learning user interest profiles, Deborah Donato, StumbleUpon
Matrix Factorization through Latent Dirichlet Allocation (fLDA) is a generative model for concurrent rating prediction and topic/persona extraction. It learns topic structure of URLs and topic affinity vectors for users, and predicts ratings as well. The fLDA model achieves several goals for StumbleUpon in a single framework: it allows for unsupervised inference of latent topics in the URLs served to users and for users to be represented as mixtures over the same topics learned from the URLs (in the form of affinity vectors generated by the model). Deborah presented an ongoing effort inspired by the fLDA framework devoted to extend the original approach to an industrial environment. The current implementation uses a (much faster) expectation maximization method for parameter estimation, instead of Gibbs sampling as in the original work and implements a modified version of in which topic distributions are learned independently using LDA prior to training the main model. This is an ongoing effort but Deborah presented very interesting results.
Exploiting User Relationships to Accurately Predict Preferences in Large Scale Networks, Jennifer Neville, Purdue University
The popularity of social networks and social media has increased the amount of information available about users’ behavior online — including current activities and interactions among friends and family. This rich relational information can be used to predict user interests and preferences even when individual data is sparse, since the characteristics of friends are often correlated. Although relational data offer several opportunities to improve predictions about users, the characteristics of online social network data also present a number of challenges to accurately incorporate the network information into machine learning systems. This talk outlined some of the algorithmic and statistical challenges that arise due to partially-observed, large-scale networks, and describe methods for semi-supervised learning and active exploration that address the challenges.
Why would you recommend me THAT!?, Aish Fenton, Netflix
With so many advances in machine learning recently, it’s not unreasonable to ask: why aren’t my recommendations perfect by now? Aish provided a walkthrough of the open problems in the area of recommender systems, especially as they apply to Netflix’s personalization and recommendation algorithms. He also provided a brief overview of recommender systems, and sketched out some tentative solutions for the problems he presented.
Diversity in Radio, David Ross, Google
Many services offer streaming radio stations seeded by an artist or song, but what does that mean? To get specific, what fraction of the songs in “Taylor Swift Radio” should be by Taylor Swift? David provided a short introduction to the YouTube Radio project, and dived into the diversity problem, sharing some insights Google has learned from live experiments and human evaluations.
Immersive Recommendation Using Personal Digital Traces, Deborah Estrin and Andy Hsieh, Cornell Tech
From topics referred to in Twitter or email, to web browser histories, to videos watched and products purchased online, our digital traces (small data) reflect who we are, what we do, and what we are interested in. In this talk, Deborah and Andy presented a new user-centric recommendation model, called Immersive Recommendation, that incorporate cross-platform, diverse personal digital traces into recommendations. They discussed techniques that infer users’ interests from personal digital traces while suppressing context-specified noise replete in these traces, and propose a hybrid collaborative filtering algorithm to fuse the user interests with content and rating information to achieve superior recommendation performance throughout a user’s lifetime, including in cold-start situations. They illustrated this idea with personalized news and local event recommendations. Finally they discussed future research directions and applications that incorporate richer multimodal user-generated data into recommendations, and the potential benefits of turning such systems into tools for awareness and aspiration.
Click-through and conversion rates estimation are two core predictions tasks in display advertising. Olivier presented a machine learning framework based on logistic regression that is specifically designed to tackle the specifics of display advertising. The resulting system has the following characteristics: it is easy to implement and deploy; it is highly scalable (they have trained it on terabytes of data); and it provides models with state-of-the-art accuracy. Olivier described how the system uses explore/exploit machinery to constantly vary and evolve its predictive model on live streaming data.
We hope you find these presentations stimulating. We certainly did, and look forward to organizing similar events in the future! If you’d like to help us tackle challenges in these areas, to help our members find great stories to enjoy, checkout our job postings.
Originally published at techblog.netflix.com on May 5, 2016.