Notes from NYC Media Lab’s mini-conference on personalization and recommendation technologies
Technologies driving personalization and recommendation in media will fundamentally change the way we produce, distribute and consume content. Hosted by Hearst Corporation and organized by NYC Media Lab, Personalizationpalooza featured a morning of flash presentations and discussion on technologies for personalization and recommendation. The event took place on Thursday, February 26th.
Here’s a quick, quote-driven recap of the program, which consisted of three main parts: the view from the lab, the view from startups, and the view from industry.
The view from the lab
The first session of the day featured researchers working on various problems in recommendation and personalization technologies at NYU, Columbia, Yahoo! and Tumblr. It was moderated by Glossy CEO and Parsons professor David Carroll.
Alexander Tuzhilin, Professor of Information Systems at NYU Stern School of Business, presented an intellectual history of recommendation systems, noting that most of the industry is presently exploiting second-generation recommendation methods. To understand what will drive the third generation of recommendation and personalization technologies, Tuzhilin suggested, we need to look at what is missing in the present generation.
“What is lacking in all these efforts is deep understanding of consumer behavior and psychology and good economic theory,” he said. Tuzhilin sees the research community moving towards these questions, with future work being driven by multiple disciplines, “including machine learning, data mining, psychology, economics and also experimental design theory.”
One of the biggest problems in recommendation is how to build scalable systems. At Yahoo!, Alejandro Jaimes has worked on a machine learning framework that uses deep learning to automatically classify and tag videos. One of the challenges for such systems is whether they can get good enough at producing metadata that is useful to users. A related problem Jaimes has worked on recently is “cold start” recommendation. This is the problem of what to do when a new user comes in and Yahoo! has no information about that user. One solution is to rely on information extracted from the content to power recommendations.
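To make the content-based idea concrete, here is a minimal sketch of one way a cold-start recommender could work. It is an illustration only, not a description of Yahoo!'s system: the catalog, the tags, and the function names (`cosine`, `recommend_similar`) are all hypothetical. With no history for a new user, it ranks other items by how closely their tags match the item being viewed.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-tags vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend_similar(current_id, catalog, k=2):
    """Rank other items by tag similarity to the item being viewed.

    No user history is needed, so this works for a brand-new user.
    Items that share no tags with the current item are left out.
    """
    current = Counter(catalog[current_id])
    scored = [
        (cosine(current, Counter(tags)), item_id)
        for item_id, tags in catalog.items()
        if item_id != current_id
    ]
    # Highest similarity first; ties broken by item id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:k] if score > 0]


# Hypothetical catalog: tags stand in for automatically extracted metadata.
catalog = {
    "v1": ["soccer", "goal", "highlights"],
    "v2": ["soccer", "interview"],
    "v3": ["cooking", "pasta"],
    "v4": ["soccer", "goal", "replay"],
}

print(recommend_similar("v1", catalog))
```

In this toy catalog, a new user watching `v1` would be shown `v4` first, since it shares two tags, then `v2`. In practice the quality of such recommendations depends heavily on the quality of the extracted metadata, which is the challenge Jaimes described.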
One of the most difficult factors in building excellent recommendation systems is that contexts and inputs are always shifting and changing. “You could have different systems for every hour of the day essentially,” said Jaimes, meaning that any successful system will have “many dimensions in practice” and a “single approach is unlikely to work.”
At Columbia, Professor Tony Jebara has recently looked at a range of recommendation methods, particularly around social signals. Among the more interesting questions his lab is pursuing today is how to “obfuscate each person’s data so that personalization never feels too creepy.” Researchers there have considered how to give users options on the degree of anonymity they prefer and on what types of information should be accessible to systems such as Facebook. A big question is how to achieve utility while preserving privacy. “If we delete too much, then there’s less utility to the database and then you can’t do any recommendation whatsoever,” said Jebara.
Beitao Li, search engineer at Tumblr, also talked about the problem of scale. “We have 100 million posts a day,” he said, noting that doing recommendation for mobile is particularly challenging. Li talked about the importance of having the Tumblr platform engineered to allow for dynamic A/B testing and the utility of the feedback loop, particularly when the team is shipping new products and features regularly. With these systems, Tumblr can “fail fast” and “quickly experiment on lots of things and then zoom into the advances that work.”
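As a rough illustration of what A/B testing infrastructure involves at its simplest, the sketch below assigns users to experiment variants by hashing. This is a common general technique and not a description of Tumblr's platform: the function name `assign_variant`, the experiment name, and the user IDs are hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str,
                   variants=("control", "treatment")) -> str:
    """Deterministically map a user to a variant of an experiment.

    Hashing the experiment name together with the user id means the
    same user always sees the same variant within an experiment, while
    assignments across different experiments are effectively independent.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Hypothetical usage: bucket 2,000 users for one experiment and tally them.
counts = {"control": 0, "treatment": 0}
for i in range(2000):
    counts[assign_variant(f"user-{i}", "new-feed-ranking")] += 1
print(counts)
```

Because assignment needs no stored state, new experiments can be launched and retired quickly, which is one way to support the kind of fast experimentation Li described. Measuring which variant actually performs better requires logging outcomes per variant, which this sketch leaves out.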
The view from the startups
The second session of the day included a number of startups, some selling personalization and recommendation technologies to media companies and some building their businesses around the idea of personalized content. Hearst’s Executive Director of R&D & Strategy, Barin Nahvi Rovzar, moderated the discussion.
At Sailthru, CTO and Co-founder Ian White and his team are doing three main things: “predicting future user behavior, generating recommendations that increase lifetime value and conversion and optimizing for content channel and frequency.” One thing Sailthru knows from working with different media brands is that there are “really dramatic differences in terms of the predictive propensity of users to engage.” Sailthru’s “predictive scoring can have a really dramatic impact on user’s engagement and we can make better decisions that ultimately allow us to communicate better with the individual.”
At Parse.ly, CEO Sachin Kamdar sees things similarly. “Personalization really needs to reflect what you’re trying to do as a media company,” he said. “When you think about personalization you really need to start with your audience first. You need a clear understanding of who your audience is and how that audience is going to help you reach your revenue goals and your editorial goals at your organization.” Parse.ly believes media companies avoid “recommendation blindness” by developing specific personalization techniques that match the brand.
GameChanger’s CTO Kiril Savino is building a company on the idea that highly personalized content appeals at the local level. “There are 16 million teams in the amateur sports market and we take the data generated by those teams to write recaps, stories, alerts, and other play-by-play content for every single amateur team,” said Savino. The fascinating thing about GameChanger is the extraordinary quantity of data the company is collecting, “about 10 times the entire history of recorded sports prior to our existence.” The company is currently generating 3 to 4 million articles a year entirely based on data.
Brian White says his company, ZergNet, is powering recommendations “on over 2,000 websites from major media companies.” ZergNet uses a blend of technology and human touch to drive its high-touch recommendation service. “We do use machine learning algorithms to power our results. However, we feel our secret sauce is really by putting personal and people into personalization.” The company uses editors to make crucial decisions that protect the sensibilities of media brands.
Agolo, a startup out of Columbia University, is addressing the problem of information overload. CEO Sage Wohns says “we can leverage all of that noise by listening to 3.9 million tweets a day and a 190,000 unique links to create personalized summaries for every single one of our end-users, so that they see summarized views of the interests that they choose every day.” The company is using its natural language processing and recommendation techniques to do summarization, optimization, content recommendation and business intelligence for publishers.
Finally, news startup Circa is focused on personalized, mobile-first news. CTO Arsenio Santos says Circa is built around the “notion of an atom, the idea that every fact, every sentence, every notion that goes into the narrative of the story can be its own individual piece of information.” Circa deconstructs news into concepts and offers a high degree of personalization through its follow feature, which lets users choose stories they want to keep an eye on. The goal is “creating content that can be intrinsically specific to users and has higher value.”
The view from Etsy!
Before the media industry session got going, we had the opportunity to hear from Melissa Santos, engineering manager at Etsy. We invited Melissa to offer a view on personalization and recommendation from outside of the problems of media companies, specifically. Melissa spoke about bringing the fruits of data science into all parts of the organization, and how she goes about proselytizing for her craft at Etsy. An important part of her approach is speaking in “understandable abstractions” that allow teams to talk about data in terms that make sense to them.
I asked Santos about the importance of open source to Etsy, and about the dev team’s practice of sharing its work on Code as Craft. For Etsy, being part of the open source community is about more than just “developer happiness,” she said. “It’s a part of who we are as a company and that we want to give back to the tech community when we get better for everyone. Secret sauce is not how we roll.”
The view from industry
The final panel of the morning was devoted to perspectives on personalization and recommendation technologies from big media and advertising companies. The panelists also spoke more broadly about the challenges of encouraging adoption of data-driven insights in big companies.
Ray Velez, CTO of Razorfish, believes big companies are still in their infancy when it comes to personalization and recommendation. “If I go to the deli down the street twice, it does a better job of remembering me than most large enterprises,” he said. “We’re excited because the opportunity is huge.” But what is needed is a new generation of talent. “The people are missing. We need more social scientists and more computer scientists.”
At Bloomberg, Gary Kazantsev oversees a machine learning group that is focused on a range of problems, chief among them “extracting meaning from that content. This ranges from recommendation systems to text analysis. Under text analysis there’s an enormous amount of work having to do with named entity recognition, disambiguation, topic classification, sentiment and so on and so on.” One of the interesting questions Kazantsev is exploring is how to combine human judgment with machine learning. “How do you insert editorial judgment for instance, into a recommendation system for news?”
At Gravity, a company recently acquired by AOL, head of product Josef Pfeiffer tries to understand “content at a much deeper level,” where “you can start to derive a lot more meaning and actually make recommendations off implicit data.” The company was built on research “around ontologies and mapping. Basically taking a map of every word in the English vocabulary and applying that to users.” Now the company is applying its techniques both for its publisher clients and also on AOL properties. “Huffington Post at this point churns out so much content and has so many users that we figured if we could get it to work there we could get it to work almost anywhere.”
At Hearst, VP of Data Services Rick McFarland believes the main challenge to adopting new data-driven practices such as personalization and recommendation technologies is figuring out how to get teams across the business to use new tools. He looks for “collaboration engineers” who can understand the existing business and help teams implement new tools. “You can’t just go to a society that is based off of traditional methods of transport with a warp drive and say, ‘Here you go.’” It’s important to recognize that there are already processes in place and that quite successful businesses often depend on those processes.
Chris Wiggins, Chief Data Scientist at the New York Times, agreed. “To get an organization to become data-driven it’s not just about a change in toolset; it’s also about a change in mindset.” Wiggins noted that “the business of newspapering has just completely changed in the last decade because of technology transitions and the way that people get their information,” and as a result “every publisher is now a startup. Every publisher is now searching for a repeatable scalable business model.”
The panelists agreed that the power of open source tools like the ones Melissa Santos referred to, when coupled with technologies like Amazon Web Services, means data science teams have amazingly powerful tools at hand, many of which are essentially free. The ability to wield such tools is one of the things that puts data science teams out of synch with the rest of the organization. “You can get weapons grade statistical software for completely free,” noted Wiggins.
That just underscores the importance of focusing on changing mindsets, said Kazantsev. “If you’re not a statistician, to become a statistician or to be somebody who is familiar enough with statistics to be able to think in these terms is a lot more challenging than learning Python.” Acquiring the mindset to think with data is challenging for non-technical members of an organization. Chris Wiggins looks for “culture brokers,” those who understand the technologies but who also “speak the values of the organization and understand the future.”
And yet with all of the technologies in today’s tool kit, it is still difficult to know the customer at a truly personal level. “I think connecting the data is the real challenge,” said McFarland. “Really getting that connection and knowing the customer is step one before we can actually personalize to them.” Gravity’s Pfeiffer noted that “ultimately personalization should give people that uh-huh moment when they’re seeing awesome content, and they recognize that content has come and found them because of a system.”
To a question on how to balance personalization of media with the old model of selling access to the “masses,” the panel noted that the move to branded content rather than other forms of advertising creative is in part driven by the ability of such marketing messages to reach consumers through recommendation techniques. “If you can find the right person like myself that is really interested in BMW,” noted Pfeiffer, “then I’m a great person to have that recommendation paid for to be put in front of me compared to other advertisers that are competing for it.”
To the same question, Razorfish’s Velez pointed out that the shift to mobile is a big driver. “As eyeballs shift to mobile there’s less real estate and there’s less chance to get easy reach and frequency. You need to get more personalized, more relevant, and a heck of a lot smarter.”
A theme that ran throughout the day, from Tony Jebara’s work on systems that obfuscate identity to the questions posed to the industry panel, was privacy. “I think this is going to be an ongoing conversation really for the foreseeable future,” said Bloomberg’s Kazantsev. “I don’t think people have really figured out what is the intuition for privacy that we ought to have in the 21st Century.”