A call for content exploration

Joe Isaacson
Apr 16, 2018


Facebook, Twitter, Pinterest, Netflix, YouTube, Amazon — they all curate content on their home feeds, mining my click history, watch time and purchases to flood my head with a finely tuned subset of human knowledge. Knowledge optimally crafted to fire the neurons bridging my visual and motor cortex. View thumbnail, click, consume. Repeat.

We need advances in explorative algorithms, coupled with cultural overhauls that measure and reward exploration. Being exposed to ideas and viewpoints I have never seen or understood will be painful. But I am stuck in a local minimum, in a filter bubble.

This is a call for exploration.

I like exploring

Designing a modern feed

Given the negative press Facebook is receiving, it is easy to lay blame on executives, scapegoating them as individuals who are purposefully splitting America by showing us all politically charged stories and advertisements. But having spent some time researching and building advertising and feed ranking systems, I don’t believe they’re fully to blame. Filter bubbles are natural segmentations of human ideologies that are automatically identified and exploited by products built using machine learning and optimized to maximize short-term human engagement.

I won’t go through the inner workings of how to design such systems, since I’ve written previously on the subject and researchers with much more experience have published on them for years. But here is an abbreviated version. Building a feed for a consumer-web product takes three steps:

  1. Define short-term and long-term objectives. Long-term objectives reflect company KPIs: revenue (from advertisers or signups), customer lifetime value, good content production. Short-term objectives are more easily measurable in a week- to month-long A/B test: clicks, upvotes, downvotes, time spent watching a video. These are metrics that correlate with the long-term objectives.
  2. Collect all of the data that you can on your users, your content and your content producers. Understand user demographics and users’ preferences, including friends, followed pages, tagged locations, purchases, etc. For authors or sellers on your platform, learn their reputation across the platform, their target audience, their brand.
  3. Build a ranking model. Use machine learning to combine 1. and 2. by estimating the probability of a user satisfying a short-term objective on a piece of content. Evaluate this probability for as much content and as many objectives as is computationally feasible given site-load-time and cost requirements. Then sort.
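Step 3 can be sketched in a few lines of Python. This is only an illustration, not how any particular platform does it: the `Candidate` class, the objective names and the weights are all hypothetical, and the per-objective probabilities are assumed to come from some already-trained model.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Candidate:
    content_id: str
    # Assumed model outputs: estimated probability that the user
    # satisfies each short-term objective on this piece of content.
    p_objective: Dict[str, float]

# Hypothetical weights for how much each short-term objective matters.
OBJECTIVE_WEIGHTS = {"click": 1.0, "upvote": 2.0, "watch_30s": 3.0}

def score(candidate: Candidate, weights: Dict[str, float] = OBJECTIVE_WEIGHTS) -> float:
    """Weighted sum of the estimated objective probabilities."""
    return sum(w * candidate.p_objective.get(name, 0.0) for name, w in weights.items())

def rank(candidates: List[Candidate], weights: Dict[str, float] = OBJECTIVE_WEIGHTS) -> List[Candidate]:
    """'Then sort': highest combined score first."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)

feed = rank([
    Candidate("a", {"click": 0.1}),
    Candidate("b", {"click": 0.5, "upvote": 0.1}),
    Candidate("c", {"watch_30s": 0.2}),
])
print([c.content_id for c in feed])
```

A weighted sum is just one simple way to combine several objectives into a single sort key; the weights are where the choice of KPIs from step 1 would enter.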

Filter bubbles

When evaluated by such a ranking system, content will roughly fall into one of three categories: confidently good, confidently bad or unconfident. Given that I watch videos of comedy podcasts, YouTube is confident I will like new podcasts produced by the same comedian. Given that I purchased a book on engineering management, Amazon is confident that I will not purchase the same book on engineering management. And given that I’ve never seen a post from Starbucks, Facebook will have little confidence that I either like or dislike Starbucks.
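One rough way to picture these three categories is to keep both an estimate and an uncertainty for each user and content pair. The sketch below assumes a simple Beta posterior over an engagement rate built from clicks and impressions; the function name and both thresholds are hypothetical, and a real ranking model would derive confidence from far richer signals.

```python
import math

def bucket(clicks: int, impressions: int,
           good_threshold: float = 0.5, max_std: float = 0.1) -> str:
    """Classify content using a Beta(1 + clicks, 1 + non-clicks) posterior."""
    a = 1 + clicks
    b = 1 + (impressions - clicks)
    mean = a / (a + b)
    std = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))
    if std > max_std:
        # Too little evidence either way, e.g. content I have never seen.
        return "unconfident"
    return "confidently good" if mean >= good_threshold else "confidently bad"

print(bucket(45, 50))  # many impressions, mostly engaged
print(bucket(2, 50))   # many impressions, rarely engaged
print(bucket(0, 0))    # never shown
```

Under these assumptions, content that has never been shown always lands in the unconfident bucket, which is exactly the content a confidence-sorted feed tends not to surface.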

The end result: on every major web platform, I see content sorted by ranking models based on the model’s confidence that I will click on, read about, watch or purchase the content.

And yet, I see such a narrow slice of every website. There is *so much* content in the world. Facebook has billions of users and pages, Amazon has millions of products, and YouTube has hundreds of hours of video uploaded every minute. That I see so little of it is the filter bubble problem.

Using 1–3 above, feeds are architected by machine learners, engineers, product developers and leaders, all united and driven to maximize KPIs set by the organization’s board. This unification of goals, from board meeting to ranking model, is a tremendously efficient way to grow an organization. It provides clarity and direction, encapsulates the role of the team and creates objective measurements of success or failure in a launch. Systems architected around the ideals of maximizing KPIs drive hypergrowth. Netflix’s feed has driven over a billion dollars in conversion, and Facebook’s nearly 13 billion dollars in ad revenue last quarter. It is not surprising that, given sufficient data and complex enough algorithms, machine learning systems efficiently find a local minimum: a small grouping of content that a user will confidently engage with.

But this iteration cycle of build => test against KPI => launch fuels the growth of filter bubbles; we are only seeing these local minima. We need more exploration.

Content exploration

The cost of explorative models in ranking can be significant; such models necessarily drive down KPIs. This is a corollary of the hypothesis that machine learning is finding optimal content to satisfy user behavior for a given KPI. As an example, injecting completely random videos into my YouTube feed, a feed which has been carefully curated to maximize my likelihood of clicking videos, will almost certainly decrease my number of clicks. I just won’t like them as much as comedy podcasts.
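The random-injection example can be made concrete with an epsilon-greedy style sketch: each feed slot is replaced by a random item with probability `epsilon`. The function names, item names and click probabilities below are made up for illustration, and the expected-click sum assumes the model's probabilities are accurate.

```python
import random
from typing import Dict, List, Optional

def inject_exploration(ranked: List[str], explore_pool: List[str],
                       epsilon: float, rng: Optional[random.Random] = None) -> List[str]:
    """With probability epsilon, swap a ranked slot for a random pool item."""
    if rng is None:
        rng = random.Random()
    pool = list(explore_pool)  # copy so the caller's list is not mutated
    feed = []
    for item in ranked:
        if pool and rng.random() < epsilon:
            feed.append(pool.pop(rng.randrange(len(pool))))
        else:
            feed.append(item)
    return feed

def expected_clicks(feed: List[str], p_click: Dict[str, float]) -> float:
    """Sum of estimated click probabilities over the feed."""
    return sum(p_click[item] for item in feed)

# Hypothetical click probabilities: curated items score high, random ones low.
p_click = {"podcast_1": 0.6, "podcast_2": 0.5, "podcast_3": 0.4,
           "random_1": 0.05, "random_2": 0.02, "random_3": 0.01}
ranked = ["podcast_1", "podcast_2", "podcast_3"]
pool = ["random_1", "random_2", "random_3"]

curated = inject_exploration(ranked, pool, epsilon=0.0)
explored = inject_exploration(ranked, pool, epsilon=1.0)
print(expected_clicks(curated, p_click), expected_clicks(explored, p_click))
```

With `epsilon=0.0` the feed is unchanged, and with `epsilon=1.0` every slot is random. Values in between trade expected clicks for exposure to content the model knows little about.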

And there are some attempts to do exactly this: Forgotify, a website for finding forgotten tracks on Spotify; YouTube Roulette, a random approach to finding new YouTube videos; and Amazon’s Stream, for seeing random, trendy products. These are amusing sites that I spend a few minutes enjoying. But random sampling of content is not efficient. Efficiency necessitates knowing more about a user, about content and about our world.

Towards efficiency, I would like to see more answers to these questions:

  • What is/are the right KPI(s) for exploration?
  • How do we trade off explorative KPIs with clicks, upvotes and purchases?
  • Do there exist Pareto improvements or is it a tug of war?
  • What algorithms efficiently sample space across billions of pieces of content?

Without answering these questions we cannot hope to train algorithms to show us content that is enjoyable, outside our immediate comfort zone and still beneficial to the feed’s organization. I don’t have immediate answers to these difficult questions.

I just have this call for more exploration.

Opinions expressed are solely my own and do not express the views or opinions of any past or present employer.
