How Netflix does A/B Testing
Jessie Chen

Matt Chessen’s reply is very much on point. I started my own reply diving into a discussion of some very important aspects of A/B testing that you don’t mention and perhaps Netflix didn’t cover, but then realized it was more the length of a white paper than a comment. Rather than go into that level of detail, I’ll drop a few hints:

  1. First, I looooove A/B testing. I’m glad you’ve discovered the scientific method, and A/B testing is a terrific application of it. I hope you really start examining scientifically controlled studies of human behavior; it’s perhaps the most fun part of UX for me personally. But A/B testing is a technique, not a panacea, and it’s far easier to design studies that accidentally produce misleading results than ones that yield real insights. That part of study design is complex and requires an entirely different and much more nuanced set of practices than the subsequent, relatively simple A/B execution process.
  2. My personal experience, and a casual and entirely unscientific poll of acquaintances, indicates that Netflix’s home page has become something of an ever-shifting, inconsistent nightmare, taking away valuable discovery features and never presenting the same options twice. I wouldn’t be surprised if users are building a slow-burn store of resentment and frustration over this tinkering.
  3. I spend more time on the home page than I used to because of this, looking longer to find what I want, with less satisfying results.
  4. If Netflix believes that more time on the home page = engagement = success, they have let some data scientists with a poor understanding of human factors run the show.

Erosion of customer loyalty begins long before the numbers start to reflect it. If you have a million customers who love you, they aren’t looking around for alternatives, and that presents a real obstacle to potential competitors. But if you have a million customers who are kind of sick of you but haven’t had another good option, they are looking, and when a good alternative arrives, you suddenly discover that you’ve squandered that obstacle to competition because you assumed that if they stayed customers, they were happy.

(See: Microsoft, RIM, and every near-monopoly that had its lunch eaten because its numbers stayed high while its customers were getting parachutes ready.)

The article I’d really like to see is one that speaks to what kinds of qualitative work Netflix does with users, and how that is integrated with this quantitative work into a 360° understanding of how their customers actually feel about each of the options offered in the tests, and more importantly, what a longitudinal study would reveal about the effects of the constant experimentation on their experience of and attitudes toward Netflix.

On a purely anecdotal basis, I can say that I’m still with Netflix, and my viewing time stats may be telling them I’m solidly in their camp. But as soon as someone comes around with a comparable library and some of the basic affinity discovery features that Netflix has taken away, and Netflix’s original programming drops in quality or relevance to me, I’ll happily jump ship, and their home page tinkering will be a significant part of the reason.