How We Determine Product Success

At Netflix we engage in what we call consumer science: we test new ideas with real customers, at scale, and we measure for statistically significant differences in how they engage with our product. Are members staying with the service longer? Are they instantly watching more TV shows and movies from us?

If you work here, the results of these tests matter more than your confidence in the outcome, your title, or your ability to persuade. I’ve seen even our best product minds bet wrong on such tests on occasion. We absolutely believe we couldn’t build one of the best-loved internet brands in the world without consumer science at the core of our product development methodology.

Job number one for our product-focused engineers is to effectively innovate for Netflix members. The product we built in 2006 would not satisfy our members today. The best product in our market in 2015 will be far better than Netflix is today. It is our fundamental challenge to figure out what a better product can be on behalf of our members, and to build it.

Innovation involves a lot of failure. If we’re never failing, we aren’t trying for something out on the edge from where we are today. In this regard, failure is perfectly acceptable at Netflix. This wouldn’t be the case if we were operating a nuclear power plant or manufacturing cars. The only real failure that’s unacceptable at Netflix is the failure to innovate.

So if you’re going to fail, fail cheaply. And know when you’ve failed, vs. when you’ve gotten it right.

Product development at Netflix starts with a hypothesis, which typically goes something like this:

Algorithm/feature/design X will increase member engagement with our service, and ultimately member retention.

The idea may be a way to increase the relevance of our search results, a new design for device UIs, or a new feature, such as showing members what their Facebook friends are watching from Netflix. This is the crucial first step in our creative process, from which any improvement we can hope to deliver starts. Our intuition and imagination about how to better serve our members fuel our entire product development approach.

The second step is to design a test that will measure the impact of the hypothesis. Sometimes this simply means building it, but often we can more quickly build a prototype that captures the essence of the concept. Maybe the back end isn’t fully scalable; maybe it lacks polish or all of the bells and whistles we’d like to include if we roll it out for everyone.

This allows us to move quickly and gives us something we can test with our members for a positive or negative signal. There is a big lesson we’ve learned here, which is that the ideal execution of an idea can be twice as effective as a prototype, or maybe even more. But the ideal implementation is never ten times better than an artful prototype. Polish won’t turn a negative signal into a positive one. Often, the ideal execution is barely better than a good prototype, from a measurement perspective. Embracing this simple, battle-tested heuristic can free an innovator to move incredibly quickly by removing extraneous detail in the testing process.

Step three is the test itself. We roll out our prototype to a set of members, and we create an equal cohort set up as a control for the experiment. And then we wait. We let our members quietly tell us what the best product is simply by using our service. We’re always focused on increasing engagement and retention. There are, to use a technical term, zillions of other metrics we measure to understand our results in more detail. But in terms of business value, those headline metrics are what drive success for our product.
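To make "statistically significant" concrete, here is a minimal sketch of one common way to compare retention between a control cohort and a test cohort: a two-proportion z-test. This is an illustration under stated assumptions, not a description of the analysis Netflix actually runs; the function name `two_proportion_z_test` and the member counts are made up for the example.

```python
import math


def two_proportion_z_test(retained_control, n_control, retained_test, n_test):
    """Return (z, two-sided p-value) for the difference in retention rates.

    Hypothetical helper for illustration; assumes large, independent cohorts.
    """
    p_control = retained_control / n_control
    p_test = retained_test / n_test
    # Pooled retention rate under the null hypothesis of no difference.
    pooled = (retained_control + retained_test) / (n_control + n_test)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_test))
    z = (p_test - p_control) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up numbers: 100,000 members per cohort, retention of 80.0% vs. 80.5%.
z, p = two_proportion_z_test(80_000, 100_000, 80_500, 100_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With cohorts this large, even a half-point difference in retention can be distinguished from noise, which is part of why testing at scale matters.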

Any one test could have hundreds of thousands of members taking part. It could have two test cells or twenty, each trying a different approach or mixing different new elements. At any one time, we’ll have dozens of different consumer tests running.
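One way to split members across several test cells, sketched here purely as an assumption since this post does not describe the actual allocation mechanism, is to hash a member identifier together with a test name. The names `assign_cell`, `member_id`, and `test_name` are hypothetical, and treating cell 0 as the control is just a convention chosen for the example.

```python
import hashlib


def assign_cell(member_id: str, test_name: str, num_cells: int) -> int:
    """Map a member to a cell index in [0, num_cells).

    Hypothetical sketch: cell 0 is treated as the control by convention.
    """
    # Including the test name means the same member can land in
    # different cells across different concurrent tests.
    key = f"{test_name}:{member_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes of the digest as an integer, then bucket it.
    return int.from_bytes(digest[:8], "big") % num_cells


# The same member always gets the same cell within a given test.
print(assign_cell("member-12345", "search-relevance-v2", 20))
```

Hashing keeps the assignment stable without storing it anywhere, and it spreads members roughly evenly across cells.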

Here is where the real beauty of the approach comes in. Sometimes our hypothesis is sound, we have a winner for our members, and we add the scale and polish necessary to get our improvement out for everyone. Or, as I mentioned, maybe the idea failed. The wonderful truth is, both outcomes help our product intuition, and therefore increase the chances that our next hypothesis will knock it out of the park. Our product team has great freedom to apply their best thinking to our product, and our collective effort helps each of us improve our understanding of our members’ desires.

It is humbling to be very confident in an idea and wind up being totally wrong. But that is how we learn and grow, and we always try to take the time to discuss and internalize what we believe the lessons are from each of the scores of tests we run each year. It is the impact of the best ideas we come up with that will determine whether our members elect to continue with our service.

It can be frustrating to be in a product development environment where force of personality or hierarchy determines product outcomes. At Netflix the focus on customer value makes a teachable moment of those times one guesses wrong. My product intuition is vastly better today for the benefit of my mistakes.

I’ll close with this last point about consumer science. Testing our product ideas frees us to make big bets, to try radical or unpopular ideas. It allows the best product thinkers to build a track record based on real customer value. It allows us to build consensus out of debate and to build on our best ideas. It helps us avoid the tyranny of “or,” because we can test many approaches to solving the hardest challenges we face.

— John Ciancutti.

Originally published on January 19, 2011.