Evaluating Experiments: How do you decide what to do next?

So you read the 5 Steps to Becoming a Data Driven Product Manager, you’ve run your first experiment, and you’ve gotten back your results. How do you know if you should launch the feature or kill it? What is a good result, or a bad one? Ultimately, what should you do next?

A quick note on best practices before you start…

  • Clearly define the test metric & goal that directly relates to your company's core metrics.
  • State the trade-offs you are comfortable with, win or lose. For example, are you comfortable with a 5% increase in revenue with a 2% hit to retention?
  • Understand the number of users that must be included in the test in order to achieve statistical significance (one quick way to estimate this is sketched after this list).
  • Ensure there is organizational appetite to adopt the feature if it does succeed.
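
On that sample-size point, here is a minimal sketch of a power calculation using statsmodels. The baseline 10% conversion rate and the 1-point target lift are hypothetical numbers for illustration, not from any real test:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Hypothetical inputs: baseline conversion of 10%, and we want to detect a lift to 11%.
effect_size = proportion_effectsize(0.10, 0.11)

# Common defaults: 5% significance level, 80% power, two-sided test.
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    alternative="two-sided",
)
print(f"Users needed per variant: {n_per_variant:,.0f}")
```

Smaller expected lifts require dramatically more users, so running this math before the test tells you whether you can realistically reach significance at all.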

How should I categorize my results?

Generally I recommend that you think about your results as “Clear Wins”, “Clear Losses”, and everything else (ambiguous results). The first two (win / loss) are the easiest to handle. Unfortunately, you’re likely to find that the “everything else” category is where many of your results land. However, no matter what your test results in, you should think and act carefully about what you do next.

Clear Wins

If the data comes back statistically significant and positive, you shouldn’t have much trouble launching your product. You already made sure that the test’s key metrics and goals align with a top company metric, so a win for your test is a win for your company.

But, it’s an MVP — just a proof of concept!

That’s a great place to start! You’ve proven that this product is something that users love. This means you can now rally the troops and dedicate more resources to polish and execution. What was a successful MVP test now has the opportunity to become an amazing, delightful, polished product launch.

Clear Losses

If the data comes back statistically significant and negative, you should abandon the product as-is. Because you are measuring key metrics that will affect your company, you know that launching this will have a negative impact.

But you believe in the product, what now?

If you have other evidence (user tests, competitive research, gut), or a strong business or strategic reason to build this product — that’s OK. You don’t have to abandon it, you just need to iterate.

In this case it’s essential that you built great tracking into your test. Where were users getting stuck? Can you use the data to dig into the experience and figure out what went wrong?
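
A minimal sketch of what that digging can look like, assuming you log one row per user per funnel step (the column names and step labels here are hypothetical):

```python
import pandas as pd

# Hypothetical event log: one row per user per funnel step they reached.
events = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 3, 4],
    "step":    ["view", "start", "finish",
                "view", "start",
                "view", "start", "finish",
                "view"],
})

# Count distinct users at each step, in funnel order, then the step-to-step conversion.
funnel_order = ["view", "start", "finish"]
counts = events.groupby("step")["user_id"].nunique().reindex(funnel_order)
print(counts)
print((counts / counts.shift(1)).rename("step-to-step conversion"))
```

The step with the steepest drop-off is usually where your next iteration should focus.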

Work with the test stakeholders — the analytics team, design, product and engineering — are you all in agreement that it’s not yet time to give up?

If so, you now have a clear path forward for your next test. Use your learnings to improve the product and test again.

The Ambiguous Test Result

This is the hardest test to interpret. You built an MVP product, you aligned it with core company metrics, you tested it and… well, not much happened. Now you’re stuck. You don’t have the data to drive a coalition to iterate towards the next version of the product, and you don’t have a clear loss to drive your decision to abandon.

Ask yourself, how important is the key metric to company goals?

Often a company has several top-level goals it cares about. Some are long-term and ever-present (e.g. retention); others are shorter-term critical goals. How does your test impact one of these goals?

Can you make an argument that your test, though ambiguous because of middling results, statistical insignificance or lack of feedback, still has the potential to impact the bottom line?

For example, if revenue was the key metric you were trying to move, and overall revenue is the top-line company goal, you can frame the results so that even a statistically insignificant lift, with no harm to any other core metric, still supports a decision to launch and iterate.

Some other ways I’ve found to dig into ambiguous results:

  • If you’ve run multiple versions of this test across different user bases, look at the trends. If all of them point positive (but are not significant), you may have an indication that the overall impact is positive.
  • Look for other proxy metrics. If you are trying to move something like average session time but see no real impact in the results, look at the number of sessions started. Perhaps more sessions are pulling the average down, or holding it flat. Digging into the data this way can expose a misleading headline result.
  • Filter the data. It’s possible that your test touched so many users that the impact of the product was lost in the overall noise. Work with your data team to filter down to only those users who actually saw the test product versus those who used the control product. When you eliminate the extraneous noise, you may have an opportunity to reach significance (see the sketch after this list).
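
One way to sanity-check that last idea is to re-run the significance test on just the exposed users. A minimal sketch with statsmodels, using made-up counts that are purely illustrative:

```python
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical counts after filtering to users who actually saw the
# test or control experience (not real results).
conversions = [530, 480]          # test, control
exposed_users = [10_000, 10_000]  # test, control

z_stat, p_value = proportions_ztest(count=conversions, nobs=exposed_users)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

If the filtered comparison reaches significance while the all-users comparison did not, that is a sign your effect was real but diluted, not absent.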

Ask your gut — do you believe in this strategy?

Great product managers have a “spidey-sense” for great product. You are obsessed with great experiences and products — and this means you are uniquely qualified to fight for something you believe in.

If your gut is telling you there is something special about what you built, you should go to bat for it. Bring all the data you have (user tests, raw data, competitive analysis) to back you up, but also recognize that as a product leader your experience can guide you to making good choices.

Build a roadmap, point out how you would improve the test, and then use your product know-how to convince your team that you are right.

Finally

Building and supporting an experiment and data driven culture requires patience and persistence. You have to accept that you may be wrong 9/10 times. You will have to make difficult, intelligent choices about what to test and why (not everything should be a test, after all).

However, your hard work will pay off, and you will learn that you can use experiments to celebrate your wins as often as you avoid taking big losses.