Applying the Scientific Method to Product Management

Danny Gold
Published in Path to Product · 8 min read · Aug 8, 2017
(Image credit: xkcd)

It can be daunting to read about how successful companies like Amazon and Facebook are continually running user tests and collecting hoards of data. Where do you start? Is this kind of experimentation even possible at a typical company?

The good news is that if you start experimenting, even on a small problem, you can build this type of rapid testing into your company culture. The next time you’re in a meeting and there’s a discussion, debate, or all-out argument, bring up the question: “How can we test this?” You can shift the momentum from problem solving to problem understanding, then move to experimentation to find the right solution.

Here, we’ll walk through how to apply the scientific method and all its parts — hypothesis generation, experimentation, and measurement — to product initiatives.

The Hypothesis

From Wikipedia: a hypothesis is a proposed explanation for a phenomenon.

Perfect, let’s stop there. For product purposes, let’s substitute “problem” for “phenomenon.” Writing a hypothesis helps us gain a clear understanding of the problem we’re observing so that we can create a set of experiments and test potential solutions. Our hypothesis answers the question: “What problem is happening, and what might be causing it?”

Here’s an example from my company, which makes tools for Product Managers. We noticed that PMs often struggle to identify the most important items in their roadmap at any given time. It’s hard to know the objective “value” of a given initiative. Most prioritization schemes boil down to a cost vs reward analysis where you look for low cost initiatives that you expect to generate high rewards. But how do you make an educated guess at what will be a high reward (and high value) initiative?

Our hypothesis is that Product Managers struggle to clearly establish and defend their priorities because it is difficult to aggregate and integrate multiple perspectives on the value of an initiative. Gathering and sifting through unique insights from diverse groups such as Sales, Marketing, and Engineering can be a huge hurdle. Down the line, not being sure of that value estimate can lead to unease in the organization if everyone doesn’t understand the logic behind the priorities.

Let’s use the hypothesis definition and break this down into two parts:

What phenomenon (problem) have we observed?

  • Product Managers have trouble estimating the value of initiatives.
  • Organizations struggle to understand how items are prioritized.

What could cause it?

  • Limited collaboration inside the organization on estimating value. (This is the cause that we felt was most likely, based on our experience.)
  • Limited collaboration with customers on estimating value.
  • Limited communication about “why” an initiative was deemed high value.

This allowed us to home in on our hypothesis:

Product Managers have trouble estimating the value of initiatives due to limited insight from internal groups such as Sales, Marketing, Engineering and Customer Success.

Keep in mind: this doesn’t have to be the ONLY explanation for the problem. We’re picking the one we feel is most likely and experimenting to gather data that tells us whether we’re on track. We may scrap this hypothesis and go back to look for other likely causes in the future.

The Experiment

Now that we’ve framed our hypothesis, we can begin working through possible solutions to help Product Managers gain confidence in the estimates they’re using to prioritize the roadmap. One way to start is to take all of the statements the team has made that start with “I think,” “We should,” or “If we” and turn them into “What if” statements. This reminds us that we don’t know what the outcome will be, but that if we experiment, we can find out.

For our hypothesis this became:

  • What if we sent out Google surveys every time a new roadmap item was added?
  • What if we reminded the product manager to meet with specific teams periodically?
  • What if we built a Slack bot to gather estimates from the team?
  • What if we built an Apple TV app to gamify value estimation in a meeting?

We decided the Slack bot best met our criteria: a low-friction interaction that reaches a large number of team members in the organization. For our first pass, we’re going to break down our value estimate into two data points for each initiative a Product Manager is considering:

  • What is the expected impact of this initiative on the customer problems we’re seeking to address?
  • What is the expected impact of this initiative on our business goals?

Our Slack bot will generate a survey asking the team to estimate the impact in these areas on a scale of 1–5. We’ll also allow for free-form text responses to the question “Why do you think the impact is a (3)?”
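As a rough sketch of how a bot might post this survey, here’s one possibility using Slack’s Python SDK and Block Kit. The channel, initiative title, and token handling are placeholders rather than our actual implementation, and the free-form “why” follow-up would be collected separately (for example, in a thread):

```python
# Sketch: posting a 1-5 impact survey for one initiative via Slack Block Kit.
# The channel, initiative title, and token source are placeholders.
import os

from slack_sdk import WebClient

client = WebClient(token=os.environ["SLACK_BOT_TOKEN"])

def impact_buttons(question_id: str) -> dict:
    """Build a row of 1-5 buttons for one impact question."""
    return {
        "type": "actions",
        "block_id": question_id,
        "elements": [
            {
                "type": "button",
                "text": {"type": "plain_text", "text": str(score)},
                "value": str(score),
                "action_id": f"{question_id}_{score}",
            }
            for score in range(1, 6)
        ],
    }

client.chat_postMessage(
    channel="#product-team",  # placeholder channel
    text="Value estimate requested",  # plain-text fallback for notifications
    blocks=[
        {"type": "section",
         "text": {"type": "mrkdwn",
                  "text": "*New roadmap item:* Slack bot survey\n"
                          "How big an impact do you expect? (1 = low, 5 = high)"}},
        {"type": "section",
         "text": {"type": "mrkdwn", "text": "Impact on customer problems:"}},
        impact_buttons("customer_impact"),
        {"type": "section",
         "text": {"type": "mrkdwn", "text": "Impact on business goals:"}},
        impact_buttons("business_impact"),
    ],
)
```

Button clicks come back as interactive events, so the bot also needs a small handler to record each respondent’s score; that part is omitted here.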

(Screenshot from an early build of our Slack bot)

We’ll then present this information to the Product Manager in an easy-to-consume interface, with the range of estimate scores and the detailed text responses from each participant. The goal is to give them access to opinions and insights they may not normally think to include when prioritizing the roadmap.

(Early mockup of the review interface for Product Managers)
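Behind an interface like this, the aggregation itself can stay simple. Here’s a minimal sketch; the response shape and the sample data are invented for illustration:

```python
# Sketch: summarizing survey responses for the Product Manager's review view.
# The Response shape and sample data are invented for illustration.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Response:
    respondent: str
    team: str
    customer_impact: int  # 1-5 estimate
    business_impact: int  # 1-5 estimate
    reason: str           # free-form "why" text

def summarize(responses: list[Response]) -> dict:
    customer = [r.customer_impact for r in responses]
    business = [r.business_impact for r in responses]
    return {
        "customer_impact": {"min": min(customer), "max": max(customer),
                            "mean": round(mean(customer), 1)},
        "business_impact": {"min": min(business), "max": max(business),
                            "mean": round(mean(business), 1)},
        # Keep every comment attributed so the PM sees the reasoning,
        # not just the scores.
        "comments": [(r.team, r.respondent, r.reason) for r in responses],
    }

responses = [
    Response("alice", "Sales", 4, 5, "Top ask from two enterprise prospects."),
    Response("bob", "Engineering", 2, 3, "High effort; it touches the data model."),
]
print(summarize(responses))
```

Showing the range alongside the mean matters here: a wide spread between Sales and Engineering is exactly the kind of disagreement the Product Manager should dig into.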

We have our experiment! Using the template below, we can create a concise overview of our experiment.

If we <try a solution>, we can <expect an outcome>.

If we gather value estimates from team members with a Slack bot, we can improve the breadth of data a Product Manager uses to prioritize initiatives.

Measuring the Outcome

We have one more step to get from here to an experiment that is ready to run. How will we know if we’re onto something? We need a way to measure the desired outcome.

This can get difficult and is often the biggest hurdle in adopting experimentation. Measurement can be daunting, especially if the organization is used to running on emotion and gut feel. Unfortunately, measuring the emotions of the executive team won’t make our product more or less successful.

Your measurements should initially focus on user behavior. The business objectives are most likely related to revenue, retention, or acquisition. These metrics are extremely important, but they can be very difficult to measure quickly enough to make decisions. You need to find early indicators that point to the likelihood of achieving these business objectives. The key is to come up with usage patterns that you feel would indicate that users are driving towards the desired outcome.

For us, the assumption is that with more data from different teams at a Product Manager’s disposal, they’ll have a better view of the expected value of an initiative, which will lead to better prioritization decisions. In order for this to happen, we need to provide them with data they didn’t have before, and we need to track if this data impacts their decisions. We want to be able to experiment and tweak along the way and know if we have a chance of getting there. We can come back and measure the business metrics later if the experiment is an initial success based on user behavior.

Now we just need to come up with specific numbers for these metrics. This is a bit more art than science when you’re first getting started with metrics. If you have a solid baseline and your organization gathers analytics, you can get very fine-grained here. If you’re just starting out, I’d recommend breaking things into quarters and looking at the user path as a funnel. As a ballpark starting point, you may need 75% of users to start using the feature at the top of the funnel to get 25% to complete the action. We used that technique to arrive at our metrics, stair-stepping down the funnel from the start of the action to its completion.

  • 75% of initiatives get sent out for estimates via the Slack bot
  • 60% of team members respond to the Slack bot
  • After being presented with team member estimates, Product Managers update their estimate 30% of the time
  • 25% of initiatives are tabled after receiving low value estimates from team members
  • 25% of initiatives are moved forward after receiving high value estimates from team members

The key here is to stay loosely attached to these metrics in the early stages of your experiment, until you’ve gathered data. Is 75% the right number? Will 50% of users who start this feature drop off? Can we improve those numbers, or are they just the way the world is? The truth is, you won’t know the answers to these questions until you start looking at the data and talking to users. One way to look at these metrics is that each is the start of a conversation, and discussing data is much more powerful than discussing opinions.
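Once usage events are being logged, checking measured rates against these targets is straightforward. Here’s a hedged sketch of that comparison; the event counts are made up, and the targets mirror the first three metrics above:

```python
# Sketch: comparing measured funnel conversion against target metrics.
# Counts are invented; targets mirror the expected outcomes above.
funnel_targets = {
    "sent_for_estimates": 0.75,   # share of initiatives sent out
    "team_responded": 0.60,       # share of team members who respond
    "pm_updated_estimate": 0.30,  # share of reviews that change the estimate
}

# Hypothetical counts pulled from analytics
observed = {
    "initiatives": 40,
    "sent_for_estimates": 26,
    "members_pinged": 120,
    "team_responded": 66,
    "reviews": 26,
    "pm_updated_estimate": 9,
}

measured = {
    "sent_for_estimates": observed["sent_for_estimates"] / observed["initiatives"],
    "team_responded": observed["team_responded"] / observed["members_pinged"],
    "pm_updated_estimate": observed["pm_updated_estimate"] / observed["reviews"],
}

for step, target in funnel_targets.items():
    status = "on track" if measured[step] >= target else "below target"
    print(f"{step}: {measured[step]:.0%} measured vs {target:.0%} target ({status})")
```

Each “below target” line is a prompt to go talk to users, not a verdict on the feature.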

We now have a complete experiment ready to go.

Hypothesis:

“Product Managers have trouble estimating the value of initiatives due to limited insight from internal groups such as Sales, Marketing, Engineering and Customer Success.”

Experiment:

“If we gather value estimates from team members with a Slack bot, we can improve the breadth of data a Product Manager uses to prioritize initiatives.”

Expected Outcomes:

  • 75% of initiatives get sent out for estimates via the Slack bot
  • 60% of team members respond to the Slack bot
  • After being presented with team member estimates, Product Managers update their estimate 30% of the time
  • 25% of initiatives are tabled after receiving low value estimates from team members
  • 25% of initiatives are moved forward after receiving high value estimates from team members

Our experiment checks all the boxes needed to execute the scientific method:

  • A clearly stated hypothesis with an observation and an assumed cause
  • An experiment to test the hypothesis that will trigger observable results
  • A set of specific outcomes that put the measured results in context to validate our assumptions

This experiment will provide data to tell us if we’re on the right track to solving our customer problems and supporting our hypothesis. Experiments like this may become the primary features of our initial launch, or they may end up in the scrap heap of good ideas that don’t have the impact we expect. Either way, this scientific process leaves us less attached to our ideas’ success, and more attached to observing how they perform and reflecting on what that data suggests for our next move. If we do this right, we’ll be healthier in our work relationships, our organizations will be more successful, and we’ll build more trust in product managers as decision scientists.

How do you run experiments at your company? What’s the length of your average experiment?

What is your process for setting initial success metrics and how flexible are they as you gather data?

Has collecting data from experiments changed the way your company makes roadmap decisions?

If you like the idea of our experiment and want to help test it, let us know at beta@vspr.ai
