Experiment Design Guidelines for Product Analysts — Part 1/3

Elisabeth Reitmayr · Published in ResearchGate · Jun 21, 2021 · 6 min read

At ResearchGate, we run a lot of experiments to improve our product for our users. Our experiment design guidelines for product analysts describe how to set up those experiments from an analytical and statistical perspective, so that we can evaluate each experiment as intended. They give some hints on, but do not fully cover, the product management, user research, and design perspectives, i.e. what to experiment on. In this post, we focus on the work required before starting an experiment.

This post is the first part of a series in which we publish some of our internal guidelines and frameworks to make the way we work more transparent. We are interested in your feedback on these guidelines — please send it to elisabeth.reitmayr@researchgate.net.

Objectives of the experiment

Is an experiment the best method?

Experiments are a very powerful tool in the methodological repertoire of a product analyst because they allow us to infer causally from a treatment (product change) to an effect. This is much stronger evidence than, for example, correlation analysis, which does not allow us to draw causal conclusions. So why don’t we just run experiments for everything? Experiments are expensive: they require a lot of preparation, monitoring, and engineering time for implementation and resolution. They also come with opportunity cost: we only have a limited amount of traffic and time to experiment, and we should make sure we use it for the most impactful changes. Therefore, we should choose the assumptions and hypotheses we experiment on carefully.

Image source: Are you guilty of using the word “experiment” incorrectly?

As suggested in this blog post, we should only test assumptions that have the potential to provide high user value and that carry high risk. As we want to minimize the uncertainty around the most impactful assumptions our experimental hypotheses are based on, we rely on the concept of the “Riskiest Assumption Test” (RAT — read more on this concept here). The idea behind the RAT is to test the assumptions that can potentially have a strong effect on the product (high risk). “Risk” can be defined in terms of the potential effect on user behavior, or in terms of our uncertainty about whether the assumption is valid. If we rely on an assumption that we have no previous insights for, the uncertainty is high.

Whether an experiment is the best method to test the assumption depends on various factors such as:

  • What is the cost of the experiment?
  • Do we have enough traffic to evaluate the experiment in a reasonable timeframe? (see the sketch after this list)
  • What is the chance we end up implementing the tested solution?
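
To make the traffic question concrete, here is a rough sketch of how one might estimate the required sample size and runtime before committing to an experiment. The baseline rate, minimum detectable effect, and traffic numbers below are made-up assumptions, and the sketch uses the statsmodels power-analysis helpers:

```python
# Minimal sketch: do we have enough traffic to run this experiment in a reasonable time?
# All numbers are assumptions made up for illustration.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.10                  # assumed current conversion rate
mde_relative = 0.05                   # smallest lift worth acting on: +5% relative
target_rate = baseline_rate * (1 + mde_relative)

# Effect size (Cohen's h) for two proportions, then required sample size per variant
effect_size = proportion_effectsize(target_rate, baseline_rate)
n_per_variant = NormalIndPower().solve_power(
    effect_size=effect_size, alpha=0.05, power=0.8, alternative="two-sided"
)

daily_users_per_variant = 2_000       # assumed traffic reaching each variant per day
print(f"~{n_per_variant:,.0f} users per variant, "
      f"~{n_per_variant / daily_users_per_variant:.1f} days of runtime")
```

If the estimated runtime stretches to several months, a cheaper method such as a usability test may be the better first step.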

We add a limitation to our interpretation of “riskiest” in the RAT concept: if the solution we are testing carries very high risk, there is also a higher chance that we end up not implementing it. Therefore, a usability test with mockups might be a better (cheaper) first step to test the underlying assumptions before running an experiment:

Image based on The Art of the Strategic Product Roadmap

We run experiments to learn about our users

We run experiments to improve our product in a way that serves our users’ needs better. Therefore, we have to make sure that we have a solid understanding of our users’ needs in the specific domain we are experimenting on. For example, if we want to support our users in discovering relevant content, we should have a good understanding of the different tasks they are trying to accomplish with our product before we run experiments.

Each experiment should be set up in a way that enables us to learn about our users. We can often transfer learnings from one context to another. That’s why we want to test assumptions about our users in the most direct way possible, so that we can update our theories about them with the new insights the experiment generates. For example, in most cases we should not test two changes at the same time (unless we use a full-factorial design — read more in the next part of this blog post), because we would not be able to attribute the result of the experiment to the individual changes we introduced. We should also aim to test assumptions about user needs (e.g., “People don’t want to click like on a story if they dislike the title”) rather than specific solutions (“Users will click more on stories if we introduce a dislike button”) (read more here).

Work that needs to be done before implementing the experiment

Designing an experiment properly requires a lot of work upfront — before writing any code. The first step for designing an experiment is defining the follow-up action you take in case you gather the evidence you are interested in:

“Statistics is the science of changing your mind under uncertainty, so the first order of business is to figure out what you’re going to do unless the data talk you out of it … That’s why everything begins with a physical action/decision that you commit to doing if you don’t gather any (more) evidence.” (Never start with a hypothesis)

Defining such a follow-up action often requires user research to make sure we actually address our users’ needs rather than only experimenting towards moving a certain metric. We should have a clear understanding of the user journey we are working on, and define a clear hypothesis based on assumptions. In this context, “hypothesis” does not refer to the null hypothesis we define for the experiment (we call that the “statistical hypothesis”); here, we are talking about a hypothesis about our users. A hypothesis usually has the following format:

We believe that <assumption for a certain type of user>, and if we provide <feature/change> for them, they will <behaviour/metric change>

The follow-up action should be defined based on a quantified expectation. This means that we do not just say “we expect a lift in conversion rate” but rather “we expect at least a 5% lift in conversion rate”. This helps to prevent the implementation of marginal improvements and is also important for determining the required sample size (the “minimum detectable effect” — read more in the third part of this blog post).
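
To illustrate how the quantified expectation and the pre-committed follow-up action fit together, here is a rough sketch of a decision rule written down before any data is collected; the threshold, significance level, and wording of the actions are assumptions for illustration:

```python
# Hypothetical sketch: a pre-registered decision rule, committed to before the experiment runs.
MIN_RELATIVE_LIFT = 0.05   # quantified expectation: at least a +5% relative lift
ALPHA = 0.05               # significance level for the statistical hypothesis test

def follow_up_action(observed_lift: float, p_value: float) -> str:
    """Return the action we committed to, given the experiment outcome."""
    if p_value < ALPHA and observed_lift >= MIN_RELATIVE_LIFT:
        return "roll out the change to all users"
    # The default action we stick to if the data does not talk us out of it
    return "keep the current experience and revisit the underlying assumption"
```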

The following table summarizes these requirements based on an example experiment to improve the usability of the bookmarking option on the ResearchGate feed:

Template for experiment setup

As a product analyst, you will save a lot of time if you ensure clarity about all prerequisites for experiment analysis upfront. We strongly recommend writing down the background/context section of the experiment documentation and gathering feedback from design, product management, and user research before the experiment is implemented by the engineers. We recommend using this template (several components will also be useful for the final experiment documentation).
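
For illustration only, the kind of fields such a setup might capture can be sketched as a simple structured record; the field names and the bookmarking-related values below are hypothetical:

```python
# Hypothetical sketch of an experiment setup record; field names and values are made up
# for illustration and refer to the bookmarking example mentioned above.
experiment_setup = {
    "background": "Users have trouble finding the bookmarking option on the feed",
    "user_hypothesis": (
        "We believe that researchers want to save publications for later, and if we "
        "make bookmarking more visible, they will bookmark more items"
    ),
    "primary_metric": "bookmark conversion rate",
    "quantified_expectation": "at least a 5% relative lift",
    "minimum_detectable_effect": 0.05,
    "follow_up_action": "roll out the more visible bookmarking option",
    "default_action": "keep the current design and revisit the assumption",
}
```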

The next part of this blog post will focus on the setup of an experiment.
