Experimenting on humans

Dalia Simons
Published in Wix Engineering
Sep 25, 2019

You’re a developer at Wix. It’s Monday morning, and you’re enjoying a quiet cup of coffee in the office. Suddenly your product manager walks in, excited: “I have a brilliant idea! Let’s change our home page! If we change the main photo, I think more people will choose to start a new website.”
“You think?” you say. “It’s our most important page. How can we be sure?”
What do you think? Is our current home page (A) better or the new one (B)?

Which one is better? A or B

If you’ve been in a similar scenario before, you probably need an experiment system.

What is an experiment system?
An experiment system lets you expose code changes or new features to only part of your users. In our example, 50% of the users see the old home page and 50% see the new one. Then we can check on which version the button is pressed more.
This kind of system is often referred to as an A/B test system (because we have 2 options: A and B). A always refers to the old version and B to the new one.
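One common way to get such a split is to hash the user ID into a stable bucket, so the same user sees the same version on every visit. The snippet below is only a minimal sketch of that idea, not how any particular experiment system does it; the function name `assign_variant`, the experiment key, and the user IDs are made up for illustration.

```python
import hashlib


def assign_variant(user_id: str, experiment_key: str, percent_b: int = 50) -> str:
    """Return "A" (old version) or "B" (new version) for this user.

    Hypothetical helper: the name and parameters are assumptions made
    for this sketch.
    """
    # Hash the experiment key together with the user id, so the same user
    # always lands in the same bucket for a given experiment.
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable number in the range 0..99
    return "B" if bucket < percent_b else "A"


# Simulate 10,000 visitors and count how many land on each version.
counts = {"A": 0, "B": 0}
for i in range(10_000):
    counts[assign_variant(f"user-{i}", "home-page-photo")] += 1
```

With a 50% setting, the two counts should come out roughly equal, and calling the function again with the same user ID returns the same answer.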
How can we check? We need to log an event when the “start here” button is pressed and compare the results. B is better than A if:
1. The event we’re measuring (known as a KPI) didn’t lose. You would think we would look for a win, and that makes sense. But most of the time the changes we make are very small and don’t have a big impact, so it’s hard to see an immediate win. Over time, though, these small changes accumulate into a winning effect for our customers. That’s why at Wix, most of the time, not losing is a good enough result.
2. The results are statistically significant. It’s important to understand that for the results to be trustworthy, the test group needs to be large enough. This affects how long your experiment runs. If you’re running it on the main page, which has very high traffic, you can get enough data in days, but if it’s a page with low traffic it might have to run for a few weeks.
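To see why group size matters, here is a rough sketch using a textbook two-proportion z-test. This is an assumption chosen for illustration; the article does not say which statistical test is used at Wix, and all the numbers below are invented.

```python
import math


def two_proportion_z_test(clicks_a: int, users_a: int, clicks_b: int, users_b: int):
    """Return (z, two-sided p-value) for the difference in click rates."""
    rate_a = clicks_a / users_a
    rate_b = clicks_b / users_b
    # Pooled click rate under the assumption that A and B perform the same.
    pooled = (clicks_a + clicks_b) / (users_a + users_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    if std_err == 0:
        # No variation at all (nobody or everybody clicked): nothing to compare.
        return 0.0, 1.0
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Same click rates (10% for A, 12% for B), very different group sizes.
_, p_small = two_proportion_z_test(20, 200, 24, 200)
_, p_large = two_proportion_z_test(2_000, 20_000, 2_400, 20_000)
```

With 200 users per group, the difference between 10% and 12% cannot be told apart from noise. With 20,000 users per group, the same difference is clearly significant. That is why a low-traffic page needs to run longer.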

Is the risk we’re taking too big?
We don’t always want to start by exposing 50% of our users to a new feature. If it’s risky, we would rather start by exposing it to a much smaller percentage, like 25%, or even 10% for the most crucial parts of the system.
Then, if that succeeds, we can gradually increase the percentage.

Gradually increase the exposure
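A useful property when ramping up is that users who already saw the new feature at 10% keep seeing it at 25% and 50%. The sketch below shows one way to get that, assuming each user is hashed into a fixed bucket and only the threshold moves. The names `bucket_for`, `is_exposed` and `RAMP_STEPS` are hypothetical.

```python
import hashlib

# Exposure percentages for a cautious rollout (illustrative values).
RAMP_STEPS = [10, 25, 50]


def bucket_for(user_id: str, experiment_key: str) -> int:
    """Map a user to a stable bucket in 0..99 for this experiment."""
    digest = hashlib.sha256(f"{experiment_key}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def is_exposed(user_id: str, experiment_key: str, exposure_percent: int) -> bool:
    """A user sees the new feature if their bucket is below the threshold."""
    return bucket_for(user_id, experiment_key) < exposure_percent


users = [f"user-{i}" for i in range(1_000)]
# For each ramp step, collect the set of users who see the new feature.
exposed_by_step = [
    {u for u in users if is_exposed(u, "new-home-page", percent)}
    for percent in RAMP_STEPS
]
```

Because the bucket never changes, raising the percentage only adds users. Nobody is moved back to the old version in the middle of the experiment.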

Controlling the audience
We can use the experiment system for smart audience control. We can open new features to selected audiences and make better decisions faster. For example:

  • Language — We can limit the experiment to users speaking a specific language (like Spanish, English). This is very useful when we have a new feature and we want to see if it’s successful before we translate it to all our languages.
  • Browser — We can expose to a specific browser type (Chrome, Firefox, IE, etc). This is helpful when we have new features and we want to test them before adjusting the code for all browsers.

Another advantage filters give us is the ability to open experiments to company employees only, allowing us to test new features in production without risk to our customers.

Filters also give us the ability to run 2 experiments on the same page without them tampering with each other. The only way we can do that is to separate the audiences that see each test. For example, by Geo:

  • Geo — By location of the users (US, UK, France, etc). This is the most common filter we use to separate experiments running on the same page. Each experiment will be open to a specific country.

Experiment filters
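The filters described above can be pictured as a simple eligibility check that runs before a user is assigned to an experiment. This is a sketch under assumptions: the `User` and `ExperimentFilter` classes and their fields are invented for illustration and are not taken from any real system.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class User:
    language: str
    browser: str
    country: str
    is_employee: bool = False


@dataclass(frozen=True)
class ExperimentFilter:
    # None means "no restriction" for that attribute.
    languages: Optional[FrozenSet[str]] = None
    browsers: Optional[FrozenSet[str]] = None
    countries: Optional[FrozenSet[str]] = None
    employees_only: bool = False

    def matches(self, user: User) -> bool:
        """Return True if the user is allowed into the experiment."""
        if self.employees_only and not user.is_employee:
            return False
        if self.languages is not None and user.language not in self.languages:
            return False
        if self.browsers is not None and user.browser not in self.browsers:
            return False
        if self.countries is not None and user.country not in self.countries:
            return False
        return True


# Two experiments on the same page, kept apart by country.
us_only = ExperimentFilter(countries=frozenset({"US"}))
fr_only = ExperimentFilter(countries=frozenset({"FR"}))
# An experiment that only company employees can see.
staff_only = ExperimentFilter(employees_only=True)

us_user = User(language="en", browser="Chrome", country="US")
fr_user = User(language="fr", browser="Firefox", country="FR")
employee = User(language="en", browser="Chrome", country="UK", is_employee=True)
```

Since no user can be in both the US and France, the two geo-filtered experiments never share an audience, so their results don't interfere with each other.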

The life and death of an experiment

Now that we understand what an experiment is, we can see how it actually works.
There are 4 stages in running an experiment:

  1. The product manager decides with an analyst what the experiment will run on and which event we are going to measure.
  2. The developer defines the experiment in their code.
  3. The product manager opens the experiment, monitors the results, and decides with the analyst whether it succeeded and exposure can be increased, or failed and needs to be stopped.
  4. If the experiment succeeded, we open it to all users. The developer then merges it into the code, and the experiment can be closed.

If an experiment fails, we close it, try to understand why it failed, fix it, and then open the experiment again.
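Stage 2, defining an experiment in code, usually comes down to asking the experiment system which version to serve and branching on the answer. The sketch below is a hypothetical illustration only. `InMemoryExperiments`, `conduct` and the experiment key are invented names and are not Petri’s actual API.

```python
class InMemoryExperiments:
    """Hypothetical stand-in for an experiment system client."""

    def __init__(self, open_experiments=None):
        # Maps an experiment key to the variant it currently serves.
        self._open = dict(open_experiments or {})

    def conduct(self, key: str, fallback: str) -> str:
        """Return the variant for this experiment, or the fallback
        when the experiment is not open."""
        return self._open.get(key, fallback)


def render_home_page(experiments: InMemoryExperiments) -> str:
    # Fall back to "A" (the old version) when the experiment is closed.
    variant = experiments.conduct("home-page-new-photo", fallback="A")
    if variant == "B":
        return "home page with the new main photo"
    return "home page with the current main photo"


# Before the product manager opens the experiment: everyone gets the old page.
closed = render_home_page(InMemoryExperiments())
# While the experiment serves B: the new page is rendered.
opened = render_home_page(InMemoryExperiments({"home-page-new-photo": "B"}))
```

Once the experiment has succeeded and is open to all users, merging it into the code means deleting the branch and keeping only the winning version.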

We’re proud to introduce: Petri

At Wix we wrote our own experiment system called Petri. It’s been in use in production for 6 years now, giving us the ability to run hundreds of experiments in parallel.
We shared it with the world as an open source project. You’re welcome to give it a try: https://github.com/wix/petri

Dalia Simons
Wix Engineering

I’m an experienced software engineer. Writing backend code has been my passion and my career for the last 12 years. Currently I enjoy working for Wix.com.