Tests, Tests, Tests Everywhere

İlknur BAŞ
Wingie / Enuygun Growth
5 min read · Apr 25, 2021

A/B testing has become a very popular topic lately. We think about what we can test, and we read many "best practice" stories. But what exactly is A/B testing, and are we using it right, or just whistling in the wind?


Why do we need A/B tests?

There are many ways to see how users interact with a product and where the pain points are. Commonly used methods such as focus group studies, accessibility tests, or surveys provide qualitative outcomes. A/B tests occupy a different spot by providing quantitative outcomes. In an A/B test, the aim is to decide which version of a product performs better or meets users' expectations more easily, by exposing different versions of the product to two groups of users with nearly the same characteristics. The number of versions and user groups may be increased, so it is more accurate to describe the aim of A/B testing as trying to reach the best-performing version.
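As a rough illustration, the split into comparable groups is often done by hashing a stable user identifier, so each user always lands in the same group. The sketch below is a minimal example of that idea; the experiment name and user ID are hypothetical:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically assign a user to a variant.

    Hashing (experiment, user_id) keeps the assignment stable across
    sessions and independent between different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# The same user always sees the same variant of the same experiment.
print(assign_variant("user-123", "button-color"))
```

Because the assignment is deterministic, no per-user state needs to be stored, and with enough users the hash spreads them roughly evenly across the variants.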

When a product is launched, it is not in its final state. Development continues based on how users interact with the product. Sometimes these interactions differ from the product team's expectations. In these situations, feedback from users is necessary to enhance the performance of the product.

Sometimes we see leak points when we look at the conversion funnel of a product. There might not be any technical problem, but because of the way users interact with the product, they might face difficulties moving towards the conversion. Those difficulties are our homework to study.

After taking note of what we need to improve, brainstorming starts, all available data is analyzed, and as a result, ideas are listed. Then we run A/B tests to understand which of these ideas will have a positive effect on users' behavior or contribute to our goal metrics. We can also see which of these ideas should be abandoned as soon as possible.

What should we consider in our A/B tests?

  • Compare apples with apples.

That does not make any sense! The versions have to be different from each other so I can compare them, right?

Not quite.

Let me continue with an example. Say there is a button whose click-through rate (CTR) we want to increase. We can analyze the effect of the button's color, font, font size, or text on CTR. There are many ideas here. What we should do is test these ideas in order, changing one variable at a time. The image below is a bad example: an A/B test comparing these two versions will not give a proper answer, because even if we see an improvement in our goal, we cannot tell the real reason. Was it the color or the font size?

Green and Small Button vs Red and Bigger Button
  • The desired outcome of an A/B test should be a measurable metric.

What you are going to test is different versions of a product, but you need a measurable metric to decide which version is the winner. For example, the previous example aims to increase the CTR of a button. Similarly, increasing the number of pageviews, increasing average session duration, or decreasing bounce rate might be good objectives for an A/B test. Say you want your users to read your blog posts to the end, and your ideas are changing the background color and the font. You decide to start with the background color. Whether users like your new design might be good to learn, but it is not a proper objective for an A/B test. Instead, you can aim to increase the average number of scrolls users perform on your blog pages.

Blue Background vs White Background

As a warning, make sure your objective metric is available before testing. If it is not, start logging this metric as soon as possible, observe it for a while, and then start the test.
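To make this concrete, here is a minimal sketch of computing such metrics per variant from a per-session event log. The log format and the numbers are invented for illustration:

```python
from statistics import mean

# Hypothetical per-session log entries: (variant, clicked, scroll_count)
sessions = [
    ("A", True, 3), ("A", False, 1), ("A", False, 2),
    ("B", True, 5), ("B", True, 4), ("B", False, 2),
]

def variant_metrics(variant: str) -> tuple[float, float]:
    """Return (CTR, average scrolls) for one variant of the test."""
    rows = [s for s in sessions if s[0] == variant]
    ctr = sum(1 for s in rows if s[1]) / len(rows)   # clicks / sessions
    avg_scrolls = mean(s[2] for s in rows)           # mean scroll count
    return ctr, avg_scrolls

for v in ("A", "B"):
    ctr, scrolls = variant_metrics(v)
    print(f"{v}: CTR={ctr:.2f}, avg scrolls={scrolls:.1f}")
```

The point is that both objectives (CTR, scroll depth) are plain numbers computed from logged events, so the two variants can be compared directly.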

  • Do not end the test before collecting enough data.

After the test starts, data is collected immediately. However, the outcome of the test needs some time to become meaningful. By meaningful, I mean statistically significant. Almost all A/B testing tools have this calculation built in. If you choose to run an A/B test in-house without such a tool, note that this calculation is as important as setting up the test itself.
In the early stages of the test, the number of participants is low and the result is shaped by chance. As the number of participants grows, the result becomes more reliable. The results of two such tests are shown below; I think we can all agree on which one is more reliable.
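If you do run this calculation in-house, a common choice for conversion-style metrics such as CTR is a two-proportion z-test. The sketch below uses only the standard library; the sample counts are made up for illustration:

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a: int, n_a: int,
                          conv_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for the difference between two conversion rates.

    Returns (z statistic, p-value). A small p-value (e.g. < 0.05)
    suggests the difference is unlikely to be due to chance alone.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided
    return z, p_value

z, p = two_proportion_z_test(200, 4000, 260, 4000)
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With few participants the same observed lift produces a large p-value, which is exactly why stopping a test early is dangerous: the difference may still be noise.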

At Wingie Enuygun Group, we always have tests running and tests planned.

Sometimes we run an A/B test between an early version of a new feature and the website without that feature. This shows whether adding the feature improves performance as expected.

Sometimes we run an A/B test on our existing features. For example, on the flight search page there is a daily estimated price bar. It gives users good insight into the prices of the days around the one they searched for. However, it might also make flight search more complicated: users might see another day, search flights on that day, and then see a cheaper day again. Because the daily price bar is expanded by default, we tested it against a shrunk version. The original and test versions are shown below:

Top: Original (Expanded), Bottom: Test (Shrunk)

In this test, we saw that this feature helps users choose a flight, and we decided to keep it expanded.

Do not be afraid to test your ideas. It might be difficult, but if your idea loses the test, drop it immediately. Keep testing other ideas to get better and better.

If you want to join our team, share your CV with us: kariyer@enuygun.com
