A/B Testing — Product Toy or Effective Tool for Product Optimization?

Adam Kryszkiewicz
Published in Docplanner Tech
7 min read · Aug 27, 2019
You can’t see it, but the result of this A/B test is positive / Fot. Michał Krawczyk

My thesis supervisor once told me that statistics is like a bikini: it shows a lot, but it covers what is most important. I agree with him, but the bikini is cool too. Especially the analysis of statistical data from A/B testing. Do you know any product manager who isn’t excited when they see their test results for the first time?

A/B testing is a method for verifying or disproving a hypothesis. Users are randomly split: half see the test version (changed), while the other half see the original version (unchanged). Both versions are served simultaneously. Thanks to this, in contrast to before/after testing, we can be sure that we are measuring only the impact of the introduced change. External factors, such as weather, changes in another area of the product, failures, etc., are eliminated. And this is beautiful; there is no bullshit. There are no excuses like “the landing page was great; only the users went to the sea because the weather was nice.” The change either improves the analyzed indicator or it doesn’t.
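As a side note, the “random” split is usually deterministic under the hood, so that a returning user keeps seeing the same variant. Here is a minimal Python sketch of one common way to do this with a hash of the user ID; the function and experiment names are my own illustration, not how any particular tool works.

```python
import hashlib

def assign_variant(user_id: str, experiment: str) -> str:
    """Deterministically assign a user to "A" (original) or "B" (test)."""
    # Hashing the user id together with the experiment name keeps the split
    # roughly 50/50 and guarantees the same user always gets the same variant.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100       # map the hash to a number from 0 to 99
    return "B" if bucket < 50 else "A"   # 50% see the test, 50% the original

# The assignment is stable across visits:
print(assign_variant("user-42", "doctor-profile-content-section"))
```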

Of course, to draw such an unambiguous conclusion, it is necessary to collect data for a representative period (in our case at least 2 weeks) and obtain a statistically significant result. Most A/B testing tools estimate statistical significance for you, but if you want to know more about it, you can search for articles and simulators on Evan Miller’s blog.
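If you want to sanity-check your own numbers, a two-proportion z-test is one common way to estimate significance for conversion rates. The sketch below implements a simplified two-sided version from scratch; the conversion counts are invented for illustration, and dedicated calculators (like those on Evan Miller’s blog) remain the safer choice.

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a: int, users_a: int, conv_b: int, users_b: int):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / users_a, conv_b / users_b
    pooled = (conv_a + conv_b) / (users_a + users_b)
    se = sqrt(pooled * (1 - pooled) * (1 / users_a + 1 / users_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Invented example: 4.0% vs 4.6% conversion with 20,000 users per variant
z, p = two_proportion_z_test(conv_a=800, users_a=20000, conv_b=920, users_b=20000)
print(f"z = {z:.2f}, p-value = {p:.4f}")  # p-value < 0.05 is the usual significance bar
```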

I’ve been performing A/B tests at DocPlanner for almost two years. It is a great privilege to be able to serve tests to millions of users every month and, at the same time, change healthcare around the world (including Italy, Mexico, Spain, Poland, Turkey, and Brazil) for the better. It’s mainly thanks to A/B testing that we’ve made some critical changes that helped us improve conversion by 10% in the last quarter. It’s an excellent result, but the beginning was not easy. When we started our adventure with conversion optimization, on average 1 in 10 tests achieved a positive and statistically significant result. Now the effectiveness has increased to around 1 in 3. Here are some things I learned along the way.

Don’t mess with the product analyst

Better yet, go for a drink and make friends with them. When analyzing the results of A/B tests, every detail counts. The smallest mistake in collecting and analyzing data can cause you to draw false conclusions and implement a change that only seems right. Here is an example: for a television campaign, we once prepared a content section on doctors’ profiles that was supposed to strengthen the message of the campaign. In truth, we did not quite believe in the success of this change, but we decided to A/B test it. In our case, the goal was user conversion into an appointment booked through our website.

The results exceeded our wildest expectations, so much so that we became suspicious. We were a bit surprised that the results on the mobile and desktop versions were different. The fact that version B (the tested one) performed better only on business days was also suspicious. Only our product analyst Piotr noticed that in Google Analytics we had included not only appointments booked online by our users, but also those scheduled by our call center. By sheer chance, the call center employees were mostly directed to the test version and artificially inflated its conversion. After excluding this traffic, it turned out that the change had no impact on conversion, so it was not implemented on the website.

Tip: Do not trust results that are too good or too bad. Large deviations usually result from errors in the test configuration or in the way data is collected. When analyzing the results, use segments: device type, user type, traffic source, etc. Bind additional events to the edited section (e.g., to newly added buttons); they may also be useful in the analysis.
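To make the tip concrete, here is a small pandas sketch of the kind of segmented check that caught our call-center problem. The file and column names (variant, device, source, converted) and the “call_center” label are hypothetical; the point is to exclude traffic that should not be in the experiment and then compare conversion per segment.

```python
import pandas as pd

# Hypothetical export of experiment sessions: one row per session with the
# assigned variant, device type, traffic source and whether it converted.
sessions = pd.read_csv("experiment_sessions.csv")

# Exclude traffic that should never have been counted, e.g. the call center
online = sessions[sessions["source"] != "call_center"]

# Conversion rate per variant, broken down by device type
report = (
    online
    .groupby(["variant", "device"])["converted"]
    .agg(users="count", conversions="sum", rate="mean")
)
print(report)
```

Large differences between segments (like our weekday-only uplift) are often the first hint that something other than the change itself is driving the result.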

We take data analysis very seriously. We train hard every day / Fot. Michał Krawczyk

Your hunch is not enough

Even the CEO’s hunch is not enough to prepare and implement a test. Indiscriminately copying trends or features from competing websites is not the best idea either. We used to prepare tests on this basis. We tested button color changes; we added content sections communicating how cool our product is; and we simplified the appearance of mobile listings because we had observed this trend in other marketplaces. The effect? No improvement in conversion.

So how do you choose tests that have a better chance of success? We use the ICE scoring method. Our colleagues from Trello described it really well (link), so I won’t duplicate it here. The most important thing is to make the changes that users need. They should be the source of our ideas and inspiration. We change the product for them. We analyze how they move around the website, we talk to them through our website (quantitative research: on-site surveys) and directly (qualitative research: face-to-face interviews). A/B tests born this way work best; see the example below, just after the short ICE sketch.
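For readers who have not met ICE before: each idea gets an Impact, Confidence, and Ease score (typically 1 to 10), and the combined score is used to order the backlog. The sketch below multiplies the three scores, which is one common convention; the ideas and numbers are invented.

```python
from dataclasses import dataclass

@dataclass
class TestIdea:
    name: str
    impact: int      # 1-10: how much it could move the KPI
    confidence: int  # 1-10: how strong the supporting user insight is
    ease: int        # 1-10: how cheap it is to build and test

    @property
    def ice_score(self) -> int:
        return self.impact * self.confidence * self.ease

# Invented backlog, printed from the highest ICE score down
backlog = [
    TestIdea("Reorder login options", impact=8, confidence=7, ease=9),
    TestIdea("New button color", impact=2, confidence=3, ease=10),
]
for idea in sorted(backlog, key=lambda i: i.ice_score, reverse=True):
    print(f"{idea.name}: {idea.ice_score}")
```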

We ask users who have not completed the appointment reservation about the reason for leaving the process. Once, when analyzing the responses, we came across entries like “I don’t have Facebook” or “I don’t want to give you access to my FB account. What do you need it for?” These answers surprised us because we did not require any such action. Then we looked at our login page design. It turned out that, especially on mobile, the option of logging in via Facebook stood out strongly. We decided to reverse the order of the login options as part of an A/B test. The effect? Conversion improved by 5% on desktop and by 12% on mobile.

Tip: Always start the test with the insight collected from users. An excellent idea for a feature is not enough. If you are so excited about it that you want to test it, drink a glass of cold water and cool down. It only makes sense if it solves problems that are important for the user.

Go for quantity; the quality will come later

Within the product teams at DocPlanner, quarterly goals are not imposed on us by management; we set them ourselves. Of course, these goals refer to the long-term vision of the company, and everyone in the company can challenge them. However, it is the team members who decide together what they want to achieve in the next 3 months.

In Q3 2018, we set ourselves a team goal based on the number of tests carried out. Not on their relevance or the effect they were supposed to have, but simply on the number. Of course, we had to defend this approach, but I explained that it was only a temporary stage meant to accelerate our testing machine, a stage that would make testing second nature and a habit. In retrospect, I can see that it came out great. Such endurance training in A/B testing has allowed us to estimate the workload better, assess the potential impact of tests on KPIs, and make fewer errors in analytics. Thanks to this training, we can now also implement each A/B test faster. In our case, quantity quickly translated into quality in the following quarters.

Tip: Try to impose discipline on yourself, e.g., “I will start 10 A/B tests in a set time” or “I want to have 3 tests in progress at any given time.” After some time, you will notice that you set up tests faster and with fewer errors.

Make testing a daily habit / Fot. Michał Krawczyk

Test your A/B test before the test

It is said that if something is free, people do not respect it. The same goes for A/B tests. The fact that they are widely available and most often do not require any advanced implementation means that people frequently prepare them sloppily and without deeper reflection. This approach is a mistake. In truth, A/B testing costs quite a lot. Usually, the process consists of creating a design/copy, translating texts for other markets, some coding in the tool itself (we use Google Optimize), binding events, setting the targeting, and communicating the changes within the company.

A poorly prepared test is not only a waste of time but also a waste of valuable insight. When analyzing test results, we sometimes noticed that we had made a rookie mistake in the design or had not checked some critical conditions. Recently, we have started paying more attention to various (and not just one) user paths and… we test mockups with a small group of users on the street (so-called guerrilla testing). Thanks to this, we can clearly see the interface’s shortcomings or the vagueness of a copy. It costs us half a day of approaching people on the street, but it allows us to avoid repeating tests.

Tip: Most A/B test configuration tools give you the option to preview the test. Use it to verify your test with real users. Only 5 such tests will allow you to find the vast majority of errors.

Dear user! Have a cup of coffee and test our mock-up / Fot. Michał Krawczyk

Summary

A/B tests are fantastic — do them. It is a product toy, but one that, when used correctly, will become the best tool to improve your products.

If you think the conclusions presented here may be useful in your work, go ahead and apply them. If not, learn from your own mistakes; that is also a great way to go. If you have any opinions on A/B testing, please share them in the comments.


Adam Kryszkiewicz
Docplanner Tech

PM at Displate. We help people to collect their passions. Love working in interdisciplinary teams. Big “Star Trek” fan.