3 Styles to Implement A/B Testing

Charanjeet Kaur
FabHotels
Dec 6, 2018

A/B testing is widely practised across websites. It has proved to be a way to be less wrong in our immensely complex business world. It is no longer just a tool but a process to be followed for every release.

All new ideas and performance enhancements are exposed to a certain percentage of users, and depending on the actual users' response, a decision is taken to either roll the change forward or back. This way, we can compare user behaviour and conversions across experiments and take a wiser decision.

Take calculated risks. That is quite different from being rash. -General George Patton

In the last article, I discussed in detail the importance of A/B Testing and how it has helped us at FabHotels make data-driven decisions. In this article, I will talk about the approaches that were considered while implementing it.

You may visit the last article at

What can be experimented on?

An A/B experiment can be like:

  1. Change in UI & UX, e.g. a change in the styling of a CTA (Call to Action)
  2. Change in user journey, e.g. a change in the payment funnel or in the auto-suggest algorithm
  3. Change in business logic, e.g. a change in the pricing algorithm

10 Factors to narrow down the solution

  1. Should be scalable to support an increasing number of experiments.
  2. Experiments should be exclusive. The performance of one should not impact the other.
  3. Should be scalable across platforms, be it the mobile website, desktop website, APIs, micro-services, or Android/iOS apps.
  4. Changing the distribution across variants should be easy.
  5. Should be maintainable.
  6. Should not pollute the codebase.
  7. Should support ReactJS Single Page Applications.
  8. Analysis data should be readily available. Producing reports for each experiment should not be a hassle.
  9. The user's view of the website should not flicker between variants, as it may lead to a negative user experience.
  10. Should allow running multiple experiments in parallel.

What are the stages to implement it?

The implementation of an A/B test can be divided into 5 stages:

1. Distribution

2. Development

3. Deployment

4. Duration

5. Decision

How to approach it?

I. Use 3rd Party Tools

II. Deploy Application as a Variant

III. Support Variants inside the Application

Let’s now discuss all the stages for these 3 styles of A/B testing implementation.

I. Using 3rd Party Tools

There are a handful of tools in the market that make A/B testing a very simple job, for example VWO, Optimizely, Crazy Egg, Inspectlet, etc. These tools are scalable and very easy to deploy.

Distribution

We get a variety of options to decide which category of visitors is to be targeted for the experiment. Visitors can be targeted on the basis of referral, device, new visitors, returning visitors, search traffic, etc.

Development

All we need to do is use their WYSIWYG (What You See Is What You Get) editor to make changes on the webpage. We can also write CSS/JavaScript code in their IDE, and make UI changes spread across pages, for example, changing the persuasion message across pages or the styling of CTAs across the website.

These are JavaScript-driven frameworks. They inject bootstrapping code into the website, and the changes made in their editor are applied after the fact: once the page is rendered, they execute JavaScript to alter it as per the experiment.
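In effect, the injected script does something like the following once the page has rendered. This is a minimal sketch, not any vendor's actual snippet; the selector, variant flag and styling change are made up for illustration.

    // Runs after the vendor's bootstrapping code has assigned a variant.
    document.addEventListener('DOMContentLoaded', function () {
      var variant = window.__abVariant || 'control'; // set by the injected bootstrap (illustrative)

      if (variant === 'b') {
        // A change authored in the tool's editor: restyle the booking CTA.
        var cta = document.querySelector('.book-now'); // illustrative selector
        if (cta) {
          cta.textContent = 'Book now, pay later';
          cta.style.backgroundColor = '#e53935';
        }
      }
    });

This late rewrite is also why the page can visibly rearrange after it loads.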

Deployment

One-click deployment: just choose the percentage of users subject to the experiment.

Duration

They recommend a duration on the basis of traffic and the expected conversion difference. It is explained at
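For a rough idea of what such a recommendation is based on, here is a back-of-the-envelope sketch using the standard two-proportion sample-size approximation (95% confidence and 80% power are assumed; this is not the exact formula any particular vendor uses):

    // Estimate how many days an experiment needs to run.
    function daysToRun(dailyVisitors, baselineRate, minRelativeUplift) {
      var zAlpha = 1.96; // 95% confidence, two-sided
      var zBeta = 0.84;  // 80% power
      var p1 = baselineRate;
      var p2 = baselineRate * (1 + minRelativeUplift);

      // Visitors needed per variant to detect the uplift.
      var perVariant = Math.pow(zAlpha + zBeta, 2) *
        (p1 * (1 - p1) + p2 * (1 - p2)) / Math.pow(p2 - p1, 2);

      // Two variants share the daily traffic equally.
      return Math.ceil((2 * perVariant) / dailyVisitors);
    }

    // e.g. 10,000 visitors/day, 2% baseline conversion, 10% relative uplift to detect:
    // daysToRun(10000, 0.02, 0.10) -> 17 days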

Decision

On the basis of fancy dashboards.

Thoughts around it?

3rd party tools work best for experiments that require changes only in the UI. Multiple experiments can be run in parallel. However, their experiments load the page first and then rearrange it, which is not a great experience. They can also have problems running with ReactJS-like applications that have their own life-cycle routines.

II. Deploying Application as a Variant

We can host our variants behind a proxy and deploy them as individual applications. Nginx, Akamai, Varnish, etc. can act as the proxy and distribute the load amongst the variants.

Distribution

Nginx has a split_clients directive that allows defining the traffic distribution. A typical example is:
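(A minimal sketch of such a configuration; the upstream names, and which port serves which variant, are assumptions, while the ports :3001/:3002 and the 80/20 split match the description below.)

    http {
        # Key the split on the client IP: ~20% of clients go to the B variant,
        # the remaining ~80% go to the Control variant.
        split_clients "${remote_addr}" $variant {
            20%     b_variant;
            *       control;
        }

        # Which port serves which variant is an assumption for illustration.
        upstream control   { server 127.0.0.1:3001; }
        upstream b_variant { server 127.0.0.1:3002; }

        server {
            listen 80;
            location / {
                proxy_pass http://$variant;
            }
        }
    }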

In this example, we have our application hosted at :3001 and :3002. We use remote_addr, i.e. the IP address, as the base to split the clients. Around 20% of users are forwarded to the B variant and 80% are forwarded to the Control variant.

Success of an A/B Test Framework depends on the fairness of the distribution algorithm.

Distribution by IP address is not fair because:

  1. The exposed IP address remains the same for an entire private network or LAN. This causes the same variant to be presented to a mass of people.
  2. The IP address is supposed to make the variant sticky for a user. However, even this is not guaranteed, e.g. offices can have multiple broadband connections, causing fluctuating IP addresses.

In order to solve this, we need to base the distribution logic on a random number and store the variant in a cookie. This cookie can then be used to render the same variant to the end user.
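The same idea, sketched as a thin Node/Express middleware. This is not the production code referenced below; the cookie name, split and expiry are illustrative, and the logic could equally live at the proxy layer.

    var express = require('express');
    var cookieParser = require('cookie-parser');

    var app = express();
    app.use(cookieParser());

    var B_VARIANT_SHARE = 0.2; // 20% of new users see variant B (illustrative)

    app.use(function (req, res, next) {
      var variant = req.cookies.ab_variant;
      if (!variant) {
        // First visit: bucket on a random number, not on the IP address.
        variant = Math.random() < B_VARIANT_SHARE ? 'b' : 'control';
        // Persist in a cookie so the user keeps seeing the same variant.
        res.cookie('ab_variant', variant, { maxAge: 30 * 24 * 60 * 60 * 1000 });
      }
      req.variant = variant;
      next();
    });

    app.get('/', function (req, res) {
      res.send('You are on the ' + req.variant + ' variant');
    });

    app.listen(3000);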

You may check the production ready source code at:

Development

We can implement the variants as branches of the codebase.

Deployment

We can set up a CI/CD pipeline for each variant. It is like setting up parallel roads to production.

Duration

The duration for which an experiment needs to run before concluding depends on the traffic. We may use the same formula as above.

Decision

Google Analytics can be used to compare metrics amongst variants and decide the winner. Add the experiment name to the pageview event of Google Analytics; this can be done via Custom Dimensions. We can then segregate all of the reports on the basis of the experiment name.
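A minimal sketch using analytics.js; the dimension index, cookie name and experiment name are assumptions, and dimension1 would need to be registered as a Custom Dimension in the GA admin.

    // Read the variant assigned by the proxy from its cookie (name assumed).
    function getVariantFromCookie() {
      var match = document.cookie.match(/(?:^|; )ab_variant=([^;]*)/);
      return match ? decodeURIComponent(match[1]) : 'control';
    }

    // Attach "experiment:variant" to the hit, then send the pageview.
    ga('set', 'dimension1', 'payment-funnel-v2:' + getVariantFromCookie());
    ga('send', 'pageview');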

We can also set up variant-wise New Relic properties to monitor server-level metrics data.

To know which metrics should be observed, please visit:

Thoughts around it?

Deploying the application as variants is comprehensible and easy to implement. It is easy to maintain, as only the winning experiments are merged into master. Debugging is also easy, as we can detect the variant via the cookie value, and the codebase of that variant has no clutter from other experiments. It is scalable, as the applications behind the proxy can be deployed across servers.

Mile-wide and mile-deep A/B tests can be run with simplicity.

It can be a challenge to manage branches. Also, the options to segment users for distribution are limited.

III. Supporting Variants inside the Application

The application decides which variant to expose to the user, and it contains the codebase of all the (running) experiments. We can ride on the back of Tested A/B Frameworks like Wasabi.

Distribution

Distribution is handled by the application wrapper. We can write our own bucketing algorithm, which can work not only on the basis of request headers but also on the basis of user properties, for example, segregating users as per their spending habits.
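A minimal sketch of such a bucketing function; the property names, threshold and hashing choice are illustrative. Hashing a stable user id keeps the assignment deterministic, while user properties gate who enters the experiment.

    var crypto = require('crypto');

    function bucketUser(user, experiment) {
      // Example of property-based targeting: only high spenders enter this test.
      if (experiment.onlyHighSpenders && user.lifetimeSpend < 10000) {
        return 'control';
      }

      // Deterministic number in [0, 1) from user id + experiment name, so the
      // same user always lands in the same bucket for this experiment.
      var hash = crypto.createHash('md5')
        .update(user.id + ':' + experiment.name)
        .digest();
      var roll = hash.readUInt32BE(0) / Math.pow(2, 32);

      return roll < experiment.bVariantShare ? 'b' : 'control';
    }

    // e.g. bucketUser({ id: 'u42', lifetimeSpend: 25000 },
    //                 { name: 'pricing-v2', bVariantShare: 0.2, onlyHighSpenders: true });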

We can also integrate with tested A/B frameworks like Wasabi and Petri, which provide an API interface and a back-end dashboard wherein we can control the distribution configuration.

Development

One application, and thus the same branch, holds all of the experiments. Here, the application has the responsibility to facilitate reuse of components across experiments, while still giving enough flexibility to create divergent experiments. We can strategically decide which component to render on the basis of the chosen variant, and components should be adaptable.
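For example, in a React application the chosen variant can be passed down and used to pick which component renders; the component and variant names below are illustrative, not FabHotels' code.

    import React from 'react';

    // Only the experimented piece diverges; everything around it is shared.
    function SearchCta({ variant }) {
      return variant === 'b'
        ? <button className="cta cta--bold">Find hotels</button>
        : <button className="cta">Search</button>;
    }

    export function SearchBar({ variants }) {
      return (
        <div className="search-bar">
          {/* shared markup ... */}
          <SearchCta variant={variants.searchCta} />
        </div>
      );
    }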

We need to devise a scalable architecture from the very beginning. The directory structure needs to be planned in order to manage copies of classes or functions, and we need to segregate application-level code from the code that is to be experimented on.
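For illustration only (not FabHotels' actual layout), the segregation could look something like this:

    src/
      core/                # application-level code, never experimented on
      components/          # shared, variant-aware components
      experiments/
        pricing-v2/        # one folder per running experiment
          control/
          b/

Keeping each experiment in its own folder makes it easier to delete a losing variant cleanly once the experiment concludes.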

Deployment

Existing CI/CD will work.

Duration

It depends on the traffic, and the same formula as above can be used.

Decision

We can use Google Analytics by pushing the experiment name as a Custom Dimension. GA is helpful for viewing reports. In order to see experiment-wise segmentation, we need to push the experiment details to GA/GTM.

Thoughts around it?

Throwing away a failed experiment might be difficult. Keeping experiments mutually exclusive will always be a challenge. The codebase can grow at a rapid pace, leading to complexity, which can have a negative impact on performance. We need to take special measures to keep performance in check.

It also makes debugging difficult. And, in the case of modern JavaScript apps, we need to cater for variant-wise bundle creation.

So, what did we choose?

We, at FabHotels, are using a combination of two approaches i.e. deploying application as a variant and using VWO.

We initially implemented it using Nginx, but then moved the logic to Akamai to support variant-wise CDN caching of dynamic pages, and VWO runs on one of the variants, which helps pace up the UI experiments.

The number of parallel experiments that we are currently running is in single digits, and the chosen approach handles it pretty well. I wonder which approach, or what combination of approaches, would suit when 100s of experiments need to run in parallel. Give it a thought!

…and, finally, how to test the implementation?

A-A Testing

Test the platform by deploying the same codebase as both Control and Variant and distributing equal traffic to both. There shouldn't be any difference in pageviews, bounce rate, conversion rate, etc. in Google Analytics.

Load Testing

Generate page requests on a test site using tools like ‘ab’ or ‘jmeter’ and verify that the distribution across variants is as expected.

Debugging Variant

  • On desktop, we can easily change the cookie. However, on mobile we change a URL argument to switch the variant. This is also a way to land AMP (Google's Accelerated Mobile Pages) traffic on the correct variant.
  • We should also add response headers to help with debugging. You may check the source at -
  • Given a key, the output of split_clients is determined using the MurmurHash algorithm. You may check the resulting variant for a given key at the below link. It helped in debugging.
  • We can append the chosen variant to the access log and later check whether the distribution across variants is as expected.
  • Keep a watch that pageviews across experiments remain as expected.

Get going with A/B testing: a successful A/B test brings forth increased conversions and user engagement, and a failed A/B test brings forth a lesson on user behaviour.

Thanks for reading! If you liked this article, hit that clap button below 👏. It means a lot to me and it helps other people see the story.

You may reach me at charanjeet2008@gmail.com
