My Experiences with A/B Testing

Albert Yip
Aug 30

Introduction

At Hootsuite, a fair portion of our decision-making is built on experimentation. Deciding which features should be implemented or which redesigns should take place is at the core of A/B testing. This decision-making process delivers value in the form of increased signup rates and customer retention by providing a more consistent customer journey. Product friction, one of the main barriers to entry for customers, is tackled in a data-driven way to increase the percentage of active customers who return each week. Experimentation is also used to reduce customer churn, the percentage of customers who stop using the product in a given time period.

How do I experiment?

Have you ever wondered what the best way to conduct an A/B test is? Probably not. Think back to high school chemistry class: you need a hypothesis you want to support or reject and multiple variations of a single variable. For example, you might test the reactions of chemical A and chemical B with some substrate and gauge catalyst effectiveness. The ratio of the two chemicals, in this experiment, is the single variable we are directly manipulating and measuring.

In the case of software experimentation, you measure customer response to each variation of the single variable, which demonstrates the effectiveness of that change. We are also concerned with bias that might arise from the selection of customers. Characteristics of the customer population can bubble up and affect how the experiment runs. It might be the case that most customers visit a certain page and drop off before even entering the experiment, significantly affecting the results. We are expected to account for that external bias through some means (e.g. adjusting the customer segmentation for the experiment).
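To make the bucketing side of this concrete, here is a minimal sketch (not Hootsuite's or Optimizely's actual implementation) of deterministic, hash-based bucketing. Because the variation depends only on a stable user id and the experiment key, it does not matter which page a customer entered from or when they showed up, which helps keep that kind of selection bias out of the assignment itself.

```typescript
// Illustrative sketch: deterministic, attribute-independent bucketing.
// Neither the hash nor the split reflects any real platform's internals.

type Variation = { key: string; trafficShare: number }; // shares sum to 1

// Simple FNV-1a hash; any stable hash works for this sketch.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Map a (userId, experimentKey) pair to a variation, independent of how the
// user arrived, so entry-page effects don't bias which bucket they land in.
function bucket(userId: string, experimentKey: string, variations: Variation[]): string {
  const point = (fnv1a(`${experimentKey}:${userId}`) % 10000) / 10000;
  let cumulative = 0;
  for (const v of variations) {
    cumulative += v.trafficShare;
    if (point < cumulative) return v.key;
  }
  return variations[variations.length - 1].key; // guard against rounding
}

// Example: 50/50 split between control and a single variation.
console.log(bucket("user-123", "onboarding_banner", [
  { key: "control", trafficShare: 0.5 },
  { key: "variation_1", trafficShare: 0.5 },
]));
```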

Getting into the nitty gritty

To perform A/B testing at Hootsuite, we use Optimizely X, an experimentation platform, to handle our experimentation logic. This logic encompasses what metrics we want to measure, how to bucket customers into variations, which customers we want to target, and how to roll out these changes without interrupting the customer journey.
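As a rough illustration of what that logic covers, an experiment definition might look something like the sketch below. The field names and values are hypothetical, not Optimizely's actual configuration schema.

```typescript
// Illustrative shape of an experiment definition: metrics, bucketing,
// targeting, and rollout. Field names here are made up for this sketch.

interface ExperimentDefinition {
  key: string;
  // Which customers are eligible (targeting).
  audience: { plan?: string; locale?: string };
  // How eligible traffic is split across variations (bucketing).
  variations: { key: string; trafficShare: number }[];
  // What we measure to decide a winner.
  metrics: string[];
  // What fraction of eligible customers enter the experiment at all (rollout).
  trafficAllocation: number;
}

const onboardingBannerExperiment: ExperimentDefinition = {
  key: "onboarding_banner_cards",
  audience: { plan: "free", locale: "en" },
  variations: [
    { key: "control", trafficShare: 0.34 },
    { key: "remove_photo_card", trafficShare: 0.33 },
    { key: "remove_connect_card", trafficShare: 0.33 },
  ],
  metrics: ["weekly_retention", "banner_card_completed"],
  trafficAllocation: 0.2, // ramp up gradually to avoid disrupting the journey
};
```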

Frontend Experiments

Each frontend experiment is run with a custom code snippet that contains the variation logic modifying what a customer may see or do. Whether a snippet is injected into a user's normal code path depends on which variation that user is segmented into. For example, a user segmented into the control group sees no changes, while a user segmented into a variation group might experience a different workflow. This script injection and execution is delegated to Optimizely, effectively decoupling experimentation code from baseline code. Consecutive experiments can be run with the guarantee that no previous experimentation logic will carry over.
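For a sense of what such a snippet might look like, here is an illustrative variation of the kind a platform could inject only for users bucketed into a variation group. The selector and class names are made up for this sketch, not taken from Hootsuite's codebase.

```typescript
// Illustrative variation snippet, injected and executed only for users
// assigned to "variation_1". The baseline page knows nothing about it.

function applyVariation(): void {
  // Hide the first onboarding card for users in this variation.
  const card = document.querySelector<HTMLElement>(".onboarding-banner .card:first-child");
  if (card) {
    card.style.display = "none";
  }
}

// Run once the baseline markup exists, since the snippet depends on it.
if (document.readyState === "loading") {
  document.addEventListener("DOMContentLoaded", applyVariation);
} else {
  applyVariation();
}
```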

As with all web-related solutions, there are potential issues with safety, reliability, and testing. Because we are injecting code, there is an inherent dependency on what the injected script is modifying. Part of the experiment could break for users if the base code is changed without taking the experiment logic into account. There can also be issues with the script injection itself, such as the script being injected after the base view has rendered, causing the page to flicker. In addition, it can be difficult to write proper tests because of how decoupled and short-lived an experiment is.
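One common way to mitigate the flicker problem is to hide the element under experiment until the variation code has run, or until a short timeout expires. The sketch below is illustrative; the selector, hook name, and timeout are assumptions rather than anything Hootsuite ships.

```typescript
// Illustrative anti-flicker pattern: keep the experimented-on element hidden
// until the injected variation code signals it has finished (or we time out).

const banner = document.querySelector<HTMLElement>(".onboarding-banner");
if (banner) {
  banner.style.visibility = "hidden";

  const reveal = (): void => {
    banner.style.visibility = "visible";
  };

  // Hypothetical hook the injected script would call after applying its
  // changes; the 500 ms fallback avoids hiding content forever if it never runs.
  (window as any).__experimentReady = reveal;
  setTimeout(reveal, 500);
}
```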

Experiments encompassing the full stack

To augment frontend experiments, there is also the option of server-side experimentation. This lets us experiment deeply on how the product works (e.g. underlying features, algorithms, etc.), as opposed to what the customer experience looks and feels like. Experiments can also run across multiple stacks, whether that be web, mobile, etc. An email verification service, for example, could be experimented upon regardless of where those emails originated. There would be little to no performance hit on the customer's end, as all experimentation would be executed and decided on within the server. On the other hand, a full-stack experiment is more complex to implement: it has to touch core baseline code and possibly modify existing behaviour.
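Here is a hedged sketch of what a server-side decision could look like for that email verification example. The function names, variation keys, and email stubs are all hypothetical; in practice the variation lookup would go through the experimentation SDK rather than the toy hash below.

```typescript
// Illustrative server-side experiment: the decision is made entirely on the
// server, so web and mobile clients see consistent behaviour with no extra
// client-side work.

type VerificationStrategy = "single_email" | "email_plus_reminder";

// Placeholder for the SDK call that returns the user's variation.
function getVariation(userId: string, experimentKey: string): "control" | "variation_1" {
  let hash = 0;
  for (const ch of `${experimentKey}:${userId}`) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return hash % 2 === 0 ? "control" : "variation_1";
}

// Stubs standing in for real email infrastructure.
async function sendEmail(to: string, template: string): Promise<void> { /* ... */ }
async function scheduleReminder(to: string, delayHours: number): Promise<void> { /* ... */ }

async function sendVerification(userId: string, email: string): Promise<void> {
  const strategy: VerificationStrategy =
    getVariation(userId, "email_verification_v2") === "control"
      ? "single_email"
      : "email_plus_reminder";

  await sendEmail(email, "verify-your-account");
  if (strategy === "email_plus_reminder") {
    await scheduleReminder(email, 48); // follow-up nudge 48 hours later
  }
}
```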

Linking the LinkedIn

Let’s take an example where we are conducting an experiment interested in how a customer engages with LinkedIn, specifically within the early stages of the product. Hootsuite’s homepage shows a banner meant to guide the customer: add a profile picture, connect to a possibly related account, and follow 5 hashtags.

We want to measure what impact occurs if customers are shown different views of the product (e.g. simpler views). Let us call this homepage banner the control. Our super duper cool analyst has reasoned that reducing the number of cards displayed will lower the barrier to entry and therefore increase the customer retention rate. It is decided that the left card (e.g. “Add a photo to get recognized”) should be removed as the first variation, the middle card (e.g. “Do you know Foo Bar?”) as the second variation, and so on. Considerations we need to be aware of include how much previous product knowledge the customer is assumed to have, the impact of repeated experimentation on current customers, potential edge cases and unintended consequences of modifying a customer’s experience, and any bias that might arise from segmenting customers into testing groups.

After setting up and running the experiment, we look for statistically significant results and find no significant difference in the retention rate across the cohorts. The experiment tells us that the customer does not gain any value from the changes (i.e. we have failed to reject the null hypothesis), and so the control remains the best option.
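For readers who want to see what “failing to reject the null hypothesis” means in practice, here is a minimal two-proportion z-test sketch with made-up numbers. It illustrates the statistics only; it is not the analysis tooling we actually use.

```typescript
// Minimal sketch: a two-proportion z-test on retention rates, comparing the
// control cohort against one variation. The counts below are invented.

function twoProportionZ(successA: number, totalA: number, successB: number, totalB: number): number {
  const pA = successA / totalA;
  const pB = successB / totalB;
  const pooled = (successA + successB) / (totalA + totalB);
  const se = Math.sqrt(pooled * (1 - pooled) * (1 / totalA + 1 / totalB));
  return (pA - pB) / se;
}

// Hypothetical cohorts: 412/2000 retained in control, 405/2000 in the variation.
const z = twoProportionZ(412, 2000, 405, 2000);

// |z| < 1.96 means we cannot reject the null hypothesis at the 5% level,
// i.e. the variation shows no detectable lift over the control.
console.log(z.toFixed(3), Math.abs(z) < 1.96 ? "not significant" : "significant");
```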

Metrics & Tracking

One of the most essential parts of running an experiment is thoroughly tracking customer interaction and collecting appropriate metrics. Sometimes it is difficult to determine the source of truth: which metrics can provide insight into customer engagement? Which metrics will just add noise to our analysis? We must make some assumptions about what a customer goes through in their product journey. We might expect a customer to click repeatedly on certain filter buttons on the screen, potentially correlating with a key experiment metric. Oftentimes these metrics are discovered and iterated upon as the product evolves and the customer base becomes more uniform. For instance, we might discover that a user’s filtering workflow is not influenced by the visibility of filter buttons relative to other UI elements, but rather by the responsiveness of the interface.
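As a small illustration of that kind of instrumentation, clicks on filter buttons might be captured along the lines of the sketch below. The event name, properties, and tracking sink are hypothetical, not our actual analytics pipeline.

```typescript
// Illustrative tracking sketch: instrumenting filter-button clicks so they
// can later be tied back to an experiment metric.

type TrackedEvent = {
  name: string;
  userId: string;
  properties?: Record<string, string | number | boolean>;
  timestamp: number;
};

// Stand-in for whatever analytics pipeline actually receives the event.
function track(event: TrackedEvent): void {
  console.log("tracked", JSON.stringify(event));
}

document.querySelectorAll<HTMLButtonElement>("[data-filter]").forEach((button) => {
  button.addEventListener("click", () => {
    track({
      name: "filter_clicked",
      userId: "user-123", // would come from the session in practice
      properties: { filter: button.dataset.filter ?? "unknown" },
      timestamp: Date.now(),
    });
  });
});
```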

The case against experimentation

Throughout this article, I’ve advocated for the use of experimentation and A/B testing to understand and maximize customer success, helping customers overcome the hurdles and friction points of the product. However, too much experimentation can also affect customers negatively. A constantly changing feature set or interface puts a lot of stress on how the customer uses the product (“Where did my favourite buttons go?”, “How can I access this workflow?”, “Why did they choose this ugly guide?”). Customers are very resistant to change: they like to preserve their understanding of the product and would rather build upon that knowledge.

Although the experimentation concepts in this article were focused on software development and maximizing customer success, they apply in a more general manner. We might want grocery store employees to test how their departments could be structured more efficiently, or the best way to empower each other. Either way, we can see that this iterative, data-driven process is an interesting approach to how progression and growth occur.

About the Author

Albert is a Software Developer co-op on the Product Growth Retention Team at Hootsuite. He is currently a third-year Computer Science & Mathematics double major.

Connect with him on LinkedIn.
