The Essential A/B Test Checklist
I wanted to start a 2-part mini-series that covers how I’ve run and observed A/B testing over the course of my career as a digital product designer and researcher. I am not saying I have all the answers on this topic, but I want to share what has worked for my product teams and me in the past.
Part 1 will cover some of the things you’ll need to start A/B testing.
Part 2 will cover when you should or shouldn't run an A/B test.
Prerequisites for A/B Testing
Before I dive into how I’ve run A/B testing in the past, I thought it would be appropriate to talk about some of the things you’ll need to have before you get started at your organization. In my mind, there are two major things that you need to have in place before conducting an A/B test at a SaaS company:
1. Buy-in from your cross-functional Product Team.
2. An analytics stack that gives you relevant information about how people use your product.
Buy-in from a Unified Product Team
When people write these articles, it’s generally implied that a research team does all the work of setting up and executing an A/B test. In practice, the only time I’ve seen a large-scale A/B test executed effectively has been when an entire product team is working together like a well-oiled machine. A unified product team is usually made up of four things:
1. Crystal Clear Research Goal
A unified product team is always unified by a well-defined goal or question that they want to answer. A/B testing requires a very specific target metric that can be tested by altering ONE variable. In other words, make sure your research goal has a measurable metric, and that the variation between your two ideas is small enough that you can tell whether it did or did not impact the metric you’re testing.
Avoid These Pitfalls
Teams that set out with a goal that can’t be measured usually wind up not knowing what to do with their results at the end of a test.
Teams that have well-defined goals but get too liberal with how much they change between versions A and B can’t figure out what change impacted their primary metric.
Avoid these scenarios by sitting down with your product team and defining and evaluating your primary KPIs (Key Performance Indicators) ahead of time.
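To make "a measurable metric" a bit more concrete, here is a minimal sketch of how a team might check whether version B moved a conversion rate compared to version A. It uses a standard two-proportion z-test. The function name and the counts are made up for illustration, and your own team may prefer a different statistical method.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    conv_a / conv_b are the number of users who converted in each version,
    n_a / n_b are the number of users who saw each version.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: 120 of 1000 users converted on A, 150 of 1000 on B.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A check like this only means something if A and B differ by one variable. If several things changed at once, a small p-value tells you something moved the metric, but not what.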
2. Product Management
Usually, a product owner or product manager will have a strong understanding of the KPIs for the product and how those KPIs map back to the primary goals of your business. If you don’t have these well-defined, sit down with your product owner and conduct a stakeholder interview to get on the same page. Always make sure their KPIs are quantitative and can be measured. If your KPIs cannot be measured in this way, an A/B test may be the wrong research activity for validating your research goal. Staying on the same page with your product owner is paramount because if anyone ever questions the value of a particular study or A/B test, your PO can show that your test is tied back to the primary goals of your product and ultimately the business.
If you want to learn more, subscribe to our Medium. We plan on going into more detail on this process soon.
3. Technical Leadership
A technical lead usually takes the form of a senior developer who has a strong understanding of the ins and outs of the product you’re testing. Historically, I have seen technical leadership left out of research activities or only brought in to implement designs after they have been decided on. Having a tech lead in the room while you are planning your A/B test is incredibly important to make sure your system is ready, or can be set up, to deploy different versions of the same design to specific subsets of users. This is especially important if your test plan requires people to test your design in a production environment. Another huge benefit of having a tech lead in the room early is that they may have valuable ideas on how to automate time-consuming data gathering and reporting tasks.
Things your tech lead may need to know:
Will your test affect production code?
What are the metrics you need to report?
How do you want your metrics sliced?
How long will your test be in production?
How do you want to slice your pool of participants?
(Random Sample vs. Hand-Picked Pools)
How many different versions of the same UI will be available at the same time?
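For the random-sample option, one common approach a tech lead might suggest is deterministic bucketing: hash each user ID together with an experiment name so the same user always sees the same version. The sketch below is only an illustration of that idea. The function name, the experiment name, and the user ID format are all hypothetical, and your product may already have a feature-flag system that handles this.

```python
import hashlib

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically map a user to a variant for one experiment."""
    # Including the experiment name means a user can land in different
    # buckets for different experiments.
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest, 16) % len(variants)
    return variants[bucket]

# Hypothetical usage: the same user gets the same answer on every call.
print(assign_variant("user-1042", "signup-button-copy"))
```

Because the assignment depends only on the inputs, there is no need to store which version each user was given, and adding a third version is a matter of passing a longer `variants` tuple.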
4. Design and Research Leadership
If you are a dedicated designer or researcher, priority number one should be to figure out if your sample size is large enough and is an accurate representation of the people that will use your product. The second thing this group should really focus on is articulating the goal of the test. You need to know this for two major reasons:
Communicating to Management (or others outside of your team)
You will undoubtedly need to articulate why you’re tying up product team time with conducting research that you would normally do alone or without development. So make sure you have the test goal memorized in a succinct fashion, almost like a mantra. This is especially the case if you’re introducing A/B testing for the first time.
Providing Context During Critique
If you work on an interdisciplinary team of designers, researchers, developers, and business people you will likely go through multiple design review sessions. During these sessions, people will want to recommend changes as though they were in a UI design review. This is good news! You have a team invested in what you’re doing. Before acting on any suggestions, you need to ensure these recommended changes will not impact your test.
One good way to guard against this is to reiterate the goal of the A/B test during critique: ask everyone in the room if adding, removing, or changing the UI element in question will introduce an additional variable to your test. If the answer is yes, then you shouldn’t make the change this round.
Pro-tip: To remember valuable feedback received during critique, start a shared list of future ideas to A/B test. Make sure to also record the hypothesis that goes along with each idea, and make that list public for the team to see.
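Coming back to the sample-size question from the start of this section: a rough way to estimate how many users each version needs is the standard formula for comparing two proportions. This is a minimal sketch, not a substitute for a proper power analysis. The baseline and expected rates are made-up numbers, and the defaults (5% significance, 80% power) are common conventions rather than anything specific to our team.

```python
import math
from statistics import NormalDist

def sample_size_per_variant(baseline, expected, alpha=0.05, power=0.8):
    """Approximate users needed per version to detect baseline -> expected."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = baseline * (1 - baseline) + expected * (1 - expected)
    effect = expected - baseline
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)

# Hypothetical goal: detect a lift in task completion from 10% to 12%.
print(sample_size_per_variant(baseline=0.10, expected=0.12))
```

The smaller the change you want to detect, the more users you need, which is worth knowing before you promise anyone a timeline.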
A Strong Analytics Stack
A strong analytics stack is something you should definitely consider before running large-scale testing. There are a lot of tools out there (and your organization may already have something in place), so think about your research goals before choosing a new set of tools. Here are some of the requirements we had for past studies and the combination of tools we used to meet them.
A Place to Store and Track Historical User Data
We use Metabase to store historical data on how people use our app. Since our conditions can vary from community to community, we sometimes run tests at the community level on focused segments of users. We rely on historical data to decide how we divide our communities to ensure balanced pools for testing. We also use Metabase to collect results during our tests.
To get the most out of Metabase, some experience with SQL and an understanding of how the data is stored will help you get the most relevant stats for your test groups. It may sound like a lot of work, but it’s definitely worth it when you have a configurable solution and need to rule out as many variables as possible.
Pro-tip: If you’re unfamiliar with SQL, making friends with your local DevOps or technical UX person is critical. They may be able to help. Make sure you sit with them so you learn a thing or two about SQL if possible.
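As a rough illustration of what dividing communities into balanced pools can look like, here is a minimal sketch that uses historical activity to split communities into two groups of similar size. The community names, the numbers, and the choice of "monthly active users" as the balancing metric are all assumptions made for the example. This is not a description of our actual process.

```python
def split_into_balanced_pools(communities):
    """Greedily split communities into pools A and B with similar totals.

    `communities` maps a community name to a historical metric, for
    example monthly active users exported from your data store.
    """
    pools = {"A": [], "B": []}
    totals = {"A": 0, "B": 0}
    # Place the largest communities first, always into the lighter pool.
    ordered = sorted(communities.items(), key=lambda kv: kv[1], reverse=True)
    for name, active_users in ordered:
        target = min(totals, key=totals.get)
        pools[target].append(name)
        totals[target] += active_users
    return pools, totals

# Hypothetical historical data for five communities.
history = {"north": 500, "south": 450, "east": 300, "west": 250, "central": 100}
pools, totals = split_into_balanced_pools(history)
print(pools)
print(totals)
```

Keep in mind that hand-balanced pools are not the same as random assignment. Balancing on one metric does not guarantee the pools are alike in other ways, so it is worth comparing a few other historical stats before the test starts.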
A Tool to Slice Your Stats
We used Excel’s pivot tables to help organize and distribute our historical data from Metabase. We also used them to slice and deliver our test results. Google Sheets can do the same job, although I don’t typically use it.
Tracking When/Where Users Abandon Tasks
This one is obvious. Google Analytics has been critical for us when pinpointing click-through issues, bounce rates, and complete task failure on specific parts of the interactions we test. This tool is good for helping you form a hypothesis around what and where your design went wrong.
However, it’s critical that you don’t just take this data and start designing. Validate your hypothesis with interviews and observations, when possible.
Also, make sure that your page IDs are different for each test so you can set up filters for them after they are implemented. Take a look at this article if you have some time.
Pro-tip: If you are testing a React web app, make sure the pages you want to track have different URLs for Google Analytics to track. Your tech lead will thank you for raising this early in the process, so don’t be shy about it.😁😁😁
Pro-tip (Another one): Try to make sure the naming conventions make sense to all parties consuming Google Analytics data.
A Tool for Discovering False Affordances
Hotjar is really good at telling you where users are clicking on your page. This is always good information to have when trying to determine if particular parts of your app are giving users false signals about what is and isn’t clickable.
Depending on how it’s configured, Hotjar may only show 50% of your app’s traffic, which might be problematic if your study is time sensitive (less traffic means your test needs to run longer).
Also, if you are testing a context that sits behind a login page, make sure that your Hotjar URLs are set up correctly.
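To see why recording only part of your traffic stretches the schedule, here is a back-of-the-envelope sketch. The function name and the numbers are hypothetical. The only assumption is that sessions arrive at a steady daily rate and that a fixed fraction of them gets recorded.

```python
import math

def days_to_reach_sample(required_sessions, daily_sessions, recorded_fraction):
    """Estimate how many days it takes to record the sessions you need."""
    if not 0 < recorded_fraction <= 1:
        raise ValueError("recorded_fraction must be in the range (0, 1]")
    recorded_per_day = daily_sessions * recorded_fraction
    return math.ceil(required_sessions / recorded_per_day)

# Hypothetical study: 4000 recorded sessions needed, 400 sessions per day.
all_traffic = days_to_reach_sample(4000, 400, recorded_fraction=1.0)
half_traffic = days_to_reach_sample(4000, 400, recorded_fraction=0.5)
print(all_traffic, half_traffic)
```

With these made-up numbers, recording all traffic takes 10 days and recording half of it takes 20, so it pays to check the sampling setting before you commit to an end date.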
This isn’t a comprehensive set of tools, but it’s definitely enough to get you started. If we didn’t list a product you’ve used for testing, feel free to post what has worked for you in the comments below.
Wanna join the OnShift Product team?
We are currently hiring a Lead UX Designer.