5 A/B Testing Lessons I Learned The Hard Way

At Creative Market, we have a strong test-first culture. Whether we’re thinking about a large-scale new feature or a minor UI tweak, we always try to test our assumptions. Most often, this comes in the form of a simple A/B test. I’ve run or been part of a bunch of these tests, and to be honest, I’ve flat-out failed or screwed up plenty of times along the way! Here are some of the valuable lessons I’ve learned from my mistakes.


First, The Good Stuff

The upsides of running A/B tests are numerous and well documented, so I won’t spend too much time on them, but here are a few quick things that we love about testing:

Increased Confidence

Testing allows us to approach our problems and solutions with confidence, knowing we made the right choice from a data perspective. This is especially important when approaching large-scale projects that will take weeks or even months to build. By starting with a basic hypothesis that can be quickly tested, we avoid wasting time building big failures. Failure is always an option when it’s small and ultimately yields valuable insight.

Lower Risk

We’ve greatly lowered our risk of accidentally breaking a funnel and decreasing conversion rates without realizing it. Before we fully developed our testing culture, we had an untested checkout flow “improvement” that ended up costing us a lot of money, and we ultimately had to revert the change. This would never have happened under our current way of working.

Big Wins With Little Effort

Sometimes that rare and mystical event does in fact happen: you change one small thing and conversion rates shoot through the roof. We’ve seen big results from the smallest things like moving a button or stripping out a superfluous step in a conversion funnel.


What I’ve Learned The Hard Way

All right, here’s the really good stuff: all the ways I’ve messed up and what I’ve learned as a result.

Lesson 1: It’s Easy to Get Addicted to Small Improvements

The first time you experience the joy of a solid winning test result, you’re hooked. Your team changed something, the numbers went up, you’re all heroes! The CEO even mentioned it on the team call! Now you’re all in, dreaming up and shipping as many quick tests as you can.

Fast forward a few months and you’re looking back on what you’ve accomplished recently. You’ve shipped a whole mess of tests, so you must be killing it, right? Well… it turns out most of your “quick wins” were pretty minor. They felt great at the time, but opportunity cost rears its ugly head and you start wondering what else you could’ve built in that time.

Here’s the key: never mistake being busy for being productive. It’s so easy to get obsessed with every possible teeny tiny conversion rate bump that you miss out on real, innovative product work that pushes your company forward and keeps you ahead of the competition. Use your limited resources wisely and find a way to balance important new projects with incremental improvements to existing flows.

Lesson 2: You’re Only as Good As Your Tools and Tracking

Far too many times, we’ve spent time and money running tests that proved to be inconclusive due to a tracking bug, a faulty tool, or some other technical glitch in how we set up the test. Make it a habit to talk to your engineers and data analysts about the tooling necessary for each and every test before you begin work. Ask about existing tracking bugs and what data in your analytics tools can and can’t be trusted. If you can’t fully trust the data coming out of an A/B test, there’s no point in running it.

Lesson 3: Watch Out For Frankenstein!

A/B testing teams can get into a pattern where they have blinders on, focusing on improving a single flow without considering the greater UI and UX ecosystem. Recently, my boss looked at a flow my team had been working to improve and told me that it looked a bit like Frankenstein’s monster. And he was right! It had all these random UI elements and design decisions bolted onto it in weird places and simply didn’t fit in with the rest of the site. Had we improved conversion rates? Yep! Were we making more money? Yep! Were we taking the time between tests to roll out site-wide design system tweaks to make sure everything felt clean and consistent? Nope.

The lesson: it’s OK to move fast and try new things, but don’t rush on to the next project after your test wins without doing the necessary cleanup (both on the frontend and backend). Otherwise, you’re just building up a mountain of tech and design debt that will be time-intensive and expensive to fix down the road.

Lesson 4: Do Your Homework and Shoot For Impact

“Honestly, there’s just not enough traffic to this flow to make a significant impact.” This sentence can be useful or terrible. When I heard it after we’d already been running a test for a week, it was the latter (ideally, you figure this out before you run the test!). I had instinctively assumed we were experimenting with a high-traffic area, but I failed to validate that assumption.

Here’s the thing: I love fixing bad user flows, and when I see one with an obvious improvement, I instantly want to spin up a test to fix it. Unfortunately, not every flow is worth fixing (some should even be killed altogether!). When you have limited time and resources, you need a laser-like focus on only improving the metrics and flows that can make a big difference to your bottom line.

The lesson: run the numbers first. This is often really easy to do, and yet it’s just as easy to get distracted by a seemingly great idea and skip this step. Always ask how much you’d have to improve the conversion rate to make the project worthwhile and then compare that to your reasonable expectations of the outcome. Is a 5% boost enough or will it take a 200% boost to make a dent in your goals?
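To make "run the numbers" concrete, here is a rough back-of-the-envelope sketch. Everything in it is an assumption for illustration: the function name, the inputs, and the example figures are made up, and it treats revenue as simply visitors times conversion rate times value per conversion. It is not how Creative Market actually sizes tests.

```python
def required_relative_lift(monthly_visitors, baseline_rate,
                           value_per_conversion, target_extra_revenue):
    """Return the relative conversion lift needed to hit a revenue goal.

    Hypothetical helper: assumes monthly revenue from the flow is
    visitors * conversion rate * value per conversion.
    A return value of 0.05 means a 5% relative lift.
    """
    baseline_revenue = monthly_visitors * baseline_rate * value_per_conversion
    if baseline_revenue <= 0:
        raise ValueError("baseline revenue must be positive to size a lift")
    return target_extra_revenue / baseline_revenue


# Made-up numbers: same goal ($5,000 extra per month), two very different flows.
high_traffic = required_relative_lift(200_000, 0.02, 25.0, 5_000)
low_traffic = required_relative_lift(5_000, 0.02, 25.0, 5_000)

print(f"High-traffic flow needs a {high_traffic:.0%} lift")
print(f"Low-traffic flow needs a {low_traffic:.0%} lift")
```

With these invented inputs, the busy flow needs roughly a 5% lift while the quiet one needs roughly 200%, which is exactly the kind of gap you want to spot before building anything.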

Also consider what variables reduce the audience of your test. Is the element you’re changing only visible to signed-in users? Signed-out users likely represent a big chunk of the total traffic to the page, so is the test still worth running without them? What other personas will and won’t be impacted by this test? How can you expand or tweak your hypothesis to create something that impacts a larger audience?

At Creative Market, we now rank our test ideas by potential impact before we decide whether or not they make it on the roadmap. This helps us weed out the small stuff early in the process.
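For a feel of what ranking by potential impact might look like, here is a minimal sketch. The scoring formula, the `TestIdea` fields, and the sample ideas are all hypothetical; this is one simple way to do it, not our actual process. Note that it folds in the share of visitors who can actually see the change, since a smaller audience shrinks the payoff.

```python
from dataclasses import dataclass


@dataclass
class TestIdea:
    # All fields are illustrative assumptions, not a real schema.
    name: str
    monthly_visitors: int
    eligible_share: float          # fraction of visitors who see the change
    baseline_rate: float           # current conversion rate for that audience
    expected_relative_lift: float  # a realistic guess, e.g. 0.05 for 5%
    value_per_conversion: float

    def estimated_monthly_impact(self) -> float:
        """Rough extra revenue per month if the guess holds."""
        return (self.monthly_visitors * self.eligible_share
                * self.baseline_rate * self.expected_relative_lift
                * self.value_per_conversion)


def rank_by_impact(ideas):
    """Return ideas sorted from largest to smallest estimated impact."""
    return sorted(ideas, key=lambda idea: idea.estimated_monthly_impact(),
                  reverse=True)


# Made-up backlog for illustration only.
ideas = [
    TestIdea("Signed-in only banner", 100_000, 0.2, 0.03, 0.10, 25.0),
    TestIdea("Checkout button copy", 100_000, 1.0, 0.03, 0.05, 25.0),
    TestIdea("Niche settings page", 2_000, 1.0, 0.03, 0.50, 25.0),
]

for idea in rank_by_impact(ideas):
    print(f"{idea.name}: ~${idea.estimated_monthly_impact():,.0f}/month")
```

The estimates are only as good as the guesses going in, but even rough numbers make it obvious when an idea is too small to earn a spot on the roadmap.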

Lesson 5: Tradeoffs Suck

On a complex site or app, any given page contains not one, but several flows that your business depends on. For instance, on a Creative Market product page, we have flows for saving products for later, sharing products on social media, and of course, purchasing products. I’ll be totally transparent: I never expected these actions to be as closely tied together as they are. Time and time again, we’ve seen that a change in one impacts the others. Getting a user to purchase is obviously our main goal, but what happens when we successfully run a test that increases the “save for later” action but decreases the number of users who click the sharing features? Did the test win? Should we ship it? The answer is complicated!

As it turns out, responsible testing requires that you know the value of each action a user can take, both in the short term and the long term. Without that insight, you’re left wondering if the tradeoff is worth it.
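Once you have even rough values for each action, the tradeoff becomes simple arithmetic. The sketch below assumes you can assign a single dollar value to each action; the values, action names, and test results are invented for illustration, and working out real values (especially long-term ones) is the hard part.

```python
# Hypothetical dollar value of one extra occurrence of each action.
ACTION_VALUES = {
    "purchase": 25.0,
    "save_for_later": 1.5,
    "share": 0.8,
}


def net_value_of_test(action_deltas, action_values=ACTION_VALUES):
    """Sum the value of every change in action counts (variant minus control).

    A positive result suggests the tradeoff is worth it under the
    assumed values; a negative result suggests it is not.
    """
    unknown = set(action_deltas) - set(action_values)
    if unknown:
        raise KeyError(f"no value assigned for actions: {sorted(unknown)}")
    return sum(action_values[action] * delta
               for action, delta in action_deltas.items())


# Made-up result: saves went up, shares went down, purchases were flat.
deltas = {"save_for_later": 400, "share": -300, "purchase": 0}
print(f"Net value of shipping the variant: ${net_value_of_test(deltas):,.2f}")
```

Under these made-up values, the extra saves outweigh the lost shares, so the variant comes out ahead. Change the assumed value of a share and the answer can flip, which is why knowing those values matters so much.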


Test Carefully, Friends

Nothing beats testing your assumptions with real, actionable data. Unfortunately, testing is a minefield and there are so many wrong steps you can take along the way. I hope these lessons help you improve your testing habits. I’d love to hear about any valuable lessons you’ve learned the hard way!