Brad and Gwyneth, circa 1997

The Agony and Ecstasy of Building with Data

Ah, Data. And of course, Data’s best friend, A/B Test.

They’re like the It couple of many a young software company these days. You can’t seem to turn a corner or make a sandwich without encountering their know-it-all allure or the gleaming exactness of their figures.

The hype is quite warranted. There are few tools more powerful. Like a lovely dram of scotch at the end of a long day, Data and A/B Test have the power to wrap you in the warm comfort of understanding, soothe philosophical disagreements, and make the hard decisions easier. How many people actually use that feature that I think is stupid and you’d fall on the sword for? Instead of yet another long debate, let’s call over Data and answer the question. Do users actually prefer space monkeys or deep sea pigs? No need to offend anyone’s personal tastes; let’s send for A/B Test and figure it out.

Alas, as with most things in life, these two can be dangerously overused. They can serve as a crutch for a lack of conviction. Can’t decide how you truly feel about the future? Have Data and A/B Test take the decision out of your hands. Don’t want to invest long-term in something that has the uncertainty of failure looming on the horizon? Take the easy path instead with a bevy of A/B Tests. At their worst, Data and A/B Test lead to the kind of paralysis that comes from sprinting full-steam ahead only to find that you’ve been running on the treadmill known as Local Maximum.

Don’t end up in rehab. As awesome and enticing as Data and A/B Test are, there are a few common pitfalls to watch out for.

Data Pitfall #1: Picking the wrong metric to optimize for.

It’s nice to have a metric that everyone can rally behind. It’s easily explainable. It’s simple. You can have a giant monitor in your office that displays the number ticking up in real time and it will feel like Mardi Gras watching that number dance.

The problem is that it’s impossible to distill value into one metric. If you’re optimizing for a growth number, is it a cause for celebration if you get tons of people onto your service who only stay for a month before ditching, never to return? Vice versa, if you’re optimizing for engagement, is it worth it if some other company is eating your lunch by growing faster? Or if you’re getting 2-star reviews in the App Store?

Pretty much everyone understands this at a theoretical level. No data dashboard is ever only a single number. And yet, I have never heard someone say, “Our team spent too much time talking about the right way to measure value.” I have heard many, many more stories about teams optimizing for the wrong things at the expense of something else, with the cost going unnoticed until the damage was done. It’s pretty natural to slip into talking about things like growth rate, revenue, DAU, etc. because it’s what everyone else is talking about, without spending the time to sit down and reason through whether it’s the right encoding of your values.

Data Pitfall #2: Over-pivoting towards what’s measurable.

Okay, let’s say you didn’t pick the wrong metric. Let’s say you were able to track every single possible number you could measure and weigh them carefully and thoughtfully against each other. Great. Unfortunately, you still don’t have all the information because not everything is measurable. For instance, how users feel. Sure, maybe you can ask them how they feel and aggregate that into some number, but that’s biased because the respondents will self-select, they might lie, they might not have understood the question, they might not be very good at transcribing how they feel, etc. What you really want to know is how they truly feel, and that is very, very difficult.

But of course it’s important. How users feel about your product, and whether those feelings are changing based on the decisions you’re making, is one of the most critical things to understand. But because it’s hard to measure, it’s easier to stick to numbers that feel more real and concrete, and as a result, roadmaps regularly tend to undervalue the things that don’t show up in a metric.

Data Pitfall #3: Biasing towards the short-term.

Let’s pretend for a second that you’ve picked the right set of metrics AND you’ve somehow managed to wire a sensor to people’s brains that gives you accurate readings on how they feel right then. Still not enough. Because while you may know what’s happening right now, it doesn’t tell you anything about what’s going to happen down the road. Would you take a short-term hit to growth for a long-term increase in trust (say, because you stop doing something that may be perceived as spammy)? Maybe. Depends on how much trust will be gained or lost. But even if you knew how your users felt the first time you nudged them into blasting an e-mail to their entire address book, it’s hard to know how they will perceive the whole of your app the second, third, tenth, or hundredth time they encounter something that annoys them.

A/B Test Pitfall #1: Spending too long perfecting tests.

Here is how a lot of A/B tests come to life:

Person A: We have seven great ideas that will help with growth/satisfaction/engagement/revenue. But we don’t know which of the seven ideas are the best.

Person B: Hmm, why don’t you run them as tests to see which ones are the most promising?

This is a great strategy for narrowing the solution space of ideas. All of us have spent time toiling over stuff that didn’t end up mattering in the end (either because the product wasn’t a good idea, or it wasn’t effective, or our strategy changed, etc.), and knowing what to focus on so you don’t waste your time is invaluable.

The key phrase here is waste your time. Why would you polish to the last pixel each of the seven ideas if you aren’t sure which of the ideas are worth investing in? Of course, you don’t want to test something that’s so broken and janky it’s not going to give you good signal on whether the idea was a good idea in the first place, but if you’re just trying to figure out which general direction to pursue, the test doesn’t have to be perfect, especially since it probably involves a small percentage of the population. (Caveat: if you’re trying to decide between a few very precise things, like which language is clearer in a message, or the design of a checkout or registration form, then they should probably be the actual thing you’re going to launch.) But in general, the goal of an A/B test is to whittle down a large number of ideas into a few key levers to get signal on which directions are exciting to pursue. If it takes you two hours to come up with a rudimentary proposal to test but ten hours to perfect it, you are much better off investing 14 hours into figuring out which of the seven proposals are really promising, and then spending 10 hours each perfecting the best two, rather than perfecting all seven. That’s 34 hours instead of 70: roughly a 50% time savings for 90+% of the benefit. Imagine what you could do with the rest of the time.
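If you want to sanity-check that back-of-the-envelope math, here is a minimal sketch. It assumes the reading that produces the 34 and 70 figures: a polished version of an idea costs ten hours in total, and two finalists get polished after the rough tests. The function names and the choice of two finalists are illustrative, not a prescription.

```python
# Numbers from the paragraph above; the last two are assumptions about
# how to read it, chosen because they reproduce the 34-vs-70 comparison.
IDEAS = 7
ROUGH_HOURS = 2       # hours to build a rough, testable version of one idea
POLISHED_HOURS = 10   # assumed: total hours to build one fully polished idea
FINALISTS = 2         # assumed: ideas still worth polishing after rough tests


def polish_everything(ideas=IDEAS, polished=POLISHED_HOURS):
    """Cost of polishing every idea before testing any of them."""
    return ideas * polished


def rough_then_polish(ideas=IDEAS, rough=ROUGH_HOURS,
                      polished=POLISHED_HOURS, finalists=FINALISTS):
    """Cost of rough-testing every idea, then polishing only the finalists."""
    return ideas * rough + finalists * polished


all_in = polish_everything()
staged = rough_then_polish()
savings = 1 - staged / all_in

print(f"Polish all seven: {all_in} hours")
print(f"Rough tests, then polish finalists: {staged} hours")
print(f"Time saved: {savings:.0%}")
```

Change FINALISTS to 1 and the staged path drops to 24 hours; the savings only grow as the rough tests rule out more ideas.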

A/B Test Pitfall #2: Shipping successful tests right away.

Of course, as a part of doing the above point correctly, you need to have the discipline to not immediately ship any positive test. Just because an idea tested well doesn’t mean it’s ready to go. Since you rushed getting the A/B tests out in order to save time on narrowing down the options, you now need to invest the time and energy into building out an idea the right way.

A/B Test Pitfall #3: Running too many tests on details that don’t matter.

Seriously, eking out a quarter or half-percent in some metric here or there is soul-sucking work. You could easily do that for years and not get much farther than you are now. Instead, why not put the time and energy into figuring out which ideas are going to give you the step function improvements you’re hoping for?

There are costs to too many A/B tests piling up, including but not limited to: time spent designing, building and running A/B tests; time spent on data analysis and post-analysis decision-making; code complexity due to tons of branching pathways; your users all having slightly different (probably subpar) experiences; bugs due to difficulty in observing/testing multiple variations of a product; etc.

A/B Test Pitfall #4: Relying on A/B tests to do anything innovative or large or multi-faceted.

You can’t A/B test your way into big, bold new strategies. Something like the iPhone is impossible to A/B test. If you had asked people or invited them to come into the lab to try some stuff out, they would have preferred a physical keyboard to a virtual one. If you had them use an early prototype of the touch screen where not every gesture registered perfectly, it would have felt bad and tested poorly. But the power of the iPhone wasn’t that the keyboard went from physical to virtual. It was what doing that unlocked—a rich gaming ecosystem. Beautiful web browsing. Full-screen videos. And the fact that all of it came together so splendidly in the details that the end result set a new bar for the industry. I would venture a guess that nothing along the way in the vein of data or tests would have indicated that that would happen. Nothing except vision and faith.

Data and A/B Test are valuable allies, and they help us understand and grow and optimize, but they’re not a replacement for clear-headed, strong decision-making. Don’t become dependent on their allure. Sometimes, a little instinct goes a long way.