After running 2,000 experiments for Fortune 500 product teams, here’s what we learned

By Nis Frome, Thor Ernstsson, & Michael Williams

The ‘Lean’ movement has taken the corporate world by storm, but there are still countless barriers for product teams that seek to adopt its experiment-driven ethos and make decisions informed by customer data. That’s why two years ago we started building Alpha, a platform for Fortune 500 product teams to turn hypotheses into customer insight within 24 hours without having to tap any internal capabilities or navigate compliance obstacles. In the process, we’ve learned a considerable amount about corporate culture, the nature of user research, and product management processes.

Today, our clients include forward-thinking product teams from AT&T, Capital One, PwC, Aetna, and many others. Recently, they collectively surpassed 2,000 experiments on our platform! After generating roughly 660 prototypes, they got feedback from nearly 400,000 users. The result: 46,000 minutes of video from moderated and unmoderated interviews, 6,500 charts, and hundreds of undoubtedly brilliant — and informed — product decisions.

We spent some time mining our databases (for what we call ‘experimetrics’) and reflecting on client conversations to distill what we’ve learned along the way. Below are the seven most meaningful and actionable insights we found:

1. Change is difficult. The old adage is painfully true. As a startup, we have to keep our own myopic perspective of the world in check — talking to customers may be an organic part of our job, but, as we learned, that’s rarely the case at a large organization.

Despite long believing in the value of rapid prototyping and experimentation, Fortune 500 product managers generally operate in environments with many competing priorities. User research is typically expensive and executed by internal teams or agencies on monthly or quarterly cadences. The ability to turn around research in less than a week, let alone a day, is completely unheard of.

And while ‘on-demand user insights’ sounds appealing, in practice it challenges many corporate conventions, the most entrenched of which is the bias to overplan. When research cycles take months, it’s critically important to make sure that each aspect is carefully crafted and vetted. But when you accelerate that process to a matter of hours or days, iteration eliminates the need for exhaustive planning.

Our data illustrates how difficult this change in mindset and behavior can be. At full capacity, individual product teams execute about 8–12 experiments per month on our platform. Even with workshops and extensive onboarding, it takes anywhere from three to six months for clients to reach that bandwidth. Sure, some of that time is spent figuring out how to quickly turn data into decisions. But the overwhelming majority is consumed as a product team culturally and practically shifts from waterfall to agile experimentation, recognizing that planned research pales in comparison to iterative research. Spending two weeks outlining customer research that will inevitably be flawed is no match for six iterations that can be executed in the same timeframe.

On our podcast, This is Product Management, Cindy Alvarez, Director of User Experience at Yammer, echoed one of the most common sentiments about practicing ‘Lean’ and ‘Customer Development’ within a large organization. She urged listeners to stop planning and just go start talking to customers because it’s impossible to get better at doing it otherwise.

She’s absolutely right, and it’s a strategy we’re heavily invested in. We started pre-populating new client accounts with research executed on their behalf, including customer insights into competitive benchmarks and usability across their respective products. So far, it’s been an effective spark for product teams to begin iterating.

2. Sometimes, formality trumps informality. Continuing the theme from the previous insight, we’ve learned that, even once clients hit full speed, it doesn’t quite resemble the cadence of how startups practice experimentation. We initially designed the product so that any stakeholder could easily submit an experiment on an ad hoc basis, which is similar to how we operate. Instead of running impromptu experiments though, our clients submit experiments in batches, often weekly.

And it turns out there’s a good reason for this. While a fluid workflow makes sense in a startup, it typically doesn’t within a large organization that has various stakeholders with different (and often competing) objectives and projects. Product managers diligently consult these stakeholders when explaining customer feedback and deciding on next steps. A predictable and recurring cadence is often necessary to keep everyone on the same page.

That’s why concepts like the ‘design sprint’ have taken off: they allot time for stakeholders to get aligned. We’re embracing the role that formality plays here, and now encourage clients to organize ‘experiment sessions’ on a regular and consistent basis, so long as those sessions end with testable hypotheses.

3. Product experiments can be grouped into discrete categories. Before we could create a platform and workflow to accelerate user research processes, we had to better understand the types of research product teams need in the first place. That’s why, before writing a single line of code, we conducted the first 500 or so experiments manually using third-party tools.

We found that user research experiments involving prototypes (as opposed to experiments in a production environment) generally fall into one of six discrete categories. One of them, usability testing, has a widely accepted definition. We had to delineate the others ourselves, and while our definitions are by no means gospel, they have held up surprisingly well, requiring only modest ongoing revisions. Each category is accompanied by ‘rules of thumb’ and a suite of configurable experiment templates, which you can read about in our guide to prototyping.

[Chart: breakdown of the popularity of each test type run on our platform]

We have plenty more research to do, but these working definitions enable user researchers in our exchange to take virtually any client request and turn it into an executable study within minutes.

4. All research is biased. Our offering primarily includes testing in what we call a ‘simulated environment.’ The users who provide feedback know that they are part of a study and are paid for their time. They interact with high-fidelity, interactive prototypes, and generally understand that the products have not been engineered and released to the market.

We specialize in this type of testing because product teams can learn a tremendous amount from it while complying with their organization’s existing processes and risk tolerance. No internal engineering or design resources are required; no valued customer becomes the victim of a half-baked product; and no legal department needs to be consulted. Of course, the data is not as reliable as what you’d learn from shipping a product.

All research, including ours, suffers from a degree of bias. But acknowledging that isn’t an excuse to avoid doing user research altogether. It’s an argument for the opposite: to do even more research and work to minimize the bias across it. Thinking otherwise is missing the forest for the trees.

One of the core principles of the scientific method is the concept of replicability — that the results of any single experiment can be reproduced by another experiment. We’ve far too often seen a product team wielding a single ‘statistically significant’ data point to defend a dubious intuition or pet project. But there are a number of factors that could and almost always do bias the results of a test without any intentional wrongdoing. Mistakenly asking a leading question or sourcing a sample that doesn’t adequately represent your target customer can skew individual test results.

To derive value from individual experiments and customer data points, product teams need to practice substantiation through iteration. Even if the results of any given experiment are skewed or outdated, they can be offset by a robust user research process. The safeguard against pursuing insignificant findings, if you will, is to treat no data point as an actionable insight until a pattern has been rigorously established.
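To make that concrete, here is a minimal simulation sketch in Python. It is purely illustrative and not a description of Alpha’s methodology: it assumes a pooled two-proportion z-test and a variant with no real effect, so every ‘significant’ result is a false positive. Acting on a single test at the conventional 5% threshold, noise clears the bar about one time in twenty; requiring the same result across three independent runs drives the false-positive rate down to roughly 0.05 × 0.05 × 0.05, or about 0.01%.

```python
# A minimal, hypothetical sketch of why replication matters.
# This is NOT Alpha's methodology; all numbers are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

def ab_test_p_value(n=200, p_control=0.30, p_variant=0.30):
    """Simulate one A/B test where the variant has NO real effect,
    and return the p-value of a pooled two-proportion z-test."""
    control = rng.binomial(n, p_control)
    variant = rng.binomial(n, p_variant)
    p_pool = (control + variant) / (2 * n)
    se = np.sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return 1.0
    z = (variant / n - control / n) / se
    return 2 * (1 - stats.norm.cdf(abs(z)))

ALPHA, TRIALS = 0.05, 10_000

# Acting on a single test: pure noise clears the bar ~5% of the time.
single = sum(ab_test_p_value() < ALPHA for _ in range(TRIALS)) / TRIALS

# Acting only when three independent runs all agree: ~0.05**3, or 0.0125%.
replicated = sum(
    all(ab_test_p_value() < ALPHA for _ in range(3)) for _ in range(TRIALS)
) / TRIALS

print(f"False-positive rate, single test:    {single:.1%}")
print(f"False-positive rate, 3 replications: {replicated:.3%}")
```

The specific test doesn’t matter; the point is that iteration, not any one number, is what turns a suggestive data point into an established pattern.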

That’s why we make sure that, for almost every experiment, both qualitative and quantitative research is conducted. Further, we strive to generate insights that are comparative — it’s rarely enough to learn what users think of a prototype in a vacuum. In the real world, users have an array of options to satisfy any given need, so we make sure that feedback on a solution is always relative to an alternative. Combining these two approaches has greatly reduced bias, and it often yields a wealth of data from which to identify patterns and insights. And, of course, we stress the importance of incorporating other data inputs, like traditional market research and in-app analytics.

5. User feedback never ceases to surprise us. You would think that after generating data from hundreds of thousands of users, we’d have ‘seen it all’ when it comes to feedback and insights. But that isn’t even close to true. We continue to be surprised by what we see on a daily basis, primarily with regard to…

…the difference between what users say and what they do.

It’s been well established that humans are pretty bad at predicting their future behavior. We’ve researched the psychology of that dynamic extensively. But it’s still surprising when we find virtually unanimous support for a feature in a survey, then absolutely no interest in the feature once it’s prototyped. Putting a visual stimulus in front of your target market is absolutely essential for substantiating findings.

…the sincere emotions expressed.

Market trends change rapidly and product teams are in a constant hustle to keep up. Few things get them to drop what they’re doing and sit silently as well as watching a video of an emotional user interview. We’ve witnessed a senior citizen cry profusely as they interacted with a prototype that evoked nostalgia. We’ve giggled as a Millennial described how much they hated a product concept and all the things they’d rather use instead of it. We’ve been shocked by a gentleman who opened up about how a new product could help him rebuild relationships with his kids. User research is truly an emotional rollercoaster.

…the validation of passionate enthusiasm.

One of the most common questions our clients ask is: “How do we know when we’ve validated a product concept with customers?” While we don’t have any hard-and-fast rules, we’ve half-joked about applying the “Pokémon GO Benchmark.” For fun (and because we’re addicted to the game), we ran research with a few hundred users of the mobile game. The responses were impressively enthusiastic and exemplified the patterns to look for when assessing validation: players gave detailed feedback to open-ended questions, spent significant time engaging with prototypes, and routinely offered to pay for new features we designed. Obviously, not every product needs to be a meteoric hit to find success, but evaluating outliers like Pokémon GO provides a powerful benchmark.

The key takeaway is that even when we think we know a user segment really well, research findings are rarely predictable or obvious. It’s easy to underestimate how difficult, and how rewarding, building empathy can be.

6. Shorter iteration cycles unlock deeper insights. When our initial clients finally started rapidly running experiments on Alpha, it became clear why generating meaningful customer insights is often so elusive for companies that take months to execute research. Speed in and of itself is the key.

When iteration cycles are slow, product teams prototype and experiment until they generate promising results. The moment they get the slightest sense that they’ve struck gold, they start engineering a solution (if they haven’t already started). In essence, they learn ‘what’ resonates well, but they don’t have the time to learn ‘why.’

But when we accelerated the research process to days, we found that clients were no longer content once they validated a product concept. They finally had the time and bandwidth to ask ‘why’ a prototype was perceived as more valuable than earlier iterations or alternatives. To keep up, we had to build out an extensive qualitative workflow so that we could go back to a sample of users who had tested a product and ask them open-ended questions. In doing so, we were able to unlock ‘deep insights.’

We define a deep insight as an understanding of a customer persona so robust that its value transcends the individual project a product team is working on. It is useful to anyone in the organization who is focused on delivering value to the same market. Instead of merely learning that customers prefer your prototype with an expensive one-time purchase over one with a cheap monthly subscription, you conduct interviews to learn why and discover that customers are actually afraid of forgetting to cancel their subscription. That insight is so meaningful that it can be applied to other products in your organization’s portfolio. And it’s made possible by speed.

7. Data is a means to an end. It’s easy to get lost in the buzzwords du jour rather than to do the hard work of discovering value and driving ROI. We learned quickly that to build a successful platform, we’d have to deliver to product teams more than the ability to be ‘data driven.’

Initially, our assumption was that data that clients generated within Alpha would translate directly into better product decision-making. That’s true to an extent and it certainly matters to the organization as a whole. But when we really investigated what was going on, we found that being data driven isn’t really what product managers want or need.

We listen intently to how our clients communicate the value of our platform and experimentation to peers at other organizations. Frequently, they mention how it aligns their team around hypotheses rather than opinions. Instead of two-hour meetings filled with debates, the team spends 15 minutes putting hypotheses into Alpha and then 15 minutes reviewing the findings when they’re ready. One product manager discussed how he uses Alpha simply because the data gives him a reason to email his director an update once a week. Another spoke about how thrilled he is to influence other departments to recognize the value of iteration and learning.

Of course, data is critical to enabling all of these benefits, but it’s a means rather than an end. And that matters because it informs our product roadmap. For example, early on we didn’t put much effort into the data visualization of research findings. But now we understand that presentation is just as important as, if not more important than, the underlying information, because it’s going to be shared and used to influence stakeholders. Recognizing how product managers must manage upward, sideways, and downward led us to prioritize features like reporting and sharing.

We’ll continue to update this list as we learn more. If you’re as passionate as we are about experimentation and customer insights, join our team. Or give Alpha a spin and start making smarter product decisions :)
