Metrics of Experimentation
Why there is always more than one metric that matters
By: Jordan Kerzee, Director of Digital Products at Bionic, part of Accenture Interactive
In the startup world, it’s common to hear people talking about “the one metric that matters.” This is the idea that every business should identify a single metric by which to track its growth and success, and then operationalize it across every aspect of the business, from research and experimentation to investment, advertising, and marketing.
There are two problems with this theory. First, real-world business challenges are far too complex to be reduced to a single metric to the exclusion of all others. There will always be more than one “metric that matters.” And just as importantly, insisting on a single metric raises the risk of choosing the wrong one.
To use a current example, we’ve seen that the three COVID-19 vaccines in the US (1) significantly reduce the chances of severe illness, hospitalization, and death, and also (2) make it less likely that vaccinated people will catch the virus in the first place. These two key metrics measure exactly what vaccines are intended to do, and all three vaccines are doing impressively well. Unfortunately, some people see “breakthrough infections” among vaccinated people as proof that there’s no point in getting vaccinated at all. The end of the pandemic keeps receding, and the likelihood of vaccine-resistant variants keeps rising, in part because of this mistaken belief that the only “metric that matters” with vaccines is 100% effectiveness.
Most of us will never face the challenge of developing a service or launching a product that’s a matter of life and death. Regardless, as this example suggests, you have to clarify your goals in order to understand which metrics really matter. When you’re in the earliest stages of generating ideas and potential new products, your goals may be changing. To find out what actually matters, you have to experiment.
Multiple metrics matter
Early stage experiments aren’t conclusive; they are directional. You’re trying to discover multiple possible directions to pursue, then decide which one is worth exploring further. You can’t be confident that you’re on the right path until you actually start going down it.
In early stage experiments, you’re trying to determine who your users are and what they do, individually and in aggregate. By aligning experimental and real behavior, you can determine what circumstances generate the behaviors you want to see. For example, you can run an ad test for a hypothetical product on an experimental website to see whether a customer would convert in an actual sales situation. Through the test, you can determine what gets a customer to click through an ad, try to complete the transaction, or sign up for a wait list after being told the product isn’t currently available. The customer’s behavior at each step indicates important metrics by which to judge the product.
Because the whole point of experiments is to see what happens, trying to spot the one metric that matters removes context and oversimplifies user behavior. Hoping that an experiment shows you what you want to see can lead you to overlook what’s actually happening.
Bionic recently worked with a food company that wanted a high degree of confidence in its decision to make a new product for people with food sensitivities. They used Bionic’s Blimp platform to create an ad to test consumer interest in a specific city. People who clicked through the ad were taken to a site with a description of the product that took them through the purchasing process, culminating in the option to sign up to be alerted when the product became available. The company then created a sample batch and told the people who had signed up that they could either have it delivered or buy it in person at a single location about an hour’s drive from the target market.
Of the people who wanted the product, 60% bought it online for delivery. It would be easy to consider this the “one metric that matters,” and take it as a sign to focus on online sales. However, we knew to also consider the importance of the remaining 40%. Despite being in the minority, these customers were so enthusiastic about the product that they were willing to spend three times the product’s price on gas and tolls to redeem a coupon allowing them to pick it up in person. This unexpected metric revealed opportunities not reflected in the online conversion rate: that the product could be priced higher, that it might perform better on store shelves than online, and that some online customers might actually prefer to buy the product in person if it was available in a more convenient location.
Better experiments for better metrics
As you set up experiments to gather different metrics, these three qualities are key:
1. Fidelity: How well do your assets (ad, website, etc.) in the experiment align with what they would be in the real world?
In the case of a beverage, a low-fidelity asset would be an image of someone pouring from a bottle that’s implied to be your product. A mockup of the bottle would be higher, and an actual product image would be highest. The higher the fidelity of your images, the more reliable your data.
2. Behavior: How closely does your experiment simulate the real-world conditions you care about?
For example, if real-world shoppers would click through an ad, put things in a shopping cart, initiate checkout, and pay, your experiment should mimic that experience up to the point of payment, then ask them to sign up for a waitlist instead of taking their payment details.
3. Stage: Does your e-commerce testing website look like it belongs to a genuine, maturing organization?
Consumers have been fine-tuning their e-commerce scam-meters for a while now. If your website is vague or unclear, short on information, or lacking in functionality, users are less likely to trust it, and less likely to respond to it in relevant ways.
Learning from marketing and sales
The same tools you use to expand the market for existing products can be adapted to discovery and product development so that you can extrapolate from experiments to real user behavior.
Consider how similar these metrics for product discovery experiments are to marketing metrics:
- Reach: Total number of users who saw an experimental ad
- Click-through rate (CTR): The percentage of people who clicked on the ad
- Add to Cart (ATC) percentage: The percentage of people who added the product to their shopping cart
- Conversion percentage: The percentage of people who tried to purchase the product/check out
- Signup percentage: The percentage of people who signed up for a wait list after being told the product wasn’t currently available
- Cost per result: How much it costs per ATC, i.e. for each person who expresses an interest and acts on it
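These metrics fall out of an experiment’s raw counts with a few lines of arithmetic. As a minimal sketch, here is how a funnel like the one above might be computed; every number, including the ad spend and the field names, is invented for illustration:

```python
# Hypothetical counts from an experimental ad campaign, mirroring the
# funnel metrics listed above. All values are made up for illustration.
funnel = {
    "reach": 10_000,       # users who saw the experimental ad
    "clicks": 400,         # users who clicked through
    "added_to_cart": 120,  # users who added the product to a cart
    "checkouts": 60,       # users who tried to purchase / check out
    "signups": 45,         # users who joined the wait list
}
ad_spend = 300.00  # hypothetical total cost of running the ad

# Each rate is measured against the previous step of the funnel.
ctr = funnel["clicks"] / funnel["reach"] * 100
atc_rate = funnel["added_to_cart"] / funnel["clicks"] * 100
conversion_rate = funnel["checkouts"] / funnel["added_to_cart"] * 100
signup_rate = funnel["signups"] / funnel["checkouts"] * 100
cost_per_result = ad_spend / funnel["added_to_cart"]  # cost per ATC

print(f"CTR: {ctr:.1f}%")                          # 4.0%
print(f"Add to Cart: {atc_rate:.1f}%")             # 30.0%
print(f"Conversion: {conversion_rate:.1f}%")       # 50.0%
print(f"Signup: {signup_rate:.1f}%")               # 75.0%
print(f"Cost per result: ${cost_per_result:.2f}")  # $2.50
```

Read together rather than in isolation, these numbers tell a story: a strong add-to-cart rate with a weak signup rate, for instance, points to a very different problem than the reverse.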
Your marketing and sales teams use data to identify users’ issues with an existing product, improve it, and use approaches like split testing and funnel optimization to do a better job of selling it. Using the Blimp platform with our partners, Bionic applies similar technology and frameworks to help identify consumer pain points for which there is no good solution, create a product to solve them, and then iterate on audience, messaging, and brand until arriving at the right product for the right users at the right cost per result.
All the data in the world can’t deliver a perfect guarantee that a product is worth developing and will be successful. Ultimately, consumer desires are too mercurial and personal to be fully quantified. Rather than limiting yourself to one “metric that matters,” dive into all the potentially relevant metrics, and feel more confident about whether or not to keep investing in and testing a specific new product.