Testing methods for startups to find product-market fit faster, more reliably, and at lower cost

Published in

Shadowboxer.co

Nov 15, 2022

Image showcasing IBM’s speech-to-text ‘mechanical turk’ pretotype experiment. Source: IBM

Who is Shadowboxer?

We're a venture studio with offices in San Francisco (US), Melbourne (AU) and Manchester (UK) that partners with early-stage ventures to accelerate their progress. Our team of senior product strategists, copywriters, designers and engineers are experienced venture builders, making them the ultimate partners for early-stage founders. We partner with founders from pre-seed to Series A, working for a mix of fee and equity to keep incentives aligned: we profit only when our partners succeed.

Interested in chatting with us? Reach out to us via shadowboxer.co

The oft-cited wisdom of founders who have achieved product-market fit is to ‘move fast and break things’, ‘rise and grind’ or ‘respect the hustle’. The problem with this advice is that it boils down to ‘put more in to get more out’, which isn’t very helpful for founders, operators or teams who are already giving their all.

Putting in more effort doesn’t always deliver returns.

The worst way to find out whether you’ve got product-market fit is to sink thousands of dollars’ worth of labour (and your own invaluable time) into a product, only to find that, in the words of entrepreneur and seasoned venture capitalist Marc Andreessen (1):

“… customers aren’t quite getting value out of the product, word of mouth isn’t spreading, usage isn’t growing that fast, press reviews are kind of ‘blah,’ the sales cycle takes too long, and lots of deals never close.”

If you find out this way, you’ve probably put in a lot and, despite your efforts, won’t see a significant return.

It’s important to distinguish between effort that delivers progress and focused effort that delivers meaningful progress. Our view is that a procedure for identifying critical challenges can focus your effort and accelerate your business more than any diet, sleep schedule or motivational force.

Fortunately, there’s a simple toolkit that you can apply to your business, product or feature concept (or really any future endeavours) to dissect, de-risk, validate and track your progress towards product-market fit:

Focus. Identify the foundational assumptions that your success or failure relies on, and define testable hypotheses.
Test. Employ the fastest, most cost-effective and reliable procedure to test your hypotheses.
Evaluate. Launch, measure, analyse, learn and re-adjust; rinse and repeat.

This article, part two of a three-part series, is about testing: how to quickly, cost-effectively and reliably test hypotheses to zero in on what matters for your business.

Testing hypotheses

Once founders or product leaders have a clearly articulated set of hypotheses about their business’s critical assumptions, the next step is designing a testing procedure.

Testing procedures come in all shapes and sizes. You could speak to customers, suppliers or experts in the field. You might conduct a survey, design a mock-up, build a prototype, build the real thing, or build a portion of the real thing. Maybe you tell someone you’ve built the real thing and pedal furiously behind the scenes to make it seem automated. Every avenue has its merits and constraints. Here, we hope to capture some of them to help you decide which is the most reliable, sufficiently fast and cost-effective option.

Let’s start with marketing research methods, which have never been more accessible.

Qualitative Research

Qualitative research can be done with an hour of your time and a willing participant (who, for most products or services, can be recruited for $80 or less). Multiply that by roughly 30 participants, which comes to about 30 hours of interviews and up to $2,400 in incentives, and you have a solid qualitative sample: you’ve officially conducted research! Outsource your recruitment to a platform such as respondent.io, userinterviews.com or newtonx.com, and you’ll have suitable respondents on a Zoom call within hours.

Quantitative Research

Quantitative research can be done with a few days of effort and a couple of weeks of patiently waiting for fieldwork. First, create a survey, refine it with your immediate network, then deploy it via a surveying platform like Qualtrics — this can be done in a day or two. For a few thousand dollars, you’ll have results from a nationally representative sample — all in a week or two.

There are countless variants of qualitative and quantitative research methods, and they can be incredibly valuable in optimising, refining and tweaking existing businesses and validating certain elements.

We’ve run them all for founders but, when testing a conceptual idea, we’ve found that nothing compares to real market behaviour. All too often, the social desirability effect and ‘participant bias’ wreak havoc on early-stage product and business concept evaluation. Sampling bias is common: respondents try to be your ideal participant, and you end up with a skewed sample. Plus, the way you phrase questions can easily introduce measurement error, with participants telling you what they think you want to hear, so you end up with skewed results.

Speaking from experience, we’ve conducted many well-intentioned qualitative and quantitative research projects, surveying thousands of potential users of our clients’ proposed products and services. We’ve been told by 65% or more of participants that they would “definitely” or “very likely” buy and use the product. And time and again, the product or service has fallen flat in market.

“Consumers don’t think how they feel. They don’t say what they think and they don’t do what they say.” — David Ogilvy

In our experience, the core benefit of qualitative or quantitative research during early-stage business concept or product validation is to better understand the problem space and your user target(s).

When it comes to testing problems and conceptual or hypothesised solutions, observing real-world behaviour reveals the truth. Building an operational environment that elicits actual, real-world behaviour, tests the hypotheses behind your critical assumptions and lets you observe what participants do (or don’t do) is the best litmus test of whether your business concept is likely to work.

Pretotyping

Enter pretotyping: a blend of ‘pretending’ your product is real and complete, and ‘prototyping’, building a functional, cut-down version of your product or service that people can use.

We have found pretotyping is a more valuable method of business concept validation and, perhaps more importantly, initiation. Unlike prototyping, pretotyping is a methodology that helps you test whether what you’re building is the right thing to build — while keeping investment leaner than if you were to build a full-featured product. In essence, we think of it like a method of ‘prototyping’ to quickly, cost-effectively and reliably test critical assumptions.

It might not be a new concept to many, but it’s crucial to practise in the early stages of a new venture.

Alberto Savoia shares a great example from IBM in his free e-book, ‘Pretotype It’ (2), to illustrate the meaning of a ‘pretotype’.

A few decades ago, well before the age of the Internet and before the dawn of ubiquitous personal computing, IBM was best known for its mainframe computers and typewriters.
In those days, typing was something that a small minority of people were good at — mostly secretaries, writers and some computer programmers. Most people typed with one finger — slowly and inefficiently.
IBM was ideally positioned to leverage its computer technology and typewriter business to develop a speech-to-text machine. This device would allow people to speak into a microphone and their words would “magically” appear on the screen with no need for typing.
It had the potential for making a lot of money for IBM, and it made sense for the company to make a big bet on it. However, pursuit of this business concept would be incredibly costly — requiring almost-unfathomable computing power [for the time] and overcoming a very difficult computer science problem.
Many of IBM’s research participants and customer-base had said they “would definitely buy and use” a speech-to-text solution, were it available, but some executives were unconvinced. After all, people had never used a speech-to-text system, so how could they know for sure they would want one?
So, IBM devised an ingenious experiment: they put potential customers of the speech-to-text system — people who said they’d definitely buy it — in a room with a computer box, a screen and a microphone, but no keyboard. They told them they had already built a working speech-to-text machine and wanted to test it to see if people liked using it. When the test subjects started to speak into the microphone their words appeared on the screen: almost immediately and with no mistakes!
The users were impressed: it was too good to be true. As it turns out, it was. What was actually happening, and what makes this such a clever experiment, is that there was no speech-to-text machine, not even a prototype. The computer box in the room was a dummy. In the room next door was a skilled typist listening to the user’s voice from the microphone and typing the spoken words and commands using a keyboard: the old-fashioned way. Whatever the typist entered on the keyboard showed up on the user’s screen; the setup convinced the user that what was appearing on the screen was the output of the speech-to-text machine. In doing so, IBM was able to test not just whether people liked the idea of a speech-to-text machine, but actually liked using one, in practice, in real-world contexts.
After being initially impressed by the “technology”, most of the people who said they would buy and use a speech-to-text machine changed their mind after using the system for a few hours. Even with fast and near perfect translation simulated by the human typist, using speech to enter more than a few lines of text into a computer had too many problems, among them: throats would become sore by the end of the day, it created a noisy work environment, and it was not suitable for inputting confidential or private material.
30 years later, we all tap away on a keyboard, every day. As it turns out, easier keystrokes and a smaller form factor were the innovations people wanted and needed.

This is a great example of the value of pretotyping.

The obvious path for hustlers trying to build quickly would be to ‘back themselves’: to design and build a prototype of the speech-to-text service, sinking countless hours and dollars before being able to test market appetite. Yet, without validating the underlying critical assumption (that consumers would prefer speaking to their computers as their primary mode of input), IBM would have put a considerable labour and capital investment at risk. They took the smarter route (early, inexpensive evaluation of critical assumptions through pretotyping), which led them to scale back their investment.

Take it from us — using pretotyping to test and validate critical assumptions is one of the best things you can do, both to ensure you’re not wasting your own time, effort and capital, and because prospective investors seriously value this real-world data.

So, how do you pretotype? There are countless methods, most of which are well documented in ‘Pretotype It’ by Alberto Savoia.

The Mechanical Turk
Purporting to have an automated, technology-based solution when, in reality, you’re running things manually behind the scenes.
Focus of method: usage
Example: IBM’s speech-to-text experiment.
The Pinocchio
A non-operational version of your product that can be used with a bit of imagination.
Focus of method: usage
Example: Jeff Hawkins, creator of the Palm Pilot, carried a Palm Pilot-sized wooden block in his pocket for months to test whether the form factor made the product sufficiently portable.
The Minimum Viable Product
Creating a functional version of your business or product concept, but stripped down to its bare minimum.
Focus of method: appeal and usage
Example: Shadowboxer client, Ren, is building a technology-enabled service to support parents to nurture their child’s emotional wellbeing and development. One component of the hypothesised service is group discussions moderated by expert child psychologists, so Ren have launched moderated, pay-for-access WhatsApp communities to validate usage of the proposed in-app experience.
The Provincial
Testing in a small region, network or sample.
Focus of method: appeal and usage
Example: Facebook, launching college by college and testing and iterating before expanding, or Tinder, launching party by party.
The Fake Door
Building a campaign, landing page or ‘entry funnel’ for your software, as if it already exists, to gauge appeal, before presenting a sign-up form or waitlist for when it really exists.
Focus of method: appeal
Example: Shadowboxer client, Greener, testing their product appeal amongst heavy shoppers and heavily sustainable shoppers to gauge interest.
The Pretend-to-Own
Before investing in buying, building or integrating your business or product concept, rent or borrow what you can, first.
Focus of method: appeal and usage
Example: Pop-up stores, or generally, avoiding significant capital expenditure when those business investments may be rented, first.

Depending on the hypotheses you’re testing, you’ll want to select a different method. Some of these pretotyping methods are focused on appeal — how likely people are to see, be intrigued by, and trial your product or service. Others are focused on usage — how likely people are, post-discovery, to use, derive value from, and continue to use your product or service.
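One way to make this selection explicit is to record the focus of each method listed above and filter by what your hypothesis needs to test. Below is a minimal Python sketch; the `Focus` flag, `PRETOTYPE_METHODS` table and `candidate_methods` helper are hypothetical names for illustration, and the focus values simply restate the list of methods.

```python
from enum import Flag, auto


class Focus(Flag):
    """What a pretotype primarily measures (hypothetical encoding of the list above)."""
    APPEAL = auto()  # will people notice, be intrigued by, and trial it?
    USAGE = auto()   # once they've tried it, will they keep using it?


# Focus of each method, restated from the list of pretotyping methods.
PRETOTYPE_METHODS: dict[str, Focus] = {
    "Mechanical Turk": Focus.USAGE,
    "Pinocchio": Focus.USAGE,
    "Minimum Viable Product": Focus.APPEAL | Focus.USAGE,
    "Provincial": Focus.APPEAL | Focus.USAGE,
    "Fake Door": Focus.APPEAL,
    "Pretend-to-Own": Focus.APPEAL | Focus.USAGE,
}


def candidate_methods(needed: Focus) -> list[str]:
    """Return the methods whose focus covers every focus in `needed`."""
    # For Flag members, `needed in focus` is True when all bits of `needed` are set in `focus`.
    return [name for name, focus in PRETOTYPE_METHODS.items() if needed in focus]


print(candidate_methods(Focus.APPEAL))
print(candidate_methods(Focus.APPEAL | Focus.USAGE))
```

Using a `Flag` lets a single method carry both focuses, so asking for `Focus.APPEAL | Focus.USAGE` returns only the methods listed as covering appeal and usage.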

Alberto Savoia defines ‘initial level of interest’ as follows:

👀 Initial level of interest (ILI) = number of actions taken / number of opportunities for action offered

This is essentially a discovery-to-conversion rate: the number of prospects who trialled your product or service (or took whatever action you offered), divided by the number of prospects who saw your call-to-action. It can be useful to measure this conversion rate across different prospect segments for comparison, prioritisation and to generate learnings.
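Savoia’s ILI is simple enough to compute by hand, but a small helper makes per-segment comparison routine. A minimal Python sketch, assuming you have already counted actions (prospects who took the offered action) and opportunities (prospects who saw the call-to-action) for each segment; the segment names and counts are hypothetical, not data from any real experiment.

```python
def initial_level_of_interest(actions: int, opportunities: int) -> float:
    """Savoia's ILI: number of actions taken / number of opportunities for action offered."""
    if opportunities <= 0:
        raise ValueError("opportunities must be positive")
    if not 0 <= actions <= opportunities:
        raise ValueError("actions must be between 0 and opportunities")
    return actions / opportunities


# Hypothetical funnel counts per prospect segment: (actions, opportunities).
# Illustrative numbers only.
segments = {
    "segment 1": (30, 1_000),
    "segment 2": (64, 800),
}

ili_by_segment = {
    name: initial_level_of_interest(actions, opportunities)
    for name, (actions, opportunities) in segments.items()
}

# Rank segments from most to least interested to help with prioritisation.
for name, ili in sorted(ili_by_segment.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: ILI = {ili:.1%}")
```

With these made-up counts, segment 2 ranks first (8.0%) ahead of segment 1 (3.0%), even though segment 1 saw more impressions in absolute terms.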

For example, a client of ours at Shadowboxer, Greener, is building a sustainable shopping companion service, and used a fake-door experiment to test their business concept’s appeal amongst heavy shoppers (segment 1) and heavily sustainable shoppers (segment 2). They found that the initial level of interest for their concept was much higher amongst heavily sustainable shoppers (segment 2). Greener and Shadowboxer have since made several decisions to further refine the product roadmap and serve this cohort as a priority.
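Whether one segment’s ILI is ‘much higher’ is partly a judgement call. The article doesn’t say how Greener assessed the gap, so this is not their method, but one standard way to check that a difference between two segments is larger than chance alone would plausibly produce is a two-proportion z-test. A minimal standard-library sketch, with hypothetical counts:

```python
import math


def two_proportion_z_test(actions_a: int, n_a: int, actions_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two segments' ILIs.

    Returns (z, p_value); z > 0 means segment b converted at a higher rate.
    Uses the normal approximation, which is only reasonable when each segment
    has at least ~10 actions and ~10 non-actions.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each segment needs at least one opportunity")
    p_a = actions_a / n_a
    p_b = actions_b / n_b
    # Pooled rate under the null hypothesis that both segments share one underlying ILI.
    pooled = (actions_a + actions_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation: every prospect acted, or none did")
    z = (p_b - p_a) / se
    # Two-sided p-value: P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, not real client data.
z, p = two_proportion_z_test(actions_a=30, n_a=1_000, actions_b=64, n_b=800)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A small p-value suggests the gap is unlikely to be noise alone; it says nothing about whether the higher-interest segment will also use the product, which is what ongoing level of interest addresses.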

Of course, appeal is only one side of the equation. A successful business needs to generate ongoing usage to retain any users it’s fortunate enough to acquire.

Alberto Savoia defines ‘ongoing level of interest’ as follows:

🔁 Ongoing Level of Interest (OLI) is best represented by a time-based graph (or table) rather than by a single number.
Each point/entry in the graph/table represents the level of interest at a particular date — the OLI is your trend.
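Because the OLI is a trend rather than a single number, it helps to compute the level of interest at each date and then summarise its direction. A minimal Python sketch, assuming you track how many members of a fixed trial cohort are active in each period; the cohort, dates and counts are hypothetical, and summarising the trend with a least-squares slope is our own choice rather than part of Savoia’s definition.

```python
from datetime import date


def ongoing_level_of_interest(active_by_date: dict[date, int], cohort_size: int) -> list[tuple[date, float]]:
    """Return (date, share of the cohort still active) pairs in date order.

    Following Savoia, the OLI is the trend of this series, not any single point.
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return [(d, active_by_date[d] / cohort_size) for d in sorted(active_by_date)]


def trend_per_period(series: list[tuple[date, float]]) -> float:
    """Least-squares slope of the shares per period (assumes evenly spaced dates)."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two data points to estimate a trend")
    shares = [share for _, share in series]
    x_mean = (n - 1) / 2
    y_mean = sum(shares) / n
    numerator = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(shares))
    denominator = sum((x - x_mean) ** 2 for x in range(n))
    return numerator / denominator


# Hypothetical weekly active counts for a trial cohort of 50 users (illustrative only).
weekly_active = {
    date(2022, 1, 3): 40,
    date(2022, 1, 10): 30,
    date(2022, 1, 17): 18,
    date(2022, 1, 24): 10,
}

oli = ongoing_level_of_interest(weekly_active, cohort_size=50)
for day, share in oli:
    print(f"{day.isoformat()}: {share:.0%} active")
print(f"trend: {trend_per_period(oli):+.1%} per week")
```

A steeply negative slope like this one is the kind of drop-off that would signal a usage problem even if initial interest looked strong.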

To apply the concept of OLI, let’s revisit the IBM speech-to-text ‘mechanical turk’ experiment. IBM had validated appeal: their customer base and many more prospects were intrigued by the proposition, and had told executives they were willing to try it and would ‘almost definitely purchase it.’ However, through pretotyping with a typist hidden in the next room, IBM discovered a usage problem. Despite being an appealing proposition, it simply wasn’t practical to sit in an office, shoulder to shoulder with your colleagues, talking to your computer for hours on end. Interest dropped off after a short period of use, and IBM concluded that the ongoing level of interest was too low.

When selecting pretotyping methods to test critical assumptions, it’s important to use mixed methods to ensure you’re examining your business concept, and its underlying critical assumptions, from all relevant angles. If a business concept relies on ongoing or repeat usage (as most do), you’ll want to measure both appeal (or ‘initial level of interest’) and usage (or ‘ongoing level of interest’). That means your experimental research design may require multiple separate pretotypes to examine different elements of, or assumptions about, the concept.

Let’s take a look at another example.

Another Shadowboxer client, Ren, is building a technology-enabled service to support parents to nurture their child’s emotional wellbeing and development. The service involves multiple feature areas, which have been quantitatively tested via marketing research to assess appeal and willingness to pay (this is how Ren garnered an ‘initial level of interest’ — albeit a ‘stated’ one).

One hypothesised feature of the service is democratising access to the expertise of child psychologists, whose services are usually cost-prohibitive for most parents. Ren observed that parents already trade war stories about their children’s developmental hurdles on social media, which often attract well-intentioned but misinformed responses.

This observation led the team to a few critical hypotheses:

H1: Parents would be willing to pay a low fee to access expert-moderated group discussions (the same environment, without misinformation).
H2: Child psychologists could deliver valuable and sufficient guidance while being fairly recompensed as moderators of group discussions on Ren’s platform.

The team hypothesised that a low-fee, paid group discussion, moderated by an expert child psychologist could be a great way to deliver expert advice and guidance to those who typically couldn’t afford it, or wouldn’t think to seek it out.

Of course, Ren could go and build a complex messaging system, investing time and effort on encryption, account origination, psychologist recruitment and scheduling systems — the works — to accompany all the other hypothesised features the team were already working on. However, Ren sought to pretotype this hypothesised idea by testing appeal and usage.

First, they sought out new parent groups to test appeal by measuring initial level of interest. To do this, they used a ‘fake door’ pretotyping method — reaching out to parent groups online to see how many would be interested in their service for a low monthly fee, and asking them to sign up for a waitlist.

Satisfied with the concept’s appeal, and armed with initial level of interest data, Ren set up a minimum viable product to test usage. Would parents keep paying after being granted access to the group, and would they keep returning to share stories and ask for advice? Or would the group chat fall quiet after a month or two, with parents slowly dropping off the subscription list?

Rather than building anything at all, Ren used the Pretend-to-Own method for the messaging service, setting up invite-only group discussions on WhatsApp. They took subscription interest via Typeform, payments via bank transfer and facilitated group discussions via WhatsApp. No coding was required, yet parents still received a fully featured service. At the end of the time-bound Pretend-to-Own pretotype, Ren found a high ongoing level of interest, concluded that they were onto something, and are now busily scaling their venture.

Importantly, the pretotype examples discussed above focused on observed, rather than stated behaviour. The data doesn’t lie — there really were x number of people that signed up to use the service, and there really were y number of people subscribed and messaging over the course of the experiment. Not only did the Ren team avoid the potentially costly exercise of building a fully-fledged messaging system that wasn’t yet validated, but they also found their first y happy, engaged and active users ahead of launching the built product.

Understanding your business’s critical assumptions and foundational hypotheses is a great foundation on which to build acceleration towards product-market fit, but you need a way to test them quickly, reliably and cost-effectively. Research (qualitative or quantitative) is more accessible than ever and can be executed quickly and cost-effectively but, in our experience, it is most helpful for understanding the problem space before you identify critical assumptions, and it is unreliable for validating or invalidating hypotheses. For hypothesis testing, teams need to observe human behaviour in context, and the best method for that is pretotyping. There are countless types of pretotype you and your team can employ quickly and cheaply to test business hypotheses, and the right method depends on what you’re seeking to test. Breaking a business down into critical assumptions and testable hypotheses, then testing as many of those as is sensible, quickly and cost-effectively, can help you zero in on the optimal concept to invest in without breaking the bank.

Stuart Aitken is a Strategy Director at Shadowboxer, a venture studio with offices in San Francisco, Melbourne and Manchester. Shadowboxer partner with investors’ portfolio companies, and directly with pre- and seed-stage founders, to accelerate venture progress toward, and beyond, product-market fit.


Appendix

1. For further reading on the topic of team, product and market, see Marc Andreessen’s article ‘The only thing that matters’.

2. ‘Pretotype It’, Alberto Savoia, 2011 (free e-book; the full book is also available on Amazon).

3. ‘How Superhuman Built an Engine to Find Product-Market Fit’, Rahul Vohra, First Round Review, 2018.

4. ‘Using Product-Market Fit to Drive Sustainable Growth’, Sean Ellis, GrowthHackers, 2018.