Fake doors. Real insights.

Matheus Winter Dyck · Published in Babbel Design · 5 min read · Mar 1, 2023

How we use “hacky” fake door tests to get quantitative feedback on design concepts from real users within hours.

Photo by Hal Gatewood on Unsplash

At Babbel we are always looking for ways to de-risk solution concepts before jumping into their development. It has sort of become second nature to identify where we make assumptions about a solution and find ways to verify them. We have even developed a handy toolkit to help us.

In a recent project we were working to address the opportunity “How might we show language learners their progress when they finish an activity?”

Solution concepts for the selected opportunity.

We had a bunch of ideas that were quite distinct from each other and approached the problem from different directions. But because we weren’t sure what information to show, or whether the end of an activity is the right moment for that feedback, we really wanted to check the desirability of the concepts. So naturally we looked for ways to get a reliable signal from users.

Qualitative tests are limited.

In the past we would often have turned to traditional qualitative tests (e.g. moderated user interviews, first-click tests, unmoderated task-based tests).

While they are helpful in some cases, we have found that they aren’t the most reliable tool for testing desirability once we’re past solution discovery. Qualitative tests normally represent the perspective of only a limited number of users and are prone to biases if not designed with a lot of care (answers are self-reported; users who participate in studies tend to represent a more engaged user group). In this project, we had already obtained low-fidelity results through other qualitative research. But the question we had at this stage was:

How might we get more reliable data about whether users are interested in the designed solutions, without going overboard with an expensive A/B test?

Solution

When thinking about possible answers we had a crazy, kind of hacky idea:

What if we repurpose the in-app survey tool we already have available to run fake door tests? 😏💡

Quick definitions

The in-app survey tool: It’s commonly used to invite users to participate in UX research or to give feedback. It can surface surveys as pop-ups to app users based on pre-defined events (e.g. finishing a lesson, finishing a review, seeing the homepage, etc.).

Fake door tests: A fake door test is a dummy simulation of a real in-product experience — the entry point to a feature exists, but the feature behind it doesn’t (yet).

The marriage between the “hacked” survey tool and a reliable semi-quantitative UX research method gave us just what we needed:

  • Through triggers we can easily define when and how many users see the fake door (a hypothetical configuration is sketched below).
  • The setup only takes about 2 hours: set up the campaign, refine the wording, test it, done.
  • Even though we release the campaign only to a subset of users, we are able to get hundreds of responses within a few hours.
  • Responses here are actual clicks on a button. So instead of measuring self-reported intent (“Of course I want to know my review accuracy”), we measure actual behavior (button clicks).
  • Bonus benefit: As we are using a survey tool, we can add more questions to the flow in order to increase the number of learnings we get from one experiment.
This is roughly what the user experience of a fake door looks like.
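To make the setup a bit more concrete, here is a minimal sketch of the ingredients such a fake-door campaign combines. Our survey tool’s actual configuration interface isn’t shown here; every field name, the trigger event name, and the headline copy are hypothetical stand-ins — only the “Check now” button and the follow-up question about timing come from the experiment described in this post.

```
// Hypothetical shape of a fake-door campaign definition.
// None of these field names come from a real tool; they only
// illustrate the ingredients described above.
interface FakeDoorCampaign {
  triggerEvent: string;      // app event that opens the pop-up
  maxResponses: number;      // cap on how many users see the fake door
  headline: string;          // the "advertised" feature (placeholder copy)
  ctaLabel: string;          // the button whose clicks we count
  followUpQuestion?: string; // optional extra question after the click
}

const reviewAccuracyTest: FakeDoorCampaign = {
  triggerEvent: "review_finished",          // hypothetical event name
  maxResponses: 500,
  headline: "Want to see how accurate you were in this review?", // placeholder
  ctaLabel: "Check now",
  followUpQuestion: "When would you like to see this information?",
};
```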

The results of this experiment gave us insight into how many people are interested in checking their review accuracy after a review, measured by clicks on the “Check now” button.

Conversion rate on the first pop-up, considered a signal of interest.
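The arithmetic behind that signal is straightforward; here’s a tiny sketch with placeholder numbers (not our actual results):

```
// Conversion rate on the first pop-up: share of users who clicked
// the CTA out of everyone who saw the fake door.
function conversionRate(ctaClicks: number, popupImpressions: number): number {
  return ctaClicks / popupImpressions;
}

// Placeholder numbers for illustration only:
console.log(conversionRate(120, 400)); // 0.3 → 30% clicked "Check now"
```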

Moreover, it gave us more detailed insight into when users would like to see such information. To be fair, this is self-reported data and therefore needs to be scrutinized, but it gave us a strong enough signal to eliminate a couple of concepts from the race.

Second question, aimed at the timing of the feedback.

Use cases

  • Use this approach when you have clearly defined desirability assumptions in your project.
  • Don’t use it when your hypotheses aren’t clear yet. In that case, try a qualitative method first to uncover the user needs.

Limitations

Since we’re using a survey tool for purposes it wasn’t built for, there are a couple of drawbacks:

  1. Most glaringly, the user experience is not perfect. The promised/advertised feature doesn’t exist yet, so the experience will inevitably end with some degree of disappointment for users. Depending on how critical trust is in your industry, you might need to be more cautious about using this tool to de-risk assumptions.
    We address this drawback by only collecting responses from a small segment (usually up to 500 per test), to reduce the damage we’re causing to the users’ experience.
  2. The tool triggers the pop-up surveys in the app based on events emitted by the app. However, adding new events to the tool is something you need engineers for. So unless your tool comes with a good set of events to start with, some upfront investment is necessary (a rough sketch of such an event follows below).
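For illustration, here is a rough sketch of what emitting such an event from the app could look like. The SDK interface and the event name are assumptions made up for this example; a real survey tool ships its own API.

```
// Hypothetical survey-tool SDK interface; real tools differ.
interface SurveySdk {
  trackEvent(name: string, properties?: Record<string, unknown>): void;
}

// Somewhere in the app, after a learner finishes a review session,
// the app emits an event the survey tool can use as a trigger.
function onReviewFinished(sdk: SurveySdk, itemsReviewed: number): void {
  sdk.trackEvent("review_finished", { itemsReviewed });
}
```

If an event like “review finished” doesn’t exist in your setup yet, that’s exactly the kind of upfront engineering investment mentioned above.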

This testing approach was developed in a collaborative effort with my stellar colleagues Anna Stutter Garcia, Mauro Fernández, and Lisa van Aswegen!

What new ways of learning from users can you explore?
