What scientific evaluation of an app is and why you need it for your healthtech product

Sonia Ponzo
Flo Health UK
Mar 30, 2022

Article by Sonia Ponzo, director of science at Flo Health Inc.

Mobile apps can be a helpful resource for end users and health care systems alike. They often fill in for services that governments and health care providers used to be able to offer. A clear example of this is the boom of mental health apps during the pandemic, when in-person services were at capacity. A recent report from the Organisation for the Review of Care and Health Apps showed a 200% increase in searches for mental health apps during the first lockdown. In response, the number of available mental health apps doubled between January 2019 and January 2022. Similarly, digital solutions for remote patient monitoring gained popularity, with hospitals trialing these new “virtual wards.”

The increased popularity of digital health products makes their evaluation even more important. Evaluating digital health solutions is crucial to find out if the product is fulfilling its intended use. For example, if we develop a mobile application to reduce workplace stress, we expect people who use it to show a reduction in their stress levels. To know whether that is true, we need to monitor users’ stress levels and determine whether any observed change was actually due to our product.

Why is scientific evaluation important?

Many products reach the market without any regulation, evidence base, or evaluation. The ultimate judgment then lies with the user, who may not have the tools to determine whether a certain product has been scientifically evaluated. In some cases, users may turn to their health care provider for recommendations as to whether specific digital products are useful and safe. In this respect, health care professionals play a crucial role in directing their patients toward products that may help them (and may even be key to the patient’s health care journey). At the same time, this places a high level of responsibility on health care professionals, who take the risk upon themselves.

More and more researchers and advisory groups are compiling resources to help digital product developers evaluate their solutions as they build them. An example of this is the “Evaluating digital health products” guide published by Public Health England, which includes an exhaustive set of resources for app developers who want to evaluate their product.

How does scientific evaluation work?

Choosing the correct evaluation technique depends on several factors, including the stage of the product development process, the intended audience, the presence of risks posed to users, and the aim of the solution. For instance, a fully developed app aimed at treating depression in adolescents requires a much more extensive investigation than a prototype for a journaling tool.

Different types of study design can answer different questions about a product:

  • Qualitative studies
    Qualitative studies go in depth into how users feel about the product. Possible study designs include usability testing, focus groups, and interviews. This type of study is helpful for informing future product development. An example is this qualitative study run by Happify Health to gather users’ experience with their product and its potential impact on loneliness.
  • Descriptive studies
Descriptive studies are used to get a snapshot of the product. With this study design, app developers can generate descriptive statistics, for example, how many active users they have and whether those users like the product, by analyzing collected data, surveys, and audits. Developers can also run observational studies to investigate specific effects in real-life situations. An example is a retrospective observational study conducted on Noom app users to show the effects of using the app on weight reduction and maintenance.
  • Comparative studies
    Comparative studies are used to test the efficacy of a product. They require the target product (e.g., a weight loss app) to be compared against an alternative (e.g., a web-based collection of government guidelines for weight loss). The choice of the alternative can range from nothing to a competitor app already in the market. For instance, Headspace conducted a randomized controlled trial (RCT) to test the effects of their app on stress, affect, and irritability against a psychoeducational audiobook.
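To make the “snapshot” idea behind descriptive studies concrete, here is a minimal sketch of the kind of summary statistics a developer might compute. The usage log and survey ratings below are entirely invented for illustration:

```python
from statistics import mean

# Hypothetical event log: user ID -> number of sessions in the last 30 days
sessions = {"u1": 12, "u2": 0, "u3": 5, "u4": 1, "u5": 0}

# Hypothetical in-app survey: satisfaction ratings on a 1-5 scale
ratings = [4, 5, 3, 4]

# "Active user" here means anyone with at least one session in the window
active_users = sum(1 for n in sessions.values() if n > 0)

print(f"Active users: {active_users}/{len(sessions)}")  # 3/5
print(f"Mean satisfaction: {mean(ratings):.1f}/5")      # 4.0/5
```

In a real product, these numbers would come from analytics pipelines rather than hard-coded dictionaries, but the principle is the same: descriptive statistics summarize what is happening without testing why.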

Randomized controlled trials are the gold standard of comparative studies. In an RCT, participants are randomly allocated to either the target intervention (e.g., the weight loss app) or the alternative (e.g., the web-based resource mentioned above). Measures of the intended outcome (in this case, weight loss) are taken before the trial starts and after the trial ends.

The randomization process allows developers to reduce any bias coming from participant selection. Let’s imagine the app developer recruited participants for the intervention group using flyers in local gyms, and for the control group, they left leaflets at the supermarket. The population they are using for their study is inherently biased. You’re much more likely to find individuals who are already committed to weight loss in a gym rather than the supermarket. On the other hand, if the developer decided to randomly allocate participants to either the intervention or the control group, they would have created more balanced groups. That’s why random allocation is a crucial step in an RCT: It allows the developer to conclude that observed effects on weight loss were actually due to the intervention rather than other factors.
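The random-allocation step described above can be sketched in a few lines of code. This is a minimal illustration of the principle, not a substitute for a proper trial protocol; the participant IDs and group labels are invented:

```python
import random

def randomize(participants, seed=None):
    """Randomly allocate participants to intervention or control groups.

    Shuffling the whole pool before splitting it ensures that group
    assignment is independent of how or where each person was recruited,
    which is what removes the gym-vs-supermarket selection bias.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the input list is untouched
    rng.shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return {
        "intervention": shuffled[:midpoint],  # e.g., the weight loss app
        "control": shuffled[midpoint:],       # e.g., the web-based resource
    }

# 20 hypothetical recruits, regardless of where their flyer came from
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomize(participants, seed=42)
print(len(groups["intervention"]), len(groups["control"]))  # 10 10
```

Real trials typically use more sophisticated schemes (block or stratified randomization) to keep groups balanced on factors like age or baseline weight, but the core idea is the same: chance, not recruitment channel, decides who gets the intervention.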

RCTs are a rigorous, widely used way to test the efficacy of a product; however, they are often time- and resource-consuming, and trials conducted on previous versions of the product may quickly become obsolete. New study designs, more in line with the constantly updating nature of the digital development process, are now becoming increasingly popular. This is a necessary step to make it easier for companies to incorporate some form of evaluation into their practices. Whatever the design chosen, it is crucial that app developers base their public-facing claims on the type of evaluation they have carried out and the level of evidence it provides.

What are the key challenges in evaluating digital health products?

Startups and smaller organizations face a number of challenges when attempting to evaluate their products. Often, they have competing priorities that make it difficult to allocate time and money to research and evaluation.

Another issue is the continuous development of the product. If a company proves the efficacy of their product in a clinical trial and then makes substantive changes to the product, that initial evaluation becomes obsolete. This is particularly important for companies trying to get their product registered as a medical device by regulatory bodies (e.g., the FDA or MHRA). The process is often long, expensive, and requires up-to-date evidence on the product, which adds to the costs.

On top of this, the evaluation of digital health products is not a straightforward process. Mobile apps are often referred to as “black boxes.” They normally have several features that interact with one another in ways that may not always be expected. Their efficacy may be due to the sum of all these effects, including user experience, the feel of the app, as well as the content. Thus, evaluating a digital product often requires companies to have experienced in-house research teams.

The responsibility for the evaluation of digital health products lies with app developers. Nevertheless, more flexible frameworks for digital health evaluation as well as substantive changes to regulatory processes are needed to help app developers with this challenging task.

Currently, we are looking for a Senior Workplace Scientist to join our team. Check out this and other open roles in the Medical and Scientific affairs section and apply now!
