Mapping Health Apps

Iman Muse
Data science at Nesta
Jun 20, 2023

I am a Data Analytics Apprentice in the Data Analytics Practice at Nesta. My apprenticeship involves 20% theoretical training with Multiverse along with various assessments. Part of the Level 4 apprenticeship end-point assessment involved an eight-week project that would be useful for Nesta’s missions and included objectives overseen by Multiverse.

Introduction

From mindfulness to diet tracking, health apps have the potential to help people lead healthier lives. To find out which healthy living topics are represented in the app space, I created a dataset of health apps and used natural language processing (NLP) techniques to analyse it. By looking at additional data, such as app installations, I was able to see which topics are most popular on the Google Play Store. My analysis of these clusters of apps also gave new insight into the effect of Covid-19 on health apps, which apps are popular for mental health and loneliness themes, and which food delivery apps are most used.

In this blog I will discuss how I created the dataset and my analysis findings.

Creating the health app dataset

The full pipeline for this project

First, I collected the app data from the Google Play Store using the google-play-scraper Python package. This package returns information about each app searched for, including the app description, release date, number of downloads and similar apps. It also allows you to search for apps using search terms. For example, you can find the name of the first app returned for the term “health” by running:

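from google_play_scraper import search

# Search the Google Play Store for "health" and print the first result's title
results = search('health')
print(results[0]['title'])
# Printed 'Samsung Health' when this post was written; results may change over time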

I created an initial dataset containing information about 5,000 apps, first by searching for 12 keywords and then by searching for apps similar to a list of around 30 apps known to be about health (eg, Strava) that members of Nesta’s A Healthy Life (AHL) team put together. After preprocessing, 1,412 apps remained. One preprocessing step was to keep only apps with one of the following manually selected genres: health & fitness, food & drink, medical, sports. Google Play Store uses genres to categorise apps, which was useful for filtering out irrelevant apps. However, the genre is not granular enough for the purposes of this project, so I used NLP techniques to create additional, more defined categories.
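As a rough illustration of the genre filter described above (not the project's actual code), the sketch below keeps only records whose genre is in the manually selected set. The field names `title` and `genre` and the example records are assumptions made for this example.

```python
# Toy records: the field names and most app titles here are hypothetical.
apps = [
    {"title": "Strava", "genre": "Health & Fitness"},
    {"title": "Chess Master", "genre": "Board"},
    {"title": "Recipe Box", "genre": "Food & Drink"},
    {"title": "Pill Reminder", "genre": "Medical"},
]

# The four manually selected genres named in the text, lower-cased for matching.
KEPT_GENRES = {"health & fitness", "food & drink", "medical", "sports"}


def keep_relevant(apps, kept_genres=KEPT_GENRES):
    """Return only the apps whose genre is in the selected set (case-insensitive)."""
    return [
        app for app in apps
        if app.get("genre", "").strip().lower() in kept_genres
    ]


relevant = keep_relevant(apps)
print([app["title"] for app in relevant])
```

Normalising the genre to lower case before comparing avoids dropping apps because of capitalisation differences in the scraped data.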

Clustering the health apps into 15 categories

The purpose of creating the clusters was to identify themes based on the semantic similarity of the app descriptions. I used a sentence transformer from the SBERT (Sentence-Transformers) library to embed the app descriptions into a semantic vector space, and then clustered the apps by the semantic similarity of their descriptions using k-means clustering.
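To make the clustering step concrete, here is a minimal pure-Python sketch of k-means. It is an illustration only: the two-dimensional toy vectors stand in for real sentence embeddings (which have hundreds of dimensions), and the function `kmeans` is written for this example rather than taken from the project, which would more typically rely on a library implementation.

```python
import math
import random


def kmeans(vectors, k, n_iter=20, seed=0):
    """Assign each vector to one of k clusters using plain k-means."""
    rng = random.Random(seed)
    # Start from k distinct points picked from the data.
    centroids = rng.sample(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Toy "embeddings" (assumed values): two apps about workouts, two about recipes.
toy_embeddings = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels = kmeans(toy_embeddings, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is that descriptions with similar embeddings end up sharing a label.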

Initially I used TF-IDF to generate cluster names, and I ultimately kept this step in the pipeline for reproducibility. However, I renamed the clusters manually so that the cluster titles were more engaging for the user during the data analysis phase (see Figure 1). For example, the TF-IDF generated name “Cooking-recipe” became “Recipes and Meal Kits”.
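One way TF-IDF can produce names like these is to treat all the descriptions in a cluster as a single document and join the highest-scoring terms. The sketch below is a simplified, assumed version of that idea using only the standard library; the toy texts, the function `tfidf_names` and the choice of two terms per name are illustrative, not the project's pipeline.

```python
import math
from collections import Counter

# Toy text per cluster (assumed): in practice this would be the cluster's
# concatenated, cleaned app descriptions.
cluster_texts = {
    0: "cooking recipe recipe meal cooking easy app",
    1: "workout fitness workout training app easy",
    2: "sleep meditation sleep calm app",
}


def tfidf_names(cluster_texts, top_n=2):
    """Name each cluster after its top_n terms by TF-IDF score."""
    docs = {c: text.lower().split() for c, text in cluster_texts.items()}
    n_docs = len(docs)
    # Document frequency: in how many clusters does each term appear?
    doc_freq = Counter(term for tokens in docs.values() for term in set(tokens))
    names = {}
    for c, tokens in docs.items():
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Highest score first; ties are broken alphabetically.
        top_terms = sorted(scores, key=lambda t: (-scores[t], t))[:top_n]
        names[c] = "-".join(top_terms)
    return names


names = tfidf_names(cluster_texts)
print(names)
```

Terms that appear in every cluster, such as “app” here, score zero and so never end up in a name.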

Figure 1: Clustering of health apps into 15 clusters. Each app is a circle, coloured by the cluster assigned using k-means clustering. The largest cluster is ‘Workout’ with 428 apps and the smallest is ‘Cycle Tracker Apps’ with 12 apps. Interact with this plot in the Streamlit app.

After embedding the app descriptions and clustering them, I manually grouped some clusters that I felt were similar. For example, I merged the clusters “training-workouts”, “walking-training”, “fitness-workouts”, “workouts-fitness” and “workout-classes” and manually labelled the result “Workout”. There is some distinction between these clusters: one might focus on cardio whereas another might focus on strength training. However, the distinctions are not clear-cut, as some themes, such as skipping, appear in multiple workout clusters. Keeping these clusters separate could therefore make it difficult to draw meaningful conclusions from the analysis. After this manual grouping I arrived at 15 app clusters.
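A manual merge like this can be expressed as a simple lookup from automatically generated names to final labels. The sketch below uses the five workout cluster names mentioned above; the dictionary `MERGE_MAP`, the helper `final_label` and the sample list of app labels are assumptions for illustration.

```python
from collections import Counter

# Automatically named clusters that are merged into one manual label.
MERGE_MAP = {
    "training-workouts": "Workout",
    "walking-training": "Workout",
    "fitness-workouts": "Workout",
    "workouts-fitness": "Workout",
    "workout-classes": "Workout",
}


def final_label(auto_name):
    """Return the merged label, or the original name if no merge applies."""
    return MERGE_MAP.get(auto_name, auto_name)


# Hypothetical per-app cluster names before merging.
auto_labels = ["training-workouts", "workout-classes", "cooking-recipe"]
merged_counts = Counter(final_label(name) for name in auto_labels)
print(merged_counts)
```

Keeping the merge in one mapping makes the manual decisions easy to review and to rerun if the automatic names change.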

You can see the final clustering of health apps in the figure above and explore them further in the Streamlit app.

Analysis

Various pieces of analysis can then be done on the 15 app clusters (themes) and the metadata. I was interested in answering several questions which I will discuss in the next sections.

Which clusters are the most and least popular?

Mean daily installations of apps per cluster. A horizontal bar chart with one bar for each of the 15 clusters, where bar length shows the mean daily installations; this can be a good indicator of the level of demand for apps in a category.

One measure of global app popularity is the number of daily installations worldwide. By this measure, food delivery is by far the most popular cluster. I calculated daily installations by counting the days between the app’s release date and the date the dataset was scraped, then dividing the app’s total installations by that number of days.
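The calculation can be sketched in a few lines. This is an illustrative version with made-up numbers; the function name `daily_installs` and the example dates are assumptions.

```python
from datetime import date


def daily_installs(total_installs, released, scraped):
    """Average installs per day between the release date and the scrape date."""
    days_live = (scraped - released).days
    if days_live <= 0:
        raise ValueError("scrape date must be after the release date")
    return total_installs / days_live


# Hypothetical app: 1,000,000 installs, released two years before scraping.
rate = daily_installs(1_000_000, date(2021, 1, 1), date(2023, 1, 1))
print(round(rate, 1))
```

Because this is an average over the app's whole lifetime, an older app with the same total installs will show a lower daily rate than a newer one.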

The app themes with the lowest average daily installations include “Quit Smoking or Drinking” and “Mental Health”. Both address societal issues, and apps in these areas may relieve pressure on the NHS by providing tips and advice and by monitoring individuals’ daily behaviours. It may therefore be important to improve these apps to increase installation rates. Additional research could help here, such as analysing reviews to identify whether customers have had a positive experience when using the apps.

What was the impact of COVID-19?

The percentage change in the number of apps released per cluster between 2019 and 2021. A horizontal bar chart with one bar for each of the 15 clusters.
The percentage change in mean daily installations per cluster between apps released in 2019 and apps released in 2021. A horizontal bar chart with one bar per cluster. This figure does not include “Monitor Blood Pressure, Insulin and BMI” for scaling reasons; the value for this cluster is 16,965.

I measured the percentage change in the number of apps released and the daily installations between apps released in 2019 and apps released in 2021 to look at the impact of Covid-19 on the health app space.
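For reference, percentage change here means the difference between the 2021 and 2019 values, divided by the 2019 value and multiplied by 100. The sketch below applies that formula to made-up counts; the cluster names and numbers are hypothetical.

```python
def percentage_change(old, new):
    """Percentage change from old to new, relative to old."""
    if old == 0:
        raise ValueError("percentage change is undefined when the old value is 0")
    return (new - old) / old * 100


# Hypothetical counts of apps released per cluster in each year.
released_2019 = {"Cluster A": 50, "Cluster B": 40}
released_2021 = {"Cluster A": 75, "Cluster B": 30}

change = {
    cluster: percentage_change(released_2019[cluster], released_2021[cluster])
    for cluster in released_2019
}
print(change)  # {'Cluster A': 50.0, 'Cluster B': -25.0}
```

The same function applies to mean daily installations; small 2019 values produce very large percentages, which is worth keeping in mind when comparing clusters.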

“Food Delivery” apps have been affected by Covid-19: the number of daily installations has increased dramatically, but the number of apps released has actually decreased, suggesting a maturing market.

“Mental Health” apps were also affected: both the number of daily installations and the number of apps released have increased. Developers may have identified the need for more apps in this area to relieve the effects that lockdown might have had. These apps appear to have been well received by customers, as the number of daily installations also increased.

The largest percentage increases in apps released were in the “Quit Smoking and Drinking” and “Hydration Reminder” clusters. Developers may have identified a gap here, believing that customers would want more apps in these spaces following the events of Covid-19.

Which apps are most in need of development?

Ranking daily installs and average score per app averaged by cluster (Rank 0 refers to highest value and rank 14 refers to the lowest value)

App clusters that are highly installed but poorly rated may be the areas most in need of development. “Digital Health Management” apps (including apps that monitor heart rate and fitness) are downloaded by far the most but are scored the worst, so there might be an opportunity to offer better apps in this space. The same applies to the “Monitor Blood Pressure, Insulin and BMI” cluster. I looked at the four most-installed apps in the “Digital Health Management” cluster and saw that the most-installed app (Google Fit) is poorly rated in comparison to competing fitness tracking apps (like Samsung Health).
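The comparison of ranks can be sketched as follows, using the same convention as the ranking above (rank 0 is the highest value). The cluster names and numbers are made up, and using the gap between the two ranks to flag a candidate is one possible heuristic, assumed for this example.

```python
def rank_descending(values):
    """Map each key to its rank, where rank 0 is the highest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: rank for rank, name in enumerate(ordered)}


# Hypothetical per-cluster averages.
mean_daily_installs = {"Cluster A": 900.0, "Cluster B": 300.0, "Cluster C": 100.0}
mean_score = {"Cluster A": 3.1, "Cluster B": 4.5, "Cluster C": 4.0}

install_rank = rank_descending(mean_daily_installs)
score_rank = rank_descending(mean_score)

# A large positive gap means "installed a lot, but rated poorly".
gap = {c: score_rank[c] - install_rank[c] for c in install_rank}
candidate = max(gap, key=gap.get)
print(candidate, gap[candidate])
```

In this toy data, Cluster A has the most installs (rank 0) and the worst score (rank 2), so it is the one flagged.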

What is going on in the weight loss/diet app space?

Reducing the prevalence of obesity is the focus of Nesta’s healthy life mission, so I looked at the most-installed apps in the “Weight Loss/Diet” cluster. The second most popular app was Yuka - Food & cosmetic scan, which scans the ingredients of products and tells you their impact on your health. Nesta is interested in customer purchasing behaviour and, as this app may hold information about food bought by customers, it could be of interest to Nesta.

It was interesting that three of the four most-installed apps in this cluster were to do with intermittent fasting, an alternative to calorie-restricted diets. This may reflect scientific evidence for the effectiveness of intermittent fasting. For example, a 2020 systematic review published in Canadian Family Physician [1] examined the evidence for intermittent fasting and reported that all 27 trials reviewed found weight loss of 0.8% to 13.0% of baseline weight with no serious adverse events. Twelve studies comparing intermittent fasting to calorie restriction found equivalent results, and the five studies that included patients with type 2 diabetes documented improved glycemic control. [2]

Can apps for mental health be improved?

From the analysis of the impact of Covid-19, it is clear that developers are making some effort to improve the apps available for mental health, as there was an increase of 116% in apps released and an increase of 450% in mean daily installations.

However, apps in the mental health cluster still have the third-lowest daily installations relative to the other clusters. According to NHS England, one in four adults and one in 10 children experience mental illness, and many more of us know and care for people who do. Even though the improvements made in this area are significant, can more be done? Can mental health apps be made reliable and effective enough that general practices can refer patients to them for additional help and support? As resources are stretched, can technology be used to encourage improvements in mental health apps?

I looked at the mental health apps with the highest number of daily installations. These generally also had high scores, apart from Feelsy: Stress Anxiety Relief, which has a high daily installation rate despite a low score of 2.36. Addressing the issues reflected in this score could further increase the installation rate and make the app more beneficial for its customers.

Are there any apps to help with loneliness?

The AHL team aims to gain a better understanding of whether loneliness has an impact on health and whether they can help improve social connection among overlooked high-risk groups.

Of the apps that contain the words “isolation”, “connectedness”, “loneliness”, or “belonging”, we found two that were the most relevant to AHL:

  1. TalkLife
  2. Woebot: Your Self-Care Expert
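A keyword filter of this kind can be sketched as below. The four keywords come from the text; the field names, the helper `mentions_keyword` and the example apps are hypothetical, and judging which matches are most relevant would still be a manual step.

```python
KEYWORDS = ("isolation", "connectedness", "loneliness", "belonging")


def mentions_keyword(description, keywords=KEYWORDS):
    """True if the description contains any keyword, ignoring case."""
    text = description.lower()
    return any(keyword in text for keyword in keywords)


# Hypothetical apps and descriptions.
apps = [
    {"title": "App one", "description": "Peer support for moments of loneliness."},
    {"title": "App two", "description": "Track your daily steps and calories."},
    {"title": "App three", "description": "Build a sense of Belonging with others."},
]

matches = [app["title"] for app in apps if mentions_keyword(app["description"])]
print(matches)
```

Note that this is plain substring matching, so a keyword inside a longer word would also count as a match.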

Conclusions

In this blog I have looked at the health app space to see what apps are currently on the market. I used the data science techniques of sentence embeddings, dimensionality reduction and clustering to find clusters of similar health apps. I identified 15 clusters: the largest was Workout (including apps that focus on aerobic and anaerobic exercises) and the smallest was Cycle Tracker (including apps that track menstrual cycles).

Creating this app dataset and developing an interactive visualisation of the health app space allows the healthy life team at Nesta to explore what’s going on in the health app space. My analysis of these clusters of apps also gave new insight into the effect Covid-19 had on health apps, which apps are popular in the mental health and loneliness themes, and which food delivery apps are most used.

Footnotes

[1] Welton, S., Minty, R., O’Driscoll, T., Willms, H., Poirier, D., Madden, S., & Kelly, L. (2020). Intermittent fasting and weight loss: Systematic review. Canadian Family Physician, 66(2), 117–125.

[2] At Nesta we do not endorse any approach that relies on personal responsibility for achieving our objective of reducing obesity prevalence. Nesta’s approach is based on the premise that food environment approaches are instead a more effective way of reducing obesity at large scale.
