Real Life is Messy. We Should Design For It.

On sensor data and the need for more transparency on signal quality.

We need a paradigm shift in the wearable tech industry. For doctors, individuals and researchers to know when data acquired from wearables can be trusted, we need more transparency on signal quality.

photo credit: Beatrix Boros

If you’ve been using wearables, chances are that you’ve had some issues trusting the data. Wrong step count while biking? Erratic heart rate during exercise? Inaccurate calorie count? Yes, real life is messy, and often things do not work as well as they do in supervised laboratory settings, which is typically where these technologies are developed and validated.

Wearables are not new. We get it, there are reasons why data might not be reliable. We can blame human error or misuse of the sensor (how tightly did you wear that watch?). Perhaps the circumstances of our particular use case are just too complex to deal with, or too far from the main intended use of the technology (trying to quantify a gym workout full of wrist motion with an Apple Watch?).

Does this mean data from wearables are garbage? Of course not.

But if that is the case, why haven’t most companies given users signal quality metrics? Why do we still have no idea whether our Apple Watch is providing accurate heart rate data or random output at a given moment?

But this story is not just about you and your Apple Watch. These issues have greater implications. When inaccurate data are used to build higher-level analytics in epidemiological research, researchers may reach ill-founded conclusions.

Example of erratic pace and heart rate data. While the issues are obvious (crazy fast pace from time to time, heart rate all over the place even at constant pace on flat terrain), no data quality metric is reported. Any “big data” analysis built on top of these data has no way of knowing which segments are unreliable and will produce inaccurate results.

On wearables, signal quality, trust and data science.

Recently I took part in the Health Technology Forum at Stanford Medicine, where I was invited by Ernesto Ramirez, head of R&D at Fitabase. Ernesto moderated our panel on Digital Health and knows a great deal about wearables and research.

In our discussion about precision medicine and the type of data we should collect Ernesto asked, ‘Wearables are capturing how we move, phones are tracking where we’ve been, apps are tracking what we eat, social media can track our moods and affiliations, EMRs can keep up-to-date health status, treatment, and medication usage. All of this is great, but what kinds of data are we missing?’

To this point, I insisted that we need to make an effort to collect metrics related to signal quality above all else. The question should not end at ‘What kinds of data are we missing?’ but rather continue to ‘Can we trust the data that are being measured and collected?’

We have accepted that inaccuracies are the norm (e.g. physiological data and motion artifacts), but almost no sensor reports signal quality or any information about possible misuse or malfunction.

Validating signal quality matters at many levels, from healthcare to consumers to scientific research.

1. Healthcare. For data that clinicians can access, providing signal quality metrics promotes trust in wearable sensors and, in turn, helps digital health technology gain acceptance in health care [1]. In this context, signal quality measures can also help the monitoring staff (doctors, hospitals, etc.) understand whether users or patients are using the technology correctly or need further education on its use.

2. Consumers. If we want individuals to take action based on wearable sensor data and other digital health tools, they need to trust the data acquired through them. While we, the company, need to spend time in the lab validating the accuracy of our wearable against reference systems (optical heart rate against ECG, for example), we also need to spend time understanding how our system behaves in unsupervised real-life settings.

More importantly, we need to be more transparent about signal quality and how information is provided. No sensor will behave correctly 100% of the time and consumers should understand this.

3. Science. As researchers work on taking digital health to the next level, the ultimate goal is to pool data from thousands of individual users to understand the complex relations between health, lifestyle and environmental factors. This process can eventually lead to the development of new guidelines and aid clinical practice. However, currently we, as researchers, spend more time determining when we can actually trust the data than building analytics on top of them.

Wearable sensors can misbehave. Sometimes we can correct the output and deal with the issues internally without showing the user. Sometimes there is nothing we can do, and we need to inform the user (consumer or researchers) so that actions can be taken. This means either discarding data or better educating individuals on how to best use a specific technology.

Trying a different approach.

At Bloomlife we developed the first consumer wearable able to measure electrohysterography (EHG), the electrical activity of the uterine muscle, and translate that information into contraction tracking and counting. Our system needs to operate under conditions with limited movement. Put simply, when dealing with physiological data there are always challenges linked to motion artifacts, as well as interference from other muscles.

The Bloomlife wearable sensor.

EHG has been used to detect contractions in hospital settings, but when using these tools outside of the hospital, a series of additional challenges arise.

Data quality is paramount. The user wants to better understand physiological changes during her pregnancy. Research teams want to better understand complex relations between physiological, environmental factors and pregnancy complications such as pre-term birth and gestational hypertension. To serve these needs, we spend much of our time developing methods to assess signal quality and provide such information to all parties benefiting from the data.

Building a machine learning model for noise* detection

*Noise in the context of signal quality means ‘artifacts’, or other sources of poor signal quality, not actual noise or sounds.

Estimating data quality, especially for physiological signals, is nothing new. Decades of research have gone into this problem, with entire PhD theses dedicated to it [7]. For most physiological measurements, motion is the main challenge. This is the case for blood pressure measurements [2], EEG (brain waves), ECG, PPG (heart rate, heart rate variability) and EHG data (contractions) [3]. Frameworks to assess data quality have been developed, looking at physiological soundness (are the data in a reasonable range? do the data behave over time in a reasonable way?) or contextual soundness (acceptable values change depending on context, for example heart rate while sleeping or while exercising) [4]. Other methods exist to detect, for example, sensor detachment from the body [6].
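To make the two kinds of soundness checks concrete, here is a minimal sketch in Python. It is not taken from any of the cited frameworks: the function name, the contexts and every numeric limit below are assumptions chosen only for illustration. The range check and the step check stand in for physiological soundness, and the per-context limits stand in for contextual soundness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    low: float       # lowest plausible heart rate (beats per minute)
    high: float      # highest plausible heart rate (beats per minute)
    max_step: float  # largest plausible change between accepted samples


# Hypothetical limits, for illustration only (not clinical guidance).
CONTEXT_LIMITS = {
    "sleeping": Limits(low=35, high=110, max_step=10),
    "exercising": Limits(low=60, high=210, max_step=25),
}


def soundness_flags(heart_rate, context):
    """Return one boolean per sample: True if the sample passes both checks."""
    limits = CONTEXT_LIMITS[context]
    flags = []
    last_accepted = None
    for value in heart_rate:
        # Physiological soundness, part 1: is the value in a reasonable range?
        in_range = limits.low <= value <= limits.high
        # Part 2: does it change reasonably relative to the last accepted sample?
        plausible_step = (
            last_accepted is None
            or abs(value - last_accepted) <= limits.max_step
        )
        ok = in_range and plausible_step
        flags.append(ok)
        if ok:
            last_accepted = value
    return flags


# The same samples can be sound in one context and suspicious in another.
samples = [50, 52, 51]
print(soundness_flags(samples, "sleeping"))
print(soundness_flags(samples, "exercising"))
```

Comparing each sample with the last accepted one keeps a single spike from also invalidating the sample that follows it, at the cost of rejecting a genuine sudden level shift. A real system would need to handle that case explicitly.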

Since we are dealing with a less common type of data at Bloomlife, we decided to design a study and investigate the relation between potential noise sources and EHG data quality. We asked a series of study participants to perform several activities that could create noise (talking, resting, coughing, turning, stretching, walking, sitting, standing, bending, rubbing the belly, contracting, etc.). We then extracted time and frequency domain features related to uterine activity, cardiac activity (the Bloomlife sensor can measure both) and movement (acceleration based). Finally, we built a machine learning model able to detect noise so that we could easily identify periods of high data quality and periods of noise / poor data quality.
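The general shape of such a pipeline (windowed features in, a clean/noise label out) can be sketched as follows. This is not the Bloomlife model. The post does not say which features or classifier were used, so the two features, the 16 Hz sampling rate, the 3 Hz cutoff, the synthetic signals and the nearest-centroid classifier are all assumptions picked to keep the example short and dependency-free.

```python
import cmath
import math


def rms(window):
    """Time-domain feature: root mean square amplitude."""
    return math.sqrt(sum(x * x for x in window) / len(window))


def high_freq_fraction(window, sample_rate_hz, cutoff_hz):
    """Frequency-domain feature: share of spectral power above cutoff_hz."""
    n = len(window)
    mean = sum(window) / n
    total = high = 0.0
    # Naive DFT over the positive frequencies; acceptable for short windows.
    for k in range(1, n // 2 + 1):
        coeff = sum((x - mean) * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(window))
        power = abs(coeff) ** 2
        total += power
        if k * sample_rate_hz / n > cutoff_hz:
            high += power
    return high / total if total > 0 else 0.0


def features(ehg_window, accel_window, sample_rate_hz=16.0, cutoff_hz=3.0):
    """One movement feature and one EHG feature per window (assumed choice)."""
    return (rms(accel_window),
            high_freq_fraction(ehg_window, sample_rate_hz, cutoff_hz))


def fit_centroids(rows, labels):
    """Nearest-centroid 'training': average the feature rows of each class."""
    grouped = {}
    for row, label in zip(rows, labels):
        grouped.setdefault(label, []).append(row)
    return {label: [sum(col) / len(col) for col in zip(*group)]
            for label, group in grouped.items()}


def predict(centroids, row):
    """Assign the label whose centroid is closest in feature space."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


def synthetic_window(artifact_amplitude, n=32, sample_rate_hz=16.0):
    """Made-up data: slow 'uterine' wave plus a fast artifact, and movement."""
    t = [i / sample_rate_hz for i in range(n)]
    ehg = [math.sin(2 * math.pi * 0.5 * s)
           + artifact_amplitude * math.sin(2 * math.pi * 6.0 * s) for s in t]
    accel = [artifact_amplitude * math.sin(2 * math.pi * 2.0 * s) for s in t]
    return ehg, accel


# Labelled windows, standing in for the annotated study recordings.
amplitudes = [0.0, 0.1, 0.2, 1.5, 2.0, 3.0]
labels = ["clean", "clean", "clean", "noise", "noise", "noise"]
rows = [features(*synthetic_window(a)) for a in amplitudes]
centroids = fit_centroids(rows, labels)

print(predict(centroids, features(*synthetic_window(0.05))))
print(predict(centroids, features(*synthetic_window(2.5))))
```

A production model would use many more features, features scaled to comparable ranges, and validation on participants held out from training. The point here is only the structure: label windows recorded during known activities, extract features, and learn a boundary between clean and noisy periods.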

Instead of only building models for contraction detection - our main application - we went back to the lab and ran additional clinical studies to develop models that act in parallel to help determine when data can be trusted. We can communicate this additional information to both consumers and research partners.

Example of reference contractions (top row) and additional data acquired using the Bloomlife sensor (uterine activity, cardiac activity, movement). The output of the artifact detection model, used to determine when signal quality is sufficiently high, is also shown. Short periods of noise can be dealt with without disrupting the output signal (bottom row, where automatically detected contractions are also shown). On the other hand, sometimes data quality is too low and the uterine activity signal cannot be reliably reconstructed, hence no output is provided and the user is informed.
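The two behaviours described in the caption (bridging short noisy periods, withholding output for long ones) can be sketched as below. The function name, the linear interpolation and the `max_bridge` threshold are assumptions for illustration; the post does not describe how the actual system reconstructs the signal.

```python
def bridge_or_mask(values, noisy, max_bridge=3):
    """Bridge short noisy runs by interpolation; mask long ones with None.

    values: output signal samples.
    noisy:  one boolean per sample, True where the artifact detector fired.
    max_bridge: longest noisy run (in samples) that is still bridged.
    """
    out = list(values)
    n = len(values)
    i = 0
    while i < n:
        if not noisy[i]:
            i += 1
            continue
        start = i
        while i < n and noisy[i]:
            i += 1
        end = i  # the noisy run covers indices start .. end - 1
        has_clean_neighbours = start > 0 and end < n
        if end - start <= max_bridge and has_clean_neighbours:
            # Short gap: interpolate linearly between the clean neighbours.
            left, right = values[start - 1], values[end]
            span = end - start + 1
            for j in range(start, end):
                out[j] = left + (right - left) * (j - start + 1) / span
        else:
            # Long gap, or no clean sample on one side: provide no output.
            for j in range(start, end):
                out[j] = None
    return out


signal = [1.0, 2.0, 9.0, 4.0, 5.0, 9.0, 9.0, 9.0, 9.0, 10.0]
flags = [False, False, True, False, False, True, True, True, True, False]
print(bridge_or_mask(signal, flags, max_bridge=2))
```

Returning `None` rather than a guessed value is the design choice that matters here: downstream code (and the user interface) can then show a gap instead of presenting untrustworthy data as if it were real.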

Designing for issues in data quality

We are not unique in our attention to determining when data can and cannot be trusted. Chances are many other wearable companies have done similar research on their own sensors. The difference is in the communication. We don’t hide “bad” data.

We don’t subscribe to pretending that the problem does not exist, or to providing output no matter what.

I’d argue that hiding bad data does not help the wearable technology industry. We will lose trust. At Bloomlife, we believe signal quality metrics are extremely important in real life, and that users should be part of the process. For these reasons we went through different design iterations to decide how to communicate data quality issues to our users.

This is not an easy process. People are not used to seeing gaps in their data when continuously wearing a sensor (but maybe this will change with more transparency from other companies). For our system, we designed different ways to inform the user about potential issues as they go. The goal: inspire individual action to dynamically improve data quality.

Examples of different visualizations used to communicate to the users issues with signal quality. In the first figure nothing is shown, while in the second one uterine activity is shown but not colored, highlighting how those periods should not be trusted or used to count contractions.

Where to go from here

Trusting data and determining a sufficient level of accuracy continue to be among the main challenges for both application development and scientific discoveries based on wearable sensors or digital health tools deployed in the wild [5].

Putting accurate, validated and trustworthy digital health tools into the hands of individuals already shows great potential for positive impact at the N = 1 level, through increased awareness and empowerment. Including the user in the ‘data quality’ loop will further help her/him trust the tools and collect meaningful data. Meaningful data lead to better insights at the population level to prevent disease, identify modifiable risk factors, and design interventions for health behavior change.

It’s 2017. Wearables have been out for over a decade. The reality remains: data from wearables are still not always good. This reality makes it even more important to know when data are good and can be trusted by both users and the researchers relying on these data for scientific discovery.

Transparency is key. Companies and developers need to educate their customers and provide ways to identify periods in which data cannot be trusted. Real life is messy, and segments of poor data quality are simply unavoidable.

Transparency is not easy. In fact, it’s quite difficult. Companies will need to allocate resources to additional clinical studies, data science & UI/UX work. They will have to educate users on the challenges and issues that might arise.

But it is time for companies to step up and tackle these issues head on. This is the way forward for the wearable technology industry.

For wearable tech to find value, the data must have value.

Let’s start a conversation. How is your company tackling this issue?
