Shazam your health!

Using voice and AI to improve the lives of millions of patients

Dr Guy Fagherazzi, PhD
10 min read · Sep 1, 2021
The Shazam app can identify any song in seconds

I am a music addict. When I am not doing research or taking care of my kids, I dedicate a lot of my free time to listening to live DJ sets from all over the world on YouTube, looking for new tracks to discover. And the Shazam app is always by my side in these moments. I have consistently been amazed by Shazam: how easy and quick it is to identify a new song you like after just a few seconds of recording. The way the technology works is mind-blowing but very simple at the same time (which is the general recipe for success, right? #KeepItSimple). The raw audio signal is converted into a spectrogram, a visual representation of the frequencies of a signal over time, and is then compared with entries from a unique, large database. If it finds a match, it gives you back the name of the artist and the song. And it works regardless of your smartphone's microphone quality or the environment in which you are recording the track (be it a quiet room or a concert).
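Shazam's exact fingerprinting algorithm is proprietary, but the first step, turning raw audio into a spectrogram, is easy to reproduce yourself. Here is a minimal sketch in Python, assuming you have the librosa audio library installed; "track.wav" is a placeholder file name:

```python
# Minimal sketch: convert a raw audio recording into a spectrogram.
# Assumes the librosa library is installed; "track.wav" is a placeholder.
import numpy as np
import librosa

# Load the recording (mono, resampled to 22,050 Hz by default)
signal, sample_rate = librosa.load("track.wav")

# Short-time Fourier transform: the frequency content of the signal over time
stft = librosa.stft(signal, n_fft=2048, hop_length=512)

# Magnitude spectrogram on a decibel scale, ready to plot or compare
spectrogram_db = librosa.amplitude_to_db(np.abs(stft), ref=np.max)

print(spectrogram_db.shape)  # (frequency bins, time frames)
```

From there, a service like Shazam boils the spectrogram down to a compact fingerprint of its most prominent spectral peaks, which it can look up in its database in milliseconds.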

A spectrogram from a recording of a person with COVID-19 reading a text (Provided by the author).

Now let us imagine for a moment that we could apply the same approach to monitor your health. You record a few seconds of your voice by reading a prespecified text or pronouncing a specific vowel (a sustained "aaaaaaah" on a single breath, for instance), and your smartphone gives you back accurate information about a specific symptom or condition you are tracking. Crazy, right? Well, not that crazy.

Voice for health

The logic behind it is quite simple: when you have a disease or a symptom (let us say chronic fatigue), organs such as your heart, lungs, brain, and muscles are usually impacted and, in turn, since all of these organs are mechanically involved in voice production, this can affect the way you speak. The differences may not always be detectable by a human ear, but I am pretty sure you have already guessed that someone you know was not doing well just from the way they sounded on the phone, haven't you? We now want to rationalize and professionalize this "intuition".

Thanks to recent developments in audio signal processing and the power of artificial intelligence (AI), it is now possible to analyze huge amounts of data, detect small differences between people with or without a symptom, and then build what we call a vocal biomarker: a combination of features extracted from your voice that predicts a symptom. And just like Shazam, we convert your voice recording into a spectrogram, extract thousands of audio features to deeply characterize your voice, and feed them to AI models to predict the presence of a given symptom or health condition.
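To make this concrete, here is a toy sketch of such a pipeline, not our actual models: it summarizes each recording with a handful of standard acoustic features (real systems extract thousands) and trains an off-the-shelf classifier. The librosa and scikit-learn libraries, the file names, and the labels are assumptions for illustration only:

```python
# Toy vocal-biomarker pipeline: recordings -> acoustic features -> classifier.
# Real pipelines extract thousands of features and train on far more data;
# all file names and labels below are hypothetical.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def voice_features(path):
    """Summarize one recording as a fixed-length acoustic feature vector."""
    signal, sr = librosa.load(path)
    mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=20)       # timbre
    centroid = librosa.feature.spectral_centroid(y=signal, sr=sr)  # brightness
    # Average each feature over time so every recording yields the same shape
    return np.concatenate([mfccs.mean(axis=1), centroid.mean(axis=1)])

# Hypothetical training set: voice recordings with known symptom status
paths = ["rec_001.wav", "rec_002.wav", "rec_003.wav"]
labels = [1, 0, 1]  # 1 = symptom present, 0 = symptom absent

X = np.vstack([voice_features(p) for p in paths])
model = RandomForestClassifier(random_state=0).fit(X, labels)

# The trained model predicts the symptom from a new, unseen recording
print(model.predict([voice_features("rec_new.wav")]))
```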

Using voice to monitor diseases is not a new idea. The first important work on the topic dates back to the early 2000s, with speech recordings (over landline phones, remember them?) from people with Parkinson's disease to assess its progression. Since then, many other neurodegenerative, respiratory, and mental health diseases, such as depression, have been investigated. Now, scientists like us are studying other important diseases that have received little attention so far, like cancer, diabetes, or cardiovascular diseases.

And COVID-19 too, of course.

Lessons from COVID-19

Photo by engin akyurt on Unsplash

Oh boy, the COVID-19 pandemic has been such an accelerator for digital technologies and AI. But the magical AI-based solution is not there yet. We are quite disillusioned when it comes to the actual use of AI technologies in clinical practice for COVID-19, but this is no surprise to me, because guess what: good research takes time!

We are now in the middle of a worldwide pandemic affecting millions of people, and voice-related research is currently organized in two main streams to help people living with COVID-19 or Long Covid, when symptoms persist after the acute phase and turn into a chronic condition. The first stream aims at designing a kind of "COVID-19 screening test" based on voice. Peers from the Massachusetts Institute of Technology (MIT) have trained an AI algorithm that can detect COVID-19 with very good accuracy, even in asymptomatic individuals. The second stream, the one my team is dedicating all its energy to, is the remote monitoring of people with COVID-19 or Long Covid. We are training vocal biomarkers to monitor the resolution of symptoms overall, or specific symptoms such as fatigue or the loss of taste and smell, which are very frequent in people with COVID-19 or Long Covid.

Avoiding close contact between patients and healthcare professionals is also recommended for infectious diseases, so using voice to remotely monitor the presence or absence of such symptoms can be of great use to track the health of people in isolation.

To develop such vocal biomarkers for COVID-19, we recruited more than 400 people with PCR-confirmed COVID-19 into a large research study in Luxembourg and asked them to regularly record their voice at home with an app.

We have been able to create a unique database of more than 5,600 voice recordings of people with COVID-19, on which we have trained different AI models.

Less blood, more voice.

The use of voice holds big promises. Let us look at the situation today. If you are feeling unwell, you go to the doctor, they send you to a lab for a blood test, you wait for the results, you go back to your doctor with the results, and you receive a new treatment. What a burden for a person who does not feel well or is worried about their health! Could we do better and improve this kind of workflow? I am pretty sure we can.

So our mission is very clear: with voice, we want to decrease the burden on patients and provide a remote, non-invasive, instantaneous solution to monitor symptoms, diseases, and health.

The future is bright!

Let us take a few steps into the future.

The year is 2025. Meet Angie, a 45-year-old woman who has been treated for breast cancer and is now on hormone therapy, with a follow-up by her oncologist every six months. The oncologist asks Angie if she is willing to install an app on her smartphone, on which she will read a few lines of text every other day to check her level of fatigue. At the hospital, both the oncologist and the breast cancer nurses can view the evolution of her fatigue vocal biomarker in real time on a dashboard, along with all the other patients currently monitored at the hospital. One day, an alert comes in on the hospital's dashboard, indicating that Angie has had particularly high levels of her fatigue vocal biomarker over multiple days. She is contacted by a nurse to better understand the situation and is scheduled for an emergency appointment with her oncologist. Thanks to this fatigue monitoring, the oncologist identifies a local recurrence of her breast cancer very early, instead of waiting a few more months for her routine visit to the hospital. Angie has probably avoided treatment for a more advanced form of cancer.

Flash forward five years to 2030. Meet Jack, a 50-year-old man with type 1 diabetes, one of the 578 million people living with diabetes in the world. His condition requires him to inject insulin multiple times per day if he wants to live. He is subject to stress and anxiety and has depressive symptoms because of the mental burden of managing his disease in everyday life (which is very frequent: 1 in 4 people with type 1 diabetes and 1 in 5 with type 2 diabetes experience high levels of diabetes distress). His diabetologist suggests therapy with a psychologist and would like to monitor how this diabetes distress evolves. He therefore prescribes a weekly recording of a few seconds of Jack's voice with an app installed on the smartwatch Jack is already using to monitor his blood glucose continuously. Eight weeks later, Jack goes back to his diabetologist, and the results are displayed on a graph: Jack's vocal biomarkers of diabetes distress have steadily improved, and the therapy with the psychologist seems to be working. Jack agrees: he sees an improvement in his fatigue levels, manages his medication more easily, and has fewer hypos. Bingo!

The road to clinical practice or use at home is still long.

We are not there yet, though. Hold your horses!

A lot of start-ups and private companies are on the market, and a few academic labs like mine are working on the topic as well. The field of voice technology is growing very fast and is therefore getting a lot of traction. Undeniably, there is very good research and innovation conducted by brilliant people, but there is also a good load of crap, with people peddling snake oil based on just a bunch of slides!

So guys, when it comes to voice technology, please do like Public Enemy:

“Don’t believe the hype!”

What we have to avoid is putting bad digital health solutions on the market and creating massively biased results for some people, usually minorities. Just like the facial recognition software that does not work for Black people (by the way, go check the amazing Coded Bias documentary on Netflix!), we have to be careful with the validation of AI algorithms.

Receiving an erroneous result for your disease because your accent differs from that of the people whose data were used to train the vocal biomarker's algorithm? Not acceptable!

Joy Buolamwini. Coded Bias (Netflix).

To make sure this does not happen, priority number one is to collect diverse data to train the models. For voice, this means collecting data from teenagers, adults, and elderly people, men and women, from all over the world, speaking different languages with different accents. The more diverse the data, the more inclusive the digital health solution will be. We also need to fight the replication crisis in science and make sure that published results are reproducible: making the datasets open and the code open source is the best way to achieve this. And we need to co-design the future digital health solutions that will embed these vocal biomarkers together with their users, the patients and the healthcare professionals. This will help us imagine and prevent most of the risky situations and favor an optimal, efficient, healthy use of the digital solution, with secure and privacy-preserving data storage and use.

Let us build the future of healthcare together!

With people from my lab, we have launched an international study to fill this gap in the field of voice for healthcare. The research study is named Colive Voice. We want to collect voice recordings from all over the world to create a unique international and multilingual vocal database, with clinical and medical data associated with each recording. Colive Voice serves as a screening platform for researchers to identify vocal biomarkers to track different health conditions such as cancer, diabetes, COVID-19, multiple sclerosis, and inflammatory bowel diseases. The identification of such vocal biomarkers will allow better diagnosis, prevention, and remote monitoring of diseases.

Participation is anonymous (no need to install an app, no need to register, and we do not store your name, email, etc.) and it only takes about 20 minutes. It is currently available in English, French, Spanish, and German, and many more languages will soon be added. We have even included five fun facts about voice to discover along the way.

We have an ambitious objective: we want to collect voice recordings from more than 50,000 volunteers from all over the world, so let us do it!

This is how you can help.

1. You can participate here from your smartphone’s browser.

2. You can help us more: talk to 5 of your friends and family members about Colive Voice and try to recruit them.

3. You can share this article or the link https://www.colivevoice.org/ on social media and spread the news.

And with this, soon, we will be able to Shazam our health!

With Colive Voice, we will:

  • Create a unique platform to train AI algorithms to identify vocal biomarkers, which will be used to monitor the health of millions of patients and ease their daily lives.
  • Publish open-source algorithms to be used by all (#OpenScience, yeah!).
  • Develop digital solutions for patients that integrate vocal biomarkers.
  • Create open datasets for data scientists and researchers to improve the performance of the vocal biomarkers.

Colive Voice can be accessed here: https://www.colivevoice.org/


Dr Guy Fagherazzi, PhD

Director of the Department of Population Health & Head of the Deep Digital Phenotyping Research Unit @ Luxembourg Institute of Health. #DigitalHealth #AI