Application of Speech Recognition Technology in Speech-Related Disabilities: An Analysis and Forecast

HMB431
Mar 23, 2016


Introduction

Speech recognition technology (SRT) is software widely used in most “smart” devices. The advent of Apple’s Siri created excitement around this space and introduced users to the technology at their fingertips. SRT picks up on utterances by the speaker and translates them into meaningful text (1). Siri, Cortana, and Google Now are all basic forms of this technology, but more impactful future applications can also be envisioned. Applications of SRT in healthcare have been explored; it is currently used as an interface or dictation tool. Research groups at the University of Toronto and at IBM are applying computing and statistical tools to develop SRT into a screening tool for speech- and language-related diseases such as dementia and Alzheimer’s disease. This paper will argue that SRT will have implications in screening and/or assisting in the diagnosis of patients with dementia or other speech-disabling diseases, but that it will face market-related challenges in the near future. Current research in artificial intelligence and statistics shows optimistic and promising applications of SRT in detecting speech changes in affected individuals. The market space is challenging due to high costs, but it poses low competition and upward growth. Research teams and startups are advancing current technologies for broader applications in healthcare. The public is not yet aware of the advancements made in speech recognition, but there is potential for positive public perception. Finally, the technological advances needed for products in this space to profit are forecast to occur in the long term rather than the near future.

Scope of Current SRT Research

First, current and ongoing research is laying a solid foundation for the application of speech recognition as a diagnostic tool. Speech recognition research can be classified into four categories: human-to-human dialogue, human-to-human monologue, human-to-machine dialogue, and human-to-machine monologue (1). While considerable resources have been focused on developing software that facilitates deliberate human-to-machine interactions, both one-way and two-way, recent advances are developing applications that passively extract information from utterances (1). One such example is using utterance interpretation to diagnose disorders such as dementia, or even cancers such as laryngeal cancer (1). A handful of neurological and intellectual disorders manifest in the voice (2). For example, research showed that patients with common dementia syndromes exhibit deficits in voice processing, with a different profile of voice deficiency in each disease (2). Research on extracting speech from those with speech-deteriorating diseases can be built upon to advance SRTs.

Artificial intelligence (AI) development is the factor facilitating the speech recognition research space and contributing to its applications in healthcare products. In the current field, hidden Markov models and Gaussian mixture models, derived from probability theory, are used by most SRTs. Recently, deep neural networks have been shown to provide better results than Gaussian models on speech recognition benchmarks with large vocabularies and datasets (3). Artificial neural networks, named after biological neurons, are a machine learning tool for exploring large, abstract data and for pattern recognition (4). Neural networks have driven advances in pattern recognition and machine learning, both of which allow an AI-based SRT to predict and facilitate speech recognition. A more recent alternative, convolutional neural networks, has been found to outperform standard deep neural networks on large-vocabulary continuous speech recognition tasks (5). The power of deep neural networks is supported by advances in machine learning algorithms and hardware upgrades (3). An IBM research group has taken up the healthcare challenge of detecting utterance changes. They developed a transcription server prototype that automatically transcribes speech content captured by inexpensive equipment. An automatic voice analysis is then performed by a machine learning tool to identify key features indicative of dementia, such as voice quality, continuity and fluency, and semantics, including vocabulary analysis (6). Research in these computing and statistical areas will allow machines to better identify the minor changes in speech patterns observed in the early onset of dementia or Parkinson’s disease, as well as in other speech-related diseases.
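To make the contrast concrete, the following toy sketch trains a minimal one-hidden-layer neural network with NumPy to separate two synthetic “phone-like” classes of acoustic feature vectors — the same kind of frame classification that GMM- and neural-network-based acoustic models perform at vastly larger scale. Everything here (the Gaussian toy data, the tiny network, the training settings) is an illustrative assumption, not any system described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "acoustic features": two phone-like classes drawn from
# different Gaussians (stand-ins for real spectral features such as MFCCs).
X = np.vstack([rng.normal(-1.0, 0.7, size=(200, 2)),
               rng.normal(+1.0, 0.7, size=(200, 2))])
y = np.array([0] * 200 + [1] * 200)

# One hidden layer with a sigmoid output: a minimal stand-in for the
# deep acoustic models discussed in the text.
W1 = rng.normal(0.0, 0.5, size=(2, 8)); b1 = np.zeros(8)
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)                  # hidden activations
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # P(class 1 | frame)
    return h, p.ravel()

lr = 0.5
for _ in range(500):
    h, p = forward(X)
    d_out = ((p - y) / len(X))[:, None]       # cross-entropy gradient
    grad_W2, grad_b2 = h.T @ d_out, d_out.sum(0)
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)     # backprop through tanh
    grad_W1, grad_b1 = X.T @ d_h, d_h.sum(0)
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

_, p = forward(X)
accuracy = float(((p > 0.5) == y).mean())
print(f"training accuracy: {accuracy:.2f}")
```

A real acoustic model replaces these two-dimensional toy vectors with dozens of spectral coefficients per frame and uses many wide layers, which is where the hardware and algorithmic advances cited in (3) become essential.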

Research in machine interpretation is also an important area for assisting patients with speech difficulties. It can be used as a personal aid to understand unconventional utterances from people with speech difficulties, including patients with stroke, intellectual disabilities, and dementia, whose messages may lack clear vocal and semantic qualities. In an interview, Dr. Frank Rudzicz, a computational linguistics professor at the University of Toronto studying artificial intelligence, expressed that using AI to interpret broken semantics is a very promising field: big data will be used to analyze dictated speech, pick up its key elements, and translate the message into phrases that another individual can understand. Perhaps the biggest breakthrough for SRT is its potential as a diagnostic tool, as current applications have focused on text substitution. In neurodegenerative diseases such as dementia and Parkinson’s disease, early detection is difficult when symptoms are not severe (7). Automated speech recognition can use the language deficits mentioned earlier as markers to enable early diagnosis when subtle speech alterations may be unnoticeable to physicians. A study Dr. Rudzicz was involved in achieved 81% accuracy in differentiating between healthy controls and Alzheimer’s disease patients based on participants’ short language recordings of picture descriptions (8). Two studies on Parkinson’s disease using voice recordings of emotional speech and repeated phonation obtained accuracies of 73.33% and 75.2%, respectively, in distinguishing between controls and patients (7,9). These studies illustrate the growing research effort devoted to implementing AI in healthcare. Overall, these technologies lay the foundation for potential products that can be applied in a clinical setting.
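A hypothetical sketch of the feature-based approach such studies take: extract simple lexical measures from a short transcribed language sample and apply a decision rule. The features below (type-token ratio, mean word length, pronoun rate) are loosely inspired by lexical markers reported in this literature, but the example texts, pronoun list, and cutoff are purely illustrative assumptions, not the cited studies’ actual methods.

```python
# Toy screening sketch from a short transcribed language sample.
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they", "this", "that"}

def lexical_features(text):
    """Compute simple lexical measures from a transcript."""
    words = text.lower().split()
    return {
        # Lexical diversity: distinct words over total words.
        "type_token_ratio": len(set(words)) / len(words),
        "mean_word_length": sum(len(w) for w in words) / len(words),
        # Heavy pronoun use can substitute for forgotten content words.
        "pronoun_rate": sum(w in PRONOUNS for w in words) / len(words),
    }

def screen(text, ttr_cutoff=0.6):
    # Reduced lexical diversity is one reported marker; a real system
    # would combine many such features in a trained classifier rather
    # than threshold a single one.
    return lexical_features(text)["type_token_ratio"] < ttr_cutoff

# Invented picture-description transcripts for illustration only.
rich = "the boy reaches for a cookie while the stool tips and water overflows the sink"
sparse = "the boy he he he gets the the the thing and it it goes over the the sink"

print(screen(rich), screen(sparse))  # only the sparse sample is flagged
```

In the actual studies, hundreds of acoustic and linguistic features feed a trained classifier evaluated against clinical diagnoses; this sketch only shows the shape of the pipeline.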

Market and Competition Analysis

As speech recognition products that detect early-stage dementia or speech pathologies have not yet been presented on the market, there are few competitors to name. Porter’s five forces model demonstrates that the market is still in its infancy, and the future of speech recognition as a tool to assist with the diagnosis of dementia remains to be seen. On the bright side, the overall speech recognition industry is expected to be worth $5.1 billion by 2024 (23). There are several reasons for the scarce competition in the healthcare-focused speech recognition market, with the high cost of R&D being a critical factor. Nuance, a company that offers speech recognition for medical dictation, invests $100 million in R&D annually (10). Expertise is also required from different disciplines such as computer science, engineering, and neuroscience. The high monetary and time costs of R&D pose a barrier to entry for companies wanting to enter the healthcare speech recognition market. Due to high capital expenditure, there is an entry deterrence price: new competitors may be deterred because initial profits will most likely be less than the cost of entering the market. Also, public acceptance of incorporating SRT to assist with or detect medical conditions is around 58% based on our survey (Appendix, Figure 5), as people are used to traditional diagnosis by a doctor in the hospital. Moreover, a technology that assists early-stage Alzheimer’s diagnosis offers a limited revenue stream at one diagnosis per patient, although it could be used as a complementary tool to monitor a patient’s health status. Regulatory issues also pose a barrier to entry: any product that assists in diagnosing or mitigating disease requires FDA approval, which can take years of research and clinical trials to obtain (11). With all these factors creating a high barrier to entry, the market will struggle in the near future to attract new companies, as the revenue stream is underdeveloped, leaving few notable competitors.

On the other hand, low competition in the large market for early-stage dementia diagnosis could attract new technology. According to the Alzheimer Society of Canada, the number of patients with Alzheimer’s disease may climb to 1.4 million by 2030, with a cost of $293 billion per year by 2040 (12). The availability of automatic speech recognition (ASR) will greatly decrease the cost per patient and, more importantly, assist with earlier diagnosis of Alzheimer’s disease, as ASR is easy, quick, and inexpensive (13). Although related ASR products have not yet reached the healthcare market, ongoing research has shown a promising future for speech recognition as an assistant in early dementia diagnosis. A four-year project run by IBM Research-Haifa in collaboration with Dem@Care is developing an integrated system of sensors to diagnose dementia and provide follow-up treatment by measuring a patient’s condition over time at home (6). The Watson system developed by IBM is capable of assessing a patient’s condition by detecting patterns in voice recordings. The data received from the system are used to gauge the effectiveness of the patient’s treatment, identify signs of dementia symptoms in speech, and support preventive-care decision making. For instance, the system analyzes the patient’s health status and feeds the results back to the doctor as a reference for better judgment on prescriptions and treatments (6,14). This ongoing research is a big step forward for speech recognition products aimed at early detection of dementia. With the number of dementia patients increasing significantly each year across the world, the market is poised to revive and expand as ASR technology is refined.

Current Trends in Voice Recognition Technology

Speech recognition technology has been implemented as a personal assistant in various healthcare settings by startups and researchers, which indicates optimism in the market and public acceptance. Bringing healthcare services to the public through intelligent personal assistants on mobile devices with voice user interfaces has been the focus of many startup companies, allowing increased accessibility, patient support, and cost-effectiveness for providers. Sense.ly, launched in 2014, is a virtual clinical platform that allows customized patient care and disease monitoring between hospital visits and after discharge (15). The virtual nurse, Molly, keeps track of patients’ wellness through speech recognition questionnaires, clinical measurements, and vitals, as well as reminders for medication intake and appointments. Clinicians use the data collected to adjust personalized care, follow up through videos, or assess risks of readmission, if necessary. Another example is UK-based Babylon Health, whose AI-driven app, launched earlier this month, allows one to voice-record medical questions and promptly receive responses from health professionals (16). It will also suggest possible courses of action based on the symptoms described, matching them against its database as well as personal medical history (17). Combined with additional features to video-consult with doctors and have prescriptions delivered to nearby pharmacies, the startup has attracted notable investors such as the founders of Google DeepMind (16,17). Sense.ly and Babylon demonstrate the applicability and benefits of speech recognition in the healthcare context, driving continuous growth and implementation, as well as increased public acceptance of the technology in the healthcare system.

The emergence of personal assistive tools using speech recognition to address the needs of persons with disabilities also shows potential for expanding and diversifying the market, in addition to raising public acceptance among targeted users. Transcense aims to help the 360 million hearing-impaired individuals around the world overcome communication barriers by developing the first audio visual aid, Ava (18). Like the companies mentioned previously, this startup uses speech recognition on mobile devices, allowing greater accessibility and visualizing a conversation for hard-of-hearing people. Effective communication between participants can be achieved, especially in a large group where it is easy to lose track of a conversation. An alternative application of speech recognition in healthcare is accommodating special needs and providing long-term care at home for the elderly via a personal assistive robot. The research team led by Dr. Frank Rudzicz at the University of Toronto is developing a mobile robot, “ED,” to help patients with Alzheimer’s disease in daily activities through speech recognition and visual prompts (19). An experiment evaluating speech-based communication between human and robot revealed that the technical aspects of speech recognition are a major limitation, owing to language differences in elders as well as speech and communication impairments in dementia patients. Although the experiment indicated that the mobile robot still requires further research and improvement, it has the potential to bring about advancements in both the technology and the healthcare system. Thus, the development of diversified speech recognition applications that assist targeted populations with their needs, along with the existing delivery of healthcare services to the general public through speech recognition-based personal assistants on mobile devices, reveals an optimistic, upward trend in the market.

Public and Consumer Perception

Market growth is hindered by the public perception of SRT and its implications in healthcare. Consumer concerns revolve around accuracy and trust, with a preference for face-to-face interactions in healthcare settings. Public perception of SRT is focused on accuracy rather than its practical current and potential uses, probably due to a lack of knowledge of the existing and developing technology. Based on a non-randomized survey of 100 individuals (largely university students), 89% have access to SRT such as Siri, Cortana, and Google Now (Appendix, Figure 1). Of these 89 individuals, 42 reported never using SRT on their devices (Appendix, Figure 2). This defines a large, untapped population that could be targeted for generating positivity and acceptance. Those using built-in voice features on their devices did so for convenience. When asked whether they would use a speech-assisted medical application, 57% of individuals said yes and 43% said no. The main reason for using the application was to cut costs and save time, and the main reason for not using it was a preference for a medical practitioner (Appendix, Figures 5, 6). These trends indicate that consumers are willing to use applications that provide medical advice or instructions, but may not yet trust the accuracy of the information or be comfortable with the interaction. Public misperceptions surrounding SRT exist and need to be addressed as these technologies develop.

Consumer and public perception of SRTs will become more receptive once more beneficial speech-related applications come to market. There is an increasing need to provide patients and families with the best care available in order to reduce the time the elderly spend in hospitals and nursing homes. The intent here is to use SRT to ease care for the elderly and allow for more interactive healthcare, decreasing the burden on clinics and hospitals. It also helps doctors make informed decisions and improves quality of life for the elderly, their families, and society (6). In one study, 61% of physicians who had used SRT indicated that it enhanced record keeping, and 51% reported that it would save time, benefiting patients and leading to more efficient care, thereby contributing to ease of use (20). Another study concluded that the main highlights of assistive SRT are cost-effectiveness and efficacy; it also reported that SRT has high potential for smartphone compatibility and differentiation, with accessibility to such technology being the only concern (21). The ability to reduce costs and increase efficiency would be an incentive for increased use of speech-assisted technology. Dr. Munteanu, a computer science professor at the University of Toronto, hopes to see widespread social acceptance of speech-related technologies in the upcoming years, as well as social movements in robotics that will make robots easier to talk to. Therefore, SRT has numerous applications, and marketing and enhancing the accessibility of such technology can actively increase the number of users and ultimately have positive implications for the market.

Future Implications and Forecast

In the future, SRT will be an attractive market venture once research on its application in healthcare is solidified. The technology and research behind diagnostic applications for these diseases in clinical settings could well reach the market over the next two years. Until now, SRT in the healthcare industry has required physicians to learn how to “talk” to the computer so that the system could understand them, compelling doctors to change their way of speaking and slowly adapt to the system (22). However, given rapid technological advances in SRT systems, one can forecast that future speech recognition systems will contain extensive built-in vocabularies and may be programmed to recognize the technological and scientific terminology of the medical profession. Through advances in pattern recognition and statistics, SRT will soon be able to learn by “listening” to the physician. Overcoming these technical challenges through intense research will help move the field forward.

One can also forecast various SRT applications in healthcare settings. Now that continuous speech dictation has become common in SRT, multimodal error correction methods have progressed as a way to repair recognition errors. According to Dr. Munteanu, multimodal error correction can become extremely useful in SRT as a way to compensate for errors: current dictation technology on the market lacks a way for the user to go back and “edit” an error, and research in error correction will improve the usability of SRT. Another potential application in healthcare facilities is SRT that complements the physician-patient interaction. Dr. Munteanu envisions a technology that would recognize speech in the background of a conversation and interrupt if a medical error were detected. Of course, this type of technology is not yet feasible, as much further research in pattern recognition and statistics must be conducted. Dr. Munteanu does not see SRT as revolutionary for dictating records in the future. In terms of medical dictation, he says it will only save time and may be “helpful in clinical settings because it’s a more sanitary and touch-free way of doing things.” Such applications may aim to ease life for the user by conveniently following computing commands or dictating as the user speaks. Overall, the future of speech recognition in healthcare looks optimistic, provided SRT addresses accessibility, diagnosis, and improving doctor-patient interactions.

In the near future, developments in SRT are likely to help overcome the challenges faced by individuals with dementia and Alzheimer’s disease. Even though the assistive robot and the speech recognition tools targeting the rapidly aging population are still in the early stages of development, along with Ava, which addresses communication needs, the technology has potential implications for people with dementia or other speech-disabling diseases once the research matures enough to go to market. With the rise of cloud-based processing of big data, and with machine interpretation of human utterance maturing, one can predict that within two to five years, voice recognition technology for diagnosing voice-related diseases will mature and reach the market, like IBM’s technology mentioned previously. It may also be integrated into devices in doctors’ offices, running in the background to record human utterances and issuing a warning upon detecting early signs of voice pathologies. Diagnosing language-related pathologies with this technology will leave the job to machines, freeing time that would otherwise be spent treating patients or monitoring them to discriminate disease states. Overall, research in dementia-related applications of SRT will develop rapidly.

Concluding Remarks

In conclusion, SRT used for screening and diagnosing patients with dementia and other speech-related diseases is showing promise, but the market is not yet welcoming to these technologies. Research in artificial intelligence, namely machine learning and dementia speech recognition, has produced strong, applicable results, but the market is currently challenging to enter due to high costs, despite low competition. Commercialized technologies and startups are applying SRT in practical ways, indicating optimism and traction. Some public and user misperceptions remain about the promise and accuracy of such technology, but there is willingness to accept it. In the near future, more research-based advances will be observed in this field, and dementia-related applications will slowly become visible in the market. All considered, the dementia-focused speech recognition market is expected to grow in the upcoming years.

Authors: Bian Xiaobo, Kanika Singhal, Sabrina Shilun Fang, Shankar Shyam Krishna, Vaidehi Patel, Wen-Chien (Jenny) Hsiao

Supervised by Dr. Jayson Parker, Human Biology, University of Toronto

References:

1. Chou W, Juang B. Pattern recognition in speech and language processing. Boca Raton: CRC Press; 2003.

2. Hailstone J, Crutch S, Warren J. Voice Recognition in Dementia. Behavioural Neurology 2010;23(4):163–164.

3. Hinton G, Deng L, Yu D et al. Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups. IEEE Signal Process Mag 2012;29(6):82–97.

4. Lytton W. Computational Neuroscience. Encyclopedia of the Neurological Sciences. 2014;:844–847.

5. Sainath T, Kingsbury B, Saon G et al. Deep Convolutional Neural Networks for Large-scale Speech Tasks. Neural Networks 2015;64:39–48.

6. IBM Research: A new kind of dementia treatment [Internet]. Research.ibm.com. 2016 [cited 2016 Mar 22];Available from: http://www.research.ibm.com/articles/dementia-treatment-diagnosis.shtml

7. Naranjo L, Pérez C, Campos-Roca Y, Martín J. Addressing voice recording replications for Parkinson’s disease detection. Expert Systems with Applications 2016;46:286–292.

8. Fraser K, Meltzer J, Rudzicz F. Linguistic Features Identify Alzheimer’s Disease in Narrative Speech. Journal of Alzheimer’s Disease 2015;49(2):407–422.

9. Zhao S, Rudzicz F, Carvalho L, Marquez-Chin C, Livingstone S. Automatic Detection of Expressed Emotion in Parkinson’s Disease. 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (Icassp) 2014;

10. Speech recognition for healthcare | Nuance [Internet]. Nuance.com. 2016 [cited 2016 Mar 22];Available from: http://www.nuance.com/for-healthcare/by-solutions/speech-recognition/index.htm

11. Is The Product A Medical Device? [Internet]. Fda.gov. 2016 [cited 2016 Mar 22];Available from: http://www.fda.gov/MedicalDevices/DeviceRegulationandGuidance/Overview/ClassifyYourDevice/ucm051512.htm

12. Dementia numbers in Canada | Alzheimer Society of Canada [Internet]. Alzheimer.ca. 2016 [cited 2016 Mar 22];Available from: http://www.alzheimer.ca/en/About-dementia/What-is-dementia/Dementia-numbers

13. Laske C, Sohrabi H, Frost S et al. Innovative diagnostic tools for early detection of Alzheimer’s disease. Alzheimer’s & Dementia 2015;11(5):561–578.

14. Satt A, Sorin A, Toledo-Ronen O, Barkan O, Kompatsiaris I, Kokonozi A. Evaluation of speech-based protocol for detection of early-stage dementia. In: Interspeech. 2013.

15. Sense.ly | Virtual visits, real care. [Internet]. Sense.ly. 2016 [cited 2016 Mar 22];Available from: http://sense.ly/

16. babylon health [Internet]. babylon. 2016 [cited 2016 Mar 22];Available from: http://www.babylonhealth.com/

17. The Artificially Intelligent Doctor Will Hear You Now [Internet]. MIT Technology Review. 2016 [cited 2016 Mar 22];Available from: https://www.technologyreview.com/s/600868/the-artificially-intelligent-doctor-will-hear-you-now/#/set/id/601000/

18. Ava — Communicate beyond barriers [Internet]. Ava.me. 2016 [cited 2016 Mar 22];Available from: http://www.ava.me/

19. Rudzicz F, Wang R, Begum M, Mihailidis A. Speech Interaction with Personal Assistive Robots Supporting Aging at Home for Individuals with Alzheimer’s Disease. ACM Trans Access Comput 2015;7(2):1–22.

20. Lyons J, Sanders S, Fredrick Cesene D, Palmer C, Mihalik V, Weigel T. Speech recognition acceptance by physicians: A temporal replication of a survey of expectations and experiences. Health Informatics Journal 2015;

21. McDonald R, Thomacos N, Inglis K. Review of current and emerging assistive technologies for the reduction of care attendant hours: cost effectiveness, decision making tools and emerging practices. Melbourne: Monash University: 2013.

22. Parente R, Kock N, Sonsini J. An Analysis of the Implementation and Impact of Speech-Recognition Technology in the Healthcare Sector. Perspect Health Inf Manag 2004;1:5.

23. Speech and Voice Recognition Market To Hit $5.1 Billion — FindBiometrics [Internet]. FindBiometrics. 2015 [cited 2016 Mar 23]; Available from: http://findbiometrics.com/speech-voice-recognition-market-to-hit-5-1-billion-26111/

24. Nuance Announces Fiscal 2013 and Fourth Quarter Results [Internet]. MarketWatch. [cited 2015 May 20].

25. Earnings Release FY15 Q4 [Internet]. Microsoft. 2015 [cited 2015 Jul 24].

26. Apple Reports Record Fourth Quarter Results [Internet]. Apple Inc. 2015 [cited 2015 Oct 27].

27. Google: global annual revenue 2015 [Internet]. Statista. 2016 [cited 2016 Mar 2]; Available from: http://www.statista.com/statistics/266206/googles-annual-global-revenue/

All images used in this article provide source information as needed. Attribution may not be required for all images. Survey graphs taken from surveymonkey.com as per their terms of use.

Appendix:

Figure 1: The majority of individuals (89%) have access to some speech recognition technology.
Figure 2: The majority of surveyed individuals (45.65%) never use SRT on their devices.
Figure 3: The majority of surveyed individuals who did use SRT (34.52%) used it on the go.
Figure 4: The majority of surveyed individuals who did not use the technology stated that they did not need it (63.64%).
Figure 5: 58% of surveyed individuals said they would be open to using a speech-assisted medical application.
