Big Data and Artificial Intelligence in Mental Health

Amy Yang
Trends in Data Science
11 min read · May 17, 2021

Introduction

The rapid growth of big data and artificial intelligence (AI) is creating great potential for the mental health industry by making it possible to collect, analyse and interpret digital data about mental health. Big data techniques capture healthcare data from various sources, including electronic health records, wearable tracking devices and genetic testing (Beam & Kohane, 2018). Once the data from these diverse sources are prepared and organised, AI is used to recognise patterns in the datasets, generalise from them and predict future outcomes (Kamran Ul haq et al., 2020). As summarised by Shatte et al. (2019, p.1435), the application of big data and AI (especially machine learning) will advance mental health in four key domains: “(i) detection and diagnosis of mental health conditions; (ii) prognosis, treatment and support; (iii) public health; and (iv) research and clinical administration”.

Challenges in Mental Health

Mental health problems are common worldwide. According to the World Health Organization (2001), one in four people will experience a mental health condition at some point in their lives. According to the Global Burden of Disease Study 2017, around 264 million people worldwide are affected by depression and around 284 million by anxiety (GBD 2017 Disease and Injury Incidence and Prevalence Collaborators, 2018). This prevalence calls for more attention to the main challenges in developing mental health services and to innovative solutions for tackling them.

One of the key challenges in mental health is the detection of mental illnesses or disorders (Saravia et al., 2016). This challenge is reflected in the detection of depressive disorder, one of the main mental illnesses causing disability across the globe (World Health Organization, 2019). According to Guntuku et al. (2017), only around 50% of patients with depression are detected by primary care practitioners. The same challenge exists in the detection of suicide risk: as pointed out by Roy et al. (2020), roughly 60%-70% of patients at risk of suicide may not be identified by primary care physicians. The low detection of mental illness is most likely related to the existing stigma in society. As indicated by Heim et al. (2018, p.2), “Mental health-related stigma is a key barrier to mental health care”. People with mental health conditions are frequently affected by social bias and discrimination (Martínez & Farhan, 2019), and this stigma may inhibit them from disclosing their authentic feelings and thoughts to primary care practitioners. Several innovative approaches for improving the detection of mental illnesses have been discussed in the research literature (Shatte et al., 2019). These approaches are discussed in a later section.

Diagnosis of mental illness is another key challenge in mental health. As suggested by Martínez and Farhan (2019), symptoms of mental health disorders may not be straightforward and may overlap among different diagnoses. This is commonly seen in the diagnosis of unusual and rare mental disorders. Bellak (1985) believed that around 10% of all cases of schizophrenia (and also affective psychosis) should be diagnosed under the little-known category “attention-deficit disorder psychosis (ADD psychosis)”, which has some unique features that differ from those of the schizophrenic syndrome. To improve diagnostic accuracy for mental health disorders, a wide range of observable cues relating to cognition, behaviour, sociality and biological patterns need to be measured reliably (Liang et al., 2019). However, no reliable data on such observables are currently available for diagnostic assessment, and the diagnosis of mental illness is still largely based on clinical interviews and self-reports (Liang et al., 2019). Self-reports themselves may not be reliable enough for diagnostic assessment: as discovered by Atkinson et al. (1997), self-report ratings may be biased by affective state, poor awareness and recent life events. To provide valuable clinical evidence for the diagnosis of mental health conditions, Liang et al. (2019) proposed several data sources for mental health, including smartphones and wearable devices, social media, electronic health records, medication records, and a great number of unstructured records.

Improving Detection and Diagnosis of Mental Health Problems with Big Data and Artificial Intelligence

Emerging big data analytics and AI present opportunities to overcome the barriers to detecting and diagnosing mental health problems. Great potential has been identified in the following mental health data sources when big data analytics and AI are applied to them.

Social Media

Self-disclosure on social media has become increasingly common. As pointed out by Balani and De Choudhury (2015), high self-disclosure was found in posts shared on different mental health forums on Reddit. These self-disclosed data may help to increase our understanding of users’ behaviours and emotions, and to detect mental health problems in users who may not be willing to disclose their true feelings to physicians.

Natural language processing (NLP) and machine learning (ML) make it possible to detect mental health disorders from social media data. As suggested by Ismail et al. (2019), NLP techniques enable several features to be derived from written text on social media, including personality, demographic data and mental status. Sentiment analysis is one of the popular tools for extracting features from social media and is applied to understand a user’s emotional state (Wongkoblap et al., 2017). Other approaches to feature extraction include N-grams and Linguistic Inquiry and Word Count (LIWC) (Ismail et al., 2019). After being extracted and selected, the features are fed to ML algorithms to build a predictive model, which is then used to detect mental health problems via social media (Wongkoblap et al., 2017). Recent studies have explored this area. Reece and Danforth (2017) applied ML tools to detect depression in users from their Instagram photos. Mitchell et al. (2015) used NLP and ML techniques to discover significant linguistic signals that identify schizophrenia sufferers on Twitter.
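To make the feature extraction step more concrete, the sketch below derives two simple kinds of features from a single post: a lexicon-based sentiment score and bigram (2-gram) counts. It is only an illustration under stated assumptions. The word lists, the function names and the example post are hypothetical, and the studies cited above rely on validated resources such as LIWC and on trained models rather than on a hand-written lexicon.

```python
from collections import Counter
import re

# Tiny illustrative lexicons (assumption); real studies use validated resources.
NEGATIVE_WORDS = {"sad", "hopeless", "tired", "alone", "worthless"}
POSITIVE_WORDS = {"happy", "hopeful", "calm", "grateful", "excited"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) found in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(post):
    """Build a small feature dictionary for one social media post."""
    tokens = tokenize(post)
    negative = sum(token in NEGATIVE_WORDS for token in tokens)
    positive = sum(token in POSITIVE_WORDS for token in tokens)
    return {
        "n_tokens": len(tokens),
        "sentiment_score": positive - negative,  # below zero means more negative words
        "bigram_counts": Counter(ngrams(tokens, 2)),
    }


features = extract_features("I feel so tired and alone, so tired of everything")
print(features["sentiment_score"])                  # -3
print(features["bigram_counts"][("so", "tired")])   # 2
```

In a full pipeline, feature dictionaries like this one would be computed for many labelled posts and passed to an ML classifier. That training step is omitted here.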

Smartphones and Wearables

Smartphones and wearable devices are now widely used to monitor individuals’ physical activity and health, including step count, heart rate and sleeping hours. With advances in big data and ML techniques, these data can be collected and analysed to reduce the missed or inaccurate detection of mental illness that results when patients do not share information with clinical practitioners. This opportunity is illustrated by the research of Gjoreski et al. (2017), in which 70% of stress events were detected with a precision of 95% from data collected by a wrist device, including acceleration, blood volume pulse, electrodermal activity, heart rate, inter-beat interval and skin temperature. In this study, a context-based stress detector was developed using ML techniques.
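The two figures reported for the stress detector correspond to the standard metrics of recall (the share of real stress events that were detected) and precision (the share of detections that were real stress events). The sketch below shows how both are computed from counts. The counts themselves are hypothetical and were chosen only so that the results match the reported percentages; they are not taken from Gjoreski et al. (2017).

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Compute precision and recall from detection counts."""
    detected = true_positives + false_positives   # everything the detector flagged
    actual = true_positives + false_negatives     # every real event
    precision = true_positives / detected if detected else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts: 190 real stress events, 140 detections, 133 of them correct.
precision, recall = precision_recall(
    true_positives=133, false_positives=7, false_negatives=57
)
print(f"precision={precision:.2f}, recall={recall:.2f}")  # precision=0.95, recall=0.70
```

A detector with high precision but moderate recall, as here, rarely raises false alarms but still misses a share of real events.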

Opportunities are also being identified in the diagnosis of mental disorders using big data and AI. Specific features of mental illness can be recognised with ML techniques from the data collected by smartphones and wearables, and these features may act as supportive clinical evidence to increase diagnostic accuracy. As indicated by Faurholt-Jepsen et al. (2019), ‘objective smartphone data may represent a potential diagnostic behavioural marker in bipolar disorder and may be a candidate supplementary method to the diagnostic process in the future.’ In their study, ML techniques were applied to calculate the classification accuracy of objective smartphone data, including the number of calls and text messages per day, the duration of phone calls per day and the amount of time the smartphone screen was ‘on’ per day.
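Before any classifier can be trained, raw smartphone events have to be summarised into per-day features of the kind listed above. The following sketch shows one possible way to do that aggregation. The event log format, the event type names and the sample values are assumptions made for illustration and do not describe the data pipeline used by Faurholt-Jepsen et al. (2019).

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: (ISO timestamp, event type, duration in seconds).
EVENTS = [
    ("2021-05-01T09:15:00", "call", 120),
    ("2021-05-01T10:02:00", "text", 0),
    ("2021-05-01T21:30:00", "screen_on", 900),
    ("2021-05-02T08:45:00", "call", 60),
    ("2021-05-02T08:50:00", "call", 30),
    ("2021-05-02T23:10:00", "screen_on", 1800),
]


def daily_features(events):
    """Aggregate raw events into per-day counts and durations."""
    days = defaultdict(
        lambda: {"calls": 0, "texts": 0, "call_seconds": 0, "screen_on_seconds": 0}
    )
    for timestamp, kind, seconds in events:
        day = datetime.fromisoformat(timestamp).date().isoformat()
        row = days[day]
        if kind == "call":
            row["calls"] += 1
            row["call_seconds"] += seconds
        elif kind == "text":
            row["texts"] += 1
        elif kind == "screen_on":
            row["screen_on_seconds"] += seconds
    return dict(days)


for day, row in sorted(daily_features(EVENTS).items()):
    print(day, row)
```

Each resulting row is one observation (one person-day) that could be fed to a classifier alongside a clinical label.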

Electronic Health Records (EHRs) and Other Medical Records

EHRs are considered an important data source in healthcare systems (Liang et al., 2019). NLP techniques have been employed on EHRs to detect mental health problems. Downs et al. (2017) proposed an NLP approach to detect suicidality in the EHRs of adolescents with autism spectrum disorders, in which NLP tools were used to extract documents with suicidality-related (SR) mentions and to classify documents and patients as SR positive or SR negative. Other records, such as clinical notes and transcripts of counselling sessions, consist of unstructured text and therefore also rely heavily on NLP for their analysis (Graham et al., 2019). Analysis of these records (including EHRs and clinical notes) will allow for a better understanding of the symptoms of mental health problems, which may help to overcome the challenge of diagnosing mental illnesses with overlapping features. In the project carried out by Jackson et al. (2017), fifty symptoms of severe mental illness (schizophrenia, schizoaffective disorder and bipolar disorder) were identified from clinical EHR text with NLP techniques.
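As a rough illustration of what classifying a document as SR positive or SR negative can involve, the sketch below uses a keyword list together with a simple negation check. This is a deliberately simplified, rule-based stand-in and not the approach developed by Downs et al. (2017). The term list, the negation cues and the three-word window are all assumptions chosen for the example.

```python
import re

# Illustrative vocabulary (assumption); a clinical system would need a curated list.
SR_TERMS = ["suicidal", "suicide", "self-harm", "overdose"]
NEGATION_CUES = {"no", "not", "denies", "denied", "without"}

SR_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(term) for term in SR_TERMS) + r")\b",
    re.IGNORECASE,
)


def classify_note(note, window=3):
    """Return 'SR positive' if the note has at least one non-negated SR mention."""
    for sentence in re.split(r"[.!?]+", note):
        for match in SR_PATTERN.finditer(sentence):
            # Look at the few words immediately before the mention for a negation cue.
            before = re.findall(r"[a-z'-]+", sentence[:match.start()].lower())
            if not any(word in NEGATION_CUES for word in before[-window:]):
                return "SR positive"
    return "SR negative"


print(classify_note("Patient denies suicidal ideation. Sleeping well."))  # SR negative
print(classify_note("Reports suicidal thoughts since last week."))        # SR positive
```

A fixed window like this misses negations that sit further from the mention, which is one reason real clinical NLP systems use richer context handling and are validated against manually annotated records.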

Potential Issues Ahead

As discussed in the previous section, the application of big data analytics and AI has created great opportunities to improve the detection and diagnosis of mental health problems. At the same time, several potential issues inherent in realising these opportunities also deserve attention.

The quality of data collected from diverse sources could be one of the main concerns. First, the data could lack reliability. For example, data from social media are likely to include accounts that are duplicate, malicious, or otherwise ‘fake’ (Monteith et al., 2015). Clinical notes from patients’ interviews could sometimes be unreliable too. As explained by Halford (2020), because of the existing stigma around mental health problems, patients may be concerned about how data related to their mental health are used and may not be honest with their therapists. Second, the datasets may not be large enough to realise the full potential of ML. As Kamran Ul haq et al. (2020) pointed out, ML models predict outcomes more accurately when they are trained on more input data with distinct features. However, as mentioned earlier, some mental disorders are unusual and rare, so the data available for them could be sparse and may lead to inaccurate results. Third, fragmentation of data could be challenging. As indicated by Hidalgo‐Mazzei et al. (2016), the fragmented market of smartphones and wearable devices is problematic and unaddressed. The smartphone market includes various mobile operating systems (such as Android, iOS, Windows Phone and Blackberry), and wearable devices are provided by more than 20 different companies.

Ethical issues emerging from big data analytics and AI should also be considered. Because mental health data include a lot of sensitive information (such as depression, suicide attempts and previous history of abortions), software companies that collect these personal data should guarantee that the data are stored safely and protected from cybersecurity risks. Companies and institutions should also establish clear policies to determine who is authorised to access these data (Passos et al., 2019). As suggested by Passos et al. (2019, p.163), ‘physical and remote access to stored data may also give an individual opportunity of duplicating a data set and releasing this information’. In addition, to protect patients’ privacy, mental health data used in health research should be anonymized (Russ et al., 2019).
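One common first step towards protecting identities in a research dataset is to drop direct identifiers and replace the patient identifier with a keyed hash. The sketch below illustrates this with Python's standard `hmac` and `hashlib` modules. The field names, the record and the key are hypothetical. Note that this is pseudonymization rather than full anonymization: whoever holds the key can re-link records, and the remaining fields may still identify a person when combined.

```python
import hashlib
import hmac

# Fields treated as direct identifiers in this example (assumption).
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymize(record, secret_key, id_field="patient_id"):
    """Drop direct identifiers and replace the ID with a keyed SHA-256 digest."""
    cleaned = {
        key: value for key, value in record.items() if key not in DIRECT_IDENTIFIERS
    }
    digest = hmac.new(
        secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
    ).hexdigest()
    cleaned[id_field] = digest
    return cleaned


# Hypothetical record and key; a real key would be generated and stored securely.
record = {
    "patient_id": 1042,
    "name": "Jane Doe",
    "email": "jane@example.com",
    "diagnosis": "depression",
}
safe_record = pseudonymize(record, secret_key=b"replace-with-a-secret-key")
print(sorted(safe_record))  # ['diagnosis', 'patient_id']
```

Because the same key always maps the same patient to the same digest, records for one person can still be linked across tables without exposing the original identifier.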

Conclusion

In this paper, we discussed two main challenges in mental health, the detection and the diagnosis of mental health problems, and described the great opportunities that big data analytics and AI present for overcoming them. These innovative approaches come with concerns about potential issues in both data quality and ethics. This paper is limited in scope, and further research is required to substantiate the benefits of these opportunities.

References

Atkinson, M., Zibin, S., & Chuang, H. (1997). Characterizing quality of life among patients with chronic mental illness: a critical examination of the self-report methodology. The American Journal of Psychiatry, 154(1), 99–105. https://doi.org/10.1176/ajp.154.1.99

Balani, S., & De Choudhury, M. (2015). Detecting and Characterizing Mental Health Related Self-Disclosure in Social Media. Proceedings of the 33rd Annual ACM Conference Extended Abstracts on Human Factors in Computing Systems, 1373–1378. https://doi.org/10.1145/2702613.2732733

Beam, A., & Kohane, I. (2018). Big Data and Machine Learning in Health Care. JAMA: The Journal of the American Medical Association, 319(13), 1317–1318. https://doi.org/10.1001/jama.2017.18391

Bellak, L. (1985). ADD Psychosis as a Separate Entity. Schizophrenia Bulletin, 11(4), 523–527. https://doi.org/10.1093/schbul/11.4.523

Martínez, C., & Farhan, I. (2019). Making the right choices: Using data-driven technology to transform mental healthcare. Reform. https://reform.uk/research/making-right-choices-using-data-driven-technology-transform-mental-healthcare

Downs, J., Velupillai, S., George, G., Holden, R., Kikoler, M., Dean, H., Fernandes, A., & Dutta, R. (2017). Detection of Suicidality in Adolescents with Autism Spectrum Disorders: Developing a Natural Language Processing Approach for Use in Electronic Health Records. In Advances in Printing and Media Technology (Vol. 2017, p. 641–)

Faurholt-Jepsen, M., Busk, J., Þórarinsdóttir, H., Frost, M., Bardram, J., Vinberg, M., & Kessing, L. (2019). Objective smartphone data as a potential diagnostic marker of bipolar disorder. Australian and New Zealand Journal of Psychiatry, 53(2), 119–128. https://doi.org/10.1177/0004867418808900

GBD 2017 Disease and Injury Incidence and Prevalence Collaborators. (2018). Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990–2017: a systematic analysis for the Global Burden of Disease Study 2017. The Lancet, 392(10159), 1789–1858. https://doi.org/10.1016/S0140-6736(18)32279-7

Graham, S., Depp, C., Lee, E., Nebeker, C., Tu, X., Kim, H., & Jeste, D. (2019). Artificial Intelligence for Mental Health and Mental Illnesses: an Overview. Current Psychiatry Reports, 21(11), 1–18. https://doi.org/10.1007/s11920-019-1094-0

Gjoreski, M., Luštrek, M., Gams, M., & Gjoreski, H. (2017). Monitoring stress with a wrist device using context. Journal of Biomedical Informatics, 73, 159–170. https://doi.org/10.1016/j.jbi.2017.08.006

Guntuku, S., Yaden, D., Kern, M., Ungar, L., & Eichstaedt, J. (2017). Detecting depression and mental illness on social media: an integrative review. Current Opinion in Behavioral Sciences, 18, 43–49. https://doi.org/10.1016/j.cobeha.2017.07.005

Halford, E. A. (2020, November 14). Data science in mental health. Medium. https://towardsdatascience.com/data-science-in-mental-health-ccd09ba2148a

Heim, E., Kohrt, B., Koschorke, M., Milenova, M., & Thornicroft, G. (2018). Reducing mental health-related stigma in primary health care settings in low- and middle-income countries: a systematic review. Epidemiology and Psychiatric Sciences, 29, e3–e3. https://doi.org/10.1017/S2045796018000458

Hidalgo‐Mazzei, D., Murru, A., Reinares, M., Vieta, E., & Colom, F. (2016). Big Data in mental health: a challenging fragmented future. World Psychiatry, 15(2), 186–187. https://doi.org/10.1002/wps.20307

Ismail, N., Du, M., & Hu, X. (2019). Social Media and Psychological Disorder. In Social Web and Health Research (pp. 171–192). Springer International Publishing. https://doi.org/10.1007/978-3-030-14714-3_9

Jackson, R., Patel, R., Jayatilleke, N., Kolliakou, A., Ball, M., Gorrell, G., Roberts, A., Dobson, R., & Stewart, R. (2017). Natural language processing to extract symptoms of severe mental illness from clinical text: the Clinical Record Interactive Search Comprehensive Data Extraction (CRIS-CODE) project. BMJ Open, 7(1), e012012–e012012. https://doi.org/10.1136/bmjopen-2016-012012

Kamran Ul haq, A., Khattak, A., Jamil, N., Naeem, M., & Mirza, F. (2020). Data Analytics in Mental Healthcare. Scientific Programming, 2020, 1–9. https://doi.org/10.1155/2020/2024160

Liang, Y., Zheng, X., & Zeng, D. (2019). A survey on big data-driven digital phenotyping of mental health. Information Fusion, 52, 290–307. https://doi.org/10.1016/j.inffus.2019.04.001

World Health Organization. (2010). Mental health and development: Targeting people with mental health conditions as a vulnerable group. World Health Organization.

Mitchell, M., Hollingshead, K., & Coppersmith, G. (2015). Quantifying the language of schizophrenia in social media. Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 11–20.

Monteith, S., Glenn, T., Geddes, J., & Bauer, M. (2015). Big data are coming to psychiatry: a general introduction. International Journal of Bipolar Disorders, 3(1), 1–11. https://doi.org/10.1186/s40345-015-0038-9

Passos, I., Mwangi, B., & Kapczinski, F. (2019). Personalized Psychiatry: Big Data Analytics in Mental Health. Springer International Publishing AG.

Reece, A., & Danforth, C. (2017). Instagram photos reveal predictive markers of depression. EPJ Data Science, 6(1), 1–12. https://doi.org/10.1140/epjds/s13688-017-0110-z

Roy, A., Nikolitch, K., McGinn, R., Jinah, S., Klement, W., & Kaminsky, Z. (2020). A machine learning approach predicts future risk to suicidal ideation from social media data. NPJ Digital Medicine, 3(1), 78–78. https://doi.org/10.1038/s41746-020-0287-6

Russ, T., Woelbert, E., Davis, K., Hafferty, J., Ibrahim, Z., Inkster, B., John, A., Lee, W., Maxwell, M., McIntosh, A., & Stewart, R. (2019). How data science can advance mental health research. Nature Human Behaviour, 3(1), 24–32. https://doi.org/10.1038/s41562-018-0470-9

Saravia, E., Chang, C., De Lorenzo, R., & Chen, Y. (2016). MIDAS: mental illness detection and analysis via social media. Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 1418–1421. https://doi.org/10.1109/ASONAM.2016.7752434

Shatte, A., Hutchinson, D., & Teague, S. (2019). Machine learning in mental health: a scoping review of methods and applications. Psychological Medicine, 49(9), 1426–1448. https://doi.org/10.1017/S0033291719000151

Stewart, R., & Davis, K. (2016). “Big data” in mental health research: current status and emerging possibilities. Social Psychiatry and Psychiatric Epidemiology, 51(8), 1055–1072. https://doi.org/10.1007/s00127-016-1266-8

Wongkoblap, A., Vadillo, M., & Curcin, V. (2017). Researching Mental Health Disorders in the Era of Social Media: Systematic Review. Journal of Medical Internet Research, 19(6), e228–e228. https://doi.org/10.2196/jmir.7215

World Health Organization. (2001, September 28). The world health report 2001: Mental disorders affect one in four people. WHO | World Health Organization. https://www.who.int/news/item/28-09-2001-the-world-health-report-2001-mental-disorders-affect-one-in-four-people

World Health Organization. (2019, November 28). Mental disorders. WHO | World Health Organization. https://www.who.int/news-room/fact-sheets/detail/mental-disorders

Amy Yang

Research Assistant @ UNSW | Data science student @UTS. Enthusiastic about data-driven insights and data science applications