Mental Disorder Analysis with Elasticsearch and Kibana (Phase 1)

How to leverage powerful search and analytics for extracting and exploring natural language data.

elvis
DAIR.AI
4 min read · Feb 28, 2019


Bipolar disorder affects over 1% of the global population. In the US alone, an estimated 3% of people are affected by bipolar disorder at some point in their lives. The numbers become staggering once we include other mental disorders such as depression and dementia.

AI for Mental Health

Given the rising number of mental disorders that affect millions of people all over the world, it has become a challenge to diagnose and treat every single case. Andrew Ng spoke about this challenge and what role AI plays in mental health in this interesting post.

An AI company called Woebot is also at the forefront of building technologies that help ease the negative effects of mental health issues such as depression through Cognitive Behavioural Therapy (CBT). Although the idea is not to replace traditional methods of treatment, access to an online coach (in the form of smart, empathetic conversational agents) has the potential to support treatment and to improve mood, which plays a major role in a person’s mental well-being. Accessibility is key!

Power of Analytics and Search

Many studies have recently shown that machine learning (ML) models can accurately predict cases of mental disorders such as bipolar disorder and depression on social media [1, 2].

Regardless of how accurately ML and deep learning models perform on the task of predicting mental disorders, more work is needed to understand these health problems from an experimental and analytics standpoint. In other words, is it enough to just predict whether someone is suffering from a mental disorder, or should we go the extra mile to better understand the behaviors involved? As responsible social engineers and data scientists, we should! In delicate cases like these, we need a better intuition for important metrics and behaviors (e.g., subconscious linguistic phenomena, mood distribution, mood stability, etc.).

I propose that in the quest to better understand patients from a research standpoint, it is important to leverage advanced search and analytics technologies to perform deep natural language processing and analysis.

Preliminary Analysis

In this post, I share a preliminary analysis (phase 1) conducted on several self-reported cases of bipolar disorder, which were used in the experimental work proposed by Saravia et al. (2016).

The preliminary analysis involves a feature engineering pipeline that aims to uncover interesting, hidden linguistic phenomena and emotional behavior in online mental disorder cases. Leveraging Elasticsearch’s search capabilities and Kibana’s visualization capacity, the goals of the analysis are as follows:

  • To broadly show how to leverage Elasticsearch’s ingest pipeline and custom analyzers for preprocessing and feature engineering
  • To introduce common best practices for dealing with natural language data in Elasticsearch
  • To discover insights that help improve feature engineering and ML models
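To make the first goal more concrete, here is a minimal sketch of what an ingest pipeline and a custom analyzer could look like. It is not the configuration used in this project: the pipeline id, index name, analyzer name, and field names are all hypothetical, and the mapping assumes the typeless format of Elasticsearch 7 and later. The script only builds and prints the JSON request bodies, so it runs without a cluster.

```python
import json

# Hypothetical names chosen for illustration; they do not come from the project.
PIPELINE_ID = "clean_posts"
INDEX_NAME = "bipolar_posts"


def build_ingest_pipeline(field="text"):
    """Return an ingest pipeline body that normalizes a raw post before indexing."""
    return {
        "description": "Normalize raw social media posts (illustrative sketch)",
        "processors": [
            # Strip URLs from the post text.
            {"gsub": {"field": field, "pattern": r"https?://\S+", "replacement": ""}},
            # Lowercase the text so later features are case-insensitive.
            {"lowercase": {"field": field}},
            # Remove leading and trailing whitespace left behind by the steps above.
            {"trim": {"field": field}},
        ],
    }


def build_index_body(field="text"):
    """Return index settings and mappings that attach a custom analyzer to the text field."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "post_analyzer": {
                        "type": "custom",
                        "tokenizer": "standard",
                        "filter": ["lowercase", "asciifolding"],
                    }
                }
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "text", "analyzer": "post_analyzer"},
                "created_at": {"type": "date"},
            }
        },
    }


if __name__ == "__main__":
    # Print each body next to the REST endpoint it would be sent to.
    print(f"PUT _ingest/pipeline/{PIPELINE_ID}")
    print(json.dumps(build_ingest_pipeline(), indent=2))
    print(f"PUT {INDEX_NAME}")
    print(json.dumps(build_index_body(), indent=2))
```

With these in place, a document could be indexed through the pipeline by adding `?pipeline=clean_posts` to the index request. The split is a design choice: the pipeline changes the stored document, while the analyzer only affects how the text is tokenized for search.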

Overall, this project aims to show how to leverage fast search and querying to power a visual dashboard that could help a researcher or medical expert monitor and better understand a patient or cohort. The project is in the preliminary phase and barely scratches the surface of the true potential of machine learning, natural language processing, search, and analytics in the war against mental disorders.
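As a rough illustration of the kind of query that could sit behind such a dashboard panel, the sketch below builds a search body that buckets one user's posts by week and counts emotion labels within each bucket. The field names (`user_id`, `created_at`, `emotion`) are assumptions and not the project's schema, and `calendar_interval` assumes Elasticsearch 7.2 or later (older versions use `interval`). Kibana generates its own requests from visualization settings, so this only approximates the idea.

```python
import json


def build_mood_over_time_query(user_id, interval="week"):
    """Return a search body that counts emotion labels per time bucket for one user."""
    return {
        # Only the aggregation results are needed, not the matching documents.
        "size": 0,
        # Hypothetical keyword field identifying the author of a post.
        "query": {"term": {"user_id": user_id}},
        "aggs": {
            "posts_over_time": {
                "date_histogram": {
                    "field": "created_at",
                    "calendar_interval": interval,
                },
                # Nested terms aggregation: emotion label counts inside each bucket.
                "aggs": {"moods": {"terms": {"field": "emotion"}}},
            }
        },
    }


if __name__ == "__main__":
    # The index name in a real request would be project specific.
    print(json.dumps(build_mood_over_time_query("user_123"), indent=2))
```

The per-bucket label counts give a simple view of mood distribution over time, one of the behaviors mentioned earlier.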

Phase 1

As mentioned previously, the project is in a preliminary phase and will be updated in the months to come. For now, those who are interested can access the current project code in this GitHub repository. All material (code, data, queries etc.) for reproducing the experiments and analyses performed in this project will be updated and made available in the same repository.

I am focusing on the Elastic stack for analytics, preprocessing, and dashboards due to its fast query and search engine. Such a platform could become the backbone for all things data when it comes to applications related to clinical NLP. Below you can preview a high-level overview of the current framework used to conduct the preliminary analysis.

And here is a short demonstration of the current dashboard I have built:

Methods on how to render the above dashboards will be released in an upcoming post. More details about the preliminary analysis can also be found in this set of slides.

More coming soon! If you want to get seriously involved in this project, reach me directly at ellfae@gmail.com — I am always open to collaboration!
