Exploring the Impact of Covid-19 on Mental Health in Singapore

By Yung
Published in Omdena
13 min read · Sep 8, 2021

This article is co-written by Chan Jiong Jiet, Claudia Chan, Tan Jamie and Jerome Yue.

Introduction

(Source: unsplash)

The Covid-19 pandemic has been, in many ways, disruptive to our daily lives in Singapore. Although the nation has taken it in its stride, the pandemic has taken its toll on Singaporeans’ mental health, which includes their emotional, psychological, and social well-being. Over the summer, many thoughtful individuals participated in the Omdena Singapore Local Chapter’s project titled Exploring the Impact of Covid-19 on Mental Health in Singapore. The Local Chapter aims to promote real-world AI4Good solutions by running open-source projects involving local AI enthusiasts and NGOs, and by facilitating the sharing of knowledge through case study-based education. In this project, we uncovered some of the influences Covid-19 has had on Singaporeans, so that non-governmental organizations (NGOs) in Singapore are better informed when making decisions, curating content and designing activities.

At a glance, the project involved scraping data from multiple sources, cleaning it, and performing exploratory data analysis (EDA) in order to build a topic classifier, a risk prediction model and a visualization dashboard. In this article, we highlight how we collected the data, how we cleaned it so that it was ready for use, and the problems we faced along the way. We then share some insights about the effects of Covid-19 on the mental health of Singaporeans, found through EDA and data visualization. The overview is as follows:

Methodology

Data collection

Prior to data scraping, the team needed a standardized list of query keywords to identify relevant posts and videos from the respective platforms. Multiple discussions led to two final lists: one of Covid-19 pandemic keywords and one of mental health keywords. This activity also gave members an opportunity to mingle and work together, and to develop a better understanding of the project topic.

Data scraping approach

After finalizing the lists of keywords, the team divided into smaller groups, each scraping data from its assigned social media platform. All groups adopted the same keyword-by-keyword approach, in which each keyword is queried on its own. Initially, we had wanted to query pandemic keywords in conjunction with mental health keywords, to ensure that the collected data were more relevant to the topic. However, that approach would have missed a large amount of pre-Covid data. As such, we settled on scraping keyword by keyword instead.

Tools and libraries used

In the process of data scraping, many libraries and official APIs were used (Table 1). Notably, the news article team also explored ParseHub, a beginner-friendly web scraping tool that does not require programming.

Table 1 Data collection tools — Source: Omdena

Issues faced — Geotag parameter

As seen in figure 1, data scraping brought us plenty of data. However, this did not come easily, as every group encountered a different challenge with the data it collected. As members of the Twitter task group, we discuss in detail the geo-tag issues we faced in this section.

Unlike for the other platforms scraped, we used the geo-tag feature in our libraries to filter relevant posts to scrape. However, the output still included tweets from other nations. This affected the consistency of our data, since we were only interested in the mental state of Singaporeans in this project.

To use the geo-tag parameter, we had to specify the latitude and longitude of Singapore together with a search radius. This led to our first problem. Due to the close proximity between Singapore and Malaysia, it was inevitable that a substantial number of tweets would be gathered from Malaysia in the process. Hence, to mitigate this problem, we reduced the radius passed into the geo-tag parameter and filtered out tweets that contained words like ‘Malaysia’ and ‘Johor’.
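A minimal sketch of the two mitigations described above, assuming a Twitter-style geocode string of the form "latitude,longitude,radius" and a plain list of tweet texts. The centre coordinates, the 20 km radius and the helper names are illustrative assumptions, not the values or functions the team actually used.

```python
# Approximate centre of Singapore; the radius is an assumed value.
SG_LAT, SG_LON = 1.3521, 103.8198
RADIUS_KM = 20

# Words that suggest a tweet came from across the border (from the article).
EXCLUDE_WORDS = ("malaysia", "johor")


def build_geocode(lat, lon, radius_km):
    """Format a 'lat,lon,radius' string for a Twitter-style geocode filter."""
    return f"{lat},{lon},{radius_km}km"


def drop_cross_border(tweets):
    """Keep only tweets that mention none of the excluded words."""
    return [t for t in tweets if not any(w in t.lower() for w in EXCLUDE_WORDS)]


geocode = build_geocode(SG_LAT, SG_LON, RADIUS_KM)
sample = ["Feeling anxious about the lockdown", "Stuck in Johor traffic again"]
kept = drop_cross_border(sample)
print(geocode, kept)
```

A smaller radius trades coverage of Singapore's outer areas for fewer cross-border tweets, which is why the keyword filter is still needed as a second pass.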

The geo-tag works by detecting the location that users have specified under individual tweets and in their profiles. This was a problem, as it meant that there would be numerous tweets mentioning Singapore from users outside of the nation. The remedy was to use the OSMnx Python package together with the ‘place’ column from our data, which contained the latitude and longitude of tweets, to filter out tweets posted outside of Singapore. That said, this was still problematic because only a small percentage of tweets had the place tag. Thus, we did some manual cleaning by dropping tweets with certain keywords like ‘war’ after confirming that they were irrelevant to Singapore.
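The coordinate check can be sketched without any geospatial library. The example below uses a crude bounding box as a stand-in for the Singapore boundary that the team obtained through OSMnx. The box limits and the lat/lon field names are assumptions, and a rectangle will still admit some points just across the Johor Strait, so a real polygon test is stricter.

```python
# Rough bounding box around Singapore (assumed values, not an official boundary).
SG_BOUNDS = (1.15, 1.48, 103.59, 104.10)  # min_lat, max_lat, min_lon, max_lon


def in_singapore(lat, lon, bounds=SG_BOUNDS):
    """Return True when the point falls inside the bounding box."""
    min_lat, max_lat, min_lon, max_lon = bounds
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon


def split_by_place(tweets):
    """Split tweets into (inside, outside, untagged) using their coordinates."""
    inside, outside, untagged = [], [], []
    for tweet in tweets:
        lat, lon = tweet.get("lat"), tweet.get("lon")
        if lat is None or lon is None:
            untagged.append(tweet)   # no place tag: needs manual review
        elif in_singapore(lat, lon):
            inside.append(tweet)
        else:
            outside.append(tweet)
    return inside, outside, untagged


tweets = [
    {"text": "circuit breaker blues", "lat": 1.3521, "lon": 103.8198},  # Singapore
    {"text": "lockdown in KL", "lat": 3.1390, "lon": 101.6869},         # Kuala Lumpur
    {"text": "so stressed", "lat": None, "lon": None},                  # no place tag
]
inside, outside, untagged = split_by_place(tweets)
print(len(inside), len(outside), len(untagged))
```

Keeping the untagged tweets in their own bucket reflects the situation described above: most tweets lack a place tag, so they cannot be resolved by coordinates and have to be cleaned by other means.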

Figure 1 Summary of data collected — Source: Omdena

Data preprocessing

  • Dropped irrelevant columns
  • Removed hyperlinks, alphabets, digits.
  • Removed mentions, punctuations, code formatted text.
  • Removed stopwords with NLTK (Python Natural Language Toolkit library)
  • Expanded contractions (don’t, wouldn’t, I’d → do not, would not, I had)
  • Lemmatized words (go, gone, went, goes → go)
  • Further cleaning when doing EDA (iterative process)

The list above describes the steps taken for data preprocessing. The first four steps are self-explanatory, so this section focuses on explaining the remaining steps. Contractions are shortened versions of words (refer to the example above), and we expanded them so that our model could capture the data more accurately. To expand contractions, we obtained a list of contractions from a GitHub repo and used it to replace the words in our data. The sequence of data preprocessing is important: contractions should be expanded before punctuation is removed, because once the apostrophe is stripped, a word like “don’t” becomes “dont” and no longer matches the list. Ordering is particularly pertinent in NLP, and some steps were repeated to ensure that the data was fully cleaned.
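The ordering described above can be sketched as a small pipeline. This is an illustration under stated assumptions: the contraction map and stopword set are tiny hand-written subsets (the team used a fuller list from a GitHub repo and NLTK's stopwords), lemmatization is left out, and the function name is hypothetical.

```python
import re
import string

# Tiny illustrative subsets, not the lists used in the project.
CONTRACTIONS = {"don't": "do not", "wouldn't": "would not", "i'd": "i had"}
STOPWORDS = {"i", "the", "a", "to", "do", "would", "had", "is", "it"}


def clean(text):
    """Return a list of cleaned tokens for one post."""
    text = text.lower().replace("\u2019", "'")    # normalise curly apostrophes
    text = re.sub(r"https?://\S+", " ", text)     # remove hyperlinks
    text = re.sub(r"@\w+", " ", text)             # remove mentions
    # Expand contractions BEFORE stripping punctuation, while "'" still exists.
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = re.sub(r"\d+", " ", text)              # remove digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in text.split() if w not in STOPWORDS]


tokens = clean("@moh I don't feel safe, see https://t.co/abc 123")
print(tokens)
```

If the punctuation step ran first, "don't" would already have become "dont" by the time the contraction map is applied, and the negation would be lost.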

The last step of preprocessing was lemmatization. Lemmatization is important as it allows us to map a word to its root form (refer to the example above). Furthermore, during the EDA stage, further analysis (topic modelling, Bag-of-Words (BoW), unigram, bigram and trigram) allowed us to identify and conduct additional cleaning of the data.
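As a rough illustration of the unigram, bigram and trigram counts mentioned above, the sketch below counts n-grams over already-tokenized posts using only the standard library. The sample tokens are made up for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-token tuples of a token list, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(docs, n, k=3):
    """Count n-grams across all documents and return the k most frequent."""
    counts = Counter()
    for tokens in docs:
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


docs = [
    ["work", "from", "home", "stress"],
    ["work", "from", "home", "again"],
    ["circuit", "breaker", "stress"],
]
print(top_ngrams(docs, 1))
print(top_ngrams(docs, 3, k=1))
```

Frequent n-grams that carry no meaning for the analysis (leftover URLs fragments, platform boilerplate and the like) are the kind of residue that this step surfaces for another round of cleaning.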

EDA (Exploratory Data Analysis) and Data Visualization

One of the project goals was to create an interactive dashboard of the project’s insights and findings. As the dashboard was intended for public education, the team brainstormed questions that the public may be interested in. Two such questions were “how did the concerns of Singaporeans change over the period of COVID-19” and “how did the feelings of Singaporeans change over the period of COVID-19”.

To answer these questions, the team worked on combining different datasets from Twitter, Reddit and News Articles, to present a fuller picture of the answers. In addition, the team made the assumption that data scraped from media platforms were representative of Singaporeans in general.

Figure 2 Treemap. Source: Omdena Singapore

To understand more about the concerns of Singaporeans during the pandemic, keywords were extracted from posts and visualised in a treemap. The treemap of word frequency can be interpreted according to box size and colour; a bigger box and a darker colour indicate a word that has been used more frequently in online posts. An interesting insight is that the treemap presents different concerns during each phase of COVID-19.

To elaborate, before and during the start of COVID-19, words such as ‘spread’ and ‘coronavirus’ ranked highly. This mirrors the worry of Singaporeans about the transmissibility of the new virus. During the lockdown, also known as Circuit Breaker, words relating to the situation were widely mentioned: ‘circuit breaker’, ‘lockdown’, ‘pandemic’ and ‘covid-19’. Additionally, ‘mask’ was frequently used, which corresponds to the huge debate about the proper donning and quality of various masks. Work from home, abbreviated as ‘wfh’, was mentioned often as well, with many employees adjusting to the disruptions that came along with working from home. After Circuit Breaker, as the number of infected individuals declined, the Singapore government started to introduce different levels of restrictions to open up public facilities. At the same time, vaccines were made available and the government actively encouraged everyone to be vaccinated. However, as there were different vaccines, this sparked a discourse about the efficacy and side effects of each vaccine, which is consistent with the increased use of ‘vaccine’ in posts as shown in the treemap.
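The per-phase word counts behind such a treemap can be sketched as follows. The phase boundaries are an assumption: Singapore's Circuit Breaker ran from 7 April 2020 to 1 June 2020, but the article does not state which cut-off dates the team used, and the sample posts are made up.

```python
from collections import Counter, defaultdict
from datetime import date

# Assumed phase boundaries (Circuit Breaker: 7 April 2020 to 1 June 2020).
CB_START, CB_END = date(2020, 4, 7), date(2020, 6, 1)


def phase_of(day):
    """Map a posting date to one of three coarse phases."""
    if day < CB_START:
        return "before"
    if day <= CB_END:
        return "circuit_breaker"
    return "after"


def word_counts_by_phase(posts):
    """posts: iterable of (date, tokens). Returns {phase: Counter of words}."""
    counts = defaultdict(Counter)
    for day, tokens in posts:
        counts[phase_of(day)].update(tokens)
    return counts


posts = [
    (date(2020, 2, 10), ["coronavirus", "spread"]),
    (date(2020, 4, 20), ["wfh", "mask", "lockdown"]),
    (date(2020, 5, 2), ["mask", "circuit", "breaker"]),
    (date(2021, 3, 1), ["vaccine", "side", "effects"]),
]
by_phase = word_counts_by_phase(posts)
print(by_phase["circuit_breaker"].most_common(1))
```

Each phase's counter can then be passed to a plotting tool, with the count driving both the size and the colour of a word's box.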

Figure 3 sentiment analysis. Source: Omdena Singapore

To answer the second question, sentiment analysis was performed on online posts, which helped the team understand whether a post is negative, positive or neutral. The Python library used for this analysis was Valence Aware Dictionary and Sentiment Reasoner (VADER). Posts were read by VADER and given a compound score between -1 and 1, indicating how strongly negative or positive the text is. The posts were then binned by the month in which they were posted, and the average compound score for each month was taken to produce the above graph.

From the figure, it can be observed that overall sentiment is on an upward trend. Though one may have expected sentiments to be at their lowest during Circuit Breaker, the opposite occurred: sentiments peaked during that period before declining. A possible explanation for this phenomenon is the positive spirit people had from the beginning of COVID-19 until Circuit Breaker.

Singapore is no stranger to coronaviruses, having experienced SARS (Severe Acute Respiratory Syndrome), which is also caused by a coronavirus, in 2003. At that time, Singapore managed to eradicate SARS locally, and many had high hopes that the same would happen again for COVID-19, hence the positive sentiments. However, as the battle against COVID-19 dragged on, Singaporeans started to experience mental fatigue, even more so when the government announced that safety measures, such as masks and social distancing, would be the new norm. This then resulted in the decline of sentiments.

The increase in sentiments from January 2020 to May 2020 may also be attributed to the religious festivals that took place in that period: Hari Raya Puasa, Vesak Day and Good Friday. These festivals often evoke a celebratory mood, and people tend to feel more positive, which leads to positive and encouraging posts on social platforms.
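The monthly binning and averaging step can be sketched as below. To keep the example runnable without extra packages, the scorer is passed in as a function and a stub with made-up scores is used here; the commented lines show how VADER's compound score could be plugged in if the vaderSentiment package is installed. The function and variable names are assumptions for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_mean_compound(posts, score):
    """posts: iterable of (date, text); score: function text -> value in [-1, 1]."""
    buckets = defaultdict(list)
    for day, text in posts:
        buckets[day.strftime("%Y-%m")].append(score(text))
    return {month: sum(v) / len(v) for month, v in sorted(buckets.items())}


# With vaderSentiment installed, the scorer could be built like this:
#   from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
#   analyzer = SentimentIntensityAnalyzer()
#   score = lambda text: analyzer.polarity_scores(text)["compound"]
# The stub below uses made-up scores purely for demonstration.
stub_scores = {"stay strong sg": 0.5, "so tired of this": -0.5, "thank you nurses": 0.75}

posts = [
    (date(2020, 4, 3), "stay strong sg"),
    (date(2020, 4, 20), "so tired of this"),
    (date(2020, 5, 1), "thank you nurses"),
]
monthly = monthly_mean_compound(posts, stub_scores.get)
print(monthly)
```

Averaging per month smooths out individual posts, but it also means a month with few posts can swing the line sharply, which is worth keeping in mind when reading the trend.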

It was also observed during EDA that tweets with campaigns encouraging Singaporeans to show their appreciation for frontline workers helped increase sentiments.

Figure 4 showing a sample of positive sentiment tweets regarding frontline workers

The implications are profound when the observations of a positive sentiment score are combined with the general supportiveness of Singaporeans. Campaigns that show support for frontline workers also spread words of positivity among Singaporeans, which helps to lift the mood and morale of social media users. As such, government agencies tasked with engaging and rallying Singaporeans can consider organising similar campaigns in the future, that is, campaigns that encourage users to post messages promoting positivity and inclusion. With social media’s strong network effects, a natural word-of-mouth virality is created, reaching many users and thus increasing impact.

Empath Analysis and ScatterText

As text scraped from social media can contain a myriad of topics, finding a set of topics that aptly summarises or describes the data can be difficult. To help generate lexical categories, we used Empath, a tool made by researchers at Stanford University. Empath generates lexical categories by looking at relevant keywords in the scraped text. For instance, if the sentence “The man is bleeding after getting punched” is passed into Empath, categories like violence will be generated. This informs the data analyst that the sentence is related to the topic of violence.

In this project, Empath analysis was used for its efficiency in exploring topics in the collected data, especially given the tight timeline. As the project draws text data from multiple sources, Empath helps to automatically find keywords and map them to a broader category without running unsupervised methods, such as Latent Dirichlet Allocation, that require validation. It also helps ensure consistency in finding categories, as it has been pre-validated by the creators of Empath, who used more than 1.8 billion words to train the model.
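The idea of mapping keywords to broader lexical categories can be illustrated with a toy lexicon. This is not Empath's implementation: the mini-lexicon below is hypothetical and hand-written, whereas Empath ships its own, far larger category word lists. The commented lines show what a call to the empath package looks like if it is installed.

```python
# Hypothetical mini-lexicon for illustration only.
TOY_LEXICON = {
    "violence": {"punched", "bleeding", "hit", "fight"},
    "negative_emotion": {"worst", "scary", "crazy", "lose"},
    "positive_emotion": {"hope", "wish", "great"},
}


def categorize(text, lexicon=TOY_LEXICON):
    """Count how many tokens of the text fall into each lexical category."""
    tokens = [tok.strip(".,!?") for tok in text.lower().split()]
    counts = {}
    for category, words in lexicon.items():
        hits = sum(tok in words for tok in tokens)
        if hits:
            counts[category] = hits
    return counts


# With the empath package installed, the equivalent call would be:
#   from empath import Empath
#   Empath().analyze("The man is bleeding after getting punched", normalize=True)
result = categorize("The man is bleeding after getting punched")
print(result)
```

Because the mapping is a simple keyword lookup, it is fast and consistent across sources, which matches the reasons given above for choosing it over methods that need validation.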

Figure 5 showing the frequency of categories generated by Empath

After running Empath on the Twitter, Reddit and News Articles data, the most frequently observed category is negative emotion. This is concerning, as it shows that social media and news outlets are generating content that relates to negative emotions. The other categories found are also largely related to the pandemic, which is expected given the data collection process. Emotions, businesses, school, travelling, sleep problems and similar issues are commonly discussed on social media and in news outlets, which points to the widespread effect of the pandemic on Singaporeans. In this case, Empath offers supporting evidence of how COVID-19 has affected Singaporeans.

In the following paragraphs, the analysis focuses only on data scraped from Twitter due to time constraints. As the team delved deeper into the data, ScatterText, a Python library, was used to aid the tricky visualisation of the actual text and its respective Empath category. ScatterText is a useful library for visualizing binary categories (positive and negative text in this case) and their associated Empath categories. This additional segmentation helps us distinguish Empath categories that are strongly associated with a sentiment. ScatterText also shows in bold the keywords that Empath looks at in order to produce a category, thus making the overall model interpretable.

Figure 6 showing ScatterText output

In figure 6, the frequency of empath categories relating to positive and negative sentiments is shown on the vertical and horizontal axes respectively. A simpler way to understand the graph is the higher up the empath category lies, the more it is commonly seen in tweets with positive sentiments. The more rightwards the empath category lies, the more it is commonly seen in tweets with negative sentiment.
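One way to read the two axes is as a pair of rates per category: how often it appears in negative tweets (horizontal) and in positive tweets (vertical). The sketch below computes those raw rates on made-up data. ScatterText applies its own scaling to place points, so this only illustrates how to interpret the plot, not how the library computes it.

```python
def category_rates(tweets):
    """tweets: iterable of (sentiment, categories), sentiment is 'pos' or 'neg'.

    Returns {category: (rate_in_negative, rate_in_positive)}, i.e. (x, y).
    """
    totals = {"pos": 0, "neg": 0}
    hits = {}
    for sentiment, categories in tweets:
        totals[sentiment] += 1
        for cat in set(categories):
            hits.setdefault(cat, {"pos": 0, "neg": 0})[sentiment] += 1
    return {
        cat: (h["neg"] / max(totals["neg"], 1), h["pos"] / max(totals["pos"], 1))
        for cat, h in hits.items()
    }


tweets = [
    ("neg", ["negative_emotion", "business"]),
    ("neg", ["negative_emotion"]),
    ("pos", ["positive_emotion"]),
    ("pos", ["positive_emotion", "business"]),
]
rates = category_rates(tweets)
print(rates)
```

A category such as "business" in this toy data sits on the diagonal, meaning it is equally common in both sentiments, while categories near one axis are the ones strongly tied to a single sentiment.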

Figure 7 showing the negative_emotion empath category with the corresponding keyword and negative sentiment

ScatterText also allows searching for the text data that corresponds to a sentiment and Empath category. The words in bold are the text that ScatterText and Empath look at to generate a lexical category. For instance, from the above figure, it can be observed that the lexical category negative emotion appears to be linked to keywords such as worst, lose, crazy and scary. However, there is no particular topic of focus in the data, as the topics range from business to lockdown sentiments. This suggests that negative emotion appears in many aspects of Singaporeans’ lives. This may be of interest to psychologists or mental health advocates, as it suggests using an all-round approach when listening to problems faced by Singaporeans instead of focusing on a narrow topic.

Figure 8 showing text relating to positive_emotion

Looking at texts related to the lexical category positive emotion, some keywords picked up are keep, reassurance, wish, hope, and great. Unlike in the lexical category negative emotion, the topics discussed in the above data appear to be words of encouragement and hope. Together with the campaigns proposed in the sentiment analysis discussion above, this reinforces the need to promote positive talk on social media, as exposure to positive talk may lift the mood and morale of social media users.

However, one limitation of Empath and sentiment analysis is the inability to understand context. For instance, a sentence like “The protocols initiated by the government are not bad” can be classified as negative because the word “bad” is picked up. This emphasizes the need to use explainable models, as they allow for performance and sanity checks that ensure the output makes sense. That said, given the overall satisfactory performance, analysts should not be deterred from using VADER or Empath, as they are a good starting point for data exploration and for gaining better intuition of the data.
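The failure mode can be reproduced with a toy word-counting scorer. Both functions below are hypothetical illustrations with a hand-written lexicon; they say nothing about the internals of VADER or Empath, and only show why a score based on isolated keywords needs a sanity check.

```python
# Toy lexicons for illustration only.
NEGATIVE_WORDS = {"bad", "worst", "scary"}
NEGATORS = {"not", "never", "no"}


def naive_score(text):
    """Count negative words and ignore context entirely."""
    return -sum(tok in NEGATIVE_WORDS for tok in text.lower().split())


def negation_aware_score(text):
    """Flip the sign of a negative word when the previous token negates it."""
    tokens = text.lower().split()
    score = 0
    for i, tok in enumerate(tokens):
        if tok in NEGATIVE_WORDS:
            negated = i > 0 and tokens[i - 1] in NEGATORS
            score += 1 if negated else -1
    return score


sentence = "The protocols initiated by the government are not bad"
print(naive_score(sentence), negation_aware_score(sentence))
```

Printing the matched keywords alongside the score, as ScatterText does with its bolded words, is what makes this kind of misclassification easy to spot.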

Conclusion

In this project, data exploration for text-related data was important because it helped the project members gain better intuition of the data, ask better questions, and generate findings that are relevant to readers and supported by the data. In general, the findings show that during the pandemic period since 2020, negative sentiments ran high across news articles, tweets, and Reddit posts. Interestingly, there were periods where positive sentiments rose, which we attribute to festivities or social media campaigns that promote positive speech on the web. The Tableau Dashboard summarizes the final outcome of this project, and you can click on this link to view the dashboard.

Gif preview of the dashboard.

That said, this project is not without its limitations and shortcomings. For instance, VADER, the sentiment analyzer, may not be sophisticated enough to understand the context of a sentence before generating a sentiment score. Also, as a possible extension, plotting the frequencies of Empath categories over time could help us better understand the changes in sentiment and topics of the textual data (e.g. pre-Covid vs lockdown, or before and after the Circuit Breaker).

Overall, the project has provided a great opportunity to understand how the different steps of a data analytics project come together to address a problem. Many of us were new to open-source projects and were shy to speak up at first. However, thanks to the leaders, conversation initiators and active collaborators, the project was completed smoothly. Our leader once told us during the briefing that the amount of experience and lessons we would gain depended on how much we contributed. Looking back, we fully agree with that. Lastly, we would also like to thank Omdena for providing a platform that allows us to apply our skills, learn from each other and contribute to social good.
