What are people saying about remote working? A study of Twitter communities.

Written By: Nesreen El-Rayes, Francis Zhang, Roshani Bharati

Source: Creative Review

Since the pandemic started, more and more people have been forced to work from home. Remote work gives employees the opportunity to work outside the traditional office environment. Although some companies had occasionally embraced remote work before, the pandemic is making it mainstream, and there is still a debate about whether this work style is here to stay. The long-term impact of remote work on companies and employees is still to be studied. Given this situation, it is interesting to see what people think about this shift in the work paradigm. People's experiences with and opinions about the remote work setting might also give managers ideas about the pros and cons of making work remote. Social media data gives us an opportunity to peek into people's minds. Therefore, our project analyzes user opinions related to remote working using social media data.

Data Collection and Preparation

Because Twitter is one of the most popular social networking platforms, we decided to collect tweets from Twitter users for our study. Tweets are users' unedited, raw opinions, which give us direct access to their experiences and thought processes. For the analysis, we obtained data from the Twitter API using NodeXL Pro. We collected tweets related to five hashtags: #remoteworking, #remotework, #workfromhome, #workingfromhome, and #wfh.

We retrieved 10K tweets for each of the five hashtags listed above. NodeXL returns tweets over a range of time but gives higher weight to recent dates. We manually merged the data for the different hashtags into one Excel file and saved it as a CSV file. Table 1 shows the breakdown of counts before cleaning or merging.

Table 1: Nodes and Vertices by Hashtag

The data collected with NodeXL Pro included duplicate samples and samples in languages other than English, which we removed. After merging, removing duplicates, and cleaning, we had 27,702 vertices and 96,942 edges. For the network relationships, when more than one edge representing a relationship (tweet, retweet, or mention) connected the same two vertices with exactly the same timestamp, those edges were merged into one. The merged CSV file for all hashtags contained the ID and screen name of user 1 and user 2, the tweet text, the relationship, and the date, time, and location of the tweet.
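The cleaning rules above can be sketched in plain Python as follows. This is a minimal illustration, not the exact procedure used in the study: the column names (`vertex1`, `vertex2`, `relationship`, `timestamp`, `lang`) and the in-memory CSV are hypothetical stand-ins for the NodeXL/Excel export, and the `edge_weight` field is an assumed way to record how many edges were merged.

```python
import csv
import io

def clean_edges(rows):
    """Keep English rows only, then collapse edges that connect the same two
    vertices at the same timestamp (exact duplicates included) into one
    edge with an edge_weight count."""
    merged = {}
    for row in rows:
        if row["lang"] != "en":  # drop non-English tweets
            continue
        key = (row["vertex1"], row["vertex2"], row["timestamp"])
        if key in merged:
            merged[key]["edge_weight"] += 1  # repeated edge: merge it
        else:
            merged[key] = {**row, "edge_weight": 1}
    return list(merged.values())

# Hypothetical export with the columns assumed above.
sample = io.StringIO(
    "vertex1,vertex2,relationship,timestamp,lang\n"
    "alice,bob,Retweet,2020-10-01 10:00,en\n"
    "alice,bob,Retweet,2020-10-01 10:00,en\n"
    "alice,carol,Mentions,2020-10-01 11:00,en\n"
    "dave,bob,Tweet,2020-10-02 09:00,es\n"
)
edges = clean_edges(csv.DictReader(sample))
vertices = {v for e in edges for v in (e["vertex1"], e["vertex2"])}
print(len(vertices), "vertices,", len(edges), "edges")
```

Keying the dictionary on (source, target, timestamp) lets one pass handle both exact duplicates and same-timestamp edges, while preserving the first row's other fields.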

Network Analysis

Figure 1 below shows the overall network structure of the tweets that we collected.

Figure 1: Overall Network Structure

Upon further analysis, we found 16 nodes that are the top influencers in terms of in-degree, out-degree, and centrality. We defined influencers as nodes with an in-degree of 300 or more. Surprisingly, our network contains many public figures (famous people) whom we expected to be categorized as influencers, but whose PageRank, in-degree, and centrality scores are relatively low.

Figure 2: Top Influencers
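The influencer metrics above come from NodeXL, but their definitions are simple enough to sketch in plain Python. In this illustration, edges are assumed to be directed (source, target) pairs, the 300 in-degree threshold is the one stated above, and the PageRank damping factor of 0.85 is the conventional default rather than a value reported by the study. NodeXL's own implementation may differ in details such as how nodes without out-links are treated.

```python
from collections import Counter

def degree_counts(edges):
    """In- and out-degree of every node in a directed (source, target) edge list."""
    in_deg, out_deg = Counter(), Counter()
    for src, dst in edges:
        out_deg[src] += 1
        in_deg[dst] += 1
    return in_deg, out_deg

def influencers(edges, min_in_degree=300):
    """Nodes whose in-degree is at least min_in_degree (300 in the article)."""
    in_deg, _ = degree_counts(edges)
    return {node for node, d in in_deg.items() if d >= min_in_degree}

def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank; rank held by nodes without out-links is
    spread uniformly over all nodes so the scores keep summing to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping + damping * dangling) / n for v in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank

# Toy network: 300 users mention one hub, which mentions one user back.
edges = [(f"user{i}", "hub") for i in range(300)] + [("hub", "user0")]
print(influencers(edges))
ranks = pagerank(edges)
print(max(ranks, key=ranks.get))
```

Because in-degree and PageRank measure different things, a node with many followers can still score low on both if few edges in the collected tweets point to it, which is consistent with the observation about public figures above.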

Our network also contains 334 nodes with 1 million or more followers. We classified these nodes as famous nodes (including organizations and people). All the edges representing the different relationships were kept. Figure 3 shows the famous nodes with respect to the whole network, along with the edges connected to them. The nodes shown with images are those with more than 2 million followers.

Figure 3: Famous nodes with respect to the Network

Figure 4 shows the network structure surrounding three of the influencer nodes (in-degree of 300 or more) that we identified. The remaining influencer nodes had network structures similar to the ones shown below.

Figure 4: Three Topologies from the top 10 nodes by in-degree

We further analyzed the famous nodes in the network to understand what they were talking about. The following are the key hashtags and words that appeared together in the tweets by those nodes.

Table 2: Pairs of words that appeared in the famous nodes' network

We can see that the famous nodes in the network were mostly talking about the future of work in relation to artificial intelligence, machine learning, digital transformation, the Internet of Things, and the corresponding skills required. This suggests that concerns about technological development and employee skill development are rising along with the popularity of the remote work setting.
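Word pairs like those in Table 2 can be counted by tallying adjacent tokens. The sketch below is a hypothetical illustration, not NodeXL's word-pair feature: the tokenizing regex, the stopword set, and the example tweets are all assumptions. Removing stopwords before pairing is a design choice that lets a pair span a dropped word such as "and".

```python
import re
from collections import Counter

def word_pairs(texts, stopwords=frozenset()):
    """Count adjacent word pairs across texts. Stopwords are dropped first,
    so a pair can span a removed stopword."""
    pairs = Counter()
    for text in texts:
        words = [w for w in re.findall(r"[#\w]+", text.lower()) if w not in stopwords]
        pairs.update(zip(words, words[1:]))
    return pairs

# Hypothetical tweets from famous nodes.
famous_tweets = [
    "The #futureofwork needs #ai and #digitaltransformation",
    "How #ai and #digitaltransformation reshape the #futureofwork",
]
stop = {"the", "and", "how", "needs"}
print(word_pairs(famous_tweets, stop).most_common(3))
```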

Community Detection

We considered three widely used community detection algorithms: Clauset-Newman-Moore, Wakita-Tsurumi, and Girvan-Newman.

Girvan-Newman repeatedly removes the edge with the highest betweenness and recomputes betweenness after each removal, which makes it computationally expensive. It works better on small networks, so it was not suitable for our large dataset. Clauset-Newman-Moore did produce meaningful diagrams when looking at all the groups together. Wakita-Tsurumi worked best in our case, so the clustering was calculated with that technique.
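Clauset-Newman-Moore and Wakita-Tsurumi are both agglomerative methods that greedily merge communities to increase modularity; they differ in their data structures and merge heuristics. The sketch below is a deliberately naive version of that shared greedy idea (it rescans every pair of communities on each step, so it only suits toy graphs), not the optimized algorithm NodeXL runs. It starts from singleton communities and repeatedly merges the connected pair with the largest positive modularity gain, ΔQ = e_ab/m − d_a·d_b/(2m²), where e_ab is the number of edges between the two communities, d_a and d_b are their total degrees, and m is the number of edges. The graph is treated as undirected.

```python
from itertools import combinations

def modularity(edges, communities):
    """Newman modularity Q = sum_c [L_c/m - (d_c/2m)^2] for an undirected graph."""
    m = len(edges)
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    q = 0.0
    for i in range(len(communities)):
        internal = sum(1 for u, v in edges if label[u] == i and label[v] == i)
        degree = sum((label[u] == i) + (label[v] == i) for u, v in edges)
        q += internal / m - (degree / (2 * m)) ** 2
    return q

def greedy_modularity(edges):
    """Merge the pair of connected communities with the largest positive
    modularity gain until no merge improves modularity."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    communities = [{v} for v in sorted(degree)]
    while True:
        best_gain, best_pair = 0.0, None
        for a, b in combinations(range(len(communities)), 2):
            ca, cb = communities[a], communities[b]
            between = sum(1 for u, v in edges
                          if (u in ca and v in cb) or (u in cb and v in ca))
            if between == 0:
                continue  # only merge communities that share an edge
            d_a = sum(degree[v] for v in ca)
            d_b = sum(degree[v] for v in cb)
            gain = between / m - d_a * d_b / (2 * m * m)
            if gain > best_gain:
                best_gain, best_pair = gain, (a, b)
        if best_pair is None:
            return communities
        a, b = best_pair  # a < b, so deleting b keeps index a valid
        communities[a] |= communities[b]
        del communities[b]

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
groups = greedy_modularity(edges)
print(groups, round(modularity(edges, groups), 4))
```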

Figure 5: Top 10 Communities/Groups (same colors represent the groups as in Figure 6)

Figure 5 shows the network structure of the top 10 groups that we extracted from the data. The colors are the same as those in the grid shown in Figure 6. Figure 6 shows the top 10 groups by size, and the numbers in the bottom-left corners give the number of vertices in each group. A total of 6,755 nodes are in the top 10 groups. Because the algorithm does not label the communities, we analyzed the top items in each community to find its common theme. We also extracted the top tweeters and most-mentioned nodes in each group.

Figure 6: Top 10 Groups by size
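Ranking groups by size and listing each group's top tweeters and most-mentioned nodes can be sketched as follows. The `group_of` mapping (vertex to group ID, as produced by a clustering step) and the toy edges are hypothetical. Treating edge sources as "tweeters" and edge targets as "mentioned" nodes, and counting only edges inside a group, are assumptions about how such lists are built.

```python
from collections import Counter, defaultdict

def top_groups(group_of, k=10):
    """[(group, vertex_count), ...] for the k largest groups."""
    return Counter(group_of.values()).most_common(k)

def top_nodes_per_group(edges, group_of, n=3):
    """Per group, the most active tweeters (edge sources) and the most
    mentioned nodes (edge targets), counting only edges inside the group."""
    tweeters, mentioned = defaultdict(Counter), defaultdict(Counter)
    for src, dst in edges:
        g = group_of.get(src)
        if g is None or g != group_of.get(dst):
            continue  # skip edges that cross group boundaries
        tweeters[g][src] += 1
        mentioned[g][dst] += 1
    return {
        g: {
            "top_tweeters": [v for v, _ in tweeters[g].most_common(n)],
            "top_mentioned": [v for v, _ in mentioned[g].most_common(n)],
        }
        for g in tweeters
    }

group_of = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}
edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "e"), ("a", "d")]
largest = top_groups(group_of, k=2)
print(largest, sum(size for _, size in largest))
print(top_nodes_per_group(edges, group_of))
```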

Group Descriptions

Group 1: The top ten tweets, retweeted 424 times in total, were related to technologies needed for the next wave of globalization and the future of work (for example, digital transformation, AI, and the Fourth Industrial Revolution).

Group 2: The top ten tweets, retweeted 207 times in total, came from users promoting remote work, or working from anywhere, by sharing articles, tips, or research results. These users did not share their own opinions, but the titles of the articles they shared reflect a favorable view of this setup.

(Examples: lessons learned from CEOs about working from anywhere, invitations to conferences to discuss remote working, handling security and compliance with remote work, and firms offering to help organizations switch to a remote setup.)

Group 3: The top ten tweets, retweeted 297 times in total, were related to robotics and to using technology to foster remote working.

Group 4: The top ten tweets, retweeted 305 times in total, listed collaboration tools or technologies, but they were very generic.

Group 5: The top ten tweets, retweeted 572 times in total, were related to users' opinions about working from home (for example, work-life balance, flexibility, burnout, loneliness, ergonomic issues, mental health, and commuting). About 33% of the tweet and retweet count (190 out of 572) expressed liking the remote work setup, while approximately 28% (158 out of 572) agreed that each setup has pros and cons but that employees should decide which setup helps them be productive.

Group 6: The top ten tweets, retweeted 536 times in total, were grouped together because they shared #pakistan or mentioned a name that sounds Pakistani.

Group 7: The top ten tweets, retweeted 65 times in total, were related to employees' reactions to remote work and employers' expectations, the flexibility of working, maintaining work-life balance, sustainable business, and calls for businesses to change.

Group 8: The top ten tweets, retweeted 42 times in total, reflected the skills needed to handle remote work (including, but not limited to, motivation, meeting management, organizing workflow, communication, and building relationships with other team members).

Group 9: The top ten tweets, retweeted 150 times in total, were related to the future of work. This group looks very similar to Group 1 but uses more wording related to forecasting what will happen in the coming years (points mentioned include Cisco's EVP in the year 2025, intelligent workplaces, the multi-generational workforce and whether companies are ready for it, and predictions for the ratio of remote workers).

Group 10: The top ten tweets, retweeted 67 times in total, were books and tutorial posts about working from home.

Figure 7: Three Communities: G2, G5, G7

Based on the characteristics of each group's tweets, Groups 2, 5, and 7 contained tweets about users' opinions on and experiences with the remote work setting. Therefore, we focused our further analysis on these groups.

Content Mining

For content mining, we first explored the tweets. The tweets contained 23 words on average, with a minimum of 1 word and a maximum of 93 words.

Table 3: Tweet text descriptive statistics
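Statistics like those in Table 3 can be computed with the standard library. This sketch assumes words are counted by splitting on whitespace; the study does not say which tokenization it used for these counts, and the example tweets are hypothetical.

```python
import statistics

def word_count_stats(tweets):
    """Mean, minimum and maximum number of whitespace-separated words per tweet."""
    counts = [len(t.split()) for t in tweets]
    return {"mean": statistics.mean(counts), "min": min(counts), "max": max(counts)}

tweets = ["Loving #wfh today", "Remote work is efficient.", "Meetings"]
print(word_count_stats(tweets))
```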

There were 13,447 unique hashtags in the tweets, in addition to the hashtags that we used to collect them. Figure 8 shows the top 20 hashtags that appeared alongside the remote working hashtags. Most users in the dataset talked about the pandemic, cybersecurity, and technology when they talked about working from home. Hashtags such as #productivity, #mentalhealth, #leadership, and #worklifebalance also appeared, which reflects the concerns people had during remote work.

Figure 8: Top hashtags that appeared with hashtags studied
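Hashtag counts of this kind can be produced with a regular expression and a `Counter`. In this sketch the regex, the case folding, and the example tweets are assumptions; the `exclude` parameter makes it explicit whether the five collection hashtags are counted, since a top-20 list of hashtags appearing alongside them would normally leave them out.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")
# The five hashtags used to collect the data.
QUERY_TAGS = frozenset({"remoteworking", "remotework", "workfromhome", "workingfromhome", "wfh"})

def hashtag_counts(tweets, exclude=frozenset()):
    """Count case-insensitive hashtag occurrences, skipping any in `exclude`."""
    counts = Counter()
    for text in tweets:
        tags = (tag.lower() for tag in HASHTAG.findall(text))
        counts.update(tag for tag in tags if tag not in exclude)
    return counts

tweets = [
    "Day 3 of #WFH #mentalhealth",
    "Stay safe #covid19 #MentalHealth #wfh",
    "#productivity tips for #remotework",
]
print(len(hashtag_counts(tweets)))  # number of unique hashtags
print(hashtag_counts(tweets, exclude=QUERY_TAGS).most_common(20))
```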

The co-occurrence of hashtags in a tweet shows what people are thinking about alongside remote working, so the hashtags in a tweet carry important connotations about its theme. Therefore, we calculated the correlation between different hashtags based on their co-occurrence in the tweets.

Figure 9: Correlation between Hashtags

We can see from the heatmap above that the correlations among hashtags were not very high, which means that none of the popular hashtags were consistently used together. The hashtag #futureofwork had a notably negative correlation with #workingfromhome. Although #futureofwork and #workingfromhome were the two most popular hashtags in the tweets, they rarely appeared in the same tweet: #futureofwork was mostly used to denote the technological transformation of business and work rather than the work-from-home setting.
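One common way to turn co-occurrence into a correlation is to encode each hashtag as a 0/1 indicator per tweet and compute the Pearson correlation between indicators (for binary variables this is the phi coefficient). The sketch below assumes that approach; the study does not state which correlation measure it used, and the tweets are hypothetical. Running pandas' `DataFrame.corr()` on a 0/1 indicator table would give the same Pearson values.

```python
import math
import re

def hashtag_sets(tweets):
    """Lower-cased set of hashtags in each tweet."""
    return [{t.lower() for t in re.findall(r"#(\w+)", text)} for text in tweets]

def cooccurrence_correlation(tag_sets, tags):
    """Pearson correlation between binary 'tweet contains tag' indicators
    (the phi coefficient) for every ordered pair of tags."""
    n = len(tag_sets)
    cols = {t: [1 if t in s else 0 for s in tag_sets] for t in tags}

    def pearson(x, y):
        mx, my = sum(x) / n, sum(y) / n
        cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sx = math.sqrt(sum((a - mx) ** 2 for a in x))
        sy = math.sqrt(sum((b - my) ** 2 for b in y))
        return cov / (sx * sy) if sx and sy else float("nan")

    return {(a, b): pearson(cols[a], cols[b]) for a in tags for b in tags}

tweets = [
    "#futureofwork meets #ai",
    "#AI and the #FutureOfWork",
    "Day 5 #workingfromhome #mentalhealth",
    "Coffee and #workingfromhome",
]
tags = ["futureofwork", "workingfromhome", "ai"]
corr = cooccurrence_correlation(hashtag_sets(tweets), tags)
print(round(corr[("futureofwork", "workingfromhome")], 3), round(corr[("futureofwork", "ai")], 3))
```

In this toy data the two hashtags never share a tweet, so their correlation is strongly negative, which mirrors the #futureofwork and #workingfromhome pattern described above.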

After this basic exploration, the next step was to clean the tweets and prepare them for modeling. First, not all the tweets we had extracted were people's opinions about remote working. Some were job postings for remote positions. Others were pictures or opinions about the election, COVID, and other news posted with the #workfromhome hashtag but completely unrelated to remote working. We removed those tweets.

Once the unrelated tweets were removed, we applied the following preprocessing steps:

  • URLs in the tweets were removed.
  • Mentions of other Twitter users were removed.
  • All hashtags were also removed to avoid confusing results. For example, the #futureofwork hashtag was mostly used to denote technologies and related advancements. If we kept these hashtags, we might get false-positive results suggesting that people refer to remote work as the future of work, when they are actually referring to technological advancement as the future of work.
  • Next, all special characters, punctuation marks, and newline characters were removed.
  • Words with 3 or fewer characters were also removed.
  • After that, the tweet text was tokenized and stopwords were removed.
  • Words related to the remote work setting, such as "remote work", "work from home", and "working from home", were also removed so that they would not dominate the results, since they have the highest occurrence in the text.
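The preprocessing steps above can be sketched as a single function. The regular expressions, the small stopword set, and the phrase list are assumptions for illustration; a real run would use a fuller stopword list (for example NLTK's English stopwords). One ordering choice differs from the list: the remote-work phrases are removed before tokenization, because multi-word phrases are easier to match in the raw string than across separate tokens.

```python
import re

# Tiny illustrative stopword list; a real run would use a fuller list
# (for example NLTK's English stopwords corpus).
STOPWORDS = {"with", "this", "that", "from", "have", "they", "were", "your", "about", "will"}
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
REMOTE_PHRASE_RE = re.compile(r"\b(?:work(?:ing)?\s+from\s+home|remote\s+work(?:ing)?)\b")

def preprocess(tweet):
    """Apply the cleaning steps listed above and return a list of tokens."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)            # URLs
    text = MENTION_RE.sub(" ", text)        # @mentions
    text = HASHTAG_RE.sub(" ", text)        # hashtags
    text = NON_ALNUM_RE.sub(" ", text)      # special characters and punctuation
    text = REMOTE_PHRASE_RE.sub(" ", text)  # remote-work phrases
    tokens = text.split()                   # whitespace tokenization (drops newlines)
    return [t for t in tokens if len(t) > 3 and t not in STOPWORDS]

tweet = "Loving #WFH! Working from home with @boss is so productive https://t.co/xyz\nNo commute :)"
print(preprocess(tweet))
```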

Topic Modeling

After the tweets were cleaned, we applied topic modeling to extract the key topics the tweets represented. We implemented an LDA (Latent Dirichlet Allocation) model using the Gensim library in Python. LDA treats each tweet as a mixture of topics in certain proportions, and each topic as a mixture of keywords, again in certain proportions. The five topics extracted from the tweets are shown below:

Figure 10: LDA topic modeling output for the overall network
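The study fit LDA with Gensim, whose `gensim.models.LdaModel` uses online variational Bayes. To show what the model estimates without depending on Gensim, the sketch below swaps in a different inference method, collapsed Gibbs sampling, in plain Python. It returns the two quantities described above: each topic's word distribution (`phi`) and each tweet's topic mixture (`theta`). The hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are assumptions, not the settings used in the study.

```python
import random

def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.
    Returns (vocab, phi, theta): phi[k][v] = P(word v | topic k),
    theta[d][k] = P(topic k | doc d)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    doc_topic = [[0] * K for _ in docs]        # tokens in doc d assigned to topic k
    topic_word = [[0] * V for _ in range(K)]   # times word v is assigned to topic k
    topic_total = [0] * K                      # tokens assigned to topic k
    z = []                                     # topic assignment of every token
    for d, doc in enumerate(docs):             # random initial assignments
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Full conditional p(z = t | everything else), up to a constant.
                weights = [(doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                           / (topic_total[t] + V * beta) for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1
    phi = [[(topic_word[k][v] + beta) / (topic_total[k] + V * beta) for v in range(V)]
           for k in range(K)]
    theta = [[(doc_topic[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def top_words(vocab, phi, n=5):
    """The n most probable words of each topic."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

docs = [
    ["burnout", "stress", "lonely", "stress"],
    ["lonely", "burnout", "isolation"],
    ["productivity", "commute", "flexibility"],
    ["commute", "productivity", "productivity"],
]
vocab, phi, theta = lda_gibbs(docs, num_topics=2, seed=1)
print(top_words(vocab, phi, n=3))
```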

The five topics extracted by the LDA model capture very general keywords from the tweets: the tweets talked about security, business, technology, opportunity, productivity, the future, and mental health. To find more specific topics, we turned to the groups identified by community detection, which we had characterized by their topics of discussion. Among these, Groups 2, 5, and 7 predominantly contained tweets about users' opinions and activities related to remote working, so we focused our further analysis on these three groups.

Figure 11: LDA topic modeling output for group 2

The topics extracted from Group 2 show that users were mostly talking about sharing, help, research, reports, and surveys related to remote working. There were also keywords such as productivity, efficiency, growth, change, and reinvent. People in this group were mostly sharing new experiences, research findings, and insights to help others adapt to the new remote work style.

Figure 12: LDA topic modeling output for group 5

The topics extracted from Group 5 show that users were mostly talking about stress and mental health issues surrounding remote work. Keywords such as loneliness, burnout, isolation, and detach indicate the negative consequences of remote work that users discussed. There were also topics involving productivity, travel, and commute, which suggests that users linked productivity to not having to travel. Micromanagement, screening, watch, and productivity also came up together, which suggests that people may feel less managed and watched in remote settings than in the office.

Figure 13: LDA topic modeling output for group 7

The five topics extracted from Group 7's tweets looked mostly positive. They covered productivity, lessons learned, leadership, security, environment, and health. Keywords such as increase (together with productivity), optimal, suited, forward, and balance show that people were linking increased productivity to remote work and were discussing how the setting let them work from anywhere. There were also repeated keywords about security, which are likely related to cybersecurity issues in remote work.

Sentiment Analysis

When dealing with a dataset containing a large amount of text, sentiment analysis can help us understand the polarity of people's opinions. In this project, we used TextBlob and NLTK's VADER sentiment analyzer to analyze the sentiment of the tweets. TextBlob gives the polarity of each tweet, and NLTK VADER provides polarity scores that let us track how people's sentiment changed over time.

Figure 14: Sentiment analysis results in the overall tweets

Figure 14 shows that most of the collected tweets were positive, followed by neutral and then negative ones. We then focused our analysis on the three groups identified earlier.
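Turning per-tweet polarity scores into the positive/neutral/negative shares shown in Figure 14 takes only a labeling rule. The scores themselves would come from calls such as `TextBlob(text).sentiment.polarity` or NLTK's `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]` (the latter needs the `vader_lexicon` resource downloaded). The sketch below takes precomputed, hypothetical scores so it runs without those libraries. The cut-offs are assumptions: sign-based labels for TextBlob, and the ±0.05 compound thresholds suggested in VADER's documentation.

```python
from collections import Counter

def label(score, threshold=0.0):
    """Map a polarity score in [-1, 1] to positive / neutral / negative.
    threshold=0.0: any nonzero score counts as polar (sign of TextBlob polarity).
    threshold=0.05: VADER's suggested cut-offs for the compound score."""
    if score > 0 and score >= threshold:
        return "positive"
    if score < 0 and score <= -threshold:
        return "negative"
    return "neutral"

def sentiment_shares(scores, threshold=0.0):
    """Fraction of tweets in each sentiment class."""
    if not scores:
        return {"positive": 0.0, "neutral": 0.0, "negative": 0.0}
    counts = Counter(label(s, threshold) for s in scores)
    return {k: counts[k] / len(scores) for k in ("positive", "neutral", "negative")}

polarities = [0.5, 0.0, -0.2, 0.3]  # hypothetical per-tweet polarity scores
print(sentiment_shares(polarities))
```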

Figure 15: Sentiment analysis of group 2

Figure 15 shows that the tweets from Group 2 were mostly positive. Topic modeling showed that this group's keywords mostly dealt with sharing research, infographics, experiences, and insights. It also surfaced keywords such as increase, productivity, and growth, and the sentiment analysis confirms that these reflect a positive sentiment towards remote working.

Figure 16: Sentiment analysis of group 5

Group 5 has proportionally more negative tweets than the other groups, which agrees with the topic modeling results. Topic modeling showed that this group discussed mental health, loneliness, burnout, and similar topics, which indicates negative sentiment towards remote work. The group also discussed commuting and micromanagement in the office, both of which are reduced in remote work, and these contributed to positive sentiment in the tweets.

Figure 17: Sentiment analysis of group 7

For Group 7, sentiment is more neutral than polarized, which confirms the topic modeling finding: the keywords from this group were not as polarized as those of the other groups.

NLTK VADER assigns a sentiment score to each word in a sentence and breaks each tweet's sentiment down into positive, negative, neutral, and compound scores, which we tracked over time. The compound score, which indicates the overall sentiment, is computed by summing the valence scores of each word in the lexicon, adjusting them with VADER's rules, and normalizing the result to lie between -1 (most negative) and +1 (most positive). Because there are multiple tweets per day, we plotted the daily mean scores as line charts. As the figures above show, the neutral score was highest for all groups, followed by positive and then negative, which indicates that the tweets contained more positive words than negative ones. The neutral score is highest because most words in a sentence are neutral. For example, in the sentence "Remote work is efficient.", the words "Remote", "work", and "is" are neutral, and only "efficient" is positive.
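Two pieces of this paragraph can be sketched directly: the normalization that squashes a summed valence into the compound range, and the daily averaging used for the line charts. The normalization x / sqrt(x² + α) with α = 15 mirrors the default in VADER's reference implementation; the rule-based adjustments VADER applies before it are not reproduced here. The scored tweets are hypothetical values shaped like `polarity_scores()` output.

```python
import math
from collections import defaultdict
from datetime import date

def normalize(raw_sum, alpha=15):
    """Map an unbounded valence sum into [-1, 1] the way VADER's compound
    score does: x / sqrt(x^2 + alpha), clipped to the interval."""
    score = raw_sum / math.sqrt(raw_sum * raw_sum + alpha)
    return max(-1.0, min(1.0, score))

def daily_means(scored_tweets):
    """Average every score key (pos, neu, neg, compound) per calendar day."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for day, scores in scored_tweets:
        counts[day] += 1
        for key, value in scores.items():
            sums[day][key] += value
    return {day: {k: v / counts[day] for k, v in sums[day].items()}
            for day in sorted(sums)}

# Hypothetical per-tweet scores in the shape of VADER's polarity_scores() output.
scored = [
    (date(2020, 10, 1), {"pos": 0.4, "neu": 0.6, "neg": 0.0, "compound": 0.5}),
    (date(2020, 10, 1), {"pos": 0.0, "neu": 0.8, "neg": 0.2, "compound": -0.3}),
    (date(2020, 10, 2), {"pos": 0.1, "neu": 0.9, "neg": 0.0, "compound": 0.2}),
]
for day, means in daily_means(scored).items():
    print(day, {k: round(v, 3) for k, v in means.items()})
```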

Conclusion

Our analysis shows that the overall sentiment towards remote work was positive. The key topics people discussed in relation to remote work were mental health, productivity, security, technology, loneliness, work-life balance, travel, and micromanagement. There was also a group of people sharing research, insights, and experiences with each other, which indicates that people are still exploring this new work style and that there is more to be discovered.

Ethics and Limitations

  • We followed Twitter's protocols on data usage for data collection and analysis, so the collection of tweets itself raises no relevant ethical issues. However, we did not have direct consent from Twitter users to mine their opinions. Therefore, we have refrained from showing individual users' tweets and present only aggregate analysis results.
  • Some biases might arise from limitations in data collection and analysis. Because Twitter limits data collection, we were able to collect only a fraction of the tweets related to remote working.
  • Excluding tweets from accounts whose language is not English removed a large portion of tweets from many countries; accordingly, any insights, especially about sentiment, apply more to the U.S.
  • Other NLP preprocessing techniques could be added to improve the accuracy of aspect-based sentiment analysis, such as slang and sarcasm detection, emoji transformation, and coreference resolution.
  • In this study, we collected tweets from anyone using Twitter during the data collection period. Focusing the analysis on a specific group of people, for example people from a particular organization or place, could make the results more coherent and specific.

References:

Hansen, D., Shneiderman, B., & Smith, M. A. (2010). Analyzing social media networks with NodeXL: Insights from a connected world. Morgan Kaufmann.

Topic Modeling in Python with Gensim. (2020). Retrieved 26 October 2020, from https://www.machinelearningplus.com/nlp/topic-modeling-gensim-python/

Bansal, S. (2020). Beginners Guide to Topic Modeling in Python and Feature Selection. Retrieved 16 November 2020, from https://www.analyticsvidhya.com/blog/2016/08/beginners-guide-to-topic-modeling-in-python/

Hutto, C. J., & Gilbert, E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Proceedings of the 8th International Conference on Weblogs and Social Media, ICWSM 2014.

Kaur, C., & Sharma, A. (2020). Twitter Sentiment Analysis on Coronavirus using Textblob.

Ao, S. (2018). Sentiment Analysis Based on Financial Tweets and Market Information. 2018 International Conference on Audio, Language, and Image Processing (ICALIP), Shanghai, pp. 321–326. DOI: 10.1109/ICALIP.2018.8455771.
