Analyzing conversations by using transformers

Luis Carlos Manrique Ruiz
The Zeals Tech Blog
5 min read · May 6, 2022

The Covid-19 pandemic spread around the world and brought significant changes to our society. One of these is the possibility of working remotely while still fulfilling the expectations of stakeholders, clients, and investors. To support this, companies adopted different software for maintaining and improving communication, such as Google Meet, Microsoft Teams, and Slack.

At Zeals, many foreigners work in the development department, where the common language is English. In other departments, such as sales and business, the primary language remains Japanese.

Sometimes, long discussions involve members from different teams. These conversations contain needs and expectations that should be addressed. However, members rarely have time to interact face-to-face, and conversations spread across different channels and tools can easily be forgotten.

For that reason, the goal of this article is to present an analysis of a conversation held in an internal channel, using Natural Language Processing (NLP) techniques.

Let’s start

As the reader may know, multiple conversations take place every day on the different communication channels. This article presents some of the most important insights from one relevant discussion.

0. Preparation:

Date:

This conversation was held at the end of March of 2022.

Reason to analyze it:

  • Because it could help Zeals understand the needs, expectations, or thoughts of different departments.
  • It may help to understand feelings about specific situations from a scientific/mathematical point of view.

Length:

  • This conversation contains more than 50 interventions from members of 7 divisions in the company.

Language:

  • Since this conversation took place mainly in English, only the comments in that language are considered.

Software:

This analysis was conducted using Python 3.9.

1. Data Collection

The data is organized in a CSV file and contains some metadata fields provided by Slack, including message, user_id, and timestamp.

Adding further information related to departments, divisions, etc.

The information related to departments and divisions is collected from a different source and matched with the corresponding member.

Merging data frames
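The merge step can be sketched with pandas as follows. This is a minimal illustration rather than the original code: the sample rows, the division column, and the names messages_df, members_df, and merged_df are assumptions. Only message, user_id, and timestamp come from the Slack export described above.

```python
import io

import pandas as pd

# Hypothetical stand-in for the Slack CSV export (message, user_id, timestamp).
slack_csv = io.StringIO(
    "message,user_id,timestamp\n"
    "We should hire more graduates.,U001,1648512000\n"
    "Investors will ask about results.,U002,1648512060\n"
    "I agree with that plan.,U001,1648512120\n"
)
messages_df = pd.read_csv(slack_csv)

# Assumption: the timestamp column holds Unix epoch seconds.
messages_df["timestamp"] = pd.to_datetime(messages_df["timestamp"], unit="s")

# Hypothetical member directory collected from a different source.
members_df = pd.DataFrame(
    {"user_id": ["U001", "U002"], "division": ["System", "Product"]}
)

# A left join keeps every message, even if a member has no division on record.
# validate="many_to_one" fails loudly if a user_id appears twice in members_df.
merged_df = messages_df.merge(
    members_df, on="user_id", how="left", validate="many_to_one"
)
print(merged_df[["timestamp", "user_id", "division", "message"]])
```

A left join is used here so that no message is silently dropped when a member is missing from the directory; such rows simply get an empty division.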

2. Data cleansing

Some messages in the conversation were long, complex paragraphs. These were split into individual sentences, and each sentence was assigned a timestamp based on the conversation flow.
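One way to perform such a split is sketched below. The regular expression and the rule for assigning timestamps (each sentence inherits the timestamp of its message plus a small offset that preserves order) are assumptions made for illustration; the exact procedure used in the analysis is not specified.

```python
import re

# Split after ".", "!" or "?" when followed by whitespace. This is a simple
# heuristic and will mis-split abbreviations such as "e.g. this".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_message(message, timestamp, step=0.001):
    """Return (sentence, timestamp) pairs for one message.

    Assumption: sentence i gets ``timestamp + i * step`` seconds, so the
    original order inside the message is preserved.
    """
    sentences = [s.strip() for s in _SENTENCE_BOUNDARY.split(message) if s.strip()]
    return [(s, timestamp + i * step) for i, s in enumerate(sentences)]


rows = split_message("We need more people. Can we hire graduates? Yes!", 1648512000.0)
for sentence, ts in rows:
    print(f"{ts:.3f}  {sentence}")
```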

Also, stop words were dropped from the corpus. Following Dr. Ganesan’s publication:

Stop words are basically a set of commonly used words in any language, not just English.

The reason why stop words are critical to many applications is that, if we remove the words that are very commonly used in a given language, we can focus on the important words instead.

Taken from: What are Stop Words?

Stop words are mainly determiners, coordinating conjunctions, and prepositions; the reader may go deeper into their definitions by checking the references. For our analysis, we also added custom stop words, such as names.

Adding custom names and other words:

e.g., name1, name2, name3, …
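The filtering step can be sketched in plain Python as follows. The short BASE_STOP_WORDS set is only a hand-written stand-in for a full stop word list such as those shipped with NLP libraries, and name1, name2, name3 are the placeholders from above, not real names.

```python
import re

# Tiny illustrative list; a real analysis would use a complete stop word list.
BASE_STOP_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at",
    "to", "for", "with", "is", "are", "it", "this", "that",
}
# Custom additions, e.g. the names of the people in the conversation.
CUSTOM_STOP_WORDS = {"name1", "name2", "name3"}
STOP_WORDS = BASE_STOP_WORDS | CUSTOM_STOP_WORDS


def remove_stop_words(text, stop_words=STOP_WORDS):
    """Lowercase the text, tokenize it, and drop every stop word."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [token for token in tokens if token not in stop_words]


kept = remove_stop_words("Name1 said that the hiring plan is good for the business")
print(kept)
```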

3. Sentiment Analysis

Sentiment analysis is a popular tool for analyzing different kinds of texts. It estimates how negative or positive a text is.

For our study, we use the transformers library and the sentiment analysis function provided by the DistilBERT base uncased fine-tuned SST-2 model.

We used Google Colab to install and execute the models.

Installing and loading sentiment analysis function
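A minimal sketch of this step is shown below, assuming the transformers package and a backend such as PyTorch are installed (for example with pip install transformers in a Colab cell). The checkpoint name distilbert-base-uncased-finetuned-sst-2-english is the Hugging Face Hub identifier matching the model named above; the helper names load_classifier, positive_probability, and score_messages are hypothetical.

```python
def load_classifier():
    """Load the sentiment pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below work without transformers installed.
    from transformers import pipeline

    return pipeline(
        "sentiment-analysis",
        model="distilbert-base-uncased-finetuned-sst-2-english",
    )


def positive_probability(result):
    """Convert one pipeline result into the probability of POSITIVE.

    The pipeline returns the winning label and its score, for example
    {"label": "NEGATIVE", "score": 0.98}, so NEGATIVE scores are flipped.
    """
    score = float(result["score"])
    return score if result["label"] == "POSITIVE" else 1.0 - score


def score_messages(messages, classifier=None):
    """Return one positive-class probability per message."""
    if classifier is None:
        classifier = load_classifier()
    return [positive_probability(result) for result in classifier(list(messages))]
```

Calling score_messages(["I love this idea.", "This is a terrible plan."]) loads the model and returns two probabilities. Mapping everything onto a single positive-class scale is a design choice made here so that one number per message can be plotted and thresholded.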

Let’s take a look at some examples:

Example of expected results

Next, we defined a threshold range to identify neutral messages. In theory, a single cut-off of 0.5 could be used, but for our investigation we treated scores between 0.4 and 0.7 as neutral.
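The thresholding rule can be expressed as a small function. This sketch assumes the score is the probability of the positive class, so that values below 0.4 are negative, values from 0.4 to 0.7 are neutral, and values above 0.7 are positive. How the boundary values themselves were handled in the analysis is not stated, so including them in the neutral band is an assumption, as is the function name label_sentiment.

```python
def label_sentiment(p_positive, low=0.4, high=0.7):
    """Map a positive-class probability to negative / neutral / positive."""
    if not 0.0 <= p_positive <= 1.0:
        raise ValueError("p_positive must be between 0 and 1")
    if p_positive < low:
        return "negative"
    if p_positive <= high:
        return "neutral"
    return "positive"


for p in (0.05, 0.55, 0.93):
    print(p, label_sentiment(p))
```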

Sentiment analysis over time

Following this definition, negative comments are shown in red, neutral ones in black, and the remaining (positive) ones in blue.
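A plot with this color scheme could be produced along the following lines. This is only a sketch assuming matplotlib is available; the function name plot_sentiment_over_time, the axis labels, and the shaded neutral band are illustrative choices, not a reconstruction of the original figure.

```python
COLOR_BY_LABEL = {"negative": "red", "neutral": "black", "positive": "blue"}


def plot_sentiment_over_time(timestamps, scores, labels, ax=None):
    """Scatter sentiment scores over time, colored by sentiment label."""
    # Imported lazily so the color mapping can be reused without matplotlib.
    import matplotlib.pyplot as plt

    if ax is None:
        _, ax = plt.subplots(figsize=(10, 4))
    ax.scatter(timestamps, scores, c=[COLOR_BY_LABEL[label] for label in labels])
    # Shade the band treated as neutral (0.4 to 0.7).
    ax.axhspan(0.4, 0.7, color="gray", alpha=0.15)
    ax.set_xlabel("Time")
    ax.set_ylabel("Sentiment score")
    return ax
```

The function returns the axes object, so titles or annotations can be added before calling plt.show().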

The analysis of this conversation shows a mean of 50.18% positive sentiment and 49.12% negative sentiment.

Violin plot of sentiments

4. Wordcloud

To plot the word cloud, we can proceed in the following way:

The data frame called orgDf contains the information to be analyzed and shown.
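A possible implementation is sketched below, assuming the wordcloud package is installed. The helper names word_frequencies and build_word_cloud are hypothetical, and the column names message and division on orgDf are assumptions, since the original code is not reproduced here.

```python
import re
from collections import Counter


def word_frequencies(messages, stop_words=frozenset()):
    """Count lowercase word tokens across messages, skipping stop words."""
    counts = Counter()
    for message in messages:
        tokens = re.findall(r"[a-z']+", message.lower())
        counts.update(token for token in tokens if token not in stop_words)
    return counts


def build_word_cloud(frequencies):
    """Render a word cloud from a {word: count} mapping."""
    # Imported lazily so the counting helper works without the package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    return cloud.generate_from_frequencies(frequencies)


# Example with hypothetical messages. With the article's data frame this could
# be orgDf["message"], or one group of orgDf.groupby("division")["message"]
# (both column names are assumptions).
freqs = word_frequencies(
    ["Investors ask about the business", "The business will hire graduates"],
    stop_words={"the", "about"},
)
print(freqs.most_common(3))
```

The returned cloud can be displayed with matplotlib, for instance plt.imshow(build_word_cloud(freqs)) followed by plt.axis("off"). Counting words first and rendering second keeps the frequencies available for inspection per division.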

General

The results from some divisions in the company are shown as follows:

Culture and experience

For the Culture and experience division, some words are worth mentioning: will, result, and securities appear to be essential for them.

Product division

For the Product division, words such as business, investors, hire, and grad were often used during the conversation.

System division

For the System division, words such as developer, experience, graduate, career, and know were relevant in the conversation.

Summary

  • It was possible to estimate the sentiment of this conversation.
  • Using word clouds, it is possible to understand the needs or ideas of different departments or divisions.
  • Further improvements such as lemmatization and stemming can be implemented.
  • Finally, we can analyze any conversation and extract valuable insights from it.

References

Grammatics

Transformers
