Unveiling Insights from Amazon Employee Reviews: A Deep Dive into Sentiment Analysis and Topic Modeling

Jammal Adeyemi
20 min read · Sep 5, 2023


Photo by Jason Goodman on Unsplash

Introduction

In the age of e-commerce giants, understanding employee sentiments can be as crucial as deciphering customer preferences. With vast datasets at our disposal, we stand at the brink of a data-driven revolution. Today, we embark on a journey into the world of Amazon Employee Reviews, armed with the task of developing a system that can unravel the intricate tapestry of emotions and topics hidden within.

A Brief Overview of the Task: Working on Amazon Employee Reviews Dataset

At the heart of this mission lies a dataset, “Amazon Jan 2023.csv,” waiting to reveal its tales of employee experiences, opinions, and perspectives. Our task is clear: to develop a system that can not only discern the positive, neutral, and negative sentiments lurking within these comments but also untangle the web of topics that occupy the thoughts and discussions of Amazon’s dedicated workforce.

The Significance of Sentiment Analysis and Topic Modeling in the Corporate World

Before we dive into the specifics of our endeavor, let’s pause to reflect on the broader significance of sentiment analysis and topic modeling in today’s corporate world. These two powerful tools have transformed the way organizations perceive and engage with their employees.

Sentiment analysis, the art of decoding emotions from text, provides invaluable insights into employee morale, job satisfaction, and overall sentiment. It allows organizations to address concerns, boost employee engagement, and ultimately foster a more positive work environment.

Topic modeling, on the other hand, goes beyond sentiment, helping organizations identify the key subjects that occupy the minds of their workforce. By understanding these topics, companies can tailor their strategies, policies, and communication to align with employee interests and concerns.

In the pages that follow, I will unravel the techniques and methodologies employed in sentiment analysis and topic modeling. We will witness firsthand the transformation of raw data into actionable insights, with each comment serving as a window into the collective consciousness of Amazon’s employees.

Join me on this enlightening journey as I navigate the Amazon Employee Reviews dataset, decode sentiments, and unveil the topics that shape the experiences of those who power one of the world’s leading e-commerce giants. Welcome to the intersection of data and corporate understanding.

Data Understanding

  1. Installing Dependencies: To embark on this analytical journey, we first needed to ensure we had the necessary tools in our toolbox. We installed key dependencies, including bertopic and transformers, to facilitate our data analysis.

  2. Importing Dependencies: Our analytical toolkit comprised an array of libraries, spanning data manipulation, natural language processing (NLP), and machine learning (see the combined sketch after this list).

  3. Reading the Data: The cornerstone of our analysis was the Amazon Employee Reviews dataset, aptly named “Amazon Jan 2023.csv.” We loaded the dataset into our working environment for exploration.

  4. Data Overview: Before delving into the data itself, we conducted a quick review to understand its structure:

  • df.info(): This revealed that our dataset consists of a single column, "Comment," stored as objects, and contains 99 rows.
  • No missing values were detected in the dataset, so it was ready for analysis.
  • Exploring a data sample: To get a feel for the data, we peeked at a few random comments.
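
Below is a minimal sketch of these setup steps, assuming pandas for data handling and that “Amazon Jan 2023.csv” sits in the working directory (the sample size and random state are illustrative):

```python
# Install key dependencies (run once in a notebook environment)
# !pip install bertopic transformers

import pandas as pd

# Read the Amazon Employee Reviews dataset
df = pd.read_csv("Amazon Jan 2023.csv")

# Quick structural overview: a single "Comment" column with 99 rows
df.info()

# Confirm there are no missing values
print(df.isnull().sum())

# Peek at a few random comments to get a feel for the data
print(df["Comment"].sample(5, random_state=42))
```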

With a firm foundation laid for our analysis, we were ready to dive into the world of sentiment analysis and topic modeling. Stay tuned as we unravel the hidden insights buried within Amazon Employee Reviews.

Sentiment Analysis

In our quest to decode the Amazon Employee Reviews dataset, we begin with the foundational concept of “sentiment analysis.” This powerful technique, also known as opinion mining, plays a pivotal role in understanding the emotional tone and context of text data. Let’s dive into the heart of sentiment analysis and explore why it’s a crucial component of our analysis.

Understanding Sentiment Analysis
Sentiment analysis is the art of harnessing the power of language to uncover the emotions, opinions, and attitudes expressed in text. At its core, it seeks to answer a fundamental question: Is the text expressing a positive, negative, or neutral sentiment?

  • Positive Sentiment: When a comment exudes positivity, it typically reflects satisfaction, contentment, or enthusiasm. In the context of Amazon Employee Reviews, positive sentiments could signal a happy and fulfilling work experience.
  • Neutral Sentiment: A neutral sentiment implies a lack of strong emotion or opinion. These comments often fall into the realm of objective statements or straightforward descriptions, without leaning heavily towards positivity or negativity.
  • Negative Sentiment: Conversely, negative sentiments convey dissatisfaction, disappointment, or criticism. In the dataset, these comments might highlight areas of concern or dissatisfaction with Amazon's work environment or policies.

The Importance of Sentiment Analysis
Why is sentiment analysis so crucial, especially in the corporate world? Here are a few key reasons:

  1. Employee Engagement: Sentiment analysis can be a powerful tool for gauging employee engagement and job satisfaction. By analyzing employee feedback, organizations can identify areas for improvement and take proactive measures to enhance workplace conditions.
  2. Customer-Centric Approach: For customer-centric businesses like Amazon, employee sentiment can have a direct impact on customer experiences. Satisfied employees are more likely to deliver exceptional service, which in turn benefits customers.
  3. Risk Mitigation: Early detection of negative sentiments can help organizations address potential issues before they escalate. This proactive approach minimizes risks associated with employee turnover and disengagement.

As we move forward in our analysis of Amazon Employee Reviews, sentiment analysis will serve as our compass, guiding us through the emotional landscape of employee feedback. With each comment, we’ll uncover the underlying sentiments that shape the narratives within the dataset.

Sentiment Analysis with RoBERTa: Unveiling Employee Sentiments

In the realm of Natural Language Processing (NLP), RoBERTa stands as a powerful ally, enabling us to extract sentiment insights from Amazon Employee Reviews with precision. Let’s dissect the code step by step to understand how this process unfolds.

  • Load Model and Tokenizer: Our journey begins by tapping into the capabilities of the “cardiffnlp/twitter-roberta-base-sentiment” model. To do this, we load both the model and its corresponding tokenizer, which will help us preprocess and analyze the text data:
  • Generating Sentiment Scores: Now, we’re ready to dive into the heart of sentiment analysis. We define a function, polarity_scores_roberta(comment), which takes a comment as input and returns sentiment scores. These scores represent the likelihood of the comment falling into each sentiment category (Negative, Neutral, Positive).

The polarity_scores_roberta function uses the `tokenizer` to encode the `comment` into a tensor, passes that tensor through the `model`, applies a `softmax` to the resulting logits along the class axis, and detaches the output as a NumPy array of probabilities for the three sentiment classes.
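
A minimal sketch of this setup, assuming the Hugging Face transformers library and SciPy’s softmax (variable names are illustrative):

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from scipy.special import softmax

MODEL = "cardiffnlp/twitter-roberta-base-sentiment"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def polarity_scores_roberta(comment):
    # Encode the comment and return a PyTorch tensor
    encoded = tokenizer(comment, return_tensors="pt", truncation=True)
    # Run the model, apply softmax to the logits, detach to a NumPy array
    output = model(**encoded)
    scores = softmax(output.logits.detach().numpy(), axis=1)[0]
    # For this model: index 0 = Negative, 1 = Neutral, 2 = Positive
    return {"Negative": scores[0], "Neutral": scores[1], "Positive": scores[2]}
```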

  • Mapping Sentiment Labels: Next, we define a function, analyze_sentiment(comment), which takes a comment as input and returns the corresponding sentiment label (Negative, Neutral, Positive). Here's how it works:

The `analyze_sentiment` function passes the comment to the tokenizer, along with a few additional parameters, to produce tokenized inputs suitable for the model. The model is then called with these inputs, and the `.logits` attribute gives the raw output scores for the three classes. The index of the highest score determines the sentiment: index 0 maps to Negative, 1 to Neutral, and 2 to Positive, and the function returns that label.
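
A sketch along these lines, reusing the tokenizer and model loaded above (the dictionary-based label mapping is an assumption about the original implementation):

```python
import torch

def analyze_sentiment(comment):
    # Tokenize the comment with truncation and padding for the model
    inputs = tokenizer(comment, return_tensors="pt", truncation=True, padding=True)
    # Raw output scores (logits) for the three sentiment classes
    logits = model(**inputs).logits
    # The highest-scoring index decides the label
    idx = int(torch.argmax(logits, dim=1))
    return {0: "Negative", 1: "Neutral", 2: "Positive"}[idx]

# Store the predicted label for every comment in a new "Sentiment" column
df["Sentiment"] = df["Comment"].apply(analyze_sentiment)
```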

With these functions in place, we’ve harnessed the power of RoBERTa to analyze employee sentiments in the Amazon Employee Reviews dataset. The “Sentiment” column in our data now provides clear insights into whether each comment expresses a Negative, Neutral, or Positive sentiment.

Results

After applying RoBERTa-based sentiment analysis to the Amazon Employee Reviews dataset, we were able to uncover the emotions and sentiments hidden within the comments. Let’s take a closer look at some of the results to gain a deeper understanding of our findings.

  • Positive Sentiment: In the world of employee reviews, positivity often reflects job satisfaction and contentment. Let’s examine a comment that was classified as “Positive” and explore the sentiment scores associated with it:
  • Neutral Sentiment: In the realm of employee feedback, neutrality often indicates a balanced and objective perspective. Let’s explore a comment classified as “Neutral” and delve into the sentiment scores associated with it:
  • Overall Sentiment Distribution: To gain a comprehensive overview of the sentiments expressed in the entire dataset, let’s take a glance at the sentiment distribution:
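
The specific comments and counts come from the dataset itself, so none are hard-coded here; the sketch below simply shows how such examples and the overall distribution can be pulled out, assuming seaborn for plotting and at least one comment per label:

```python
import seaborn as sns
import matplotlib.pyplot as plt

# Example comments for a given label, with their RoBERTa probability scores
positive_example = df.loc[df["Sentiment"] == "Positive", "Comment"].iloc[0]
print(positive_example, polarity_scores_roberta(positive_example))

neutral_example = df.loc[df["Sentiment"] == "Neutral", "Comment"].iloc[0]
print(neutral_example, polarity_scores_roberta(neutral_example))

# Overall sentiment distribution
print(df["Sentiment"].value_counts())
sns.countplot(data=df, x="Sentiment", order=["Negative", "Neutral", "Positive"])
plt.title("Sentiment distribution of Amazon employee comments")
plt.show()
```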

One observation that stands out prominently is the significant volume of negative sentiments. As we delved into the dataset, we encountered a substantial number of comments expressing discontent and criticism. While Amazon’s reputation as an e-commerce giant is well-established, these negative sentiments serve as a reminder that even industry leaders must continuously strive to address concerns and create a supportive work environment.
The presence of negative sentiments should not be seen solely as a challenge but as an opportunity for growth and improvement. They provide organizations like Amazon with valuable feedback, highlighting areas that require attention. It’s essential to view these sentiments as constructive criticism, an invaluable resource for shaping a brighter future for both the company and its employees.

Our journey doesn’t end here. Armed with sentiment analysis, we’ll now venture into the realm of topic modeling, where we unravel the subjects that occupy the minds of Amazon’s dedicated employees. Stay tuned for the next chapter in our data-driven exploration.

Topic Modeling

In our quest to delve deeper into the Amazon Employee Reviews dataset, we've ventured into the fascinating world of topic modeling. This technique allows us to uncover the underlying themes and subjects that occupy the minds of Amazon's dedicated employees. We explored two distinct approaches: Latent Semantic Analysis (LSA) and BERTopic, each offering unique insights into the dataset.

What is Topic Modeling?

At its core, topic modeling is a machine learning technique designed to uncover the underlying themes, subjects, or topics within a collection of text documents. It’s a data-driven approach that goes beyond simple keyword analysis, delving into the nuances of language to identify patterns and group related words and phrases into coherent themes.
Think of topic modeling as a magnifying glass for text data. It helps us unearth the latent structures within a corpus of documents, revealing the key discussions, ideas, or subjects that people are talking about. These topics are not predefined; they emerge organically from the data, making topic modeling a versatile tool for uncovering hidden insights.

Why is it Valuable in Understanding Employee Feedback?

Understanding employee feedback is a vital aspect of organizational development and growth. Employee sentiments and opinions can provide a treasure trove of insights that help companies enhance their work environment, policies, and overall employee satisfaction. This is where topic modeling steps in as an invaluable ally:

  • Uncovering Themes: Employee feedback is often a diverse mix of thoughts, concerns, and suggestions. Topic modeling helps us categorize this feedback into meaningful themes or topics. For example, it can reveal whether employees are discussing work-life balance, compensation, management, or other critical aspects of their experience.
  • Quantifying Sentiments: Beyond identifying topics, topic modeling allows us to quantify the prevalence of each theme within the dataset. We can gauge the frequency and distribution of positive, negative, or neutral sentiments associated with each topic. This quantitative insight aids in prioritizing areas of improvement.
  • Actionable Insights: By understanding the prevalent topics and sentiments, organizations can take concrete actions to address concerns and enhance positive aspects of the workplace. Whether it’s optimizing work hours, improving management practices, or enhancing benefits, topic modeling guides strategic decision-making.

As we embark on our journey through topic modeling, we will witness firsthand how this technique transforms raw text data into structured insights, enabling us to navigate the intricate landscape of employee feedback within the Amazon Employee Reviews dataset.

Preparing Text Data for Topic Modeling

Before we can embark on our journey of uncovering topics within employee feedback, we need to ensure that our text data is in optimal shape for analysis. This entails a series of crucial preprocessing steps to clean and tokenize the text, making it ready for topic modeling.

Cleaning and Tokenizing Text Data
To prepare the text data, we’ve developed a comprehensive preprocessing function that performs the following key tasks:
1. Lowercasing: The text is converted to lowercase to ensure consistency and avoid case-related variations.
2. Punctuation Removal: Punctuation marks, which do not typically carry meaningful information, are removed. This step streamlines the text and ensures that punctuation doesn’t interfere with topic modeling.
3. Tokenization: We employ the TweetTokenizer, a specialized tokenizer, to split the text into individual words or tokens. Unlike standard tokenization methods, this tokenizer is well-suited for social media and informal text, capturing the nuances of language effectively.
4. Stopword Removal: Common stopwords such as “the,” “and,” “is,” and more are often filtered out. These words are frequently used but contribute little to the overall meaning of the text. We also extend this list with custom stopwords that are specific to our analysis.
5. Lemmatization: Lemmatization is applied to reduce words to their base forms. This ensures that words with different inflections (e.g., “running” and “ran”) are treated as the same word, reducing redundancy and aiding in topic identification.
6. Rejoining Tokens: After the cleaning and processing steps, the tokens are joined back together to form coherent, preprocessed text.

Now, let’s take a look at the code that implements these text preparation steps:
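
A sketch of such a preprocessing function, assuming NLTK (the custom stopword additions shown are illustrative, not the original list):

```python
import string
import nltk
from nltk.tokenize import TweetTokenizer
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

tweet_tokenizer = TweetTokenizer()
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words("english"))
stop_words.update({"company", "employer"})  # hypothetical custom stopwords

def preprocess(text):
    text = text.lower()                                                # 1. lowercase
    text = text.translate(str.maketrans("", "", string.punctuation))  # 2. remove punctuation
    tokens = tweet_tokenizer.tokenize(text)                           # 3. tokenize
    tokens = [t for t in tokens if t not in stop_words]               # 4. drop stopwords
    tokens = [lemmatizer.lemmatize(t) for t in tokens]                # 5. lemmatize
    return " ".join(tokens)                                           # 6. rejoin tokens

df["clean_text"] = df["Comment"].apply(preprocess)
```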

With our text data now cleaned and tokenized into a more structured and coherent format, we’re well-equipped to proceed with our topic modeling endeavors. In the next sections, we’ll explore the intricacies of Latent Semantic Analysis (LSA) and BERTopic, which will help us unveil the thematic threads woven within Amazon’s employee reviews.

Implementing Topic Modeling

In this section, we will dive into the practical aspects of implementing topic modeling, a crucial step in unveiling the underlying themes and subjects within Amazon employee reviews. We’ll explore two powerful techniques: Latent Semantic Analysis (LSA) and BERTopic. Let’s get started.

Topic Modeling using Latent Semantic Analysis (LSA)

Step 1: TF-IDF Vectorization
Our journey begins with transforming the preprocessed text data into a numerical format suitable for LSA. We use TF-IDF (Term Frequency-Inverse Document Frequency) vectorization, a common choice for text analysis. Here’s how we did it:
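
A sketch of the vectorization step with scikit-learn (max_features is an assumed setting):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Fit TF-IDF on the preprocessed comments
vectorizer = TfidfVectorizer(max_features=1000)
tfidf_matrix = vectorizer.fit_transform(df["clean_text"])
feature_names = vectorizer.get_feature_names_out()
```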

Step 2: Finding Optimal K (Number of Topics)
To determine the optimal number of topics (n_components), we employ the "Explained Variance" method. This involves calculating explained variances for a range of n_components values and plotting the results. The ideal n_components value is where the explained variance starts to level off or reaches a point of diminishing returns:
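
A sketch of the explained-variance sweep (the range of candidate values is an assumption):

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import TruncatedSVD

components_range = range(2, 21)
explained = []
for k in components_range:
    svd = TruncatedSVD(n_components=k, random_state=42)
    svd.fit(tfidf_matrix)
    explained.append(svd.explained_variance_ratio_.sum())

plt.plot(list(components_range), explained, marker="o")
plt.xlabel("n_components")
plt.ylabel("Cumulative explained variance")
plt.title("Explained variance vs. number of LSA topics")
plt.show()
```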

The graph helps us decide the optimal number of topics for LSA. In our case the curve was hard to read definitively, so we tried several n_components values to see what worked best.

Step 3: Applying LSA
Now, we apply LSA with the chosen n_components to extract topics from the TF-IDF matrix. We also print the generated topics and their associated keywords:
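
A sketch of this step; ten components are assumed here to match the ten topics interpreted later in the article:

```python
from sklearn.decomposition import TruncatedSVD

n_topics = 10
lsa = TruncatedSVD(n_components=n_topics, random_state=42)
lsa_matrix = lsa.fit_transform(tfidf_matrix)

# Print the top keywords for each topic
for i, component in enumerate(lsa.components_):
    top_terms = [feature_names[idx] for idx in component.argsort()[::-1][:10]]
    print(f"Topic {i + 1}: {', '.join(top_terms)}")
```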

Step 4: Organizing Topic Assignments and Labeling
To make the topic assignments more comprehensible and informative, we create a structured DataFrame that labels each review with its assigned topics based on the highest weight. Here’s how we achieve this:
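
A sketch of the assignment step; the column names are illustrative:

```python
import pandas as pd

topic_columns = [f"Topic {i + 1}" for i in range(n_topics)]
lsa_df = pd.DataFrame(lsa_matrix, columns=topic_columns)
lsa_df["Comment"] = df["Comment"].values

# Label each review with the topic carrying the highest weight
lsa_df["Assigned Topic"] = lsa_df[topic_columns].values.argmax(axis=1) + 1
```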

While LSA has provided us with valuable insights by generating topics based on our initial assumption of the number of components, it’s essential to confirm and validate these topics using an alternative approach. To achieve this, we will turn to BERTopic.

In the next section, we will explore BERTopic, another robust topic modeling technique, to verify and potentially refine our topic assignments. This additional step will ensure the accuracy and reliability of our topic modeling results.

Topic Modeling using BERTopic

BERTopic leverages BERT (Bidirectional Encoder Representations from Transformers), a state-of-the-art language model, to generate rich contextual embeddings for words and documents. This enables BERTopic to capture intricate relationships and meanings within the text, resulting in more accurate topic assignments.
It integrates `UMAP` for dimensionality reduction, further enhancing its ability to visualize and interpret high-dimensional data. This combination of BERT-based embeddings and UMAP-driven visualization makes BERTopic a powerful choice for topic modeling tasks involving text data like comments, reviews, and articles.

Step 1: UMAP Dimensionality Reduction and BERTopic
For BERTopic, we begin by reducing the dimensionality of the text data using UMAP. Then, we apply the BERTopic model. Here’s how it’s done:
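
A sketch of this setup; the UMAP parameters shown are common choices rather than the exact ones used originally:

```python
from umap import UMAP
from bertopic import BERTopic

# Dimensionality reduction for the document embeddings
umap_model = UMAP(n_neighbors=15, n_components=5, min_dist=0.0,
                  metric="cosine", random_state=42)

# Fit BERTopic on the preprocessed comments
topic_model = BERTopic(umap_model=umap_model, language="english",
                       calculate_probabilities=True)
topics, probs = topic_model.fit_transform(df["clean_text"].tolist())
```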

UMAP (Uniform Manifold Approximation and Projection) is a dimensionality reduction technique that’s often used in combination with `BERTopic` for visualization purposes. It reduces the high-dimensional vector representations generated by BERT embeddings into a lower-dimensional space while preserving the underlying structure of the data as much as possible.

Step 2: Exploring Topics
After applying BERTopic, we explore the extracted topics. We can visualize the top keywords for each topic and gain insights into the thematic content:
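
For example, using BERTopic’s built-in accessors:

```python
# Overview of all discovered topics: id, size, and an auto-generated name
print(topic_model.get_topic_info())

# Top keywords (with weights) for a single topic
print(topic_model.get_topic(0))

# Interactive bar chart of the top keywords per topic
topic_model.visualize_barchart()
```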

Step 3: Topic Assignments and Selection
After successfully applying BERTopic, we proceed to assign topics to each review and select the most relevant columns for analysis. Here’s how it’s done:
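
A sketch of attaching the assignments and keeping the relevant columns (the selected column names are illustrative):

```python
import pandas as pd

# Topic id assigned by BERTopic for each review
df["BERTopic"] = topics

# Per-document details, including the auto-generated topic name
doc_info = topic_model.get_document_info(df["clean_text"].tolist())

result = pd.concat(
    [df[["Comment", "Sentiment", "BERTopic"]].reset_index(drop=True),
     doc_info[["Name"]].reset_index(drop=True)],
    axis=1,
)
print(result.head())
```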

With these final steps, we have successfully assigned topics to each review using BERTopic. The selected columns now offer a more detailed insight into the nature of these topics, making it easier to analyze and interpret the results.

Choosing the Right Model: LSA vs. BERTopic

When it comes to topic modeling, selecting the right technique is crucial for obtaining meaningful insights from the data. In our analysis of Amazon employee reviews, we explored two prominent approaches: Latent Semantic Analysis (LSA) and BERTopic. Each method has its own set of advantages and drawbacks, and understanding their pros and cons helps in making an informed choice.

Latent Semantic Analysis (LSA)

Pros:
1. Simplicity: LSA is relatively straightforward to implement. It involves techniques such as TF-IDF vectorization and Singular Value Decomposition (SVD), making it accessible to those new to topic modeling.
2. Interpretability: LSA provides easily interpretable results. Each topic consists of a list of words that are strongly associated with that topic, allowing for straightforward topic labeling and interpretation.
3. Scalability: LSA can handle large datasets with ease, making it suitable for analyzing extensive collections of text documents.

Cons:
1. Limited Context: LSA’s ability to capture the context and semantics of words and phrases is limited. It primarily relies on term-frequency statistics and may not capture more complex linguistic relationships.
2. Topic Overlap: LSA topics can overlap, resulting in less distinct and clear-cut topics. This can make it challenging to assign unique interpretations to each topic.
3. Hyperparameter Tuning: Determining the optimal number of topics (n_components) in LSA can be challenging and often involves manual tuning or heuristic approaches.

BERTopic

Pros:
1. Contextual Understanding: BERTopic leverages BERT embeddings, a state-of-the-art language model. This enables it to capture rich contextual information and intricate relationships between words, resulting in more accurate and context-aware topics.
2. Distinct Topics: BERTopic tends to produce more distinct and semantically meaningful topics. It can identify subtle differences in the meaning of words and phrases, leading to clearer topic assignments.
3. Dimensionality Reduction: BERTopic incorporates dimensionality reduction techniques like UMAP, which helps visualize high-dimensional data in a lower-dimensional space while preserving important patterns.

Cons:
1. Complexity: BERTopic is more complex to implement compared to LSA. It requires pre-trained models, such as BERT, and may involve hyperparameter tuning for UMAP.

Choosing the Right Model
The choice between LSA and BERTopic ultimately depends on your specific objectives, dataset characteristics, and the level of interpretability you require. If you prioritize simplicity, scalability, and a basic understanding of topics, LSA may be a suitable choice. However, if you aim for more nuanced, context-aware topics with the ability to handle complex language patterns, BERTopic is the way to go.

In our analysis, we combined both techniques to cross-validate results and ensure the robustness of our findings, reflecting a commitment to rigor in extracting valuable insights from the Amazon employee reviews dataset.

In the world of topic modeling, there’s no one-size-fits-all solution. The right choice depends on the specific nuances of your data and research goals. By leveraging the strengths of both LSA and BERTopic, we’ve demonstrated a comprehensive and insightful approach to understanding employee feedback.

Interpreting Topic Results

Interpreting and labeling the identified topics in topic modeling is a critical step to make sense of the underlying themes within employee feedback. Here, we’ll examine the topics generated by both LSA and BERTopic and provide interpretations for each one. Assigning topics to employee feedback can help you understand the predominant concerns and sentiments expressed by Amazon employees.

LSA Generated Topics:

Topic 1: Work-Life Balance
Keywords: work, good, hour, great, time, working, Amazon, job, life, place
Interpretation: This topic reflects positive sentiments related to the work environment at Amazon. Employees in this category seem to appreciate the overall atmosphere, work hours, and the company as a workplace.

Topic 2: Long Hours and Draining Work
Keywords: long, hour, draining, break, become, day, personally, gap, short, week
Interpretation: Employees in this topic express concerns about long and draining working hours, emphasizing the need for breaks and work-life balance improvements.

Topic 3: Balanced Work-Life Culture
Keywords: life, culture, work, good, great, place, balance, balanced, overtime, everything
Interpretation: This topic highlights a positive work-life culture, where employees appreciate the balance between work and personal life, as well as the overall positive aspects of the workplace.

Topic 4: Salary and Pay Concerns
Keywords: salary, good, low, little, competitive, manager, culture, pay, peer, day
Interpretation: Employees in this category raise concerns about salary and pay, indicating a mix of positive and negative sentiments regarding compensation and management.

Topic 5: Employee Well-being and Satisfaction
Keywords: care, long, job, dont, salary, low, great, draining, everything, employee
Interpretation: This topic focuses on employee well-being and satisfaction, with mentions of concerns related to long hours, salary, and the overall work experience.

Topic 6: Overtime and Work-Life Balance
Keywords: balance, life, overtime, mandatory, people, time, salary, led, enjoy, excellent
Interpretation: Employees in this topic discuss the balance between overtime and work-life, highlighting both positive and negative aspects, including mandatory overtime.

Topic 7: Pay and Job Satisfaction
Keywords: job, pay, care, amount, good, week, one, manager, time, worst
Interpretation: This topic centers around pay and job satisfaction, with mentions of pay amounts, managerial aspects, and overall job experiences.

Topic 8: Employee Benefits and Paid Leave
Keywords: employee, break, care, day, paid, place, week, salary, low, would
Interpretation: Employees in this category discuss employee benefits, paid leave, and their experiences in the workplace, including salary-related concerns.

Topic 9: Shifts and Work Culture
Keywords: time, great, shift, everything, low, people, pay, culture, feel, perk
Interpretation: This topic explores shift-related aspects, work culture, and employee perceptions, including mentions of low pay and perks.

Topic 10: Learning and Development Opportunities
Keywords: learn, management, place, fun, employee, run, flexible, awful, opportunity, work
Interpretation: This topic focuses on opportunities for learning and development, including management, workplace culture, and the overall work experience.

Interpreting BERTopic Results:

BERTopic 0: Work Environment and Management
Keywords: work, time, management, good, day, hour, job, break, place, like
Interpretation: This topic appears to revolve around aspects related to the work environment, time management, and daily job experiences. Employees in this category discuss the quality of work, time-related concerns, and the overall atmosphere at Amazon.

BERTopic 1: Team and Company Experience
Keywords: amazon, working, work, team, loved, ive, thing, ppl, great, people
Interpretation: This topic seems to focus on employees’ experiences working with teams and their overall impressions of the company. It includes mentions of teamwork, positive sentiments, and people-related aspects.

The Correlation Between LSA Generated Topics and BERTopic

Analyzing the correlation between the topics generated by Latent Semantic Analysis (LSA) and BERTopic can provide valuable insights into the differences in their approaches and outcomes.

Why BERTopic Generated Fewer Topics Than LSA

  1. Model Complexity: BERTopic is based on state-of-the-art language models like BERT (Bidirectional Encoder Representations from Transformers). These models are highly complex and capture intricate relationships within the text. However, their complexity can sometimes result in a more consolidated representation of topics. In contrast, LSA is a simpler technique that relies on matrix factorization and may lead to a finer-grained division of topics.
  2. Dimensionality Reduction: BERTopic uses Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction. UMAP aims to preserve the overall structure of data while reducing its dimensionality. Depending on the UMAP parameters chosen, it may prioritize maintaining global structure over local variations, resulting in fewer, more consolidated topics.
  3. Text Embeddings: BERTopic leverages BERT embeddings, which capture the context and semantics of words and phrases exceptionally well. These embeddings tend to group similar words and concepts together, which can result in fewer distinct topics.

In summary, BERTopic generated fewer topics than LSA largely because of its data-driven design: rich contextual embeddings and UMAP-based dimensionality reduction tend to consolidate related comments, so the number of topics emerges from the structure of the data rather than from a preset value. It’s essential to choose a topic modeling approach that aligns with the goals and nature of the data analysis.

Utilizing the Results: Improving Employee Satisfaction and Engagement at Amazon

The insights gained from sentiment analysis and topic modeling of employee reviews on Amazon can be invaluable for Amazon in several ways. Leveraging these findings can lead to substantial improvements in employee satisfaction and engagement, ultimately benefiting both the company and its workforce. Here are some practical applications:

  1. Identifying Pain Points: By analyzing the sentiments expressed in employee reviews, Amazon can pinpoint specific areas where employees are dissatisfied or facing challenges. These pain points can range from work-related issues to concerns about work-life balance, compensation, or management.
  2. Topic-Based Action Plans: With the results of topic modeling, Amazon can categorize employee feedback into distinct topics. For example, topics related to work-life balance, compensation, and company culture can be identified. This allows Amazon to create targeted action plans for each topic.
  3. Tailored Interventions: Armed with a deeper understanding of the concerns and sentiments of different employee segments, Amazon can develop tailored interventions. For instance, if certain groups of employees express dissatisfaction with work hours, Amazon can explore flexible scheduling options.
  4. Employee Wellness Initiatives: Understanding sentiments related to work-life balance and stress can inform Amazon’s initiatives related to employee well-being. Offering wellness programs, mental health support, and stress management resources can help improve the overall employee experience.
  5. Feedback Loops: Establishing continuous feedback loops with employees based on the findings can foster a culture of open communication. Amazon can encourage employees to provide feedback regularly and use this input for ongoing improvements.
  6. Measurement and Evaluation: Regularly measuring sentiment and tracking progress on identified action plans is crucial. Amazon can use key performance indicators (KPIs) to evaluate the effectiveness of its initiatives and make data-driven adjustments as needed.

In conclusion, the results obtained from analyzing employee reviews using sentiment analysis and topic modeling can serve as a roadmap for Amazon to enhance its workplace environment, address employee concerns, and ultimately boost employee satisfaction and engagement. By actively listening to its employees and taking meaningful actions based on their feedback, Amazon can create a more positive and fulfilling work experience for its workforce, leading to increased productivity and long-term success.

Conclusion

In conclusion, our journey through sentiment analysis and topic modeling has provided valuable insights into understanding employee feedback using the Amazon Employee Reviews dataset. We’ve explored two powerful techniques: Latent Semantic Analysis (LSA) and BERTopic, each offering its own set of advantages and limitations. Here are the key takeaways from our exploration:

  1. Understanding Employee Feedback: Employee feedback is a crucial component of HR management. Sentiment analysis and topic modeling offer effective ways to process and derive meaningful insights from this data.
  2. Sentiment Analysis: We used sentiment analysis to classify comments as positive, neutral, or negative. This technique allows organizations to gauge overall employee sentiment, identify areas of concern, and take proactive measures to improve workplace satisfaction.
  3. LSA vs. BERTopic: We compared two topic modeling approaches. LSA is effective at uncovering topics within the data but requires manual selection of the number of topics. BERTopic, on the other hand, automatically identifies topics but may generate fewer topics due to its complexity.
  4. Topic Interpretation: Both LSA and BERTopic generated topics that reflect various aspects of employee experiences, including work-life balance, work hours, company culture, salary, and job satisfaction. These topics provide valuable insights into the factors that influence employee sentiment.
  5. Potential Impact in HR: Sentiment analysis and topic modeling have immense potential in HR management. By analyzing employee feedback, organizations can enhance employee engagement, address pain points, and make data-driven decisions to create a more positive and productive workplace.

In today’s competitive business landscape, understanding and acting upon employee feedback are essential for maintaining a motivated and productive workforce. Sentiment analysis and topic modeling empower organizations to gain deeper insights into employee sentiments and perceptions, leading to more informed HR strategies and a stronger, more engaged workforce.

As we conclude this journey, we encourage organizations to embrace the power of sentiment analysis and topic modeling in HR to foster a workplace culture where employees feel valued, heard, and motivated to excel. By doing so, organizations can not only improve their bottom line but also create a more positive and fulfilling work environment for their most valuable asset: their employees.
